man Jcode (3): Japanese Charset Handler

SYNOPSIS

use Jcode;
#
# traditional
Jcode::convert(\$str, $ocode, $icode, "z");
# or OOP!
print Jcode->new($str)->h2z->tr($from, $to)->utf8;

DESCRIPTION

Japanese documentation is now available as Jcode::Nihongo.

Jcode.pm supports both an object-oriented and a traditional (procedural) approach. With the object-oriented approach, you can write:

  $iso_2022_jp = Jcode->new($str)->h2z->jis;

Which is more elegant than:

  $iso_2022_jp = $str;
  &jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");

For those unfamiliar with objects, Jcode.pm still supports "getcode()" and "convert()".

If the perl version is 5.8.1 or later, Jcode acts as a wrapper to Encode, the standard charset handler module for Perl 5.8 or later.

Methods

All methods mentioned here return the Jcode object unless otherwise noted.

Constructors

$j = Jcode->new($str [, $icode])

Creates a Jcode object $j from $str. The input encoding is detected automatically unless you explicitly set $icode. For the available charsets, see getcode below.

For perl 5.8.1 or later, $icode can be any encoding name that Encode understands.

  $j = Jcode->new($european, 'iso-latin1');

When the object is stringified, it returns the EUC-converted string, so you can write "print $j" instead of "print $j->euc".
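The constructor and the stringification rule can be illustrated with a short sketch. The sample text is written as byte escapes so the example does not depend on the encoding of the source file, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Jcode;

# "日本語" as raw UTF-8 bytes.
my $utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# Name the input encoding explicitly instead of relying on detection.
my $j = Jcode->new($utf8, 'utf8');

# Stringifying the object yields the same bytes as calling ->euc.
my $stringified = "$j";
my $euc         = $j->euc;

print unpack('H*', $euc), "\n";    # prints c6fccbdcb8ec
```

Printing the hex dump rather than the raw bytes keeps the output readable on any terminal.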

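Passing Reference

Instead of a scalar value, you can pass a reference to the string, as in Jcode->new(\$str) or $j->set(\$str).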

$j->set($str [, $icode])

Sets $j's internal string to $str. Handy when you use a Jcode object repeatedly (it saves the time and memory of creating a new object each time).

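 # converts mailbox to SJIS format
 my $jconv = Jcode->new;
 $/ = "";    # paragraph mode: read one blank-line-separated block at a time
 while (<>) {
     # decode MIME-encoded headers, then emit the block as Shift_JIS
     print $jconv->set(\$_)->mime_decode->sjis;
 }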

$j->append($str [, $icode]);

Appends $str to $j's internal string.
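As a rough sketch of reusing one object, the following converts several EUC-JP strings with a single Jcode object and then builds a string with append. The input data and variable names are made up for the example.

```perl
use strict;
use warnings;
use Jcode;

# One object, reused for every conversion below.
my $conv = Jcode->new;

# Two EUC-JP byte strings: "日本" and "語".
my @inputs = ("\xC6\xFC\xCB\xDC", "\xB8\xEC");

# set() replaces the internal string; each call returns the object,
# so an output method can be chained directly.
my @sjis = map { $conv->set($_, 'euc')->sjis } @inputs;

# append() adds to the current internal string instead of replacing it.
$conv->set($inputs[0], 'euc')->append($inputs[1], 'euc');
my $joined = $conv->euc;

print unpack('H*', $_), "\n" for @sjis, $joined;
```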

$j = jcode($str [, $icode]);

Shortcut for Jcode->new(), so you can write expressions such as jcode($str)->sjis.

Encoded Strings

In general, you can retrieve the string in a given encoding as $j->encoded, where encoded is the name of that encoding.

$sjis = jcode($str)->sjis

What you code is what you get :)

$iso_2022_jp = $j->iso_2022_jp

Same as "$j->h2z->jis". Hankaku (halfwidth) kana are forcibly converted to Zenkaku (fullwidth).

For perl 5.8.1 or later, you can also use any encoding name or alias that Encode supports. For example:

  $european = $j->iso_latin1; # replace '-' with '_' for names.

FYI: Encode::Encoder uses a similar trick.
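A minimal sketch of the output methods, assuming EUC-JP input given as byte escapes. The last two lines check the statement above that iso_2022_jp behaves like h2z followed by jis.

```perl
use strict;
use warnings;
use Jcode;

my $euc = "\xC6\xFC\xCB\xDC\xB8\xEC";    # "日本語" as EUC-JP bytes
my $j   = Jcode->new($euc, 'euc');

# Each accessor returns the string in the named encoding.
my $sjis = $j->sjis;
my $jis  = $j->jis;
my $utf8 = $j->utf8;

printf "sjis: %s\n", unpack('H*', $sjis);
printf "jis:  %s\n", unpack('H*', $jis);
printf "utf8: %s\n", unpack('H*', $utf8);

# Halfwidth katakana KA in EUC-JP; iso_2022_jp converts it to fullwidth.
my $kana   = "\x8E\xB6";
my $strict = Jcode->new($kana, 'euc')->iso_2022_jp;
my $manual = Jcode->new($kana, 'euc')->h2z->jis;
```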

$j->fallback($fallback)

[@lines =] $jcode->jfold([$width, $newline_str, $kref])

Folds lines in the jcode string every $width (default: 72), joining them with the newline string specified by $newline_str (default: "\n"). $width is counted in "halfwidth" characters; fullwidth characters are counted as two.

Rudimentary kinsoku support is now available for Perl 5.8.1 and later.

$length = $jcode->jlength();

Returns the length in characters rather than in bytes.
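The following sketch shows the difference between byte length and jlength, and folds a string of six fullwidth characters at a width of four halfwidth columns. The sample text is arbitrary, and jfold is called in list context to get the folded lines back.

```perl
use strict;
use warnings;
use Jcode;

my $euc = "\xC6\xFC\xCB\xDC\xB8\xEC";    # "日本語" as EUC-JP bytes

# Three characters, six bytes.
my $bytes = length $euc;
my $chars = Jcode->new($euc, 'euc')->jlength;
print "bytes=$bytes chars=$chars\n";

# Six fullwidth characters occupy twelve halfwidth columns, so a
# width of 4 leaves room for two characters per line.
my @lines = Jcode->new($euc x 2, 'euc')->jfold(4);
print scalar(@lines), " lines\n";
```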

Methods that use MIME::Base64

To use the methods below, you need MIME::Base64. To install it, run:

   perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'

If your perl is 5.6 or later, there is no need to install it, since MIME::Base64 is bundled.

$mime_header = $j->mime_encode([$lf, $bpl])
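As a hedged sketch, the round trip below encodes a short string as a MIME header and decodes it again with mime_decode, the method used in the mailbox example earlier. The optional $lf and $bpl arguments are left at their defaults, and the exact header text is printed rather than assumed.

```perl
use strict;
use warnings;
use Jcode;

my $euc = "\xC6\xFC\xCB\xDC\xB8\xEC";    # "日本語" as EUC-JP bytes

# mime_encode returns the encoded header as a plain string.
my $header = Jcode->new($euc, 'euc')->mime_encode;
print "$header\n";

# Feeding the header back through mime_decode recovers the text.
my $decoded = Jcode->new($header)->mime_decode->euc;
print unpack('H*', $decoded), "\n";
```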

Hankaku vs. Zenkaku

$j->h2z([$keep_dakuten]): Converts JIS X 0201 kana (Hankaku) to JIS X 0208 kana (Zenkaku). When $keep_dakuten is set, dakuten are left as is (that is, "ka + dakuten" stays as two characters instead of being converted to "ga").
You can retrieve the number of matches via $j->nmatch.
$j->z2h: Converts JIS X 0208 kana (Zenkaku) to JIS X 0201 kana (Hankaku).
You can retrieve the number of matches via $j->nmatch.
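A small sketch of both directions, using halfwidth "ka + dakuten" in EUC-JP as input. The byte values in the comments are the EUC-JP codes of the characters involved, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Jcode;

# Halfwidth KA followed by halfwidth dakuten, as EUC-JP bytes.
my $hankaku = "\x8E\xB6\x8E\xDE";

# Default: the pair is combined into one fullwidth GA (A5 AC).
my $ga = Jcode->new($hankaku, 'euc')->h2z->euc;

# With $keep_dakuten set, KA (A5 AB) and the dakuten stay separate.
my $kept = Jcode->new($hankaku, 'euc')->h2z(1)->euc;

# z2h goes the other way: fullwidth KA becomes halfwidth KA (8E B6).
my $j  = Jcode->new("\xA5\xAB", 'euc')->z2h;
my $ka = $j->euc;

printf "%s %s %s (matches: %d)\n",
    unpack('H*', $ga), unpack('H*', $kept), unpack('H*', $ka), $j->nmatch;
```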

Regexp emulators

To use "->m()" and "->s()", you need perl 5.8.1 or better.

$j->tr($from, $to, $opt): Applies "tr/$from/$to/$opt"
$j->s($pattern, $replace, $opt): Applies "s/$pattern/$replace/$opt"
[@match = ] $j->m($pattern, $opt): Applies "m/$pattern/$opt"
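As an illustration of tr, this sketch maps hiragana to katakana. It assumes that $from and $to are given as EUC-JP byte strings and that character ranges work as they do in Perl's own tr; the ranges are written as byte escapes.

```perl
use strict;
use warnings;
use Jcode;

# Ranges in EUC-JP: small hiragana A .. hiragana N, and the
# corresponding katakana range.
my $from = "\xA4\xA1-\xA4\xF3";
my $to   = "\xA5\xA1-\xA5\xF3";

# "ひらがな" as EUC-JP bytes.
my $hiragana = "\xA4\xD2\xA4\xE9\xA4\xAC\xA4\xCA";

my $j        = Jcode->new($hiragana, 'euc')->tr($from, $to);
my $katakana = $j->euc;

print unpack('H*', $katakana), "\n";    # prints a5d2a5e9a5aca5ca
print "replaced: ", $j->nmatch, "\n";
```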

Instance Variables

If you need to access the instance variables of a Jcode object, use the access methods below instead of accessing them directly. (That's what OOP is all about.)

FYI, Jcode uses a reference to an array instead of the more common reference to a hash, to optimize speed. (You don't actually need to know this as long as you use the access methods; once again, that's OOP.)

$j->r_str: Reference to the EUC-coded string.
$j->icode: Input character code of the most recent operation.
$j->nmatch: Number of matches (used in $j->tr, etc.)

Subroutines

($code, [$nmatch]) = getcode($str)
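A brief sketch of getcode on two inputs whose detection is unambiguous: an ISO-2022-JP string (recognizable by its escape sequences) and plain ASCII. The function is called by its fully qualified name so the example does not depend on what Jcode exports.

```perl
use strict;
use warnings;
use Jcode;

# "日本語" in ISO-2022-JP (JIS), including the escape sequences.
my $jis = "\x1B\x24\x42\x46\x7C\x4B\x5C\x38\x6C\x1B\x28\x42";

# In list context getcode returns the code name and a match count.
my ($code, $nmatch) = Jcode::getcode($jis);
print "code=$code nmatch=$nmatch\n";

# In scalar context it returns just the code name.
my $ascii_code = Jcode::getcode("hello");
print "code=$ascii_code\n";
```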

BUGS

For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning Jcode is subject to any bugs therein.

ACKNOWLEDGEMENTS

This package owes a lot in motivation, design, and code to jcode.pl for Perl4 by Kazumasa Utashiro.

Hiroki Ohzaki has helped me polish the regexps from the very first stage of development.

JEncode by [email protected] has inspired me to integrate Encode into Jcode. He has also contributed the Japanese POD.

And the folks on the Jcode mailing list. Without them, I couldn't have coded this far.

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.