Jcode(3) Japanese Charset Handler

Methods

All methods described here return a Jcode object unless otherwise noted.

Constructors

$j = Jcode->new($str [, $icode])
Creates a Jcode object $j from $str. The input encoding is detected automatically unless you set $icode explicitly. For the available charsets, see getcode below.

For perl 5.8.1 or better, $icode can be any encoding name that Encode understands.

  $j = Jcode->new($european, 'iso-latin1');

When the object is stringified, it returns the EUC-converted string so you can <print $j> instead of <print $j->euc>.
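
As an illustration of the constructor and of stringification, here is a minimal sketch. It assumes the Jcode module is installed; the sample bytes (hiragana "a" in Shift_JIS) and the variable names are chosen only for this example.

```perl
use strict;
use warnings;
use Jcode;

# Hiragana "a" (U+3042) as Shift_JIS bytes; sample data for illustration.
my $sjis_bytes = "\x82\xa0";

# Name the input encoding explicitly instead of relying on auto-detection.
my $j = Jcode->new($sjis_bytes, 'sjis');

# Stringifying the object gives the same bytes as calling $j->euc.
my $as_string = "$j";
my $as_euc    = $j->euc;

# Print the EUC-JP bytes as hex so the output is safe on any terminal.
printf "%s\n", unpack('H*', $as_string);
```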

Passing Reference
Instead of a scalar value, you can pass a reference:

Jcode->new(\$str);

This saves a little time, at the cost of $str itself being converted. (In a way, $str is now ``tied'' to the Jcode object.)
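
The side effect on the caller's variable can be observed with a small sketch like the one below. It assumes Jcode is installed; what exactly $str holds afterwards may depend on the Jcode and perl versions, so the sketch only checks that it is no longer the original Shift_JIS bytes.

```perl
use strict;
use warnings;
use Jcode;

my $original = "\x82\xa0";    # Shift_JIS bytes for U+3042 (sample data)
my $str      = $original;

# Passing \$str lets Jcode work on the caller's scalar rather than a copy.
my $j = Jcode->new(\$str, 'sjis');

# The caller's scalar has been converted along the way.
my $changed = ($str ne $original) ? 1 : 0;
print "caller's string was rewritten in place\n" if $changed;
```

If the original bytes are still needed, copy them first or pass the scalar by value.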

$j->set($str [, $icode])
Sets $j's internal string to $str. Handy when you use a Jcode object repeatedly (it saves the time and memory needed to create a new object).

 # converts mailbox to SJIS format
 my $jconv = new Jcode;
 $/ = 00;
 while(<>){
     print $jconv->set(\$_)->mime_decode->sjis;
 }

$j->append($str [, $icode]);
Appends $str to $j's internal string.

$j = jcode($str [, $icode]);
Shortcut for Jcode->new(), so you can write, for example, jcode($str)->sjis.
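
The three entry points described above (set, append and the jcode shortcut) can be sketched together as follows. This assumes Jcode is installed; the EUC-JP sample bytes and variable names are illustrative only.

```perl
use strict;
use warnings;
use Jcode;

# One converter object, reused for each input string via set().
my $jconv = Jcode->new('');
my @sjis_out;
for my $euc_bytes ("\xa4\xa2", "\xa4\xa4") {    # EUC-JP for U+3042, U+3044
    push @sjis_out, $jconv->set($euc_bytes, 'euc')->sjis;
}

# append() extends the internal string and returns the object, so it chains.
my $joined = $jconv->set("\xa4\xa2", 'euc')->append("\xa4\xa4", 'euc')->sjis;

# jcode() is the functional shortcut for Jcode->new().
my $short = jcode("\xa4\xa2", 'euc')->sjis;

printf "%s\n", unpack('H*', $joined);
```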

Encoded Strings

In general, you can retrieve the encoded string as $j->encoded, where encoded is the name of the target encoding.

$sjis = jcode($str)->sjis
$euc = $j->euc
$jis = $j->jis
$sjis = $j->sjis
$ucs2 = $j->ucs2
$utf8 = $j->utf8
What you code is what you get :)
$iso_2022_jp = $j->iso_2022_jp
Same as "$j->h2z->jis". Hankaku Kanas are forcibly converted to Zenkaku.

For perl 5.8.1 and better, you can also use any encoding names and aliases that Encode supports. For example:

  $european = $j->iso_latin1; # replace '-' with '_' for names.

FYI: Encode::Encoder uses a similar trick.
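
To make the accessors concrete, the sketch below converts a single character (hiragana "a", U+3042) into each of the encodings listed above and prints the resulting bytes as hex. It assumes Jcode is installed; the sample character is arbitrary.

```perl
use strict;
use warnings;
use Jcode;

# Start from EUC-JP bytes for U+3042 and name the input encoding explicitly.
my $j = jcode("\xa4\xa2", 'euc');

# Call each accessor by name and record the result as a hex string.
my %hex;
for my $enc (qw(euc sjis jis utf8 ucs2)) {
    $hex{$enc} = unpack('H*', $j->$enc);
    printf "%-4s %s\n", $enc, $hex{$enc};
}
```

The jis form is longer than the others because it wraps the two-byte character in escape sequences that switch into and out of JIS X 0208.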

$j->fallback($fallback)
For perl 5.8.1 or better, Jcode stores the internal string in UTF-8. Any character that does not map to the target ->encoding is replaced with a '?', which is the Encode standard.

  my $unistr = "\x{262f}"; # YIN YANG
  my $j = jcode($unistr);  # $j->euc is '?'

You can change this behavior by specifying a fallback, as with Encode. The values are the same as Encode's. "Jcode::FB_PERLQQ", "Jcode::FB_XMLCREF", "Jcode::FB_HTMLCREF" are aliased to those of Encode for convenience.

  print $j->fallback(Jcode::FB_PERLQQ)->euc;   # '\x{262f}'
  print $j->fallback(Jcode::FB_XMLCREF)->euc;  # '&#x262f;'
  print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'

The global variable $Jcode::FALLBACK stores the default fallback, so you can override it by assigning a new value.

  $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme

[@lines =] $jcode->jfold([$width, $newline_str, $kref])
Folds lines in the Jcode string every $width (default: 72), where $width is the number of ``halfwidth'' characters; fullwidth characters count as two. Lines are joined with the newline string specified by $newline_str (default: ``\n'').

Rudimentary kinsoku support is now available for Perl 5.8.1 and better.

$length = $jcode->jlength();
Returns the length in characters rather than in bytes.
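
The width rule for jfold and the character count from jlength can be sketched as below. This assumes Jcode is installed; the input (ten copies of one fullwidth character) and the width of 8 are arbitrary choices for the example. jlength is called first because jfold inserts newline strings into the object's internal string.

```perl
use strict;
use warnings;
use Jcode;

# Ten fullwidth characters (U+3042 repeated), given as EUC-JP bytes.
my $j = jcode("\xa4\xa2" x 10, 'euc');

# Character count, not byte count (the EUC-JP input is 20 bytes long).
my $chars = $j->jlength;

# Fold at 8 halfwidth columns: each fullwidth character counts as two,
# so at most four characters fit on a line.
my @lines = $j->jfold(8);

printf "%d characters folded into %d lines\n", $chars, scalar @lines;
```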

Methods that use MIME::Base64

To use the methods below, you need MIME::Base64. To install it, simply run:

   perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'

If your perl is 5.6 or better, there is no need since MIME::Base64 is bundled.

$mime_header = $j->mime_encode([$lf, $bpl])
Converts $str to a MIME-Header as documented in RFC1522. When $lf is specified, it is used to fold lines (default: \n). When $bpl is specified, it is used as the number of bytes per line (default: 76; this number must not exceed 76).

For Perl 5.8.1 or better, you can also encode MIME Header as:

  $mime_header = $j->MIME_Header;

In which case the resulting $mime_header is MIME-B-encoded UTF-8, whereas "$j->mime_encode()" returns MIME-B-encoded ISO-2022-JP. Most modern MUAs support both.

$j->mime_decode;
Decodes the MIME-Header in the Jcode object. For perl 5.8.1 or better, you can also do the same with:

  Jcode->new($str, 'MIME-Header')
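
A round trip through mime_encode and mime_decode might look like the following sketch. It assumes Jcode and MIME::Base64 are installed; the two-character subject is sample data, and the default line-folding arguments are used.

```perl
use strict;
use warnings;
use Jcode;

# A short Japanese subject (U+3042 U+3044), given as EUC-JP bytes.
my $euc_subject = "\xa4\xa2\xa4\xa4";

# mime_encode returns the header string itself, not a Jcode object.
my $header = jcode($euc_subject, 'euc')->mime_encode;
print "Subject: $header\n";

# mime_decode works on the object, so the usual accessors can follow it.
my $decoded = jcode($header)->mime_decode->euc;
```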

Hankaku vs. Zenkaku

$j->h2z([$keep_dakuten])
Converts X201 kana (Hankaku) to X208 kana (Zenkaku). When $keep_dakuten is set, it leaves dakuten as is (that is, ``ka + dakuten'' is left as is instead of being converted to ``ga'').

You can retrieve the number of matches via $j->nmatch;

$j->z2h
Converts X208 kana (Zenkaku) to X201 kana (Hankaku).

You can retrieve the number of matches via $j->nmatch;
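
The effect of $keep_dakuten is easiest to see on the ``ka + dakuten'' case mentioned above. The sketch below assumes Jcode is installed and uses EUC-JP byte literals for the sample kana; it reads nmatch without relying on a particular value.

```perl
use strict;
use warnings;
use Jcode;

# Halfwidth KA followed by a halfwidth dakuten, as EUC-JP bytes.
my $hankaku = "\x8e\xb6\x8e\xde";

# Default: the pair is merged into the single fullwidth character GA.
my $jh   = jcode($hankaku, 'euc')->h2z;
my $ga   = $jh->euc;
my $hits = $jh->nmatch;    # number of conversions performed by h2z

# With $keep_dakuten set, KA and the dakuten are converted separately.
my $kept = jcode($hankaku, 'euc')->h2z(1)->euc;

# z2h goes the other way: fullwidth KA back to halfwidth KA.
my $back = jcode("\xa5\xab", 'euc')->z2h->euc;

printf "%s %s %s\n", map { unpack('H*', $_) } $ga, $kept, $back;
```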

Regexp emulators

To use "->m()" and "->s()", you need perl 5.8.1 or better.

$j->tr($from, $to, $opt);
Applies "tr/$from/$to/" on the Jcode object, where $from and $to are EUC-JP strings. On perl 5.8.1 or better, $from and $to can also be flagged UTF-8 strings.

If $opt is set, "tr/$from/$to/$opt" is applied. $opt must be 'c', 'd' or a combination thereof.

You can retrieve the number of matches via $j->nmatch;

The following methods are available only for perl 5.8.1 or better.

$j->s($pattern, $replace, $opt);
Applies "s/$pattern/$replace/$opt". $pattern and $replace must be in EUC-JP or flagged UTF-8. $opt takes the same values as the regexp options; see perlre for details.

Like "$j->tr()", "$j->s()" returns the object itself so you can chain the operations as follows:

  $j->tr("a-z", "A-Z")->s("foo", "bar");

[@match = ] $j->m($pattern, $opt);
Applies "m/$pattern/$opt". Note that this method DOES NOT RETURN AN OBJECT, so you can't chain it the way you can "$j->s()".
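
The difference in return values between the three emulators can be sketched as follows. This assumes Jcode is installed on perl 5.8.1 or better (required for s and m); plain ASCII input is used so the example does not depend on any particular Japanese text.

```perl
use strict;
use warnings;
use Jcode;

my $j = jcode("hello world");

# tr() and s() return the object, so they can be chained.
$j->tr("a-z", "A-Z")->s("WORLD", "JCODE");
my $result = $j->euc;

# m() returns the match result instead of the object, so it ends a chain.
my @match = $j->m("(J[A-Z]+)");

print "$result\n";
print "matched: @match\n" if @match;
```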

Instance Variables

If you need to access instance variables of a Jcode object, use the access methods below instead of accessing them directly (that's what OOP is all about).

FYI, Jcode uses a ref to an array instead of a ref to a hash (the common way) to optimize speed. (Actually, you don't need to know this as long as you use the access methods; once again, that's OOP.)

$j->r_str
Reference to the EUC-coded string.
$j->icode
Input character code detected in the most recent operation.
$j->nmatch
Number of matches (used by $j->tr, etc.)
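
A short sketch of the accessors in use, assuming Jcode is installed. The sample inputs are arbitrary, and the contents behind r_str are deliberately not inspected, since the accessor only promises a reference.

```perl
use strict;
use warnings;
use Jcode;

# icode reports the input encoding used for the most recent operation.
my $j     = jcode("\x82\xa0", 'sjis');
my $icode = $j->icode;

# r_str hands back a reference to the internal string.
my $ref_type = ref $j->r_str;

# nmatch reports how many characters the last tr() touched.
my $count = jcode("aaa")->tr("a", "b")->nmatch;

print "icode=$icode ref=$ref_type nmatch=$count\n";
```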