VERSION
version 0.017SYNOPSIS
use utf8::all; # Turn on UTF-8, all of it.
open my $in, '<', 'contains-utf8'; # UTF-8 already turned on here
print length 'foo bXr'; # 7 UTF-8 characters
my $utf8_arg = shift @ARGV; # @ARGV is UTF-8 too (only for main)
DESCRIPTION
The "use utf8" pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. This also means that you can now use literal Unicode characters as part of strings, variable names, and regular expressions."utf8::all" goes further:
- "charnames" are imported so "\N{...}" sequences can be used to compile Unicode characters based on names.
- On Perl "v5.11.0" or higher, the "use feature 'unicode_strings'" is enabled.
- "use feature fc" and "use feature unicode_eval" are enabled on Perl 5.16.0 and higher.
- Filehandles are opened with UTF-8 encoding turned on by default (including STDIN, STDOUT, STDERR). Meaning that they automatically convert UTF-8 octets to characters and vice versa. If you don't want UTF-8 for a particular filehandle, you'll have to set "binmode $filehandle".
- @ARGV gets converted from UTF-8 octets to Unicode characters (when "utf8::all" is used from the main package). This is similar to the behaviour of the "-CA" perl command-line switch (see perlrun).
- "readdir", "readlink", "readpipe" (including the "qx//" and backtick operators), and "glob" (including the "<>" operator) now all work with and return Unicode characters instead of (UTF-8) octets.
Lexical scope
The pragma is lexically-scoped, so you can do the following if you had some reason to:
{ use utf8::all; open my $out, '>', 'outfile'; my $utf8_str = 'foo bXr'; print length $utf8_str, "\n"; # 7 print $out $utf8_str; # out as utf8 } open my $in, '<', 'outfile'; # in as raw my $text = do { local $/; <$in>}; print length $text, "\n"; # 10, not 7!
Instead of lexical scoping, you can also use "no utf8::all" to turn off the effects.
Note that the effect on @ARGV and the "STDIN", "STDOUT", and "STDERR" file handles is always global!
INTERACTION WITH AUTODIE
If you use autodie, which is a great idea, you need to use at least version 2.12, released on June 26, 2012 <https://metacpan.org/source/PJF/autodie-2.12/Changes#L3>. Otherwise, autodie obliterates the IO layers set by the open pragma. See RT #54777 <https://rt.cpan.org/Ticket/Display.html?id=54777> and GH #7 <https://github.com/doherty/utf8-all/issues/7>.AVAILABILITY
The project homepage is <http://metacpan.org/release/utf8-all/>.The latest version of this module is available from the Comprehensive Perl Archive Network (CPAN). Visit <http://www.perl.com/CPAN/> to find a CPAN site near you, or see <https://metacpan.org/module/utf8::all/>.
SOURCE
The development version is on github at <http://github.com/doherty/utf8-all> and may be cloned from <git://github.com/doherty/utf8-all.git>BUGS AND LIMITATIONS
You can make new bug reports, and view existing ones, through the web interface at <https://github.com/doherty/utf8-all/issues>.COMPATIBILITY
The filesystems of Dos, Windows, and OS/2 do not (fully) support UTF-8. The "readlink" and "readdir" functions and "glob" operators will therefore not be replaced on these systems.AUTHORS
- Michael Schwern <[email protected]>
- Mike Doherty <[email protected]>
COPYRIGHT AND LICENSE
This software is copyright (c) 2009 by Michael Schwern <[email protected]>.This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.