Convert::YText(3) Quotes strings suitably for an RFC 2822 local part

VERSION

Version 0.2

SYNOPSIS

use Convert::YText qw(encode_ytext decode_ytext);

$encoded = encode_ytext($string);
$decoded = decode_ytext($encoded);

($decoded eq $string) || die "this should never happen!";

DESCRIPTION

Convert::YText converts strings to and from ``YText'', a format inspired by xtext (defined in RFC 1891) and by the MIME base64 and quoted-printable encodings (RFC 2045). The main goal is to encode a UTF-8 string into something safe for use as the local part of an Internet email address (RFC 2822).

By default, spaces are replaced with ``+'' and ``/'' with ``~''; the characters ``A-Za-z0-9_.-'' encode as themselves; everything else is written as ``=USTR='', where USTR is the base-64 encoding (using ``A-Za-z0-9_.'' as digits) of the Unicode character code. The encoding is configurable (see below).
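The default rules above can be illustrated with a short, self-contained sketch. It does not use Convert::YText and is not the module's implementation: the names sketch_encode and sketch_encode_num are hypothetical, and both the digit order (A-Z, a-z, 0-9, ``_'', ``.'') and the most-significant-digit-first layout are assumptions, so the escaped output of the real module may differ.

```perl
use strict;
use warnings;

# Assumed digit alphabet, in the order the text lists it: A-Z, a-z, 0-9, '_', '.'
my @DIGITS = ('A'..'Z', 'a'..'z', '0'..'9', '_', '.');

# Write a character code in base 64, most significant digit first (an assumption).
sub sketch_encode_num {
    my ($num) = @_;
    return $DIGITS[0] if $num == 0;
    my $str = '';
    while ($num > 0) {
        $str = $DIGITS[$num % 64] . $str;
        $num = int($num / 64);
    }
    return $str;
}

# Apply the documented default rules one character at a time.
sub sketch_encode {
    my ($text) = @_;
    my $out = '';
    for my $ch (split //, $text) {
        if    ($ch eq ' ')                  { $out .= '+' }
        elsif ($ch eq '/')                  { $out .= '~' }
        elsif ($ch =~ /\A[A-Za-z0-9_.-]\z/) { $out .= $ch }
        else  { $out .= '=' . sketch_encode_num(ord $ch) . '=' }
    }
    return $out;
}

print sketch_encode("a b/c"), "\n";   # prints a+b~c
```

Only the space, slash and pass-through rules are exercised in the printed example, since those follow directly from the description; anything that goes through the escape branch depends on the assumptions named above.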

PROCEDURAL INTERFACE

The module can export "encode_ytext", which converts an arbitrary Unicode string into a ``safe'' form, and "decode_ytext", which recovers the original text. "validate_ytext" is a heuristic which returns 0 for bad input.

OBJECT ORIENTED INTERFACE

For more control, you will need to use the OO interface.

new

Create a new encoding object.

Arguments

Arguments are by name (i.e. a hash).

DIGIT_STRING ("A-Za-z0-9_.") Must be 64 characters long.
ESCAPE_CHAR ('=') Must not be in the digit string.
SPACE_CHAR ('+') Non-digit to replace space. Can be the empty string.
SLASH_CHAR ('~') Non-digit to replace slash. Can be the empty string.
EXTRA_CHARS ('._\-') Other characters to leave unencoded.

encode

Arguments

a string to encode.

Returns

encoded string

decode

Arguments

a string to decode.

Returns

decoded string

valid

Simple necessary but not sufficient test for validity.
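A minimal sketch of the object-oriented interface follows. It assumes that new is called as a class method with a key/value list (the text only says the arguments are ``by name''), and it spells out the documented default values purely to show the calling style. No particular encoded output is assumed.

```perl
use strict;
use warnings;
use Convert::YText;

# Named arguments, here spelled out with their documented default values.
my $codec = Convert::YText->new(
    ESCAPE_CHAR => '=',
    SPACE_CHAR  => '+',
    SLASH_CHAR  => '~',
);

my $string  = 'some folder/some file';
my $encoded = $codec->encode($string);
my $decoded = $codec->decode($encoded);

# valid() is only a necessary condition, so a pass means "not obviously bad".
warn "encoded form failed the validity check\n" unless $codec->valid($encoded);
die "round trip failed\n" unless $decoded eq $string;

print "$encoded\n";
```

Changing one of the arguments, for example SLASH_CHAR, should only require that the same object be used for both encoding and decoding.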

DISCUSSION

According to RFC 2822, the following non-alphanumerics are OK for the local part of an address: ``!#$%&'*+-/=?^_`{|}~''. On the other hand, it seems common in practice to block addresses having ``%!/|`#&?'' in the local part. The idea is to restrict ourselves to basic ASCII alphanumerics, plus a small set of printable ASCII, namely ``=_+-~.''.

The characters '+' and '-' are quite widely used to attach suffixes (although usually only one of them works on a given mail host). It seems OK to use both '+' and '-', since the first occurrence marks the beginning of a suffix and any later one is treated as a regular character. The character '.' also seems mostly permissible.
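The restricted character set described in this section can be checked with a single character class. The helper name in_restricted_set is hypothetical and not part of Convert::YText; the sketch only tests membership in the set of ASCII alphanumerics plus ``=_+-~.'' and says nothing about whether a string is well-formed YText.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $local is non-empty and uses only
# ASCII alphanumerics plus the characters "=_+-~.".
sub in_restricted_set {
    my ($local) = @_;
    return $local =~ /\A[A-Za-z0-9=_+~.-]+\z/ ? 1 : 0;
}

for my $candidate ('a+b~c', 'x=BA=y', 'bad|part', 'bad/part') {
    printf "%-10s %s\n", $candidate,
        in_restricted_set($candidate) ? 'ok' : 'rejected';
}
```

The first two candidates stay inside the set, while ``|'' and ``/'' are among the characters that the discussion notes are commonly blocked.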

AUTHOR

David Bremner, <[email protected]>

COPYRIGHT

Copyright (C) 2011 David Bremner. All Rights Reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.