App::CELL::Guide(3) Introduction to App::CELL (POD-only module)

VERSION

Version 0.219

SYNOPSIS


$ perldoc App::CELL::Guide

INTRODUCTION

App::CELL is the Configuration, Error-handling, Localization, and Logging (CELL) framework for applications written in Perl. In the ``GENERAL APPROACH'' section, this Guide describes the CELL approach to each of these four areas, separately. Then, in the ``RATIONALE'' section, it presents the author's reasons for bundling them together.

HISTORY

CELL was written by Smithfarm in 2013 and 2014, initially as part of the Dochazka project [[ link to SourceForge ]]. Due to its generic nature, it was spun off into a separate project.

GENERAL APPROACH

This section presents CELL's approach to each of its four principal functions: configuration, error handling, localization, and logging.

Approach to configuration

CELL provides the application developer and site administrator with a straightforward and powerful way to define configuration parameters as needed by the application. If you are familiar with Request Tracker, you will know that there is a directory ("/opt/..." by default) which contains two files, called "RT_Config.pm" and "RT_SiteConfig.pm" --- as their names would indicate, they are actually Perl modules. The former is provided by the upstream developers and contains all of RT's configuration parameters and their ``factory default'' settings. The content of the latter is entirely up to the RT site administrator and contains only those parameters that need to be different from the defaults. Parameter settings in "RT_SiteConfig.pm", then, override the defaults set in "RT_Config.pm".

App::CELL provides this same functionality in a drop-in Perl module, with some subtle differences. While RT uses a syntax like this:

   set( 'MY_PARAM', ...arguments...);

where "...arguments..." is a list of scalar values (as with any Perl subroutine), App::CELL uses a slightly different format:

   set( 'MY_PARAM', $scalar );

where $scalar can be any scalar value, i.e. including references.

(Another difference is that App::CELL provides both immutable site parameters _and_ mutable "meta" configuration parameters, whereas RT's meta parameters are only used by RT itself.) For more information on configuration, see ``Configuration in depth''.

Error handling

To facilitate error handling and make the application's source code easier to read and understand, or at least mitigate its impenetrability, CELL provides the App::CELL::Status module, which enables functions in the application to return status objects if desired.

Status objects have the following principal attributes: "level", "code", "args", and "payload", which are given by the programmer when the status object is constructed, as well as attributes like "text", "lang", and "caller", which are derived by CELL. In addition to the attributes, "Status.pm" also provides some useful methods for processing status objects.

In order to signify an error, subroutine "foo_dis" could for example do this:

    return $CELL->status_err( code => 'Gidget displacement %s out of range',
        args => [ $displacement ],
    );

(Instead of having the error text in the "code", it could be placed in a message file in the sitedir with a code like DISP_OUT_OF_RANGE.)

On success, "foo_dis" could return an 'OK' status with the gidget displacement value in the payload:

    return $CELL->status_ok( payload => $displacement );

The calling function could check the return value like this:

    my $status = foo_dis();
    return $status if $status->not_ok;
    my $displacement = $status->payload;

For details, see App::CELL::Status and App::CELL::Message.

CELL's error-handling logic is inspired by brian d foy's article ``Return error objects instead of throwing exceptions''

    L<http://www.effectiveperlprogramming.com/2011/10/return-error-objects-instead-of-throwing-exceptions/>

Localization

This CELL component, called ``Localization'', gives the programmer a way to encapsulate a ``message'' (in its simplest form, a string) within a message object and then use that object in various ways.

So, provided the necessary message files have been loaded, the programmer could do this:

    my $message = $CELL->message( code => 'FOOBAR' );
    print $message->text, "\n"; # message FOOBAR in the default language
    print $message->text( lang => 'de' ), "\n"; # same message, in German

Messages are loaded when CELL is initialized, from files in the site configuration directory. Each file contains messages in a particular language. For example, the file "Dochazka_Message_en.conf" contains messages relating to the Dochazka application, in the English language. To provide the same messages in German, the file would be copied to "Dochazka_Message_de.conf" and translated.

Since message objects are used by App::CELL::Status, it is natural for the programmer to put error messages, warnings, etc. in message files and refer to them by their codes.

"App::CELL::Message" could also be extended to provide methods for encrypting messages and/or converting them into various target formats (JSON, HTML, Morse code, etc.).

For details, see ``Localization in depth'' and App::CELL::Message.

Logging

For logging, CELL uses Log::Any and optionally extends it by adding the caller's filename and line number to each message logged.

Message and status objects have 'log' methods, of course, and by default all statuses (except 'OK') are logged upon creation.

Here's how to set up (and do) logging in the application:

    use App::CELL::Log qw( $log );
    $log->init( ident => 'AppFoo' );
    $log->debug( "Three lines into AppFoo" );

App::CELL::Log provides its own singleton, but since all method calls are passed to Log::Any, anyway, the App::CELL::Log singleton behaves just like its Log::Any counterpart. This is useful, e.g., for testing log messages:

    use Log::Any::Test;
    $log->contains_only_ok( "Three lines into AppFoo" );

To actually see your log messages, you have to do something like this:

    use Log::Any::Adapter ('File', $ENV{'HOME'} . '/tmp/CELLtest.log');

DETAILED SPECIFICATIONS

Configuration

Three types of parameters

CELL recognizes three types of configuration parameters: "meta", "core", and "site". These parameters and their values are loaded from files prepared and placed in the sitedir in advance.

Meta parameters

Meta parameters are by definition mutable: the application can change a meta parameter's value any number of times, and App::CELL will not care. Initial "meta" param settings are placed in a file entitled "$str_MetaConfig.pm" (where $str is a string free of underscore characters) in the sitedir. For example, if the application name is FooApp, its initial "meta" parameter settings could be contained in a file called "FooApp_MetaConfig.pm". At initialization time, App::CELL looks in the sitedir for files matching this description, and attempts to load them. (See ``How configuration files are named''.)

Core parameters

As in Request Tracker, "core" parameters have immutable values and are intended to be used as ``factory defaults'', set by the developer, that the site administrator can override by setting site parameters. If the application is called FooApp, its core configuration settings could be contained in a file called "FooApp_Config.pm" located in the sitedir. (See ``How configuration files are named'' for details.)

Site parameters

Site parameters are kept separate from core parameters, but are closely related to them. As far as the application is concerned, there are only site parameters. How this works is best explained by two examples.

Let "FOO" be a parameter of an application that uses App::CELL.

In the first example, core param "FOO" is set to ``Bar'' and site param "FOO" is not set at all. When the application calls "$site->FOO" the core parameter value ``Bar'' is returned.

In the second example, the core param "FOO" is set to ``Bar'' and site param "FOO" is also set, but to a different value: ``Whizzo''. In this scenario, when the application calls "$site->FOO" the site parameter value (``Whizzo'') is returned.

This setup allows the site administrator to customize the application.
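The two examples above can be sketched in plain Perl. This is only an illustration of the override rule, not App::CELL's implementation: the hashes and the "lookup_site_param" helper are hypothetical stand-ins for the core and site parameter stores.

```perl
use strict;
use warnings;

# Hypothetical helper: return the site value if one is set,
# otherwise fall back to the core ("factory default") value.
sub lookup_site_param {
    my ( $core, $site, $name ) = @_;
    return $site->{$name} if exists $site->{$name};
    return $core->{$name};
}

# First example: core FOO is 'Bar', site FOO is not set at all
my $first = lookup_site_param( { FOO => 'Bar' }, {}, 'FOO' );

# Second example: core FOO is 'Bar', site FOO is 'Whizzo'
my $second = lookup_site_param( { FOO => 'Bar' }, { FOO => 'Whizzo' }, 'FOO' );

printf "first: %s, second: %s\n", $first, $second;
```

In the real framework the application simply calls "$site->FOO" and does not need to know which of the two files supplied the value.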

Site parameters are set in a file called "$str_SiteConfig.pm", where $str could be the appname.

Conclusion

How these three types of parameters are defined and used is up to the application. As far as App::CELL is concerned, they are all optional.

App::CELL itself has its own internal meta, core, and site parameters, but these are located elsewhere --- in the so-called ``sharedir'', a directory that is internal to the App::CELL distro/package.

All these internal parameters start with "CELL_" and are stored in the same namespaces as the application's parameters. That means the application programmer should avoid using parameters starting with "CELL_".

Where configuration files are located

sitedir

Configuration parameters are placed in specially-named files within a directory referred to by App::CELL as the ``site configuration directory'', or ``sitedir''. This directory is not a part of the App::CELL distribution and App::CELL does not create it. Instead, the application is expected to provide the full path to this directory to CELL's initialization routine, either via an argument to the function call or with the help of an environment variable. CELL's initialization routine calls App::CELL::Load::init to do the actual work of walking the directory.

This ``sitedir'' (site configuration directory) is assumed to be the place (or a place) where the application can store its configuration information in the form of "core", "site", and "meta" parameters. For ``LOCALIZATION'' purposes, "message" codes and their corresponding texts (in one or more languages) can be stored here as well, if desired.

sharedir

CELL itself has an analogous configuration directory, called the ``sharedir'', where its own internal configuration defaults are stored. CELL's own core parameters can be overridden by the application's site params, and in some cases this can even be desirable. For example, the parameter "CELL_DEBUG_MODE" can be overridden in the site configuration to tell CELL to include debug-level messages in the log.

During initialization, CELL walks first the sharedir, and then the sitedir, looking through those directories and all their subdirectories for meta, core, site, and message configuration files.

The sharedir is part of the App::CELL distro and CELL's initialization routine finds it via a call to the "dist_dir" routine in the File::ShareDir module.

How the sitedir is specified

The sitedir must be created and populated with configuration files by the application programmer. Typically, this directory would form part of the application distro and the site administrator would be expected to make a site configuration file for application-specific parameters. The application developer and site administrator have flexibility in this regard --- CELL's initialization routine, "$CELL->load", will work without a sitedir, with one sitedir, or even with multiple sitedirs.

No sitedir

It is possible, but probably not useful, to call "$CELL->load" without a sitedir parameter and without any sitedir specified in the environment. In this case, CELL just loads the sharedir and returns OK.

One sitedir

If there is only one sitedir, there are three possible ways to specify it to CELL's load routine: (1) a "sitedir" parameter, (2) an "enviro" parameter, or (3) the hard-coded "CELL_SITEDIR" environment variable.

Multiple sitedirs

If the application needs to load configuration parameters from multiple sitedirs, this can be accomplished simply by calling "$CELL->load" multiple times with different "sitedir" arguments.

Sitedir search algorithm

Every time it is called, the load routine uses the following algorithm to search for a/the sitedir:

1. "sitedir" parameter --- a "sitedir" parameter containing the full path to the sitedir can be passed. If it is present, CELL will try it first. If needed for portability, the path can be constructed using File::Spec (e.g. the "catfile" method) or similar. It should be a string containing the full path to the directory. If the "sitedir" argument points to a valid sitedir, it is loaded and OK is returned. If a "sitedir" argument is present but invalid, an ERR status results. If no "sitedir" argument was given, CELL continues to the next step.
2. "enviro" parameter --- if no "sitedir" parameter is given, "$CELL->load" looks for a parameter called "enviro" which it interprets as the name of an environment variable containing the sitedir path. If the "enviro" argument points to a valid sitedir, it is loaded and OK is returned. If an "enviro" argument is present but invalid, an ERR status results. If there is no "enviro" argument at all, CELL continues to the next step.
3. "CELL_SITEDIR" environment variable --- if no viable sitedir can be found by consulting the function call parameters, CELL's load routine falls back to this hardcoded environment variable. If the "CELL_SITEDIR" environment variable exists and points to a valid sitedir, it is loaded and OK is returned. If it exists but the directory is invalid, an ERR status is returned. If the environment variable doesn't exist, CELL writes a warning to the log (all attempts to find the sitedir failed). The return status in this case can be either WARN (if no sitedir was found in a previous call to the function) or OK if at least one sitedir has been loaded.

The "load" routine is re-entrant: it can be called any number of times. On first call, it will load CELL's own sharedir, as well as any sitedir that can be found using the above algorithm. All further calls will just run the sitedir search algorithm again. Each time it will find and load at most one sitedir. CELL maintains a list of loaded sitedirs in "$meta->CELL_META_SITEDIR_LIST".
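The search order described above can be sketched as a small standalone function. This is a simplified model, not the code in App::CELL::Load: the name "find_sitedir" and the "already_loaded" flag are hypothetical, a ``valid'' sitedir is reduced to ``an existing directory'', and plain strings stand in for status objects.

```perl
use strict;
use warnings;

# Hypothetical model of the sitedir search; returns ( $level, $path ).
sub find_sitedir {
    my (%args) = @_;

    # Step 1: an explicit "sitedir" argument is tried first
    if ( defined $args{sitedir} ) {
        my $ok = -d $args{sitedir};
        return $ok ? ( 'OK', $args{sitedir} ) : ( 'ERR', undef );
    }

    # Step 2: "enviro" names an environment variable holding the path
    if ( defined $args{enviro} ) {
        my $dir = $ENV{ $args{enviro} };
        my $ok  = defined $dir && -d $dir;
        return $ok ? ( 'OK', $dir ) : ( 'ERR', undef );
    }

    # Step 3: fall back to the CELL_SITEDIR environment variable
    if ( exists $ENV{CELL_SITEDIR} ) {
        my $dir = $ENV{CELL_SITEDIR};
        my $ok  = defined $dir && -d $dir;
        return $ok ? ( 'OK', $dir ) : ( 'ERR', undef );
    }

    # Nothing found: WARN on a first call, OK if an earlier call
    # already loaded a sitedir (modelled by the hypothetical flag)
    return ( $args{already_loaded} ? 'OK' : 'WARN', undef );
}

my ( $level, $path ) = find_sitedir( sitedir => '.' );
printf "%s %s\n", $level, defined $path ? $path : '(none)';
```

Note that an invalid "sitedir" or "enviro" argument ends the search with ERR instead of falling through to the next step, which is how the list above describes it.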

For examples of how to call the "load" routine, see ``SYNOPSIS'' in App::CELL.

How configuration files are named

Once it finds a valid sitedir, CELL walks it (including all its subdirectories), assembling a list of filenames matching one of four regular expressions:

"^.+_MetaConfig.pm$" (meta)
"^.+_Config.pm$" (core)
"^.+_SiteConfig.pm$" (site)
"^.+_Message(_[^_]+){0,1}.conf$" (message)

Files with names that don't match any of the above regexes are ignored.
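To see how the four patterns sort filenames into types, here is a minimal sketch that applies them exactly as written above. The "classify_file" helper is hypothetical and is not part of App::CELL.

```perl
use strict;
use warnings;

# The four patterns from the list above, paired with their types.
my @patterns = (
    [ meta    => qr/^.+_MetaConfig.pm$/ ],
    [ core    => qr/^.+_Config.pm$/ ],
    [ site    => qr/^.+_SiteConfig.pm$/ ],
    [ message => qr/^.+_Message(_[^_]+){0,1}.conf$/ ],
);

# Hypothetical helper: return the type of a filename, or undef if
# the name matches none of the patterns (such files are ignored).
sub classify_file {
    my ($filename) = @_;
    for my $entry (@patterns) {
        my ( $type, $regex ) = @$entry;
        return $type if $filename =~ $regex;
    }
    return;
}

for my $name (qw( FooApp_MetaConfig.pm FooApp_Config.pm
                  FooApp_SiteConfig.pm FooApp_Message_en.conf README.txt )) {
    my $type = classify_file($name);
    printf "%-24s %s\n", $name, defined $type ? $type : 'ignored';
}
```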

After the directory is walked, the files are loaded (i.e. parsed for config params and messages).

The syntax of these files is simple and should be obvious from an examination of CELL's own configuration files in the sharedir ("config/" in the distro). All four types of configuration file are there, with comments.

Since the configuration files are Perl modules, Perl itself is leveraged to parse them. Values can be any legal scalar value, so references to arrays, hashes, or subroutines can be used, as well as simple numbers and strings. For details, see ``SITE CONFIGURATION DIRECTORY'', App::CELL::Config and App::CELL::Load.
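As an illustration of the kinds of values a parameter can hold, the fragment below shows what a core configuration file might contain. The parameter names are hypothetical, and the "set" function defined at the top is only a stand-in so that the fragment runs on its own; in a real configuration file, "set" is supplied by CELL.

```perl
use strict;
use warnings;

# Stand-in for CELL's set(), defined here only for this sketch.
my %param;
sub set {
    my ( $name, $value ) = @_;
    $param{$name} = $value;
    return;
}

# Hypothetical contents of FooApp_Config.pm
set( 'FOOAPP_TIMEOUT',   30 );                            # number
set( 'FOOAPP_GREETING',  'Hello' );                       # string
set( 'FOOAPP_SERVERS',   [ 'alpha', 'beta' ] );           # array reference
set( 'FOOAPP_LIMITS',    { soft => 10, hard => 20 } );    # hash reference
set( 'FOOAPP_FORMATTER', sub { return uc shift } );       # code reference
```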

Message file parsing is done by a parsing routine that resides in App::CELL::Load. For details on the syntax and how the parser works, see LOCALIZATION.

Configuration diagnostics

CELL provides several ways for the application to find out if the configuration files were loaded properly. First of all, the load routine ("$CELL->load") returns a status object: if the status is not OK, something went wrong and the application should look at the status more closely.

After program control returns from the load routine, the following methods and attributes can be used to find out what happened:

"$site->CELL_SHAREDIR_LOADED" (boolean value)
"$site->CELL_SHAREDIR_FULLPATH" (full path of CELL's sharedir)
"$meta->CELL_META_SITEDIR_LOADED" (boolean value: true if at least one sitedir has been loaded)
"$meta->CELL_META_SITEDIR_LIST" (reference to a list of all sitedirs that have been loaded --- full paths)

Verbose and debug mode

The load routine takes two options to increase its verbosity. The first option, "verbose", can be passed like this:

    my $status = $CELL->load( verbose => 1 );

It causes the load routine to write additional information to the log. Since even this can easily be too much, the default value for "verbose" is zero (terse logging).

The load routine also has a "debug" mode which should be activated in combination with "verbose". Debug mode is actually a function of the CELL logger, and is activated like this:

    $log->init( debug_mode => 1 );

Ordinarily the logger suppresses all log messages below "info" level (i.e., "debug" and "trace"). When "debug_mode" is activated, all messages are logged, regardless of level.

Error handling

STATUS OBJECTS

The most frequent case will be a status code of ``OK'' with no message (shown here with optional ``payload'', which is whatever the function is supposed to return on success):

    # all green
    return App::CELL::Status->new( level => 'OK',
                                  payload => $my_return_value,
                                );

To ensure this is as simple as possible in cases when no return value (other than the simple fact of an OK status) is needed, we provide a special constructor method:

    # all green
    return App::CELL::Status->ok;

In most other cases, we will want the status message to be linked to the filename and line number where the "new" method was called. If so, we call the method like this:

    # relative to me
    App::CELL::Status->new( level => 'ERR', 
                           code => 'CODE1',
                           args => [ 'foo', 'bar' ],
                         );

It is also possible to report the caller's filename and line number:

    # relative to my caller
    App::CELL::Status->new( level => 'ERR', 
                           code => 'CODE1',
                           args => [ 'foo', 'bar' ],
                           caller => [ caller ],
                         );

It is also possible to pass a message object in lieu of "code" and "args" (this could be useful if we already have an appropriate message on hand):

    # with pre-existing message object
    App::CELL::Status->new( level => 'ERR', 
                           msg_obj => $my_msg,
                         );

Permitted levels are listed in the @permitted_levels package variable in "App::CELL::Log".

Localization

Introduction

To an application programmer, localization may seem like a daunting proposition. All strings the application displays to users must be replaced by variable names. Then you have to figure out where to put all the strings, translate them into multiple languages, and write a library (or find an existing one) to display the right string in the right language at the right time and place. What is more, the application must be configurable, so the language can be changed by the user or the site administrator.

All of this is a lot of work, particularly for already existing, non-localized applications, but even for new applications designed from the start to be localizable.

App::CELL's objective is to provide a simple, straightforward way to write and maintain localizable applications in Perl. Notice the key word ``localizable'' --- the application may not, and most likely will not, be localized in the initial stages of development, but that is the time when localization-related design decisions need to be made. App::CELL tries to take some of the guesswork out of those decisions.

Later, when it really is time for the application to be translated into one or more additional languages, this becomes a relatively simple matter of translating a bunch of text strings that are grouped together in one or more configuration files with syntax so trivial that no technical expertise is needed to work with them. (Often, the person translating the application is not herself technically inclined.)

Localization with App::CELL

All strings that may potentially need to be localized (even if we don't have them translated into other languages yet) are placed in message files under the site configuration directory. In order to be found and parsed by App::CELL, message files must meet some basic conditions:
1. file name format: "AppName_Message_lang.conf"
2. file location: anywhere under the site configuration directory
3. file contents: must be parsable

Format of message file names

At initialization time, App::CELL walks the site configuration directory tree looking for filenames that meet certain regular expressions. The regular expression for message files is:

    ^.+_Message(_[^_]+){0,1}.conf$

In less-precise human terms, this means that the initialization routine looks for filenames consisting of at least three, but possibly four, components:

1. the application name (this can be anything)
2. followed by "_Message"
3. optionally followed by "_languagetag" where "languagetag" is a language tag (see "..link.." for details)
4. ending in ".conf"

Examples:

    CELL_Message.conf
    CELL_Message_en.conf
    CELL_Message_cs-CZ.conf
    DifferentApplication_Message.conf

Location of message files

As noted above, message files will be found as long as they are readable and located anywhere under the base site configuration directory. For details on how this base site configuration directory is searched for and determined, see ``..link..''.

How message files are parsed

Message files are parsed line-by-line. The parser routine is "parse_message_file" in the "App::CELL::Load" module. Lines beginning with a hash sign ('#') are ignored. The remaining lines are divided into ``stanzas'', which must be separated by one or more blank lines.

Stanzas are interpreted as follows: the first line of the stanza should contain a message code, which is simply a string. Any legal Perl scalar value can be used, as long as it doesn't contain white space. CELL itself uses ALL_CAPS strings starting with "CELL_".

The remaining lines of the stanza are assumed to be the message text. Two caveats here:

1. In the configuration file, message text strings can be written on multiple lines
2. However, this is intended purely as a convenience for the application programmer. When "parse_message_file" encounters multiple lines of text, it simply concatenates them together to form a single, long string.

For details, see the "parse_message_file" function in "App::CELL::Load", as well as App::CELL's own message file(s) in the "config/CELL" directory of the App::CELL distro.
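The stanza rules described above can be sketched as follows. This is not the real "parse_message_file": the "parse_message_text" helper is hypothetical, it works on a string instead of a file, and it assumes that continuation lines are joined with a single space, a detail this Guide does not specify. The message codes in the sample are made up.

```perl
use strict;
use warnings;

# Hypothetical parser: returns a hash reference of code => text.
sub parse_message_text {
    my ($content) = @_;
    my %messages;
    my ( $code, @text );

    # Store the stanza collected so far (if any) and reset the state.
    my $flush = sub {
        $messages{$code} = join( ' ', @text ) if defined $code;
        undef $code;
        @text = ();
    };

    for my $line ( split /\n/, $content ) {
        next if $line =~ /^#/;         # comment lines are ignored
        if ( $line =~ /^\s*$/ ) {      # a blank line ends the stanza
            $flush->();
            next;
        }
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        if   ( defined $code ) { push @text, $line }    # message text
        else                   { $code = $line }        # first line: code
    }
    $flush->();
    return \%messages;
}

my $example = <<'EOF';
# Hypothetical contents of FooApp_Message_en.conf
DISP_OUT_OF_RANGE
Gidget displacement %s
out of range

FOOAPP_GREETING
Hello, world
EOF

my $messages = parse_message_text($example);
printf "%s: %s\n", $_, $messages->{$_} for sort keys %$messages;
```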

How the language is determined

Internally, each message text string is stored along with a language tag, which defines which language the message text is written in. The language tag is derived from the filename using a regular expression like this one:

    _Message_([^_]+).conf$

(The part in parentheses signifies the part between "_Message_" and ".conf" --- this is stored in the "language" attribute of the message object.)

No sanity checks are conducted on the language tag. Whatever string the regular expression produces becomes the language tag for all messages in that file. If no language tag is found, CELL first looks for a config parameter called "CELL_DEFAULT_LANGUAGE" and, failing that, falls back to the hard-coded value "en".

I'll repeat that, since it's important: CELL assumes that the message file names contain the relevant language tag. If the message file name is "MyApp_Message_foo-bar.conf", then CELL will tag all messages in that file as being in the "foo-bar" language. Message files can also be named like this: "MyApp_Message.conf", i.e. without a language tag. In this case, CELL will attempt to determine the default language from a site configuration parameter ("CELL_DEFAULT_LANGUAGE"). If this parameter is not set, then CELL will give up and assume that all message text strings are in English (language tag "en" --- CELL's author's native tongue).
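The derivation of the language tag can be sketched in a few lines. The "language_tag" helper is hypothetical; it uses the regular expression shown above, and its second argument stands in for the configured default language.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the language tag from a message file name,
# falling back to a configured default and finally to "en".
sub language_tag {
    my ( $filename, $default ) = @_;
    return $1 if $filename =~ /_Message_([^_]+).conf$/;
    return defined $default ? $default : 'en';
}

print language_tag('MyApp_Message_foo-bar.conf'), "\n";    # tag from the name
print language_tag( 'MyApp_Message.conf', 'cs' ), "\n";    # configured default
print language_tag('MyApp_Message.conf'), "\n";            # hard-coded fallback
```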

Language tags in general

See the W3C's ``Language tags in HTML and XML'' white paper for a detailed explanation of language tags:

    L<http://www.w3.org/International/articles/language-tags/>

And see here for a list of all language tags:

    L<http://www.langtag.net/registries/lsr-language.txt>

Note that you should use hyphens, and not underscores, to separate components within the language tag, i.e.:

    MyApp_Message_cs-CZ.conf   # correct
    MyApp_Message_cs_CZ.conf   # WRONG!!

Non-ASCII characters in config/message file names: may or may not work. Better to avoid them.

Normal usage

In normal usage, the programmer adds messages to the respective message files. After CELL initialization, these messages (or, more precisely, message code-language pairs) will be available to the programmer to use, either directly via App::CELL::Message->new or indirectly as status codes.

If a message code has text strings in multiple languages, these language variants can be obtained by specifying the "lang" parameter to App::CELL::Message->new. If the "lang" parameter is not specified, CELL will always try to use the default language ("CELL_DEF_LANG" or English if that parameter has not been set).

Logging

CELL's logging facility is based on Log::Any. In practice, this means that App::CELL::Log is simply a wrapper around this useful module. To use it, one imports the Log::Any singleton via App::CELL like this:

    use App::CELL qw( $log );

Since this is the Log::Any singleton, all Log::Any methods can be used with it. CELL provides some conveniences, but they are optional. Actually, if the developer does not intend to use any of CELL's conveniences, there is no reason to import it through App::CELL at all and one can use Log::Any directly. In this case, CELL's log messages will go to the same log as the application's, provided the Log::Any category is the same as the CELL "appname".

See ``Verbose and debug mode'' for a description of how to increase logging verbosity of the load routine.

CAVEATS

Internal parameters

App::CELL stores its own parameters (mostly meta and core, but also one site param) in a separate directory, but when loaded they end up in the same namespaces as the application's meta, core, and site parameters. The names of these internal parameters are always prefixed with "CELL_".

Therefore, the application programmer should avoid using parameters starting with "CELL_".

Mutable and immutable parameters

It is important to realize that, although core parameters can be overridden by site parameters, internally the values of both are immutable. Although it is possible to change them by cheating, the 'set' method of $core and $site will refuse to change the value of an existing core/site parameter.

Therefore, use $meta to store mutable values.
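The difference between the two kinds of store can be modelled like this. The "set_site" and "set_meta" functions and the parameter names are hypothetical stand-ins, and returning false on refusal is an assumption made for this sketch; see App::CELL::Config for what the real 'set' method returns.

```perl
use strict;
use warnings;

my %site = ( FOO => 'Whizzo' );
my %meta = ( FOOAPP_META_COUNTER => 0 );

# Hypothetical immutable store: an existing parameter is never overwritten.
sub set_site {
    my ( $name, $value ) = @_;
    return 0 if exists $site{$name};
    $site{$name} = $value;
    return 1;
}

# Hypothetical mutable store: any parameter can be changed at any time.
sub set_meta {
    my ( $name, $value ) = @_;
    $meta{$name} = $value;
    return 1;
}

set_site( 'FOO', 'changed' );            # refused, FOO keeps its value
set_meta( 'FOOAPP_META_COUNTER', 1 );    # accepted
printf "site FOO: %s, meta counter: %s\n", $site{FOO}, $meta{FOOAPP_META_COUNTER};
```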

Taint mode

Since it imports configuration data at runtime from files supplied by the user, App::CELL should not be run under taint mode. The "load" routine checks this and will refuse to do anything if running with "-T".

To recapitulate: don't run App::CELL in taint mode.

Installation issues with CELL internal sharedir

The easiest way to install App::CELL is to use a package manager (e.g. "zypper"). Another way is to install directly from CPAN (using, e.g., "cpanm"). The former way installs to the "vendor_perl" tree, while the latter installs to the "site_perl" tree.

If you install two different versions of App::CELL, one via package manager and another directly from CPAN, a conflict can arise, and it may be necessary to examine CELL's log to determine which one is being used.

Even after running, e.g., "cpanm -U App::CELL", to uninstall from "site_perl", I found that CELL's internal sharedir remained intact in the "site_perl" tree and had to be wiped manually.

As long as you always install one way or the other (i.e. package manager or direct from CPAN), you won't get bitten by this.

COMPONENTS

App::CELL

This top-level module exports a singleton, $CELL, which is all the application programmer needs to gain access to CELL's key functions.

App::CELL::Config

This module provides CELL's Configuration functionality.

App::CELL::Guide

This guide.

App::CELL::Load

This module hides all the complexity of loading messages and config params from files in two directories: (1) the App::CELL distro sharedir containing App::CELL's own configuration, and (2) the site configuration directory, if present.

App::CELL::Log

Logging is accomplished by using and extending Log::Any.

App::CELL::Message

Localization is on the wish-list of many software projects. With CELL, the programmer can easily design and write the application to be localizable from the very beginning, without having to invest much effort.

App::CELL::Status

Provides CELL's error-handling functionality. Since status objects inherit from message objects, the application programmer can instruct CELL to generate localized status messages (errors, warnings, notices) if desired.

App::CELL::Test

Some routines used by CELL's test suite.

App::CELL::Util

Some generalized utility routines.

RATIONALE

In the author's experience, applications written for ``users'' (however that term may be defined) frequently need to:
1. be configurable by the user or site administrator
2. handle errors robustly, without hangs and crashes
3. potentially display messages in various languages
4. log various types of messages to syslog

Since these basic functions seem to work well together, CELL is designed to provide them in an integrated, well-documented, straightforward, and reusable package.