Paranoid::IO::Line(3) Paranoid Line-based I/O functions

VERSION

$Id: lib/Paranoid/IO/Line.pm, 2.01 2016/06/23 00:34:49 acorliss Exp $

SYNOPSIS


    use Paranoid::IO::Line;

    PIOMAXLNSIZE = 4096;

    $nlines = sip($filename, @lines);
    $nlines = sip($filename, @lines, 1);
    $nlines = tailf($filename, @lines);
    $nlines = tailf($filename, @lines, 1);
    $nlines = tailf($filename, @lines, 1, -100);

    piolClose($filename);

    $nlines = slurp($filename, @lines);

DESCRIPTION

This module extends and leverages Paranoid::IO's capabilities with an eye towards line-based text files, such as log files. It does so while maintaining a paranoid stance towards I/O. For that reason the functions here only work on limited chunks of data at a time, both in terms of the maximum amount of data kept in memory at a time and the maximum record length. Paranoid::IO provides PIOMAXFSIZE, which controls the former, and this module provides PIOMAXLNSIZE, which controls the latter.

Even with the paranoid slant of these functions, they should really be treated as convenience functions which can simplify higher level code without incurring any significant risk to the developer or system. They inherit not only opportunistic I/O but also platform-agnostic record separators via internal use of pchomp from Paranoid::Input.

NOTE: while this module builds on the foundation provided by Paranoid::IO, you should not work on the same files using Paranoid::IO's functions while also using the functions in this module. While the former work from raw I/O, the latter have to manage buffers in order to identify record boundaries. If you were to, say, sip from a file, then pread or pseek elsewhere, it would render those buffers not only useless, but corrupt. This is important to note since the functions here do leverage the file handle caching features provided by popen.

It should also be noted that since we're anticipating line-based records we expect every line, even the last line in a file, to be properly terminated with a record separator (new line sequence).
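The terminator requirement can be checked before handing a file to these functions. The following is a minimal sketch in core Perl only; ends_with_newline is a hypothetical helper, not part of this module, and it assumes that a final carriage return or line feed byte counts as a record separator.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper (not provided by Paranoid::IO::Line): returns 1 if the
# file is empty or its final byte is a CR or LF, and 0 otherwise.
sub ends_with_newline {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "open $path: $!";
    my $size = -s $fh;
    if ( !$size ) {
        close $fh;
        return 1;    # an empty file has no unterminated line
    }

    # Read only the last byte of the file
    seek $fh, -1, SEEK_END or die "seek $path: $!";
    my $last = '';
    read $fh, $last, 1;
    close $fh;

    return $last =~ /[\r\n]/ ? 1 : 0;
}
```

A caller might warn, or append a terminator, when this returns 0 for a file it is about to read.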

As with all Paranoid modules, string descriptions of errors can be retrieved from Paranoid::ERROR as they occur.
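As a brief sketch of error retrieval: the example below deliberately points sip at a path that cannot be opened. The path is made up for illustration, and calling Paranoid::ERROR fully qualified with parentheses is an assumption based on Paranoid being listed as a dependency.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Paranoid::IO::Line;

# A path that cannot be opened: its parent directory does not exist.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/no-such-dir/missing.log";

my @lines;
my $nlines = sip( $file, @lines );
unless ($nlines) {
    # Fetch the description of the most recent error
    my $msg = Paranoid::ERROR();
    print "sip failed: $msg\n";
}
```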

SUBROUTINES/METHODS

PIOMAXLNSIZE

The value returned/set by this lvalue function is the maximum line length supported by functions like sip (documented below). Unless explicitly set, this defaults to 2KB. Any lines found which exceed this are discarded.

sip

    $nlines = sip($filename, @lines);
    $nlines = sip($filename, @lines, 1);

This function allows you to read a text file into memory in chunks, the lines of which are placed into the passed array. Each call reads a chunk of up to PIOMAXFSIZE bytes. File locking is used, and autochomping is enabled by passing a true value as the optional third argument.

This returns the number of lines extracted or boolean false if any errors occurred, such as lines exceeding PIOMAXLNSIZE or other I/O errors. If there were no errors but also no content it will return 0 but true, which will satisfy boolean tests.

The passed array is always purged prior to execution. This can potentially help differentiate types of errors:

    $nlines = sip($filename, @lines);
    warn "successfully extracted lines" 
        if $nlines and scalar @lines;
    warn "no errors, but no lines" 
        if $nlines and ! scalar @lines;
    warn "line length exceeded on some lines" 
        if !$nlines and scalar @lines;
    warn "I/O errors or all lines exceeded line length" 
        if !$nlines and ! scalar @lines;

Typically, if all you care about is extracting good lines and discarding bad ones, all you need is:

    warn "good to go" if scalar @lines or $nlines;
 
    # or, more likely:
    if (@lines) {
        # process input...
    }

NOTE: sip does try to check the file stat with every call. This allows us to automatically flush buffers and reopen files in the event that the file you're sipping from was truncated, deleted, or overwritten.
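Because each call returns only the next chunk, a whole file is read by calling sip repeatedly. The following is a sketch under the behaviour described above, using a temporary sample file so that it is self-contained. Stopping as soon as a call extracts no lines is a design choice of this example, not something the module requires.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Paranoid::IO::Line;

# Build a small sample log so the sketch is self-contained.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "line $_\n" for 1 .. 5;
close $fh or die "close: $!";

my @lines;
my $total = 0;
while (1) {
    my $rv = sip( $file, @lines, 1 );    # third argument enables autochomp
    warn 'sip reported a problem: ' . Paranoid::ERROR() . "\n" unless $rv;
    last unless @lines;    # nothing extracted: end of data, or an error
    $total += @lines;

    # process @lines here...
}

piolClose($file);    # release the cached file handle and buffers
print "read $total lines\n";
```

For a file that is still growing, the same loop could sleep and retry instead of exiting when no lines come back.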

nlsip

    $nlines = nlsip($filename, @lines);
    $nlines = nlsip($filename, @lines, 1);

A very thin wrapper for sip that disables file locking.

tailf

    $nlines = tailf($filename, @lines);
    $nlines = tailf($filename, @lines, 1);
    $nlines = tailf($filename, @lines, 1, -100);

The only difference between this function and sip is that tailf opens the file and immediately seeks to the end. If an optional fourth argument is passed, it will seek backwards to extract and return that number of lines (if possible). Depending on the number passed, be prepared for up to PIOMAXLNSIZE * that number of bytes to be allocated. If no number is specified, it is assumed to be -10. Specifying this argument on a file already opened by sip or tailf will have no effect.
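The memory bound above is simple arithmetic. As a worked sketch in plain Perl: tailf_backlog_bytes is a hypothetical helper, not part of this module, and the 2KB default is assumed here to mean 2048 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: worst-case number of bytes needed to hold a backlog
# of abs($offset) lines when no line may exceed $max_line_size bytes.
sub tailf_backlog_bytes {
    my ( $max_line_size, $offset ) = @_;
    return $max_line_size * abs($offset);
}

# Default line limit (assumed 2048 bytes) with the default offset of -10
printf "%d bytes\n", tailf_backlog_bytes( 2048, -10 );

# Values used in the SYNOPSIS: PIOMAXLNSIZE = 4096, offset -100
printf "%d bytes\n", tailf_backlog_bytes( 4096, -100 );
```

The two calls work out to 20480 and 409600 bytes respectively.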

Return values are identical to sip.

nltailf

    $nlines = nltailf($filename, @lines);
    $nlines = nltailf($filename, @lines, 1);
    $nlines = nltailf($filename, @lines, 1, -100);

A very thin wrapper for tailf that disables file locking.

slurp

    $nlines = slurp($filename, @lines);
    $nlines = slurp($filename, @lines, 1);

This function is essentially another wrapper for sip, but with some different behavior. sip was written with the expectation that the developer would be working either on chunks from a very large file or on a file that may grow while being accessed. slurp, on the other hand, expects to work exclusively on small files that can safely fit into memory. It also sees no need to cache file handles since all operations will subsequently be done in memory.

Files read with slurp are explicitly closed after the read. All the normal safeguards apply: PIOMAXFSIZE is the largest amount of data that will be read into memory, and all lines must be within PIOMAXLNSIZE.
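A minimal sketch of slurp on a small file follows. The temporary file and its key=value contents are invented for illustration. Since slurp closes the file itself, no piolClose call is made.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Paranoid::IO::Line;

# Build a small sample file so the sketch is self-contained.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "name=example\n", "mode=test\n";
close $fh or die "close: $!";

my ( @lines, %conf );
my $nlines = slurp( $file, @lines, 1 );    # third argument enables autochomp
if ( $nlines and @lines ) {
    # Split each chomped line into a key and a value
    %conf = map { split /=/, $_, 2 } @lines;
    print "$_ => $conf{$_}\n" for sort keys %conf;
} else {
    warn "nothing read from $file\n";
}
```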

nlslurp

    $nlines = nlslurp($filename, @lines);
    $nlines = nlslurp($filename, @lines, 1);

A very thin wrapper for slurp that disables file locking.

piolClose

    $rv = piolClose($filename);

This closes all file handles and deletes any existing buffers. It works indiscriminately and returns the return value of pclose.

DEPENDENCIES

  • Fcntl
  • Paranoid
  • Paranoid::Debug
  • Paranoid::Input
  • Paranoid::IO

BUGS AND LIMITATIONS

While all of these functions will just as happily accept file handles as file names, doing so will almost certainly cause any number of bugs. Beyond the inherited Paranoid::IO issues (like not getting the fork-safe features for any file handle opened directly by the developer) there are other issues. Buffers, for instance, can only be managed by one consistent name; there is no way to correlate them and make them interchangeable. There are other subtleties as well, but there is no need to detail them all.

Suffice it to say that when using this module one should only use file names, and use them consistently.

AUTHOR

Arthur Corliss ([email protected])

LICENSE AND COPYRIGHT

This software is licensed under the same terms as Perl, itself. Please see http://dev.perl.org/licenses/ for more information.

(c) 2005 - 2015, Arthur Corliss ([email protected])