File::RsyncP(3) Perl Rsync client

SYNOPSIS


    use File::RsyncP;

    my $rs = File::RsyncP->new({
        logLevel  => 1,
        rsyncCmd  => "/bin/rsync",
        rsyncArgs => [
            "--numeric-ids",
            "--perms",
            "--owner",
            "--group",
            "--devices",
            "--links",
            "--ignore-times",
            "--block-size=700",
            "--relative",
            "--recursive",
            "-v",
        ],
    });

    #
    # Receive files from remote srcDirectory to local destDirectory
    # by running rsyncCmd with rsyncArgs.
    #
    $rs->remoteStart(1, $srcDirectory);
    $rs->go($destDirectory);
    $rs->serverClose;

    #
    # Send files to remote destDirectory from local srcDirectory
    # by running rsyncCmd with rsyncArgs.
    #
    $rs->remoteStart(0, $destDirectory);
    $rs->go($srcDirectory);
    $rs->serverClose;

    #
    # Receive files from a remote module to local destDirectory by
    # connecting to an rsyncd server.  ($module is the name from
    # /etc/rsyncd.conf.)
    #
    my $port = 873;
    $rs->serverConnect($host, $port);
    $rs->serverService($module, $authUser, $authPasswd, 0);
    $rs->serverStart(1, ".");
    $rs->go($destDirectory);
    $rs->serverClose;

    #
    # Get final stats.  This is a hashref containing elements
    # totalRead, totalWritten, totalSize, plus whatever the FileIO
    # module might add.
    #
    my $stats = $rs->statsFinal;

DESCRIPTION

File::RsyncP is a Perl implementation of an Rsync client. It is compatible with Rsync 2.5.5 - 2.6.3 (protocol versions 26-28). It can send or receive files, either by running rsync on the remote machine or by connecting to an rsyncd daemon on the remote machine.

What use is File::RsyncP? The main purpose is that File::RsyncP separates all file system I/O into a separate module, which can be replaced by any module of your own design. This allows rsync interfaces to non-filesystem data types (eg: databases) to be developed with relative ease.

File::RsyncP was initially written to provide an Rsync interface for BackupPC, <http://backuppc.sourceforge.net>. See BackupPC for programming examples.

File::RsyncP does not yet provide a command-line interface that mimics native Rsync. Instead it provides an API that makes it possible to write simple scripts that talk to rsync or rsyncd.

The File::RsyncP::FileIO module contains the default file system access functions. File::RsyncP::FileIO may be subclassed or replaced by a custom module to provide access to non-filesystem data types.

Getting Started

First some background. When you run rsync it parses its command-line arguments, then it either connects to a remote rsyncd daemon, or runs an rsync on the remote machine via ssh or rsh. At this point there are two rsync processes: the one you invoked and the one on the remote machine. The one on the local machine is called the client, and the one on the remote machine is the server. One side (either the client or server) will send files and the other will receive files. The sending rsync generates a file list and sends it to the receiving side. The receiving rsync will fork a child process.

File::RsyncP does not (yet) have a command-line script that mimics rsync's startup processing. Think of File::RsyncP as one level below the command-line rsync. File::RsyncP implements the client side of the connection, and File::RsyncP knows how to run the remote side (eg, via rsh or ssh) or to connect to a remote rsyncd daemon. File::RsyncP automatically adds the internal --server and --sender options (if necessary) to the options passed to the remote rsync.

To initiate any rsync session the File::RsyncP->new function should be called. It takes a hashref of parameters:

logLevel
An integer level of verbosity. Zero means be quiet, 1 will give some general information, 2 will give some output per file, and higher values give more output. 10 will include byte dumps of all data read/written, which will make the log output huge.
rsyncCmd
The command to run the remote peer of rsync. By default the rsyncArgs are appended to the rsyncCmd to create the complete command before it is run. This behavior is affected by rsyncCmdType.

rsyncCmd can either be a single string giving the path of the rsync command to run (eg: /bin/rsync) or a list containing the command and arguments, eg:

    rsyncCmd => [qw(
        /bin/ssh -l user host /bin/rsync
    )],

or:

    rsyncCmd => ["/bin/ssh", "-l", $user, $host, "/bin/rsync"],

rsyncCmd can also be set to a code reference (ie: a perl sub). In this case the code is called without arguments or other processing. It is up to the perl code you supply to exec() the remote rsync.

This option is ignored if you are connecting to an rsyncd daemon.

rsyncCmdType
By default the complete remote rsync command is created by taking rsyncCmd and appending rsyncArgs. This behavior can be modified by specifying certain values for rsyncCmdType:
'full'
rsyncCmd is taken to be the complete command, including all rsync arguments. It is the caller's responsibility to build the correct remote rsync command, together with all the rsync arguments. You still need to specify rsyncArgs, so the local File::RsyncP knows how to behave.
'shell'
rsyncArgs are shell escaped before appending to rsyncCmd.

This option is ignored if you are connecting to an rsyncd daemon.

rsyncArgs
A list of rsync arguments. The full remote rsync command that is run will be rsyncCmd appended with --server (and optionally --sender if the remote is a sender) and finally all of rsyncArgs.
protocol_version
The protocol version we advertise to the remote side. Default is 28.
logHandler
A subroutine reference to a function that handles all the log messages. The default is a subroutine that prints the messages to STDERR.
pidHandler
An optional subroutine reference to a function that expects two integers: the pid of the rsync process (ie: the pid on the local machine that is likely ssh) and the child pid when we are receiving files. If defined, this function is called once when the rsync process is forked, and again when the child is forked during receive.
fio
The file IO object that will handle all the file system IO. The default is File::RsyncP::FileIO->new.

This can be replaced with a new module of your choice, or you can subclass File::RsyncP::FileIO.

timeout
Timeout in seconds for IO. Default is 0, meaning no timeout. Uses alarm() and it is the caller's responsibility to catch the alarm signal.
doPartial
If set, a partial rsync is done. This is to support resuming full backups in BackupPC. When doPartial is set, the --ignore-times option can be set on a per-file basis. File::RsyncP::FileIO->ignoreAttrOnFile() is called on each file in the file list, and it returns whether or not attributes should be ignored on that file. If ignoreAttrOnFile() returns 1 then it's as though --ignore-times was set for that file.
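
The way rsyncCmd, rsyncCmdType and rsyncArgs combine into the remote command, as described above, can be sketched in plain Perl. This is only an illustration of the documented ordering: build_remote_command and its remoteSend key are hypothetical names, not part of File::RsyncP, the 'shell' type is left out, and any path arguments the module itself appends are not shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::RsyncP): returns the argument
# list that the parameter descriptions above imply for the remote rsync.
sub build_remote_command {
    my (%opt) = @_;

    # rsyncCmd may be a single string or an array reference.
    my $cmd = $opt{rsyncCmd};
    my @cmd = ref($cmd) eq 'ARRAY' ? @$cmd : ($cmd);

    # With rsyncCmdType 'full' the caller has supplied the whole command.
    return @cmd if ( $opt{rsyncCmdType} // '' ) eq 'full';

    push @cmd, '--server';
    push @cmd, '--sender' if $opt{remoteSend};    # remote side sends files
    push @cmd, @{ $opt{rsyncArgs} || [] };
    return @cmd;
}

my @argv = build_remote_command(
    rsyncCmd   => [qw(/bin/ssh -l user host /bin/rsync)],
    rsyncArgs  => [qw(--numeric-ids --perms --recursive)],
    remoteSend => 1,
);
print join(' ', @argv), "\n";
# prints: /bin/ssh -l user host /bin/rsync --server --sender --numeric-ids --perms --recursive
```

Printing the list this way can help when checking a hand-built command before passing it with rsyncCmdType set to 'full'.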

An example of calling File::RsyncP->new is:

    my $rs = File::RsyncP->new({
                logLevel   => 1,
                rsyncCmd => ["/bin/rsh", $host,  "-l", $user, "/bin/rsync"],
                rsyncArgs  => [
                        "--numeric-ids",
                        "--perms",
                        "--owner",
                        "--group",
                        "--devices",
                        "--links",
                        "--ignore-times",
                        "--block-size=700",
                        "--relative",
                        "--recursive",
                        "-v",
                    ],
            });

A fuller example showing most of the parameters and qw() for the rsyncArgs is:

    my $rs = File::RsyncP->new({
                logLevel   => 1,
                rsyncCmd => ["/bin/rsh", $host,  "-l", $user, "/bin/rsync"],
                rsyncArgs  => [qw(
                        --numeric-ids
                        --perms
                        --owner
                        --group
                        --devices
                        --links
                        --ignore-times
                        --block-size=700
                        --relative
                        --recursive
                        -v
                    )],
                logHandler => sub {
                        my($str) = @_;
                        # MyHandler is a filehandle opened by the caller
                        print MyHandler "log: $str\n";
                    },
                fio        => File::RsyncP::FileIO->new({
                                logLevel   => 1,
                            }),
            });
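
The logHandler and pidHandler callbacks can be ordinary closures. The sketch below collects log messages in an array and remembers the reported pids. The variable names, the "rsync: " prefix, the pid values and the undef child pid on the first call are all assumptions made for illustration; the handlers are invoked by hand here so the sketch runs without a remote peer.

```perl
use strict;
use warnings;

my @log;     # collected log lines
my %pids;    # most recent pids reported to pidHandler

# logHandler: called with one message string.
my $logHandler = sub {
    my ($str) = @_;
    push @log, "rsync: $str";
};

# pidHandler: called with the rsync (or ssh) pid and the receive-child pid.
my $pidHandler = sub {
    my ($rsyncPid, $childPid) = @_;
    $pids{rsync} = $rsyncPid;
    $pids{child} = $childPid if defined $childPid;
};

# Stand-in calls with made-up values, in place of calls made by the module.
$logHandler->("starting transfer");
$pidHandler->(1234, undef);
$pidHandler->(1234, 1250);

print "$_\n" for @log;
print "rsync pid $pids{rsync}, child pid $pids{child}\n";

# With the module installed, the same references would be passed as:
#   File::RsyncP->new({ ..., logHandler => $logHandler,
#                            pidHandler => $pidHandler });
```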

Talking to a remote Rsync

File::RsyncP can talk to a remote rsync using this sequence of functions:
remoteStart(remoteSend, remoteDir)
Starts the remote server by executing the command specified in the rsyncCmd parameter to File::RsyncP->new, together with the rsyncArgs.

If the client is receiving files from the server then remoteSend should be non-zero and remoteDir is the source directory on the remote machine. If the client is sending files to the remote server then remoteSend should be zero and remoteDir is the destination directory on the remote machine. Returns undef on success and non-zero on error.

go(localDir)
Run the client rsync. localDir is the source directory on the local machine if the client is sending files, or it is the destination directory on the local machine if the client is receiving files. Returns undef on success.
serverClose()
Call this after go() to finish up. Returns undef on success.
statsFinal()
This can optionally be called to pick up the transfer stats. It returns a hashref containing elements totalRead, totalWritten, totalSize, plus whatever the FileIO module might add.
abort()
Call this function to abort the transfer.
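
As an illustration of working with the hashref that statsFinal returns, the sketch below formats the three documented keys first and then any extra keys a FileIO module might have added. format_stats is a hypothetical helper and the numbers are made up; with a live session the hashref would come from $rs->statsFinal instead.

```perl
use strict;
use warnings;

# Hypothetical formatter for the hashref returned by statsFinal.
sub format_stats {
    my ($stats) = @_;
    my @known = qw(totalRead totalWritten totalSize);
    my %seen  = map { $_ => 1 } @known;

    # Documented keys first, then whatever else the FileIO module added.
    my @keys = (
        ( grep { exists $stats->{$_} } @known ),
        sort( grep { !$seen{$_} } keys %$stats ),
    );
    return join( ", ", map { "$_=$stats->{$_}" } @keys );
}

# Made-up numbers standing in for: my $stats = $rs->statsFinal;
my $stats = { totalRead => 2048, totalWritten => 512, totalSize => 4096 };
print format_stats($stats), "\n";
# prints: totalRead=2048, totalWritten=512, totalSize=4096
```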

An example of sending files to a remote rsync is:

    #
    # Send files to remote destDirectory from local srcDirectory
    # by running rsyncCmd with rsyncArgs.
    #
    $rs->remoteStart(0, $destDirectory);
    $rs->go($srcDirectory);
    $rs->serverClose;

An example of receiving files from a remote rsync is:

    #
    # Receive files from remote srcDirectory to local destDirectory
    # by running rsyncCmd with rsyncArgs.
    #
    $rs->remoteStart(1, $srcDirectory);
    $rs->go($destDirectory);
    $rs->serverClose;

Talking to a remote Rsync daemon

File::RsyncP can connect to a remote Rsync daemon using this sequence of functions:
serverConnect(host, port)
Connect to the Rsync daemon on the given string host and integer port. The port argument is optional and it defaults to 873. On error serverConnect returns a string error message. On success it returns undef.
serverService(module, authUser, authPasswd, authRequired)
Specify which module to use (a "module" is the symbolic name that appears inside "[...]" in /etc/rsyncd.conf), the user's credentials (authUser and authPasswd) and whether authorization is mandatory (authRequired). If set to a non-zero value, authRequired ensures that the remote Rsync daemon requires authentication, so that you don't connect to an insecure Rsync daemon. The auth arguments are optional if the selected rsyncd module doesn't require authentication.

See the rsyncd.conf manual page for more information. For example, if a host called navajo had an /etc/rsyncd.conf that contains these lines:

   [test]
           path = /data/test
           comment = test module
           auth users = craig, celia
           secrets file = /etc/rsyncd.secrets

and /etc/rsyncd.secrets contained:

    craig:xxx

then you could connect to this rsyncd using:

    $rs->serverConnect("navajo", 873);
    $rs->serverService("test", "craig", "xxx", 0);

The value of the authRequired argument doesn't matter in this case.

On error serverService returns a string error message. On success it returns undef.

serverStart(remoteSend, remoteDir)
Starts the remote server. If the client is receiving files from the server then remoteSend should be non-zero. If the client is sending files to the remote server then remoteSend should be zero. The remoteDir typically starts with the module name, followed by any directory below the module. Or remoteDir can be just "." to refer to the top-level module directory. Returns undef on success.
go(localDir)
Run the client rsync. localDir is the source directory on the local machine if the client is sending files, or it is the destination directory on the local machine if the client is receiving files. Returns undef on success.
serverClose()
Call this after go() to finish up. Returns undef on success.
abort()
Call this function to abort the transfer.

An example of sending files to a remote rsyncd daemon is:

    #
    # Send files to a remote module from a local srcDirectory by
    # connecting to an rsyncd server.  ($module is the name from
    # /etc/rsyncd.conf.)
    #
    my $port = 873;
    $rs->serverConnect($host, $port);
    $rs->serverService($module, $authUser, $authPasswd);
    $rs->serverStart(0, ".");
    $rs->go($srcDirectory);
    $rs->serverClose;

An example of receiving files from a remote rsyncd daemon is:

    #
    # Receive files from a remote module to local destDirectory by
    # connecting to an rsyncd server.  ($module is the name from
    # /etc/rsyncd.conf.)
    #
    my $port = 873;
    $rs->serverConnect($host, $port);
    $rs->serverService($module, $authUser, $authPasswd);
    $rs->serverStart(1, ".");
    $rs->go($destDirectory);
    $rs->serverClose;
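
Each call in the sequences above returns undef on success, and the timeout parameter relies on the caller catching the alarm signal. One way to handle both is a small wrapper that runs the steps in order and stops at the first failure. This is a sketch under assumptions: run_steps is a hypothetical helper, the error strings are made up, and stub subs stand in for the real calls so the code runs without a remote peer.

```perl
use strict;
use warnings;

# Hypothetical helper: runs [label, coderef] steps in order and stops at
# the first step that returns a defined value (the documented error
# convention) or that dies, for example from the SIGALRM handler.
sub run_steps {
    my (@steps) = @_;
    for my $step (@steps) {
        my ( $label, $code ) = @$step;
        my $err;
        my $ok = eval {
            local $SIG{ALRM} = sub { die "timed out\n" };
            $err = $code->();
            1;
        };
        if ( !$ok ) {
            chomp( my $msg = $@ || "unknown failure" );
            alarm(0);    # cancel any alarm that is still pending
            return "$label: $msg";
        }
        return "$label: $err" if defined $err;
    }
    return;    # undef: every step succeeded
}

# Stub steps; a real step would look like
#   [ serverConnect => sub { $rs->serverConnect($host, $port) } ]
my $err = run_steps(
    [ serverConnect => sub { undef } ],
    [ serverService => sub { "auth failed" } ],    # made-up error string
    [ serverStart   => sub { undef } ],
);
print defined $err ? "failed at $err\n" : "ok\n";
# prints: failed at serverService: auth failed
```

Whether the session should also be cleaned up with abort() after a failure depends on the application and is not shown here.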

LIMITATIONS

The initial version of File::RsyncP (0.10) has a number of limitations:
  • File::RsyncP only implements a modest subset of Rsync options and features. In particular, as of 0.10 only these options are supported:

            --numeric-ids
            --perms|-p
            --owner|-o
            --group|-g
            --devices|-D
            --links|-l
            --ignore-times|-I
            --block-size=i
            --verbose|-v
            --recursive|-r
            --relative|-R
    

    Hardlinks are currently not supported. Other options that only affect the remote side will work correctly since they are passed to the remote Rsync unchanged.

  • Also, --relative semantics are not implemented to match rsync, and the trailing "/" behavior of rsync (meaning directory contents, not the directory itself) is not implemented in File::RsyncP.
  • File::RsyncP does not yet provide a command-line interface that mimics native Rsync.
  • File::RsyncP might work with slightly earlier versions of Rsync but has not been tested. It certainly will not work with antique versions of Rsync.
  • File::RsyncP does not compute file deltas (ie: it behaves as though --whole-file is specified) or implement exclude or include options when sending files. File::RsyncP does handle file deltas and exclude and include options when receiving files.
  • File::RsyncP does not yet implement server functionality (acting like the remote end of a connection or a daemon). Since the protocol is relatively symmetric this is not difficult to add, so it should appear in a future version.

AUTHOR

File::RsyncP::FileList was written by Craig Barratt based on rsync 2.5.5.

Rsync was written by Andrew Tridgell and Paul Mackerras. It is available under a GPL license. See http://rsync.samba.org.

LICENSE

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License in the LICENSE file along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.