orheostream(2) large data streams

Other Alias

irheostream

ABSTRACT

This class provides a stream interface for large data management. Files are assumed to be compressed with gzip, and a recursive search in a list of directories is provided for input.

     orheostream foo("NAME", "suffix");

is like

     ofstream foo("NAME.suffix").

However, if NAME does not end with `.suffix', then `.suffix' is automatically added. By default, compression is performed on the fly with gzip, and a `.gz' suffix is appended. Flushing is supported in compression mode:

     foo.flush();

This feature makes intermediate results available during long computations. Compression can be disabled when opening a file by passing an optional argument:

     orheostream foo("NAME", "suffix", io::nogz);
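
The naming rule described above can be sketched as follows. This is only an illustration, not the library's code: `ends_with_dot_suffix' and `output_file_name' are hypothetical helpers, and the handling of an empty suffix (nothing is added) is an assumption.

```cpp
#include <string>

// Hypothetical helper: true when `name` already ends with ".suffix".
bool ends_with_dot_suffix(const std::string& name, const std::string& suffix) {
    const std::string tail = "." + suffix;
    return name.size() >= tail.size()
        && name.compare(name.size() - tail.size(), tail.size(), tail) == 0;
}

// Hypothetical helper: builds the on-disk name from (name, suffix).
// ".suffix" is added only when missing; ".gz" is added unless
// compression is disabled. An empty suffix adds nothing (assumption).
std::string output_file_name(const std::string& name,
                             const std::string& suffix,
                             bool use_gzip = true) {
    std::string full = name;
    if (!suffix.empty() && !ends_with_dot_suffix(name, suffix)) {
        full += "." + suffix;
    }
    if (use_gzip) {
        full += ".gz";
    }
    return full;
}
```

With these helpers, both ("NAME", "suffix") and ("NAME.suffix", "suffix") map to the same file name, and passing `false' as the third argument plays the role of io::nogz.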

An existing compressed file can be reopened in append mode, so that new results are appended at the end of the existing file:

     orheostream foo("NAME", "suffix", io::app);

Conversely,

        irheostream foo("NAME","suffix");

is like

        ifstream foo("NAME.suffix").

However, the search path given by the RHEOPATH environment variable is used to find NAME, and the suffix is added when it is missing. Moreover, gzip-compressed files ending with the `.gz' suffix are recognized and decompressed on the fly.

Finally, a set of utility functions is provided.

DESCRIPTION

The following code:

        irheostream is("results", "data");

will recursively look for a `results[.data[.gz]]' file in the directories listed in the RHEOPATH environment variable.

For instance, if you insert in your ".cshrc" something like:

        setenv RHEOPATH ".:/home/dupont:/usr/local/math/demo"

the search first examines the current directory `.'; if neither `results.data.gz' nor `results.data' exists there, it scans all subdirectories of the current directory. If the file is still not found, the search restarts recursively in `/home/dupont' and then in `/usr/local/math/demo'.
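
The search order described above can be sketched as follows. This is a minimal illustration and not the library's implementation: `split_search_path' and `candidate_file_names' are hypothetical helpers, the order in which the candidates are tried inside one directory is an assumption, and the recursive descent into subdirectories is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits "a:b:c" into {"a", "b", "c"},
// skipping empty entries.
std::vector<std::string> split_search_path(const std::string& path) {
    std::vector<std::string> dirs;
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type end = path.find(':', start);
        if (end == std::string::npos) end = path.size();
        if (end > start) dirs.push_back(path.substr(start, end - start));
        start = end + 1;
    }
    return dirs;
}

// Hypothetical helper: the names matching `name[.suffix[.gz]]` in one
// directory, compressed variant first (assumed order).
std::vector<std::string> candidate_file_names(const std::string& dir,
                                              const std::string& name,
                                              const std::string& suffix) {
    const std::string base = dir + "/" + name;
    std::vector<std::string> names;
    names.push_back(base + "." + suffix + ".gz");
    names.push_back(base + "." + suffix);
    names.push_back(base);
    return names;
}
```

A caller would loop over the directories returned by `split_search_path' and test each candidate name in turn, for instance with file_exists.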

File decompression is performed by the gzip command, and the data are piped directly into memory.

If the file name starts with `.', as in `./square', or with `/', as in `/home/oscar/square', no search occurs and the RHEOPATH environment variable is not used.

Also, if the environment variable RHEOPATH is not set, the default value is the current directory `.'.
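
These two rules can be sketched as follows. The helper names `uses_search_path' and `effective_search_path' are hypothetical and are not part of the library; the behavior for an empty file name is not covered by the text above and is an arbitrary choice here.

```cpp
#include <string>

// Hypothetical helper: false when the name starts with '.' or '/',
// in which case the file is opened as given and RHEOPATH is ignored.
// An empty name is treated as searchable (arbitrary choice).
bool uses_search_path(const std::string& name) {
    const char first = name.empty() ? '\0' : name[0];
    return first != '.' && first != '/';
}

// Hypothetical helper: returns the value of the search path, falling
// back to the current directory "." when the variable is not set
// (env_value is a null pointer).
std::string effective_search_path(const char* env_value) {
    return env_value != nullptr ? std::string(env_value) : std::string(".");
}
```

In a program, the second helper would typically be called as effective_search_path(std::getenv("RHEOPATH")), with <cstdlib> included.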

For an output stream:

        orheostream os("newresults", "data");

file compression is assumed, and "newresults.data.gz" will be created.

File loading and storing are reported by a message, either:

        ! load "./results.data.gz"

or:

        ! file "./newresults.data.gz" created.

on the clog stream. By adding the following:

        clog << noverbose;

you turn off these messages (see iorheo(4)).

IMPLEMENTATION

class irheostream : public boost::iostreams::filtering_stream<boost::iostreams::input> {
public:
    irheostream() : boost::iostreams::filtering_stream<boost::iostreams::input>() {}
    irheostream(const std::string& name, const std::string& suffix = std::string());
    virtual ~irheostream();
    void open  (const std::string& name, const std::string& suffix = std::string());
    void close();
protected:
    std::ifstream _ifs;
};
static const bool dont_gzip = false;
class orheostream : public boost::iostreams::filtering_stream<boost::iostreams::output> {
public:
    orheostream() : boost::iostreams::filtering_stream<boost::iostreams::output>() {}
    orheostream(const std::string& name, const std::string& suffix = std::string(),
        io::mode_type mode = io::out);
    virtual ~orheostream();
    void open  (const std::string& name, const std::string& suffix = std::string(),
        io::mode_type mode = io::out);
    void flush();
    void close();
    const std::string& filename() const { return _full_name; }
protected:
    void  _open_internal (io::mode_type mode);
    void _close_internal ();
// data:
    io::mode_type     _mode;
    std::string       _full_name;
};
std::string itos (std::string::size_type i);
std::string ftos (const Float& x);
// catch first occurrence of string in file
bool scatch (std::istream& in, const std::string& ch, bool full_match = true);
// has_suffix("toto.suffix", "suffix") -> true
bool has_suffix (const std::string& name, const std::string& suffix);
// "toto.suffix" --> "toto"
std::string delete_suffix (const std::string& name, const std::string& suffix);
// "/usr/local/dir/toto.suffix" --> "toto.suffix"
std::string get_basename (const std::string& name);
// "/usr/local/dir/toto.suffix" --> "/usr/local/dir"
std::string get_dirname (const std::string& name);
// "toto" --> "/usr/local/math/data/toto.suffix"
std::string get_full_name_from_rheo_path (const std::string& rootname, const std::string& suffix);
// "." + "../geodir" --> ".:../geodir"
void append_dir_to_rheo_path (const std::string& dir);
// "../geodir" + "." --> "../geodir:."
void prepend_dir_to_rheo_path (const std::string& dir);
bool file_exists (const std::string& filename);
// string to float
bool is_float (const std::string&);
Float to_float (const std::string&);
// in TMPDIR environment variable or "/tmp" by default
std::string get_tmpdir();
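
The comments attached to the string helpers above suggest the following behavior. The code below is only a sketch written from those comments, placed in a hypothetical `sketch' namespace to keep it apart from the library's own functions. The results for names without a `/' and the explicit `path' argument of `append_dir' and `prepend_dir' are assumptions; the library versions take only the directory.

```cpp
#include <string>

namespace sketch {

// has_suffix("toto.suffix", "suffix") -> true
bool has_suffix(const std::string& name, const std::string& suffix) {
    const std::string tail = "." + suffix;
    return name.size() >= tail.size()
        && name.compare(name.size() - tail.size(), tail.size(), tail) == 0;
}

// "toto.suffix" --> "toto"; the name is returned unchanged otherwise.
std::string delete_suffix(const std::string& name, const std::string& suffix) {
    if (!has_suffix(name, suffix)) return name;
    return name.substr(0, name.size() - suffix.size() - 1);
}

// "/usr/local/dir/toto.suffix" --> "toto.suffix"
std::string get_basename(const std::string& name) {
    const std::string::size_type pos = name.rfind('/');
    return pos == std::string::npos ? name : name.substr(pos + 1);
}

// "/usr/local/dir/toto.suffix" --> "/usr/local/dir"
// Assumptions: "." when there is no '/', "/" for a file in the root.
std::string get_dirname(const std::string& name) {
    const std::string::size_type pos = name.rfind('/');
    if (pos == std::string::npos) return ".";
    if (pos == 0) return "/";
    return name.substr(0, pos);
}

// "." + "../geodir" --> ".:../geodir"
std::string append_dir(const std::string& path, const std::string& dir) {
    return path.empty() ? dir : path + ":" + dir;
}

// "../geodir" + "." --> "../geodir:."
std::string prepend_dir(const std::string& path, const std::string& dir) {
    return path.empty() ? dir : dir + ":" + path;
}

} // namespace sketch
```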