man amanda-archive-format (5): Format of amanda archive streams

DESCRIPTION

The Amanda archive format is designed to be a simple, efficient means of interleaving multiple simultaneous files, allowing an arbitrary number of data streams for a file. It is a streaming format in the sense that the writer need not know the size of files until they are completely written to the archive, and the reader can process the archive in constant space.

DATA MODEL

The data stored in an archive consists of an unlimited number of files. Each file consists of a number of "attributes", each identified by a 16-bit ID. Each attribute can contain an unlimited amount of data.

Attribute IDs less than 16 (AMAR_ATTR_APP_START) are reserved for special purposes, but the remaining IDs are available for application-specific uses.

STRUCTURE

RECORDS

A record can be either a header record or a data record. A header record serves as a "checkpoint" in the archive, with a magic value that can be used to recognize archive files.

A header record has a fixed size of 28 bytes, as follows:

  28 bytes:    magic string

The magic string is the ASCII text "AMANDA ARCHIVE FORMAT " followed by a decimal representation of the format version number (currently '1'), padded to 28 bytes with NUL bytes.
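As an illustration of the layout just described, the following Python sketch builds and recognizes a header record. The names make_header, parse_header, MAGIC_PREFIX and HEADER_SIZE are hypothetical and are not part of any Amanda API.

```python
MAGIC_PREFIX = b"AMANDA ARCHIVE FORMAT "  # ASCII text from the description above
HEADER_SIZE = 28                          # fixed header record size in bytes


def make_header(version: int = 1) -> bytes:
    """Build a 28-byte header record for the given format version."""
    text = MAGIC_PREFIX + str(version).encode("ascii")
    if len(text) > HEADER_SIZE:
        raise ValueError("version number too long for a header record")
    # Pad on the right with NUL bytes up to the fixed size.
    return text.ljust(HEADER_SIZE, b"\x00")


def parse_header(record: bytes) -> int:
    """Return the format version of a header record, or raise ValueError."""
    if len(record) != HEADER_SIZE or not record.startswith(MAGIC_PREFIX):
        raise ValueError("not a header record")
    digits = record[len(MAGIC_PREFIX):].rstrip(b"\x00")
    if not digits.isdigit():
        raise ValueError("malformed version number")
    return int(digits)
```

For version 1 the text occupies 23 bytes, so the record ends with five NUL bytes.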

A data record has a variable size, as follows:

  2 bytes:     file number
  2 bytes:     attribute ID
  4 bytes:     data size (N)
  N bytes:     data

The file number and attribute ID serve to identify the data stream to which this data belongs. The low 31 bits of the data size give the number of data bytes following, while the high bit (the EOA bit) indicates the end of the attribute, as described below. Because records are generally read into memory in their entirety, the data size must not exceed 4 MiB (4194304 bytes). All integers are in network byte order (big-endian).
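The data record layout can be sketched with Python's struct module, where the format string "!HHI" means two 16-bit and one 32-bit unsigned integer in network byte order. The helper names pack_record and unpack_record_header, and the constant names, are assumptions made for this example.

```python
import struct

EOA_BIT = 0x80000000                # high bit of the data size field
SIZE_MASK = 0x7FFFFFFF              # low 31 bits: number of data bytes
MAX_RECORD_DATA = 4 * 1024 * 1024   # 4194304 bytes


def pack_record(filenum: int, attrid: int, data: bytes, eoa: bool = False) -> bytes:
    """Serialize one data record: file number, attribute ID, size, data."""
    if len(data) > MAX_RECORD_DATA:
        raise ValueError("record data exceeds the 4194304-byte limit")
    size_field = len(data) | (EOA_BIT if eoa else 0)
    return struct.pack("!HHI", filenum, attrid, size_field) + data


def unpack_record_header(buf: bytes):
    """Decode the 8-byte prefix into (filenum, attrid, size, eoa)."""
    filenum, attrid, size_field = struct.unpack("!HHI", buf[:8])
    return filenum, attrid, size_field & SIZE_MASK, bool(size_field & EOA_BIT)
```

A reader would decode the 8-byte prefix first and then read exactly the indicated number of data bytes.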

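A header record is distinguished from a data record by the magic string. The file number 0x414d, corresponding to the characters "AM", is forbidden and must be skipped on writing, because a data record with that file number would begin with the same two bytes as the magic string.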
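One way a writer might skip the forbidden file number, and a reader might spot the start of a header record, is sketched below. The function names are hypothetical. The sketch also treats every other 16-bit value as usable, which is an assumption, since the text names only 0x414d as excluded.

```python
FORBIDDEN_FILENUM = 0x414D  # big-endian bytes b"AM", the start of the magic string


def looks_like_header(buf: bytes) -> bool:
    """True if the next record starts with the first two bytes of the magic string."""
    return buf[:2] == b"AM"


def next_file_number(previous: int, active: set) -> int:
    """Pick the next 16-bit file number that is neither forbidden nor in use."""
    candidate = previous
    for _ in range(0x10000):
        candidate = (candidate + 1) & 0xFFFF  # wrap around at 16 bits
        if candidate != FORBIDDEN_FILENUM and candidate not in active:
            return candidate
    raise RuntimeError("no free file numbers")
```

Because no data record may begin with the bytes "AM", checking the first two bytes is enough to decide which kind of record follows. A careful reader would still verify the full magic string.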

Attribute ID 0 (AMAR_ATTR_FILENAME) gives the filename of a file. This attribute is mandatory for each file, must be nonempty, must fit in a single record (it cannot span multiple records), and must precede any other attributes for the same file in the archive. The filename should be a printable string (ASCII or UTF-8), to facilitate use of generic archive-display utilities, but the format permits any nonempty bytestring.

Attribute ID 1 (AMAR_ATTR_EOF) signals the end of a file. This attribute must contain no data, and should have the EOA bit set.

CONNECTION TO DATA MODEL

Each file in an archive is assigned a file number distinct from any other active file in the archive. The first record for a file must have attribute ID 0 (AMAR_ATTR_FILENAME), indicating a filename. A file ends with an empty record with ID 1 (AMAR_ATTR_EOF). For every file at which a reader might want to begin reading, the filename record should be preceded by a header record. How often to write header records is left to the discretion of the application.
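The record sequence for a single file can be sketched as follows: a header record, the filename record, one application attribute, and the empty EOF record. The function write_file is hypothetical, and the sketch assumes the payload fits in one record and is stored under attribute ID 16 (AMAR_ATTR_APP_START).

```python
import io
import struct

EOA_BIT = 0x80000000
MAX_RECORD_DATA = 4 * 1024 * 1024
AMAR_ATTR_FILENAME = 0
AMAR_ATTR_EOF = 1
AMAR_ATTR_APP_START = 16
HEADER = b"AMANDA ARCHIVE FORMAT 1".ljust(28, b"\x00")


def record(filenum: int, attrid: int, data: bytes = b"", eoa: bool = False) -> bytes:
    """Serialize one data record in network byte order."""
    size_field = len(data) | (EOA_BIT if eoa else 0)
    return struct.pack("!HHI", filenum, attrid, size_field) + data


def write_file(out, filenum: int, filename: bytes, payload: bytes) -> None:
    """Write a header plus all records for one small file to a binary stream."""
    if not filename:
        raise ValueError("filename must be nonempty")
    if max(len(filename), len(payload)) > MAX_RECORD_DATA:
        raise ValueError("this sketch only handles single-record attributes")
    out.write(HEADER)  # lets a reader begin reading at this file
    out.write(record(filenum, AMAR_ATTR_FILENAME, filename, eoa=True))
    out.write(record(filenum, AMAR_ATTR_APP_START, payload, eoa=True))
    out.write(record(filenum, AMAR_ATTR_EOF, eoa=True))  # empty record ends the file
```

For example, write_file(io.BytesIO(), 1, b"hello.txt", b"abcdefghij") produces 71 bytes: 28 for the header, 17 for the filename record, 18 for the data record, and 8 for the EOF record.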

All data records with the same file number and attribute ID are considered a part of the same attribute. The boundaries between such records are not significant to the contents of the attribute, and both readers and writers are free to alter such boundaries as necessary.
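A reader that reassembles attributes from interleaved records might look like the sketch below. The names iter_records and read_files are hypothetical. For brevity the sketch collects each attribute in memory, so it does not run in constant space as a streaming consumer would.

```python
import io
import struct

MAGIC_PREFIX = b"AMANDA ARCHIVE FORMAT "
HEADER_SIZE = 28
EOA_BIT = 0x80000000
AMAR_ATTR_FILENAME = 0
AMAR_ATTR_EOF = 1


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes or raise EOFError on a truncated archive."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated archive")
    return data


def iter_records(stream):
    """Yield (filenum, attrid, data, eoa) per data record, skipping headers."""
    while True:
        start = stream.read(2)
        if not start:
            return
        if len(start) < 2:
            raise EOFError("truncated archive")
        if start == b"AM":  # only a header record may begin with these bytes
            header = start + read_exact(stream, HEADER_SIZE - 2)
            if not header.startswith(MAGIC_PREFIX):
                raise ValueError("bad magic string")
            continue
        (filenum,) = struct.unpack("!H", start)
        attrid, size_field = struct.unpack("!HI", read_exact(stream, 6))
        data = read_exact(stream, size_field & 0x7FFFFFFF)
        yield filenum, attrid, data, bool(size_field & EOA_BIT)


def read_files(stream):
    """Return finished files, in the order their EOF records appear."""
    active, finished = {}, []
    for filenum, attrid, data, _eoa in iter_records(stream):
        if attrid == AMAR_ATTR_FILENAME:
            active[filenum] = {"name": data, "attrs": {}}
        elif filenum not in active:
            raise ValueError("record for a file with no filename record")
        elif attrid == AMAR_ATTR_EOF:
            finished.append(active.pop(filenum))  # the number may now be reused
        else:
            attrs = active[filenum]["attrs"]
            # Record boundaries are not significant: just concatenate.
            attrs[attrid] = attrs.get(attrid, b"") + data
    return finished
```

Tracking files by their file number while they are active, and releasing the number at the EOF record, lets the same number be reused later in the archive.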

The final data record for each attribute has the high bit (the EOA bit) of its data size field set. An attribute may be terminated by a record containing both data and an EOA bit, or by a zero-length record with its EOA bit set. A writer must not reuse an attribute ID within a file.
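The two ways of terminating an attribute can be sketched with a helper that splits attribute data into record-sized chunks. The name split_attribute and its separate_eoa parameter are assumptions made for this example.

```python
MAX_RECORD_DATA = 4 * 1024 * 1024  # 4194304 bytes


def split_attribute(data: bytes, max_size: int = MAX_RECORD_DATA,
                    separate_eoa: bool = False):
    """Return a list of (chunk, eoa) pairs that together carry data.

    With separate_eoa=False the EOA bit rides on the last data-bearing
    record; with separate_eoa=True a zero-length record carries it.
    """
    if not 0 < max_size <= MAX_RECORD_DATA:
        raise ValueError("max_size must be between 1 and 4194304")
    chunks = [data[i:i + max_size] for i in range(0, len(data), max_size)]
    if separate_eoa or not chunks:
        # Empty attributes also need one zero-length record to carry EOA.
        return [(c, False) for c in chunks] + [(b"", True)]
    return [(c, False) for c in chunks[:-1]] + [(chunks[-1], True)]
```

Both forms reassemble to the same attribute contents, since only the concatenated data matters to the reader.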

AUTHOR

Dustin J. Mitchell <[email protected]>

Zmanda, Inc. (http://www.zmanda.com)