fitsmd5(1) Compute/update the DATAMD5 keyword/value

SYNOPSIS

fitsmd5 [-u] [-s] [-a] <FITS files...>

DESCRIPTION

fitsmd5 computes the MD5 signature of all data sections in a FITS file and prints the results on stdout. This command can optionally update the main FITS header by modifying the value of the DATAMD5 key.

This command is useful to give a unique ID to a FITS file. The algorithm simply browses through all data sections in the input file and passes the data blocks to an MD5 hash function. The final result is a 128-bit signature that can be used to uniquely identify the file.
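The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fitsmd5 source: the function name data_md5 is hypothetical, header blocks are recognised by a simple heuristic (a 2880-byte block starting with SIMPLE or XTENSION opens a header, an END card closes it), and data blocks are assumed to be hashed whole, padding included.

```python
import hashlib

BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header card occupies 80 bytes


def data_md5(raw: bytes) -> str:
    """Return the MD5 hex digest of all non-header blocks in raw.

    Assumption: a block starting with SIMPLE or XTENSION opens a header,
    and the header ends with the block that contains the END card.
    """
    md5 = hashlib.md5()
    in_header = False
    for start in range(0, len(raw), BLOCK):
        block = raw[start:start + BLOCK]
        if block.startswith(b"SIMPLE  =") or block.startswith(b"XTENSION="):
            in_header = True
        if not in_header:
            # Data block: feed it to the hash as is (padding included).
            md5.update(block)
            continue
        # Header block: skip it, but check whether the header ends here.
        for off in range(0, len(block), CARD):
            if block[off:off + CARD].rstrip() == b"END":
                in_header = False
                break
    return md5.hexdigest()
```

Because header blocks never reach the hash function, editing or adding header cards leaves the signature unchanged in this sketch.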

This approach is meant to provide a tool to tag FITS files with unique IDs. It is not meant to be used as a checksum for file integrity (the CKSUM key is the solution for that), although it could be used in that spirit. The main point is that only data sections are taken into account, which leaves the possibility of changing the headers without affecting the data signature.

MD5 spreads its output evenly over 128 bits, which means the probability of two different FITS files accidentally getting the same ID is almost zero. (MD5 is no longer considered collision-resistant against deliberately crafted inputs, so this applies to ordinary data, not to files built on purpose to collide.) It should be good enough to assign a unique ID to several tens of thousands of frames. Since there is still a tiny but non-zero possibility that two different files will get an identical key, this approach is not recommended to tag very large numbers of files (typically: millions of them). If you do have a large database of FITS files, using a timestamp is usually a better approach.
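To put a number on "tiny but non-zero", the usual birthday-bound approximation can be used: for n files and a b-bit hash, the chance of at least one accidental collision is roughly n(n-1)/2 divided by 2^b. The sketch below assumes that MD5 outputs behave like uniformly random 128-bit values for non-adversarial data; the function name collision_probability is hypothetical.

```python
def collision_probability(n_files: int, bits: int = 128) -> float:
    """Approximate probability of at least one accidental collision.

    Birthday-bound estimate: (number of file pairs) / (number of hash values).
    Only meaningful while the result is much smaller than 1.
    """
    pairs = n_files * (n_files - 1) // 2
    return pairs / 2 ** bits


for n in (50_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,d} files: p ~ {collision_probability(n):.2e}")
```

The estimate says nothing about deliberately crafted inputs, and it does not replace the recommendation above for very large databases.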

The MD5 signature is a good solution to tag a list of FITS files which might have originated from various sources over which the database maintainer has no control. Typically, calibration databases holding calibration frames for a given instrument receive data from different actors who might not be in sync with unique file naming conventions. This command makes sure it is always possible to assign a unique ID to each frame.

Note that if the input FITS file has no data section, the returned MD5 key will be non-zero: it is exactly d41d8cd98f00b204e9800998ecf8427e, the MD5 signature of an empty input. This signature also offers the interesting property that if two files have exactly the same pixels (bit-wise comparison) they will get the same ID. This is useful e.g. for regression tests.
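Both properties are easy to check with any MD5 implementation. The short Python sketch below uses the standard hashlib module; the byte strings stand in for pixel buffers and are made up for the example.

```python
import hashlib

# Hashing no data at all yields the well-known "empty" MD5 value.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()
print(EMPTY_MD5)  # d41d8cd98f00b204e9800998ecf8427e

# Two buffers holding bit-wise identical pixels get the same signature.
pixels_a = bytes([0, 1, 2, 3])
pixels_b = bytes([0, 1, 2, 3])
same_id = hashlib.md5(pixels_a).hexdigest() == hashlib.md5(pixels_b).hexdigest()
print(same_id)  # True
```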

If you want to produce files containing the DATAMD5 key in their main headers, you should use the qfits library, which always inserts this key. If you are working with other FITS-processing software, you should allocate an empty DATAMD5 placeholder and apply this command with the -u option to update the value.
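As an illustration of what updating a placeholder involves, the sketch below rewrites a DATAMD5 card in the main header of an in-memory FITS file. It is not the fitsmd5 implementation: the function name update_datamd5 is hypothetical, and the card layout (keyword in columns 1-8, "= " in columns 9-10, quoted string from column 11, padded to 80 characters) is assumed from the general FITS fixed-format rules rather than from the output of this command.

```python
CARD = 80  # each FITS header card occupies 80 bytes


def update_datamd5(raw: bytes, digest: str) -> bytes:
    """Return a copy of raw with the main-header DATAMD5 card rewritten.

    The input is returned unchanged when no DATAMD5 card is found before
    the END card of the main header (no placeholder, nothing to update).
    """
    new_card = f"DATAMD5 = '{digest}'".ljust(CARD).encode("ascii")
    for off in range(0, len(raw), CARD):
        card = raw[off:off + CARD]
        if card[:8] == b"DATAMD5 ":
            # Same-size replacement: the rest of the file does not move.
            return raw[:off] + new_card + raw[off + CARD:]
        if card.rstrip() == b"END":
            break
    return raw
```

Replacing one 80-byte card with another keeps every block boundary in place, which is why an existing placeholder is needed in this sketch.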

Note that this command can also compute the MD5 sum of a complete file, not just its data sections (see the -a option). In this mode, the command behaves like the GNU md5sum command, which is used to compute checksums on files. Input files in that case need not be FITS, though they still need to be regular files.
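For comparison, hashing a complete file amounts to feeding every byte to MD5, with no FITS parsing at all. A minimal Python sketch follows; the function name file_md5 and the chunk size are arbitrary choices for the example.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of all bytes in the file at path."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

The digest depends on headers as well as data here, so any header edit changes it, unlike the data-only signature.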

OPTIONS

-u
Try to update the DATAMD5 keyword in the main header if present.
-s
Silent mode: run without printing any message.
-a
Compute the MD5 sum on all bits in the file. In this mode, the command behaves like the GNU md5sum command, to be used e.g. as a checksum. This option excludes all others.

FILES

Input files to fitsmd5 shall comply with the FITS format, except when used with the -a option.