SYNOPSIS
ncd [ -c compressor ] [ -o filename ] [ -bcdhLnqsv ] [ -o filestem ] [ -d|f|l|p|t string ] ... [arg1] [arg2]
The Normalized Compression Distance between two objects a and b is defined as
- NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))
where
- C(a,b) means "the compressed size of the concatenation of a and b"
- C(a) means "the compressed size of a"
- C(b) means "the compressed size of b"
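The definition can be sketched in a few lines of Python. This is an illustration only, not the ncd implementation: it assumes Python's standard zlib module as the compressor C, and the helper names compressed_size and ncd are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed form of x (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(a: bytes, b: bytes) -> float:
    """NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))."""
    ca = compressed_size(a)
    cb = compressed_size(b)
    cab = compressed_size(a + b)  # C(a,b): compress the concatenation of a and b
    return (cab - min(ca, cb)) / max(ca, cb)


# Two made-up inputs: x compared with itself, then with an unrelated y.
x = b"the quick brown fox jumps over the lazy dog " * 20
y = b"pack my box with five dozen liquor jugs " * 20
print(ncd(x, x), ncd(x, y))
```

The first value printed should be noticeably smaller than the second, since x shares everything with itself and little with y.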
ncd prints a non-negative number (typically, but not always, 0 <= x < 1.1) representing how different the two objects are. Smaller numbers indicate more similar objects. The largest value is somewhere near 1; it is not exactly 1 because of imperfections in compression techniques or other irregularities in the underlying compressor, but with most standard compression algorithms you are unlikely to see a number above 1.1.
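The two ends of this range can be observed with a small experiment. The sketch below assumes Python's zlib as a stand-in compressor (the ncd helper is hypothetical, not the tool itself) and compares a block of pseudo-random bytes with itself and with a second, unrelated block.

```python
import random
import zlib


def ncd(a: bytes, b: bytes) -> float:
    def size(data: bytes) -> int:
        return len(zlib.compress(data, 9))

    ca, cb, cab = size(a), size(b), size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


rng = random.Random(0)  # fixed seed so runs are repeatable
noise_a = bytes(rng.getrandbits(8) for _ in range(4096))
noise_b = bytes(rng.getrandbits(8) for _ in range(4096))

same = ncd(noise_a, noise_a)       # small, but not exactly 0
unrelated = ncd(noise_a, noise_b)  # close to 1: the inputs share nothing
print(f"identical: {same:.3f}  unrelated: {unrelated:.3f}")
```

Neither result lands exactly on 0 or 1, because the compressor adds headers and cannot encode the repeated copy for free.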
Three compressors are available by default: bzlib, zlib, and blocksort. One may be selected with an option in complearn.conf; see complearn(5) for more details.
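Because the distance depends on the compressor, the same pair of objects can score differently under each one. A minimal sketch, assuming Python's standard zlib and bz2 modules as stand-ins for the zlib and bzlib compressors: it does not read complearn.conf, blocksort has no counterpart here, and the COMPRESSORS table and ncd helper are hypothetical names.

```python
import bz2
import zlib

# Stand-in compressors keyed by the names used in this page (an assumption).
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bzlib": lambda data: bz2.compress(data, 9),
}


def ncd(a: bytes, b: bytes, compressor: str = "zlib") -> float:
    compress = COMPRESSORS[compressor]
    ca, cb = len(compress(a)), len(compress(b))
    cab = len(compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)


# Two made-up inputs that differ in one word per line.
left = b"".join(b"record %d: alpha beta gamma\n" % i for i in range(200))
right = b"".join(b"record %d: alpha beta delta\n" % i for i in range(200))
for name in COMPRESSORS:
    print(name, round(ncd(left, right, name), 3))
```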
- -f, --file-mode=FILE
- select file mode
- -l, --literal-mode=STRING
- select string literal mode; this is the default. The argument is a string which, if it contains white space, may be enclosed in double quotes (")
- -p, --plainlist-mode=FILE
- select plain list mode; argument is a file which contains a list of files to be individually evaluated
- -t, --termlist-mode=FILE
- select term list mode; argument is a file which contains string literals to be individually evaluated
- -d, --directory-mode=DIR
- select directory mode; argument is a path which contains files to be individually evaluated
- -c, --compressor=compressor
- set the compressor to use
- -L, --list
- list available builtin compressors as well as available compression modules. Modules are loaded from the modules subdirectory of /usr/lib/complearn.
- -s, --size
- get, in place of NCD, the compressed size of a single FILE, STRING, or DIR
- -n, --nexus
- Nexus output format for distance matrix
- -o, --output=FILE
- specify binary output filestem, if different from distmatrix, the default. An extension (.clb, .nex, or .txt) will be added, as appropriate to the output file type.
- -b, --binary
- output results to binary file; the default name is distmatrix.clb
- -q, --quiet
- suppress ASCII output and messages
- -v, --verbose
- activate verbose mode
- -h, --help
- show help options and exit
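When a list of terms or files is given, every object is evaluated against every other, which yields a distance matrix. The following sketch illustrates that idea for a list of string literals. It assumes Python's zlib as the compressor; the ncd and distance_matrix helpers and the sample terms are hypothetical, and the printed layout is not the tool's actual output format.

```python
import zlib
from typing import List


def ncd(a: bytes, b: bytes) -> float:
    def size(data: bytes) -> int:
        return len(zlib.compress(data, 9))

    ca, cb, cab = size(a), size(b), size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def distance_matrix(terms: List[str]) -> List[List[float]]:
    """Pairwise NCD of every term against every term, including itself."""
    blobs = [term.encode("utf-8") for term in terms]
    return [[ncd(x, y) for y in blobs] for x in blobs]


# Hypothetical term list, one string literal per entry.
terms = ["hello world " * 10, "hello there world " * 10, "zqxjkvbpyg " * 10]
matrix = distance_matrix(terms)
for term, row in zip(terms, matrix):
    print(f"{term[:12]!r:16}", " ".join(f"{value:.3f}" for value in row))
```

Very short inputs are dominated by compressor overhead, so the values are only meaningful for objects of reasonable size.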
per-user and system configuration files
for further details.
standard module automatic loading area. Any shared object compressor
modules found here will be loaded on startup.
If this environment variable is set, CompLearn will search the given directory and load any CompLearn compression modules it finds there (such as the libart.so example included with the CompLearn source distribution).