ncd(1) compute the Normalized Compression Distance

SYNOPSIS

ncd [ -c compressor ] [ -o filestem ] [ -bhLnqsv ] [ -d|f|l|p|t string ] ... [arg1] [arg2]

DESCRIPTION

The Normalized Compression Distance between two objects is defined as

NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))
where
C(a,b)
means "the compressed size of the concatenation of a and b"
C(a)
means "the compressed size of a"
C(b)
means "the compressed size of b"
  

ncd prints a non-negative number (typically, but not always, 0 <= x < 1.1) representing how different the two objects are. Smaller numbers represent more similar files. The largest value is close to 1; it is not exactly 1 because of imperfections in compression techniques and other irregularities in the underlying compressor, but with most standard compression algorithms you are unlikely to see a number above 1.1.
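
The definition of NCD given above can be sketched in a few lines of Python. This is an illustration only, not the implementation used by ncd: Python's zlib module stands in for the compressor, and the function names compressed_size and ncd are chosen for this example.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): size in bytes of the input after zlib compression (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(a: bytes, b: bytes) -> float:
    """NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))."""
    ca = compressed_size(a)
    cb = compressed_size(b)
    cab = compressed_size(a + b)  # C(a,b): compressed size of the concatenation
    return (cab - min(ca, cb)) / max(ca, cb)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 50
    noise = os.urandom(len(text))
    print(ncd(text, text))   # small: an object is close to itself
    print(ncd(text, noise))  # near 1: unrelated objects share little structure
```

Because a real compressor adds headers and does not find every regularity, the first value is usually slightly above 0 and the second may slightly exceed 1, which matches the range described above.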

Three compressors are available by default: bzlib, zlib and blocksort. One of these may be selected with an option in complearn.conf; see complearn(5) for more details.

ENUMERATION MODES

-f, --file-mode=FILE
select file mode
-l, --literal-mode=STRING
select string literal mode; this is the default. The next argument is a string which, if containing white space, may be enclosed in double-quotes (")
-p, --plainlist-mode=FILE
select plain list mode; argument is a file which contains a list of files to be individually evaluated
-t, --termlist-mode=FILE
select term list mode; argument is a file which contains string literals to be individually evaluated
-d, --directory-mode=DIR
select directory mode; argument is a path which contains files to be individually evaluated
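
The enumeration modes above each produce a list of objects to be evaluated individually. As a rough sketch of what that means, the Python below builds a distance matrix from such a list, assuming every enumerated object is compared with every other. It does not reproduce ncd's internals: zlib stands in for the compressor, and the helper names (csize, load_directory, distance_matrix) and the sorted file order are assumptions made for this example.

```python
import zlib
from pathlib import Path
from typing import List, Sequence


def csize(data: bytes) -> int:
    """Compressed size in bytes, using zlib as an assumed compressor."""
    return len(zlib.compress(data, 9))


def load_directory(path: str) -> List[bytes]:
    """Analogue of directory mode: read each regular file in path, in sorted order."""
    return [p.read_bytes() for p in sorted(Path(path).iterdir()) if p.is_file()]


def distance_matrix(objects: Sequence[bytes]) -> List[List[float]]:
    """Pairwise NCD between all objects; row i, column j holds NCD(objects[i], objects[j])."""
    sizes = [csize(o) for o in objects]  # each C(x) is computed only once
    matrix = []
    for i, a in enumerate(objects):
        row = []
        for j, b in enumerate(objects):
            cab = csize(a + b)  # C(a,b) for the concatenation
            row.append((cab - min(sizes[i], sizes[j])) / max(sizes[i], sizes[j]))
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Analogue of term list mode: string literals evaluated individually.
    terms = [b"aaaa" * 100, b"abab" * 100, b"the quick brown fox " * 20]
    for row in distance_matrix(terms):
        print(" ".join(f"{value:.3f}" for value in row))
```

The matrix need not be exactly symmetric, since a compressor may produce different sizes for the concatenation of a and b than for the concatenation of b and a.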

OPTIONS

-c, --compressor=compressor
set the compressor to use
-L, --list
list the available built-in compressors as well as the available compression modules. Modules are loaded from the modules subdirectory of /usr/lib/complearn.
-s, --size
get, in place of NCD, the compressed size of a single FILE, STRING, or DIR
-n, --nexus
Nexus output format for distance matrix
-o, --output=FILE
specify the output filestem, if different from the default, distmatrix. An extension (.clb, .nex, or .txt) will be added, as appropriate to the output file type.
-b, --binary
output results to binary file; the default name is distmatrix.clb
-q, --quiet
suppress ASCII output and messages
-v, --verbose
activate verbose mode
-h, --help
show help options and exit

FILES

$HOME/.complearn/complearn.conf
/usr/share/complearn/complearn.conf
/usr/local/share/complearn/complearn.conf

 per-user and system configuration files;
see complearn(5) for further details.

$HOME/.complearn/modules
/usr/lib/complearn/modules

 standard module automatic loading area.  Any shared object compressor
modules found here will be loaded on startup.

ENVIRONMENT

COMPLEARNMODPATH

 If this environment variable is set, CompLearn will search the given directory and load any CompLearn compression modules it finds there (such as the libart.so example included with the CompLearn source distribution).

DIAGNOSTICS

none