israndom(1)
randomness testing using data compressors over fixed-size alphabets
SYNOPSIS
israndom [-a alphasize] [-c compressor] [-s samplelen] [-qhnr] [filename]
DESCRIPTION
israndom tests a sequence of symbols for randomness. It tries to determine whether a given sequence of trials could reasonably be assumed to come from a uniform random distribution over a fixed-size alphabet of 2-256 symbols.
 israndom assumes that each symbol (or sample trial) is represented by exactly one byte. The only exceptions to this rule are the -n and -r options, which ignore newlines and carriage returns, respectively (see below).
 israndom is based on the mathematical ideas of Shannon, Kolmogorov, and Cilibrasi and uses the following formula to determine an expected size for a sample of k trials of a uniform distribution over an alphabet of alphasize symbols. Each symbol takes log(alphasize) bits (base-2 logarithm), so the total cost c for the ensemble of samples is k log(alphasize) bits. This number is rounded up to the nearest byte and increased by one to arrive at the final estimate of the expected communication cost on the assumption of uniform randomness.
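
 The formula above can be sketched in a few lines of Python. This is only an illustration, not israndom's own source: the function name expected_size_bytes is hypothetical, and it assumes that "increased by one" means one additional byte.

```python
import math

def expected_size_bytes(k: int, alphasize: int) -> int:
    """Expected size, in bytes, of k uniform samples over alphasize symbols."""
    if not 2 <= alphasize <= 256:
        raise ValueError("alphasize must be between 2 and 256")
    total_bits = k * math.log2(alphasize)   # k log(alphasize), in bits
    # Round up to a whole number of bytes, then add one (assumed: one byte).
    return math.ceil(total_bits / 8) + 1

# Default sample count with a full 256-symbol alphabet.
print(expected_size_bytes(393216, 256))  # prints 393217
```

 With a 256-symbol alphabet each sample costs exactly one byte, so the estimate is simply the sample count plus one.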
 If the compressed size of k samples is less than c, this represents a randomness deficiency and the randomness test fails; israndom will exit with a nonzero exit status. If israndom indicates that a source is nonrandom, this fact is effectively certain, provided the compression module is correct and invertible. If the compressed size is at least the threshold value c, the file appears to be random and passes the test, and israndom will exit with a return value of 0. In either case, it will print the alphabet size, expected compressed size, sample count, and randomness difference before exiting with the appropriate return code.
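
 The pass/fail decision described above can be approximated as follows. This is a rough sketch under stated assumptions: Python's standard bz2 module stands in for israndom's bzlib compression module, the function name passes_randomness_test is hypothetical, and the threshold uses a base-2 logarithm with one extra byte added.

```python
import bz2
import math
import os

def passes_randomness_test(samples: bytes, alphasize: int) -> bool:
    """Return True if the compressed size reaches the expected size."""
    k = len(samples)
    # Expected size in bytes under the assumption of uniform randomness.
    threshold = math.ceil(k * math.log2(alphasize) / 8) + 1
    # bz2 is only a stand-in for the compressor israndom would use.
    compressed_size = len(bz2.compress(samples))
    # A compressed size below the threshold is a randomness deficiency.
    return compressed_size >= threshold

# Operating-system random bytes are expected to pass.
print(passes_randomness_test(os.urandom(393216), 256))
# A strictly repeating pattern compresses far below the threshold and fails.
print(passes_randomness_test(b"ab" * 196608, 256))
```

 The comparison is one-sided: only a compressed size below the threshold counts as evidence, which is why a failure is far more informative than a pass.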
 The default number of samples is 393216. Larger sample counts should increase accuracy, whereas using too few samples will leave the method unable to resolve randomness in certain situations. This is a theoretically unavoidable fact for all effective randomness tests.

 If a filename is given, it is read to find the samples to analyze. If the filename "-" is given, or no filename is given at all, then israndom reads from standard input.

 If text files are to be used, it is important to specify one or both of -n and -r, since without these, end-of-line characters will be misinterpreted as samples.
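
 To see why end-of-line characters matter, consider the following sketch. It is not israndom's code: the helper strip_line_endings is hypothetical and simply mimics the described effect of the -n and -r options on a byte string.

```python
def strip_line_endings(data: bytes,
                       ignore_newlines: bool = True,
                       ignore_carriage_returns: bool = True) -> bytes:
    """Drop newline and/or carriage-return bytes before analysis."""
    drop = b""
    if ignore_newlines:
        drop += b"\n"
    if ignore_carriage_returns:
        drop += b"\r"
    # bytes.translate with table=None only deletes the listed bytes.
    return data.translate(None, drop)

text = b"0110\r\n1001\r\n"   # two lines of binary digits with CRLF endings
print(len(set(text)))                      # prints 4: '0', '1', CR, LF
print(len(set(strip_line_endings(text))))  # prints 2: '0' and '1'
```

 Without filtering, the line endings inflate the apparent alphabet and add a perfectly regular pattern to the sample stream.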

OPTIONS
 -c compressor_name

Set the compressor explicitly to compressor_name instead of the default, bzlib. For basic analysis, bzlib is usually sufficient. For detecting complex or subtle biases, a more powerful compression module such as lzma (lzmax) or ppmd (ppmdx) will detect more types of nonrandomness. Because Lempel-Ziv types are universal, all effective randomness tests can be captured as a kind of compression discriminant function.
 -n

ignore newlines (so that text files may be used)
 -r

ignore carriage returns (so that text files may be used)
 -a alphasize

set the alphabet size to alphasize, an integer between 2 and 256. If you do not specify an alphabet size, it is automatically determined from the contents of the samples.
 -s samplecount

use samplecount samples instead of the default of 393216. Using a number that is too small here will reduce the accuracy of the test, causing everything to appear to be random. A value of 0 means to read until EOF.
 -q

quiet mode, with no extra status messages
 -h

print help and exit.
EXAMPLES
First, we can verify that the cryptographically strong random number generator is correct:
 israndom /dev/urandom

 Next, we can notice that the "od" command, without extra options, is not random because it prints out addresses and spaces predictably. Most compressors can tell by the regular spaces that it is not random:

 od /dev/urandom | israndom -n -r

 but if we remove spaces using 'tr' then a more powerful compressor, lzmax, is required to demonstrate the nonrandomness of the sequence:

 od /dev/urandom | tr -d ' ' | israndom -n -r -c lzmax

 Removing the address lines using an od option yields the expected result once again that the sequence is effectively random:
 od -An /dev/urandom | tr -d ' ' | israndom -n -r -c lzmax

 The above sequence is not actually random, because every third octal digit only ranges from 0 to 3, since 377 octal is the same as 255 decimal. This subtle pattern is detectable using 10 million samples and the advanced ppmdx compressor:
 od -An /dev/urandom | tr -d ' ' | israndom -n -r -c ppmdx -s 10000000
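
 The digit restriction mentioned above can be checked directly. The sketch below assumes each byte is written as exactly three octal digits (000 to 377) and compares the values the leading digit can take with those of the middle digit.

```python
# Write every possible byte value as three octal digits, e.g. 255 -> "377".
octal_forms = [format(b, "03o") for b in range(256)]

leading_digits = sorted({s[0] for s in octal_forms})
middle_digits = sorted({s[1] for s in octal_forms})

print(leading_digits)  # prints ['0', '1', '2', '3']
print(middle_digits)   # prints ['0', '1', '2', '3', '4', '5', '6', '7']
```

 Because one digit in three is drawn from four values instead of eight, the digit stream carries less than three bits per digit on average, which a sufficiently strong compressor can exploit.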

 As a sanity check, we see that even under extreme analysis as above, /dev/urandom still checks out as random, even with newlines and carriage returns removed for good measure.
 cat /dev/urandom | israndom -n -r -c ppmdx -s 10000000

ENVIRONMENT
No environment variables.
BUGS
Please report bugs to the Debian BTS.