sam-stats(1) ea-utils: produce digested statistics


sam-stats [options] [file1] [file2...filen]


Version: 1.38.681

Produces lots of easily digested statistics for each SAM/BAM file listed.

Options (default in parens):

-D         Keep track of multiple alignments
-O PREFIX  Output prefix enabling extended output (see below)
-R FIL     Coverage/RNA output (coverage, 3' bias, etc.; implies -A)
-A         Report all chr sigs, even if there are more than 1000
-b INT     Number of reads to sample for per-base stats (1M)
-S INT     Size of ascii-signature (30)
-x FIL     File extension for handling multiple files (stats)
-M         Only overwrite if newer (requires -x, or multiple files)
-B         Input is BAM; don't bother looking at the magic number
-z         Don't fail when the SAM file has zero entries
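To make the option table and its defaults concrete, here is a minimal Python sketch that mirrors it with `argparse`. It is a hypothetical re-creation for illustration only, not sam-stats' own parser; the function names `build_parser` and `parse` are assumptions. The two cross-option rules it enforces (-R implies -A; -M requires -x or multiple files) come straight from the table.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented options; defaults match the "(parens)" values.
    p = argparse.ArgumentParser(prog="sam-stats", add_help=False)
    p.add_argument("-D", action="store_true", help="keep track of multiple alignments")
    p.add_argument("-O", metavar="PREFIX", help="output prefix enabling extended output")
    p.add_argument("-R", metavar="FIL", help="coverage/RNA output (implies -A)")
    p.add_argument("-A", action="store_true", help="report all chr sigs")
    p.add_argument("-b", type=int, default=1_000_000, metavar="INT",
                   help="reads to sample for per-base stats")
    p.add_argument("-S", type=int, default=30, metavar="INT", help="size of ascii-signature")
    # Left as None here so -M can tell whether -x was given; the 'stats'
    # default extension is applied when output names are built.
    p.add_argument("-x", metavar="FIL", help="file extension for multiple files")
    p.add_argument("-M", action="store_true", help="only overwrite if newer")
    p.add_argument("-B", action="store_true", help="input is BAM; skip magic check")
    p.add_argument("-z", action="store_true", help="don't fail on zero entries")
    p.add_argument("files", nargs="*")
    return p


def parse(argv: List[str]) -> argparse.Namespace:
    p = build_parser()
    args = p.parse_args(argv)
    if args.R:
        args.A = True  # -R implies -A
    if args.M and not (args.x or len(args.files) > 1):
        p.error("-M requires -x or multiple files")  # exits via SystemExit
    return args
```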


If a single file is specified, output goes to standard output. If multiple files are specified, or if the -x option is supplied, the output for each input is written to <filename>.<ext>. The default extension is 'stats'.
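The output-destination rule can be sketched as a small Python function. `output_targets` and `DEFAULT_EXT` are hypothetical names, and the sketch assumes `<filename>` means the full input name, so `a.bam` becomes `a.bam.stats`; that is one reading of the text rather than confirmed tool behavior.

```python
from typing import List, Optional

DEFAULT_EXT = "stats"  # documented default extension


def output_targets(files: List[str], ext: Optional[str] = None) -> List[Optional[str]]:
    """Return one output path per input file; None stands for standard output."""
    # One file and no -x: results go to standard output.
    if len(files) == 1 and ext is None:
        return [None]
    # Multiple files, or -x given: write <filename>.<ext> next to each input.
    suffix = ext if ext is not None else DEFAULT_EXT
    return [f"{name}.{suffix}" for name in files]
```

For example, `output_targets(["a.bam", "b.bam"])` gives `["a.bam.stats", "b.bam.stats"]`, while `output_targets(["a.bam"])` gives `[None]`.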

Complete Stats:

: mean, max, stdev, median, Q1 (25th percentile), Q3 (75th percentile)
: number of entries in the SAM file (may differ from the number of reads)
: phred scale used
: number of reads used for quality stats
mapped reads : number of aligned reads (unique probe id sequences)
mapped bases : total of the lengths of the aligned reads
: number of forward-aligned reads
: number of reverse-aligned reads
snp rate : mismatched bases / total bases (snv rate)
ins rate : inserted bases / total bases
del rate : deleted bases / total bases
pct mismatch : percent of reads that have mismatches
pct align : percent of reads that aligned
len <STATS> : read length stats, ignored if reads are fixed-length
mapq <STATS> : stats for mapping qualities
insert <STATS> : stats for insert sizes
: percentage of mapped bases per chr, followed by a signature
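As a rough illustration of two entries above, the sketch below computes a `<STATS>`-style summary (mean, max, stdev, median, Q1, Q3) and insertion/deletion rates from CIGAR strings. It is not sam-stats' implementation: the quartile method (Python's default "exclusive" method), counting CIGAR `I`/`D` operations, and taking "total bases" to mean mapped bases are all assumptions. The names `summarize` and `indel_rates` are hypothetical.

```python
import re
import statistics
from typing import Dict, Iterable, List

# One CIGAR operation: a length followed by an op code (SAM spec op set).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize(values: List[float]) -> Dict[str, float]:
    """<STATS>-style summary; needs at least two values for stdev/quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # exclusive method (assumed)
    return {
        "mean": statistics.mean(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (assumed)
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


def indel_rates(cigars: Iterable[str], mapped_bases: int) -> Dict[str, float]:
    """Inserted and deleted bases divided by mapped bases (assumed denominator)."""
    inserted = deleted = 0
    for cigar in cigars:
        for length, op in CIGAR_OP.findall(cigar):
            if op == "I":
                inserted += int(length)
            elif op == "D":
                deleted += int(length)
    return {"ins rate": inserted / mapped_bases, "del rate": deleted / mapped_bases}
```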

Subsampled stats (up to 1M reads by default; see -b):

base qual <STATS> : stats for base qualities
%A,%T,%C,%G : base percentages
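A minimal Python sketch of the subsampled per-base stats, assuming each read arrives as a `(sequence, quality_string)` pair. The function name `base_stats`, taking the *first* `max_reads` reads (the tool's actual sampling strategy is not described here), the Phred+33 offset, and computing percentages over A/T/C/G only are all assumptions.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, List, Tuple


def base_stats(reads: Iterable[Tuple[str, str]], max_reads: int = 1_000_000,
               phred: int = 33) -> Tuple[Dict[str, float], List[int]]:
    """Return base percentages and decoded base qualities for a read subsample."""
    counts: Counter = Counter()
    quals: List[int] = []
    # Subsample: only the first max_reads reads are examined (assumed strategy).
    for seq, qual in islice(reads, max_reads):
        counts.update(seq.upper())
        # Decode ASCII quality characters with the assumed Phred offset.
        quals.extend(ord(c) - phred for c in qual)
    total = sum(counts[b] for b in "ATCG")
    pct = {b: (100.0 * counts[b] / total if total else 0.0) for b in "ATCG"}
    return pct, quals
```

The decoded quality list can then be passed to a `<STATS>`-style summary to produce the "base qual" line.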

Meaning of the per-chromosome signature:

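An ASCII histogram of mapped reads by chromosome position. It is only output if the original SAM/BAM file has a header. Each character is the log2 of the number of mapped reads in that region of the chromosome, added to the ASCII code of '0'. The number of characters is set by -S (default 30).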
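The encoding can be sketched in Python as follows. The name `ascii_signature`, equal-width binning of 1-based read positions, truncating log2 to an integer, and writing '0' for empty regions are assumptions; the text only says the value is log2 of the read count plus ASCII '0'.

```python
import math
from typing import Iterable


def ascii_signature(positions: Iterable[int], chrom_len: int, size: int = 30) -> str:
    """Encode read start positions on one chromosome as a `size`-character signature."""
    bins = [0] * size
    for pos in positions:
        # Map a 1-based position to one of `size` equal-width regions.
        idx = min((pos - 1) * size // chrom_len, size - 1)
        bins[idx] += 1
    chars = []
    for count in bins:
        # log2 of the count (truncated), offset from ASCII '0'; empty bins -> '0' (assumed).
        level = int(math.log2(count)) if count > 0 else 0
        chars.append(chr(ord("0") + level))
    return "".join(chars)
```

Because the offset is ASCII '0', counts of 1024 or more produce characters past '9' (for example ':' for log2 = 10).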

Extended output mode (enabled with -O PREFIX) produces a set of files:

: primary output
: fastx-toolkit compatible output
: per-reference counts & coverage
: mismatch distribution
: length distribution (if applicable)
: mapping quality distribution