SYNOPSIS
fastq-mcf [options] <adapters.fa> <reads.fq> [mates1.fq ...]
DESCRIPTION
Version: 1.04.676
Detects levels of adapter presence, computes likelihoods and locations (start, end) of the adapters. Removes the adapter sequences from the fastq file(s).
Stats go to stderr, unless -o is specified.
Specify -0 to turn off all default settings.
If you specify multiple 'paired-end' inputs, then a -o option is required for each, e.g. -o read1.clip.fq -o read2.clip.fq
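The paired-end rule can be illustrated with a minimal Python sketch that assembles an argument list with one -o per input. The helper name build_mcf_command and the file names are hypothetical; only the flags and the argument order are taken from the synopsis. The sketch builds the command and does not run it.

```python
def build_mcf_command(adapters, inputs, outputs, extra_options=()):
    """Assemble an argv list for fastq-mcf (hypothetical helper, not part of the tool)."""
    # With several paired-end inputs, the help text requires one -o per input.
    if len(inputs) > 1 and len(outputs) != len(inputs):
        raise ValueError("multiple paired-end inputs need one -o per input")
    argv = ["fastq-mcf", *extra_options]
    for path in outputs:
        argv += ["-o", path]
    # Positional arguments follow the synopsis: adapters first, then reads and mates.
    argv += [adapters, *inputs]
    return argv


cmd = build_mcf_command(
    "adapters.fa",
    ["reads.fq", "mates1.fq"],
    ["read1.clip.fq", "read2.clip.fq"],
)
print(" ".join(cmd))
# prints: fastq-mcf -o read1.clip.fq -o read2.clip.fq adapters.fa reads.fq mates1.fq
```

The resulting list could be handed to subprocess.run if fastq-mcf is installed.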
OPTIONS
- -h
- This help
- -o FIL
- Output file (stats to stdout)
- -s N.N
- Log scale for adapter minimum-length-match (2.2)
- -t N
- % occurrence threshold before adapter clipping (0.25)
- -m N
- Minimum clip length, overrides scaled auto (1)
- -p N
- Maximum adapter difference percentage (10)
- -l N
- Minimum remaining sequence length (19)
- -L N
- Maximum remaining sequence length (none)
- -D N
- Remove duplicate reads, i.e. reads whose Read_1 shares an identical N bases (0)
- -k N
- sKew percentage-less-than causing cycle removal (2)
- -x N
- 'N' (Bad read) percentage causing cycle removal (20)
- -q N
- quality threshold causing base removal (10)
- -w N
- window-size for quality trimming (1)
- -H
- remove >95% homopolymer reads (no)
- -X
- remove low complexity reads (no)
- -0
- Set all default parameters to zero/do nothing
- -U|u
- Force disable/enable Illumina PF filtering (auto)
- -P N
- Phred-scale (auto)
- -R
- Don't remove N's from the fronts/ends of reads
- -n
- Don't clip, just output what would be done
- -C N
- Number of reads to use for subsampling (300k)
- -S
- Save all discarded reads to '.skip' files
- -d
- Output lots of random debugging stuff
Quality adjustment options:
- --cycle-adjust CYC,AMT
- Adjust cycle CYC (negative = offset from end) by amount AMT
- --phred-adjust SCORE,AMT
- Adjust score SCORE by amount AMT
- --phred-adjust-max SCORE
- Adjust scores > SCORE to SCORE
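As a rough illustration of the two score-based adjustments, the following Python sketch applies them to a list of integer Phred scores. The function name adjust_scores and the order of application (phred-adjust first, then the cap) are assumptions made for the example, not a description of the tool's internals.

```python
def adjust_scores(quals, phred_adjust=None, phred_adjust_max=None):
    """Apply score adjustments to a list of integer Phred scores (illustrative only).

    phred_adjust:     (SCORE, AMT) tuple; scores equal to SCORE are shifted by AMT.
    phred_adjust_max: SCORE; scores greater than SCORE are set to SCORE.
    """
    adjusted = []
    for q in quals:
        if phred_adjust is not None and q == phred_adjust[0]:
            q += phred_adjust[1]
        # Assumption: the cap is applied after the per-score shift.
        if phred_adjust_max is not None and q > phred_adjust_max:
            q = phred_adjust_max
        adjusted.append(q)
    return adjusted


print(adjust_scores([2, 30, 41, 38], phred_adjust=(2, -2), phred_adjust_max=38))
# prints: [0, 30, 38, 38]
```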
Filtering options*:
- --[mate-]qual-mean NUM
- Minimum mean quality score
- --[mate-]qual-gt NUM,THR
- At least NUM quals > THR
- --[mate-]max-ns NUM
- Maximum N-calls in a read (can be a %)
- --[mate-]min-len NUM
- Minimum remaining length (same as -l)
- --homopolymer-pct PCT
- Homopolymer filter percent (95)
- --lowcomplex-pct PCT
- Complexity filter percent (95)
If the mate- prefix is used, the filter applies to the second non-barcode read only.
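The read-level filters can be pictured as a single pass/fail check on an already-trimmed read. The sketch below is a simplified Python illustration, not the tool's code: the function name passes_filters is hypothetical, quality scores are assumed to be decoded to integers already, and max_ns is handled only as an absolute count (the percentage form is left out).

```python
def passes_filters(seq, quals, qual_mean=None, qual_gt=None, max_ns=None, min_len=None):
    """Return True if a trimmed read survives the given filters (illustrative only)."""
    if min_len is not None and len(seq) < min_len:
        return False
    if qual_mean is not None:
        # An empty read cannot meet a minimum mean quality.
        if not quals or sum(quals) / len(quals) < qual_mean:
            return False
    if qual_gt is not None:
        num, thr = qual_gt
        # Require at least NUM quality values strictly greater than THR.
        if sum(1 for q in quals if q > thr) < num:
            return False
    if max_ns is not None and seq.upper().count("N") > max_ns:
        return False
    return True


read_seq = "ACGTN"
read_quals = [30, 30, 30, 30, 2]
print(passes_filters(read_seq, read_quals, qual_mean=20))     # prints: True
print(passes_filters(read_seq, read_quals, qual_gt=(5, 20)))  # prints: False
```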
Adapter files are 'fasta' formatted.
Specify n/a to turn off adapter clipping, and just use filters.
Increasing the scale makes recognition-lengths longer, a scale of 100 will force full-length recognition of adapters.
Adapter sequences with _5p in their label will match 'end's, and sequences with _3p in their label will match 'start's, otherwise the 'end' is auto-determined.
Skew is when one cycle is poor, 'skewed' toward a particular base. If any nucleotide is less than the skew percentage, then the whole cycle is removed. Disable for methyl-seq, etc.
Set the skew (-k) or N-pct (-x) to 0 to turn it off (should be done for miRNA, amplicon and other low-complexity situations!)
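To make the skew rule concrete, here is a small Python sketch that flags cycles in which any of A, C, G or T falls below a percentage threshold. It is an illustration under stated assumptions, not the tool's implementation: the function name skewed_cycles is hypothetical, cycles are reported as 0-based indices, and every base observed at a cycle (including N) counts toward the total.

```python
from collections import Counter


def skewed_cycles(reads, skew_pct=2.0):
    """Return 0-based cycle indices where any of A, C, G, T is below skew_pct percent."""
    if not reads:
        return []
    n_cycles = max(len(r) for r in reads)
    flagged = []
    for cyc in range(n_cycles):
        # Collect the base seen at this cycle in every read long enough to have one.
        column = [r[cyc] for r in reads if cyc < len(r)]
        counts = Counter(column)
        total = len(column)
        if any(100.0 * counts.get(base, 0) / total < skew_pct for base in "ACGT"):
            flagged.append(cyc)
    return flagged


reads = ["ACGT", "CATG", "GTAG", "TGCG"]
print(skewed_cycles(reads))              # prints: [3]
print(skewed_cycles(reads, skew_pct=0))  # prints: []
```

In this toy input the last cycle contains no A or C at all, so it is flagged. With a threshold of 0 no percentage can fall below it, which mirrors how setting -k to 0 disables the check.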
Duplicate read filtering is appropriate for assembly tasks, and never when read length < expected coverage. -D 50 will use 4.5GB RAM on 100m DNA reads - be careful. Great for RNA assembly.
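The memory cost mentioned above follows from having to remember every distinct key seen so far. A minimal Python sketch of that idea follows; it assumes the key is the first N bases of Read_1, which the help text does not state explicitly, and the function name drop_duplicates is hypothetical.

```python
def drop_duplicates(reads, n):
    """Keep the first read for each distinct n-base prefix (illustrative only)."""
    if n <= 0:
        # A value of 0 is the default and leaves the reads untouched.
        return list(reads)
    seen = set()
    kept = []
    for read in reads:
        key = read[:n]  # assumption: duplicates are judged on the first n bases
        if key in seen:
            continue
        seen.add(key)
        kept.append(read)
    return kept


print(drop_duplicates(["ACGTAA", "ACGTCC", "TTTTAA"], 4))
# prints: ['ACGTAA', 'TTTTAA']
```

The set grows with the number of distinct prefixes, so memory use scales with both N and the number of unique reads.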
*Quality filters are evaluated after clipping/trimming
Homopolymer filtering is a subset of low-complexity, but will not be separately tracked unless both are turned on.
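One plausible reading of the homopolymer filter is "the most frequent base makes up more than PCT percent of the read". The Python sketch below implements that reading only; the tool's exact definition is not given in this text, so both the criterion and the function name is_homopolymer are assumptions.

```python
from collections import Counter


def is_homopolymer(seq, pct=95.0):
    """Return True if a single base accounts for more than pct percent of seq."""
    if not seq:
        return False
    # Count of the single most common base in the read.
    top_count = Counter(seq.upper()).most_common(1)[0][1]
    return 100.0 * top_count / len(seq) > pct


print(is_homopolymer("A" * 39 + "C"))  # prints: True
print(is_homopolymer("A" * 19 + "C"))  # prints: False
```

The second read is exactly 95% A, so it is kept under the strict "greater than" comparison used here.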