cutadapt(1) remove adapter sequences from high-throughput sequencing reads

SYNOPSIS

cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

For paired-end reads:

cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq

DESCRIPTION

Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard characters are supported. The reverse complement is *not* automatically searched. All reads from input.fastq will be written to output.fastq with the adapter sequence removed. Adapter matching is error-tolerant. Multiple adapter sequences can be given (use further -a options), but only the best-matching adapter will be removed.

Input may also be in FASTA format. Compressed input and output are supported; the compression format is auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for standard input/output. Without the -o option, output is sent to standard output.
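The file name conventions above can be illustrated with a short Python sketch. It is not cutadapt's own code; the helper name open_by_extension and the OPENERS table are hypothetical and only show how a suffix-based choice of (de)compressor and the '-' convention might look.

```python
import bz2
import gzip
import lzma
import sys

# Hypothetical table: file name suffix -> function that opens such a file.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(path, mode="rt"):
    """Open path in text mode, picking the (de)compressor from the suffix."""
    if path == "-":
        # '-' stands for standard input when reading, standard output otherwise.
        return sys.stdin if "r" in mode else sys.stdout
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, mode)
    # No known suffix: treat the file as uncompressed.
    return open(path, mode)
```

Under these assumptions, open_by_extension("in1.fastq.gz") returns a text stream of decompressed FASTQ lines, while open_by_extension("in1.fastq") reads the file as is.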

OPTIONS

--version
show program's version number and exit
-h, --help
show this help message and exit
--debug
Print debugging information.
-f FORMAT, --format=FORMAT
Input file format; can be either 'fasta', 'fastq' or 'sra-fastq'. Ignored when reading csfasta/qual files. Default: auto-detect from file name extension.
Finding adapters:
Parameters -a, -g, -b specify adapters to be removed from each read (or from the first read in a pair if data is paired). If specified multiple times, only the best matching adapter is trimmed (but see the --times option). When the special notation 'file:FILE' is used, adapter sequences are read from the given FASTA file.
-a ADAPTER, --adapter=ADAPTER
Sequence of an adapter ligated to the 3' end (paired data: of the first read). The adapter and subsequent bases are trimmed. If a '$' character is appended ('anchoring'), the adapter is only found if it is a suffix of the read.
-g ADAPTER, --front=ADAPTER
Sequence of an adapter ligated to the 5' end (paired data: of the first read). The adapter and any preceding bases are trimmed. Partial matches at the 5' end are allowed. If a '^' character is prepended ('anchoring'), the adapter is only found if it is a prefix of the read.
-b ADAPTER, --anywhere=ADAPTER
Sequence of an adapter that may be ligated to the 5' or 3' end (paired data: of the first read). Both types of matches as described under -a and -g are allowed. If the first base of the read is part of the match, the behavior is as with -g, otherwise as with -a. This option is mostly for rescuing failed library preparations - do not use it if you know which end your adapter was ligated to!
-e ERROR_RATE, --error-rate=ERROR_RATE
Maximum allowed error rate (no. of errors divided by the length of the matching region). Default: 0.1
--no-indels
Allow only mismatches in alignments. Default: allow both mismatches and indels
-n COUNT, --times=COUNT
Remove up to COUNT adapters from each read. Default: 1
-O MINLENGTH, --overlap=MINLENGTH
If the overlap between the read and the adapter is shorter than MINLENGTH, the read is not modified. Reduces the no. of bases trimmed due to random adapter matches. Default: 3
--match-read-wildcards
Interpret IUPAC wildcards in reads. Default: False
-N, --no-match-adapter-wildcards
Do not interpret IUPAC wildcards in adapters.
--no-trim
Match and redirect reads to output/untrimmed-output as usual, but do not remove adapters.
--mask-adapter
Mask adapters with 'N' characters instead of trimming them.
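The interplay of -e, -O and IUPAC wildcards in the adapter can be sketched in Python as follows. This is a deliberately simplified illustration, not cutadapt's alignment algorithm: it only counts mismatches in an ungapped comparison (comparable to --no-indels), and the function names count_mismatches and accept_match are hypothetical. Here "segment" is assumed to be the part of the read that overlaps the start of the adapter.

```python
# IUPAC nucleotide codes and the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(adapter, segment):
    """Count positions where the read base is not covered by the adapter code.

    Wildcards are interpreted in the adapter only, so an 'N' in the read
    counts as a mismatch against a plain base.
    """
    return sum(base not in IUPAC[code] for code, base in zip(adapter, segment))


def accept_match(adapter, segment, max_error_rate=0.1, min_overlap=3):
    """Apply the -O (minimum overlap) and -e (error rate) criteria."""
    length = len(segment)
    if length == 0 or length < min_overlap:
        return False
    errors = count_mismatches(adapter[:length], segment)
    # Error rate = number of errors divided by the length of the matching region.
    return errors / length <= max_error_rate
```

With the default rate of 0.1, a 10-base overlap tolerates one mismatch in this sketch, and an overlap of only two bases is rejected by the default minimum overlap of 3.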
Additional read modifications:
-u LENGTH, --cut=LENGTH
Remove bases from each read (first read only if paired). If LENGTH is positive, remove bases from the beginning. If LENGTH is negative, remove bases from the end. Can be used twice if LENGTHs have different signs.
-q [5'CUTOFF,]3'CUTOFF, --quality-cutoff=[5'CUTOFF,]3'CUTOFF
Trim low-quality bases from 5' and/or 3' ends of each read before adapter removal. Applied to both reads if data is paired. If one value is given, only the 3' end is trimmed. If two comma-separated cutoffs are given, the 5' end is trimmed with the first cutoff, the 3' end with the second.
--nextseq-trim=3'CUTOFF
NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases (EXPERIMENTAL).
--quality-base=QUALITY_BASE
Assume that quality values in FASTQ are encoded as ascii(quality + QUALITY_BASE). This needs to be set to 64 for some old Illumina FASTQ files. Default: 33
--trim-n
Trim N's on ends of reads.
-x PREFIX, --prefix=PREFIX
Add this prefix to read names. Use {name} to insert the name of the matching adapter.
-y SUFFIX, --suffix=SUFFIX
Add this suffix to read names. Can also include {name}.
--strip-suffix=STRIP_SUFFIX
Remove this suffix from read names if present. Can be given multiple times.
--length-tag=TAG
Search for TAG followed by a decimal number in the description field of the read. Replace the decimal number with the correct length of the trimmed read. For example, use --length-tag 'length=' to correct fields like 'length=123'.
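Three of the read modifications above are simple enough to sketch in Python. The functions below are hypothetical helpers written for illustration, not part of cutadapt; they mirror the documented behavior of -u, --quality-base and --length-tag on plain strings.

```python
import re


def cut(sequence, length):
    """-u: a positive LENGTH removes bases from the start, a negative one from the end."""
    if length >= 0:
        return sequence[length:]
    return sequence[:length]


def decode_qualities(quality_string, quality_base=33):
    """--quality-base: each character is ascii(quality + QUALITY_BASE)."""
    return [ord(char) - quality_base for char in quality_string]


def fix_length_tag(description, tag, new_length):
    """--length-tag: replace the decimal number that follows TAG."""
    pattern = re.escape(tag) + r"\d+"
    return re.sub(pattern, lambda match: tag + str(new_length), description)
```

For example, cut("ACGTACGT", 3) drops the first three bases, cut("ACGTACGT", -2) drops the last two, and fix_length_tag("read1 length=123", "length=", 95) yields a description ending in "length=95".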
Filtering of processed reads:
--discard-trimmed, --discard
Discard reads that contain an adapter. Also use -O to avoid discarding too many randomly matching reads!
--discard-untrimmed, --trimmed-only
Discard reads that do not contain the adapter.
-m LENGTH, --minimum-length=LENGTH
Discard trimmed reads that are shorter than LENGTH. Reads that are too short even before adapter removal are also discarded. In colorspace, an initial primer is not counted. Default: 0
-M LENGTH, --maximum-length=LENGTH
Discard trimmed reads that are longer than LENGTH. Reads that are too long even before adapter removal are also discarded. In colorspace, an initial primer is not counted. Default: no limit
--max-n=COUNT
Discard reads with too many N bases. If COUNT is an integer, it is treated as the absolute number of N bases. If it is between 0 and 1, it is treated as the proportion of N's allowed in a read.
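The length and N-content filters can be pictured with the following Python sketch. It is an illustration under stated assumptions rather than cutadapt's implementation: the function names are hypothetical, and any --max-n value below 1 is assumed to be a proportion, while values of 1 or more are taken as absolute counts.

```python
def exceeds_max_n(sequence, count):
    """--max-n: an integer COUNT is absolute, a value below 1 is a proportion."""
    n_bases = sequence.upper().count("N")
    if count < 1:
        # Assumption: any COUNT below 1 is read as a proportion of the read length.
        return bool(sequence) and n_bases / len(sequence) > count
    return n_bases > count


def keep_read(sequence, minimum_length=0, maximum_length=None, max_n=None):
    """Return True if the trimmed read passes the -m, -M and --max-n filters."""
    if len(sequence) < minimum_length:
        return False
    if maximum_length is not None and len(sequence) > maximum_length:
        return False
    if max_n is not None and exceeds_max_n(sequence, max_n):
        return False
    return True
```

In this sketch, keep_read("ACNNN", max_n=0.5) rejects the read because three of its five bases are N, whereas keep_read("ACGTN", max_n=1) keeps it.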
Output:
--quiet
Print only error messages.
-o FILE, --output=FILE
Write trimmed reads to FILE. FASTQ or FASTA format is chosen depending on input. The summary report is sent to standard output. Use '{name}' in FILE to demultiplex reads into multiple files. Default: write to standard output
--info-file=FILE
Write information about each read and its adapter matches into FILE. See the documentation for the file format.
-r FILE, --rest-file=FILE
When the adapter matches in the middle of a read, write the rest (after the adapter) into FILE.
--wildcard-file=FILE
When the adapter has N bases (wildcards), write adapter bases matching wildcard positions to FILE. When there are indels in the alignment, this will often not be accurate.
--too-short-output=FILE
Write reads that are too short (according to length specified by -m) to FILE. Default: discard reads
--too-long-output=FILE
Write reads that are too long (according to length specified by -M) to FILE. Default: discard reads
--untrimmed-output=FILE
Write reads that do not contain the adapter to FILE. Default: output to same file as trimmed reads
Colorspace options:
-c, --colorspace
Enable colorspace mode: Also trim the color that is adjacent to the found adapter.
-d, --double-encode
Double-encode colors (map 0,1,2,3,4 to A,C,G,T,N).
-t, --trim-primer
Trim primer base and the first color (which is the transition to the first nucleotide).
--strip-f3
Strip the _F3 suffix of read names.
--maq, --bwa
MAQ- and BWA-compatible colorspace output. This enables -c, -d, -t, --strip-f3 and -y '/1'.
--no-zero-cap
Do not change negative quality values to zero in colorspace data. By default, they are changed to zero, since many tools have problems with negative qualities.
-z, --zero-cap
Change negative quality values to zero. This is enabled by default when -c/--colorspace is also enabled. Use --no-zero-cap to disable it.
Paired-end options:
The -A/-G/-B/-U options work like their -a/-g/-b/-u counterparts, but are applied to the second read in each pair.
-A ADAPTER
3' adapter to be removed from second read in a pair.
-G ADAPTER
5' adapter to be removed from second read in a pair.
-B ADAPTER
5'/3' adapter to be removed from second read in a pair.
-U LENGTH
Remove LENGTH bases from second read in a pair (see --cut).
-p FILE, --paired-output=FILE
Write second read in a pair to FILE.
--pair-filter=(any|both)
Which of the reads in a pair must match the filtering criterion for the pair to be filtered. Default: any
--interleaved
Read and write interleaved paired-end reads.
--untrimmed-paired-output=FILE
Write second read in a pair to this FILE when no adapter was found in the first read. Use this option together with --untrimmed-output when trimming paired-end reads. Default: output to same file as trimmed reads
--too-short-paired-output=FILE
Write second read in a pair to this file if pair is too short. Use together with --too-short-output.
--too-long-paired-output=FILE
Write second read in a pair to this file if pair is too long. Use together with --too-long-output.
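The two --pair-filter modes amount to a logical "or" versus a logical "and" over the two reads of a pair. A minimal Python sketch, using a hypothetical function name and boolean inputs that say whether each read matches the filtering criterion (for example, is shorter than the -m length):

```python
def pair_is_filtered(read1_matches, read2_matches, pair_filter="any"):
    """Decide whether a read pair is filtered out.

    'any'  : the pair is filtered if at least one read matches the criterion.
    'both' : the pair is filtered only if both reads match the criterion.
    """
    if pair_filter == "any":
        return read1_matches or read2_matches
    if pair_filter == "both":
        return read1_matches and read2_matches
    raise ValueError("pair_filter must be 'any' or 'both'")
```

With the default 'any', a pair in which only the second read is too short is filtered; with 'both', the same pair is kept.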

CITATION

Marcel Martin. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal, 17(1):10-12, May 2011. http://dx.doi.org/10.14806/ej.17.1.200

AUTHOR

This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.