man gmap (1): Genomic Mapping and Alignment Program

SYNOPSIS

gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]
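As a rough illustration of the first invocation form, the following Python sketch assembles an argument list of the shape `gmap [OPTIONS...] <FASTA files...>`. The helper name `build_gmap_command`, the database name `mygenome`, and the file name `reads.fa` are hypothetical; only flags documented on this page (-d, -D, -A) are used, and the command is printed rather than executed.

```python
import shlex

def build_gmap_command(db, fasta_files, genome_dir=None, extra_options=()):
    """Assemble an argv list of the form: gmap [OPTIONS...] <FASTA files...>.

    Hypothetical helper for illustration; it is not part of the GMAP suite.
    """
    argv = ["gmap", "-d", db]          # -d selects the genome database
    if genome_dir is not None:
        argv += ["-D", genome_dir]     # -D overrides the genome directory
    argv += list(extra_options)        # any further documented options
    argv += list(fasta_files)          # FASTA files come last
    return argv

cmd = build_gmap_command("mygenome", ["reads.fa"], extra_options=["-A"])
print(shlex.join(cmd))  # shell-quoted form of the command line
```

The resulting list could be handed to `subprocess.run` on a machine where gmap and the named database are actually installed.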

OPTIONS

Input options (must include -d or -g)

-D, --dir=directory: Genome directory. Default (as specified by --with-gmapdb to the configure program) is /var/cache/gmap
-d, --db=STRING: Genome database. If argument is '?' (with the quotes), this command lists available databases.
-k, --kmer=INT: k-mer size to use in genome database (allowed values: 16 or less). If not specified, the program will find the highest available k-mer size in the genome database
--sampling=INT: Sampling to use in genome database. If not specified, the program will find the smallest available sampling value in the genome database within the selected k-mer size
-G, --genomefull: Use full genome (all ASCII chars allowed; built explicitly during setup), not compressed version
-g, --gseg=filename: User-supplied genomic segment
-1, --selfalign: Align one sequence against itself in FASTA format via stdin (useful for getting protein translation of a nucleotide sequence)
-2, --pairalign: Align two sequences in FASTA format via stdin, first one being genomic and second one being cDNA
--cmdline=STRING,STRING: Align these two sequences provided on the command line, first one being genomic and second one being cDNA
-q, --part=INT/INT: Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for distributing jobs to a computer farm).
--input-buffer-size=INT: Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)
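To make the --part option more concrete, here is a minimal sketch that generates one --part argument per job for a split into n parts. It assumes, based on the 0/100 and 99/100 examples above, that the part index runs from 0 to n-1; the function name `part_arguments` is hypothetical.

```python
def part_arguments(n_parts):
    """Return one --part=i/n argument per job, with i running from 0 to n-1.

    Assumption: indices are 0-based, as suggested by the documented
    examples 0/100 and 99/100.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    return [f"--part={i}/{n_parts}" for i in range(n_parts)]

# Print the argument each of four jobs would receive.
for arg in part_arguments(4):
    print(arg)
```

Each job would then run gmap with the same database and input, differing only in its --part argument.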

Computation options

-B, --batch=INT: Batch mode (default = 2)

         Mode     Offsets       Positions       Genome
           0      see note      mmap            mmap
           1      see note      mmap & preload  mmap
(default) 2      see note      mmap & preload  mmap & preload
           3      see note      allocate        mmap & preload
           4      see note      allocate        allocate
           5      expand        allocate        allocate
Note: For a single sequence, all data structures use mmap. If mmap is not available and allocate is not chosen, then the program will use fileio (very slow)
Note about --batch and offsets: Expansion of offsets can be controlled independently by the --expand-offsets flag. The --batch=5 option is equivalent to --batch=4 plus --expand-offsets=1
--expand-offsets=INT: Whether to expand the genomic offsets index. Values: 0 (no, default), or 1 (yes). Expansion gives faster alignment, but requires more memory
--nosplicing: Turns off splicing (useful for aligning genomic sequences onto a genome)
--min-intronlength=INT: Min length for one internal intron (default 9). Below this size, a genomic gap will be considered a deletion rather than an intron.
--max-intronlength-middle=INT: Max length for one internal intron (default 200000)
--max-intronlength-ends=INT: Max length for first or last intron (default 10000)
--trim-end-exons=INT: Trim end exons with fewer than given number of matches (in nt, default 12)
-w, --localsplicedist=INT: Max length for known splice sites at ends of sequence (default 2000000)
-L, --totallength=INT: Max total intron length (default 2400000)
-x, --chimera-margin=INT: Amount of unaligned sequence that triggers search for the remaining sequence (default 30). Enables alignment of chimeric reads, and may help with some non-chimeric reads. To turn off, set to zero.
--no-chimeras: Turns off finding of chimeras. Same effect as --chimera-margin=0
-t, --nthreads=INT: Number of worker threads
-c, --chrsubset=string: Limit search to given chromosome
-z, --direction=STRING: cDNA direction (sense_force, antisense_force, sense_filter, antisense_filter, or auto (default))
--canonical-mode=INT: Reward for canonical and semi-canonical introns: 0=low reward, 1=high reward (default), 2=low reward for high-identity sequences and high reward otherwise
--cross-species: Use a more sensitive search for canonical splicing, which helps especially for cross-species alignments and other difficult cases
--allow-close-indels=INT: Allow an insertion and deletion close to each other (0=no, 1=yes (default), 2=only for high-quality alignments)
--microexon-spliceprob=FLOAT: Allow microexons only if one of the splice site probabilities is greater than this value (default 0.95)
--cmetdir=STRING: Directory for methylcytosine index files (created using cmetindex) (default is location of genome index files specified using -D, -V, and -d)
--atoidir=STRING: Directory for A-to-I RNA editing index files (created using atoiindex) (default is location of genome index files specified using -D, -V, and -d)
--mode=STRING: Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, atoi-nonstranded, ttoc-stranded, or ttoc-nonstranded. Non-standard modes require you to have previously run the cmetindex or atoiindex programs (which also cover the ttoc modes) on the genome
-p, --prunelevel: Pruning level: 0=no pruning (default), 1=poor seqs, 2=repetitive seqs, 3=poor and repetitive
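The batch-mode table and the note on offsets in this section can be summarized as a small lookup, sketched below. The dictionary simply transcribes the table; the names `BATCH_MODES` and `normalize_batch` are hypothetical, and the function only restates the documented equivalence between --batch=5 and --batch=4 plus --expand-offsets=1.

```python
# Transcription of the batch-mode table: mode -> (offsets, positions, genome).
BATCH_MODES = {
    0: ("see note", "mmap", "mmap"),
    1: ("see note", "mmap & preload", "mmap"),
    2: ("see note", "mmap & preload", "mmap & preload"),  # default
    3: ("see note", "allocate", "mmap & preload"),
    4: ("see note", "allocate", "allocate"),
    5: ("expand", "allocate", "allocate"),
}

def normalize_batch(batch=2, expand_offsets=0):
    """Rewrite --batch=5 as the documented equivalent (4, 1).

    Returns a (batch, expand_offsets) pair; other modes pass through unchanged.
    """
    if batch not in BATCH_MODES:
        raise ValueError(f"unknown batch mode: {batch}")
    if batch == 5:
        return 4, 1
    return batch, expand_offsets

print(normalize_batch(5))
print(BATCH_MODES[2])
```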

Output types

-S, --summary: Show summary of alignments only
-A, --align: Show alignments
-3, --continuous: Show alignment in three continuous lines
-4, --continuous-by-exon: Show alignment in three lines per exon
-Z, --compress: Print output in compressed format
-E, --exons=STRING: Print exons ("cdna" or "genomic")
-P, --protein_dna: Print protein sequence (cDNA)
-Q, --protein_gen: Print protein sequence (genomic)
-f, --format=INT: Other format for output (also note the -A and -S options and other options listed under Output types):
psl (or 1) = PSL (BLAT) format,
gff3_gene (or 2) = GFF3 gene format,
gff3_match_cdna (or 3) = GFF3 cDNA_match format,
gff3_match_est (or 4) = GFF3 EST_match format,
splicesites (or 6) = splicesites output (for GSNAP splicing file),
introns = introns output (for GSNAP splicing file),
map_exons (or 7) = IIT FASTA exon map format,
map_ranges (or 8) = IIT FASTA range map format,
coords (or 9) = coords in table format,
sampe = SAM format (setting paired_read bit in flag),
samse = SAM format (without setting paired_read bit)
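The correspondence between numeric and named --format values listed above can be captured in a small table. The sketch below transcribes that list; the names `FORMAT_CODES`, `NAME_ONLY_FORMATS`, and `format_name` are hypothetical and exist only for illustration.

```python
# Numeric codes and their named equivalents, as listed for --format.
FORMAT_CODES = {
    1: "psl",
    2: "gff3_gene",
    3: "gff3_match_cdna",
    4: "gff3_match_est",
    6: "splicesites",
    7: "map_exons",
    8: "map_ranges",
    9: "coords",
}
# Formats the list gives by name only, with no numeric code.
NAME_ONLY_FORMATS = {"introns", "sampe", "samse"}

def format_name(value):
    """Map a --format value (number or name) to its format name."""
    text = str(value)
    if text.isdigit():
        code = int(text)
        if code not in FORMAT_CODES:
            raise ValueError(f"no format listed for code {code}")
        return FORMAT_CODES[code]
    if text in FORMAT_CODES.values() or text in NAME_ONLY_FORMATS:
        return text
    raise ValueError(f"unknown format: {text}")

print(format_name(2))
print(format_name("samse"))
```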

Output options

-n, --npaths=INT: Maximum number of paths to show (default 5). If set to 1, GMAP will not report chimeric alignments, since those imply two paths. If you want a single alignment plus chimeric alignments, then set this to be 0.
--suboptimal-score=INT: Report only paths whose score is within this value of the best path. By default, if this option is not provided, the program prints all paths found.
-O, --ordered: Print output in same order as input (relevant only if there is more than one worker thread)
-5, --md5: Print MD5 checksum for each query sequence
-o, --chimera-overlap: Overlap to show, if any, at chimera breakpoint
--failsonly: Print only failed alignments, those with no results
--nofails: Exclude printing of failed alignments
-V, --snpsdir=STRING: Directory for SNPs index files (created using snpindex) (default is location of genome index files specified using -D and -d)
-v, --use-snps=STRING: Use database containing known SNPs (in <STRING>.iit, built previously using snpindex) for tolerance to SNPs
--split-output=STRING: Basename for multiple-file output, separately for nomapping, uniq, mult, (and chimera, if --chimera-margin is selected)
--failed-input=STRING: Print completely failed alignments as input FASTA or FASTQ format to the given file. If the --split-output flag is also given, this file is generated in addition to the output in the .nomapping file.
--append-output: When --split-output or --failed-input is given, this flag will append output to the existing files. Otherwise, the default is to create new files.
--output-buffer-size=INT: Buffer size, in queries, for output thread (default 1000). When the number of results to be printed exceeds this size, the worker threads are halted until the backlog is cleared
--translation-code=INT: Genetic code used for translating codons to amino acids and computing CDS. Integer value (default=1) corresponds to an available code at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
-F, --fulllength: Assume full-length protein, starting with Met
-a, --cdsstart=INT: Translate codons from given nucleotide (1-based)
-T, --truncate: Truncate alignment around full-length protein, Met to Stop. Implies -F flag.
-Y, --tolerant: Translates cDNA with corrections for frameshifts

Options for GFF3 output

--gff3-add-separators=INT: Whether to add a ### separator after each query sequence. Values: 0 (no), 1 (yes, default)

Options for SAM output

--no-sam-headers: Do not print headers beginning with '@'
--sam-use-0M: Insert 0M in CIGAR between adjacent insertions and deletions. Required by Picard, but can cause errors in other tools
--force-xs-dir: For RNA-Seq alignments, disallows XS:A:? when the sense direction is unclear, and replaces this value arbitrarily with XS:A:+. May be useful for some programs, such as Cufflinks, that cannot handle XS:A:?. However, if you use this flag, the reported value of XS:A:+ in these cases will not be meaningful.
--md-lowercase-snp: In MD string, when known SNPs are given by the -v flag, prints difference nucleotides as lower-case when they differ from reference but match a known alternate allele
--action-if-cigar-error: Action to take if there is a disagreement between CIGAR length and sequence length. Allowed values: ignore, warning (default), abort
--read-group-id=STRING: Value to put into read-group id (RG-ID) field
--read-group-name=STRING: Value to put into read-group name (RG-SM) field
--read-group-library=STRING: Value to put into read-group library (RG-LB) field
--read-group-platform=STRING: Value to put into read-group platform (RG-PL) field

Options for quality scores

--quality-protocol=STRING: Protocol for input quality scores. Allowed values:
illumina (ASCII 64-126) (equivalent to -J 64 -j -31),
sanger (ASCII 33-126) (equivalent to -J 33 -j 0).
Default is sanger (no quality print shift). SAM output files should have quality scores in sanger protocol. Or you can specify the print shift with this flag:
-j, --quality-print-shift=INT: Shift FASTQ quality scores by this amount in output (default is 0 for sanger protocol; to change Illumina input to Sanger output, select -31)
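The arithmetic behind the print shift can be shown with a short sketch. It assumes that the shift is simply added to the ASCII code of each quality character, which is consistent with the -31 value given above for moving from an offset of 64 to an offset of 33; the function name `shift_qualities` is hypothetical and is not part of GMAP.

```python
def shift_qualities(quality_string, shift):
    """Shift each FASTQ quality character by `shift` ASCII positions.

    Assumption: the print shift is a plain offset on the character code.
    """
    shifted = []
    for ch in quality_string:
        code = ord(ch) + shift
        if not 33 <= code <= 126:
            raise ValueError(f"shifted quality {code} is outside ASCII 33-126")
        shifted.append(chr(code))
    return "".join(shifted)

# '@' is ASCII 64 and 'h' is ASCII 104 in the Illumina (offset 64) protocol.
print(shift_qualities("@h", -31))  # prints "!I" (ASCII 33 and 73)
```

With a shift of 0, the sanger default, the quality string is returned unchanged.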

External map file options

-M, --mapdir=directory: Map directory
-m, --map=iitfile: Map file. If argument is '?' (with the quotes), this lists available map files.
-e, --mapexons: Map each exon separately
-b, --mapboth: Report hits from both strands of genome
-u, --flanking=INT: Show flanking hits (default 0)
--print-comment: Show comment line for each hit

Alignment output options

-N, --nolengths: No intron lengths in alignment
-I, --invertmode=INT: Mode for alignments to genomic (-) strand: 0=Don't invert the cDNA (default), 1=Invert cDNA and print genomic (-) strand, 2=Invert cDNA and print genomic (+) strand
-i, --introngap=INT: Nucleotides to show on each end of intron (default 3)
-l, --wraplength=INT: Wrap length for alignment (default 50)

Filtering output options

--min-trimmed-coverage=FLOAT: Do not print alignments with trimmed coverage less than this value (default=0.0, which means no filtering). Note that chimeric alignments will be output regardless of this filter
--min-identity=FLOAT: Do not print alignments with identity less than this value (default=0.0, which means no filtering). Note that chimeric alignments will be output regardless of this filter

Help options

--check: Check compiler assumptions
--version: Show version
--help: Show this help message

Other tools of the GMAP suite are located in /usr/lib/gmap
