gmap(1) Genomic Mapping and Alignment Program

SYNOPSIS

gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]

OPTIONS

Input options (must include -d or -g)

-D, --dir=directory
Genome directory. Default (as specified by --with-gmapdb to the configure program) is /var/cache/gmap
-d, --db=STRING
Genome database. If argument is '?' (with the quotes), this command lists available databases.
-k, --kmer=INT
k-mer size to use in genome database (allowed values: 16 or less). If not specified, the program will find the highest available k-mer size in the genome database
--sampling=INT
Sampling to use in genome database. If not specified, the program will find the smallest available sampling value in the genome database within the selected k-mer size
-G, --genomefull
Use full genome (all ASCII chars allowed; built explicitly during setup), not compressed version
-g, --gseg=filename
User-supplied genomic segment
-1, --selfalign
Align one sequence against itself in FASTA format via stdin (useful for getting protein translation of a nucleotide sequence)
-2, --pairalign
Align two sequences in FASTA format via stdin, first one being genomic and second one being cDNA
--cmdline=STRING,STRING
Align these two sequences provided on the command line, first one being genomic and second one being cDNA
-q, --part=INT/INT
Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for distributing jobs to a computer farm)
--input-buffer-size=INT
Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)
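
The -q option lends itself to splitting one input file across several jobs. The following Python sketch only builds the command lines, one per part, using the -d and -q flags described above. The database name "mygenome" and the file "reads.fa" are placeholders chosen for illustration, and how the jobs are submitted to a farm is left open.

```python
def part_commands(db, fasta, n_parts):
    """Build one gmap command per part, using -q i/n with i in 0..n-1."""
    commands = []
    for i in range(n_parts):
        # -d selects the genome database; -q restricts the run to part i of n
        commands.append(["gmap", "-d", db, "-q", f"{i}/{n_parts}", fasta])
    return commands


if __name__ == "__main__":
    # "mygenome" and "reads.fa" are hypothetical placeholders
    for cmd in part_commands("mygenome", "reads.fa", 4):
        print(" ".join(cmd))
```

Each printed line can then be handed to whatever scheduler is in use; the outputs of the parts together cover the whole input.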

Computation options

-B, --batch=INT
Batch mode (default = 2)

         Mode     Offsets       Positions       Genome
           0      see note      mmap            mmap
           1      see note      mmap & preload  mmap
 (default) 2      see note      mmap & preload  mmap & preload
           3      see note      allocate        mmap & preload
           4      see note      allocate        allocate
           5      expand        allocate        allocate

Note: For a single sequence, all data structures use mmap.
If mmap is not available and allocate is not chosen, then the program will use fileio (very slow).
Note about --batch and offsets: Expansion of offsets can be controlled independently by the --expand-offsets flag. The --batch=5 option is equivalent to --batch=4 plus --expand-offsets=1
--expand-offsets=INT
Whether to expand the genomic offsets index. Values: 0 (no, default), or 1 (yes). Expansion gives faster alignment, but requires more memory
--nosplicing
Turns off splicing (useful for aligning genomic sequences onto a genome)
--min-intronlength=INT
Min length for one internal intron (default 9). Below this size, a genomic gap will be considered a deletion rather than an intron.
--max-intronlength-middle=INT
Max length for one internal intron (default 200000)
--max-intronlength-ends=INT
Max length for first or last intron (default 10000)
--trim-end-exons=INT
Trim end exons with fewer than given number of matches (in nt, default 12)
-w, --localsplicedist=INT
Max length for known splice sites at ends of sequence (default 2000000)
-L, --totallength=INT
Max total intron length (default 2400000)
-x, --chimera-margin=INT
Amount of unaligned sequence that triggers search for the remaining sequence (default 30). Enables alignment of chimeric reads, and may help with some non-chimeric reads. To turn off, set to zero.
--no-chimeras
Turns off finding of chimeras. Same effect as --chimera-margin=0
-t, --nthreads=INT
Number of worker threads
-c, --chrsubset=string
Limit search to given chromosome
-z, --direction=STRING
cDNA direction (sense_force, antisense_force, sense_filter, antisense_filter, or auto (default))
--canonical-mode=INT
Reward for canonical and semi-canonical introns: 0=low reward, 1=high reward (default), 2=low reward for high-identity sequences and high reward otherwise
--cross-species
Use a more sensitive search for canonical splicing, which helps especially for cross-species alignments and other difficult cases
--allow-close-indels=INT
Allow an insertion and deletion close to each other (0=no, 1=yes (default), 2=only for high-quality alignments)
--microexon-spliceprob=FLOAT
Allow microexons only if one of the splice site probabilities is greater than this value (default 0.95)
--cmetdir=STRING
Directory for methylcytosine index files (created using cmetindex) (default is location of genome index files specified using -D, -V, and -d)
--atoidir=STRING
Directory for A-to-I RNA editing index files (created using atoiindex) (default is location of genome index files specified using -D, -V, and -d)
--mode=STRING
Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, atoi-nonstranded, ttoc-stranded, or ttoc-nonstranded. Non-standard modes require you to have previously run the cmetindex or atoiindex programs (which also cover the ttoc modes) on the genome
-p, --prunelevel
Pruning level: 0=no pruning (default), 1=poor seqs, 2=repetitive seqs, 3=poor and repetitive
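
The intron-length options above interact through simple thresholds. The following Python sketch illustrates how the documented defaults relate to one another; it is not GMAP's internal algorithm. The function name, the returned labels, and the choice to treat the maximum as inclusive are assumptions made for illustration.

```python
def classify_gap(gap_length, is_end_intron=False,
                 min_intronlength=9,
                 max_intronlength_middle=200000,
                 max_intronlength_ends=10000):
    """Label a genomic gap using the documented default thresholds."""
    # Below --min-intronlength, a gap counts as a deletion, not an intron
    if gap_length < min_intronlength:
        return "deletion"
    # First and last introns have a separate, smaller maximum
    limit = max_intronlength_ends if is_end_intron else max_intronlength_middle
    # Assumption: a gap exactly at the maximum is still accepted
    if gap_length <= limit:
        return "intron"
    return "exceeds maximum"


if __name__ == "__main__":
    for length, at_end in [(8, False), (9, False), (50000, False), (50000, True)]:
        print(length, at_end, classify_gap(length, is_end_intron=at_end))
```

Under these assumptions, a 50000-nt gap is acceptable as an internal intron but not as a first or last intron, unless --max-intronlength-ends is raised.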

Output types

-S, --summary
Show summary of alignments only
-A, --align
Show alignments
-3, --continuous
Show alignment in three continuous lines
-4, --continuous-by-exon
Show alignment in three lines per exon
-Z, --compress
Print output in compressed format
-E, --exons=STRING
Print exons ("cdna" or "genomic")
-P, --protein_dna
Print protein sequence (cDNA)
-Q, --protein_gen
Print protein sequence (genomic)
-f, --format=INT
Other format for output (also note the -A and -S options and other options listed under Output types):
 psl (or 1) = PSL (BLAT) format,
 gff3_gene (or 2) = GFF3 gene format,
 gff3_match_cdna (or 3) = GFF3 cDNA_match format,
 gff3_match_est (or 4) = GFF3 EST_match format,
 splicesites (or 6) = splicesites output (for GSNAP splicing file),
 introns = introns output (for GSNAP splicing file),
 map_exons (or 7) = IIT FASTA exon map format,
 map_ranges (or 8) = IIT FASTA range map format,
 coords (or 9) = coords in table format,
 sampe = SAM format (setting paired_read bit in flag),
 samse = SAM format (without setting paired_read bit)
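
Because most -f values can be given either by name or by number, a wrapper script may want to normalize them to the name before logging or comparing runs. The Python sketch below encodes only the name/number pairs listed above. The helper name normalize_format is hypothetical, and rejecting unknown values is a choice made for this sketch rather than a description of how gmap itself behaves.

```python
# Name/number pairs as listed for the -f option
FORMAT_CODES = {
    "psl": 1, "gff3_gene": 2, "gff3_match_cdna": 3, "gff3_match_est": 4,
    "splicesites": 6, "map_exons": 7, "map_ranges": 8, "coords": 9,
}
# Formats listed by name only, without a numeric alias
NAME_ONLY = {"introns", "sampe", "samse"}


def normalize_format(value):
    """Return the format name for a name or numeric code given to -f."""
    text = str(value)
    if text in FORMAT_CODES or text in NAME_ONLY:
        return text
    for name, code in FORMAT_CODES.items():
        if text == str(code):
            return name
    raise ValueError(f"unknown -f value: {value!r}")


if __name__ == "__main__":
    print(normalize_format(2))
    print(normalize_format("samse"))
```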

Output options

-n, --npaths=INT
Maximum number of paths to show (default 5). If set to 1, GMAP will not report chimeric alignments, since those imply two paths. If you want a single alignment plus chimeric alignments, then set this to 0.
--suboptimal-score=INT
Report only paths whose score is within this value of the best path. By default, if this option is not provided, the program prints all paths found.
-O, --ordered
Print output in same order as input (relevant only if there is more than one worker thread)
-5, --md5
Print MD5 checksum for each query sequence
-o, --chimera-overlap
Overlap to show, if any, at chimera breakpoint
--failsonly
Print only failed alignments, those with no results
--nofails
Exclude printing of failed alignments
-V, --snpsdir=STRING
Directory for SNPs index files (created using snpindex) (default is location of genome index files specified using -D and -d)
-v, --use-snps=STRING
Use database containing known SNPs (in <STRING>.iit, built previously using snpindex) for tolerance to SNPs
--split-output=STRING
Basename for multiple-file output, separately for nomapping, uniq, mult (and chimera, if --chimera-margin is selected)
--failed-input=STRING
Print completely failed alignments as input FASTA or FASTQ format to the given file. If the --split-output flag is also given, this file is generated in addition to the output in the .nomapping file.
--append-output
When --split-output or --failed-input is given, this flag will append output to the existing files. Otherwise, the default is to create new files.
--output-buffer-size=INT
Buffer size, in queries, for output thread (default 1000). When the number of results to be printed exceeds this size, the worker threads are halted until the backlog is cleared
--translation-code=INT
Genetic code used for translating codons to amino acids and computing CDS. Integer value (default=1) corresponds to an available code at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
-F, --fulllength
Assume full-length protein, starting with Met
-a, --cdsstart=INT
Translate codons from given nucleotide (1-based)
-T, --truncate
Truncate alignment around full-length protein, Met to Stop. Implies the -F flag.
-Y, --tolerant
Translates cDNA with corrections for frameshifts

Options for GFF3 output

--gff3-add-separators=INT
Whether to add a ### separator after each query sequence. Values: 0 (no), 1 (yes, default)

Options for SAM output

--no-sam-headers
Do not print headers beginning with '@'
--sam-use-0M
Insert 0M in CIGAR between adjacent insertions and deletions. Required by Picard, but can cause errors in other tools
--force-xs-dir
For RNA-Seq alignments, disallows XS:A:? when the sense direction is unclear, and replaces this value arbitrarily with XS:A:+. May be useful for some programs, such as Cufflinks, that cannot handle XS:A:?. However, if you use this flag, the reported value of XS:A:+ in these cases will not be meaningful.
--md-lowercase-snp
In MD string, when known SNPs are given by the -v flag, prints difference nucleotides as lower-case when they differ from reference but match a known alternate allele
--action-if-cigar-error
Action to take if there is a disagreement between CIGAR length and sequence length. Allowed values: ignore, warning (default), abort
--read-group-id=STRING
Value to put into read-group id (RG-ID) field
--read-group-name=STRING
Value to put into read-group name (RG-SM) field
--read-group-library=STRING
Value to put into read-group library (RG-LB) field
--read-group-platform=STRING
Value to put into read-group platform (RG-PL) field
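
The four read-group options correspond to the ID, SM, LB, and PL tags of a SAM @RG header line. The Python sketch below shows how such a line is laid out in the SAM format (tab-separated TAG:VALUE fields). The function name and the sample values are hypothetical, and the exact field order that gmap emits is an assumption here.

```python
def read_group_header(rg_id, name=None, library=None, platform=None):
    """Build a SAM @RG header line from read-group values."""
    fields = ["@RG", f"ID:{rg_id}"]
    # SM, LB and PL correspond to --read-group-name, -library and -platform
    for tag, value in (("SM", name), ("LB", library), ("PL", platform)):
        if value is not None:
            fields.append(f"{tag}:{value}")
    # SAM header fields are separated by tabs
    return "\t".join(fields)


if __name__ == "__main__":
    # "rg1", "sampleA" and "lib1" are placeholder values
    print(read_group_header("rg1", "sampleA", "lib1", "ILLUMINA"))
```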

Options for quality scores

--quality-protocol=STRING
Protocol for input quality scores. Allowed values:
 illumina (ASCII 64-126) (equivalent to -J 64 -j -31),
 sanger (ASCII 33-126) (equivalent to -J 33 -j 0)
Default is sanger (no quality print shift).
SAM output files should have quality scores in sanger protocol.
Or you can specify the print shift with this flag:
-j, --quality-print-shift=INT
Shift FASTQ quality scores by this amount in output (default is 0 for sanger protocol; to change Illumina input to Sanger output, select -31)
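
The shift of -31 follows from the two ASCII offsets: Illumina scores start at ASCII 64 and Sanger scores at ASCII 33, so 33 - 64 = -31. The Python sketch below applies such a shift to a quality string to show the arithmetic. The function name is hypothetical, and the range check against the Sanger interval 33-126 is an addition for this sketch, not documented gmap behavior.

```python
def shift_quality(quality, shift=-31):
    """Shift each FASTQ quality character by `shift` ASCII positions."""
    shifted = []
    for ch in quality:
        code = ord(ch) + shift
        # Assumption: reject results outside the Sanger range (ASCII 33-126)
        if not 33 <= code <= 126:
            raise ValueError(f"shifted quality {code} is outside ASCII 33-126")
        shifted.append(chr(code))
    return "".join(shifted)


if __name__ == "__main__":
    # 'h' (ASCII 104) becomes 'I' (73); '@' (ASCII 64) becomes '!' (33)
    print(shift_quality("hhhh@"))  # prints IIII!
```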

External map file options

-M, --mapdir=directory
Map directory
-m, --map=iitfile
Map file. If argument is '?' (with the quotes), this lists available map files.
-e, --mapexons
Map each exon separately
-b, --mapboth
Report hits from both strands of genome
-u, --flanking=INT
Show flanking hits (default 0)
--print-comment
Show comment line for each hit

Alignment output options

-N, --nolengths
No intron lengths in alignment
-I, --invertmode=INT
Mode for alignments to genomic (-) strand:
 0 = Don't invert the cDNA (default),
 1 = Invert cDNA and print genomic (-) strand,
 2 = Invert cDNA and print genomic (+) strand
-i, --introngap=INT
Nucleotides to show on each end of intron (default 3)
-l, --wraplength=INT
Wrap length for alignment (default 50)
Wrap length for alignment (default 50)

Filtering output options

--min-trimmed-coverage=FLOAT
Do not print alignments with trimmed coverage less than this value (default=0.0, which means no filtering). Note that chimeric alignments will be output regardless of this filter
--min-identity=FLOAT
Do not print alignments with identity less than this value (default=0.0, which means no filtering). Note that chimeric alignments will be output regardless of this filter

Help options

--check
Check compiler assumptions
--version
Show version
--help
Show this help message

Other tools of the GMAP suite are located in /usr/lib/gmap