pbalign(1) Mapping PacBio sequences to references


usage: pbalign [-h] [--verbose] [--version] [--profile] [--debug]
[--regionTable REGIONTABLE] [--configFile CONFIGFILE]
[--pulseFile PULSEFILE] [--algorithm {blasr,bowtie,gmap}]
[--maxHits MAXHITS] [--minAnchorSize MINANCHORSIZE]
[--useccs {useccs,useccsall,useccsdenovo}] [--noSplitSubreads]
[--concordant] [--nproc NPROC]
[--algorithmOptions ALGORITHMOPTIONS]
[--maxDivergence MAXDIVERGENCE] [--minAccuracy MINACCURACY]
[--minLength MINLENGTH]
[--scoreFunction {alignerscore,editdist,blasrscore}]
[--scoreCutoff SCORECUTOFF]
[--hitPolicy {randombest,allbest,random,all,leftmost}]
[--filterAdapterOnly] [--forQuiver] [--loadQVs] [--byread]
[--metrics METRICS] [--seed SEED] [--tmpDir TMPDIR]
inputFileName referencePath outputFileName

Maps PacBio sequences to references using one of the supported command-line alignment algorithms. Input can be a fasta, pls.h5, bas.h5 or ccs.h5 file, or a fofn (file of file names). Output is in either cmp.h5 or sam format.

positional arguments:

inputFileName
The input file can be a fasta, plx.h5, bax.h5, ccs.h5 file or a fofn.
referencePath
Either a reference fasta file or a reference repository.
outputFileName
The output cmp.h5 or sam file.
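
As a minimal sketch of how the positional order from the usage line fits together with a few of the options, the following Python helper assembles an argument list. The helper name `build_pbalign_command` and the file names are hypothetical and used only for illustration; the flags themselves are the ones listed in the usage line.

```python
import shlex


def build_pbalign_command(input_file, reference, output_file,
                          algorithm="blasr", nproc=8, for_quiver=False):
    """Assemble a pbalign argument list (hypothetical helper, not part of pbalign)."""
    cmd = ["pbalign", "--algorithm", algorithm, "--nproc", str(nproc)]
    if for_quiver:
        # --forQuiver expects bas/pls.h5 input and produces a cmp.h5 file.
        cmd.append("--forQuiver")
    # The three positional arguments come last, in the order of the usage line.
    cmd += [input_file, reference, output_file]
    return cmd


# Hypothetical file names, used only for illustration.
cmd = build_pbalign_command("movie.bas.h5", "ref.fasta", "aligned.cmp.h5",
                            for_quiver=True)
print(shlex.join(cmd))
```

On a machine where pbalign is installed, the resulting list could be handed to `subprocess.run(cmd, check=True)`.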

optional arguments:

-h, --help
show this help message and exit
--verbose, -v
Set the verbosity level
--version
show program's version number and exit
--profile
Print runtime profile at exit
--debug
Catch exceptions in debugger (requires ipdb)
--regionTable REGIONTABLE
Specify a region table for filtering reads.
--configFile CONFIGFILE
Specify a set of user-defined argument values.
--pulseFile PULSEFILE
When input reads are in fasta format and output is a cmp.h5 this option can specify pls.h5 or bas.h5 or FOFN files from which pulse metrics can be loaded for Quiver.
--algorithm {blasr,bowtie,gmap}
Select an algorithm from ('blasr', 'bowtie', 'gmap'). Default algorithm is blasr.
--maxHits MAXHITS
The maximum number of matches of each read to the reference sequence that will be evaluated. Default value is 10.
--minAnchorSize MINANCHORSIZE
The minimum anchor size defines the length of the read that must match against the reference sequence. Default value is 12.
--useccs {useccs,useccsall,useccsdenovo}
Map the ccsSequence to the genome first, then align subreads to the interval that the CCS reads mapped to.
useccs: only maps subreads that span the length of the template.
useccsall: maps all subreads.
useccsdenovo: maps ccs only.
--noSplitSubreads
Do not split reads into subreads even if subread regions are available. Default value is False.
--concordant
Map subreads of a ZMW to the same genomic location.
--nproc NPROC
Number of threads. Default value is 8.
--algorithmOptions ALGORITHMOPTIONS
Pass alignment options through.
--maxDivergence MAXDIVERGENCE
The maximum allowed percentage divergence of a read from the reference sequence. Default value is 30.
--minAccuracy MINACCURACY
The minimum percentage accuracy of alignments that will be evaluated. Default value is 70.
--minLength MINLENGTH
The minimum aligned read length of alignments that will be evaluated. Default value is 50.
--scoreFunction {alignerscore,editdist,blasrscore}
Specify a score function for evaluating alignments.
alignerscore : aligner's score in the SAM tag 'as'.
editdist : edit distance between read and reference.
blasrscore : blasr's default score function.
Default value is alignerscore.
--scoreCutoff SCORECUTOFF
The worst score to output an alignment.
--hitPolicy {randombest,allbest,random,all,leftmost}
Specify a policy for how to treat multiple hits.
random : selects a random hit.
all : selects all hits.
allbest : selects all the best score hits.
randombest: selects a random hit from all best score hits.
leftmost : selects a hit which has the best score and the smallest mapping coordinate in any reference.
Default value is randombest.
--filterAdapterOnly
If specified, do not report adapter-only hits using annotations with the reference entry.
--forQuiver
The output cmp.h5 file will be sorted, loaded with pulse QV information, and repacked, so that it can be consumed by quiver directly. This requires the input file to be in PacBio bas/pls.h5 format, and --useccs must be None. Default value is False.
--loadQVs
Similar to --forQuiver; the only difference is that --useccs can be specified. Default value is False.
--byread
Load pulse information using -byread option instead of -bymetric. Only works when --forQuiver or --loadQVs is set. Default value is False.
--metrics METRICS
Load the specified (comma-delimited list of) metrics instead of the default metrics required by quiver. This option only works when --forQuiver or --loadQVs are set. Default: DeletionQV,DeletionTag,InsertionQV,MergeQV,SubstitutionQV
--seed SEED
Initialize the random number generator with a non-zero integer. Zero means that the current system time is used. Default value is 1.
--tmpDir TMPDIR
Specify a directory for saving temporary files. Default is /scratch/.
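
The --hitPolicy choices and the --seed behaviour described above can be illustrated with a short Python sketch. This is not pbalign's implementation: the `Hit` record, `make_rng` and `select_hits` are hypothetical names, and the sketch assumes that a lower score is better (as with an edit distance); the comparison would need to be flipped for a score function where higher is better.

```python
import random
from collections import namedtuple

# Hypothetical hit record: reference name, start coordinate, alignment score.
Hit = namedtuple("Hit", ["ref", "start", "score"])


def make_rng(seed=1):
    """Mirror --seed: zero seeds from the system, any other value is used as is."""
    return random.Random(None if seed == 0 else seed)


def select_hits(hits, policy="randombest", rng=None):
    """Return the hits of one read that are kept under the given policy.

    Assumption: a lower score is better.
    """
    if not hits:
        return []
    rng = rng or make_rng()
    best_score = min(h.score for h in hits)
    best = [h for h in hits if h.score == best_score]
    if policy == "random":
        return [rng.choice(hits)]
    if policy == "all":
        return list(hits)
    if policy == "allbest":
        return best
    if policy == "randombest":
        return [rng.choice(best)]
    if policy == "leftmost":
        # Best score first, then the smallest mapping coordinate in any reference.
        return [min(best, key=lambda h: h.start)]
    raise ValueError("unknown hit policy: %s" % policy)


hits = [Hit("chr1", 500, 12), Hit("chr2", 40, 7), Hit("chr1", 900, 7)]
print(select_hits(hits, "leftmost"))
print(select_hits(hits, "allbest"))
```

With a fixed non-zero seed, the random and randombest choices are repeatable from run to run, which is the practical reason for the default seed of 1.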