SYNOPSIS
miraconvert [-f <fromtype>] [-t <totype> [-t <totype> ...]] [-aAbCdhimMsuZ] [-cflnNoPqrtvxXyYz {...}] {infile} {outfile} [<totype> <totype> ...]
OPTIONS
- -f <fromtype>
- load this type of project files, where fromtype is:
- caf a complete assembly or single sequences from CAF
- maf a complete assembly or single sequences from MAF
- fasta sequences from a FASTA file
- fastq sequences from a FASTQ file
- gb[f|k|ff] sequences from a GenBank file
- phd sequences from a PHD file
- fofnexp sequences in EXP files from file of filenames
- -t <totype>
- write the sequences/assembly to this type (multiple mentions of -t are allowed):
- ace sequences or complete assembly to ACE
- caf sequences or complete assembly to CAF
- maf sequences or complete assembly to MAF
- sam complete assembly to SAM
- samnbb like above, but leaving out reference (backbones) in mapping assemblies
- gb[f|k|ff] sequences or consensus to GenBank
- gff3 consensus to GFF3
- wig assembly coverage info to wiggle file
- gcwig assembly gc content info to wiggle file
- fasta sequences or consensus to FASTA file (qualities to .qual)
- fastq sequences or consensus to FASTQ file
- exp sequences or complete assembly to EXP files in directories. Complete assemblies are suited for gap4 import as directed assembly. Note: using caf2gap to import into gap4 is recommended though
- text complete assembly to text alignment (only when -f is caf, maf or gbf)
- html complete assembly to HTML (only when -f is caf, maf or gbf)
- tcs complete assembly to tcs
- hsnp surrounding of SNP tags (SROc, SAOc, SIOc) to HTML (only when -f is caf, maf or gbf)
- asnp analysis of SNP tags (only when -f is caf, maf or gbf)
- cstats contig statistics file like from MIRA (only when source contains contigs)
- crlist contig read list file like from MIRA (only when source contains contigs)
- maskedfasta reads where sequencing vector is masked out (with X) to FASTA file (qualities to .qual)
- scaf sequences or complete assembly to single sequences CAF
- -a
- Append to target files instead of rewriting
- -A
- Do not Adjust sequence case
- When reading formats which define clipping points, and saving to formats which do not have clipping information, miraconvert normally adjusts the case of read sequences: lower case for clipped parts, upper case for unclipped parts of reads. Use -A if you do not want this. See also -C.
- Applies only to files/formats which do not contain contigs.
- -b
- Blind data
- Replaces all bases in reads/contigs with a 'c'
- -C
- Perform hard clip to reads
- When reading formats which define clipping points, will save only the unclipped part into the result file.
- Applies only to files/formats which do not contain contigs.
- -d
- Delete gap only columns
- When output is contigs: delete columns that are entirely gaps (like after having deleted reads during editing in gap4 or similar)
- When output is reads: delete gaps in reads
- -F
- Filter read groups to different files
- Works only for input files with readgroups (CAF/MAF). Three (or four) files are generated: one or two for paired, one for unpaired and one for debris reads.
- Reads in paired file are interlaced by default, use -F twice to create separate files.
- -m
- Make contigs (only for -t = caf or maf)
- Encase single reads as contig singlets into the CAF/MAF file.
- -n <filename>
- when given, selects only reads or contigs given by name in that file.
- -N <filename>
- like -n, but sorts output according to order given in file.
- -i
- when -n is used, inverts the selection
- -o <quality>
- FASTQ quality Offset (only for -f = 'fastq')
- Offset of quality values in FASTQ file. Default of 33 loads Sanger/Phred style files, using 0 tries to automatically recognise.
- -P <string>
- String with MIRA parameters to be parsed
- Useful when setting parameters affecting consensus calling like -CO:mrpg etc.
- E.g.: -P "454_SETTINGS -CO:mrpg=3"
- -q <quality>
- Set default quality for bases in file types without quality values. Furthermore, do not stop if expected quality files are missing (e.g. '.fasta')
- -R <name>
- Rename contigs/singlets/reads with given name string to which a counter is appended.
- Known bug: will create duplicate names if input contains contigs/singlets as well as free reads, i.e. reads not in contigs nor singlets.
- -S <name>
- (name)Scheme for renaming reads, important for paired-ends. Only 'solexa' is currently supported.
- -T
- When converting single reads, trim/clip away stretches of N and X at ends of reads. Note: remember to use -C to also perform a hard clip (e.g. with FASTA as output).
- -v
- Print version number and exit
- -Y <integer>
- Yield. Max (clipped/padded) bases to convert.
- When used on reads: output will contain first reads of file where length of clipped bases totals at least -Y. When used on contigs: output will contain first contigs of file where length of padded contigs totals at least -Y.
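The quality offset handled by -o can be illustrated with a small shell sketch. It is not miraconvert code: the helper name phred_score is hypothetical, and the offset of 64 in the second call is an assumption taken from older Illumina-style files, which the text above does not mention. The sketch only shows how one FASTQ quality character maps to a quality value once an offset is subtracted.

```shell
#!/bin/sh
# Illustration only (hypothetical helper, not part of miraconvert):
# decode one FASTQ quality character into a quality value.
phred_score() {
    char=$1
    offset=${2:-33}   # 33 = Sanger/Phred style, the default described for -o
    # With POSIX printf, a leading quote yields the character's numeric code.
    code=$(printf '%d' "'$char")
    echo $((code - offset))
}

phred_score 'I'      # offset 33: prints 40
phred_score 'I' 64   # offset 64 (assumed older Illumina style): prints 9
phred_score '!'      # offset 33: prints 0
```

The same character yields very different quality values under the two offsets, which is why the offset matters when loading FASTQ input.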
The following switches work only when input (CAF or MAF) contains contigs. Beware: CAF and MAF can also contain just reads.
- -M
- Do not extract contigs (or their consensus), but the sequence of the reads they are composed of.
- -r [cCqf]
- Recalculate consensus and / or consensus quality values and / or SNP feature tags.
- 'c' recalc cons & cons qualities (with IUPAC)
- 'C' recalc cons & cons qualities (forcing non-IUPAC)
- 'q' recalc consensus qualities only
- 'f' recalc SNP features
- Note: only the last of cCq is relevant, f works as a switch and can be combined with cCq (e.g. "-r C -r f")
- Note: if the CAF/MAF contains multiple strains, recalculation of cons & cons qualities is forced, you can just influence whether IUPACs are used or not.
- -s
- split output into multiple files instead of creating a single file
- -u
- 'fillUp strain genomes'
- Fill holes in the genome of one strain (N or @) with sequence from a consensus of other strains
- Takes effect only with -r and -t gbf or fasta/q. In FASTA/Q, bases filled up are in lower case; in GBF, bases filled up are in upper case.
- -Q <integer>
- Defines minimum quality a consensus base of a strain must have; consensus bases below this will be 'N'. Default: 0
- Only used with -r, and -f is caf/maf and -t is (fasta or gbf)
- -V <integer>
- Defines minimum coverage a consensus base of a strain must have; bases with coverage below this will be 'N'. Default: 0
- Only used with -r, and -t is (fasta or gbf)
- -x <integer>
- Minimum contig or unclipped read length
- When loading, discard all contigs / reads with a length less than this value. Default: 0 (=switched off)
- Note: not applied to reads in contigs!
- -X <integer>
- Similar to -x but applies only to reads and then to the clipped length.
- -y <integer>
- Minimum average contig coverage. When loading, discard all contigs with an average coverage less than this value. Default: 1
- -z <integer>
- Minimum number of reads in contig. When loading, discard all contigs with a number of reads less than this value. Default: 0 (=switched off)
- -l <integer>
- when output as text or HTML: number of bases shown in one alignment line. Default: 60.
- -c <character>
- when output as text or HTML: character used to pad endgaps. Default: ' ' (blank)
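The effect described for -Q (consensus bases below a minimum quality become 'N') can be pictured with the following shell sketch. It is an illustration under assumptions, not how miraconvert is implemented: the function name mask_low_quality and its plain-text input format (a sequence string plus space-separated qualities) are hypothetical.

```shell
#!/bin/sh
# Illustration only (hypothetical helper, not part of miraconvert):
# replace every base whose quality is below a minimum with 'N'.
mask_low_quality() {
    # $1 = sequence, $2 = space-separated qualities, $3 = minimum quality
    awk -v seq="$1" -v quals="$2" -v min="$3" 'BEGIN {
        n = split(quals, q, " ")
        out = ""
        for (i = 1; i <= n; i++) {
            base = substr(seq, i, 1)
            # Force numeric comparison of quality against the threshold.
            out = out ((q[i] + 0 < min + 0) ? "N" : base)
        }
        print out
    }'
}

mask_low_quality ACGT "30 10 40 5" 20   # prints ANGN
mask_low_quality ACGT "30 10 40 5" 0    # prints ACGT (default 0 masks nothing)

# A corresponding call might look like the line below; this combination
# is an untested assumption built from the option descriptions above:
#   miraconvert -r c -Q 20 source.maf dest.fasta
```

With the default threshold of 0 no base is masked, which matches the documented default for -Q.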
EXAMPLES
- miraconvert source.maf dest.sam
- miraconvert source.caf dest.fasta wig ace
- miraconvert -x 2000 -y 10 source.caf dest.caf
- miraconvert -x 40 -C -F -F source.maf .fastq
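The examples above can be tried safely with a small dry-run wrapper, sketched below. The wrapper function run and the DRY_RUN variable are hypothetical conveniences, not part of miraconvert; the comments restate what each example does according to the option descriptions, and the note that file types are taken from the file name extensions is an inference from the examples rather than something this page states.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print each command, and execute it only
# when DRY_RUN is set to something other than 1 (default: print only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# Convert a MAF assembly to SAM (types presumably taken from the extensions).
run miraconvert source.maf dest.sam

# Convert a CAF assembly to FASTA and additionally to wiggle and ACE.
run miraconvert source.caf dest.fasta wig ace

# Keep only contigs of length >= 2000 with average coverage >= 10.
run miraconvert -x 2000 -y 10 source.caf dest.caf

# Reads of length >= 40, hard clipped, read groups filtered to separate files.
run miraconvert -x 40 -C -F -F source.maf .fastq
```

Setting DRY_RUN=0 in the environment runs the commands for real, provided miraconvert is installed and the input files exist.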
BUGS
To report bugs or ask for features, please use the ticketing system at:
AUTHOR
Bastien Chevreux <[email protected]>
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.