DESCRIPTION
ART is a set of simulation tools that generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing datasets. art_454 simulates 454 pyrosequencing reads.
USAGE
SINGLE-END SIMULATION
art_454 [-s] [-a] [-t] [-r rand_seed] [-p read_profile] [-c num_flow_cycles] <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <FOLD_COVERAGE>
PAIRED-END SIMULATION
art_454 [-s] [-a] [-t] [-r rand_seed] [-p read_profile] [-c num_flow_cycles] <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <FOLD_COVERAGE> <MEAN_FRAG_LEN> <STD_DEV>
AMPLICON SEQUENCING SIMULATION
art_454 [-s] [-a] [-t] [-r rand_seed] [-p read_profile] [-c num_flow_cycles] <-A|-B> <INPUT_SEQ_FILE> <OUTPUT_FILE_PREFIX> <#_READS/#_READ_PAIRS_PER_AMPLICON>
OPTIONS
MANDATORY OPTIONS
- INPUT_SEQ_FILE - the filename of DNA/RNA reference sequences in FASTA format
- OUTPUT_FILE_PREFIX - the prefix (optionally including a directory) for the output read data file (*.fq) and read alignment file (*.aln)
- FOLD_COVERAGE - the fold coverage of reads over the reference sequences
- MEAN_FRAG_LEN - the average DNA fragment size for paired-end read simulation
- STD_DEV - the standard deviation of the DNA fragment size for paired-end read simulation
- #READS_PER_AMPLICON - number of reads per amplicon (for 5'-end amplicon sequencing)
- #READ_PAIRS_PER_AMPLICON - number of read pairs per amplicon (for two-end amplicon sequencing)
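As a rough illustration of FOLD_COVERAGE, the sketch below estimates how many reads a given coverage implies. It assumes the common definition coverage = (number of reads x mean read length) / total reference length; the function name and the example read length are hypothetical and not part of ART, and art_454's internal computation may differ, especially since 454 read lengths vary from read to read.

```python
def expected_read_count(reference_length: int, fold_coverage: float,
                        mean_read_length: float) -> int:
    """Estimate the number of reads needed for a given fold coverage.

    Hypothetical helper (not part of ART). Assumes
    coverage = reads * mean_read_length / reference_length.
    For paired-end runs, each pair counts as two reads.
    """
    if reference_length <= 0 or mean_read_length <= 0:
        raise ValueError("lengths must be positive")
    if fold_coverage < 0:
        raise ValueError("coverage must be non-negative")
    return round(fold_coverage * reference_length / mean_read_length)


# Example: a 1 Mb reference at 20X with an assumed 400 bp mean read length.
print(expected_read_count(1_000_000, 20, 400))
```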
OPTIONAL PARAMETERS
- -A perform single-end amplicon sequencing simulation
- -B perform paired-end amplicon sequencing simulation
- -M use CIGAR 'M' instead of '=/X' for alignment match/mismatch
- -a output the ALN alignment file
- -s output the SAM alignment file
- -d print warning messages for debugging
- -t simulate reads from the built-in GS FLX Titanium profile [default: GS FLX profile]
- -r specify a fixed random seed for the simulation (so that two separate runs generate identical datasets)
- -c specify the number of flow cycles used by the sequencer [default: 100 for GS FLX, 200 for GS FLX Titanium]
- -p specify your own read profile for the simulation
NOTE: the name of a read profile is the directory containing the read profile data files. Please read the README file for the format of the 454 read profile data files and their default filenames.
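The three USAGE forms differ only in their positional arguments, which makes it easy to assemble a wrong command. The sketch below builds an art_454 argument list for each mode and checks that the required positionals are present. It is a hypothetical helper, not part of ART; it only uses the options listed above and emits them in the order of the synopsis, assuming art_454 accepts options before the positional arguments as the examples show.

```python
from typing import List, Optional


def build_art454_args(input_seq: str, output_prefix: str, *,
                      fold_coverage: Optional[float] = None,
                      mean_frag_len: Optional[int] = None,
                      std_dev: Optional[int] = None,
                      amplicon: Optional[str] = None,   # None, "A" or "B"
                      reads_per_amplicon: Optional[int] = None,
                      sam: bool = False, aln: bool = False,
                      titanium: bool = False, seed: Optional[int] = None,
                      profile: Optional[str] = None,
                      flow_cycles: Optional[int] = None) -> List[str]:
    """Assemble an art_454 command line (hypothetical helper, not part of ART)."""
    args = ["art_454"]
    # Optional flags, in the order shown in the USAGE synopsis.
    if sam:
        args.append("-s")
    if aln:
        args.append("-a")
    if titanium:
        args.append("-t")
    if seed is not None:
        args += ["-r", str(seed)]
    if profile is not None:
        args += ["-p", profile]
    if flow_cycles is not None:
        args += ["-c", str(flow_cycles)]

    # Amplicon mode: -A or -B, then reads (or read pairs) per amplicon.
    if amplicon is not None:
        if amplicon not in ("A", "B"):
            raise ValueError("amplicon must be 'A' or 'B'")
        if reads_per_amplicon is None:
            raise ValueError("amplicon mode needs reads_per_amplicon")
        return args + [f"-{amplicon}", input_seq, output_prefix,
                       str(reads_per_amplicon)]

    # Single-end or paired-end mode: fold coverage is mandatory.
    if fold_coverage is None:
        raise ValueError("fold_coverage is required")
    args += [input_seq, output_prefix, f"{fold_coverage:g}"]

    # Paired-end mode needs both fragment-size parameters, or neither.
    if (mean_frag_len is None) != (std_dev is None):
        raise ValueError("paired-end needs both mean_frag_len and std_dev")
    if mean_frag_len is not None:
        args += [str(mean_frag_len), str(std_dev)]
    return args
```

The resulting list can be passed to subprocess.run(args, check=True), assuming art_454 is on the PATH.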
EXAMPLES
- 1) single-end simulation with 20X coverage
- art_454 -s seq_reference.fa ./outdir/single_dat 20
- 2) paired-end simulation with mean fragment size 1500 and standard deviation 20, using the GS FLX Titanium profile
- art_454 -s -t seq_reference.fa ./outdir/paired_dat 10 1500 20
- 3) paired-end simulation with a fixed random seed
- art_454 -s -r 777 seq_reference.fa ./outdir/paired_fxSeed 10 2500 50
- 4) single-end amplicon sequencing with 10 reads per amplicon
- art_454 -A -s amplicon_ref.fa ./outdir/amp_single 10
- 5) paired-end amplicon sequencing with 10 read pairs per amplicon
- art_454 -B -s amplicon_ref.fa ./outdir/amp_paired 10
AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.