dsimulator(1) generate synthetic reads for a random genome


dsimulator genlen:double [-cdouble(20.)] [-bdouble(.5)] [-rint] [-mint(10000)] [-sint(2000)] [-xint(4000)] [-edouble(.15)] [-Mfile]


dsimulator first generates a fake genome of size genlen*1Mb with an AT-bias of -b. It then generates sample reads of mean length -m from a log-normal length distribution with standard deviation -s, ignoring any read shorter than -x. It collects enough reads to cover the genome -c times and introduces errors into each read at rate -e, where the ratio of insertions, deletions, and substitutions is set by the defined constants INS_RATE (default 73%) and DEL_RATE (default 20%) in generate.c. The rate at which reads are picked from the forward and reverse strands is controlled by the defined constant FLIP_RATE (default 50/50).

The -r option seeds the random number generator used to generate the genome, so that one can reproducibly generate the same underlying genome to sample from. If this option is missing, the job id of the invocation seeds the random number generator.

The output is sent to the standard output (i.e. it is a UNIX pipe) in PacBio .fasta format, suitable as input to fasta2DB(1).

Finally, the -M option requests that the coordinates from which each read was sampled be written to the indicated file, one line per read, ASCII encoded. This "map" file essentially tells one where every read belongs in an assembly and is very useful for debugging and testing. If the coordinate pair for a read is b,e, then if b < e the read was sampled from [b,e] in the forward direction, and if b > e it was sampled from [e,b] in the reverse direction.
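The b,e convention of the map file can be illustrated with a minimal Python sketch. It is not part of dsimulator: the function name sampled_interval is hypothetical, and since the exact layout of a map-file line is not specified here, the sketch takes the two coordinates as already-parsed integers. The case b == e is not described above, so the sketch rejects it rather than guess.

```python
def sampled_interval(b, e):
    """Interpret a map-file coordinate pair b,e (hypothetical helper).

    Returns (start, end, strand) with start < end, where strand is
    "forward" if b < e and "reverse" if b > e.
    """
    if b < e:
        # Read was sampled from [b, e] in the forward direction.
        return (b, e, "forward")
    if b > e:
        # Read was sampled from [e, b] in the reverse direction.
        return (e, b, "reverse")
    # Assumption: b == e is not covered by the description, so refuse it.
    raise ValueError("b == e is not covered by the documented convention")
```

For example, sampled_interval(9000, 2500) reports the interval [2500, 9000] on the reverse strand, which is the form one would compare against an assembler's placement of that read.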