pocketsphinx_batch(1) Run speech recognition in batch mode

SYNOPSIS

pocketsphinx_batch -hmm hmmdir -dict dictfile [ options ]...

DESCRIPTION

Run speech recognition over a list of utterances in batch mode. A list of arguments follows:

-adchdr
Size of audio file header in bytes (headers are ignored)
-adcin
Input is raw audio data
-agc
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh
Initial threshold for automatic gain control
-allphone
Perform phoneme decoding with phonetic lm
-allphone_ci
Perform phoneme decoding with phonetic lm and context-independent units only
-alpha
Preemphasis parameter
-argfile
Argument file giving extra arguments.
-ascale
Inverse of acoustic model scale for confidence score calculation
-aw
Inverse weight applied to acoustic scores.
-backtrace
Print results and backtraces to log file.
-beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
-bestpathlw
Language model probability weight for bestpath search
-build_outdirs
Create missing subdirectories in output directory
-cepdir
Input files directory (prefixed to filespecs in control file)
-cepext
Input files extension (suffixed to filespecs in control file)
-ceplen
Number of components in the input feature vector
-cmn
Cepstral mean normalization scheme ('current', 'prior', or 'none')
-cmninit
Initial values (comma-separated) for cepstral mean when 'prior' is used
-compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
-ctl
Control file listing utterances to be processed
-ctlcount
No. of utterances to be processed (after skipping -ctloffset entries)
-ctlincr
Do every Nth line in the control file
-ctloffset
No. of utterances at the beginning of -ctl file to be skipped
-ctm
Recognition output in CTM file format (may require post-sorting)
-debug
Verbosity level for debugging messages
-dict
Main pronunciation dictionary (lexicon) input file
-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
-dither
Add 1/2-bit noise
-doublebw
Use double bandwidth filters (same center freq)
-ds
Frame GMM computation downsampling ratio
-fdict
Noise word pronunciation dictionary input file
-feat
Feature stream type, depends on the acoustic model
-featparams
File containing feature extraction parameters.
-fillprob
Filler word transition probability
-frate
Frame rate
-fsg
Sphinx format finite state grammar file
-fsgctl
Control file listing FSG file to use for each utterance
-fsgdir
Input directory for FSG files
-fsgext
Input extension for FSG files (including leading dot)
-fsgusealtpron
Add alternate pronunciations to FSG
-fsgusefiller
Insert filler words at each state.
-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
-fwdflatbeam
Beam width applied to every frame in second-pass flat search
-fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
-fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
-fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
-fwdflatwbeam
Beam width applied to word exits in second-pass flat search
-fwdtree
Run forward lexicon-tree search (1st pass)
-hmm
Directory containing acoustic model files.
-hyp
Recognition output file name
-hypseg
Recognition output with segmentation file name
-input_endian
Endianness of input data, big or little, ignored if NIST or MS Wav
-jsgf
JSGF grammar file
-keyphrase
Keyphrase to spot
-kws
File with keyphrases to spot, one per line
-kws_delay
Delay to wait for best detection score
-kws_plp
Phone loop probability for keyword spotting
-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
-latsize
Initial backpointer table size
-lda
File containing transformation matrix to be applied to features (single-stream features only)
-ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
-lifter
Length of sin-curve for liftering, or 0 for no liftering.
-lm
Word trigram language model input file
-lmctl
Specify a set of language models

The -hmm and -dict arguments are always required. Either -lm or -fsg is required, depending on whether you are using a statistical language model or a finite-state grammar. To do batch-mode recognition, you will need to specify a control file using -ctl. This is a simple text file containing one entry per line. Each entry is the name of an input file relative to the -cepdir directory, without the filename extension (which is given in the -cepext argument).
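
As a minimal sketch of the setup described above, the following Python script writes a control file and assembles a matching command line. The model paths, file names, the ".mfc" extension, and the helper names (write_control_file, build_command) are hypothetical and chosen only for illustration; the flags themselves are the ones listed on this page. The script prints the command rather than running it.

```python
import os
import shlex
import tempfile


def write_control_file(path, entries):
    """Write one entry per line: a file name relative to -cepdir, without extension."""
    with open(path, "w") as f:
        for entry in entries:
            f.write(entry + "\n")


def build_command(hmm, dictfile, lm, ctl, cepdir, cepext, hyp):
    """Assemble an argument list using only flags documented on this page."""
    return [
        "pocketsphinx_batch",
        "-hmm", hmm,        # acoustic model directory (required)
        "-dict", dictfile,  # pronunciation dictionary (required)
        "-lm", lm,          # statistical language model (or use -fsg instead)
        "-ctl", ctl,        # control file listing the utterances
        "-cepdir", cepdir,  # prefixed to each control file entry
        "-cepext", cepext,  # suffixed to each control file entry
        "-hyp", hyp,        # recognition output file
    ]


if __name__ == "__main__":
    # All paths below are placeholders, not files shipped with the package.
    ctl_path = os.path.join(tempfile.mkdtemp(), "test.ctl")
    write_control_file(ctl_path, ["utt001", "session2/utt002"])
    cmd = build_command("model/hmm", "model/words.dict", "model/words.lm",
                        ctl_path, "features", ".mfc", "test.hyp")
    print(shlex.join(cmd))
```

Under these assumptions, the entry utt001 would be looked up as utt001.mfc inside the features directory, since -cepdir is prefixed and -cepext is suffixed to each entry.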

If you are using acoustic feature files as input (see sphinx_fe(1) for information on how to generate these), you can also specify a subpart of a file, using the following format:

FILENAME START-FRAME END-FRAME UTTERANCE-ID
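
To make the entry format concrete, here is a small Python sketch that parses control file lines of the two shapes described on this page: a bare file name, or the four-field form above. The names ControlEntry and parse_control_line are hypothetical, as is the choice to reuse the file name as the utterance ID for bare entries; the actual decoder may accept other variants.

```python
from typing import NamedTuple, Optional


class ControlEntry(NamedTuple):
    filename: str
    start_frame: Optional[int]
    end_frame: Optional[int]
    utterance_id: str


def parse_control_line(line):
    """Parse 'FILENAME' or 'FILENAME START-FRAME END-FRAME UTTERANCE-ID'."""
    fields = line.split()
    if len(fields) == 1:
        # Whole-file entry; assume the file name doubles as the utterance ID.
        return ControlEntry(fields[0], None, None, fields[0])
    if len(fields) == 4:
        name, start, end, utt_id = fields
        return ControlEntry(name, int(start), int(end), utt_id)
    raise ValueError(f"unexpected control line: {line!r}")


if __name__ == "__main__":
    # Example entries are made up for illustration.
    print(parse_control_line("utt001"))
    print(parse_control_line("meeting1 100 450 meeting1_seg3"))
```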

AUTHOR

Written by numerous people at CMU from 1994 onwards. This manual page by David Huggins-Daines <[email protected]>

COPYRIGHT

Copyright © 1994-2007 Carnegie Mellon University. See the file COPYING included with this package for more information.