SYNOPSIS
sequitur-g2p [OPTION]... FILE...

DESCRIPTION
Grapheme-to-Phoneme Conversion. Samples can be either in plain format (one word per line, followed by its phonetic transcription) or Bliss XML Lexicon format.
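As a sketch of the plain format described above, assuming the usual layout of one word per line followed by its space-separated phoneme symbols (the words and transcriptions here are illustrative, not from any real lexicon), such a sample file can be written like:

```python
# Illustrative plain-format training sample: one word per line,
# followed by its phonetic transcription as space-separated phonemes.
sample = [
    ("aback", ["AH", "B", "AE", "K"]),
    ("abandon", ["AH", "B", "AE", "N", "D", "AH", "N"]),
]

with open("train.sample", "w", encoding="utf-8") as f:
    for word, phonemes in sample:
        f.write(word + " " + " ".join(phonemes) + "\n")
```

A file in this shape can then be passed to --train, --devel, or --test.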
OPTIONS
- --version
- show program's version number and exit
- -h, --help
- show this help message and exit
- -p FILE, --profile=FILE
- Profile execution time and store result in FILE
- -R, --resource-usage
- Report resource usage and execution time
- -Y, --psyco
- Use Psyco to speed up execution
- --tempdir=PATH
- store temporary files in PATH
- -t FILE, --train=FILE
- read training sample from FILE
- -d FILE / N%, --devel=FILE / N%
- read held-out training sample from FILE or use N% of the training data
- -x FILE, --test=FILE
- read test sample from FILE
- --checkpoint
- save state of training at regular time intervals. The name of the checkpoint file is derived from --write-model.
- --resume-from-checkpoint=FILE
- load checkpoint FILE and continue training
- -T, --transpose
- Transpose model, i.e. do phoneme-to-grapheme conversion
- -m FILE, --model=FILE
- read model from FILE
- -n FILE, --write-model=FILE
- write model to FILE
- --continuous-test
- report error rates on development and test set in each iteration
- -S, --self-test
- apply model to development set and report error rates
- -s l1,l2,r1,r2, --size-constraints=l1,l2,r1,r2
- multigrams must have between l1 and l2 left symbols and between r1 and r2 right symbols
- -E, --no-emergence
- do not allow new joint-multigrams to be added to the model
- --viterbi
- estimate model using maximum approximation rather than true EM
- -r, --ramp-up
- ramp up the model
- -W, --wipe-out
- wipe out probabilities, retain only model structure
- -C, --initialize-with-counts
- initialize probability estimation by counting how many times each graphone occurs in the training set, disregarding possible overlaps
- -i MINITERATIONS, --min-iterations=MINITERATIONS
- minimum number of EM iterations during training
- -I MAXITERATIONS, --max-iterations=MAXITERATIONS
- maximum number of EM iterations during training
- --eager-discount-adjustment
- re-adjust discounts in each iteration
- --fixed-discount=D
- set discount to D and keep it fixed
- -e ENC, --encoding=ENC
- use character set encoding ENC
- -P, --phoneme-to-phoneme
- train/apply a phoneme-to-phoneme converter
- --test-segmental
- evaluate only at segmental level, i.e. do not count syllable boundaries and stress marks
- -B FILE, --result=FILE
- store test result in table FILE (for use with bootlog or R)
- -a FILE, --apply=FILE
- apply grapheme-to-phoneme conversion to words read from FILE
- -V Q, --variants-mass=Q
- generate pronunciation variants until \sum_i p(var_i) >= Q (only effective with --apply)
- --variants-number=N
- generate up to N pronunciation variants (only effective with --apply)
- -f FILE, --fake=FILE
- use a translation memory (read from sample FILE) instead of a genuine model (use in combination with -x to evaluate two files against each other)
- --stack-limit=N
- limit size of search stack to N elements
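When --apply (optionally with --variants-number or --variants-mass) emits results in the plain format, a caller typically wants to collect the variants per word. The sketch below assumes each output line holds a word followed by whitespace-separated phoneme symbols; that layout, the sample lines, and the helper name parse_apply_output are illustrative assumptions, not part of the tool's documented interface.

```python
def parse_apply_output(lines):
    """Group plain-format G2P output lines by word.

    Each non-empty line is assumed to hold a word followed by
    whitespace-separated phoneme symbols (an assumption; check the
    output of your installed version). Pronunciation variants for
    the same word accumulate in order of appearance.
    """
    lexicon = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip empty or malformed lines
        word, phonemes = fields[0], fields[1:]
        lexicon.setdefault(word, []).append(phonemes)
    return lexicon

# Hypothetical output, e.g. from:
#   sequitur-g2p --model model --apply words.txt --variants-number 2
output = [
    "aback AH B AE K",
    "aback AH B AA K",   # second variant for the same word
    "abandon AH B AE N D AH N",
]
lexicon = parse_apply_output(output)
```

With --variants-mass=Q, variants are generated until their cumulative probability mass reaches Q, so the same per-word grouping applies.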