sequitur-g2p(1) - grapheme-to-phoneme conversion tool

SYNOPSIS

sequitur-g2p [OPTION]... FILE...

DESCRIPTION

Grapheme-to-Phoneme Conversion

Samples can be given either in plain format (one word per line, followed by its phonetic transcription) or in Bliss XML Lexicon format.
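The plain format can be sketched as follows (the words, phoneme symbols, and the file name train.lex are hypothetical; each line holds one word followed by its whitespace-separated transcription):

```shell
# Create a minimal plain-format training sample: one word per line,
# followed by its phonetic transcription.
printf '%s\n' \
  'hello  h e l o U' \
  'world  w 3r l d' > train.lex
cat train.lex
```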

OPTIONS

--version
show program's version number and exit
-h, --help
show this help message and exit
-p FILE, --profile=FILE
Profile execution time and store result in FILE
-R, --resource-usage
report resource usage and execution time
-Y, --psyco
Use Psyco to speed up execution
--tempdir=PATH
store temporary files in PATH
-t FILE, --train=FILE
read training sample from FILE
-d FILE|N%, --devel=FILE|N%
read held-out training sample from FILE or use N% of the training data
-x FILE, --test=FILE
read test sample from FILE
--checkpoint
save the state of training at regular time intervals. The name of the checkpoint file is derived from --write-model.
--resume-from-checkpoint=FILE
load checkpoint FILE and continue training
-T, --transpose
Transpose model, i.e. do phoneme-to-grapheme conversion
-m FILE, --model=FILE
read model from FILE
-n FILE, --write-model=FILE
write model to FILE
--continuous-test
report error rates on development and test set in each iteration
-S, --self-test
apply model to development set and report error rates
-s l1,l2,r1,r2, --size-constraints=l1,l2,r1,r2
multigrams must have l1 ... l2 left-symbols and r1 ... r2 right-symbols
-E, --no-emergence
do not allow new joint-multigrams to be added to the model
--viterbi
estimate model using maximum approximation rather than true EM
-r, --ramp-up
ramp up the model
-W, --wipe-out
wipe out probabilities, retain only model structure
-C, --initialize-with-counts
initialize probability estimation by counting how often each graphone occurs in the training set, disregarding possible overlaps
-i MINITERATIONS, --min-iterations=MINITERATIONS
minimum number of EM iterations during training
-I MAXITERATIONS, --max-iterations=MAXITERATIONS
maximum number of EM iterations during training
--eager-discount-adjustment
re-adjust discounts in each iteration
--fixed-discount=D
set discount to D and keep it fixed
-e ENC, --encoding=ENC
use character set encoding ENC
-P, --phoneme-to-phoneme
train/apply a phoneme-to-phoneme converter
--test-segmental
evaluate only at segmental level, i.e. do not count syllable boundaries and stress marks
-B FILE, --result=FILE
store test result in table FILE (for use with bootlog or R)
-a FILE, --apply=FILE
apply grapheme-to-phoneme conversion to words read from FILE
-V Q, --variants-mass=Q
generate pronunciation variants until \sum_i p(var_i) >= Q (only effective with --apply)
--variants-number=N
generate up to N pronunciation variants (only effective with --apply)
-f FILE, --fake=FILE
use a translation memory (read from sample FILE) instead of a genuine model (use in combination with -x to evaluate two files against each other)
--stack-limit=N
limit size of search stack to N elements
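A typical workflow combines the options above: train an initial model, then grow it with --ramp-up, and finally apply it to a word list. The following sketch assumes hypothetical file names (train.lex, words.txt) and that sequitur-g2p is on PATH; it only prints a notice when the tool is not installed.

```shell
# Illustrative training/apply workflow (file names are hypothetical).
if command -v sequitur-g2p >/dev/null 2>&1; then
  # Train an initial model, holding out 5% of the training data.
  sequitur-g2p --train train.lex --devel 5% --write-model model-1
  # Ramp up the model to capture longer graphone context.
  sequitur-g2p --model model-1 --ramp-up --train train.lex \
               --devel 5% --write-model model-2
  # Apply the ramped-up model to a word list.
  sequitur-g2p --model model-2 --apply words.txt
else
  echo "sequitur-g2p not found; the commands above are illustrative only"
fi
```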