sequitur-g2p(1) - grapheme-to-phoneme conversion tool

SYNOPSIS

sequitur-g2p [OPTION]... FILE...

DESCRIPTION

Grapheme-to-Phoneme Conversion

Samples can be given either in plain format (one word per line, followed by its phonetic transcription) or in Bliss XML Lexicon format.
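The plain format can be sketched as follows (the words, phoneme symbols, and the file name train.lex are hypothetical; each line holds one word followed by its whitespace-separated transcription):

```shell
# Create a minimal plain-format training sample: one word per line,
# followed by its phonetic transcription.
printf '%s\n' \
  'hello  h e l o U' \
  'world  w 3r l d' > train.lex
cat train.lex
```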

OPTIONS

--version
show program's version number and exit
-h, --help
show this help message and exit
-p FILE, --profile=FILE
Profile execution time and store result in FILE
-R, --resource-usage
report resource usage and execution time
-Y, --psyco
Use Psyco to speed up execution
--tempdir=PATH
store temporary files in PATH
-t FILE, --train=FILE
read training sample from FILE
-d FILE|N%, --devel=FILE|N%
read held-out training sample from FILE or use N% of the training data
-x FILE, --test=FILE
read test sample from FILE
--checkpoint
save the state of training at regular time intervals. The name of the checkpoint file is derived from --write-model.
--resume-from-checkpoint=FILE
load checkpoint FILE and continue training
-T, --transpose
Transpose model, i.e. do phoneme-to-grapheme conversion
-m FILE, --model=FILE
read model from FILE
-n FILE, --write-model=FILE
write model to FILE
--continuous-test
report error rates on development and test set in each iteration
-S, --self-test
apply model to development set and report error rates
-s l1,l2,r1,r2, --size-constraints=l1,l2,r1,r2
multigrams must have l1 ... l2 left-symbols and r1 ... r2 right-symbols
-E, --no-emergence
do not allow new joint-multigrams to be added to the model
--viterbi
estimate model using maximum approximation rather than true EM
-r, --ramp-up
ramp up the model
-W, --wipe-out
wipe out probabilities, retain only model structure
-C, --initialize-with-counts
initialize probability estimation by counting how often each graphone occurs in the training set, disregarding possible overlaps
-i MINITERATIONS, --min-iterations=MINITERATIONS
minimum number of EM iterations during training
-I MAXITERATIONS, --max-iterations=MAXITERATIONS
maximum number of EM iterations during training
--eager-discount-adjustment
re-adjust discounts in each iteration
--fixed-discount=D
set discount to D and keep it fixed
-e ENC, --encoding=ENC
use character set encoding ENC
-P, --phoneme-to-phoneme
train/apply a phoneme-to-phoneme converter
--test-segmental
evaluate only at segmental level, i.e. do not count syllable boundaries and stress marks
-B FILE, --result=FILE
store test result in table FILE (for use with bootlog or R)
-a FILE, --apply=FILE
apply grapheme-to-phoneme conversion to words read from FILE
-V Q, --variants-mass=Q
generate pronunciation variants until \sum_i p(var_i) >= Q (only effective with --apply)
--variants-number=N
generate up to N pronunciation variants (only effective with --apply)
-f FILE, --fake=FILE
use a translation memory (read from sample FILE) instead of a genuine model (use in combination with -x to evaluate two files against each other)
--stack-limit=N
limit size of search stack to N elements
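A typical workflow combines the options above: train an initial model, then grow it with --ramp-up, and finally apply it to a word list. The following sketch assumes hypothetical file names (train.lex, words.txt) and that sequitur-g2p is on PATH; it only prints a notice when the tool is not installed.

```shell
# Illustrative training/apply workflow (file names are hypothetical).
if command -v sequitur-g2p >/dev/null 2>&1; then
  # Train an initial model, holding out 5% of the training data.
  sequitur-g2p --train train.lex --devel 5% --write-model model-1
  # Ramp up the model to capture longer graphone context.
  sequitur-g2p --model model-1 --ramp-up --train train.lex \
               --devel 5% --write-model model-2
  # Apply the ramped-up model to a word list.
  sequitur-g2p --model model-2 --apply words.txt
else
  echo "sequitur-g2p not found; the commands above are illustrative only"
fi
```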