sequitur-g2p(1) - grapheme-to-phoneme conversion tool


sequitur-g2p [OPTION]... FILE...


Grapheme-to-Phoneme Conversion

Samples can be either in plain format (one word per line followed by phonetic transcription) or Bliss XML Lexicon format.
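The plain format can be illustrated with a small hand-written lexicon; the words and phoneme symbols below are illustrative entries, not taken from any shipped data set:

```shell
# Create a plain-format training sample: one word per line,
# followed by its phonetic transcription (symbols separated by whitespace).
cat > sample.lex <<'EOF'
aachen	a: x @ n
aal	a: l
abend	a: b @ n t
EOF
```

A file in this format can be passed directly to --train, --devel, or --test.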


--version
show program's version number and exit
-h, --help
show this help message and exit
-p FILE, --profile=FILE
Profile execution time and store result in FILE
-R, --resource-usage
Report resource usage and execution time
-Y, --psyco
Use Psyco to speed up execution
store temporary files in PATH
-t FILE, --train=FILE
read training sample from FILE
-d FILE / N%, --devel=FILE / N%
read held-out training sample from FILE or use N% of the training data
-x FILE, --test=FILE
read test sample from FILE
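Evaluating a trained model against a held-out test sample might look as follows; the model file name is illustrative:

```sh
# Report the error rate of a trained model on an unseen test lexicon.
sequitur-g2p --model model-2 --test test.lex
```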
save state of training at regular time intervals. The name of the checkpoint file is derived from --write-model.
load checkpoint FILE and continue training
-T, --transpose
Transpose model, i.e. do phoneme-to-grapheme conversion
-m FILE, --model=FILE
read model from FILE
-n FILE, --write-model=FILE
write model to FILE
report error rates on development and test set in each iteration
-S, --self-test
apply model to development set and report error rates
-s l1,l2,r1,r2, --size-constraints=l1,l2,r1,r2
multigrams must have between l1 and l2 left symbols and between r1 and r2 right symbols
-E, --no-emergence
do not allow new joint-multigrams to be added to the model
estimate model using maximum approximation rather than true EM
-r, --ramp-up
ramp up the model
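A typical training session combines these options in two stages: train an initial model, then grow it with --ramp-up. The file names and the 5% held-out split below are illustrative:

```sh
# Stage 1: train an initial model, holding out 5% of the data for validation.
sequitur-g2p --train train.lex --devel 5% --write-model model-1

# Stage 2: ramp up the existing model to allow larger multigrams;
# this step can be repeated (model-2 -> model-3, ...) for better accuracy.
sequitur-g2p --model model-1 --ramp-up --train train.lex --devel 5% --write-model model-2
```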
-W, --wipe-out
wipe out probabilities, retain only model structure
-C, --initialize-with-counts
initialize probability estimation by counting how many times each graphone occurs in the training set, disregarding possible overlaps
minimum number of EM iterations during training
maximum number of EM iterations during training
re-adjust discounts in each iteration
set discount to D and keep it fixed
-e ENC, --encoding=ENC
use character set encoding ENC
-P, --phoneme-to-phoneme
train/apply a phoneme-to-phoneme converter
evaluate only at segmental level, i.e. do not count syllable boundaries and stress marks
-B FILE, --result=FILE
store test result in table FILE (for use with bootlog or R)
-a FILE, --apply=FILE
apply grapheme-to-phoneme conversion to words read from FILE
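Applying a trained model to a word list (one word per line) might look as follows; the file names are illustrative, and --variants-mass is optional:

```sh
# Transcribe words.txt with a trained model, emitting pronunciation
# variants until their cumulative probability reaches 0.9.
sequitur-g2p --model model-2 --apply words.txt --variants-mass 0.9 > words.pron
```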
-V Q, --variants-mass=Q
generate pronunciation variants until sum_i p(var_i) >= Q (only effective with --apply)
generate up to N pronunciation variants (only effective with --apply)
-f FILE, --fake=FILE
use a translation memory (read from sample FILE) instead of a genuine model (use in combination with -x to evaluate two files against each other)
limit size of search stack to N elements