phonetisaurus-calculateER(1) - estimate grapheme-to-phoneme error rate


phonetisaurus-calculateER --hyp "hypseq or file" --ref "refseq or file" --usep "" [OPTIONS]



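This tool evaluates the performance of grapheme-to-phoneme (G2P) tools by comparing hypothesis sequences against reference transcriptions and estimating the error rate.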
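
As a rough illustration of the kind of measurement involved, the Python sketch below computes a unit-level error rate (edit distance divided by reference length) and a sequence-level error rate (share of entries with at least one error). It assumes a standard Levenshtein-based evaluation; it is not taken from the tool's source, and the function names edit_distance and error_rates are hypothetical.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two lists of units (e.g. phonemes)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # extra unit in the hypothesis
                            curr[j - 1] + 1,      # unit missing from the hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(hyps, refs, usep=" "):
    """Return (unit error rate, sequence error rate) for paired strings.

    Assumes hyps and refs are non-empty, equally long, and that each
    reference contains at least one unit.
    """
    total_edits = 0
    total_ref_units = 0
    wrong_sequences = 0
    for hyp, ref in zip(hyps, refs):
        h_units = hyp.split(usep)
        r_units = ref.split(usep)
        dist = edit_distance(h_units, r_units)
        total_edits += dist
        total_ref_units += len(r_units)
        wrong_sequences += 1 if dist > 0 else 0
    return total_edits / total_ref_units, wrong_sequences / len(refs)
```

Calling error_rates(["T EH S T", "K AE T"], ["T EH S T", "K AA T"]) counts one substitution over seven reference units and one incorrect sequence out of two.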


-h, --help
Show this help message and exit.
--hyp HYP, -w HYP
The file/string containing G2P/ASR hypotheses.
--ref REF, -r REF
The file/string containing G2P/ASR reference transcriptions.
--usep USEP, -u USEP
Character or regex separating units in a sequence. Defaults to ' '.
--fsep FSEP, -s FSEP
Character or regex separating fields in a sequence. Defaults to '\t'.
--format FORMAT, -f FORMAT
Input format. One of 'cmu', 'htk', 'g2p'. Defaults to 'g2p'.
--ignore IGNORE, -i IGNORE
Ignore the specified characters when they occur in a HYPOTHESIS. Given as a space-separated list.
--regex_ignore REGEX_IGNORE, -n REGEX_IGNORE
Ignore characters matching the given regular expression when they occur in a HYPOTHESIS.
--ignore_both, -b
Apply --ignore and --regex_ignore to both the HYPOTHESIS and REFERENCE files. Useful for analysis.
--testfile TESTFILE, -t TESTFILE
The test file in dictionary format: one word and one pronunciation per line, separated by '\t'.
--prefix PREFIX, -p PREFIX
Prefix used to generate the wordlist, hypothesis and reference files. Defaults to 'test'.
Path to the phoneticizer model.
--mbrdecode, -e
Use the LMBR decoder.
--alpha ALPHA, -a ALPHA
Alpha for the LMBR decoder.
--order ORDER, -o ORDER
N-gram order for the LMBR decoder.
Avg. N-gram precision factor for LMBR decoder. (.85)
--ratio RATIO, -y RATIO
N-gram ratio factor for LMBR decoder. (.72)
--beam BEAM, -z BEAM
LMBR/N-best search beam. Larger values are slower but give better results. (1500)
--verbose, -v
Verbose mode.
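
The separator and ignore options above can be pictured with a short Python sketch. It shows one plausible way to split a dictionary-format line using the --fsep and --usep values as regular expressions, and to filter units according to --ignore and --regex_ignore. This is an assumption for illustration only, not the tool's implementation: parse_entry and drop_ignored are hypothetical names, and matching the regular expression against whole units is a guess about the intended behaviour.

```python
import re


def parse_entry(line, fsep="\t", usep=" "):
    """Split one 'word<fsep>pronunciation' line into (word, list of units).

    Both separators are treated as regular expressions. Raises ValueError
    if the field separator does not occur in the line.
    """
    word, pron = re.split(fsep, line.rstrip("\n"), maxsplit=1)
    units = [u for u in re.split(usep, pron) if u]  # drop empty pieces
    return word, units


def drop_ignored(units, ignore="", regex_ignore=None):
    """Remove units named in a space-separated list or matched by a regex."""
    ignored = set(ignore.split())
    kept = [u for u in units if u not in ignored]
    if regex_ignore:
        pattern = re.compile(regex_ignore)
        # Assumption: a unit is dropped only when the whole unit matches.
        kept = [u for u in kept if not pattern.fullmatch(u)]
    return kept
```

For example, parse_entry("test\tT EH S T") separates the word from its four units, and drop_ignored can then be applied to the hypothesis units alone, or to both hypothesis and reference units to mirror --ignore_both.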