estimate-ngram(1) estimates an n-gram language model

SYNOPSIS

estimate-ngram [Options]

DESCRIPTION

Estimates an n-gram language model by accumulating n-gram count statistics, smoothing the observed counts, and building a backoff n-gram model. Parameters can optionally be tuned to optimize performance on a development set.
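The first step, accumulating n-gram counts, can be illustrated with a minimal Python sketch. It is not the tool's implementation: whitespace tokenization, the `<s>` and `</s>` boundary markers, and the function name `count_ngrams` are all assumptions made for illustration.

```python
from collections import Counter


def count_ngrams(sentences, order=3):
    """Count all n-grams of length 1..order over the given sentences.

    Assumptions (not taken from estimate-ngram itself): tokens are split
    on whitespace and each sentence is wrapped in <s> ... </s>.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            # Slide a window of width n across the token sequence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    counts = count_ngrams(["a b a b"], order=2)
    for ngram, c in sorted(counts.items()):
        print(" ".join(ngram), c)
```

Smoothing and backoff estimation would then operate on a table like this one. Those steps are not sketched here.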

A filename argument can be a plain ASCII file, a compressed file (ending in .Z or .gz), or '-' to indicate stdin/stdout.
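The filename convention above can be mimicked in a script that prepares input for the tool. The following Python sketch is only an illustration: the helper name `open_input` is hypothetical, and because the Python standard library has no decoder for the .Z (compress) format, the sketch simply rejects such files.

```python
import gzip
import sys


def open_input(name):
    """Open a text source following the '-', .gz, .Z naming convention.

    Hypothetical helper for illustration; it says nothing about how
    estimate-ngram itself opens files.
    """
    if name == "-":
        # '-' stands for standard input.
        return sys.stdin
    if name.endswith(".gz"):
        # Open gzip-compressed data in text mode.
        return gzip.open(name, "rt", encoding="utf-8")
    if name.endswith(".Z"):
        # No standard-library decoder for compress(1) output.
        raise NotImplementedError(".Z files need an external decompressor")
    return open(name, "r", encoding="utf-8")
```

ASCII is a subset of UTF-8, so reading as UTF-8 is a safe superset for the plain-ASCII case.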

OPTIONS

-h, -help
Print this help message.
-verbose <int>
Set verbosity level.
Default: 1
-o, -order <int>
Set the n-gram order of the estimated LM.
Default: 3
-v, -vocab <file>
Restrict the vocabulary to the words in the specified file.
-u, -unk <boolean>
Replace all out-of-vocabulary words with <unk>.
Default: false
-t, -text <files>
Add counts from text files.
-c, -counts <files>
Add counts from counts files.
-s, -smoothing <ML, FixKN, FixModKN, FixKN#, KN, ModKN, KN#>
Specify smoothing algorithms.
Default: ModKN
-wf, -weight-features <features-template>
Specify n-gram weighting features.
-p, -params <file>
Set initial model params.
-oa, -opt-alg <Powell, LBFGS, LBFGSB>
Specify optimization algorithm.
Default: Powell
-op, -opt-perp <file>
Tune params to minimize dev set perplexity.
-ow, -opt-wer <file>
Tune params to minimize lattice word error rate.
-om, -opt-margin <file>
Tune params to minimize lattice margin.
-wb, -write-binary <boolean>
Write LM/counts files in binary format.
Default: false
-wp, -write-params <file>
Write tuned model params to file.
-wv, -write-vocab <file>
Write LM vocab to file.
-wc, -write-counts <file>
Write n-gram counts to file.
-wec, -write-eff-counts <file>
Write effective n-gram counts to file.
-wlc, -write-left-counts <file>
Write left-branching n-gram counts to file.
-wrc, -write-right-counts <file>
Write right-branching n-gram counts to file.
-wl, -write-lm <file>
Write ARPA backoff LM to file.
-ep, -eval-perp <files>
Compute test set perplexity.
-ew, -eval-wer <files>
Compute test set lattice word error rate.
-em, -eval-margin <files>
Compute test set lattice margin.
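Several of the options above can be combined into a train, tune, and evaluate run. The Python sketch below only assembles such a command line and prints it. The helper name `build_command` and the file names `train.txt`, `dev.txt`, `test.txt`, and `model.arpa` are hypothetical, and the sketch assumes these options may be given together in a single invocation.

```python
import shlex


def build_command(text, write_lm, order=3, smoothing="ModKN",
                  opt_perp=None, eval_perp=None):
    """Assemble an estimate-ngram argument list from the documented options.

    Hypothetical helper: it only builds the list and does not run the tool.
    """
    argv = ["estimate-ngram",
            "-order", str(order),
            "-smoothing", smoothing,
            "-text", text]
    if opt_perp is not None:
        # Tune parameters against a development set.
        argv += ["-opt-perp", opt_perp]
    argv += ["-write-lm", write_lm]
    if eval_perp is not None:
        # Report perplexity on a held-out test set.
        argv += ["-eval-perp", eval_perp]
    return argv


if __name__ == "__main__":
    argv = build_command("train.txt", "model.arpa",
                         opt_perp="dev.txt", eval_perp="test.txt")
    # Quote each argument so the line can be pasted into a shell.
    print(" ".join(shlex.quote(a) for a in argv))
```

The resulting list could be passed to `subprocess.run` on a system where the tool is installed.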