SRILM Manual Pages

Programs

These are the top-level executables that are currently part of SRILM:

ngram-count: count N-grams and estimate language models
ngram-merge: merge N-gram counts
ngram: apply N-gram language models
ngram-class: induce word classes from N-gram statistics
disambig: disambiguate text tokens using an N-gram model
hidden-ngram: tag hidden events between words
nbest-lattice: rescore N-best lists and lattices
nbest-mix: interpolate N-best posterior probabilities
segment: segment text using an N-gram language model
segment-nbest: rescore and segment N-best lists using N-gram language models

Utility Scripts

Additional tools implemented as scripts:

training-scripts: miscellaneous conveniences for language model training
lm-scripts: manipulate N-gram language models
ppl-scripts: manipulate perplexities
pfsg-scripts: create and manipulate finite-state networks
nbest-scripts: rescore and evaluate N-best lists

File Formats

Some of the data formats used by SRILM:

ngram-format: ARPA backoff N-gram models
classes-format: Word class definitions
pfsg-format: Decipher(TM) probabilistic finite-state grammars
nbest-format: N-best hypotheses lists
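
As a rough illustration of the ARPA backoff format listed above, a model file opens with a `\data\` section. That section gives the number of N-grams of each order as `ngram N=count` lines, and it is followed by one `\N-grams:` section per order and a final `\end\` marker. The Python sketch below is not part of SRILM. It only reads those header counts, and the function name `read_arpa_counts` is a hypothetical helper chosen for illustration.

```python
import re
from typing import Dict

# Hypothetical helper (not an SRILM API): reads the \data\ header of an ARPA LM.
def read_arpa_counts(text: str) -> Dict[int, int]:
    """Return {order: number of N-grams} from the \\data\\ section."""
    counts: Dict[int, int] = {}
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "\\data\\":
            in_data = True  # header section starts here
            continue
        if in_data:
            m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
            elif line.startswith("\\"):
                break  # reached the first \N-grams: section
    return counts

# Tiny bigram model: log10 probabilities, optional backoff weights.
sample = """
\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-0.4771 <s> -0.3010
-0.4771 </s>
-0.4771 a -0.3010

\\2-grams:
-0.1761 <s> a
-0.1761 a </s>

\\end\\
"""

print(read_arpa_counts(sample))  # {1: 3, 2: 2}
```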

LM Library Classes

These are some of the basic classes of the SRILM library. Note that this list is far from complete, because much of the library documentation has yet to be written.

LM: Generic language model
Vocab: Vocabulary indexing for SRILM
Prob: Probabilities for SRILM
File: Wrapper for stdio streams

Last updated 2006/07/11 00:24:06 by stolcke@speech.sri.com