SRILM Manual Pages

Programs

These are the top-level executables that are currently part of SRILM:

ngram-count
count N-grams and estimate language models
ngram-merge
merge N-gram counts
ngram
apply N-gram language models
ngram-class
induce word classes from N-gram statistics
disambig
disambiguate text tokens using an N-gram model
hidden-ngram
tag hidden events between words
nbest-lattice
rescore N-best lists and lattices
nbest-mix
interpolate N-best posterior probabilities
segment
segment text using an N-gram language model
segment-nbest
rescore and segment N-best lists using N-gram language models

Utility Scripts

Additional tools implemented as scripts:

training-scripts
miscellaneous conveniences for language model training
lm-scripts
manipulate N-gram language models
ppl-scripts
manipulate perplexities
pfsg-scripts
create and manipulate finite-state networks
nbest-scripts
rescore and evaluate N-best lists

File Formats

Some of the data formats used by SRILM:

ngram-format
ARPA backoff N-gram models
classes-format
Word class definitions
pfsg-format
Decipher(TM) probabilistic finite-state grammars
nbest-format
N-best hypothesis lists

LM Library Classes

These are some of the basic classes of the SRILM library. This list is incomplete, as most of this part of the documentation has yet to be written.

LM
Generic language model
Vocab
Vocabulary indexing for SRILM
Prob
Probabilities for SRILM
File
Wrapper for stdio streams

Last updated $Date: 2006/07/11 00:24:06 $ by stolcke@speech.sri.com