N-Gram decoding and lattice generation are two of the new features
being added to the ISIP public domain speech recognition system.
Compared to processing an existing lattice, N-Gram decoding is far
more resource-intensive. Hence, great care must be taken in designing
the data structures and the algorithmic flow.
In N-Gram decoding, the decoder typically outputs only the most likely
hypothesis, i.e., the 1-best. In lattice generation, the decoder
instead produces an N-best list of hypotheses, which is then converted
into a lattice (word graph) format. The generated lattice can be used
in a second pass of decoding (typically referred to as lattice
rescoring), in which the search space is restricted to the paths
generated during the N-best pass through the N-Gram language model.
Lattice rescoring makes it possible to test new acoustic models or
language models at greatly reduced computational cost.
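
To make the rescoring idea concrete, the sketch below shows one
possible word-graph representation and a best-path search over it
under a new language model weight. The class names, fields, and toy
scores are hypothetical illustrations only, not the actual ISIP data
structures or API:

    // A minimal lattice (word graph) and a rescoring pass over it.
    // Illustrative sketch -- names do not reflect the ISIP system.
    #include <cstdio>
    #include <limits>
    #include <string>
    #include <vector>

    struct Arc {
      int start_node;        // source node index
      int end_node;          // destination node index
      std::string word;      // word hypothesis carried by this arc
      double acoustic_score; // log-likelihood from the acoustic model
      double lm_score;       // log-probability from the language model
    };

    struct Lattice {
      int num_nodes;         // nodes 0..num_nodes-1, topologically ordered
      std::vector<Arc> arcs; // directed arcs, sorted by start node

      // Find the best path under a new LM weight. Assumes node 0 is
      // the start node and node num_nodes-1 is the end node.
      double best_path(double lm_weight, std::vector<int>& best_arcs) const {
        const double NEG_INF = -std::numeric_limits<double>::infinity();
        std::vector<double> score(num_nodes, NEG_INF);
        std::vector<int> back(num_nodes, -1); // best incoming arc per node
        score[0] = 0.0;

        // Because nodes are topologically ordered and arcs are sorted
        // by start node, one sweep relaxes every arc in a valid order.
        for (int i = 0; i < (int)arcs.size(); i++) {
          const Arc& a = arcs[i];
          double s = score[a.start_node] + a.acoustic_score
                     + lm_weight * a.lm_score;
          if (s > score[a.end_node]) {
            score[a.end_node] = s;
            back[a.end_node] = i;
          }
        }

        // Trace back the best path from the end node to the start.
        best_arcs.clear();
        for (int n = num_nodes - 1; back[n] != -1; n = arcs[back[n]].start_node)
          best_arcs.insert(best_arcs.begin(), back[n]);
        return score[num_nodes - 1];
      }
    };

    int main() {
      // A toy lattice with two competing paths from node 0 to node 2.
      Lattice lat;
      lat.num_nodes = 3;
      lat.arcs = {
        {0, 1, "show", -10.0, -2.0}, {0, 1, "so", -12.0, -1.5},
        {1, 2, "me",    -8.0, -1.0}, {1, 2, "we",  -9.0, -2.5},
      };

      std::vector<int> path;
      double s = lat.best_path(1.0 /* lm_weight */, path);
      for (size_t i = 0; i < path.size(); i++)
        printf("%s ", lat.arcs[path[i]].word.c_str());
      printf("(score = %.1f)\n", s);
      return 0;
    }

Restricting the second pass to the arcs already in the lattice is what
makes rescoring so much cheaper than a full N-Gram decode: only a
small graph must be traversed, even when the new models are evaluated
on every arc.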
In this talk, we will present the concepts behind N-Gram language
models, N-Gram decoding, and lattice generation, and describe how
these features were efficiently implemented in the ISIP recognition
system.
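
As a brief reminder of the first of these concepts (the standard
definition, not anything specific to the ISIP implementation), an
N-Gram language model approximates the probability of a word sequence
by conditioning each word on only its N-1 predecessors:

\[
P(w_1, \ldots, w_m) \approx \prod_{i=1}^{m}
  P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})
\]

For a trigram model (N = 3), for example, each factor
P(w_i | w_{i-2}, w_{i-1}) is estimated from the counts of word triples
in a training corpus.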