N-GRAM DECODING
- Two special data structures: Ngram and Ngram_node.
- The basic data members of the Ngram class are
| Hash_table** ngram_table_d; |
| int_4 ngram_order_d;        |
- The basic data members of the Ngram_node class are
| int_4 history_length_d;     |
| Word** history_d;           |
| Word* current_word_d;       |
| float_4 gramscore_d;        |
| float_4 back_off_d;         |
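Putting the members above together, the two classes might be declared roughly as follows. The typedefs and the Word and Hash_table stand-ins are minimal assumptions, not the toolkit's real headers:

```cpp
#include <string>
#include <unordered_map>

// Placeholder typedefs and classes standing in for the toolkit's own
// definitions (assumptions, not the real headers):
typedef int int_4;
typedef float float_4;
struct Word { std::string text_d; };
struct Ngram_node;
typedef std::unordered_map<std::string, Ngram_node*> Hash_table;

// One node per N-gram: the word history, the predicted word, the LM
// score for this gram, and its back-off weight.
struct Ngram_node {
    int_4 history_length_d;   // number of words in the history
    Word** history_d;         // the (N-1)-word history
    Word* current_word_d;     // the word this gram predicts
    float_4 gramscore_d;      // LM (log) score of the gram
    float_4 back_off_d;       // back-off weight to shorter histories
};

// The model itself: one hash table per gram order.
struct Ngram {
    Hash_table** ngram_table_d;  // e.g. ngram_table_d[k] holds the (k+1)-grams
    int_4 ngram_order_d;         // the model order N
};
```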
- A hash table is used to store the N-Gram nodes, so that a node can be
accessed quickly and the LM score computed efficiently.
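As a sketch of the lookup, the word history plus the current word can be flattened into a single string key, giving O(1) expected-time access to a node. The toolkit's actual keying scheme is not given here, so make_key and the table layout below are assumptions:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal stand-in for an N-Gram node: LM score plus back-off weight
// (the real class carries more members; see the list above).
struct Node { float gramscore; float back_off; };

// Flatten the history and the current word into one string key so the
// node is found with a single hash lookup (the key format is an
// assumption).
std::string make_key(const std::vector<std::string>& history,
                     const std::string& word) {
    std::string key;
    for (const std::string& h : history) { key += h; key += ' '; }
    key += word;
    return key;
}

// One table per gram order, mirroring ngram_table_d; shown here for
// bigrams only.
std::unordered_map<std::string, Node> bigram_table;
```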
LEXICAL TREE
- In N-Gram decoding, every word (except the sentence-start word) can be
followed by any other word, so for large-vocabulary speech recognition
the lexical tree would be very large.
- A small start lexical tree is built specially for the sentence-start
word.
- A big N-Gram lexical tree is built only once. For all words other than
the sentence-start word, this lexical tree is reused, and the LM scores
are computed on the fly during decoding.
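Computing an LM score on the fly typically follows the standard back-off recursion over a node's gram score and back-off weight (the Ngram_node class above carries both). The toy table, its scores, and the unseen-word floor below are illustrative assumptions, not real model values:

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Toy back-off trigram table keyed by the full word sequence.
// Each entry holds {log prob, back-off weight} (illustrative values).
struct Entry { double logprob; double backoff; };
std::map<std::vector<std::string>, Entry> table = {
    {{"the"},               {-1.0, -0.5}},
    {{"cat"},               {-2.0, -0.4}},
    {{"sat"},               {-2.5,  0.0}},
    {{"the", "cat"},        {-0.7, -0.3}},
    {{"the", "cat", "sat"}, {-0.2,  0.0}},
};

// Standard back-off recursion: if the full gram exists, return its
// score; otherwise add the history's back-off weight and retry with a
// history shortened by its oldest word.
double lm_score(std::vector<std::string> history, const std::string& word) {
    std::vector<std::string> gram = history;
    gram.push_back(word);
    auto it = table.find(gram);
    if (it != table.end()) return it->second.logprob;
    if (history.empty()) return -99.0;  // unseen-unigram floor (assumption)
    double bow = 0.0;
    auto h = table.find(history);
    if (h != table.end()) bow = h->second.backoff;
    history.erase(history.begin());  // drop the oldest history word
    return bow + lm_score(history, word);
}
```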