6.1.3 Overview: Grammar Definition

The development of language models for speech recognition can be traced directly to Chomsky's formal language theory. This theory specifies a hierarchy of grammars (loosely, the rules of a language) and of automata (language models) that can recognize the sentences of that language. To explain the hierarchy, we must first formally define a grammar, G, as:

G = (V, T, P, S) where:

V is the set of all non-terminal symbols.
T is the set of all terminal symbols; V and T have no symbols in common.
P is a set of production, or rewrite, rules.
S is a special non-terminal symbol in V called the start symbol.

As an example, each word in the sentence "Julie loves speech" is a terminal symbol contained in T. A set of production rules, P, for a grammar that can generate this sentence is shown below:

S -> NP VP
VP -> VERB NP
NP -> NOUN
NP -> NAME
NOUN -> speech
NAME -> Julie
NAME -> Ethan
VERB -> loves
VERB -> chases
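
As a rough illustration of how these rules generate the example sentence, the sketch below searches for a leftmost derivation: it starts from S and repeatedly rewrites the leftmost non-terminal until only words remain. The dictionary layout and the function name leftmost_derivation are illustrative choices, not part of the tutorial. The verb-phrase rule is written with VERB, the non-terminal that the word rules define, and the search assumes a grammar with no left-recursive rules, which holds here.

```python
# Production rules P, keyed by non-terminal. Each value lists the alternative
# right-hand sides. Assumption: the verb-phrase rule uses the VERB non-terminal.
P = {
    "S": [["NP", "VP"]],
    "VP": [["VERB", "NP"]],
    "NP": [["NOUN"], ["NAME"]],
    "NOUN": [["speech"]],
    "NAME": [["Julie"], ["Ethan"]],
    "VERB": [["loves"], ["chases"]],
}


def leftmost_derivation(target, start="S"):
    """Return the sentential forms of a leftmost derivation of target, or None."""

    def expand(form):
        # Scan for the leftmost non-terminal. Every terminal passed on the way
        # must agree with the target, otherwise this branch is abandoned.
        for i, sym in enumerate(form):
            if sym in P:
                break
            if i >= len(target) or sym != target[i]:
                return None
        else:
            # No non-terminal left: the form is a complete word string.
            return [form] if form == target else None

        # Try each alternative right-hand side for the leftmost non-terminal.
        for rhs in P[sym]:
            tail = expand(form[:i] + rhs + form[i + 1:])
            if tail is not None:
                return [form] + tail
        return None

    return expand([start])


steps = leftmost_derivation("Julie loves speech".split())
for form in steps:
    print(" ".join(form))
```

For "Julie loves speech" the printed forms run from S through NP VP, NAME VP, Julie VP, Julie VERB NP, Julie loves NP and Julie loves NOUN to the finished sentence. A word order the rules cannot produce, such as "loves Julie speech", yields None.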

The set of non-terminal symbols, V, consists of S, NP, VP, NOUN, NAME, and VERB. Finally, a language consists of all strings of terminal symbols that can be generated from the start symbol by the production rules of the grammar. Other strings in this language include "Julie chases Ethan" and "Ethan loves speech." The string "speech loves Julie" also belongs to the language, although it is unlikely to occur in normal conversation. This illustrates how simple the example grammar is, and why greater complexity is needed to represent natural spoken language. Continue to 6.1.4 for further description of Chomsky's grammar hierarchy and how it can be used to adequately model the complexity of spoken language.
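
Because the example grammar has no recursive rules, its language is finite and can be listed exhaustively. The following is a minimal sketch of that enumeration, not a general parser: the dictionary P and the helper name expand are assumptions made for illustration, and the verb-phrase rule is written with the VERB non-terminal.

```python
from itertools import product

# Production rules P, keyed by non-terminal. Any symbol that is not a key is
# treated as a terminal (a word). Assumption: VP rewrites to VERB NP.
P = {
    "S": [["NP", "VP"]],
    "VP": [["VERB", "NP"]],
    "NP": [["NOUN"], ["NAME"]],
    "NOUN": [["speech"]],
    "NAME": [["Julie"], ["Ethan"]],
    "VERB": [["loves"], ["chases"]],
}


def expand(symbol):
    """Return every word sequence (as a tuple) derivable from symbol."""
    if symbol not in P:  # terminal symbol: it derives only itself
        return [(symbol,)]
    results = []
    for rhs in P[symbol]:
        # Combine every expansion of each symbol on the right-hand side.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


language = sorted(" ".join(words) for words in expand("S"))

print(len(language))                     # 18 sentences in total
print("speech loves Julie" in language)  # True: grammatical but implausible
```

The count follows from the rules: NP can become three different words and VERB two, so there are 3 x 2 x 3 = 18 sentences. The grammar accepts all of them equally, including odd ones such as "speech loves Julie", because nothing in the rules says which nouns make sensible subjects.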
