One of the central goals of this project is to integrate natural language
parsing, which has been largely developed with respect to written texts, with
speech recognition. We have demonstrated that parsing
technology can be successfully applied to speech transcripts, and we have
shown that the kinds of syntactic structures posited by a statistical parser
can form the basis for a high-performance language model. These results
suggest that a combined speech recognition/parsing system should perform
extremely well. Substantial engineering and scientific work remains,
however, before that integration is achieved.
Currently we are investigating just what the interface between the speech
recognition and parsing components should be in a combined system. It turns
out that the basic data structures in each component (lattices in speech
recognition, charts in parsing) are in principle quite compatible;
theoretically, at least, one could imagine running a parser in parallel with an
acoustic model. This is a bold
and attractive architecture, but we suspect that at the current stage it is
impractical; the number of word hypotheses would simply overwhelm the parser.
We are thus investigating ways of pruning the hypothesis space, perhaps by
using a standard trigram language model, and of compacting the set of
hypotheses, perhaps by using sausages (confusion networks) instead of
lattices; probably some combination of the two will turn out to be viable.
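
To make the pruning idea concrete, the following is a minimal sketch, not
the project's actual interface. It assumes a sausage is represented as a
sequence of slots, each holding competing words with acoustic
log-probabilities, and that a trigram model is available as a simple lookup
table. The function name prune_sausage, the beam width, the back-off floor,
and the toy probabilities are all hypothetical, chosen only for illustration.
A beam search keeps the few best-scoring word sequences, which could then be
handed to a parser in place of the full hypothesis space.

```python
import math
from typing import Dict, List, Tuple

Slot = List[Tuple[str, float]]                 # (word, acoustic log-probability)
Trigram = Dict[Tuple[str, str, str], float]    # (w1, w2, w3) -> log P(w3 | w1, w2)


def prune_sausage(sausage: List[Slot], lm: Trigram, beam: int = 2,
                  floor: float = math.log(1e-4)) -> List[Tuple[float, List[str]]]:
    """Beam search over a sausage, scoring paths by acoustic + trigram log-probs.

    Unseen trigrams receive the constant `floor` score (a crude stand-in for
    real back-off smoothing). Returns at most `beam` (score, words) pairs,
    best first.
    """
    hyps: List[Tuple[float, List[str]]] = [(0.0, ["<s>", "<s>"])]
    for slot in sausage:
        extended = []
        for score, words in hyps:
            for word, acoustic in slot:
                lm_score = lm.get((words[-2], words[-1], word), floor)
                extended.append((score + acoustic + lm_score, words + [word]))
        # Keep only the `beam` best partial hypotheses before the next slot.
        extended.sort(key=lambda h: h[0], reverse=True)
        hyps = extended[:beam]
    # Strip the two sentence-start padding symbols.
    return [(score, words[2:]) for score, words in hyps]


# Toy example: three slots with two competing words each (8 possible paths).
sausage = [
    [("I", math.log(0.6)), ("eye", math.log(0.4))],
    [("scream", math.log(0.5)), ("see", math.log(0.5))],
    [("them", math.log(0.7)), ("him", math.log(0.3))],
]
lm = {
    ("<s>", "<s>", "I"): math.log(0.5),
    ("<s>", "I", "see"): math.log(0.4),
    ("I", "see", "them"): math.log(0.3),
    ("I", "see", "him"): math.log(0.3),
}

nbest = prune_sausage(sausage, lm, beam=2)
for score, words in nbest:
    print(" ".join(words))
# I see them
# I see him
```

With a beam of 2, only two of the eight paths through the toy sausage
survive, so the parser would see two candidate strings rather than the whole
hypothesis space. Widening the beam trades parser workload against the risk
of discarding the correct transcription.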