Overview

This research focuses on identifying methods for increasing the flexibility of interaction with a spoken language dialog system while balancing the critical need for response efficiency in a vehicular environment. To that end, we developed a demonstration in-vehicle dialog system and use it as a testbed for exploring the fundamental research issues. The demonstration system integrates the ISIP public domain speech recognition system as a component of the DARPA Communicator architecture.

The dialog system uses a DARPA Communicator hub-compliant architecture composed of a number of servers that interact with each other through the DARPA Communicator Hub. Our system consists of the Hub and five primary servers:

Audio Server receives signals from users through microphones and sends them to the Speech Recognizer. It also plays back to users the synthesized speech it receives from the Speech Synthesizer.
Speech Recognizer takes signals from the Audio Server and produces a word lattice.
Semantic Parser takes word lattices from the Speech Recognizer, parses them, and produces the best interpretations in the form of semantic frames.
Dialog Manager retrieves a semantic frame from the Semantic Parser; requests clarification from the user if required; uses the clarification and the conversational context to resolve ambiguities; translates the semantic frame into a database query; retrieves the information; and responds to the user.
MySQL Database Application receives SQL queries from the Dialog Manager, retrieves data from a MySQL database, and accesses the web to retrieve additional information if necessary.
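
The Dialog Manager's step of translating a semantic frame into a database query can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the frame layout (an intent plus slot/value pairs), the "places" table, and its columns are hypothetical, since the actual frame format and database schema are not described in this overview. The `%s` placeholders follow the parameter style commonly used by Python MySQL drivers.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """Hypothetical frame: an intent plus slot/value pairs from the parser."""
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical schema; the real tables and columns are not given in the source.
INTENT_TABLES = {"find_place": "places"}
ALLOWED_COLUMNS = {"places": {"name", "category", "street"}}


def missing_slots(frame, required):
    """Return the required slots the user has not supplied yet.

    A non-empty result is the cue for a clarification question.
    """
    return [slot for slot in required if slot not in frame.slots]


def frame_to_sql(frame):
    """Translate a frame into a parameterized SELECT statement.

    Only whitelisted column names are placed in the SQL text; slot values
    are returned separately so the database driver can escape them.
    """
    table = INTENT_TABLES[frame.intent]
    columns = sorted(c for c in frame.slots if c in ALLOWED_COLUMNS[table])
    where = " AND ".join(f"{c} = %s" for c in columns)
    sql = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
    params = tuple(frame.slots[c] for c in columns)
    return sql, params


if __name__ == "__main__":
    frame = SemanticFrame("find_place",
                          {"category": "restaurant", "street": "Main Street"})
    print(missing_slots(frame, ["category"]))
    print(frame_to_sql(frame))
```

Keeping slot values out of the SQL string is a deliberate choice in this sketch: recognized speech is untrusted input, and a misrecognized word should never be able to alter the structure of the query.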

These five servers communicate via the Hub to accomplish the task of understanding and responding to a spoken request. The DARPA Communicator Hub interface uses scripts to direct the flow of information among the servers. The In-Vehicle Dialog System currently contains information about the Mississippi State University campus and the surrounding Starkville area, but it is designed in a modular fashion so that information for other cities, as well as additional servers, can be added easily.
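
The hub-and-spoke message flow described above can be pictured with a toy Python sketch. This is not the DARPA Communicator API: the `Hub` class, the message dictionaries, and the stub servers are hypothetical stand-ins, and the "script" is reduced to an ordered list of server names. It is meant only to show how a script-driven hub keeps servers decoupled from one another.

```python
from typing import Callable, Dict, List, Tuple

Message = dict
Handler = Callable[[Message], Message]


class Hub:
    """Toy hub: routes one message through servers in script order."""

    def __init__(self, script: List[str]):
        self.script = script                  # ordered server names
        self.servers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self.servers[name] = handler

    def dispatch(self, message: Message) -> Tuple[Message, List[str]]:
        """Pass the message along the script and record the route taken."""
        trace = []
        for name in self.script:
            message = self.servers[name](message)
            trace.append(name)
        return message, trace


# Stub servers. Each adds one key to the message; the real servers do far more.
def audio_server(msg):
    # Assumption: the utterance arrives as text instead of a microphone signal.
    return {**msg, "signal": f"<audio:{msg['utterance']}>"}


def speech_recognizer(msg):
    # Stand-in for a word lattice: a list holding a single word sequence.
    return {**msg, "lattice": [msg["utterance"].lower().split()]}


def semantic_parser(msg):
    return {**msg, "frame": {"intent": "find_place", "words": msg["lattice"][0]}}


def dialog_manager(msg):
    return {**msg, "query": f"lookup:{msg['frame']['intent']}"}


def database(msg):
    return {**msg, "result": f"rows for {msg['query']}"}


def build_hub() -> Hub:
    hub = Hub(["audio", "recognizer", "parser", "dialog_manager", "database"])
    hub.register("audio", audio_server)
    hub.register("recognizer", speech_recognizer)
    hub.register("parser", semantic_parser)
    hub.register("dialog_manager", dialog_manager)
    hub.register("database", database)
    return hub


if __name__ == "__main__":
    reply, route = build_hub().dispatch({"utterance": "Where is the library"})
    print(route)
    print(reply["result"])
```

Because the servers never call each other directly, adding a server in this sketch means registering one more handler and adding its name to the script, which mirrors the modularity claim made for the real system.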

The Semantic Parser and Dialog Manager code bases were originally written by the Center for Speech and Language Research at the University of Colorado for a travel reservation application. Through data collection and analysis, a new semantic grammar and a new Dialog Manager code base were derived to support an in-vehicle application. The current semantic grammar contains approximately 500 rules and over 2000 words.