State-of-the-art speech recognition systems have achieved low error rates on
medium complexity tasks such as Wall Street Journal (WSJ) which involve clean
data. However, the performance of these systems rapidly degrades as the
background noise level increases. With the growing popularity of
low-bandwidth miniature communication devices such as cell phones, palm
computers, and smart pagers, a much greater demand is being created for robust
voice interfaces. Speech recognition systems are now required to perform at a
near-zero error rate under various noise conditions. Further, since many of
these portable devices use lossy compression to conserve bandwidth, speech
recognition system performance must also not degrade when subjected to
compression, packet loss, and other common wireless communication system
artifacts. Speech coding, for example, is known to have a negative effect on
the accuracy of speech recognition systems.
Aurora, a working group of ETSI, has been formed to address many of the issues involved in using speech recognition in mobile environments. Aurora's main task is the development of a distributed speech recognition (DSR) system standard that provides a client/server framework for human-computer interaction. In this framework, the client side performs the speech collection and signal processing (feature extraction) using software and hardware collectively termed as a front end. The processed data is transmitted to the server for recognition and subsequent processing. The exact form and function of the front end is a design factor in the overall DSR structure. Our collaboration with ETSI focuses on evaluating the performance of different front ends on the WSJ task for a variety of impairments:
|