This audio demonstration is intended to help you appreciate how elusive
linguistic units are in real speech. Several types of speech data have
been segmented and transcribed so that you can listen to a hierarchy
of linguistic units (e.g., phones, syllables) and compare across
different articulation styles. The examples included here are drawn from
several popular recognition research tasks. What you should learn from
this demo is that a phone exists over a duration too brief to be
perceived in isolation; acoustic context plays a key role in our
ability to transcribe words.
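To make the hierarchy concrete, here is a minimal Python sketch of one way a
time-aligned transcription might be represented, with a word broken into
syllables and each syllable into phones. The Segment class, the labels, and
all of the times are invented for illustration; none of it is taken from the
demo's data.

    from dataclasses import dataclass, field

    @dataclass
    class Segment:
        """One time-aligned linguistic unit (word, syllable, or phone)."""
        label: str
        start: float                                  # start time, seconds
        end: float                                    # end time, seconds
        children: list = field(default_factory=list)  # finer-grained units

    # Hypothetical alignment of the word "speech": one syllable, four phones.
    word = Segment("speech", 0.00, 0.45, [
        Segment("speech", 0.00, 0.45, [
            Segment("s",  0.00, 0.10),
            Segment("p",  0.10, 0.17),
            Segment("iy", 0.17, 0.33),
            Segment("ch", 0.33, 0.45),
        ]),
    ])

    def walk(seg, depth=0):
        """Print each unit in the hierarchy along with its duration."""
        print("  " * depth + f"{seg.label}: {seg.end - seg.start:.2f} s")
        for child in seg.children:
            walk(child, depth + 1)

    walk(word)

Note that each phone-level segment here spans well under 200 ms of audio;
listening to such a fragment in isolation is exactly the experience this
demo is built around.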
Clearly Articulated
Isolated word recognition is an easy task because words are separated
by lengthy stretches of silence and are often much more clearly
articulated than in continuous speech. Error rates on such tasks
are often one to two orders of magnitude lower than on conversational
speech.
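Because the words are bounded by long silences, even a simple short-time
energy threshold can locate them. The sketch below is a generic illustration
of that idea, not the method of any particular system; the frame sizes and
the -35 dB threshold are arbitrary choices.

    import numpy as np

    def endpoint(signal, rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
        """Locate a word by thresholding short-time energy.

        Assumes the word is surrounded by long stretches of silence, so
        frames above the threshold (relative to the loudest frame) mark
        speech. All parameter values here are illustrative.
        """
        frame = int(rate * frame_ms / 1000)
        hop = int(rate * hop_ms / 1000)
        n_frames = max(1, (len(signal) - frame) // hop + 1)
        energy = np.array([
            np.sum(signal[i * hop:i * hop + frame] ** 2)
            for i in range(n_frames)
        ])
        energy_db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
        voiced = np.nonzero(energy_db > threshold_db)[0]
        if voiced.size == 0:
            return None                      # nothing above the threshold
        return voiced[0] * hop, voiced[-1] * hop + frame

    # A synthetic quarter-second "word" between two silences.
    rate = 16000
    sig = np.zeros(rate)
    sig[6000:10000] = 0.5 * np.random.randn(4000)
    print(endpoint(sig, rate))               # roughly (6000, 10000)

In continuous or conversational speech no such gaps exist, which is one
reason the error rates diverge so sharply.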
Read Speech
In the early days of large vocabulary continuous speech recognition
(LVCSR) research, read speech databases were used so that large
amounts of training data could be generated in a straightforward
manner. Read speech is typically well-articulated so that listeners
can easily understand the content.
Command and Control
Command and control applications, such as voice interfaces to
modern window-based computing systems, feature speech that is
intended to maximize the information transfer in a minimum amount
of audio data.
Conversational
Take everything said above and turn it on its head, and you have
conversational speech, which contains many interesting linguistic
phenomena. In such speech it is often hard to find individual words;
phrases are much easier to identify.