This audio demonstration is intended to help you appreciate how elusive
linguistic units are in real speech. Several types of speech data have
been segmented and transcribed so that you can listen to a hierarchy
of linguistic units (e.g., phones, syllables) and compare across
different articulation styles. The examples included here are drawn from
several popular recognition research tasks. What you should learn from
this demo is that a phone exists over a duration too brief to be
perceived in isolation; acoustic context plays a key role in our
ability to transcribe words.
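To make the hierarchy concrete, here is a minimal Python sketch of one way a
time-aligned transcription might be represented, with a word broken into
syllables and each syllable into phones. The Segment class, the labels, and
all of the times are invented for illustration; none of it is taken from the
demo's data.

    from dataclasses import dataclass, field

    @dataclass
    class Segment:
        """One time-aligned linguistic unit (word, syllable, or phone)."""
        label: str
        start: float                                  # start time, seconds
        end: float                                    # end time, seconds
        children: list = field(default_factory=list)  # finer-grained units

    # Hypothetical alignment of the word "speech": one syllable, four phones.
    word = Segment("speech", 0.00, 0.45, [
        Segment("speech", 0.00, 0.45, [
            Segment("s",  0.00, 0.10),
            Segment("p",  0.10, 0.17),
            Segment("iy", 0.17, 0.33),
            Segment("ch", 0.33, 0.45),
        ]),
    ])

    def walk(seg, depth=0):
        """Print each unit in the hierarchy along with its duration."""
        print("  " * depth + f"{seg.label}: {seg.end - seg.start:.2f} s")
        for child in seg.children:
            walk(child, depth + 1)

    walk(word)

Note that each phone-level segment here spans well under 200 ms of audio;
listening to such a fragment in isolation is exactly the experience this
demo is built around.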
Clearly Articulated
Isolated word recognition is an easy task because words are separated
by lengthy stretches of silence and are often much more clearly
articulated than in continuous speech. Error rates on such tasks
are often one to two orders of magnitude lower than on conversational
speech.
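Because the words are bounded by long silences, even a simple short-time
energy threshold can locate them. The sketch below is a generic illustration
of that idea, not the method of any particular system; the frame sizes and
the -35 dB threshold are arbitrary choices.

    import numpy as np

    def endpoint(signal, rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
        """Locate a word by thresholding short-time energy.

        Assumes the word is surrounded by long stretches of silence, so
        frames above the threshold (relative to the loudest frame) mark
        speech. All parameter values here are illustrative.
        """
        frame = int(rate * frame_ms / 1000)
        hop = int(rate * hop_ms / 1000)
        n_frames = max(1, (len(signal) - frame) // hop + 1)
        energy = np.array([
            np.sum(signal[i * hop:i * hop + frame] ** 2)
            for i in range(n_frames)
        ])
        energy_db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
        voiced = np.nonzero(energy_db > threshold_db)[0]
        if voiced.size == 0:
            return None                      # nothing above the threshold
        return voiced[0] * hop, voiced[-1] * hop + frame

    # A synthetic quarter-second "word" between two silences.
    rate = 16000
    sig = np.zeros(rate)
    sig[6000:10000] = 0.5 * np.random.randn(4000)
    print(endpoint(sig, rate))               # roughly (6000, 10000)

In continuous or conversational speech no such gaps exist, which is one
reason the error rates diverge so sharply.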
Read Speech
In the early days of large vocabulary continuous speech recognition
(LVCSR) research, read speech databases were used so that large
amounts of training data could be generated in a straightforward
manner. Read speech is typically well-articulated so that listeners
can easily understand the content.
Command and Control
Command and control applications, such as voice interfaces to
modern window-based computing systems, feature speech that is
intended to maximize the information transfer in a minimum amount
of audio data.
Conversational
Take everything said above and turn it on its head, and you have
conversational speech, which contains many interesting linguistic
phenomena. In such speech it is often hard to find individual words;
phrases are much easier to identify.