RESEGMENTATION AND TRANSCRIPTION OF SWITCHBOARD
Jonathan Hamaker, Neeraj Deshmukh, Aravind Ganapathiraju, Joseph Picone

Institute for Signal and Information Processing
Department of Electrical and Computer Engineering
Mississippi State University
email: {hamaker, deshmukh, ganapath, picone}@isip.msstate.edu
phone/fax: 601-325-3149; office: 413 Simrall
URL: /hse/ies

Abstract:

The SWITCHBOARD (SWB) Corpus has become one of the most important benchmarks for assessing improvements in large vocabulary conversational speech recognition (LVCSR). The high error rates on SWB are largely attributable to an acoustic model mismatch, the high frequency of poorly articulated monosyllabic words, and large variations in pronunciation. Improved segmentations and transcriptions translate well into improved acoustic modeling. The goal of our SWB resegmentation project is (1) to resegment the data into utterances of approximately 10 seconds in duration, using boundaries based on naturally occurring silence and linguistic phrase boundaries, and (2) to correct the transcriptions. A system trained on a subset of this data achieved a 1.9% absolute reduction in word error rate. Equally exciting is the fact that recognition error rates on monosyllabic words dropped from 70.0% to 63.3%, an absolute decrease of 6.7%. Since monosyllabic words dominate the SWB corpus, this is a particularly significant result.
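
The reductions quoted above are absolute differences in percentage points. As a minimal sketch of the arithmetic, the following Python fragment computes both the absolute and the relative reduction for the monosyllabic word figures. The function names are illustrative assumptions and are not part of any scoring tool used in this work; only the 70.0% and 63.3% values come from the text.

```python
def absolute_reduction(before: float, after: float) -> float:
    """Drop in error rate, expressed in percentage points."""
    return before - after


def relative_reduction(before: float, after: float) -> float:
    """Drop in error rate, expressed as a percentage of the starting rate."""
    if before <= 0:
        raise ValueError("the starting error rate must be positive")
    return 100.0 * (before - after) / before


# Monosyllabic word error rates reported in the abstract (percent).
mono_before, mono_after = 70.0, 63.3

print(f"absolute: {absolute_reduction(mono_before, mono_after):.1f} points")
print(f"relative: {relative_reduction(mono_before, mono_after):.1f}% of baseline")
```

Under these definitions, the 6.7 point absolute decrease corresponds to a relative reduction of roughly 9.6% on monosyllabic words. The same calculation cannot be repeated for the 1.9% overall figure, because the abstract does not state the baseline word error rate.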