Initial versions of SWITCHBOARD were primarily used for developing speaker recognition systems. A preliminary evaluation of LVCSR systems using SWITCHBOARD was done in late 1994. This was followed by a full evaluation in April 1995 where the best system gave a WER of 43% with speaker adaptation and 50% without speaker adaptation. Since these initial evaluations there has been slow but considerable improvement in the performance of LVCSR systems. This is evidenced by the most recent evaluations held in Fall 1997. The table below summarizes the results.

SITE	SWB-II WER	CALLHOME WER	OVERALL WER
BBN	35.5	53.7	44.9
BU	41.5	58.2	50.1
CMU-ISL	35.1	54.4	45.1
CU-HTK	39.2	57.6	48.7
DRAGON	39.9	57.4	48.9
SRI	42.5	57.5	50.2

Fall 1997 NIST Hub-5E Evaluation Results

Parallel to work done as part of the evaluations, the summer workshops held at CLSP, Johns Hopkins University have provided significant insights into problems posed by conversational speech, such as in SWITCHBOARD.

1996 LVCSR Workshop

Speech Data Modeling: This work involved the use of an ANN-based hybrid system for experimentation with multi-band and multi-scale input features. Their main aim was to observe speaker and speech variations through signal processing and acoustic modeling. The best performance achieved was 59% WER compared to a baseline of 64.9% WER.
Data-driven Pronunciation Modeling: This work was based on the premise that conversational speech contained more pronunciation variations than was provided by traditional phone-based lexicons. Thus, they developed a decision-tree based technique to learn mappings from baseform phones to alternate pronunciations. The best performance achieved was 45.3% WER compared to a baseline of 46.4%.
Hidden Speaking Mode Modeling: During this work a new conditioning variable, the mode, which reflected dynamic features of the observed speech from acoustics and text was introduced. The acoustic features included speaking rate, presence and duration of silence, counts of long and short pauses, etc. Language model features included discourse function of the utterance, presence of disfluencies, word frequencies, etc. The best performance achieved was 53.7% WER compared to a baseline of 54.8% WER.
Dependency Language Modeling: This work was an attempt to use linguistic structure to get better language models for conversational speech. The use of dependency grammars and Maximum Entropy modeling was pursued. The best system was obtained through experimentation with multi-band and multi-scale input features. The best performance achieved was 45.4% WER compared to a baseline of 46.2%.

1997 LVCSR Workshop

comprehensive summary