Switchboard
Each release of transcription data for this project will be a superset
of the previous release (in other words, you need only download the latest
release). All transcriptions and segmentations developed in this project
are based on the audio data from the following SWITCHBOARD release:
Switchboard-1 Telephone Speech Corpus: Release 2, August 1997
For information regarding SWITCHBOARD, please consult the LDC web site.
For more details about this project, see the project overview.
A mailing list is also available to discuss progress and key issues of
the project.
Transcriptions and Word Alignments:
- (01/29/03)
Download manually corrected word alignments: This release fixes
several lexicon items in the 10/19/02 release and repairs about 45
start/stop times that had negative durations (the stop time preceded
the start time). We are no longer actively developing this resource,
but we continue to release bug fixes. This release includes the final
transcriptions for the entire database, the complete lexicon, and
automatic word alignments.
- (03/21/01)
Download the ICSI Transcriptions:
This release differs from the 03/15/01 release by only one utterance:
two utterances were merged into one, and the phone transcriptions
were corrected.
The original ICSI data is available from the WS97 ftp site at the
Center for Language and Speech Processing (CLSP) at Johns Hopkins
University. It can also be downloaded from the ISIP mirror of this data.
- (11/26/02)
Download the Penn Treebank Transcriptions:
This release fixes a few bugs in the 10/19/02 release, reflecting
the changes to the word alignments and segmentations described
above. This Penn Treebank release contains an alignment of the ISIP
hand-aligned word transcriptions to the Penn Treebank word
transcriptions for all 1126 SWB conversations that are included in
the Treebank. For words on which the two transcriptions agree, time
marks are given. For words that do not agree, we estimate the times
for the Treebank transcriptions using the ISIP transcriptions. The
transcriptions also include all instances of silence, laughter and
noise.
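Two of the repairs described above lend themselves to a short
illustration. The sketch below is not the ISIP tooling. It assumes
word alignments are held in memory as (word, start, stop) tuples with
times in seconds. The function names negative_durations and
transfer_times, the use of Python's difflib to match the two word
sequences, and the even split of time across words that do not agree
are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def negative_durations(words):
    """Return the (word, start, stop) entries whose stop precedes the start."""
    return [w for w in words if w[2] < w[1]]


def transfer_times(isip, treebank):
    """Attach time marks from one transcription to another (hypothetical helper).

    isip:     list of (word, start, stop) tuples with hand-aligned times.
    treebank: list of words without times.
    Returns a list of (word, start, stop, source), where source is "exact"
    for words on which the two transcriptions agree and "estimated" otherwise.
    """
    isip_words = [w.lower() for w, _, _ in isip]
    tb_words = [w.lower() for w in treebank]
    matcher = SequenceMatcher(a=isip_words, b=tb_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Words agree: copy the time marks directly.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                out.append((treebank[j], isip[i][1], isip[i][2], "exact"))
        elif tag in ("replace", "insert"):
            n = j2 - j1
            if i2 > i1:
                # Spread the span of the disagreeing words evenly (assumption).
                start, stop = isip[i1][1], isip[i2 - 1][2]
            else:
                # No counterpart words: use the nearest boundary, zero duration.
                if i1 > 0:
                    start = stop = isip[i1 - 1][2]
                else:
                    start = stop = isip[0][1] if isip else 0.0
            step = (stop - start) / n
            for k in range(n):
                out.append((treebank[j1 + k], start + k * step,
                            start + (k + 1) * step, "estimated"))
        # tag == "delete": timed words with no counterpart are dropped.
    return out


if __name__ == "__main__":
    isip = [("i", 0.0, 0.25), ("do", 0.25, 0.5),
            ("not", 0.5, 0.75), ("know", 0.75, 1.0)]
    for row in transfer_times(isip, ["I", "don't", "know"]):
        print(row)
    print(negative_durations([("uh", 1.0, 0.5), ("yeah", 0.5, 1.0)]))
```

Marking each word as "exact" or "estimated" keeps the provenance of
every time mark visible, so estimated times can be filtered out when
only agreed-upon time marks are wanted.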
Documentation:
-
Transcription FAQ:
provide on-line feedback about key issues.
-
Conventions:
download a document describing our transcription conventions.
-
SWB Statistics:
download a statistical analysis of the SWB corpus.
-
SWB Models:
A copy of the SWB models file that we use.
-
Education:
an on-line educational resource for learning about the SWB
corpus.
-
Reports:
Quarterly reports summarizing the progress made on the project.
Tools:
-
Software:
our transcription and segmentation tool.
-
Spiker:
a simple C program that corrects Switchboard files that have been
corrupted by bit flips.
-
ISIP Recognizer:
download a public domain speech recognition
system under development in ISIP.
General Information:
-
Overview:
an overview of the SWITCHBOARD (SWB) resegmentation project.
-
Personnel: the people who make SWB resegmentation happen.
-
Job Opportunities:
do you want to be a SWITCHBOARD validator?
- Timesheets: a
list of due dates for timesheets.
-
CLSP Workshops:
summer workshops on conversational speech recognition.
Please direct questions or comments to
joseph.picone@gmail.com