JEIDA COMMON SPEECH DATA CORPUS
The Japan Electronic Industry Development Association's Common Speech
Data (JCSD) Corpus is an isolated phrase corpus of almost 200,000
utterances from 150 speakers (75 male, 75 female). It
represents an important milestone in Japanese speech recognition
technology development. The JCSD Corpus was originally collected in
1986 in a nationwide Japanese project managed by Professor Shuichi
Itahashi in coordination with the Japan Electronic Industry Development
Association (JEIDA). Its importance to Japanese speech recognition
technology development is, to some extent, comparable to Texas
Instruments' famous 46-word speaker-dependent corpus. The JCSD Corpus
was one of the first industry-standard and freely available corpora
for the study of Japanese language speech recognition. Most of the
competitive Japanese language speech recognition systems developed in
Japan have been benchmarked on various subsets of this corpus. Hence,
it is one of the most important standards of comparison for
Japanese language systems.
Software:
- jeida_validator: This is the Tcl code for the GUI alone.
- jeida_tools: This is the full distribution of all tools used to
  validate the database, including binaries for Solaris 2.5.1.
Documentation:
- User's Guide: This guide gives a detailed explanation of this project.
Sponsors:
- LDC: This project was sponsored by the Linguistic Data Consortium,
  which makes this data commercially available along with other
  speech-related databases.