Downloads:
- (09/30/05)
isip_proto_v5.18_creare: This release has the same functionality as v5.17, except
that the likelihood scores for the one-best output are time-normalized.
To install this package, follow the instructions below.
- tar xzvf isip_proto_v5.18_creare.tar.gz
- cd isip_proto_v5.18_creare
- ./configure
- gmake
- gmake install
- source ISIP_ENV.sh
A typical command line for training crossword 8-mixture triphone
models will look like this:
cd <some_exp_train_directory>;
wsj_run -mixtures 8 -model_type xwrd_triphone -train_mfc_list \
all_mfcc_features.list -split_threshold 10 -merge_threshold 10 -num_occ_threshold 50 \
-cpus_train redeye
A typical command line for testing will look like this:
cd <some_exp_test_directory>;
wsj_run -mixtures 8 -model_type xwrd_triphone -test_mfc_list \
all_mfcc_features.list \
-cpus_test redeye \
-models_path <path_to_some_exp_train_directory>
The above command line produces the 1-best output with normalized
likelihood scores at the phone level. If word-level normalized likelihood
scores are required, add "-align_mode word" to the command line used
for decoding. The optimum threshold on the DET plot was found to be
-69.81, i.e., anything above -69.81 can be considered less
likely than anything below the threshold.
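As an illustration of time normalization, the sketch below divides the total log-likelihood of a segment by its duration in frames, so that long and short segments can be compared against a single threshold. Dividing by the frame count is an assumption about how v5.18 normalizes its scores; the function name and the sample values are hypothetical.

```python
def time_normalize(log_likelihood, num_frames):
    """Return the per-frame log-likelihood of a segment."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return log_likelihood / num_frames

# Hypothetical segments: (label, total log-likelihood, duration in frames).
segments = [("b", -700.0, 10), ("r", -1300.0, 20)]
normalized = {label: time_normalize(ll, n) for label, ll, n in segments}
for label, score in normalized.items():
    print(label, score)
```

Without normalization the longer segment would always look worse simply because it accumulates more frames.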
- (09/23/05)
Models for Bravo data: These are 8-mixture crossword triphone models
that were trained on the 499 utterances from the Bravo data set. If an experiment has been run
previously and you want to replace the old models with these models,
replace the following directory: $ISIP_WSJ/exp/train/baum_welch/xwrd_tri/final_models. When tested on the same utterances, the WER
is 0.3%. The features were provided by Creare.
To install this package, follow the instructions below.
- tar xzvf bravo_final_models.tar.gz
- cp -rf final_models $ISIP_WSJ/exp/train/baum_welch/xwrd_tri/final_models
- (09/23/05)
isip_proto_v5.17: This release has the same functionality as v5.16, except
that the bug that caused "nan" and "inf" values to appear as confidence
scores has been fixed.
To install this package, follow the instructions below.
- tar xzvf isip_proto_v5.17_creare.tar.gz
- cd isip_proto_v5.17_creare
- ./configure
- gmake
- gmake install
- source ISIP_ENV.sh
- (05/10/05)
Production System (r00_n11_t03): Production System release
with the endpoint detection utility. This utility can operate in
two modes: 1) "signal_only" (default): In this mode the utility
writes only the endpointed data to the output files. 2) "all": In
this mode the utility chops the entire utterance into smaller
segments and saves them to files.
To install this package, follow the instructions below.
- tar xzvf isip_r00_n11_t03.tar.gz
- cd isip_r00_n11_t03
- ./configure [--prefix=/<install directory>] [--with-audiofile-prefix=/<audiofile install directory>] [--with-sphere-prefix=/<sphere install directory>] [--with-sctk-prefix=/<sctk install directory>]
- source ISIP_BASE_ENV.sh
- make depend
- make install
- (03/30/05)
isip_proto_v5.16:
In this release, we have added the capability to compute and
output the average posteriori score per frame for each link in
the lattice (word graph). Similarly, the average posteriori score per
frame for each word in the 1-best hypothesis is also computed.
To install this package, follow the instructions given below. Detailed
instructions are included in the release's AAREADME.text file.
- tar xzvf isip_proto_v5.16_creare.tar.gz
- cd isip_proto_v5.16_creare
- ./configure
- gmake
- gmake install
- source ISIP_ENV.sh
The instructions to compute the posteriori score
assume that the acoustic models have already been generated using
the Multiple-CPU ASR Tutorial (v5.0) package. See the instructions
included with the Multiple-CPU ASR Tutorial (v5.0) release for how to
train the models.
Once the acoustic models are trained, the same directory setup
that is created by the Multiple-CPU ASR Tutorial (v5.0) is used for
lattice generation and then for computing posteriori scores
from these lattices.
Steps to generate lattices:
- Download the output_lattice.list file and move it to the
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/
directory, where $ISIP_WSJ is a shell environment
variable that points to the Multiple-CPU ASR Tutorial (v5.0).
- Download the params_lattice.text file and move it to the
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/
directory.
- Generate lattices using the following command line:
trace_projector -p \
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/params_lattice.text
Steps to generate posteriori scores using the lattices generated
in the previous step:
- Download the input_lattice_posterior.list file and move it to the
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/
directory.
- Download the output_lattice_posterior.list file and move it to the
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/
directory.
- Download the output_posterior.list file and move it to the
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/
directory.
- Download the params_lattice_posterior.text file and move it to the
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/
directory.
- Generate posteriori scores using the following command line:
trace_projector -p \
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/params_lattice_posterior.text
The posteriori score per frame for each word is output as the
third column in the one-best hypotheses given by the
$ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/output_posterior.list list. A sample output hypothesis may look like
this file.
From the experiments conducted on the FAA data, it was
empirically observed that words with an average posteriori
score per frame greater than a threshold of around -68 can be
considered correct with high confidence.
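The thresholding described above can be sketched as follows. The only detail taken from the text is that the score is the third whitespace-separated column; the contents of the first two columns (a word and a duration here) and the sample lines are assumptions made for illustration.

```python
THRESHOLD = -68.0  # approximate value reported above for the FAA data

def split_by_confidence(lines, threshold=THRESHOLD):
    """Split hypothesis lines into (accepted, rejected) by third-column score."""
    accepted, rejected = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        score = float(fields[2])
        target = accepted if score > threshold else rejected
        target.append((fields[0], score))
    return accepted, rejected

# Hypothetical hypothesis lines: <word> <duration_in_frames> <score>.
sample = ["BRAVO 42 -61.5", "DELTA 37 -74.2", "ECHO 29 -66.0"]
accepted, rejected = split_by_confidence(sample)
print("accepted:", accepted)
print("rejected:", rejected)
```

The threshold is left as a parameter because the value of -68 was tuned on the FAA data and may need to be re-estimated for other data sets.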
- (03/24/05)
isip_questions.text: This file is used during state tying, in which a
decision-tree-based framework is used to cluster phonetically similar
sounds. This download consists of:
a) a master isip_questions_master.text file, which consists of questions
corresponding to the 41 monophones in the monophones_master.text file.
b) the monophones_master.text file.
c) an isip_questions_29.text file, which contains the phonetic
question sets for the 29 monophones in the monophones_29.text file.
d) the monophones_29.text file.
How to create the isip_questions.text file?
This file can be easily obtained from the master questions file
(isip_questions_master.text), the reduced monophone
set (monophones_#.text file), and the full monophones_master.text file,
which has 41 monophones. "diff" the monophones in the reduced set and
the full monophone set, and remove the missing monophones from the
master isip_questions file to create the new reduced
isip_questions.text file. If no monophones remain for
a particular set of phonetic questions, those questions can be
removed from the file.
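The reduction procedure above can be sketched as a set difference followed by a filter. The on-disk format of the question files is not described here, so the sketch assumes each question has already been parsed into a name and a list of monophones; the question names and phone lists are hypothetical.

```python
def reduce_questions(master_questions, master_phones, reduced_phones):
    """Drop phones absent from the reduced set; drop questions left empty."""
    missing = set(master_phones) - set(reduced_phones)  # the "diff" step
    reduced = {}
    for name, phones in master_questions.items():
        kept = [p for p in phones if p not in missing]
        if kept:  # a question with no remaining monophones is removed
            reduced[name] = kept
    return reduced

# Hypothetical data; the real files list 41 and 29 monophones.
master_phones = ["aa", "iy", "b", "d", "zh"]
reduced_phones = ["aa", "iy", "b", "d"]
master_questions = {
    "vowel": ["aa", "iy"],
    "voiced_stop": ["b", "d"],
    "voiced_fricative": ["zh"],
    "voiced_consonant": ["b", "d", "zh"],
}
reduced_questions = reduce_questions(master_questions, master_phones, reduced_phones)
print(reduced_questions)
```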
- tar xzvf isip_question_files.tar.gz
- (03/02/05)
isip_proto_v5.16_beta: This package is an upgraded version of prototype system v5.15. It adds a feature that is not present in v5.15: computing posterior scores for every word in the word graph. These posteriors can be used as a confidence estimate.
- tar xzvf isip_proto_v5.16_beta.tar.gz
- cd isip_proto_v5.16_beta
- ./configure
- make
- make install
- source ISIP_ENV.sh
A typical command line for generating lattices would look like this:
trace_projector -p params_lattice.text
The above command line generates lattices, which are used as input for posterior computation. The command line for computing the posterior lattice is as follows:
trace_projector -p params_lattice_posterior.text
The parameter files for the two command lines above look very similar, but there are three main differences:
1) There is a new parameter called compute_posterior. By default compute_posterior is set to 'no', but for posterior generation it should be set to 'yes'.
2) The input_lattice list for posterior computation is the output_lattice list used by the first command line.
3) The output_lattice list for posterior computation points to the files into which the lattice is written along with the posterior scores.
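For readers unfamiliar with lattice posteriors, the sketch below applies the standard forward-backward recursion to a toy word graph. It is not taken from trace_projector: the lattice representation, the function names, and the scores are hypothetical, and the per-frame averaging performed by the release is not shown. It assumes integer node ids in topological order, with every node lying on some path from the start node to the end node.

```python
import math
from collections import defaultdict

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def link_posteriors(links, start, end):
    """Return the log posterior of each (from_node, to_node, word) link."""
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for link in links:
        outgoing[link[0]].append(link)
        incoming[link[1]].append(link)
    nodes = sorted({node for link in links for node in link[:2]})

    alpha = {start: 0.0}  # log score of all partial paths reaching a node
    for node in nodes:
        if node != start:
            alpha[node] = logsumexp([alpha[f] + s for f, _, _, s in incoming[node]])
    beta = {end: 0.0}  # log score of all partial paths leaving a node
    for node in reversed(nodes):
        if node != end:
            beta[node] = logsumexp([s + beta[t] for _, t, _, s in outgoing[node]])

    total = alpha[end]  # log score of all complete paths
    return {(f, t, w): alpha[f] + s + beta[t] - total for f, t, w, s in links}

# Hypothetical lattice: (from_node, to_node, word, log score).
lattice = [(0, 1, "bravo", -2.0), (0, 1, "papa", -3.0), (1, 2, "delta", -1.0)]
posteriors = link_posteriors(lattice, start=0, end=2)
for link, log_post in posteriors.items():
    print(link, round(math.exp(log_post), 4))
```

Links that compete over the same span share the probability mass, while a link that every path must traverse gets a posterior of 1.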
- (01/31/05)
lexicon_and_monophone_files.tar.gz (v1.0):
This tar package contains the following:
1) 'monophones.text' file, which has all the monophones corresponding to
these 18 words:
Bravo,
Delta,
Echo,
FoxTrot,
Golf,
Hotel,
India,
Juliet,
Kilo,
Mike,
November,
Oscar,
Papa,
Quebec,
Tango,
Victor,
Whiskey,
Yankee.
2) 'lexicon.text' file, which has the monophone mappings for the above words.
3) 'master_lexicon.text' file, which contains the monophone mappings for
around 30,000 words.
4) 'create_triphones.pl' Perl script, which uses the monophones as
input to generate an all_xwrd_triphones.list file.
Command:
perl create_triphones.pl notags_monophones.text > all_xwrd_triphones.list
5) 'notags_monophones.text' file, which contains just the monophones from
the monophones.text file without the comments. This file is used by
the Perl script to generate the all_xwrd_triphones.list file.
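To show what generating a triphone list from a monophone list involves, here is a minimal sketch that enumerates every left-center+right combination. The "l-c+r" label notation and the exhaustive enumeration are assumptions; create_triphones.pl may treat silence or word-boundary symbols differently.

```python
from itertools import product

def all_triphones(monophones):
    """Enumerate every left-center+right combination of the monophones."""
    return [f"{l}-{c}+{r}" for l, c, r in product(monophones, repeat=3)]

phones = ["b", "r", "aa"]  # hypothetical subset of notags_monophones.text
triphones = all_triphones(phones)
print(len(triphones))
print(triphones[:3])
```

The list grows with the cube of the number of monophones, which is one reason state tying is applied afterwards.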
- (12/08/04)
gen_trans_with_sp.pl (v1.0):
This script generates the monophone transcription files for the
corresponding word transcription files. The
transcriptions are generated with 'sp' at word boundaries. To create the
'no sp' monophone transcription file, just remove the sp from the file
created by the script.
A typical command line for generating the transcriptions will look like this:
gen_trans_with_sp.pl lexicon.text all_word_transcription.text output_file
Note: the all_word_transcription.text file in this case will not
contain the utterance id, i.e., it will contain just the word
transcriptions.
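The word-to-monophone expansion can be sketched as a lexicon lookup with 'sp' inserted between consecutive words. This is an illustration rather than the logic of gen_trans_with_sp.pl: the lexicon is assumed to be already parsed into a dictionary, and the pronunciations shown are hypothetical.

```python
def words_to_monophones(words, lexicon, insert_sp=True):
    """Expand a word sequence into monophones, optionally with 'sp' between words."""
    phones = []
    for i, word in enumerate(words):
        if insert_sp and i > 0:
            phones.append("sp")
        phones.extend(lexicon[word.upper()])
    return phones

# Hypothetical lexicon entries; the real lexicon.text format may differ.
lexicon = {"BRAVO": ["b", "r", "aa", "v", "ow"], "KILO": ["k", "iy", "l", "ow"]}
with_sp = words_to_monophones("bravo kilo".split(), lexicon)
no_sp = words_to_monophones("bravo kilo".split(), lexicon, insert_sp=False)
print(" ".join(with_sp))
print(" ".join(no_sp))
```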
- (11/07/04)
Multiple-CPU ASR Tutorial (v5.0): This package is used to run
recognition experiments on the FAA data. A word error rate of 0.8% is
obtained when training and testing on the same data with the state-tying
thresholds given in the command line below. Decoding is
performed using a loop grammar.
- tar xzvf asr_va_tutorial_v5.0.tar.gz
- cd asr_package
- cd asr_va_tutorial_v5.0
- source <install directory for v5.15>ISIP_ENV.sh
- ./configure --prefix=.
- make
- make install
- source ISIP_WSJ_ENV.sh
- wsj_run -help
A typical command line for training crossword 8-mixture triphone
models will look like this:
cd <some_exp_train_directory>;
wsj_run -mixtures 8 -model_type xwrd_triphone -train_mfc_list \
all_mfcc_features.list -split_threshold 10 -merge_threshold 10 -num_occ_threshold 50 \
-cpus_train isip218 isip218
If the test data is unseen during training, it is
recommended to use a num_occ_threshold of 400 and merge and split
thresholds of around 20. These thresholds were found by cross-validating
on the FAA data.
A typical command line for testing will look like this:
cd <some_exp_test_directory>;
wsj_run -mixtures 8 -model_type xwrd_triphone -test_mfc_list \
all_mfcc_features.list \
-cpus_test isip218 isip218 \
-models_path <path_to_some_exp_train_directory>
The above command line generates output files that contain
the triphone hypotheses. If required, the triphone results can be
converted to their monophone equivalents using
'convert_tri_to_mono', a utility newly added to this package.
It is a Perl script that is installed along with
the other utilities in the package.
The command line to convert the triphone results to monophones is as
follows:
convert_tri_to_mono <triphone output filename> <monophone output filename>
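Conceptually, converting a triphone label to its monophone means stripping the left and right context. The sketch below assumes the common "l-c+r" notation, which may not match the label format used by convert_tri_to_mono; the sample labels are hypothetical.

```python
import re

def triphone_to_monophone(label):
    """Strip the left ('l-') and right ('+r') context from a triphone label."""
    return re.sub(r"^[^-+]*-|\+[^-+]*$", "", label)

hypothesis = ["sil", "b+r", "b-r+aa", "r-aa"]  # hypothetical decoder output
monophones = [triphone_to_monophone(t) for t in hypothesis]
print(monophones)
```

Labels without any context markers, such as silence, pass through unchanged.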
- (10/05/04)
Models trained on the segmented FAA data (Prototype):
The models trained on the segmented FAA data can be downloaded from here.
This tar package contains the entire train directory. The extracted
train directory must replace the old train directory in your $ISIP_WSJ/exp
directory.
- tar xzvf models_faa.tar.gz
- (10/05/04)
Segmented FAA features (mfcc):
The segmented FAA features can be downloaded by clicking the link above.
- tar xzvf segmented_faa_features.tar.gz
- (10/05/04)
Segmented FAA raw data:
The segmented FAA data can be downloaded by clicking the link above.
- tar xzvf segmented_faa_raw.tar.gz
- (09/27/04)
DET curve plotting package:
This package is provided by NIST (National Institute of Standards and
Technology) for plotting DET curves. It has been slightly modified
to suit specific requirements. This software requires Matlab.
- tar xzvf det_package.tar.gz
- (09/27/04)
gen_wer.pl (v1.0):
This is a scoring script that post-processes the lattice generated using
the prototype system. Please be sure you have Perl installed on your system.
A typical command line for scoring a lattice will look like this:
gen_wer.pl lattices_path output_path delta_value format_level alignment_file
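For reference, the word error rate (WER) figures quoted on this page follow the usual definition: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a standard edit-distance recursion; it does not reproduce gen_wer.pl, whose lattice handling, delta_value, and format_level arguments are not modeled, and the sample strings are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # match or substitution
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate("bravo delta echo golf", "bravo delta kilo echo")
print(f"{100 * wer:.1f}%")
```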
- (07/09/04)
Multiple-CPU ASR Tutorial (v4.0):
The fourth release of a package that is a modified version of
the Aurora scripts. This package is primarily meant for word spotting
experiments.
Please be sure that you have already installed the
ISIP prototype system (v5.14) before running this application.
To install this package, follow the instructions below.
Detailed instructions are included in the release's AAREADME.text
and the INSTRUCTIONS.text files.
- tar xzvf asr_va_tutorial_v4.0.tar.gz
- cd asr_va_tutorial_v4.0
- source <install directory for v5.14>ISIP_ENV.sh
- ./configure --prefix=.
- make
- make install
- source ISIP_WSJ_ENV.sh
- wsj_run -help
A typical command line for decoding 1-mixture monophone models will
look like this:
cd <exp_directory>;
wsj_run -mixtures 1 -model_type monophone -decode_mode \
grammar_decoding -align_mode phone -test_mfc_list \
test_1247_v4.0_mfc.list \
-cpus_test isip218 isip218 -models_path .
- (11/07/03)
Multiple-CPU ASR Tutorial (v3.0):
The third release of a package that is a modified version of
the Aurora scripts. This package supports a few new features
including ngram decoding that can be used to generate the
alignments for the unseen phrases. The ngram decoding is based
on our Switchboard language model. Note that the decoding will
require about 700 MB of main memory because of the large
vocabulary size.
Please be sure that you have already installed the
ISIP prototype system (v5.14) before running this application.
To install this package, follow the instructions below.
Detailed instructions are included in the release's AAREADME.text file.
- tar xzvf asr_va_tutorial_v3.0.tar.gz
- cd asr_va_tutorial_v3.0
- source <install directory for v5.14>ISIP_ENV.sh
- ./configure --prefix=.
- make
- make install
- source ISIP_WSJ_ENV.sh
- wsj_run -help
A typical command line for training crossword 4-mixture triphone
models will look like this:
cd <some_exp_train_directory>;
wsj_run -mixtures 4 -model_type xwrd_triphone -train_mfc_list \
train_1249_v3.0_mfc.list \
-cpus_train isip218 isip218
A typical command line for testing (generating alignments) will
look like this:
cd <some_exp_test_directory>;
wsj_run -mixtures 4 -model_type xwrd_triphone -decode_mode bigram_decoding -align_mode word -test_mfc_list \
devtest_364_v3.0_mfc.list \
-cpus_test isip218 isip218 \
-models_path <path_to_some_exp_train_directory>
- (11/07/03)
Multiple-CPU ASR Tutorial (v2.0):
The second release of a package that is a modified version of
the previous version. It supports training cross-word models
and network decoding for the 1249 pre-transcribed speech files
provided in the Creare Phase 02 data.
Please be sure that you have already installed the
ISIP prototype system (v5.14) before running this application.
To install this package, follow the instructions below.
Detailed instructions are included in the release's AAREADME.text file.
- tar xzvf asr_va_tutorial_v2.0.tar.gz
- cd asr_va_tutorial_v2.0
- source <install directory for v5.14>ISIP_ENV.sh
- ./configure --prefix=.
- make
- make install
- source ISIP_WSJ_ENV.sh
- wsj_run -help
A typical command line for training crossword 4-mixture triphone
models will look like this:
cd <some_exp_train_directory>;
wsj_run -mixtures 4 -model_type xwrd_triphone -train_mfc_list \
train_1249_v2.0_mfc.list \
-cpus_train isip218 isip218
A typical command line for testing (generating alignments) will
look like this:
cd <some_exp_test_directory>;
wsj_run -mixtures 4 -model_type xwrd_triphone -align_mode phone \
-test_mfc_list \
devtest_1249_v2.0_mfc.list \
-cpus_test isip218 isip218 \
-models_path <path_to_some_exp_train_directory>
This should result in a WER of 2.0%.
- (11/07/03)
Frontend for Waveform Audio Files (v1.0): A frontend
that converts audio files in WAV format, sampled at 8 kHz,
into MFCC features. An AAREADME.text file included in the
package provides detailed instructions.
- (11/07/03)
Production System (r00_n11_t02): Production System release
that supports both reading and writing the WAV format with ADPCM
compression, as supported by SGI's audiofile library.
To install this package, follow the instructions below.
- tar xzvf isip_r00_n11_t02.tar.gz
- cd isip_r00_n11_t02
- ./configure [--prefix=/<install directory>] [--with-audiofile-prefix=/<audiofile install directory>] [--with-sphere-prefix=/<sphere install directory>] [--with-sctk-prefix=/<sctk install directory>]
- source ISIP_BASE_ENV.sh
- make depend
- make install
- (10/30/03)
Forced Alignments (v1.0): The forced alignments of the
data collection phase_02.
- (09/29/03)
Verification System (v1.0): The first release of a verification
toolkit based on the production system. An AAREADME file
included in the release provides detailed instructions.
- (08/06/03)
Production System (r00_n11_t01): Production System release
that supports both reading and writing NIST's Sphere format
and the formats supported by SGI's audiofile library.
To install this package, follow the instructions below.
- tar xzvf isip_r00_n11_t01.tar.gz
- cd isip_r00_n11_t01
- ./configure [--prefix=/<install directory>] [--with-audiofile-prefix=/<audiofile install directory>] [--with-sphere-prefix=/<sphere install directory>] [--with-sctk-prefix=/<sctk install directory>]
- source ISIP_BASE_ENV.sh
- make depend
- make install
- (07/24/03)
Production System (r00_n11_t00): Production System release
that supports NIST's Sphere format and the formats supported by
SGI's audiofile library.
To install this package, follow the instructions below.
- tar xzvf isip_r00_n11_t00.tar.gz
- cd isip_r00_n11_t00
- ./configure [--prefix=/<install directory>] [--with-audiofile-library=/<audiofile lib directory>] [--with-audiofile-includes=/<audiofile include directory>] [--with-sp-library=/<sphere lib directory>] [--with-sp-includes=/<sphere include directory>]
- source ISIP_BASE_ENV.sh
- make depend
- make install
- (07/07/03)
MFCC Features (v00): A recipe for converting 16 kHz raw
files to MFCC features stored in raw files.
- (06/26/03)
Monophone Tutorial Overview (v01): This contains a brief
synopsis of each step required to train and evaluate a
single-mixture context-independent (monophone) system
implemented in the Multiple-CPU ASR Tutorial (v1.0) package.
- (06/25/03)
Monophone Training Overview (v00): This page gives an
overview of the recipe used in the monophone training process
implemented in the Multiple-CPU ASR Tutorial (v1.0) package.
- (06/17/03)
Creare Phase-01 Set (v1.0): This package contains the
training set, test set and devtest definitions. These
definitions will allow you to train and decode.
- (06/16/03)
Multiple-CPU ASR Tutorial (v1.0):
The first release of a package that is a modified version of the
Aurora scripts, and supports a few new features including
network decoding.
Please be sure that you have already installed the
ISIP prototype system (v5.14) before running this application.
To install this package, follow the instructions below.
Detailed instructions are included in the release's AAREADME.text file.
- tar xzvf asr_va_tutorial_v1.0.tar.gz
- cd asr_va_tutorial_v1.0
- source <install directory for v5.14>ISIP_ENV.sh
- ./configure --prefix=.
- make
- make install
- source ISIP_WSJ_ENV.sh
- wsj_run -help
A typical command line for training single mixture monophone
models will look like this:
A typical command line for testing will look like this:
cd <some_exp_test_directory>;
wsj_run -test_mfc_list \
devtest_255_v1.0_mfc.list \
-cpus_test isip206 isip207 isip208 isip209 -models_path \
<path_to_some_exp_train_directory>
This should result in a WER of 3.4%.
- (06/04/03)
Project Bibliography (v00): The list below gives a good
overview of various approaches that might be relevant to this
project.