Cognitive Assessment Using Voice Analysis


Downloads:
  • (09/30/05) isip_proto_v5.18_creare: This release has the same functionality as v5.17, except that the likelihood scores for the one-best output are time normalized.

    To install this package, follow the instructions below.

    • tar xzvf isip_proto_v5.18_creare.tar.gz
    • cd isip_proto_v5.18_creare
    • ./configure
    • gmake
    • gmake install
    • source ISIP_ENV.sh

    A typical command line for training crossword 8-mixture triphone models will look like this:

      cd <some_exp_train_directory>;
      wsj_run -mixtures 8 -model_type xwrd_triphone -train_mfc_list all_mfcc_features.list -split_threshold 10 -merge_threshold 10 -num_occ_threshold 50 \
      -cpus_train redeye


    A typical command line for testing will look like this:

      cd <some_exp_test_directory>;
      wsj_run -mixtures 8 -model_type xwrd_triphone -test_mfc_list all_mfcc_features.list -cpus_test redeye \
      -models_path <path_to_some_exp_train_directory>

    The above command line will produce 1-best output with normalized likelihood scores at the phone level. If word-level normalized likelihood scores are required, add "-align_mode word" to the command line used for decoding. The optimum threshold on the DET plot was found to be -69.81, i.e., anything above -69.81 can be considered less likely than anything below the threshold.
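
    As a rough illustration of what time normalization means, the sketch below divides a segment's total log-likelihood by its duration in frames and reports which side of the quoted threshold the result falls on. The function names and the frame-based definition are assumptions made for illustration; the decoder's actual normalization is not documented on this page.

```python
# Threshold quoted above for the optimum point on the DET plot.
DET_THRESHOLD = -69.81


def time_normalize(log_likelihood, start_frame, end_frame):
    """Return the log-likelihood per frame for one segment (assumed definition)."""
    num_frames = end_frame - start_frame
    if num_frames <= 0:
        raise ValueError("segment must span at least one frame")
    return log_likelihood / num_frames


def is_above_threshold(normalized_score, threshold=DET_THRESHOLD):
    """Report which side of the threshold a normalized score falls on."""
    return normalized_score > threshold


# Hypothetical phone segment: total score -1500.0 spanning frames 10 to 30.
score = time_normalize(-1500.0, 10, 30)
print(score, is_above_threshold(score))
```

    Normalizing by duration makes the scores of short and long segments comparable, which is what allows a single threshold to be applied to all of them.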

  • (09/23/05) Models for Bravo data: These are 8-mixture crossword triphone models that were trained on the 499 utterances from the Bravo data set. If you have already run an experiment and want to replace the old models with these models, replace the following directory: $ISIP_WSJ/exp/train/baum_welch/xwrd_tri/final_models. When tested on the same utterances, the WER is 0.3%. The features were provided by Creare.

    To install this package, follow the instructions below.

    • tar xzvf bravo_final_models.tar.gz
    • cp -rf final_models $ISIP_WSJ/exp/train/baum_welch/xwrd_tri/final_models


  • (09/23/05) isip_proto_v5.17: This release has the same functionality as v5.16, except that the bug that caused "nan" and "inf" values to appear as confidence scores has been fixed.

    To install this package, follow the instructions below.

    • tar xzvf isip_proto_v5.17_creare.tar.gz
    • cd isip_proto_v5.17_creare
    • ./configure
    • gmake
    • gmake install
    • source ISIP_ENV.sh


  • (05/10/05) Production System (r00_n11_t03): Production System release with the endpoint detection utility. This utility can operate in two modes: 1) "signal_only" (default): the utility writes only the endpointed data to the output files. 2) "all": the utility chops the entire utterance into smaller segments and saves them to files.

    To install this package, follow the instructions below.

    • tar xzvf isip_r00_n11_t03.tar.gz
    • cd isip_r00_n11_t03
    • ./configure [--prefix=/<install directory>] [--with-audiofile-prefix=/<audiofile install directory>] [--with-sphere-prefix=/<sphere install directory>] [--with-sctk-prefix=/<sctk install directory>]
    • source ISIP_BASE_ENV.sh
    • make depend
    • make install
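
    The difference between the two endpointing modes described above can be sketched as follows. This is only an illustration: simple energy thresholding stands in for whatever detection method the utility really uses, and the function names, the per-frame energy input, and the threshold are all hypothetical.

```python
def segment_frames(energies, threshold):
    """Group consecutive frames into (start, end, is_speech) runs; end is exclusive."""
    segments = []
    start = 0
    for i in range(1, len(energies) + 1):
        at_end = i == len(energies)
        # Close the current run at the end of the data or when the label flips.
        if at_end or (energies[i] >= threshold) != (energies[start] >= threshold):
            segments.append((start, i, energies[start] >= threshold))
            start = i
    return segments


def endpoint(energies, threshold, mode="signal_only"):
    """Return the (start, end) frame ranges that would be written to output files."""
    segments = segment_frames(energies, threshold)
    if mode == "signal_only":
        # Keep only the runs classified as speech.
        return [(s, e) for s, e, is_speech in segments if is_speech]
    if mode == "all":
        # Keep every run, so the segments cover the whole utterance.
        return [(s, e) for s, e, _ in segments]
    raise ValueError("mode must be 'signal_only' or 'all'")


# Made-up frame energies: silence, two speech frames, silence.
print(endpoint([0, 0, 5, 6, 0], threshold=1))
print(endpoint([0, 0, 5, 6, 0], threshold=1, mode="all"))
```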


  • (03/30/05) isip_proto_v5.16 : In this release, we have added the capability to compute and output the average posteriori score per frame for each link in the lattice (word graph). Similarly, the average posteriori score per frame for each word in the 1-best hypothesis is also computed. To install this package, follow the instructions given below. Detailed instructions are included in the release's AAREADME.text file.

    • tar xzvf isip_proto_v5.16_creare.tar.gz
    • cd isip_proto_v5.16_creare
    • ./configure
    • gmake
    • gmake install
    • source ISIP_ENV.sh

    The instructions to compute the posteriori score assume that the acoustic models have already been generated using the Multiple-CPU ASR Tutorial (v5.0) package. See the instructions with the Multiple-CPU ASR Tutorial (v5.0) release on how to train the models. Once the acoustic models are trained, the same directory setup that is created by Multiple-CPU ASR Tutorial (v5.0) is used for lattice generation, and then, for posteriori scores computation from these lattices.

    Steps to generate lattices:

    1. Download the output_lattice.list file and move it to the $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/ directory, where $ISIP_WSJ is a shell environment variable that points to the Multiple-CPU ASR Tutorial (v5.0).

    2. Download the params_lattice.text file and move it to the $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/ directory.

    3. Generate lattices using the following commandline:
      trace_projector -p $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/params_lattice.text


    Steps to generate posteriori scores using the lattices generated in the previous step:

    1. Download the input_lattice_posterior.list file and move it to the $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/ directory.

    2. Download the output_lattice_posterior.list file and move it to the $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/ directory.

    3. Download the output_posterior.list file and move it to the $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/ directory.

    4. Download the params_lattice_posterior.text file and move it to the $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/ directory.

    5. Generate posteriori scores using the following commandline:
      trace_projector -p $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/params_lattice_posterior.text

    The posteriori score per frame for each word is output as the third column of the one-best hypothesis files named in the $ISIP_WSJ/exp/decode/baum_welch/xwrd_tri/grammar_decoding/lists/output_posterior.list list. A sample output hypothesis may look like this file. In experiments on the FAA data, it was observed empirically that words with an average posteriori score per frame greater than a threshold of around -68 can be considered true with high confidence.
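
    A minimal sketch of applying this threshold to a one-best hypothesis is shown below. It assumes whitespace-separated columns with the score in the third column, as stated above; the remaining columns in the sample lines are made up, and the function name is hypothetical.

```python
# Approximate threshold reported above for the FAA data.
POSTERIOR_THRESHOLD = -68.0


def confident_entries(hypothesis_lines, threshold=POSTERIOR_THRESHOLD):
    """Return (fields, score) for lines whose third column exceeds the threshold."""
    kept = []
    for line in hypothesis_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        try:
            score = float(fields[2])
        except ValueError:
            continue  # third column is not numeric
        if score > threshold:
            kept.append((fields, score))
    return kept


# Made-up hypothesis lines; only the position of the score column follows the text.
sample = ["0 45 -61.2 BRAVO", "45 80 -72.9 DELTA", ""]
for fields, score in confident_entries(sample):
    print(fields, score)
```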

  • (03/24/05) isip_questions.text: This file is used during state tying, in which a decision-tree-based framework is used to cluster phonetically similar sounds. This download consists of:

    a) a master isip_questions_master.text file which consists of questions corresponding to the 41 monophones in the monophones_master.text file.
    b) monophones_master.text file.
    c) an isip_questions_29.text file which contains the phonetic question sets for the 29 monophones in the monophones_29.text file.
    d) monophones_29.text file.

    How to create the isip_questions.text file?

    This file can be obtained from the master questions file (isip_questions_master.text), the reduced monophone set (monophones_#.text file), and the full monophones_master.text file, which has 41 monophones. "diff" the reduced monophone set against the full monophone set, then remove the missing monophones from the master isip_questions file to create the new, reduced isip_questions.text file. If no monophones remain for a particular phonetic question, that question can be removed from the file.

    • tar xzvf isip_question_files.tar.gz
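
    The reduction procedure described above can be sketched as follows. Because the on-disk format of the questions file is not shown on this page, the sketch works on an assumed in-memory form (a mapping from a question name to its list of monophones); the function name, question names, and phone symbols are hypothetical.

```python
def reduce_questions(questions, master_phones, reduced_phones):
    """Drop phones absent from the reduced set, then drop emptied questions."""
    # Equivalent of "diff": phones in the master set but not in the reduced set.
    missing = set(master_phones) - set(reduced_phones)
    reduced = {}
    for name, phones in questions.items():
        remaining = [p for p in phones if p not in missing]
        if remaining:  # a question with no phones left is removed entirely
            reduced[name] = remaining
    return reduced


# Made-up question sets and phone inventories.
questions = {"Nasal": ["m", "n", "ng"], "Glide": ["y", "w"]}
master = ["m", "n", "ng", "y", "w"]
reduced = ["m", "n"]
print(reduce_questions(questions, master, reduced))
```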

  • (03/02/05) isip_proto_v5.16_beta : This package is an upgraded version of prototype system v5.15. It adds a feature that is not present in v5.15: computing the posterior score of every word in the word graph. These posteriors can be used as a confidence estimate.

    • tar xzvf isip_proto_v5.16_beta.tar.gz
    • cd isip_proto_v5.16_beta
    • ./configure
    • make
    • make install
    • source ISIP_ENV.sh

    A typical command line for generating lattices would look like this:

      trace_projector -p params_lattice.text

      The above command line will generate lattices, which are used as input for posterior computation. The command line for computing the posterior lattices is as follows:

      trace_projector -p params_lattice_posterior.text

      The parameter files for the two command lines above look very similar, but there are three main differences:
      1) There is a new parameter called compute_posterior. By default compute_posterior is set to 'no', but for posterior generation it should be set to 'yes'.
      2) The input_lattice list for posterior computation is the output_lattice list used by the first command line.
      3) The output_lattice list for posterior computation points to the files into which the lattices are written along with the posterior scores.
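
      The three differences can be summarized with a small sketch that derives the second set of parameters from the first. The parameter files are modeled here as plain dictionaries, and the keys input_lattice_list and output_lattice_list are hypothetical stand-ins; only compute_posterior is a parameter name given above, and the real file syntax is not shown on this page.

```python
def posterior_params(lattice_params, posterior_output_list):
    """Derive posterior-computation settings from the lattice-generation settings."""
    params = dict(lattice_params)  # everything else stays the same
    params["compute_posterior"] = "yes"  # difference 1
    # Difference 2: the lattices written by the first run are read by the second.
    params["input_lattice_list"] = lattice_params["output_lattice_list"]
    # Difference 3: new list of files that receive lattices with posterior scores.
    params["output_lattice_list"] = posterior_output_list
    return params


first_run = {"compute_posterior": "no", "output_lattice_list": "output_lattice.list"}
print(posterior_params(first_run, "output_lattice_posterior.list"))
```
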
  • (01/31/05) lexicon_and_monophone_files.tar.gz (v1.0) : This tar package contains the following:

    1) 'monophones.text' file which has all the monophones corresponding to these 18 words:
    Bravo, Delta, Echo, FoxTrot, Golf, Hotel, India, Juliet, Kilo, Mike, November, Oscar, Papa, Quebec, Tango, Victor, Whiskey, Yankee.

    2) 'lexicon.text' file which has the monophone mappings for the above words.

    3) 'master_lexicon.text' file which contains the monophone mappings for around 30,000 words.

    4) 'create_triphones.pl' perl script that uses the monophones as input to generate an all_xwrd_triphones.list file.
    Command: perl create_triphones.pl notags_monophones.text > all_xwrd_triphones.list

    5) 'notags_monophones.text' file contains simply the monophones from the monophones.text file without the comments. This file is used by the perl script to generate the all_xwrd_triphones.list file.
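
    The idea behind generating a list of all cross-word triphones can be sketched as below. This is not the create_triphones.pl script: it assumes the common "left-center+right" label notation and enumerates every combination, whereas the real script may treat silence or short-pause models specially. The function name and phone symbols are hypothetical.

```python
from itertools import product


def all_triphones(monophones):
    """Enumerate every left-center+right combination of the given monophones."""
    return [f"{left}-{center}+{right}"
            for left, center, right in product(monophones, repeat=3)]


# Three made-up monophones yield 3 * 3 * 3 = 27 triphone labels.
triphones = all_triphones(["b", "r", "aa"])
print(len(triphones), triphones[:3])
```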

  • (12/08/04) gen_trans_with_sp.pl (v1.0) : This script will generate the monophone transcription files for the corresponding word transcription files. This script will generate the transcriptions with 'sp' between word boundaries. In order to create the 'no sp' monophone transcription file, just remove the sp from the file created by the script.

    A typical command line for generating the monophone transcriptions will look like this:

    gen_trans_with_sp.pl lexicon.text all_word_transcription.text output_file

    Note: the all_word_transcription.text file in this case does not contain the utterance IDs, i.e., it contains just the word transcriptions.
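
    The conversion performed by such a script can be sketched as follows. This is an illustration rather than the script itself: it assumes one lexicon entry per line in the form "WORD phone phone ...", matches words case-insensitively, and inserts 'sp' only between words. The function names and phone spellings are hypothetical.

```python
def load_lexicon(lines):
    """Build a word -> monophone list mapping from 'WORD ph1 ph2 ...' lines."""
    lexicon = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            lexicon[fields[0].lower()] = fields[1:]
    return lexicon


def words_to_monophones(transcription, lexicon, insert_sp=True):
    """Expand a word transcription into monophones, with 'sp' between words."""
    phones = []
    words = transcription.split()
    for index, word in enumerate(words):
        phones.extend(lexicon[word.lower()])  # KeyError flags a word missing from the lexicon
        if insert_sp and index < len(words) - 1:
            phones.append("sp")
    return " ".join(phones)


lexicon = load_lexicon(["BRAVO b r aa v ow", "KILO k iy l ow"])
print(words_to_monophones("bravo kilo", lexicon))
print(words_to_monophones("bravo kilo", lexicon, insert_sp=False))
```

    Passing insert_sp=False corresponds to the 'no sp' variant mentioned above, which is otherwise obtained by deleting the sp entries from the generated file.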

  • (11/07/04) Multiple-CPU ASR Tutorial (v5.0): This package is used to run recognition experiments on the FAA data. A word error rate of 0.8% will be obtained if we train and test on the same data with state tying thresholds as described in the commandline below. The decoding is performed using a loop grammar.

    • tar xzvf asr_va_tutorial_v5.0.tar.gz
    • cd asr_package
    • cd asr_va_tutorial_v5.0
    • source <install directory for v5.15>ISIP_ENV.sh
    • ./configure --prefix=.
    • make
    • make install
    • source ISIP_WSJ_ENV.sh
    • wsj_run -help

    A typical command line for training crossword 8-mixture triphone models will look like this:

      cd <some_exp_train_directory>;
      wsj_run -mixtures 8 -model_type xwrd_triphone -train_mfc_list all_mfcc_features.list -split_threshold 10 -merge_threshold 10 -num_occ_threshold 50 \
      -cpus_train isip218 isip218

      If the test data will be unseen during training, it is recommended to use a num_occ_threshold of 400 and merge and split thresholds of around 20. These thresholds were found by cross-validating on the FAA data.

    A typical command line for testing will look like this:

      cd <some_exp_test_directory>;
      wsj_run -mixtures 8 -model_type xwrd_triphone -test_mfc_list all_mfcc_features.list -cpus_test isip218 isip218 \
      -models_path <path_to_some_exp_train_directory>


    The above command line will generate output files that contain the triphone hypotheses. If required, the triphone results can be converted to their monophone equivalents using 'convert_tri_to_mono', a utility newly added to this package. It is a perl script that is installed along with the other utilities in the package.

    The commandline to convert the triphone result to monophone is as follows:

    convert_tri_to_mono <triphone output filename> <monophone output filename>
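
    Conceptually, this conversion keeps only the center phone of each triphone label. The sketch below illustrates the idea under the assumption that labels use the "left-center+right" notation; it is not the convert_tri_to_mono script, and the function names are hypothetical.

```python
def triphone_to_monophone(label):
    """Strip the left and right context from a left-center+right label."""
    center = label.split("-", 1)[-1]  # drop the left context, if present
    return center.split("+", 1)[0]    # drop the right context, if present


def convert_hypothesis(line):
    """Convert a whitespace-separated sequence of triphone labels to monophones."""
    return " ".join(triphone_to_monophone(label) for label in line.split())


print(convert_hypothesis("sil-b+r b-r+aa sp"))  # prints "b r sp"
```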

  • (10/05/04) Models trained on the segmented FAA data (Prototype): The models trained on the segmented FAA data can be downloaded from here. This tar package contains the entire train directory. The extracted train directory must replace the old train directory in your $ISIP_WSJ/exp directory.

    • tar xzvf models_faa.tar.gz

  • (10/05/04) Segmented FAA features (mfcc): The segmented FAA features can be downloaded by clicking the link above.

    • tar xzvf segmented_faa_features.tar.gz

  • (10/05/04) Segmented FAA raw data: The segmented FAA data can be downloaded by clicking the link above.

    • tar xzvf segmented_faa_raw.tar.gz

  • (09/27/04) DET curve plotting package: This package is provided by NIST (National Institute of Standards and Technology) for plotting DET curves. It has been slightly modified to suit specific requirements. This software requires Matlab.

    • tar xzvf det_package.tar.gz
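
    For readers unfamiliar with DET curves, the sketch below computes the quantities such a curve plots: the miss rate and the false alarm rate at each candidate threshold. It is not part of the NIST package. It assumes that a trial is accepted when its score is at or above the threshold, and it omits the normal-deviate axis scaling used in the actual plots. The function name and scores are made up.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold."""
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    points = []
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # A target trial scoring below the threshold is rejected: a miss.
        misses = sum(1 for s in target_scores if s < threshold)
        # A non-target trial scoring at or above the threshold is accepted: a false alarm.
        false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
        points.append((threshold,
                       misses / len(target_scores),
                       false_alarms / len(nontarget_scores)))
    return points


for point in det_points([-60.0, -65.0], [-70.0, -80.0]):
    print(point)
```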

  • (09/27/04) gen_wer.pl (v1.0) : This is a scoring script that post-processes the lattices generated by the prototype system. Please be sure you have Perl installed on your system.

    A typical command line for scoring a lattice will look like this:

    gen_wer.pl lattices_path output_path delta_value format_level alignment_file
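
    For reference, the word error rate (WER) figures quoted on this page follow the usual definition: substitutions, deletions, and insertions divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence; it does not reproduce gen_wer.pl, whose lattice post-processing is not described here, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER in percent between two whitespace-separated word strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)


print(word_error_rate("bravo delta", "bravo kilo delta"))  # one insertion, prints 50.0
```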

  • (07/09/04) Multiple-CPU ASR Tutorial (v4.0): The fourth release of a package that is a modified version of the Aurora scripts. This package is primarily meant for word spotting experiments. Please be sure that you have already installed the ISIP prototype system (v5.14) before running this application. To install this package, follow the instructions below. Detailed instructions are included in the release's AAREADME.text and the INSTRUCTIONS.text files.

    • tar xzvf asr_va_tutorial_v4.0.tar.gz
    • cd asr_va_tutorial_v4.0
    • source <install directory for v5.14>ISIP_ENV.sh
    • ./configure --prefix=.
    • make
    • make install
    • source ISIP_WSJ_ENV.sh
    • wsj_run -help

    A typical command line for decoding 1-mixture monophone models will look like this:

      cd <exp_directory>;
      wsj_run -mixtures 1 -model_type monophone -decode_mode grammar_decoding -align_mode phone -test_mfc_list test_1247_v4.0_mfc.list \
      -cpus_test isip218 isip218 -models_path .


  • (11/07/03) Multiple-CPU ASR Tutorial (v3.0): The third release of a package that is a modified version of the Aurora scripts. This package supports a few new features including ngram decoding that can be used to generate the alignments for the unseen phrases. The ngram decoding is based on our Switchboard language model. Note that the decoding will require about 700 MB of main memory because of the large vocabulary size. Please be sure that you have already installed the ISIP prototype system (v5.14) before running this application. To install this package, follow the instructions below. Detailed instructions are included in the release's AAREADME.text file.

    • tar xzvf asr_va_tutorial_v3.0.tar.gz
    • cd asr_va_tutorial_v3.0
    • source <install directory for v5.14>ISIP_ENV.sh
    • ./configure --prefix=.
    • make
    • make install
    • source ISIP_WSJ_ENV.sh
    • wsj_run -help

    A typical command line for training crossword 4-mixture triphone models will look like this:

      cd <some_exp_train_directory>;
      wsj_run -mixtures 4 -model_type xwrd_triphone -train_mfc_list train_1249_v3.0_mfc.list \
      -cpus_train isip218 isip218

    A typical command line for testing (generating alignments) will look like this:

      cd <some_exp_test_directory>;
      wsj_run -mixtures 4 -model_type xwrd_triphone -decode_mode bigram_decoding -align_mode word -test_mfc_list devtest_364_v3.0_mfc.list -cpus_test isip218 isip218 \
      -models_path <path_to_some_exp_train_directory>


  • (11/07/03) Multiple-CPU ASR Tutorial (v2.0): The second release of a package that is a modified version of the previous version. It supports training cross-word models and network decoding for the 1249 pre-transcribed speech files provided in the Creare Phase 02 data. Please be sure that you have already installed the ISIP prototype system (v5.14) before running this application. To install this package, follow the instructions below. Detailed instructions are included in the release's AAREADME.text file.

    • tar xzvf asr_va_tutorial_v2.0.tar.gz
    • cd asr_va_tutorial_v2.0
    • source <install directory for v5.14>ISIP_ENV.sh
    • ./configure --prefix=.
    • make
    • make install
    • source ISIP_WSJ_ENV.sh
    • wsj_run -help

    A typical command line for training crossword 4-mixture triphone models will look like this:

      cd <some_exp_train_directory>;
      wsj_run -mixtures 4 -model_type xwrd_triphone -train_mfc_list train_1249_v2.0_mfc.list \
      -cpus_train isip218 isip218

    A typical command line for testing (generating alignments) will look like this:

      cd <some_exp_test_directory>;
      wsj_run -mixtures 4 -model_type xwrd_triphone -align_mode phone -test_mfc_list devtest_1249_v2.0_mfc.list -cpus_test isip218 isip218 \
      -models_path <path_to_some_exp_train_directory>

    This should result in a WER of 2.0%.

  • (11/07/03) Frontend for Waveform Audio Files (v1.0): The frontend which converts the audio files in WAV format, sampled at 8 kHz into MFCC features. An AAREADME.text file included in the package provides detailed instructions.

  • (11/07/03) Production System (r00_n11_t02): Production System release that supports both reading and writing the WAV format with ADPCM compression, as supported by SGI's audiofile library.

    To install this package, follow the instructions below.

    • tar xzvf isip_r00_n11_t02.tar.gz
    • cd isip_r00_n11_t02
    • ./configure [--prefix=/<install directory>] [--with-audiofile-prefix=/<audiofile install directory>] [--with-sphere-prefix=/<sphere install directory>] [--with-sctk-prefix=/<sctk install directory>]
    • source ISIP_BASE_ENV.sh
    • make depend
    • make install


  • (10/30/03) Forced Alignments (v1.0): The forced alignments of the data collection phase_02.

  • (09/29/03) Verification System (v1.0): The first release of a verification toolkit based on the production system. An AAREADME file included in the release provides detailed instructions.

  • (08/06/03) Production System (r00_n11_t01): Production System release that supports both reading and writing NIST's Sphere format and the formats supported by SGI's audiofile library.

    To install this package, follow the instructions below.

    • tar xzvf isip_r00_n11_t01.tar.gz
    • cd isip_r00_n11_t01
    • ./configure [--prefix=/<install directory>] [--with-audiofile-prefix=/<audiofile install directory>] [--with-sphere-prefix=/<sphere install directory>] [--with-sctk-prefix=/<sctk install directory>]
    • source ISIP_BASE_ENV.sh
    • make depend
    • make install


  • (07/24/03) Production System (r00_n11_t00): Production System release that supports NIST's Sphere format and the formats supported by SGI's audiofile library.

    To install this package, follow the instructions below.

    • tar xzvf isip_r00_n11_t00.tar.gz
    • cd isip_r00_n11_t00
    • ./configure [--prefix=/<install directory>] [--with-audiofile-library=/<audiofile lib directory>] [--with-audiofile-includes=/<audiofile include directory>] [--with-sp-library=/<sphere lib directory>] [--with-sp-includes=/<sphere include directory>]
    • source ISIP_BASE_ENV.sh
    • make depend
    • make install


  • (07/07/03) MFCC Features (v00): A recipe for converting 16 kHz raw files to MFCC features stored in raw files.

  • (06/26/03) Monophone Tutorial Overview (v01): This contains a brief synopsis of each step required to train and evaluate a single-mixture context-independent (monophone) system implemented in the Multiple-CPU ASR Tutorial (v1.0) package.

  • (06/25/03) Monophone Training Overview (v00): This page gives an overview of the recipe used in the monophone training process implemented in the Multiple-CPU ASR Tutorial (v1.0) package.

  • (06/17/03) Creare Phase-01 Set (v1.0): This package contains the training set, test set and devtest definitions. These definitions will allow you to train and decode.

  • (06/16/03) Multiple-CPU ASR Tutorial (v1.0): The first release of a package that is a modified version of the Aurora scripts, and supports a few new features including network decoding. Please be sure that you have already installed the ISIP prototype system (v5.14) before running this application. To install this package, follow the instructions below. Detailed instructions are included in the release's AAREADME.text file.

    • tar xzvf asr_va_tutorial_v1.0.tar.gz
    • cd asr_va_tutorial_v1.0
    • source <install directory for v5.14>ISIP_ENV.sh
    • ./configure --prefix=.
    • make
    • make install
    • source ISIP_WSJ_ENV.sh
    • wsj_run -help

    A typical command line for training single mixture monophone models will look like this:


    A typical command line for testing will look like this:

      cd <some_exp_test_directory>;
      wsj_run -test_mfc_list devtest_255_v1.0_mfc.list -cpus_test isip206 isip207 isip208 isip209 -models_path <path_to_some_exp_train_directory>

    This should result in a WER of 3.4%.

  • (06/04/03) Project Bibliography (v00): The list below gives a good overview of various approaches that might be relevant to this project.