Steps to follow for building a speech recognizer to recognize
telephone numbers:
-------------------------------------------------------------

1. First, record the speech on the DAT machine to obtain an audio file.
   The audio file is produced in raw format.
   Command : narecord -s 8000 test.raw

2. Convert this raw file to a wav file, because wav is the format that
   the feature extraction program accepts.
   Command : /ftp/pub/resources/courses/ece_8993_speech/homework/1998/problem_07/balakrishnama/scripts/raw_to_nist.sh

3. Once you have the .wav file (although the script is named raw_to_nist,
   it produces the wav format), extract the features of this file. The
   decoder requires these features as input.
   Command 1 : feature/cparam -m -w 25 -p 12 -d -g -e -H NIST data/four_digit/test.wav data/four_digit/test.mfcc
   Command 2 : feature/cview -h -n 39 data/four_digit/test.mfcc > data/four_digit/test.dat

4. The program cview prints the values in a different format from the
   one the decoder requires (39-dimensional feature vectors), so convert
   its output to that format using the script. (You will also have to use
   some macros, because the output contains other items such as column
   numbers.) The data you obtain from cview has this format:

      [1] 9.3333 [2] 5.4444 [3] 4.55555 [4] 3.4444 [5] 4.5555 [6] 4.56778
      ......
      ......

   Strip all of the [..] numbers, which are frame numbers, and keep only
   the feature values; they should all be on one line. Save the result to
   a file; this file becomes your input.text.

5. As the final step, use the trace_projection decoder to recognize the
   speech and obtain the output (a speech-to-text file).
   Command : nice -19 /ftp/pub/resources/courses/ece_8993_speech/homework/1998/utilities/decoder/trace_projection/bin/i386_SunOS_5.6/trace_projector -p data/input_files/params.text -n 5 -c 3 -g 2 -demo

   Before executing the decoder, you have to prepare your params.text
   file; please refer to:
      /ftp/pub/resources/courses/ece_8993_speech/homework/1998/problem_07/balakrishnama/data/input_files/params.text

   Most of the files are taken from our main decoder version, but the
   following files need to be prepared for this experiment:
      grammar.text
      lexicon.text
      input.text   - already prepared in step 4
   For grammar.text and lexicon.text, please refer to my directory.

6. If you run the decoder in -demo mode, you need to key in the number of
   frames after issuing the command. The number of frames in your input
   file, input.text, can be found by running wc on it. Once you key in the
   number of frames, the output is written to the output file specified
   in params.text.

7. That's about it! The output may or may not match what you spoke, but
   silences and sp may be recognized easily. This is because our system
   is trained on telephone data, so it does not accurately recognize data
   recorded on a DAT machine.


All the commands used:
----------------------

# Record the file from the DAT machine in raw format
# Convert the raw file to a wav file using the script scripts/raw_to_nist.sh
# Extract the MFCC features from the wav file; these feature values are the decoder's input
feature/cparam -m -w 25 -p 12 -d -g -e -H NIST data/four_digit/test.wav data/four_digit/test.mfcc
# Print the MFCC feature values to a file
feature/cview -h -n 39 data/four_digit/test.mfcc > data/four_digit/test.dat
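
Step 4 leaves the conversion from cview output to input.text to a script
and some macros that are not shown here. The shell function below is a
minimal sketch of that conversion, not the script used in this
experiment. The name cview_to_input is hypothetical, and the sketch
assumes that (a) every [n] token in the cview output is an index to be
discarded, (b) each frame has 39 values, matching "cview -n 39", and
(c) "one line" means one line per frame, so that wc -l on the result
gives the frame count that step 6 asks for. If the decoder instead
expects all values on a single line, the frame count would be the word
count (wc -w) divided by 39.

```shell
# Hypothetical helper, not one of the course scripts: convert cview output
# (e.g. test.dat) into decoder input (input.text).
# Assumptions: every "[n]" token is an index to discard, and each frame
# has a fixed number of values (39 by default, matching "cview -n 39").
# Usage: cview_to_input <cview_output> <decoder_input> [values_per_frame]
cview_to_input() {
  awk -v dim="${3:-39}" '
  {
    gsub(/\[[0-9]+\]/, " ")          # blank out "[n]" tokens; awk re-splits $0
    for (i = 1; i <= NF; i++) {
      line = (line == "" ? $i : line " " $i)
      if (++n % dim == 0) {          # a full frame collected: one line per frame
        print line
        line = ""
      }
    }
  }
  END { if (line != "") print line }  # flush a trailing partial frame
  ' "$1" > "$2"
}
```

For example, running cview_to_input data/four_digit/test.dat input.text
followed by wc -l < input.text would print the number of frames to key
in for the -demo run, provided the one-frame-per-line assumption holds.
Regrouping by a fixed count, rather than by cview's own line breaks,
keeps each frame together even if cview wraps one frame's values across
several lines.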