This is a graphical user interface tool for speech segmentation and
speech transcription. The tool provides spectrograms and energy plots,
speech selection, and audio playback capabilities. The tool is a
single channel version which is specifically designed for quick access
to multiple files from a single speaker (mono). It is written partly
in object-oriented C++ (using GNU's gcc compiler) which is interfaced
to Tcl-Tk (v8.0) utilities.
To download the current version of the Transcriber tool, click here. Also, please feel free to send us comments or suggestions regarding the tool.
The interface: The main features include
The energy plot display: The energy plot features include
The signal plot display: The signal plot features include
The spectrogram display: The spectrogram features include
The Transcriber tool currently supports only 16-bit single-channel linear data (RAW). To use the Transcriber with other types of data, you will need to use the NIST SPHERE tools to convert your data to RAW format.

To use your own data with the Transcriber, you will need to set up a configuration file with parameters such as the audio device, audio server, sample frequency, and number of bytes per sample. You will also need to specify the lexicon file path (lexfile) and the call file path (callfile). The lexicon file is a user-defined reference dictionary that can be viewed, searched, and modified according to one's preference. The call file contains the locations of the transcription file, the audio list, and the comment file. Each of these three files has a distinct role. The transcription file contains a set of key-value pairs that describe each entry in the file. The comment file contains a set of bookmarks that record the start and stop times, along with the duration, of the transcription process. Finally, the audio list contains the locations of all the audio data associated with the given transcription file.

An example directory structure of the Transcriber follows:

Several options are available for the display and audio facilities. These options are accessible by clicking the Config button on the main screen. In the first section, under Session File, the current configuration file, comment file, transcription file, and lexicon file are listed. You can also browse for and select another configuration file via the Browse button. In the second section, under Audio-Related Parameters, you can set the audio device (sparc, dat, ncd, x86) for your system. You can also select the audio server (speaker, headphone, line) from the options offered. In the third section, under Energy Plot Parameters, you can change the energy plot parameters.
The options include changing the frame length, window length, and RMS scale factor of the energy plot. You can also enlarge or shrink the energy plot canvas by setting the Canvas Size option to your preference. In the fourth section, under Spectrogram Parameters, you can set the brightness and contrast of the spectrogram to specific values instead of using the slider bars on the main screen. Finally, in the last section, under Miscellaneous, you can preemphasize the data for the energy plot and spectrogram. Preemphasis can be switched on or off, and the preemphasis coefficient can be set to any desired value. You can also window the data using standard window functions: Hamming, Hanning, rectangular, Bartlett, and Blackman.

The Transcriber also has an auto-fill facility that completes the current word when you press the Tab key. Auto fill will not work if the word to be completed is not in the lexicon file. If a word has several possible completions, a pop-up box lists all of them, and you can then select the desired word from the list.

The Transcriber can be used not only for speech segmentation and transcription but also for viewing signals. To view a signal without dealing with the transcription protocols, click the Lock button on the screen. Apart from viewing the signal, several display options are available, including zooming into a section of the signal, changing the amplitude of the signal, and setting the brightness and contrast of the spectrogram.
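To make the Energy Plot and Miscellaneous options described above more concrete, here is a minimal Python sketch of how a preemphasized, windowed RMS energy contour could be computed. It is not the Transcriber's own code. The function names, the default coefficient of 0.95, and the interpretation of frame length as the hop between frames and window length as the analysis span (both in samples) are assumptions made for illustration.

```python
import math

def preemphasize(samples, coeff=0.95):
    """First-order preemphasis: y[n] = x[n] - coeff * x[n-1].
    The default coefficient is an assumed, commonly used value."""
    if not samples:
        return []
    out = [float(samples[0])]
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out

def hamming(length):
    """Standard symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_rms(samples, frame_len, window_len, scale=1.0):
    """Scaled RMS energy per frame.
    Assumption: frames advance by frame_len samples and each frame
    analyzes window_len samples weighted by a Hamming window."""
    window = hamming(window_len)
    energies = []
    for start in range(0, len(samples) - window_len + 1, frame_len):
        segment = samples[start:start + window_len]
        acc = sum((s * w) ** 2 for s, w in zip(segment, window))
        energies.append(scale * math.sqrt(acc / window_len))
    return energies
```

A typical call would be `frame_rms(preemphasize(samples), frame_len, window_len, scale)`, with `scale` playing the role of the RMS scale factor. Swapping `hamming` for another window function would mirror the window choice offered in the Config dialog.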
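The auto-fill behavior described above (no completion when the word is not in the lexicon, direct completion when there is one match, and a pop-up list when there are several) can be sketched as a prefix lookup over a sorted word list. This Python sketch is illustrative only. The function names and the returned action labels are hypothetical, and the Transcriber's actual lexicon format and lookup method are not described here.

```python
import bisect

def complete(prefix, lexicon):
    """Return all words in a sorted lexicon that start with prefix."""
    # Binary search for the first word that is >= prefix; every word
    # sharing the prefix sits in one contiguous run from that point.
    lo = bisect.bisect_left(lexicon, prefix)
    matches = []
    for word in lexicon[lo:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches

def autofill(prefix, lexicon):
    """Decide what a Tab press would do (labels are hypothetical)."""
    matches = complete(prefix, lexicon)
    if not matches:
        return ("none", [])        # word not in the lexicon: nothing happens
    if len(matches) == 1:
        return ("fill", matches)   # unique completion: fill it in
    return ("choose", matches)     # several candidates: show a pick list

lexicon = sorted(["speech", "spectrogram", "speaker", "signal"])
print(autofill("spec", lexicon))
print(autofill("spe", lexicon))
print(autofill("xyz", lexicon))
```

Keeping the lexicon sorted lets the lookup use binary search, which stays fast as the user adds words to the dictionary.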