|   | 
 
  
   
   A:
   
    
     
     
     
      | 
       
       acoustic model
       
       | 
      
       
       model used by a speech recognizer to decode spoken language. It
       represents numerically, in a form that can be stored on a computer,
       how the units of a language (e.g., phones) sound when spoken.
       
       | 
      
     
     
     
      | 
       
       annotation graph
       
       | 
      
       
       a formal framework for representing linguistic annotations of time series 
       data. Annotation graphs abstract away from file formats, coding schemes 
       and user interfaces, providing a logical layer for annotation systems.
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   B:
   
    
     
     
     
      | 
       
       Bayes' Rule
       
       | 
      
       
       an equation that decomposes a posterior probability
       (e.g., P(W|A)) into the product of a conditional likelihood
       (e.g., P(A|W)) and a prior (e.g., P(W)), divided by the probability
       of the observed data (e.g., P(A)). From a high-level point of view,
       this rule provides a way to combine new data with existing knowledge.
       It also provides a theoretical basis for learning or training
       intelligent systems.
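A small worked sketch of Bayes' rule, using two hypothetical word sequences with made-up priors and likelihoods (the numbers are illustrative only). It also shows why a decoder can ignore P(A): it is the same for every W, so the most probable W maximizes P(A|W)P(W) alone.

```python
# Hypothetical values for one observation A (illustration only):
# priors P(W) from a language model, likelihoods P(A|W) from an acoustic model.
priors = {"recognize speech": 0.6, "wreck a nice beach": 0.4}
likelihoods = {"recognize speech": 0.03, "wreck a nice beach": 0.02}

# P(A) = sum over W of P(A|W) P(W)  (law of total probability)
p_a = sum(likelihoods[w] * priors[w] for w in priors)

# Bayes' rule: P(W|A) = P(A|W) P(W) / P(A)
posteriors = {w: likelihoods[w] * priors[w] / p_a for w in priors}

# P(A) does not depend on W, so the argmax only needs P(A|W) P(W).
best = max(priors, key=lambda w: likelihoods[w] * priors[w])
print(posteriors, best)
```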
       
       | 
      
     
     
     
      | 
       
       best-first search
       
       | 
      
       
       search algorithm that uses an evaluation function, h(N), to estimate how
       promising it is to pursue node N, and always expands the most promising
       node next. It evaluates hypotheses as they evolve.
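A minimal sketch of greedy best-first search, assuming lower h(N) means "more promising" and that the graph and h values are hypothetical. This variant ranks nodes by h alone; other variants (such as A*) also add the cost accumulated so far.

```python
import heapq

def best_first_search(start, goal, neighbors, h):
    """Greedy best-first search: always expand the frontier node with lowest h(N).

    `neighbors` maps a node to its successors; `h` scores nodes (lower is
    more promising). Returns the path found, or None if the goal is unreachable.
    """
    frontier = [(h(start), start)]
    parent = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:        # walk back through parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                heapq.heappush(frontier, (h(nxt), nxt))
    return None

graph = {"S": ["A", "B"], "A": ["G"], "B": ["C"], "C": ["G"]}
estimate = {"S": 3, "A": 1, "B": 2, "C": 1, "G": 0}   # hypothetical h(N) values
print(best_first_search("S", "G", graph, estimate.get))  # ['S', 'A', 'G']
```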
       
       | 
      
     
     
     
      | 
       
       big-endian byte order
       
       | 
      
       
       byte order where the first byte (at the lowest storage address) in a
       sequence is the most significant.
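As an illustration, Python's standard `struct` module can pack the same 16-bit value in either byte order ("&gt;" selects big-endian, "&lt;" little-endian); the value 0x1234 is arbitrary.

```python
import struct

value = 0x1234  # an arbitrary 16-bit value

big = struct.pack(">H", value)     # big-endian: most significant byte first
little = struct.pack("<H", value)  # little-endian: least significant byte first

print(big.hex(), little.hex())  # 1234 3412
```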
       
       | 
      
     
     
     
      | 
       
       biphone
       
       | 
      
       
       context-dependent phone model that models either the left or the right
       context of a phone.
       
       | 
      
     
     
     
      | 
       
       breadth-first search
       
       | 
      
       
       search algorithm that explores all alternatives simultaneously
       level-by-level. 
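A short sketch of breadth-first traversal over a hypothetical tree: a first-in, first-out queue makes the search finish each level before moving to the next.

```python
from collections import deque

def breadth_first_order(start, neighbors):
    """Return nodes in the order breadth-first search visits them."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()           # oldest node first: level by level
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append(nxt)
    return visited

tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
print(breadth_first_order("root", tree))  # ['root', 'a', 'b', 'a1', 'a2', 'b1']
```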
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   C:
   
    
     
     
     
      | 
       
       cepstral mean subtraction
       
       | 
      
       
       technique for reducing channel distortions (e.g., differences between
       microphones). It computes the mean cepstral vector over an utterance and
       subtracts it from each feature vector, producing normalized cepstral
       vectors that are less sensitive to the recording conditions.
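A minimal sketch of cepstral mean subtraction, assuming the mean is computed over a whole utterance (some systems use a running mean instead) and using hypothetical 2-dimensional vectors. A constant offset added to every frame, such as a fixed channel effect in the cepstral domain, is removed entirely.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames of an utterance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

# Three hypothetical 2-dimensional cepstral vectors.
utterance = [[1.0, 4.0], [3.0, 6.0], [5.0, 8.0]]
print(cepstral_mean_subtraction(utterance))
# [[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]]
```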
       
       | 
      
     
     
     
      | 
       
       cepstrum
       
       | 
      
       
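       the result of applying an inverse Fourier transform (in practice often a
       discrete cosine transform) to the log-spectrum of the speech signal. The
       log-spectrum compresses the signal's dynamic range, loosely simulating
       human hearing.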
       
       | 
      
     
     
     
      | 
       
       client
       
       | 
      
       
       computer or program that can download files for manipulation, run
       applications, or request application-based services from a file
       server.  
       
       | 
      
     
     
     
      | 
       
       clustering
       
       | 
      
       
       initialization technique in which model parameters are set using
       sufficient statistics estimated from different regions (clusters) of
       the training data.
       
       | 
      
     
     
     
      | 
       
       coarticulation
       
       | 
      
       
       variation in the pronunciation of a phoneme due to the
       influence of neighboring phonemes.
       
       | 
      
     
     
     
      | 
       
       Concurrent Versions System (CVS)
       
       | 
      
       
       open-source network-transparent
       version control system.  
       
       | 
      
     
     
     
      | 
       
       confusion pairs
       
       | 
      
       
       a pair of words identified by the scoring software in which the first
       word (from the reference) is likely to be misrecognized as the second
       word (in the hypothesis). Confusion pairs provide diagnostic information
       that can be used to improve the performance of the system.
       
       | 
      
     
     
     
      | 
       
       context dependent model
       
       | 
      
       
       phone model that takes into account the phenomenon of coarticulation:
       a phone may be pronounced differently depending on the phones
       surrounding it.
       
       | 
      
     
     
     
      | 
       
       context free grammar
       
       | 
      
       
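       grammars whose production rules each have a single non-terminal symbol
       on the left-hand side and any sequence of terminal and non-terminal
       symbols on the right-hand side. This increases their power and
       flexibility beyond regular grammars, but recognizing sentences in the
       language requires a push-down automaton.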
       
       | 
      
     
     
     
      | 
       
       context independent model
       
       | 
      
       
       phone models that do not consider the influence of surrounding
       phonemes on the pronunciation of a given phoneme.
       
       | 
      
     
     
     
      | 
       
       context sensitive grammar
       
       | 
      
       
       grammars whose production rules may specify a context (surrounding
       symbols) on the left-hand side, so that a symbol can be rewritten
       differently depending on its neighbors. They can represent the context
       of a word more specifically than context free grammars, but require a
       more powerful automaton to recognize sentences in the language.
       
       | 
      
     
     
     
      | 
       
       continuous speech recognition
       
       | 
      
       
       recognition of speech in which the words are spoken in sequence without
       pauses separating them.
       
       | 
      
     
     
     
      | 
       
       cross-validation
       
       | 
      
       
       technique that allows the training database to also be used to validate
       recognition performance. One common method, V-fold cross-validation,
       divides the database into V equal parts. Each part serves in turn as an
       independent test set, leaving the remaining (V-1) parts for training.
       This method is often used when the training data is limited.
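A minimal sketch of V-fold splitting over a hypothetical list of utterance IDs. Dealing items round-robin into folds is just one simple choice; in practice, splits are often made by speaker so that the same speaker does not appear in both training and test sets.

```python
def v_fold_splits(items, v):
    """Yield (train, test) pairs for V-fold cross-validation.

    Items are dealt round-robin into V parts of (nearly) equal size; each part
    is the test set once, and the other V-1 parts form the training set.
    """
    folds = [items[i::v] for i in range(v)]
    for k in range(v):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test

utterances = ["u1", "u2", "u3", "u4", "u5", "u6"]
for train, test in v_fold_splits(utterances, 3):
    print("test:", test, "train:", train)
```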
       
       | 
      
     
     
     
      | 
       
       cross-word
       
       | 
      
       
       triphone models that extend across word boundaries.
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   D:
   
    
     
     
     
      | 
       
       decision trees
       
       | 
      
       
       binary tree to classify target objects by asking binary questions
       in a hierarchical manner.  
       
       | 
      
     
     
     
      | 
       
       decoder
       
       | 
      
       
       also known as a recognizer or evaluation module. This module implements
       Bayes' rule and produces the most probable word sequence. It can be
       implemented in many different ways, including Viterbi beam search and
       stack search.
       
       | 
      
     
     
     
      | 
       
       deletion
       
       | 
      
       
       the type of error in which the recognizer's hypothesis does
       not contain a word in the reference transcription.
       The relative frequency of deletion and insertion errors can be
       controlled by an insertion penalty.
       
       | 
      
     
     
     
      | 
       
       depth-first search
       
       | 
      
       
       search algorithm that explores a single path until it reaches its
       conclusion.  
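A short sketch of depth-first traversal over the same kind of hypothetical tree: a last-in, first-out stack makes the search follow one path to its end before backing up.

```python
def depth_first_order(start, neighbors):
    """Return nodes in the order depth-first search visits them."""
    visited, seen = [], set()
    stack = [start]
    while stack:
        node = stack.pop()               # newest node first: follow one path
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push children in reverse so the first child is explored first.
        for nxt in reversed(neighbors.get(node, [])):
            if nxt not in seen:
                stack.append(nxt)
    return visited

tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
print(depth_first_order("root", tree))  # ['root', 'a', 'a1', 'a2', 'b', 'b1']
```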
       
       | 
      
     
     
     
      | 
       
       digital signal processing (DSP)
       
       | 
      
       
       analysis of signals in digital form to obtain useful
       information.  For speech signals, the information extracted includes
       attributes needed by the speech recognizer.  It is also known as
       feature extraction or front-end processing.  
       
       | 
      
     
     
     
      | 
       
       downsample
       
       | 
      
       
       reducing the sample rate of a digital sound file. 
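A minimal sketch of downsampling by an integer factor, with hypothetical sample values. Samples are low-pass filtered before decimation to limit aliasing; the moving average used here is only a crude stand-in for a properly designed anti-aliasing filter.

```python
def downsample(samples, factor):
    """Reduce the sample rate by an integer factor.

    A crude moving-average low-pass filter (length `factor`) is applied
    first to limit aliasing, then every `factor`-th sample is kept.
    """
    filtered = []
    for n in range(len(samples)):
        window = samples[max(0, n - factor + 1): n + 1]
        filtered.append(sum(window) / len(window))
    return filtered[::factor]

signal_16k = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]  # hypothetical 16 kHz data
signal_8k = downsample(signal_16k, 2)                     # now at 8 kHz
print(signal_8k)  # [0.0, 3.0, 7.0, 11.0]
```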
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   E:
   
    
     
     
     
      | 
       
       energy
       
       | 
      
       
       an attribute of a signal that corresponds to its magnitude; for a frame
       of speech it is typically computed as the sum of the squared sample
       values.
       
       | 
      
     
     
     
      | 
       
       energy normalization
       
       | 
      
       
       technique for normalizing the energy feature, in which energy is
       computed as the log of the signal energy.
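A sketch of per-frame log energy plus one possible normalization. Subtracting the utterance's peak log energy, so the loudest frame sits at 0, is an assumption chosen for illustration, not necessarily the scheme this glossary's system uses. The floor value and frames are also hypothetical.

```python
import math

def log_energy(frame, floor=1e-10):
    """Log of the frame energy (sum of squared samples), floored to avoid log(0)."""
    return math.log(max(sum(x * x for x in frame), floor))

def normalize_log_energies(frames):
    """Assumed variant: shift so the loudest frame has log energy 0."""
    energies = [log_energy(f) for f in frames]
    peak = max(energies)
    return [e - peak for e in energies]

frames = [[0.1, -0.1], [1.0, -1.0], [0.5, 0.5]]   # hypothetical frames
print(normalize_log_energies(frames))
```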
       
       | 
      
     
     
     
      | 
       
       enlistment
       
       | 
      
       
       a programmer's copy of the source code development environment
       which is used to modify and debug code. Often this is stored in a
       user's local environment, and not readily accessible by other software
       developers.
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   F:
   
    
     
     
     
      | 
       
       Fast Fourier Transform
       
       | 
      
       
       computationally efficient algorithm for computing the discrete Fourier
       transform (DFT).
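A sketch of the radix-2 Cooley-Tukey FFT, assuming the input length is a power of two. It recursively splits the signal into even- and odd-indexed samples, which brings the cost down from O(N^2) for the direct DFT (included for comparison) to O(N log N).

```python
import cmath

def dft(x):
    """Direct O(N^2) discrete Fourier transform, for reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.0]   # hypothetical samples
print(max(abs(a - b) for a, b in zip(fft(signal), dft(signal))))  # tiny
```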
       
       | 
      
     
     
     
      | 
       
       feature
       
       | 
      
       
       attribute of speech needed by the recognizer to differentiate words and
       phonemes.   
       
       | 
      
     
     
     
      | 
       
       feature extraction
       
       | 
      
       
       process of measuring certain attributes of speech needed by the speech
       recognizer to differentiate phonemes of a word.  It is also known as
       front-end processing and signal processing.  
       
       | 
      
     
     
     
      | 
       
       feature stream
       
       | 
      
       
       the speech signal can be decomposed into a sequence of feature vectors, 
       typically spaced 10 ms in time, that represent a parameterization of the 
       salient information in the signal.
       
       | 
      
     
     
     
      | 
       
       feature vector
       
       | 
      
       
       list of numerical measurements of speech attributes.
       
       | 
      
     
     
     
      | 
       
       finite impulse response
       
       | 
      
       
       a digital filter consisting of a transfer function that has only zeroes, 
       thereby creating an impulse response that is finite in duration. 
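A minimal sketch of an FIR filter: each output is a weighted sum of the current and past inputs only (no feedback), so its transfer function H(z) = b0 + b1 z^-1 + ... has only zeros. Feeding in an impulse returns the coefficients followed by zeros, a response that is finite in duration. The 3-tap coefficients are hypothetical.

```python
def fir_filter(x, b):
    """Apply an FIR filter: y[n] = sum_k b[k] * x[n - k] (zero initial state)."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

coeffs = [0.25, 0.5, 0.25]          # hypothetical 3-tap smoothing (low-pass) filter
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir_filter(impulse, coeffs))  # [0.25, 0.5, 0.25, 0.0, 0.0]
```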
       
       | 
      
     
     
     
      | 
       
       finite state machine
       
       | 
      
       
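       abstract machine with a finite number of states and transitions between
       them; at each step it moves from one state to another according to the
       current input. A probabilistic version assigns a probability to each
       transition.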
       
       | 
      
     
     
     
      | 
       
       flat-start
       
       | 
      
       
       simple and effective technique used to initialize an acoustic model. It
       computes the global mean and variance from the training data and sets
       the model parameters to these values.
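A minimal flat-start sketch, assuming each state has a single diagonal-covariance Gaussian; the data and state count are hypothetical. Every state starts with identical parameters, and later training differentiates them.

```python
def flat_start(feature_vectors, num_states):
    """Initialize every state's Gaussian with the global mean and variance.

    Returns one (mean, variance) pair per state; all states start identical.
    """
    n = len(feature_vectors)
    dim = len(feature_vectors[0])
    mean = [sum(v[d] for v in feature_vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in feature_vectors) / n
           for d in range(dim)]
    return [(list(mean), list(var)) for _ in range(num_states)]

data = [[1.0, 2.0], [3.0, 2.0], [5.0, 8.0]]   # hypothetical feature vectors
states = flat_start(data, num_states=3)
print(states[0])  # ([3.0, 4.0], [2.6666666666666665, 8.0])
```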
       
       | 
      
     
     
     
      | 
       
       foundation classes
       
       | 
      
       
       a hierarchy of classes that provide a rich programming environment 
       loaded with useful classes such as I/O, vectors, matrices, data 
       structures, and algorithms.
       
       | 
      
     
     
     
      | 
       
       Fourier Transform
       
       | 
      
       
       converts a signal in the time domain into its frequency components.
       
       | 
      
     
     
     
      | 
       
       frame
       
       | 
      
       
       interval over which features are measured. 
       
       | 
      
     
     
     
      | 
       
       frequency domain
       
       | 
      
       
       characteristics of a digital signal pertaining to frequency spectrum.
       
       | 
      
     
     
     
      | 
       
       front-end processing
       
       | 
      
       
       algorithms applied to extract features needed by the speech recognizer;
       also known as feature extraction and signal processing.
       
       | 
      
     
     
     
      | 
       
       fully qualified filename
       
       | 
      
       
       a filename that includes the complete path to the file
       (e.g., "/home/jdoe/foo.text" is a fully qualified version
       of "foo.text").
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   G:
   
    
     
     
     
      | 
       
       GNU General Public License
       
       | 
      
       
       free software license intended to guarantee your freedom to share and change free software.
       
       | 
      
     
     
     
      | 
       
       GUI Graphical User Interface
       
       | 
      
       
       creates and configures the speech input format, the algorithms for
       extracting features, and the output format in a signal flow graph.
       
       | 
      
     
     
     
      | 
       
       
       Gaussian mixture model
       
       | 
      
       
       a statistical model in which the overall probability
       distribution is synthesized from a weighted sum of
       individual Gaussian distributions. This is a very
       powerful form of statistical modeling since arbitrarily
       complex distributions can be approximated with
       a parametrically controlled amount of precision.
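A minimal one-dimensional sketch of a Gaussian mixture density with hypothetical weights, means, and variances. Recognizers typically use multivariate (often diagonal-covariance) mixtures, but the weighted-sum structure is the same.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, weights, means, variances):
    """p(x) = sum_i w_i * N(x; mu_i, sigma_i^2), with weights summing to 1."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component, one-dimensional mixture.
weights, means, variances = [0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]
print(gmm_pdf(0.0, weights, means, variances))
```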
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   H:
   
    
     
     
     
      | 
       
       header
       
       | 
      
       
       information used to determine the file format and the specific details
       about that file.
       
       | 
      
     
     
     
      | 
       
       Hidden Markov Models
       
       | 
      
       
       statistical models yielding the likelihood that a particular sequence
       of sounds was produced given that a known word was spoken. The
       models are based on a Markov chain, which describes a sequence of
       random variables (states), each conditionally dependent only on the
       previous one. The states are hidden: they are observed only
       indirectly through the acoustic features they produce.
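A minimal sketch of the forward algorithm, which computes the likelihood of an observation sequence given an HMM by summing over all hidden state paths. For simplicity it assumes discrete observation symbols and a hypothetical 2-state left-to-right model; recognizers usually use continuous densities such as Gaussian mixtures.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Forward algorithm: P(observations | HMM), summing over all state paths.

    initial[i]       = P(state i at t=0)
    transition[i][j] = P(state j at t+1 | state i at t)
    emission[i][o]   = P(observation o | state i)
    """
    n = len(initial)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * transition[i][j] for i in range(n)) * emission[j][o]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical 2-state left-to-right model with discrete observations "x" and "y".
initial = [1.0, 0.0]
transition = [[0.6, 0.4], [0.0, 1.0]]
emission = [{"x": 0.9, "y": 0.1}, {"x": 0.2, "y": 0.8}]
print(forward_likelihood(["x", "y"], initial, transition, emission))  # about 0.342
```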
       
       | 
      
     
     
     
      | 
       
       HMM trellis
       
       | 
      
       
       graphical representation of the hypothesis space as it unfolds in time.
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   I:
   
    
     
     
     
      | 
       
       initialization
       
       | 
      
       
       process that sets the model parameters to some initial values
       before training the acoustic models. Good initial values also help
       training converge on a solution more quickly.
       
       | 
      
     
     
     
      | 
       
       insertion
       
       | 
      
       
       the type of error in which the recognizer produces a word (or symbol)
       hypothesis that does not correspond to any word in the reference
       transcription. Insertion errors often occur when the recognizer
       outputs two symbols that correspond to one symbol. One of these
       symbols will be tagged as a substitution error when non-time-aligned
       scoring is used, and the other will be tagged as an insertion error.
       Insertion errors are also common when noise is mistakenly recognized
       as speech.
       
       | 
      
     
     
     
      | 
       
       isolated word recognition
       
       | 
      
       
       recognition of speech in which the speaker must pause after each word.
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   J:
   
    
     
     
     
      | 
       
       Java Speech Grammar Format
       
       | 
      
       
       The Java™ Speech
       Grammar Format (JSGF) is a platform-independent, vendor-independent
       textual representation of grammars for use in speech 
       recognition. Grammars are used by speech recognizers 
       to determine what the recognizer should listen for, 
       and so describe the utterances a user may say. JSGF 
       adopts the style and conventions of the Java programming 
       language in addition to use of traditional grammar 
       notations. 
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  | 
   
   K:
   
    
   
   | 
 
 
 
 |   | 
 
  
   
   L:
   
    
     
     
     
      | 
       
       language model
       
       | 
      
       
       specifies which words are likely to occur and in which order, typically
       by assigning a probability to each word sequence.
       
       | 
      
     
     
     
      | 
       
       Large Vocabulary Speech Recognition (LVSR)
       
       | 
      
       
       speech recognition system with a large vocabulary, typically many
       thousands of words. Because of the vast vocabulary, it typically uses
       phone models rather than whole-word models.
       
       | 
      
     
     
     
      | 
       
       lexicon
       
       | 
      
       
       a list of the words that can be recognized by a speech recognition 
       system along with their pronunciation, or expansion into some 
       fundamental set of units corresponding to the acoustic models.
       
       | 
      
     
     
     
      | 
       
       linear prediction
       
       | 
      
       
       a mathematical operation where future values of a discrete-time signal 
       are estimated as a linear function of previous samples.
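A minimal sketch of linear prediction with given coefficients. In practice the coefficients are estimated from the signal (commonly with the autocorrelation method and the Levinson-Durbin recursion); here they are chosen by hand for a straight-line signal, which satisfies x[n] = 2x[n-1] - x[n-2] exactly.

```python
def predict(samples, coeffs):
    """Predict each sample as a linear combination of the previous p samples.

    x_hat[n] = sum_{k=1..p} a_k * x[n - k]; returns predictions for n >= p.
    """
    p = len(coeffs)
    return [sum(coeffs[k - 1] * samples[n - k] for k in range(1, p + 1))
            for n in range(p, len(samples))]

# For a straight line, a_1 = 2 and a_2 = -1 predict every sample exactly.
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(predict(ramp, [2.0, -1.0]))  # [2.0, 3.0, 4.0, 5.0]
```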
       
       | 
      
     
     
     
      | 
       
       little-endian byte order
       
       | 
      
       
       byte order where the first byte (at the lowest storage address) in a
       sequence is the least significant.
       
       | 
      
     
     
     
      | 
       
       log-spectrum
       
       | 
      
       
       logarithm of a signal's magnitude (or power) spectrum; the log compression is used to simulate the way humans perceive the loudness of sounds.
       
       | 
      
     
     
     
      | 
       
       low pass filter
       
       | 
      
       
       filter that removes high frequency signals and allows low frequency
       signals to pass.
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   M:
   
    
     
     
     
      | 
       
       Mel-Frequency Cepstrum Coefficients (MFCC)
       
       | 
      
       
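       features computed by applying a bank of filters spaced on the mel scale
       (which approximates the frequency resolution of human hearing) to the
       Fourier spectrum of a signal, taking the log of each filter's energy,
       and applying a discrete cosine transform to obtain cepstral coefficients.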
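A sketch of only the mel-scale step of MFCC computation, using one widely used formula, m = 2595 log10(1 + f/700) (other variants exist). Filters spaced evenly on the mel scale end up spaced further apart in Hz as frequency increases. The filter count and frequency range are hypothetical.

```python
import math

def hz_to_mel(f):
    """A widely used mel-scale formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_centers(num_filters, low_hz, high_hz):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_filters)]

print([round(f) for f in filter_centers(5, 0.0, 8000.0)])
```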
       
       | 
      
     
     
     
      | 
       
       mixture splitting
       
       | 
      
       
       process of splitting an existing mixture component into N components
       using some technique or algorithm (e.g., clustering or variance splitting).
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   N:
   
    
     
     
     
      | 
       
       N-gram
       
       | 
      
       
       language model that uses a finite number of previous words (N-1) to
       predict the next word; e.g., a bigram uses one previous word and a
       trigram uses two.
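A minimal sketch of maximum-likelihood bigram (N = 2) estimation from a hypothetical toy corpus. The sentence-boundary markers `<s>` and `</s>` are a common convention assumed here. Unseen bigrams get no probability at all, which is the problem smoothing addresses.

```python
from collections import Counter

def bigram_probabilities(sentences):
    """Maximum-likelihood bigram estimates P(w2 | w1) = C(w1 w2) / C(w1)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]   # sentence boundary markers
        unigrams.update(tokens[:-1])           # counts of histories w1
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

corpus = [["call", "home"], ["call", "mom"], ["call", "home"]]
probs = bigram_probabilities(corpus)
print(probs[("call", "home")])  # 0.6666666666666666
```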
       
       | 
      
     
     
     
      | 
       
       NIST
       
       | 
      
       
       National Institute of Standards and Technology.
       An organization that conducts third-party evaluation of
       human language technology. The
       NIST web site (http://www.nist.gov/speech/)
       contains many useful resources including a scoring package called
       SCLITE
       that provides an industry-standard means for scoring speech
       recognition results.
       
       | 
      
     
     
     
      | 
       
       natural byte order
       
       | 
      
       
       byte order native to a certain system.
       
       | 
      
     
     
     
      | 
       
       natural language processing
       
       | 
      
       
       provides a source of knowledge needed by the recognizer or language
       model.
       
       | 
      
     
     
     
      | 
       
       Network
       
       | 
      
       
       computes next words from a probable path through a finite state network.
       It decodes the set of next words.  
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  | 
   
   O:
   
    
   
   | 
 
 
 
 |   | 
 
  
   
   P:
   
    
     
     
     
      | 
       
       parse
       
       | 
      
       
       refers to the problem of determining if a given sequence could have been
       generated from a given state machine.  
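A minimal sketch of the parse problem for a deterministic finite state machine, using a hypothetical two-sentence word network. A nondeterministic machine would need to track a set of current states instead of a single one.

```python
def accepts(transitions, start, finals, sequence):
    """Return True if a deterministic state machine can generate `sequence`.

    transitions maps (state, symbol) -> next state; a missing entry means
    the symbol is not allowed in that state.
    """
    state = start
    for symbol in sequence:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in finals

# Hypothetical word network accepting "call home" and "call mom".
net = {(0, "call"): 1, (1, "home"): 2, (1, "mom"): 2}
print(accepts(net, 0, {2}, ["call", "home"]))  # True
print(accepts(net, 0, {2}, ["home", "call"]))  # False
```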
       
       | 
      
     
     
     
      | 
       
       phone model
       
       | 
      
       
       model of a single phone; phone models are concatenated according to a
       word's pronunciation to create a complete model of the word.
       
       | 
      
     
     
     
      | 
       
       phoneme
       
       | 
      
       
       any of the small units of speech sound in a language that serve to
       distinguish one word from another (e.g., the phoneme aa is the a
       sound in "father", and the phoneme jh is the j sound in "joy").
       
       | 
      
     
     
     
      | 
       
       pipe
       
       | 
      
       
       mechanism in SunOS and other UNIX systems that sends the output of one
       program to the input of another program. Use the "|" (pipe) character
       to do this (e.g., command1 | command2 sends the output of the program
       command1 to command2).
       
       | 
      
     
     
     
      | 
       
       pruning
       
       | 
      
       
       process that removes unlikely paths from consideration and saves
       resource usage in both memory and time.  
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  | 
   
   Q:
   
    
   
   | 
 
 
 
 |   | 
 
  
   
   R:
   
    
     
     
     
      | 
       
       RAW file
       
       | 
      
       
       file containing headerless binary audio data (samples only) in
       big-endian or little-endian byte order.
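Because a RAW file has no header, the reader has to know the sample format and byte order in advance. This sketch assumes 16-bit signed linear PCM, a common but not universal layout, and decodes the same hypothetical bytes both ways.

```python
import struct

def read_raw_16bit(data, big_endian):
    """Decode headerless 16-bit signed PCM samples in the given byte order."""
    count = len(data) // 2
    fmt = (">" if big_endian else "<") + "%dh" % count
    return list(struct.unpack(fmt, data[:2 * count]))

raw = bytes([0x01, 0x00, 0xFF, 0xFF])
print(read_raw_16bit(raw, big_endian=False))  # [1, -1]
print(read_raw_16bit(raw, big_endian=True))   # [256, -1]
```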
       
       | 
      
     
     
     
      | 
       
       Revision Control System (RCS)
       
       | 
      
       
       manages multiple revisions of files.
       
       | 
      
     
     
     
      | 
       
       recipe
       
       | 
      
       
       single entity that stores information from each component
       within a signal flow graph. 
       
       | 
      
     
     
     
      | 
       
       recognition error
       
       | 
      
       
       an error of a speech recognition system.
       
       | 
      
     
     
     
      | 
       
       reestimation
       
       | 
      
       
       phase in training known as the refinement process that begins after the
       acoustic models have been seeded with initial values.  It applies
       special algorithms to reestimate the model parameters until
       convergence occurs.
       
       | 
      
     
     
     
      | 
       
       
       reference transcription
       
       | 
      
       
       To evaluate a speech recognition system, the output
       hypothesis must be compared to the "answer", known as
       the reference transcription.
       
       | 
      
     
     
     
      | 
       
       regular expressions
       
       | 
      
       
       a language that is used to describe patterns to be matched when
       searching over large repositories of data. Regular expressions
       are used throughout the UNIX operating system and are supported by
       tools such as egrep and bash. Several publicly available portable
       libraries exist to support such interfaces.
       
       | 
      
     
     
     
      | 
       
       regular grammar
       
       | 
      
       
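       grammars in which every production rule has a single non-terminal on
       the left-hand side and, on the right-hand side, a terminal optionally
       followed by a single non-terminal (e.g., A -> a B or A -> a). Such
       grammars can be recognized by a finite state machine.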
       
       | 
      
     
     
     
      | 
       
       repository
       
       | 
      
       
       an archive on our server that is used by the configuration
       management software to maintain all versions of our software.
       Specific versions of the software, including the most recent version
       under development, can be retrieved using the configuration management
       software.
       
       | 
      
     
     
     
      | 
       
       rsh
       
       | 
      
       
       command that lets you execute another command on a remote system
       and get the output back to your local system. 
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   S:
   
    
     
     
     
      | 
       
       sample
       
       | 
      
       
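       in digital signal processing, a single measurement of a signal's
       amplitude at one point in time; a digital audio signal is a sequence
       of samples taken at a fixed sample rate. The term also refers to a
       digitized audio segment taken from an original recording and
       inserted, often repetitively, in a new digital recording.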
       
       | 
      
     
     
     
      | 
       
       scalar classes
       
       | 
      
       
       the classes that represent the fundamental math building blocks of the 
       IFC environment. These classes perform the same functions as 
       their counterparts in standard C/C++ programming languages
       (int, short, long, float, double, ...), but add the ability to
       read and write themselves to an object-oriented file system
       known as Signal Object File (Sof).
       
       | 
      
     
     
     
      | 
       
       scoring
       
       | 
      
       
       the process by which a recognition system's output is
       compared to reference transcriptions containing the "correct"
       answer. Errors are tabulated and presented in a format
       that can help a user understand the deficiencies of the system.
       NIST distributes a scoring package that is widely used
       within the community.
       
       | 
      
     
     
     
      | 
       
       server
       
       | 
      
       
       computer that provides client stations on a computer network with
       access to shared resources such as files and printers.
       
       | 
      
     
     
     
      | 
       
       signal flow graph
       
       | 
      
       
       graphical representation of an input source receiving a signal, passing
       the signal to algorithms for processing, and producing an output with
       data from the signal. 
       
       | 
      
     
     
     
      | 
       
       signal modeling
       
       | 
      
       
       process of representing a signal based on some defined model that is
       useful to the system.
       
       | 
      
     
     
     
      | 
       
       smoothing
       
       | 
      
       
       technique used when training language models (for example, with SRI's
       language modeling tools) to produce broader language models. It allows
       all word sequences to occur with some nonzero probability, including
       sequences never seen in the training data.
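A sketch of the simplest smoothing method, add-one (Laplace) smoothing of a bigram model, on a hypothetical toy corpus and vocabulary. Practical toolkits generally use more refined methods (e.g., Good-Turing discounting or Kneser-Ney), but the effect is the same in kind: unseen word pairs receive a small nonzero probability.

```python
from collections import Counter

def add_one_bigram(sentences, vocab):
    """Add-one smoothed bigram: P(w2|w1) = (C(w1 w2) + 1) / (C(w1) + V)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)   # number of possible next words

    def prob(w1, w2):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + v)
    return prob

vocab = ["call", "home", "mom", "</s>"]   # hypothetical set of possible next words
prob = add_one_bigram([["call", "home"], ["call", "home"]], vocab)
print(prob("call", "mom"))  # never seen, but nonzero (1/6)
```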
       
       | 
      
     
     
     
      | 
       
       Sof
       
       | 
      
       
       (Signal Object File)
       ISIP's internal format for storing any type
       of C++ data. The file is essentially an indexing scheme that
       keeps track of the locations of all objects in the file.
       Files can be stored in a binary or text format.
       
       | 
      
     
     
     
      | 
       
       spectrogram
       
       | 
      
       
       visual display of the frequency content of a signal (e.g., speech) as it
       changes over time, typically measured over short windows.
       
       | 
      
     
     
     
      | 
       
       speech file
       
       | 
      
       
       any sound file containing human speech.
       
       | 
      
     
     
     
      | 
       
       speech recognizer
       
       | 
      
       
       computer program that attempts to decode digital speech.  
       
       | 
      
     
     
     
      | 
       
       stack search
       
       | 
      
       
       A depth-first approach to search in speech recognition. It is
       particularly useful for N-best hypothesis generation.
       
       | 
      
     
     
     
      | 
       
       state-tying
       
       | 
      
       
       technique in which similar states of different phone models are tied
       together so that they share parameters. Tying compensates for sparse
       training data, reduces system complexity, and allows synthesis of
       models for contexts not seen in training.
       
       | 
      
     
     
     
      | 
       
       substitution
       
       | 
      
       
       the type of error in which a word in the reference transcription
       is replaced by an incorrect word in the recognizer's hypothesis.
       Technically, the start and stop times of the hypothesis must overlap
       with the reference string for an error to be counted as a substitution.
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   T:
   
    
     
     
     
      | 
       
       tar file
       
       | 
      
       
       an archive file format that is used to store many files and directories
       in an efficient and portable manner. The name tar stands for
       tape archive and dates back to the use of magnetic
       tape. Today, tar is one of the most common mechanisms used to transfer
       groups of files and directories from one machine to another.
       
       | 
      
     
     
     
      | 
       
       time domain
       
       | 
      
       
       characteristics of a digital signal pertaining to its change over time.
       
       | 
      
     
     
      | 
       
       time-synchronous
       
       | 
      
       
       refers to an approach in the speech recognition search process in 
       which all active hypotheses are extended one frame at a time as 
       each new feature vector arrives.
       
       | 
      
     
     
     
      | 
       
       training
       
       | 
      
       
       process that estimates model parameters from data, iteratively
       converging on a solution under which the training feature vectors are
       most likely for each acoustic unit.
       
       | 
      
     
     
     
      | 
       
       transcription error
       
       | 
      
       
       an error in a transcription, made either by a human transcriber or by
       a computer (automatic) transcription system.
       
       | 
      
     
     
     
      | 
       
       triphone
       
       | 
      
       
       context-dependent phone model that models both the left and the right
       context of a phone.
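A sketch of expanding a word's pronunciation into word-internal context-dependent labels. It assumes the "left-center+right" label notation used in HTK-style systems. Phones at the word edges have only one in-word neighbor, so they become biphones; a cross-word system would instead take their outer context from the neighboring words.

```python
def word_internal_triphones(phones):
    """Expand a word's phone sequence into word-internal context-dependent labels.

    Uses the "left-center+right" notation; phones at the word edges get
    biphone labels because their outside context is not modeled.
    """
    labels = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + p + right)
    return labels

# Hypothetical pronunciation of "cat".
print(word_internal_triphones(["k", "ae", "t"]))  # ['k+ae', 'k-ae+t', 'ae-t']
```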
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  | 
   
   U:
   
    
   
   | 
 
 
 
 |   | 
 
  
   
   V:
   
    
     
     
     
      | 
       
       variance splitting
       
       | 
      
       
       technique for successively splitting each mixture component in the
       distribution until the desired number of mixture components has been
       created. Each component is split into two components that keep the
       original variance, with means shifted in opposite directions by some
       factor of the standard deviation.
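A minimal one-dimensional sketch of one round of variance splitting. The shift factor of 0.2 standard deviations and the equal division of weights are illustrative assumptions. Calling `split_all` repeatedly grows the mixture to 2, 4, 8, ... components; in a real training loop the model would normally be reestimated between rounds.

```python
import math

def split_component(weight, mean, var, factor=0.2):
    """Split one 1-D Gaussian component into two children.

    Both children keep the parent's variance; their means are shifted in
    opposite directions by `factor` standard deviations, and the parent's
    weight is shared equally between them.
    """
    offset = factor * math.sqrt(var)
    return [(weight / 2, mean - offset, var), (weight / 2, mean + offset, var)]

def split_all(components, factor=0.2):
    """Split every (weight, mean, var) component once, doubling their number."""
    return [child for c in components for child in split_component(*c, factor=factor)]

mixture = split_all([(1.0, 0.0, 4.0)])
print(mixture)  # [(0.5, -0.4, 4.0), (0.5, 0.4, 4.0)]
```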
       
       | 
      
     
     
     
      | 
       
       Viterbi beam search
       
       | 
      
       
       a suboptimal search algorithm based on the principle of dynamic
       programming in which the most promising hypotheses are maintained,
       and other hypotheses are discarded. The term "beam" is used
       because the analogy can be made with how you search around a dark
       room with a flashlight. The name Viterbi is used because this
       search approach is similar to Viterbi decoding, which is a special
       case of dynamic programming pioneered in communication
       systems. A Viterbi beam search is essentially a breadth-first
       suboptimal search in which only the most promising candidates
       are pursued. Several thresholds on the overall likelihoods of
       the hypotheses are applied to select the most promising candidates.
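A minimal sketch of time-synchronous Viterbi search with a single beam threshold, in the log domain, on a hypothetical 2-state HMM with discrete observations. Each state keeps only its best incoming path (the Viterbi step), and any state whose score falls more than `beam` below the frame's best score is pruned. Real decoders apply several thresholds and search over word networks rather than one small HMM.

```python
import math

def viterbi_beam(observations, initial, transition, emission, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning (log domain).

    Returns (best log score, best state sequence).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(initial)
    active = {i: (log(initial[i]) + log(emission[i][observations[0]]), [i])
              for i in range(n) if initial[i] > 0}
    for o in observations[1:]:
        nxt = {}
        for i, (score, path) in active.items():
            for j in range(n):
                s = score + log(transition[i][j]) + log(emission[j][o])
                if s > nxt.get(j, (float("-inf"), None))[0]:
                    nxt[j] = (s, path + [j])      # keep best path into j
        best = max(s for s, _ in nxt.values())
        # Beam pruning: drop hypotheses far below the best one at this frame.
        active = {j: v for j, v in nxt.items() if v[0] >= best - beam}
    return max(active.values())

initial = [1.0, 0.0]
transition = [[0.6, 0.4], [0.0, 1.0]]
emission = [{"x": 0.9, "y": 0.1}, {"x": 0.2, "y": 0.8}]
print(viterbi_beam(["x", "y"], initial, transition, emission))
```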
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  
   
   W:
   
    
     
     
     
      | 
       
       WAV
       
       | 
      
       
       Microsoft WAV format, a format used in the Microsoft Windows
       operating system to store audio files. Support for WAV is provided
       through Silicon Graphics'
       Audio File Library.
       
       | 
      
     
     
     
      | 
       
       waveform
       
       | 
      
       
       a mathematical and visual representation of an analog wave, usually a
       graph obtained by plotting a characteristic of the wave against time.
       
       | 
      
     
     
     
      | 
       
       window
       
       | 
      
       
       a collection of samples surrounding a frame over which the feature
       measurements are taken. The samples are usually weighted by a tapering
       function (e.g., a Hamming window) to give a smoother representation of
       the speech data.
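A minimal framing-and-windowing sketch. It assumes a 10 ms frame shift and a 25 ms Hamming window, common choices though not prescribed by this glossary, so consecutive windows overlap. Each window starts at its frame's start rather than being centered on it, which is a simplification.

```python
import math

def frames_with_hamming(samples, sample_rate, frame_ms=10.0, window_ms=25.0):
    """Split a signal into overlapping Hamming-windowed segments.

    A new window starts every `frame_ms` and spans `window_ms`, so
    neighboring windows overlap.
    """
    shift = int(sample_rate * frame_ms / 1000)
    length = int(sample_rate * window_ms / 1000)
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
               for n in range(length)]
    windows = []
    for start in range(0, len(samples) - length + 1, shift):
        segment = samples[start:start + length]
        windows.append([s * w for s, w in zip(segment, hamming)])
    return windows

one_second = [1.0] * 8000            # hypothetical constant signal at 8 kHz
print(len(frames_with_hamming(one_second, 8000)))  # 98 windows
```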
       
       | 
      
     
     
     
      | 
       
       Word Error Rate (WER)
       
       | 
      
       
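       a measure of the accuracy of a speech recognition system that tabulates
       three types of errors: substitutions, deletions and insertions. The WER
       is the total number of these errors divided by the number of words in
       the reference transcription, and is typically computed using a standard
       set of tools provided by NIST.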
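A minimal sketch of WER using a standard word-level Levenshtein alignment with equal costs for substitutions, deletions, and insertions. It reports only the total; NIST's scoring tools also break the total down by error type and may use different alignment details. The sentences are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions to turn ref into hyp."""
    r, h = reference, hypothesis
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])   # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)]

def word_error_rate(reference, hypothesis):
    """Errors divided by the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    return word_errors(ref, hyp) / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat today"))  # 2/6
```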
       
       | 
      
     
     
     
      | 
       
       word-internal
       
       | 
      
       
       a triphone model that remains within word boundaries.  
       
       | 
      
     
     
     
      | 
       
       word model
       
       | 
      
       
       a model of an entire word as a single unit, rather than a concatenation
       of separate phone models.
       
       | 
      
     
   
   | 
 
 
 
 |   | 
 
  | 
   
   X:
   
    
   
   | 
 
 
 
 |   | 
 
  | 
   
   Y:
   
    
   
   | 
 
 
 
 |   | 
 
  | 
   
   Z:
   
    
   
   |