BAYESIAN DECISION TREE FOR CLASSIFICATION

EVALUATION PARADIGM FOR THE SURNAME PRONUNCIATION GENERATION PROBLEM

The database contains 18,494 names with 25,648 pronunciations, manually transcribed using Worldbet symbols
Divide the database into training and test sets (3 different train-test splits)
Context length is used as a feature to describe each sound
Build separate models, one for each context length
Evaluate each model on the test data and compare its output against the reference (the manually transcribed pronunciations)
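A minimal Python sketch of the data-preparation steps above. It rests on assumptions the outline does not state. The 3 cuts are taken to be a 3-fold partition of the unique names, so every name is tested exactly once and all pronunciations of one name fall on the same side of a split. "Context length" k is taken to mean k letters on each side of the target letter, with word edges padded by a hypothetical boundary symbol "#". The function names are illustrative, not the original system's.

```python
import random

PAD = "#"  # hypothetical symbol marking positions beyond the word edges


def context_windows(name, k):
    """One feature tuple per letter: k letters of left context,
    the letter itself, and k letters of right context."""
    padded = PAD * k + name.lower() + PAD * k
    return [tuple(padded[i:i + 2 * k + 1]) for i in range(len(name))]


def three_cuts(names, seed=0):
    """Assumed 3-fold split on unique names: each fold serves as the test
    set once. Returns a list of 3 (train, test) pairs of name sets."""
    unique = sorted(set(names))
    random.Random(seed).shuffle(unique)
    folds = [set(unique[i::3]) for i in range(3)]
    return [(set(unique) - fold, fold) for fold in folds]
```

One model per context length would then be trained on features such as `{n: context_windows(n, k) for n in train}` for each k in a chosen range (for example 1, 2, 3). Splitting on unique names rather than on pronunciations keeps a name's alternative transcriptions from leaking from training into test.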
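The comparison against the manual transcriptions can be scored in several ways. The sketch below assumes two common metrics, name accuracy and phone error rate (Levenshtein distance over phone symbols). Because the database holds more pronunciations than names, each prediction is scored against the closest of that name's reference transcriptions. The metric choice and the `evaluate` helper are assumptions for illustration, not the original evaluation code.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def evaluate(predictions, references):
    """predictions: name -> predicted phone list.
    references: name -> list of reference phone lists (a name may have
    several manual transcriptions).
    Returns (name accuracy, phone error rate)."""
    correct = errors = ref_phones = 0
    for name, hyp in predictions.items():
        # Score against the closest reference transcription of this name.
        best = min(references[name], key=lambda r: edit_distance(hyp, r))
        d = edit_distance(hyp, best)
        errors += d
        ref_phones += len(best)
        correct += d == 0
    return correct / len(predictions), errors / ref_phones
```

Running `evaluate` on each model's test-set output, once per cut, gives one accuracy and one error rate per context length, which makes the models directly comparable.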