Most phone-based systems have 3 state HMM models regardless of the task
Syllable models are typically chosen to be proportional to average duration for that task
The initial model choice is empirical
May have impact on the ability of the trained model to fit the training data