2430 conversations
500 speakers
240 hours of data
Over 3 million words of text
Initial transcriptions done at TI
Linguistic and acoustic segmentations tried previously