Automatic Discovery of EEG Cohorts From Clinical Records



Electronic medical records (EMRs) collected at every hospital in the country together contain a staggering wealth of biomedical knowledge. This information could be transformative if properly harnessed. This research project focuses on the automatic interpretation of a clinical EEG big data resource known as the TUH EEG Corpus (TUH EEG). Clinicians can retrieve relevant EEG signals and EEG reports using standard queries (e.g., “Young patients with focal cerebral dysfunction who were treated with Topamax”).

An important outcome is an annotated big data archive of EEGs that will greatly increase accessibility for non-experts in neuroscience, bioengineering and medical informatics who would like to study EEG data. Creating this resource through efficient automated data wrangling techniques demonstrates that a much wider range of big data bioengineering applications is now tractable.