Project Focus
Project Leaders
1. Joseph Picone, PhD; Iyad Obeid, PhD
Neural Engineering Data Consortium
College of Engineering, Temple University
Philadelphia, Pennsylvania, U.S.A.
2. Sanda M. Harabagiu, PhD
Human Language Technology Research Institute
University of Texas at Dallas
Dallas, Texas, U.S.A.
Electronic medical records (EMRs) collected at every hospital in the country collectively contain a staggering wealth of biomedical knowledge. EMRs can include unstructured text, temporally constrained measurements (e.g., vital signs), multichannel signal data (e.g., EEGs), and image data (e.g., MRIs). This information could be transformative if properly harnessed. Information about patient medical problems, treatments, and clinical course is essential for conducting comparative effectiveness research. The primary goal of this project is to uncover the clinical knowledge that enables such comparative research.
Our focus in this research project is the automatic interpretation of a
clinical EEG big data resource known as the TUH EEG Corpus (TUH EEG).
This corpus was collected over 14 years at Temple University Hospital
and comprises over 28,000 sessions and 15,000 patients. Clinicians will
be able to retrieve relevant EEG signals and EEG reports using standard
queries (e.g., “Young patients with focal cerebral dysfunction who were
treated with Topamax”). We will automatically annotate EEG events that
contribute to a diagnosis. Semi-supervised learning is used to discover
and time-align the underlying EEG events. Clinical concepts are being
discovered automatically, together with their type, polarity, and
modality, as well as spatial and temporal information. In addition,
we are extracting the medical concepts describing the clinical picture
of patients from the EEG reports. We are developing a patient cohort
retrieval system that will operate on the extracted clinical knowledge.
An important outcome of this research will be an annotated big data
archive of EEGs that will greatly increase accessibility for
non-experts in neuroscience, bioengineering, and medical informatics
who would like to study EEG data. The creation of this resource through
the development of efficient automated data wrangling techniques will
demonstrate that a much wider range of big data bioengineering
applications is now tractable.
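
To make the cohort retrieval idea concrete, the following is a minimal sketch of how a query such as “Young patients with focal cerebral dysfunction who were treated with Topamax” might be evaluated against concepts extracted from EEG reports. It is not the project's actual system. The class names, the attribute labels ("problem", "treatment", "positive", "factual"), the age cutoff, and the toy records are all assumptions made for illustration, and the sketch matches concepts by exact string within a single report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Concept:
    """One extracted clinical concept (all label values are hypothetical)."""
    text: str      # surface form, e.g. "focal cerebral dysfunction"
    type: str      # assumed labels: "problem", "treatment"
    polarity: str  # assumed labels: "positive", "negative"
    modality: str  # assumed labels: "factual", "possible"


@dataclass
class Report:
    """Concepts extracted from one EEG report (hypothetical structure)."""
    patient_id: str
    age: int
    concepts: List[Concept] = field(default_factory=list)


def has_concept(report: Report, text: str, type_: str) -> bool:
    """True if the report asserts the concept as present and factual."""
    return any(
        c.text == text
        and c.type == type_
        and c.polarity == "positive"
        and c.modality == "factual"
        for c in report.concepts
    )


def retrieve_cohort(reports: List[Report], max_age: int,
                    problem: str, treatment: str) -> List[str]:
    """Return sorted patient IDs whose report satisfies every criterion."""
    return sorted({
        r.patient_id
        for r in reports
        if r.age <= max_age
        and has_concept(r, problem, "problem")
        and has_concept(r, treatment, "treatment")
    })


# Toy records invented for this example; they are not TUH EEG data.
DYSFUNCTION = "focal cerebral dysfunction"
reports = [
    Report("p001", 19, [Concept(DYSFUNCTION, "problem", "positive", "factual"),
                        Concept("Topamax", "treatment", "positive", "factual")]),
    Report("p002", 23, [Concept(DYSFUNCTION, "problem", "negative", "factual"),
                        Concept("Topamax", "treatment", "positive", "factual")]),
    Report("p003", 67, [Concept(DYSFUNCTION, "problem", "positive", "factual"),
                        Concept("Topamax", "treatment", "positive", "factual")]),
]

cohort = retrieve_cohort(reports, max_age=30,
                         problem=DYSFUNCTION, treatment="Topamax")
print(cohort)
```

In this toy data only the first record qualifies: the second report negates the dysfunction and the third patient is above the assumed age cutoff. Keeping polarity and modality as explicit attributes is what lets a query separate "no evidence of X" from "X", which simple keyword search over report text cannot do.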