AUTOMATIC INTERPRETATION OF DIGITAL PATHOLOGY IMAGES

Overarching Goal

The overarching goal of the project was threefold. First, we integrated digital imaging into Temple University Hospital’s clinical operations. Then, we established a public database of pathology slides. Lastly, we developed an image classification system to automatically annotate each pathology slide. The result was a fully functional digital pathology imaging system. After the slides in the database were annotated using this system, they were archived and used for support, education, and research purposes.

Past Goals

  • Integrated digital imaging into Temple University Hospital's clinical operations.
  • Created a database of pathology slides.
  • Developed an image classification system that automatically annotates each slide.
  • As a result, a fully functional digital pathology imaging system was developed.

Current Goals

  • Combine image data extracted from a digital slide scanner with text data extracted from medical reports.
  • Create and maintain a pathology knowledge graph.
  • Link images to text in reports.
  • Create a Vision-and-Language Retrieval System.

Future Goals

  • Create a retrieval system that works in a traditional fashion, where a user supplies a query via mouse or keyboard. Eventually, we hope to create a conversational retrieval system.
  • Release more than one million images.

Anticipated Outcomes

First, the open-source, unencumbered corpus that we develop will be released to the community to support machine learning research. The Neural Engineering Data Consortium (NEDC) has been in existence since 2012 and is known internationally for distributing the world’s largest open-source corpus, the TUH EEG Corpus (TUEG), to support machine learning research in electroencephalography (EEG). This corpus currently has over 4,000 subscribers.

Second, we will develop two robust systems: (1) an image processing system that rapidly flags tissue samples that are cancerous or abnormal, and (2) a text processing system that identifies key medical concepts and relationships in reports and integrates this information along with events identified in the images into a searchable archive.
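As a rough illustration of what such a searchable archive could look like, the sketch below merges labels produced by an image classifier with concepts extracted from reports into a single inverted index. It is a minimal sketch under stated assumptions, not the project's actual design: the names SlideRecord and SlideArchive, the record fields, the slide IDs, and the example labels and concepts are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SlideRecord:
    """One archived slide (hypothetical schema, not the project's format)."""
    slide_id: str
    image_labels: set = field(default_factory=set)     # events flagged in the image
    report_concepts: set = field(default_factory=set)  # concepts found in the report


class SlideArchive:
    """Toy inverted index: lowercase term -> IDs of slides carrying that term."""

    def __init__(self):
        self._records = {}
        self._index = defaultdict(set)

    def add(self, record):
        # Index image-derived and text-derived terms together so that
        # one query can match on either source.
        self._records[record.slide_id] = record
        for term in record.image_labels | record.report_concepts:
            self._index[term.lower()].add(record.slide_id)

    def search(self, *terms):
        """Return the sorted IDs of slides that match every term."""
        if not terms:
            return []
        hits = [self._index.get(term.lower(), set()) for term in terms]
        return sorted(set.intersection(*hits))


# Example data is invented purely for illustration.
archive = SlideArchive()
archive.add(SlideRecord("slide_001", {"abnormal"}, {"ductal carcinoma"}))
archive.add(SlideRecord("slide_002", {"normal"}, {"fibroadenoma"}))
archive.add(SlideRecord("slide_003", {"abnormal"}, {"fibroadenoma"}))

print(archive.search("abnormal", "fibroadenoma"))  # ['slide_003']
```

Keeping both sources in one index is the simplest way to let a single query combine an image finding with a report concept; a production archive would more likely sit on a database or search engine.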

Third, we will develop an interactive tool that pathologists can use to query the database using natural language queries and form-filling. The ability to retrieve slides through a flexible query interface is important to clinicopathologic correlative studies, which are critical for studying the clinicopathologic features, treatment response, and prognosis of uncommon diseases. The existing query methods in most pathology software are inefficient.

Timeline