Automatic Interpretation of Digital Pathology Images Using Deep Learning
In this NSF-funded project, we are developing a digital imaging system that uses big data and machine learning algorithms to automatically characterize pathology slides. We have developed a sustainable facility to rapidly collect automatically annotated slide images. This project has produced the data resources needed to support the development of high-performance deep learning models.
Overview
We proposed a collaborative, interdisciplinary project to
detect and characterize cancerous cells in digitized images of
pathology slides, while also producing the world's largest catalog
of research-grade digital pathology slides.
The research enabled our uniquely qualified team, which spans the
disciplines of pathology, engineering, and computer science, to pursue
center-level funding from NSF and NIH.
Over 10 million slides are read each year in the U.S. alone. Tapping into even a fraction of this data could significantly advance the science. Healthcare providers and machine learning researchers will be able to access an open-source, high-quality, searchable archive of clinical data.
Goals
Our initial goal is to combine image data extracted from a digital slide scanner with text data extracted from medical reports. Pathology reports contain unstructured text that describes patient histories, medications, and diagnoses.
We are developing our database, processing high-resolution digital pathology images, and integrating text processing through vision-and-language queries.
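The first step, pairing scanned slide images with the text of their pathology reports, can be sketched roughly as follows. This is a minimal Python illustration, not the project's actual pipeline: it assumes that every slide image and every report carry a shared case identifier (`case_id`), and the names `SlideImage`, `PathologyReport`, `AnnotatedCase`, and `pair_slides_with_reports` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SlideImage:
    case_id: str     # hypothetical identifier shared with the matching report
    image_path: str  # location of the scanned slide image file


@dataclass
class PathologyReport:
    case_id: str
    text: str        # unstructured report text (history, medications, diagnoses)


@dataclass
class AnnotatedCase:
    case_id: str
    image_paths: list = field(default_factory=list)
    report_text: str = ""


def pair_slides_with_reports(slides, reports):
    """Group slide images by case and attach the matching report text.

    Assumption: one report per case_id. Cases with slides but no report
    keep an empty report_text; reports with no slides are dropped.
    """
    reports_by_case = {r.case_id: r.text for r in reports}
    cases = {}  # insertion-ordered, so output follows first appearance of each case
    for slide in slides:
        if slide.case_id not in cases:
            cases[slide.case_id] = AnnotatedCase(
                case_id=slide.case_id,
                report_text=reports_by_case.get(slide.case_id, ""),
            )
        cases[slide.case_id].image_paths.append(slide.image_path)
    return list(cases.values())
```

Keying the join on a case identifier keeps several slides from the same case attached to a single report, and leaving `report_text` empty for unmatched cases makes it easy to find slides that still lack text annotations.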