Digital Pathology Data and Resources
The TUDP Corpus is freely available. The only reason we require
registration is that we need to track who downloads the data. We also
want to be able to inform you of any updates to the releases.
To request access to the TUDP Corpus, please fill out
this form.
You will receive an automatically-generated username and password
via email. Data collected is unencumbered and can be used for both
research and commercialization purposes.
Once you have obtained the username and password, you can selectively
download portions of the corpus using your browser. You can also
use our anonymous rsync service to download the data.
Because the files are large, your Internet connection is likely to
drop before a download completes. We therefore recommend wrapping
rsync in a script that restarts it automatically, so that the
download continues until it is finished. Please download the script
nedc_rsync.sh
and modify it to retrieve the data you want.
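The restart logic described above can be sketched as follows. This is a minimal illustration and not the contents of nedc_rsync.sh: the function name retry_until_success, the attempt limit, and the delay are assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a restart wrapper; this is NOT nedc_rsync.sh.
#
# Usage: retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it exits with status 0 or the
# attempt limit is reached. Returns 0 on success, 1 otherwise.
retry_until_success() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command exactly as given; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        # Pause before the next attempt, but not after the last one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example use with the rsync command documented on this page
# (the limit of 100 attempts and the 30-second delay are assumed values):
# retry_until_success 100 30 \
#     rsync -auxvL nedc@www.isip.piconepress.com:data/dpath/tuh_dpath/v2.0.0/ .
```

Because rsync skips files that are already up to date at the destination, each restart should pick up roughly where the previous attempt stopped rather than starting over.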
The rsync utility is our preferred way to distribute our
resources because it lets you easily keep your copy of the data in
sync with ours. Rsync is a standard part of most Mac and Linux
distributions. Windows users can get access to rsync by
installing MobaXterm.
A typical rsync command to download a specific release (e.g., v2.0.0)
of a specific corpus (TUDP) is:
rsync -auxvL nedc@www.isip.piconepress.com:data/dpath/tuh_dpath/v2.0.0/ .
Note that the "." at the end of this command is very important
since it denotes the destination directory. Without a destination
directory specification, the command will not transfer any data.
The username and password are the same as what you use to access
the web-based version of these resources. If you do not have
the username and password, register by filling out
this form
and you will receive this information automatically by email.
Note that the "-L" option instructs rsync to follow symbolic links
and copy the files they point to. If you are downloading the entire
suite of corpora, you do not need to use "-L". If you are
downloading only one corpus, you need to use "-L".
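The rule for "-L" can be illustrated with a small helper that prints the rsync command for either case. This is only a sketch: the function name nedc_rsync_cmd and its "single"/"all" argument are hypothetical, and the only source path shown is the one from the example on this page.

```shell
#!/bin/sh
# Hypothetical helper for illustration only: prints (does not run) the
# rsync command for one corpus or for the entire suite of corpora.
#
# Options used, as documented by rsync:
#   -a  archive mode (recursive, preserves times and permissions)
#   -u  skip files that are newer at the destination
#   -x  do not cross filesystem boundaries
#   -v  verbose output
#   -L  copy the files that symbolic links point to
nedc_rsync_cmd() {
    scope=$1     # "single" = one corpus, "all" = entire suite (assumed names)
    src=$2       # remote source path
    dest=$3      # local destination directory
    opts="-auxv"
    # Per the note on this page, "-L" is needed only for a single corpus.
    if [ "$scope" = "single" ]; then
        opts="${opts}L"
    fi
    echo "rsync $opts $src $dest"
}

# Single-corpus case, using the path from the example on this page:
nedc_rsync_cmd single nedc@www.isip.piconepress.com:data/dpath/tuh_dpath/v2.0.0/ .
```

Printing the command before running it makes it easy to confirm that the source path and the trailing destination directory are both present.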
If Internet connectivity is a problem, you can send us a 4 TB USB drive.
We will copy the data to the drive and ship it back to you. If you
elect this option, you must also provide a UPS or FedEx account number
for return shipping.
Please send us a conventional USB-mounted
disk drive. Any standard USB-powered, USB 2.0 compatible
4 TB drive, such as a Western Digital or Seagate, will work fine.
Because of the time it takes to copy the data, we
need a drive that can maintain a stable connection, and other types
of media such as thumb drives have proven to be unreliable.
Mail the drive to:
Joseph Picone
1610 Rhawn Street
Philadelphia, PA 19111
Tel: 708-848-2846
Please email us for details before shipping the drive. If you ship us a drive
directly from a reseller such as Amazon, please make sure
that the shipment contains information that we can use to
identify you. This information should include a point of
contact (POC), the name of your institution, and contact
information (name, surface mail address and
telephone number for the POC).
If you are having trouble deciding what to do, email us
and describe the specific resources in which you are interested.
We will be happy to guide you through the process.
Corpora
- The Temple University Digital Pathology Corpus (TUDP): Contains 3,505 breast tissue slides that have been partially annotated. More information about this work can be found here.
Software
- Aperio ImageScope (IMGV): A tool that allows visualization and annotation of pathology images stored in a variety of formats including svs.
- ISIP Image Classification (MLTL): Coming soon...
Documentation
- Coming soon...
What's New
- (20210617) NEDC Annotator (v5.0.2): This version adds support for csv and xml file formats for annotations.
- (20190323) NEDC TUH EEG Seizure (v1.5.0): This release expands the training dataset from 1,984 files to 4,597. Calibration sequences of the new data have been manually annotated and added to the seizure spreadsheet. Annotation corrections were made to the files already existing in the training set.