NSF MRI: High Performance Digital Pathology Using Big Data and Machine Learning

Digital Pathology Data and Resources



The TUDP Corpus is freely available. The only reason we require registration is that we need to track who downloads the data. We also want to be able to inform you of any updates to the releases.

To request access to the TUDP Corpus, please fill out this form. You will receive an automatically generated username and password via email. The data is unencumbered and can be used for both research and commercial purposes.

Once you have obtained the username and password, you can selectively download portions of the corpus using your browser. You can also use our anonymous rsync service to download the data.

Because the files are large, your Internet connection may drop before a transfer completes. We therefore recommend wrapping rsync in a script that automatically restarts it, so that your download continues until it is finished. Please download this script named nedc_rsync.sh and modify it to retrieve the data you want.
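The restart idea can be sketched as follows. This is a minimal illustration and not the contents of nedc_rsync.sh: the function name retry_until_success, the attempt limit, and the delay are assumptions made for the example. It reruns whatever command it is given until that command exits successfully or the attempt limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of nedc_rsync.sh): rerun a command until it
# exits with status 0 or the attempt limit is reached.
#   $1 = maximum number of attempts
#   $2 = seconds to wait between attempts
#   remaining arguments = the command to run
retry_until_success() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        # Give up once the attempt limit has been reached.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt of $max_attempts failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example invocation (commented out so that sourcing this file does nothing;
# the limit of 100 attempts and the 30-second delay are arbitrary choices):
# retry_until_success 100 30 \
#     rsync -auxvL nedc@www.isip.piconepress.com:data/dpath/tuh_dpath/v2.0.0/ .
```

Because rsync skips files that are already up to date at the destination, each restart resumes with the files that have not yet been transferred rather than starting over.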

The wonderful utility rsync is our preferred way to download our resources. It allows you to easily keep your copy of the data in sync with ours. Rsync is a standard part of most Mac and Linux distributions, and Windows users can get access to it by installing MobaXterm. A typical rsync command to download a specific release (e.g., v2.0.0) of a specific corpus (TUDP) is:

      rsync -auxvL nedc@www.isip.piconepress.com:data/dpath/tuh_dpath/v2.0.0/ .

Note that the "." at the end of this command is very important since it denotes the destination directory (here, the current directory). Without a destination directory specification, rsync only lists the remote files and does not transfer any data.

The username and password are the same as what you use to access the web-based version of these resources. If you do not have the username and password, register by filling out this form and you will receive this information automatically by email.

Note that the "-L" option in rsync instructs it to follow links. If you are downloading the entire suite of corpora, you do not need to use "-L". If you are downloading only one corpus, you need to use "-L".

If Internet connectivity is a problem, you can send us a 4 TB USB drive. We will copy the data to this disk and send it back to you. If you elect this option, you must arrange for return postage by providing a UPS or FedEx account number.

Please send us a conventional USB-mounted disk drive. Any standard USB-powered, USB 2.0 compatible 4 TB drive, such as a Western Digital or Seagate, will work fine. Because of the time it takes to copy the data, we need a drive that can maintain a stable connection, and other types of media such as thumb drives have proven to be unreliable.

Mail the drive to:

      Joseph Picone
      1610 Rhawn Street
      Philadelphia, PA 19111
      Tel: 708-848-2846

Please email us for details before shipping the drive. If you ship us a drive directly from a reseller such as Amazon, please make sure that the shipment contains information that we can use to identify you. This information should include a point of contact (POC), the name of your institution, and contact information (name, surface mail address and telephone number for the POC).

If you are having trouble deciding what to do, email us and describe the specific resources in which you are interested. We will be happy to guide you through the process.


Documentation

  • Coming soon...



What's New

  • 2019:

    • (20190323) NEDC TUH EEG Seizure (v1.5.0): This release includes the expansion of the training dataset from 1,984 files to 4,597. Calibration sequences of the new data have been manually annotated and added to the seizure spreadsheet. Annotation corrections were made to the files already existing in the training set.