FCCC Digital Pathology Data and Resources (FCCC DPATH)
Corpora: FCBR | FCCR | FCDP
Documentation: ANNO | CHAP | CHPA | PRES
Instructions: BROW | RSYN | DISK
To request access to the FCCC DPATH corpora, please fill out this form and email a signed copy to help@nedcdata.org. Please include "Download The FCCC DPATH Corpus" in the subject line or click on this link.
Note this is an Adobe Acrobat form, and it is best filled out using Adobe Acrobat or a similarly compatible tool. We suggest you download a copy to your desktop and fill it out using a local app, rather than attempt to complete the form from within a browser.
The form must be filled out correctly or it will be returned to you. Please follow the instructions on the form very carefully, including completing the address information accurately. This is very important and we cannot accept forms with incorrect addresses.
Once your form is accepted, you will receive information about the credentials used to access the data in a separate email, and be added to our listserv. This usually takes about 24 to 48 hours. We need to track who downloads the data. We also want to be able to inform you of any updates to the releases.
Corpora
Once you have successfully registered and transmitted your ssh keys, you can download our corpora using rsync. The path for the most current release is shown with each entry.
- The FCCC DPATH Breast Tissue Subset (FCBR) [rsync path: data/fccc_dpath/fccc_dpath_breast/v3.0.1]: A 1,463 image subset of the FCCC DPATH Corpus that contains annotated breast tissue samples. A more complete description of the corpus is provided here.
- The FCCC DPATH Breast Tissue Crystallization Subset (FCCR) [rsync path: data/fccc_dpath/fccc_dpath_crystal/v1.0.0]: A 117 image subset of the FCCC DPATH Corpus that contains annotations of crystallization in tissue samples. A more complete description of the corpus is provided here.
- The FCCC Digital Pathology Corpus (FCDP) [rsync path: data/fccc_dpath/fccc_dpath/v1.0.0]: A corpus of 14,276 images that is described here.
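As an illustration of how the rsync paths above might be used, here is a minimal sketch. It assumes the corpora are served from the same account and host as the TEST command in the Instructions section, and that your key is stored in ~/.ssh/id_ed25519. The helper names corpus_source and fetch_corpus are hypothetical and are not part of any tool we distribute. If the instructions you receive by email differ, follow those instead.

```shell
#!/bin/sh
# Assumption: corpora are served from the same account and host as the
# TEST command in the instructions; adjust if your welcome email differs.
REMOTE="fccc-dpath@www.isip.piconepress.com"
KEY="${KEY:-$HOME/.ssh/id_ed25519}"

# corpus_source RELEASE_PATH
# Print the rsync source for one of the release paths listed above.
corpus_source() {
    printf '%s:%s\n' "$REMOTE" "$1"
}

# fetch_corpus RELEASE_PATH DEST  (hypothetical helper)
# Copy one release into DEST, following links with -L.
# Note: the key path must not contain spaces in this simple form.
fetch_corpus() {
    mkdir -p "$2" &&
    rsync -auvxL -e "ssh -i $KEY" "$(corpus_source "$1")" "$2"
}

# Example, left commented out so nothing downloads by accident:
# fetch_corpus data/fccc_dpath/fccc_dpath_breast/v3.0.1 ./fccc_dpath_breast
```

Because the source path has no trailing slash, rsync creates the release directory (for example, v3.0.1) inside the destination. Rerunning the same command later should transfer only files that have changed.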
Documentation
- Annotation Guidelines (ANNO): A document that describes how we annotate digital pathology images for cancer/no-cancer decisions.
- Book Chapter (CHAP): A book chapter that describes our digital pathology corpora and research methodologies.
- Book Chapter (CHPA): A book chapter that describes our digital pathology corpora and research methodologies.
- Presentation (PRES): A presentation that summarizes our digital pathology research.
Instructions
As of January 2026, our released corpora are distributed using ssh keys and rsync. The process begins with your submission of the form above. Once approved, you will receive instructions on how to transmit your key and access the data.
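If you do not yet have an ssh key, the sketch below shows one way to create one. It assumes an Ed25519 key stored at ~/.ssh/id_ed25519, which matches the key file used in the test command in these instructions. The helper name ensure_key is hypothetical. The instructions you receive after approval take precedence over this sketch, including how to transmit the key.

```shell
#!/bin/sh
# ensure_key KEYFILE  (hypothetical helper)
# Create an Ed25519 key pair at KEYFILE unless a file already exists there.
ensure_key() {
    if [ -f "$1" ]; then
        # Never overwrite an existing private key.
        echo "key already exists: $1"
        return 0
    fi
    mkdir -p "$(dirname "$1")" &&
    ssh-keygen -t ed25519 -f "$1"   # prompts interactively for a passphrase
}

# Example, left commented out:
# ensure_key "$HOME/.ssh/id_ed25519"
# ssh-keygen writes the public half next to it, in "$HOME/.ssh/id_ed25519.pub".
```

Only the public half (the .pub file) is normally shared with a server operator. The private key should stay on your machine.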
Rsync, which is available on Linux and Mac platforms, is our preferred way of downloading data. It allows you to easily keep your copy of the data in sync with ours. Windows users can get access to rsync by installing MobaXterm. Some tips on how to install and use MobaXterm are here. Before you attempt to download an entire corpus, you should test your ability to download data by executing this command:
rsync -auvxL -e "ssh -i ~/.ssh/id_ed25519" \
fccc-dpath@www.isip.piconepress.com:data/fccc_dpath/TEST .
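This must be entered as a single command. The trailing backslash continues the first line onto the second, so if you type everything on one line in your command line tool, omit the backslash.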
If for some reason this fails, change "-auvxL" to "-auvvvxL". This produces detailed diagnostic output that you can save to a log file, which your IT support team can use to diagnose the problems with your downloads.
Note that the "-L" option in rsync instructs it to follow symbolic links and copy the files they point to. All of our corpora are linked back to TUEG. It is best to always use the "-L" option.
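For the verbose troubleshooting run described above, it may help to capture everything rsync prints in a file you can forward to your IT support team. The following is a minimal sketch. The helper name run_logged and the log file name rsync_debug.log are hypothetical, and the helper simply wraps whatever command you give it.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]  (hypothetical helper)
# Run COMMAND, save its standard output and standard error in LOGFILE,
# append the exit status, and return that status to the caller.
run_logged() {
    log="$1"
    shift
    "$@" > "$log" 2>&1
    status=$?
    echo "exit status: $status" >> "$log"
    return "$status"
}

# Example, left commented out: rerun the test download with extra verbosity.
# run_logged rsync_debug.log \
#     rsync -auvvvxL -e "ssh -i $HOME/.ssh/id_ed25519" \
#     fccc-dpath@www.isip.piconepress.com:data/fccc_dpath/TEST .
```

Redirecting to a file, rather than piping through another program, keeps the exit status of rsync available, which is often the first thing a support team asks for.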
If Internet connectivity is a problem, you can send us a 16 TB USB drive. We will copy the data to this disk and send it back to you. If you elect this option, you must also provide a UPS or FedEx account number for return shipping.
Please send us a conventional USB-mounted disk drive. Any standard USB-powered, USB 2.0 or 3.0 compatible 16 TB drive will work fine. Because of the time it takes to copy the data, we need a drive that can maintain a stable connection, and other types of media such as thumb drives have proven to be unreliable.
Mail the drive to:
Joseph Picone
1610 Rhawn Street
Philadelphia, PA 19111
Tel: 708-848-2846
Please email us for details before shipping the drive. If you ship us a drive directly from a reseller such as Amazon, please make sure that the shipment contains information that we can use to identify you. This information should include a point of contact (POC), the name of your institution, and contact information (name, surface mail address and telephone number for the POC).
Please note that disk drives sent to international destinations will often get caught in Customs for weeks. Rsync is a much better option than going through your local governments.
If you are having trouble deciding what to do, email us and describe which specific resources you are interested in. We will be happy to guide you through the process.
NSF MRI: High Performance Digital Pathology Using Big Data and Machine Learning