File: nedc_dpath_resnet/v2.0.0/AAREADME.txt
Tool: The NEDC DPath ResNet System
Version: 2.0.0

-------------------------------------------------------------------------------
Change Log:

20231004 (SM): initial version

-------------------------------------------------------------------------------

This tool has been used to annotate various NEDC DPath Breast corpora. To
learn more about the origins of the tool, please see this publication:

 -

A. WHAT'S NEW

Version 2.0.0: 
  + Initial version

B. INSTALLATION REQUIREMENTS

Python code unfortunately often depends on a large number of add-ons, making
it very challenging to port code into new environments. This tool has been
tested extensively on Windows and Mac machines running Python v3.9.x.

Software tools required include:

 o Python 3.7.x or higher (we recommend installing Anaconda)
 o Numpy: http://www.numpy.org/
 o Torchvision: https://pytorch.org/vision/stable
 o Pillow: https://pillow.readthedocs.io/en/stable/
 o Scikit-learn: https://scikit-learn.org/stable/
 o OpenSlide Python: https://openslide.org/api/python/
 o Pandas: https://pandas.pydata.org/
 o Scikit-image (imported in Python as skimage): https://scikit-image.org/
 o Torch: https://pypi.org/project/torch/


There is a requirements.txt included in the release that helps you automate
the process of updating your environment.
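
As a minimal sketch, the requirements file can be passed to pip in one step.
This assumes that requirements.txt sits in the root directory of the
installation (its exact location is not stated here) and that python3 is on
your path. The function name install_requirements is hypothetical:

```shell
# Hypothetical helper: install the packages listed in <dir>/requirements.txt.
# Assumes python3 and pip are available; the file location is an assumption.
install_requirements() {
  req_file="$1/requirements.txt"
  if [ ! -f "$req_file" ]; then
    echo "requirements.txt not found in $1" >&2
    return 1
  fi
  python3 -m pip install -r "$req_file"
}
```

For example:

 install_requirements "<my_install_location>/nedc_dpath_resnet/v2.0.0"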

C. USER'S GUIDE

C.1. WINDOWS USERS

For Windows users, we recommend installing Anaconda in order to run
a bash emulator.

Through the Anaconda prompt, create a new environment with the proper
Python version and activate it:

 conda create -n <my_environment_name> python=3.9
 conda activate <my_environment_name>

Install a bash emulator that will allow running the annotation tool:

 conda install m2-base

Install the required packages:

 pip3 install numpy
 pip3 install torchvision
 pip3 install pillow
 pip3 install scikit_learn
 pip3 install openslide_python
 pip3 install pandas
 pip3 install scikit_image
 pip3 install torch

The easiest way to run this is to change your current working directory
to the root directory of the installation and execute the tool as follows:

 $ cd <my_install_location>/nedc_dpath_resnet/v2.0.0
 $ ./bin/nedc_dpath_resnet
 
Once the software has been installed, you need to do the following things if
you want to run this from any directory:

 - set the environment variable NEDC_NFC to the root directory
   of the installation:

    $ export NEDC_NFC='<my_install_location>/nedc_dpath_resnet/v2.0.0'

 - put $NEDC_NFC/bin in your path:

    $ export PATH=$PATH:$NEDC_NFC/bin
 
To verify the setup, you should be able to type the following and see the
full path to the tool:

 $ which nedc_dpath_resnet

C.2. LINUX/MAC USERS

For Mac users, since Mac OS X 10.8 comes with Python 2.7, you may 
need to use pip3 (rather than pip) when installing dependencies:

 pip3 install numpy
 pip3 install torchvision
 pip3 install pillow
 pip3 install scikit_learn
 pip3 install openslide_python
 pip3 install pandas
 pip3 install scikit_image
 pip3 install torch
 
The easiest way to run this is to change your current working directory
to the root directory of the installation and execute the tool as follows:

 cd <my_install_location>/nedc_dpath_resnet/v2.0.0
 ./bin/nedc_dpath_resnet
 
Once the software has been installed, you need to do the following things if
you want to run this from any directory:

 - set the environment variable NEDC_NFC to the root directory
   of the installation (e.g., <my_install_location>/nedc_dpath_resnet/v2.0.0)

 - put $NEDC_NFC/bin in your path
 
To verify the setup, you should be able to type the following and see the
full path to the tool:

 $ which nedc_dpath_resnet
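
For bash-style shells, the two steps above might look like the following
sketch. It assumes you replace the placeholder with your actual install
location. Adding these lines to a startup file such as ~/.bashrc is one way
to make them persist across sessions:

```shell
# Replace the placeholder with your actual install location.
NEDC_NFC="<my_install_location>/nedc_dpath_resnet/v2.0.0"
export NEDC_NFC

# Append the tool's bin directory to the search path.
PATH="$PATH:$NEDC_NFC/bin"
export PATH
```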

+ Note:

  If you would like to change any bash script in $NEDC_NFC/bin, please edit
  the corresponding script in $NEDC_NFC/src/shell and type 'make' to move the
  changed scripts to $NEDC_NFC/bin.

===============================================================================
===============================================================================

These kinds of complete machine learning systems are a bit complicated because
there are a lot of files involved. We have distributed example files with
this release so that you have a working example of the complete system.
Please make sure you run the complete system before you attempt to change
the code.

-------------------------------------------------------------------------------
A. Extracting Patches

Before the data from the DPath images can be used in training, individual
patches need to be extracted from each image. 

If you would like to extract patches, please follow these steps:

1. set up datasets:

  Patch extraction and training require training and development list files.
  By default, these lists are:

  $NEDC_NFC/test/list/train.list
  $NEDC_NFC/test/list/dev.list
  
  If you want to change the name/path of these files, please navigate to:
  $NEDC_NFC/src/shell/nedc_dpath_resnet_env.sh

  and change the following variables:

  TRAIN_LIST="...change to wanted filename and path..."
  DEV_LIST="...change to wanted filename and path..."
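
  If you need to build your own list files, the sketch below shows one way to
  do it. It assumes that a list file is plain text with one image path per
  line and that the images carry a .svs extension; neither detail is
  specified in this section, so check the distributed example lists first.
  The function name make_list is hypothetical:

```shell
# Hypothetical helper: write the paths of all .svs files under <image_dir>
# to <list_file>, one path per line (the list format is an assumption).
make_list() {
  image_dir="$1"
  list_file="$2"
  find "$image_dir" -type f -name '*.svs' | sort > "$list_file"
}
```

  For example:

   make_list /path/to/train_images "$NEDC_NFC/test/list/train.list"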

+ Note:

  The evaluation set of images does not need to have its patches extracted. Only
  the development and training images need patch extraction. This is because the
  training script only considers the development and training sets, and the
  training system requires patches to work.

2. customizing extract patch arguments:

  Most of our patch extraction arguments are set in a parameter file. The
  arguments in this file can be customized to fit your particular dataset. The
  file is found here:

  $NEDC_NFC/lib/nedc_dpath_resnet_extract__param.txt
  
  Please look through this file and adjust the values to suit your dataset.
  
  The output patches directory by default is:

  $NEDC_NFC/test/output/patches

  If you want to change the output directory, please navigate to:
  $NEDC_NFC/src/shell/nedc_dpath_resnet_extract_patch.sh
  
  and change the following variable:

  ODIR="...change to wanted directory..."

+ Note:

  The extract patch step can generate hundreds of thousands of files, depending
  on the images. Make sure you have enough room to store the resulting data in 
  the directory you choose to output it to.
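
  A quick way to check the available room before starting is to query the
  filesystem that holds the output directory. The sketch below uses the
  POSIX output format of df; the function name free_kb is hypothetical:

```shell
# Hypothetical helper: print the free space, in kilobytes, on the filesystem
# that holds the given directory (uses the POSIX output format of df).
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

  For example:

   free_kb "$NEDC_NFC/test/output"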

3. running extract patch:

  Extract patch can be run using this driver bash script:
  $NEDC_NFC/bin/nedc_dpath_resnet_extract_patch

+ Note:

  The extract patch driver script defaults to extracting patches for both 
  development and training sets in the same process. Unfortunately, the patch 
  extracting program takes roughly an hour to process each image. With the given 
  demo data of 54 images that need to be extracted, the patch extraction process
  should take roughly 54 hours (a little over 2 days). Therefore, we highly
  recommend extracting patches in parallel using a workload manager if possible.
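
  To illustrate the benefit: at roughly one hour per image, 54 images take
  about 54 hours sequentially, while four parallel jobs of at most 14 images
  each would finish in about 14 hours. The sketch below splits a list file
  into chunks for this purpose, assuming one image path per line. The
  function name split_list is hypothetical, and how each chunk is handed to
  the extraction script (for example, a per-job TRAIN_LIST setting) depends
  on your workload manager and is not shown:

```shell
# Hypothetical helper: distribute the lines of <list_file> round-robin into
# <n> chunk files named <prefix>.0.list ... <prefix>.<n-1>.list.
split_list() {
  list_file="$1"
  n="$2"
  prefix="$3"
  awk -v n="$n" -v prefix="$prefix" '{
    out = sprintf("%s.%d.list", prefix, (NR - 1) % n)
    print > out
  }' "$list_file"
}
```

  For example:

   split_list "$NEDC_NFC/test/list/train.list" 4 /tmp/train_chunk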
  
-------------------------------------------------------------------------------
B. Training

If you would like to run training please follow these steps:
  
1. customizing training arguments:

  Most of our training arguments are set in a parameter file. The arguments in
  this file can be customized to fit your particular dataset. The file is found
  here:

  $NEDC_NFC/lib/nedc_dpath_resnet_train_param.txt
  
  Please look through this file and adjust the values to suit your dataset.
  
  The output model filename by default is:

  $NEDC_NFC/models/model.pckl

  If you want to change the name of the output model please navigate to:

  $NEDC_NFC/src/shell/nedc_dpath_resnet_env.sh

  and change the following variable:

  MODEL_FILE="...change to wanted filename..."

+ Note:

  The $NEDC_NFC/models directory contains the packaged PyTorch pretrained
  ResNet models. If you want to change the model directory and/or the pretrained
  models, you can adjust the following variable:

  MODEL_DIR="...change to wanted directory..."

2. running training:

  Training can be run using this driver bash script:
  $NEDC_NFC/bin/nedc_dpath_resnet_train

+ Note:

  We recommend running training on GPU as it is much faster.
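
  Before a long training run, it can help to confirm that PyTorch sees a GPU.
  The sketch below calls torch.cuda.is_available() and prints True or False,
  or "unknown" if python3 or torch cannot be loaded. The function name
  gpu_status is hypothetical, and whether the training script selects the GPU
  automatically is not covered here:

```shell
# Hypothetical helper: report whether PyTorch can see a CUDA-capable GPU.
# Prints "True", "False", or "unknown" (python3 or torch not importable).
gpu_status() {
  python3 -c 'import torch; print(torch.cuda.is_available())' 2>/dev/null \
    || echo "unknown"
}
```

  For example:

   gpu_status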
  
-------------------------------------------------------------------------------
C. Decoding

If you would like to run decoding please follow these steps:

1. set up datasets:

  Decoding can be run on an evaluation list file that contains paths to SVS
  images:

  $NEDC_NFC/test/list/eval.list

  If you want to change the name/path of this file, please navigate to:

  $NEDC_NFC/src/shell/nedc_dpath_resnet_env.sh

  and change the following variable:

  EVAL_LIST="...change to wanted filename and path..."

  If you want to customize which datasets you run decoding on please navigate
  to the following file:

  $NEDC_NFC/src/shell/nedc_dpath_resnet_decode_slide.sh

  and locate the following line of code:

  CMD_ARG="-p $PARAM -m $MODEL_FILE -c $CSV_DIR -k $MASK_DIR $EVAL_LIST"

  You can change and add list files to decode by changing the value of the
  "CMD_ARG" variable:

  -p $PARAM -m $MODEL_FILE -c $CSV_DIR -k $MASK_DIR "...list file/s to decode..."
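
  As an illustration, the sketch below decodes two list files in one run by
  appending a second list to the command arguments. All values assigned here
  are placeholders for illustration only; in the real script these variables
  are set elsewhere, and the sketch assumes DEV_LIST is visible to the decode
  script:

```shell
# Placeholder values for illustration only; in the real script these
# variables are set elsewhere.
PARAM="decode_param.txt"
MODEL_FILE="model.pckl"
CSV_DIR="out/csv"
MASK_DIR="out/mask"
EVAL_LIST="eval.list"
DEV_LIST="dev.list"

# Decode both the evaluation and development lists in one run.
CMD_ARG="-p $PARAM -m $MODEL_FILE -c $CSV_DIR -k $MASK_DIR $EVAL_LIST $DEV_LIST"
echo "$CMD_ARG"
```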

2. customizing decoding arguments:

  Most of our decoding arguments are set in a parameter file. The arguments in
  this file can be customized to fit your particular dataset. The file is found
  here:

  $NEDC_NFC/lib/nedc_dpath_resnet_decode_slide_param.txt
  
  Please look through this file and adjust the values to suit your dataset.
  
  The input model filename by default is:

  $NEDC_NFC/models/model.pckl

  If you want to change the name/path of the model please navigate to:

  $NEDC_NFC/src/shell/nedc_dpath_resnet_env.sh

  and change the following variable:

  MODEL_FILE="...change to wanted filename and path..."  

  Decoding produces CSV, NEDC CSV, and XML files for every SVS input file.
  The output directory for these files by default is:

  $NEDC_NFC/test/output/decode_slide
  
  If you want to change the output directory please navigate to:

  $NEDC_NFC/src/shell/nedc_dpath_resnet_decode_slide.sh                          
                                                                    
  and change the following variable:

  CSV_DIR="...change to wanted path..."

+ Note:

  The most important output file is the NEDC CSV file. The NEDC CSV file
  contains a header and predictions for every patch in the image. The normal
  CSV file contains the barebones predictions from the model. The normal CSV 
  file is useful if the user wants to post-process or reroute the prediction
  data in their own way.
  
3. running decoding:

  Decoding can be run using this driver bash script:

  $NEDC_NFC/bin/nedc_dpath_resnet_decode_slide

+ Note:

  We recommend running decoding on CPU, as running it on GPU will not make
  it run any faster.

+ Note:

  You can compare your results with those in the directory:

  $NEDC_NFC/test/data/results

  Due to randomization issues notorious in deep learning systems like these,
  the results will not be exactly the same each time. However, the produced
  results should be similar to those provided.
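
  Because the file contents are expected to differ slightly, a useful first
  check is whether the same set of output files was produced. The sketch
  below lists file names that appear in only one of two directories. It
  assumes the reference results are laid out like your output directory,
  which is not confirmed here, and the function name compare_names is
  hypothetical:

```shell
# Hypothetical helper: print names of files that appear in only one of the
# two directories (top level only); prints nothing when the names match.
compare_names() {
  a_list=$(mktemp)
  b_list=$(mktemp)
  ls "$1" | sort > "$a_list"
  ls "$2" | sort > "$b_list"
  comm -3 "$a_list" "$b_list"
  rm -f "$a_list" "$b_list"
}
```

  For example:

   compare_names "$NEDC_NFC/test/output/decode_slide" "$NEDC_NFC/test/data/results"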

-------------------------------------------------------------------------------

If you have any additional comments or questions about the data or software,
please direct them to help@nedcdata.org. We will do our best to answer them.
However, this distribution is fairly complex, so familiarity with Linux, shell,
and Python is required.

Best regards,

Joe Picone
