set_01:

set_02:

set_04: This set was taken from the TUH EEG Seizure Corpus. There are roughly
        19,000 training vectors, 2,000 development test set vectors,
        and 1,174 evaluation set vectors. Each vector is a single
        feature vector taken from the middle of a seizure event.

set_05:

set_06:

set_07:

set_08: This set was generated in Spring 2021 using the IMLD application.
        It is 2D data.

set_09: This set was generated for the tuning experiments. The samples for
	each class in the train and dev sets were drawn from five tight
	Gaussians equally spaced around the contour. For the eval set, two
	such Gaussians were used.
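
	A minimal sketch of this kind of sampling, under stated assumptions:
	the contour is taken to be a circle, the data is 2D, and the radius,
	standard deviation, and sample counts are hypothetical values chosen
	for illustration. It is not the script that generated set_09.

```python
import numpy as np

def sample_ring_of_gaussians(n_clusters, n_per_cluster, radius=1.0,
                             sigma=0.05, seed=0):
    """Draw 2D points from tight Gaussians equally spaced on a circle.

    The circle, radius, and sigma are assumptions, not values from set_09.
    """
    rng = np.random.default_rng(seed)
    # Equally spaced angles; endpoint=False avoids duplicating 0 and 2*pi.
    angles = np.linspace(0.0, 2.0 * np.pi, n_clusters, endpoint=False)
    centers = radius * np.column_stack((np.cos(angles), np.sin(angles)))
    # Isotropic noise around each center: (n_clusters, n_per_cluster, 2).
    noise = rng.normal(scale=sigma, size=(n_clusters, n_per_cluster, 2))
    points = centers[:, None, :] + noise
    return points.reshape(-1, 2), centers

# Five Gaussians for train/dev, two for eval, as described above.
train_x, train_centers = sample_ring_of_gaussians(5, 200)
eval_x, eval_centers = sample_ring_of_gaussians(2, 200, seed=1)
```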

set_10: This set was generated for the tuning experiments to explore
	generalization. The original data for each set was augmented by
	adding Gaussian noise.
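
	One way such an augmentation could look is sketched below. It
	assumes the noisy copies are appended to the original vectors and
	inherit their labels; the function name, the number of copies, and
	the noise level are all hypothetical, since the procedure actually
	used for set_10 is not recorded here.

```python
import numpy as np

def augment_with_noise(x, y, n_copies=1, sigma=0.1, seed=0):
    """Append n_copies noisy versions of x (labels repeated unchanged)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # Each copy adds independent zero-mean Gaussian noise to every feature.
    noisy = [x + rng.normal(scale=sigma, size=x.shape)
             for _ in range(n_copies)]
    x_aug = np.concatenate([x] + noisy, axis=0)
    y_aug = np.concatenate([y] * (n_copies + 1), axis=0)
    return x_aug, y_aug

x = np.arange(6, dtype=float).reshape(3, 2)
y = np.array([0, 1, 0])
x_aug, y_aug = augment_with_noise(x, y, n_copies=2)
```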

set_11: This is a dataset containing patches from TUDP v1.1.1. Each set
	contains an equal number of background tissues and Invasive Ductal
	Carcinoma In-Situ (indc) tissues.

set_12: This dataset contains 10-second background and seizure segments
	in numpy data format. The naming convention is:
	00000123_s001_t000_label_start_stop.npy
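
	The convention above can be split into fields as sketched below. The
	field names "subject", "session", and "token" are assumptions about
	what the first three parts mean, and the label "seiz" and the values
	10 and 20 in the example filename are made up. The start and stop
	fields are kept as strings because their units are not stated here.

```python
from pathlib import Path
from typing import NamedTuple

class SegmentName(NamedTuple):
    subject: str   # assumed meaning of "00000123"
    session: str   # assumed meaning of "s001"
    token: str     # assumed meaning of "t000"
    label: str
    start: str
    stop: str

def parse_segment_name(path):
    """Split <id>_<s>_<t>_<label>_<start>_<stop>.npy into its six fields."""
    parts = Path(path).stem.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated fields: {path}")
    return SegmentName(*parts)

# Hypothetical filename that follows the convention.
info = parse_segment_name("00000123_s001_t000_seiz_10_20.npy")
```

	The array itself would then be read with numpy.load on the same path.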

set_13: This is our first set created with IMLD. It is designed to test a
	simple Gaussian classifier. There are two classes with some
	overlap.
	
set_14: Created in Spring 2022 for ECE 8527. This is a sequential decoding
        task. The files contain 1D signals with randomly occurring events.

set_15: Created in Spring 2024 for ECE 8527. This is a subset of the
	TNMG cardiology dataset. It contains 20,000 eval and dev
	files and 200,000 training files. All signals are sampled
	at 300 Hz and are 2,200 samples long.

set_16: This dataset is designed for QSVM (Quantum Support Vector Machine)
	experiments and is periodic in nature. It contains 20,000 total
	samples (10,000 for training and 10,000 for evaluation). It is a
	binary classification dataset (labels 0 and 1) with two features,
	where each class is generated using different periodic functions (sine
	and cosine) with added Gaussian noise. Class 0 uses cos(t) for feature 1
	and sin(2t) for feature 2, while Class 1 uses cos(2t) for feature 1 and
	sin(t) for feature 2.
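
	The class definitions above can be sketched as follows. The range of
	t (uniform on [0, 2*pi)), the noise standard deviation, the random
	class assignment, and the function name are assumptions; only the
	periodic functions and the set sizes come from the description.

```python
import numpy as np

def make_periodic_set(n_samples, sigma=0.1, seed=0):
    """Generate two-feature samples from the class-dependent curves."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_samples)          # 0 or 1
    t = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)    # assumed range
    # Class 0: (cos(t), sin(2t)); Class 1: (cos(2t), sin(t)).
    f1 = np.where(labels == 0, np.cos(t), np.cos(2.0 * t))
    f2 = np.where(labels == 0, np.sin(2.0 * t), np.sin(t))
    clean = np.column_stack((f1, f2))
    noise = rng.normal(scale=sigma, size=clean.shape)
    return clean + noise, labels

x_train, y_train = make_periodic_set(10_000, seed=0)
x_eval, y_eval = make_periodic_set(10_000, seed=1)
```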

set_17:

        I would like to use the DPATH data from TUH DPATH, which
        Claudia has been using, for the ECE 8527 class project. Here
        is what I am thinking. We need to generate three spreadsheets:
        /train, /dev and /eval. We need to use nedc_dpath_gen_feats in
        resize mode to create files that contain one vector per
        annotated region. These vectors should contain the following:
        
        - take 256x256 pixels centered around the center of the patch
        - compute a 2D DCT
        - retain the top 32x32 DCT coefficients
        - do this for each color

        This will give us approximately 3,000 columns per line (3 colors
        x 32 x 32 = 3,072 coefficients). There are about 10,000
        annotated regions. So these files will be about 0.5 GB each
        (/train will be bigger than the other two combined).

        The first column should be the class label, and the remaining
        columns should be the data.
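
        For reference, the feature layout described above can be
        sketched in plain numpy as follows. This is not
        nedc_dpath_gen_feats and makes no claim about its interface.
        It assumes that "top 32x32" means the lowest-frequency
        (top-left) block of an orthonormal DCT-II, and that the input
        is an H x W x 3 array at least 256 pixels on each side. The
        function names are hypothetical.

```python
import numpy as np

def dct_rows(n, keep):
    """First `keep` rows of the n-point orthonormal DCT-II matrix."""
    k = np.arange(keep)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def patch_features(image, label, crop=256, keep=32):
    """Return [label, coeffs...] for one annotated region."""
    h, w, n_colors = image.shape
    if h < crop or w < crop:
        raise ValueError("image is smaller than the crop window")
    # Crop a window centered on the center of the patch.
    top, left = (h - crop) // 2, (w - crop) // 2
    patch = image[top:top + crop, left:left + crop, :].astype(float)
    c = dct_rows(crop, keep)
    # Separable 2D DCT, truncated to the keep x keep low-frequency block,
    # computed independently for each color.
    blocks = [c @ patch[:, :, ch] @ c.T for ch in range(n_colors)]
    data = np.concatenate([b.ravel() for b in blocks])
    # First column is the class label, the rest are the data.
    return [label] + data.tolist()

row = patch_features(np.ones((300, 300, 3)), label=0)
print(len(row))  # 1 label + 3 * 32 * 32 = 3073 columns
```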

        We should be able to generate this using
        nedc_dpath_gen_feats. Please work together to get this done
        over the next couple of days - it should not take long since
        Claudia already has data like this.