set_01:
set_02:
set_04: This set was taken from the TUH EEG Seizure Corpus. There are roughly 19,000 training vectors, 2,000 development test set vectors, and 1,174 evaluation set vectors. Each sample is a single feature vector taken from the middle of a seizure event.
set_05:
set_06:
set_07:
set_08: This set was generated in Spring 2021 using the IMLD application. It is 2D data.
set_09: This set was generated for the tuning experiments. The samples for each class in the train and dev sets were drawn from five tight Gaussians equispaced around the contour. For the eval set, two such Gaussians were used.
set_10: This set was generated for the tuning experiments to explore generalization. The original data for each set was augmented by adding Gaussian noise.
set_11: This dataset contains patches from TUDP v1.1.1. Each set contains an equal number of background tissue patches and Invasive Ductal Carcinoma In-Situ (indc) tissue patches.
set_12: This dataset contains 10-second background and seizure segments in NumPy (.npy) format. The naming convention is: 00000123_s001_t000_label_start_stop.npy
set_13: This is our first set created with IMLD. It is designed to test a simple Gaussian classifier. There are two classes with some overlap.
set_14: Created in Spring 2022 for ECE 8527. This is a sequential decoding task. The files contain 1D signals with randomly occurring events.
set_15: Created in Spring 2024 for ECE 8527. This is a subset of the TNMG cardiology dataset. It contains 20,000 eval and dev files and 200,000 training files. All files are sampled at 300 Hz and are 2,200 samples long.
set_16: This dataset is designed for QSVM (Quantum Support Vector Machine) experiments on periodic data. It contains 20,000 samples in total (10,000 for training and 10,000 for evaluation). It is a binary classification dataset (labels 0 and 1) with two features, where each class is generated from different periodic functions (sine and cosine) with added Gaussian noise. Class 0 uses cos(t) for feature 1 and sin(2t) for feature 2, while Class 1 uses cos(2t) for feature 1 and sin(t) for feature 2.
set_17: I would like to use the DPATH data from TUH DPATH, which Claudia has been using, for the ECE 8527 class project. Here is what I am thinking. We need to generate three spreadsheets: /train, /dev, and /eval. We need to use nedc_dpath_gen_feats in resize mode to create files that contain one vector per annotated region. Each vector should be computed as follows:
- take the 256x256 pixels centered on the center of the patch
- compute a 2D DCT
- retain the top 32x32 DCT coefficients
- do this for each color channel
This will give us approximately 3,000 columns per line. There are about 10,000 annotated regions, so these files will be about 0.5G each (/train will be bigger than the other two combined). The first column should be the class label, and the remaining columns should be the data. We should be able to generate this using nedc_dpath_gen_feats. Please work together to get this done over the next couple of days; it should not take long since Claudia already has data like this.
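The set_12 naming convention can be unpacked in a few lines of Python. This is a minimal sketch, assuming the fields are underscore-separated, the label itself contains no underscores, and start/stop are numeric times (presumably seconds). The field names patient, session, and token follow the usual TUH reading of 00000123_s001_t000 but are an assumption here, as are the hypothetical names SegmentName and parse_segment_name.

```python
from pathlib import Path
from typing import NamedTuple


class SegmentName(NamedTuple):
    """Fields encoded in a set_12 filename (field names are hypothetical)."""
    patient: str
    session: str
    token: str
    label: str
    start: float
    stop: float


def parse_segment_name(path: str) -> SegmentName:
    """Split '00000123_s001_t000_label_start_stop.npy' into its fields.

    Assumes the label has no underscores and start/stop are numeric;
    any other layout raises ValueError from the tuple unpacking.
    """
    stem = Path(path).stem  # drops the directory and the trailing '.npy'
    patient, session, token, label, start, stop = stem.split("_")
    return SegmentName(patient, session, token, label, float(start), float(stop))
```

For example, with the hypothetical label "seiz", parse_segment_name("train/00000123_s001_t000_seiz_30_40.npy") yields label "seiz", start 30.0, and stop 40.0. Returning a NamedTuple keeps the fields addressable by name while still unpacking like a plain tuple.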
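A minimal NumPy sketch of how data with the set_16 structure could be generated. The per-class functions follow the set_16 description; the uniform range of t over [0, 2π), the noise standard deviation of 0.1, the even 5,000/5,000 class split, and the function name make_periodic_set are assumptions, not the parameters used to build the actual set.

```python
import numpy as np


def make_periodic_set(n_per_class: int, noise_std: float = 0.1, seed: int = 0):
    """Two-class periodic data in the style of set_16.

    Class 0: (cos(t), sin(2t)); class 1: (cos(2t), sin(t)); both plus Gaussian noise.
    The range of t and the noise level are assumptions.
    """
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 2.0 * np.pi, size=(2, n_per_class))  # one row of t per class
    class0 = np.column_stack([np.cos(t[0]), np.sin(2 * t[0])])
    class1 = np.column_stack([np.cos(2 * t[1]), np.sin(t[1])])
    x = np.vstack([class0, class1])
    x += rng.normal(0.0, noise_std, size=x.shape)  # additive Gaussian noise on both features
    y = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])
    return x, y


# 10,000 training and 10,000 evaluation samples, assuming 5,000 per class in each split.
x_train, y_train = make_periodic_set(5000, seed=1)
x_eval, y_eval = make_periodic_set(5000, seed=2)
```

Using different seeds for the two splits keeps the training and evaluation draws independent while leaving each split reproducible.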
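The per-region feature vector requested for set_17 can be sketched in NumPy. This is not nedc_dpath_gen_feats; it only illustrates the arithmetic, assuming an H x W x 3 image array, a known (row, col) region center, and that "top 32x32" means the top-left, lowest-frequency block of an orthonormal 2D DCT-II. The names dct_matrix and dct_feature_row are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector of length n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale factor
    return d


def dct_feature_row(image: np.ndarray, center: tuple, label: int,
                    crop: int = 256, keep: int = 32) -> np.ndarray:
    """Return [label, channel-0 coeffs..., channel-1 coeffs..., ...] for one region.

    image  -- H x W x C array (C color channels)
    center -- (row, col) of the annotated region's center
    """
    r, c = center
    half = crop // 2
    r0, c0 = r - half, c - half
    if r0 < 0 or c0 < 0 or r0 + crop > image.shape[0] or c0 + crop > image.shape[1]:
        raise ValueError("crop window extends past the image border")
    patch = image[r0:r0 + crop, c0:c0 + crop, :].astype(np.float64)

    # Only the first `keep` basis rows are needed for the low-frequency block:
    # D[:keep] @ X @ D[:keep].T is the top-left keep x keep corner of D @ X @ D.T.
    d = dct_matrix(crop)[:keep]
    feats = [(d @ patch[:, :, ch] @ d.T).ravel() for ch in range(patch.shape[2])]
    return np.concatenate([[float(label)], *feats])


# Example on a synthetic 512 x 512 RGB image (real input would come from a DPATH slide).
rng = np.random.default_rng(0)
row = dct_feature_row(rng.random((512, 512, 3)), center=(256, 256), label=1)
print(row.shape)  # (3073,)
```

Each row has 1 + 3 × 32 × 32 = 3,073 columns (the class label plus 3,072 coefficients), consistent with the estimate of approximately 3,000 columns per line. Multiplying by only the first 32 rows of the DCT matrix computes the retained block directly instead of the full 256x256 transform; scipy.fft.dctn(patch_channel, norm="ortho")[:32, :32] gives the same block for one channel.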