File: AAREADME.txt
Database: Natus AEEG Corpus (NAEG)
Version: 1.0.0

-------------------------------------------------------------------------------
Change Log:

 v1.0.0 (20250420): Initial release of the first 100 studies

-------------------------------------------------------------------------------

This file contains some basic statistics about the Natus Ambulatory EEG
(NAEG) Corpus. This subset consists of 100 studies that are nominally
72-hour continuous recordings.

When you use this specific corpus in your research or technology
development, we ask that you reference the corpus using this
publication:

 Melles, A.-M., Paderewski, M., Oymann, R., Shah, V., Salazar, J.,
 Obeid, I., & Picone, J. (2024). Annotation of Ambulatory
 EEGs. Proceedings of the IEEE Signal Processing in Medicine and
 Biology Symposium, 1–4. doi: 10.1109/SPMB62441.2024.10842264

This publication can be retrieved from:

 https://isip.piconepress.com/publications/conference_presentations/2024/ieee_spmb/aeeg/

There are two main directories in this release:

 nedc_130_[1]: p /data/isip/data/natus_aeeg/v1.0.0
 nedc_130_[1]: d
 ...
 drwxrwxr-x   3 picone isip   5 Apr 19 16:57 DOCS/
 drwxrwxr-x 102 picone isip 102 Feb 21 16:23 edf/
 ...

/DOCS contains relevant documentation including an annotator log that
includes comments about each study, montages that are used to load the
data into our annotation tool, and a list of seizure types.

The EEG data is stored in edf files located in the edf directory in
the following directory structure:

 edf
 edf/d0142fa23def05ce051d3c56514d8fef
 edf/d0142fa23def05ce051d3c56514d8fef_00.edf
 edf/d0142fa23def05ce051d3c56514d8fef_00/
     d0142fa23def05ce051d3c56514d8fef_00_000.csv
     d0142fa23def05ce051d3c56514d8fef_00_000.csv_bi
     d0142fa23def05ce051d3c56514d8fef_00_000.edf
     d0142fa23def05ce051d3c56514d8fef_00_001.csv
     d0142fa23def05ce051d3c56514d8fef_00_001.csv_bi
     d0142fa23def05ce051d3c56514d8fef_00_001.edf
     ...
     d0142fa23def05ce051d3c56514d8fef_00_011.csv
     d0142fa23def05ce051d3c56514d8fef_00_011.csv_bi
     d0142fa23def05ce051d3c56514d8fef_00_011.edf
 edf/d0142fa23def05ce051d3c56514d8fef_01.edf
 edf/d0142fa23def05ce051d3c56514d8fef_01
 ...
 d0142fa23def05ce051d3c56514d8fef_11.edf
 d0142fa23def05ce051d3c56514d8fef_11

The study identifier is d0142fa23def05ce051d3c56514d8fef. This study
was split into 12 edf files, each of which is 6 hours in duration. This
is the way the data was delivered to us by Natus.

Each of these 6-hour files was split into nominally twelve 30-minute
files (e.g., *_00_000.edf, *_00_001.edf) and stored in a subdirectory
with the same study name and sequence number (e.g., "_00"). This was
done mainly for annotator convenience: our interactive tools run much
faster when the signal is less than one hour in duration.

Within this subdirectory, there are three types of files:

 *.edf:    the EEG sampled data in European Data Format (edf)
 *.csv:    event-based annotations using all available seizure type classes
 *.csv_bi: term-based annotations using only two labels (bckg and seiz)

Event-based annotations are per-channel. This means the annotation
contains, in addition to a start and stop time, a channel
index. Seizures can often be observed on one or more channels and then
spread to other channels. Event-based annotations capture this.

Term-based annotations use one label that applies to all
channels. These are most useful for machine learning research in which
we tend to worry only about the overall classification of a segment
and are not concerned about individual channels.

Bi-class annotations use two labels: seizure (seiz) and background
(bckg). The multi-class annotations use all available seizure types.
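
The file naming convention described above (study identifier, 6-hour
session number, 30-minute segment number) can be sketched in Python as
follows. This is only an illustration and is not part of the corpus
tools: the names parse_segment_name, SegmentName and
nominal_offset_secs are hypothetical, the regular expression is
inferred from the example file names in this README, and the computed
offset is nominal because segment and session lengths are only
nominally 30 minutes and 6 hours.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple

# Assumed pattern, inferred from the example names in this README:
#   <hex study id>_<2-digit session>_<3-digit segment>.<edf|csv|csv_bi>
_SEGMENT_RE = re.compile(
    r"^(?P<study>[0-9a-f]+)_(?P<session>\d{2})_(?P<segment>\d{3})"
    r"\.(?P<kind>csv_bi|csv|edf)$"
)


class SegmentName(NamedTuple):
    """Components of a 30-minute segment file name (hypothetical helper)."""
    study: str    # study identifier, e.g. d0142fa2...
    session: int  # index of the 6-hour file within the study
    segment: int  # index of the 30-minute file within the session
    kind: str     # "edf", "csv" or "csv_bi"


def parse_segment_name(path: str) -> SegmentName:
    """Split a segment path or file name into its components."""
    name = PurePosixPath(path).name
    match = _SEGMENT_RE.match(name)
    if match is None:
        raise ValueError(f"not a 30-minute segment file name: {name!r}")
    return SegmentName(
        match["study"], int(match["session"]), int(match["segment"]),
        match["kind"],
    )


def nominal_offset_secs(seg: SegmentName,
                        session_hours: int = 6,
                        segment_minutes: int = 30) -> int:
    """Nominal start of a segment, in seconds from the start of the study.

    This assumes every session is exactly 6 hours and every segment is
    exactly 30 minutes, which the README describes only as nominal.
    """
    return (seg.session * session_hours * 3600
            + seg.segment * segment_minutes * 60)


if __name__ == "__main__":
    seg = parse_segment_name(
        "edf/d0142fa23def05ce051d3c56514d8fef_00/"
        "d0142fa23def05ce051d3c56514d8fef_00_011.csv_bi"
    )
    print(seg, nominal_offset_secs(seg))
```

A 6-hour file name such as d0142fa23def05ce051d3c56514d8fef_00.edf
does not match the pattern and raises ValueError, which keeps the two
levels of the hierarchy from being confused.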

These seizure types are described in the spreadsheet:

 $NAEG/v1.0.0/DOCS/seizures_types_v02.xlsx

The channel arrangement for this data is consistent:

 channel[  0]: 200.0 Hz (FP1)
 channel[  1]: 200.0 Hz (F7)
 channel[  2]: 200.0 Hz (T3)
 channel[  3]: 200.0 Hz (A1)
 channel[  4]: 200.0 Hz (T5)
 channel[  5]: 200.0 Hz (O1)
 channel[  6]: 200.0 Hz (F3)
 channel[  7]: 200.0 Hz (C3)
 channel[  8]: 200.0 Hz (P3)
 channel[  9]: 200.0 Hz (FZ)
 channel[ 10]: 200.0 Hz (CZ)
 channel[ 11]: 200.0 Hz (PZ)
 channel[ 12]: 200.0 Hz (FP2)
 channel[ 13]: 200.0 Hz (F8)
 channel[ 14]: 200.0 Hz (T4)
 channel[ 15]: 200.0 Hz (A2)
 channel[ 16]: 200.0 Hz (T6)
 channel[ 17]: 200.0 Hz (O2)
 channel[ 18]: 200.0 Hz (F4)
 channel[ 19]: 200.0 Hz (C4)
 channel[ 20]: 200.0 Hz (P4)
 channel[ 21]: 200.0 Hz (X1)
 channel[ 22]: 200.0 Hz (X2)
 channel[ 23]: 200.0 Hz (DIF1)
 channel[ 24]:  57.0 Hz (EDF ANNOTATIONS)

To annotate this data, we used a tcp_ar montage:

 nedc_130_[1]: more DOCS/montages/01_tcp_ar_natus_montage.txt
 # file: $NATUS_AEEG/DOCS/01_tcp_ar_natus_montage.txt
 #
 # This file contains our first attempt at a tcp_ar montage.
 #
 [Montage]
 montage =  0, FP1-F7: FP1 -- F7
 montage =  1, F7-T3:  F7 -- T3
 montage =  2, T3-T5:  T3 -- T5
 montage =  3, T5-O1:  T5 -- O1
 montage =  4, FP2-F8: FP2 -- F8
 montage =  5, F8-T4:  F8 -- T4
 montage =  6, T4-T6:  T4 -- T6
 montage =  7, T6-O2:  T6 -- O2
 montage =  8, A1-T3:  A1 -- T3
 montage =  9, T3-C3:  T3 -- C3
 montage = 10, C3-CZ:  C3 -- CZ
 montage = 11, CZ-C4:  CZ -- C4
 montage = 12, C4-T4:  C4 -- T4
 montage = 13, T4-A2:  T4 -- A2
 montage = 14, FP1-F3: FP1 -- F3
 montage = 15, F3-C3:  F3 -- C3
 montage = 16, C3-P3:  C3 -- P3
 montage = 17, P3-O1:  P3 -- O1
 montage = 18, FP2-F4: FP2 -- F4
 montage = 19, F4-C4:  F4 -- C4
 montage = 20, C4-P4:  C4 -- P4
 montage = 21, P4-O2:  P4 -- O2

To learn more about montages and reference points, please review this
publication:

 Lopez, S., Gross, A., Yang, S., Golmohammadi, M., Obeid, I., &
 Picone, J. (2016). An Analysis of Two Common Reference Points for
 EEGs. In IEEE Signal Processing in Medicine and Biology Symposium (pp.
 1–4). Philadelphia, Pennsylvania, USA. Available at:
 https://www.isip.piconepress.com/publications/conference_proceedings/2016/ieee_spmb/montages/.

Finally, here are some basic descriptive statistics about the
data. The Linux commands used to generate these numbers are shown
below each item. For these commands, the starting point was here:

 /data/isip/data/natus_aeeg/v1.0.0/edf

( 1) Number of 30-minute edf/csv/csv_bi files: 10,875

 nedc_130_[1]: find . -name "*_??_???.edf" | wc -l
 10875
 nedc_130_[1]: find . -name "*_??_???.csv" | wc -l
 10875
 nedc_130_[1]: find . -name "*_??_???.csv_bi" | wc -l
 10875

( 2) Number of 6-hour recordings: 950

 nedc_130_[1]: find . -maxdepth 2 -mindepth 2 -type d | wc -l
 950

( 3) Number of patients: 100

 nedc_130_[1]: find . -maxdepth 1 -mindepth 1 -type d | wc -l
 100

( 4) Number of files with seizures: 1,821

 nedc_130_[1]: find . -name "*.csv" -exec grep -H "sz," {} \; | cut -d"/" -f4 | cut -d":" -f1 | sort -u | wc -l
 1821

( 5) Number of 6-hour recording sessions with seizures: 422

 nedc_130_[1]: find . -name "*.csv" -exec grep -H "sz," {} \; | cut -d"/" -f2,3 | sort -u | wc -l
 422

( 6) Number of studies with seizures: 65

 nedc_130_[1]: find . -name "*.csv" -exec grep -H "sz," {} \; | cut -d"/" -f2 | sort -u | wc -l
 65

( 7) Total number of seizure events (measured using *.csv_bi): 3,779

 nedc_130_[1]: find . -name "*.csv_bi" -exec grep -H seiz {} \; | wc -l
 3779

( 8) Total duration: 19,494,593 secs (5,415 hours)

 nedc_130_[1]: find . -name "*.csv" -exec grep duration {} \; | awk '{ sum+=$4} END {print sum}'
 19494593

( 9) Total size of the corpus: 397,899 Mbytes (397.9 Gbytes)

 nedc_130_[1]: cd /data/isip/data/natus_aeeg/v1.0.0/edf
 nedc_130_[1]: du -sBM .
 397899M .

(10) Total duration of seizure events: 53,018.9000 secs

 nedc_130_[1]: find . -name "*.csv_bi" -exec grep -H "seiz," {} \; | cut -d"," -f2,3 | sed -e "s/,/ /g" | awk '{ sum +=($2-$1)} END {print sum}'
 53018.9

-----------------------------

If you have any additional comments or questions about the data,
please direct them to help@nedcdata.org.

Best regards,

Joe Picone