File: AAREADME.txt
Database: TUH EEG Artifact Corpus (TUAR)
Version: v3.0.1

-------------------------------------------------------------------------------
Change Log:

v3.0.1 (20240207): Headers were modified. No change to the signal data.

-------------------------------------------------------------------------------

The TUH EEG Artifact (TUAR) Corpus began as an effort to identify
artifacts that could be used to train artifact models. In annotating
v1.0.0, we identified a suitable number of events for each type of
artifact, but we did not annotate the entire signal.

There was community interest in labeling the entire signal so that the
data could be used to evaluate artifact detectors. We therefore
annotated every event on every channel. This is the major upgrade from
v1.0.0. In v3.0.0, we reorganized the data into its final format to
match v2.0.0 of the TUH EEG Corpus.

This release contains the following directories:

 nedc_000_[1]: p /data/isip/data/tuh_eeg_artifact/v3.0.0
 nedc_000_[1]: ls -1
 AAREADME.txt: the documentation file
 edf: EEG data and csv annotation files

A more complete description of the contents of these directories is
below:

 AAREADME.txt: a practical guide to the v3.0.1 release of TUAR

 edf:
  edf/01_tcp_ar:   edf files and annotations for the "tcp ar" channel configuration
  edf/02_tcp_le:   edf files and annotations for the "tcp le" channel configuration
  edf/03_tcp_ar_a: edf files and annotations for the "tcp ar a" channel configuration

To learn more about channel configurations, montages, etc., please
consult this document:

 Ferrell, S., Mathew, V., Refford, M., Tchiong, V., Ahsan, T.,
 Obeid, I., & Picone, J. (2020). The Temple University Hospital EEG
 Corpus: Electrode Location and Channel Labels.
 https://www.isip.piconepress.com/publications/reports/2020/tuh_eeg/electrodes

Understanding how montages and channel labels are defined is essential
for interpreting the channel labels used in the annotation files.
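Because the EDF files are grouped by channel configuration (edf/01_tcp_ar, edf/02_tcp_le, edf/03_tcp_ar_a), a quick per-directory count is a useful sanity check after downloading. The following is a minimal shell sketch, assuming it is run from the top-level TUAR directory that contains edf/; count_edf_per_config is a hypothetical helper name, not an NEDC tool.

```shell
#!/usr/bin/env bash
# Sketch only: count the EDF files in each channel-configuration directory
# (e.g. edf/01_tcp_ar, edf/02_tcp_le, edf/03_tcp_ar_a).
# count_edf_per_config is a hypothetical helper, not part of the NEDC tools.
count_edf_per_config() {
    local root="${1:-edf}" dir n
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        # Count only *.edf; the csv annotation files share these directories.
        # tr strips the padding some wc implementations add.
        n=$(find "$dir" -name "*.edf" | wc -l | tr -d ' ')
        printf '%s %s\n' "$(basename "$dir")" "$n"
    done
}
```

For example, running "count_edf_per_config edf" from the corpus root prints one line per configuration directory (its name and EDF count). Since every EDF file sits in one of these directories, the counts should sum to the total reported by command (1) in the Linux commands section.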
With this release, we have completed our transition to xml and csv
files for representing annotation information. The csv files in this
release have the following format:

 # version = csv_v1.0.0
 # bname = aaaaaaju_s005_t000
 # duration = 1442.0000 secs
 # montage_file = $TUAR/v3.0.0/DOCS/01_tcp_ar_montage.txt
 # annotation_label_file = $TUAR/v3.0.0/DOCS/nedc_ann_eeg_tools_map_v01.txt
 # channel,start_time,stop_time,label,confidence
 FP1-F7,22.9737,30.0688,eyem,1.000000
 FP1-F7,136.7987,140.1117,eyem,1.000000
 FP1-F7,145.0133,148.0498,eyem,1.000000
 ...

The channel label is taken from the montage file. The start and stop
times give the beginning and end of the event, in seconds. The next
entry is the name of the artifact, and the last entry is a confidence
value (1.000000 in every row shown above).

The labels and the annotation process are described in this
publication:

 Buckwalter, G., Chhin, S., Rahman, S., Obeid, I., & Picone, J.
 (2021). Recent Advances in the TUH EEG Corpus: Improving the
 Interrater Agreement for Artifacts and Epileptiform Events. In
 I. Obeid, I. Selesnick, & J. Picone (Eds.), Proceedings of the IEEE
 Signal Processing in Medicine and Biology Symposium (SPMB)
 (pp. 1–3). IEEE. https://doi.org/10.1109/SPMB52430.2021.9672302

Please cite this publication when referring to this corpus.

The pathname of a typical EEG file can be explained as follows:

 Filename:
  edf/01_tcp_ar/aaaaaaju_s005_t000.edf

 Components:
  edf: contains the edf data
  01_tcp_ar: data recorded with the averaged reference (AR)
   configuration; the annotations use the TCP channel configuration
  aaaaaaju_s005_t000.edf: the actual EEG file. A recording is split
   into a series of files, t000.edf, t001.edf, .... These are pruned
   EEGs: the original recording is split into segments, and the
   irrelevant segments are deleted (standard clinical practice).
   aaaaaaju: subject identifier from TUH EEG v2.0.0
   s005: session number
   t000: token number

Annotations are contained in the corresponding *.csv files (in the
same directory). Annotations consist of 10 basic types of events:

 eyem (21): eye movement
 chew (22): chewing
 shiv (23): shivers
 musc (24): muscle artifact
 elec (30): a catch-all category used when any one of three types of
            artifacts occurs: electrode pop, electrostatic, and lead
            artifact

Any portion of the signal that is not labeled can be assumed to be
background (bckg). These labels are described in more detail in our
annotation guidelines document:

 Ochal, D., Rahman, S., Ferrell, S., Elseify, T., Obeid, I., &
 Picone, J. (2020). The Temple University Hospital EEG Corpus:
 Annotation Guidelines.
 URL: www.isip.piconepress.com/publications/reports/2020/tuh_eeg/annotations/

Note that this release also provides seizure annotations (*_seiz.csv)
for files in which seizures occurred. An artifact can occur on the
same channel as a seizure event.

Below are some summary statistics for the data.
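The filename convention and csv layout described above lend themselves to a small shell sketch. It is illustrative only: describe_edf and label_summary are hypothetical helper names, not NEDC tools, and the code assumes (from the naming conventions in this document) that a recording's artifact annotations are stored as <bname>.csv and any seizure annotations as <bname>_seiz.csv in the same directory. label_summary skips the "#" header lines and reports, per label, the number of events and their summed duration in seconds (events on different channels are counted separately). Running it over the non-seizure csv files gives per-label counts that can be compared with the event histogram that follows.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the NEDC tools.

# Split a TUAR EDF path into subject, session, and token, and report the
# annotation files assumed to sit next to it (<bname>.csv, <bname>_seiz.csv).
describe_edf() {
    local path="$1" dir bname subject session token
    dir=$(dirname "$path")
    bname=$(basename "$path" .edf)              # e.g. aaaaaaju_s005_t000
    subject=${bname%%_*}                        # aaaaaaju
    token=${bname##*_}                          # t000
    session=${bname#*_}; session=${session%_*}  # s005
    echo "subject=$subject session=$session token=$token"
    echo "artifacts=$dir/$bname.csv"
    if [ -f "$dir/${bname}_seiz.csv" ]; then
        echo "seizures=$dir/${bname}_seiz.csv"
    else
        echo "seizures=none"
    fi
}

# Summarize one csv annotation file: for each label, print the number of
# events and the summed event duration (stop_time - start_time) in seconds.
# Assumes no field contains an embedded comma.
label_summary() {
    awk -F',' '
        /^#/ { next }                 # "# key = value" header lines
        $1 == "channel" { next }      # column-name row, if not commented out
        NF >= 5 {
            n[$4]++                   # events per label
            secs[$4] += $3 - $2       # duration of this event
        }
        END { for (lab in n) printf "%s %d %.4f\n", lab, n[lab], secs[lab] }
    ' "$1" | sort
}
```

For example, describe_edf edf/01_tcp_ar/aaaaaaju_s005_t000.edf prints the subject, session, and token on its first line, followed by the expected artifact and seizure annotation paths.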
FILES AND EVENTS:

 files:             310
 sessions:          259
 patients:          213
 events:            160,073 (non-background)
 files w/ seizs:    42
 total duration:    359,936.00 secs (5,998.93 mins / 99.98 hrs)
 avg file duration: 1,161.08 secs (19.35 mins / 0.32 hrs)

EVENT HISTOGRAMS:

Each entry gives the number of events, followed by (percentage of all
events / cumulative percentage).

 musc:       51,052 ( 31.89% /  31.89%)
 eyem:       38,569 ( 24.09% /  55.99%)
 elec:       33,130 ( 20.70% /  76.68%)
 eyem_musc:  18,677 ( 11.67% /  88.35%)
 musc_elec:   7,651 (  4.78% /  93.13%)
 chew:        6,482 (  4.05% /  97.18%)
 eyem_elec:   2,422 (  1.51% /  98.69%)
 eyem_chew:     864 (  0.54% /  99.23%)
 shiv:          613 (  0.38% /  99.62%)
 chew_musc:     243 (  0.15% /  99.77%)
 elpp:          172 (  0.11% /  99.88%)
 chew_elec:     152 (  0.09% /  99.97%)
 eyem_shiv:      45 (  0.03% / 100.00%)
 shiv_elec:       1 (  0.00% / 100.00%)
 TOTAL:     160,073 (100.00% / 100.00%)

-----------------------------
Some useful Linux commands (wc prints the number of lines, words, and
characters; the first number is the count):

(1) Number of files:

 nedc_000_[1]: find edf -name "*.edf" | wc
 310 310 20474

(2) Number of patients:

 nedc_000_[1]: find edf -name "*.edf" | cut -d"/" -f3 | cut -d"_" -f1 | sort -u | wc
 213 213 1917

(3) Files with seizures:

 nedc_000_[1]: find edf -name "*seiz*.csv" | wc
 42 42 2990
-----------------------------

If you have any additional comments or questions about the data,
please direct them to help@nedcdata.org.

Best regards,

Joseph Picone
NEDC Data Resources Development Manager