Fairseq-signals
March 19, 2026
Fairseq-signals is a collection of deep learning models for ECG data processing, built on fairseq.
We provide implementations of various deep learning methods on ECG data, including official implementations of our works.
List of implemented papers:
- Deep learning based ECG segmentation for delineation of diverse arrhythmias
- Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training
- Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
- Lead-agnostic Self-supervised Learning for Local and Global Representations of Electrocardiogram*
- 3KG: Contrastive Learning of 12-Lead Electrocardiograms using Physiologically-Inspired Augmentations
- CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients
- wav2vec 2.0: A Framework for Self-supervised Learning of Speech Representations
- A Simple Framework for Contrastive Learning of Visual Representations
- ECG-FM: An Open Electrocardiogram Foundation Model*
* denotes an official implementation
We will keep implementing new methods in this repo. If you have any recommendations, please contact us by opening an issue or by e-mail.
Requirements and Installation
- PyTorch version >= 1.5.0
- Python version >= 3.6
- For training new models, you'll also need an NVIDIA GPU and NCCL
- To install fairseq-signals from source and develop locally:
git clone https://github.com/Jwoo5/fairseq-signals
cd fairseq-signals
pip install --editable ./
- To preprocess ECG datasets:
pip install pandas scipy wfdb
- To build cython components:
python setup.py build_ext --inplace
- For large datasets install PyArrow:
pip install pyarrow
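Before installing, it can help to confirm that the current environment meets the minimum versions listed above. The following is a minimal sketch, not part of fairseq-signals; `version_tuple` and `check_requirements` are hypothetical helper names, and the check only reports what it finds rather than installing anything.

```python
import re
import sys


def version_tuple(version):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1).

    Local build tags after '+' are ignored and missing fields are padded with 0.
    """
    numbers = [int(p) for p in re.findall(r"\d+", version.split("+")[0])[:3]]
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def check_requirements():
    """Report whether Python and PyTorch meet the minimums listed above."""
    report = {"python_ok": sys.version_info >= (3, 6)}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so neither torch check can pass.
        report["torch_ok"] = False
        report["cuda_available"] = False
        return report
    report["torch_ok"] = version_tuple(torch.__version__) >= (1, 5, 0)
    # A GPU is only needed for training new models.
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {ok}")
```

If `cuda_available` is reported as false, pre-processing can still be run, but training new models needs an NVIDIA GPU as noted above.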
Getting Started
For uni-modal tasks (ECG Classification, ...)
Prepare ECG dataset
We provide pre-processing scripts for various ECG datasets.
Pre-process
Given a directory that contains WFDB directories to be pre-processed for PhysioNet2021:
$ python fairseq_signals/data/ecg/preprocess/preprocess_physionet2021.py \
/path/to/physionet2021/ \
--dest /path/to/output \
--workers $N
Given a directory that contains .dat files from PTB-XL:
$ python fairseq_signals/data/ecg/preprocess/preprocess_ptbxl.py \
/path/to/ptbxl/records500/ \
--dest /path/to/output
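Before running either script, it may be worth checking that the input directory actually contains complete records. The sketch below is not part of fairseq-signals; `find_wfdb_records` is a hypothetical helper that assumes each record consists of a `.hea` header next to a signal file with a `.dat` or `.mat` extension.

```python
from pathlib import Path


def find_wfdb_records(root, signal_exts=(".dat", ".mat")):
    """Return record paths (without extension) that have a header and a signal file.

    The accepted signal extensions are an assumption; adjust them to the dataset.
    """
    records = []
    for header in sorted(Path(root).rglob("*.hea")):
        # Keep the record only if a matching signal file sits next to the header.
        if any(header.with_suffix(ext).exists() for ext in signal_exts):
            records.append(header.with_suffix(""))
    return records


if __name__ == "__main__":
    found = find_wfdb_records("/path/to/ptbxl/records500/")
    print(f"{len(found)} complete records found")
```

A count of zero usually means the path points one level too high or too low in the dataset tree.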
Prepare data manifest
Given a directory that contains pre-processed data:
$ python fairseq_signals/data/ecg/preprocess/manifest.py \
/path/to/data/ \
--dest /path/to/manifest \
--valid-percent $valid
For patient identification:
$ python fairseq_signals/data/ecg/preprocess/manifest_identification.py \
/path/to/data \
--dest /path/to/manifest \
--valid-percent $valid
More details about pre-processing and data manifests can be found here.
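To illustrate what a validation fraction such as `$valid` does, here is a minimal sketch of a random train/valid split. It assumes the value is a fraction between 0 and 1; `split_manifest` is a hypothetical function, and the actual manifest scripts may assign samples differently.

```python
import random


def split_manifest(paths, valid_percent, seed=42):
    """Randomly assign each path to a train or valid list.

    valid_percent is assumed to be a fraction in [0, 1], not a number out of 100.
    """
    if not 0.0 <= valid_percent <= 1.0:
        raise ValueError("valid_percent must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, valid = [], []
    for path in paths:
        # Each sample lands in valid with probability valid_percent.
        (valid if rng.random() < valid_percent else train).append(path)
    return train, valid


if __name__ == "__main__":
    files = [f"record_{i:05d}.mat" for i in range(1000)]
    train, valid = split_manifest(files, valid_percent=0.1)
    print(len(train), len(valid))
```

Because each sample is assigned independently, the validation set size is only approximately `valid_percent` of the data.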
For multi-modal tasks (Multi-modal pre-training or ECG question answering)
Prepare ECG dataset
We provide pre-processing scripts for the following datasets.
Pre-process
For multi-modal pre-training of ECGs with reports using the PTB-XL dataset:
$ python fairseq_signals/data/ecg_text/preprocess/preprocess_ptbxl.py \
/path/to/ptbxl \
--dest /path/to/output
For multi-modal pre-training of ECGs with reports using the MIMIC-IV-ECG dataset:
$ python fairseq_signals/data/ecg_text/preprocess/preprocess_mimic_iv_ecg.py \
/path/to/mimic-iv-ecg \
--dest /path/to/output
For ECG Question Answering task with the ECG-QA dataset:
- Map ecg_id to the corresponding ECG file path (you can find these scripts in the ECG-QA repository)
  - For PTB-XL-based ECG-QA:
$ python mapping_ptbxl_samples.py ecgqa/ptbxl \
--ptbxl-data-dir $ptbxl_dir \
--dest $dest_dir
  - For MIMIC-IV-ECG-based ECG-QA:
$ python mapping_mimic_iv_ecg_samples.py ecgqa/mimic-iv-ecg \
--mimic-iv-ecg-data-dir $mimic_iv_ecg_dir \
--dest $dest_dir
- Preprocess ECG-QA and prepare manifests
$ python fairseq_signals/data/ecg_text/preprocess/preprocess_ecgqa.py /path/to/ecgqa \
--dest /path/to/output \
--apply_paraphrase
You don't need to run additional scripts to prepare manifest files for the ECG-QA dataset, since the pre-processing script generates them automatically.
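As a rough illustration of what mapping an ecg_id to a file path involves for PTB-XL, the sketch below builds the record path from the public PTB-XL directory layout (records grouped into folders of 1000, with an `_hr` suffix at 500 Hz and `_lr` at 100 Hz). It is not the ECG-QA mapping script; `ptbxl_record_path` is a hypothetical helper shown only to make the idea concrete.

```python
from pathlib import Path


def ptbxl_record_path(ptbxl_dir, ecg_id, sampling_rate=500):
    """Build the WFDB record path (without extension) for a PTB-XL ecg_id."""
    if sampling_rate == 500:
        subdir, suffix = "records500", "hr"
    elif sampling_rate == 100:
        subdir, suffix = "records100", "lr"
    else:
        raise ValueError("PTB-XL provides 100 Hz and 500 Hz records only")
    # Records are grouped by thousands: ids 21000..21999 live in folder '21000'.
    folder = f"{(ecg_id // 1000) * 1000:05d}"
    return Path(ptbxl_dir) / subdir / folder / f"{ecg_id:05d}_{suffix}"


if __name__ == "__main__":
    print(ptbxl_record_path("/path/to/ptbxl", 21837))
```

The returned path has no extension, which is the form WFDB readers typically expect for a record name.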
Prepare data manifest
Given a directory that contains pre-processed PTB-XL data:
$ python fairseq_signals/data/ecg_text/preprocess/manifest.py \
/path/to/data \
--dest /path/to/manifest \
--valid-percent $valid
Please find more details about pre-processing and data manifest here.
Examples
We provide detailed READMEs for each model implementation:
- Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training
- Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
- Lead-agnostic Self-supervised Learning for Local and Global Representations of Electrocardiogram*
- 3KG: Contrastive Learning of 12-Lead Electrocardiograms using Physiologically-Inspired Augmentations
- CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients
- wav2vec 2.0: A Framework for Self-supervised Learning of Speech Representations
- A Simple Framework for Contrastive Learning of Visual Representations
* denotes an official implementation
Contact
If you have any questions or recommendations, please contact us via an issue or an e-mail.