(KDD 2025) BatteryLife

June 13, 2026

This is the official repository for BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction. If you find this repository useful, we would appreciate citations to our paper and stars to this repository.

:triangular_flag_on_post: News (2026.04) BatteryLife v11 is now released, with fixes for some reported issues. Please see the update details for more information.

:triangular_flag_on_post: News (2026.02) BatteryLife has exceeded 30,000 downloads. BatteryLife v10 is now released, with fixes for issues reported over the past year (update details are available here). We sincerely appreciate the support from the community.

:triangular_flag_on_post:News (2025.10) Added the standardized SDU dataset to BatteryLife. Corrected the time_in_s column for all batteries.

🔥News (2025.08) BatteryLife downloads exceed 10,000.

🔥News (2025.07) BatteryLife downloads exceed 7,000.

🔥News (2025.06) BatteryLife downloads exceed 5,000.

:triangular_flag_on_post:News (2025.06) Added the complete Stanford dataset as "Stanford_2" (now including both releases of the Stanford dataset).

:triangular_flag_on_post:News (2025.05) BatteryLife was accepted by KDD 2025.

🔥News (2025.05) BatteryLife downloads exceed 3,000.

:triangular_flag_on_post:News (2025.02) BatteryLife was released!

Highlights

(Data statistics are based on the initial release of BatteryLife.)

  • The largest battery life dataset: BatteryLife is created by integrating 16 datasets, providing 99,000 samples from 990 batteries with life labels.
  • The most diverse battery life dataset: BatteryLife contains 8 battery formats, 59 chemical systems, 9 operation temperatures, and 421 charge/discharge protocols.
  • A comprehensive benchmark for battery life prediction: BatteryLife provides 18 benchmark methods with open-source code in this repository. They include popular methods for battery life prediction, popular baselines in time series analysis, and a series of baselines proposed by this work.

Data availability

The processed datasets are available in two ways:

  1. You can download the datasets from Hugging Face [tutorial].
  2. You can download the datasets from Zenodo.

Note that brief introductions to each dataset are available under the directory of each dataset.

All raw datasets are publicly available. Interested users can download them from the following links:

Benchmark results of Battery Life Prediction (BLP) task

The benchmark results for battery life prediction are reported in the table below. The compared methods fall into five types:

  1. Dummy, a baseline that uses the mean of training labels as the prediction.
  2. MLPs, a series of multilayer perceptron models including DLinear, MLP, and CPMLP.
  3. Transformers, a series of transformer models including PatchTST, Autoformer, iTransformer, Transformer, and CPTransformer.
  4. CNNs, a series of convolutional neural network models including CNN and MICN.
  5. RNNs, a series of recurrent neural network models including CPGRU, CPBiGRU, CPLSTM, CPBiLSTM, GRU, BiGRU, LSTM, and BiLSTM.

| Method | Li-ion MAPE | Li-ion 15%-Acc | Zn-ion MAPE | Zn-ion 15%-Acc | Na-ion MAPE | Na-ion 15%-Acc | CALB MAPE | CALB 15%-Acc |
|---|---|---|---|---|---|---|---|---|
| Dummy | 0.831±0.000 | 0.296±0.000 | 1.297±0.214 | 0.083±0.047 | 0.404±0.029 | 0.067±0.094 | 1.811±0.550 | 0.267±0.094 |
| DLinear | 0.586±0.028 | 0.275±0.017 | 0.814±0.026 | 0.124±0.020 | 0.319±0.031 | 0.329±0.042 | 0.164±0.049 | 0.601±0.114 |
| MLP | 0.233±0.010 | 0.503±0.013 | 0.805±0.103 | 0.079±0.055 | 0.281±0.067 | 0.364±0.098 | 0.149±0.014 | 0.641±0.115 |
| CPMLP | 0.179±0.003 | 0.620±0.004 | 0.558±0.034 | 0.297±0.084 | 0.274±0.026 | 0.337±0.038 | 0.140±0.009 | 0.704±0.053 |
| PatchTST | 0.288±0.042 | 0.430±0.053 | 0.716±0.024 | 0.133±0.001 | 0.396±0.094 | 0.258±0.070 | 0.347±0.045 | 0.511±0.139 |
| Autoformer | 0.437±0.093 | 0.287±0.067 | 0.987±0.243 | 0.106±0.039 | 0.372±0.047 | 0.177±0.128 | 0.761±0.061 | 0.329±0.121 |
| iTransformer | 0.209±0.015 | 0.516±0.028 | 0.690±0.110 | 0.188±0.037 | 0.321±0.087 | 0.249±0.178 | 0.164±0.020 | 0.649±0.044 |
| Transformer | - | - | - | - | - | - | - | - |
| CPTransformer | 0.184±0.003 | 0.573±0.016 | 0.515±0.067 | 0.202±0.084 | 0.255±0.036 | 0.406±0.084 | 0.149±0.005 | 0.672±0.107 |
| CNN | 0.337±0.068 | 0.371±0.050 | 0.928±0.093 | 0.115±0.029 | 0.307±0.047 | 0.273±0.027 | 0.278±0.011 | 0.582±0.032 |
| MICN | 0.249±0.004 | 0.494±0.019 | 0.579±0.101 | 0.227±0.127 | 0.305±0.040 | 0.335±0.065 | 0.233±0.050 | 0.471±0.257 |
| CPGRU | 0.189±0.008 | 0.585±0.013 | 0.616±0.049 | 0.289±0.076 | 0.298±0.063 | 0.203±0.160 | 0.141±0.012 | 0.681±0.178 |
| CPBiGRU | 0.190±0.001 | 0.566±0.034 | 0.774±0.202 | 0.193±0.156 | 0.282±0.055 | 0.395±0.008 | 0.160±0.015 | 0.686±0.063 |
| CPLSTM | 0.196±0.006 | 0.585±0.020 | 0.932±0.227 | 0.085±0.028 | 0.272±0.051 | 0.386±0.009 | 0.156±0.073 | 0.613±0.153 |
| CPBiLSTM | 0.191±0.007 | 0.421±0.255 | 0.645±0.049 | 0.150±0.104 | 0.299±0.043 | 0.399±0.001 | 0.173±0.075 | 0.663±0.247 |
| GRU & BiGRU | NA | NA | NA | NA | NA | NA | NA | NA |
| LSTM & BiLSTM | NA | NA | NA | NA | NA | NA | NA | NA |
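
The two metrics in the table can be sketched as follows. This is a minimal illustration, not the repository's evaluation code. It assumes that MAPE is the mean absolute percentage error (reported as a fraction) and that 15%-Acc is the fraction of predictions whose relative error is at most 15%. The function names `mape`, `alpha_acc`, and `dummy_predict` and the toy labels are hypothetical. The Dummy baseline is included because its definition (predict the mean of the training labels) is given in the list above.

```python
from statistics import mean


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.35 means 35%)."""
    return mean([abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)])


def alpha_acc(y_true, y_pred, alpha=0.15):
    """Fraction of predictions with relative error <= alpha (assumed definition)."""
    hits = [abs(p - t) / abs(t) <= alpha for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


def dummy_predict(train_labels, n_test):
    """Dummy baseline: predict the mean training label for every test sample."""
    return [mean(train_labels)] * n_test


# Toy cycle-life labels (made up for illustration only).
train_labels = [400, 600, 800, 1000, 1200]
test_labels = [500, 800, 1000, 2000]

predictions = dummy_predict(train_labels, len(test_labels))
print("MAPE:", mape(test_labels, predictions))
print("15%-Acc:", alpha_acc(test_labels, predictions))
```

Lower MAPE is better, while higher 15%-Acc is better, which is why the Dummy row combines large MAPE values with small accuracy values.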

Quick start

Install

pip install -r requirements.txt
# You should also install BatteryML (https://github.com/microsoft/BatteryML)

Preprocessing [tutorial]

After downloading all raw datasets listed in the "Data availability" section, you can run the following script to obtain the processed datasets:

python preprocess_scripts.py

If you download the processed datasets, you can skip this step.

  • During the development of BatteryLife, we often found that the processed data still contained issues. Based on this experience, we provide Jupyter notebooks in the ./check_data_scripts/ folder for double-checking the processed data. Quickly confirming that all characteristic curves look as expected helps avoid downstream problems.

    • check_capacity_curves.ipynb : for checking the charge and discharge capacity curves of the batteries.
    • check_soh_curves.ipynb : for checking the degradation trajectory of the batteries.
    • check_voltage_current_curves.ipynb : for checking the voltage and current curves of the batteries.
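
The kind of check these notebooks support can also be approximated programmatically. The sketch below is an illustration under stated assumptions, not code from ./check_data_scripts/. It takes a plain list of per-cycle discharge capacities (a hypothetical input format) and flags non-finite values, non-positive values, and cycle-to-cycle changes larger than a hypothetical threshold `max_jump`.

```python
import math


def find_capacity_issues(capacities, max_jump=0.2):
    """Return (cycle_index, reason) pairs for suspicious capacity values.

    capacities: per-cycle discharge capacities (hypothetical input format).
    max_jump: largest allowed relative change between consecutive valid
              cycles (hypothetical threshold; tune it per dataset).
    """
    issues = []
    prev = None  # last finite, positive capacity seen so far
    for i, cap in enumerate(capacities):
        if not math.isfinite(cap):
            issues.append((i, "non-finite"))
            continue
        if cap <= 0:
            issues.append((i, "non-positive"))
            continue
        if prev is not None and abs(cap - prev) / prev > max_jump:
            issues.append((i, "jump"))
        prev = cap
    return issues


# Toy curve with a NaN, an isolated outlier, and a negative reading.
curve = [1.10, 1.09, float("nan"), 1.08, 0.50, 1.07, -0.1]
for index, reason in find_capacity_issues(curve):
    print(f"cycle {index}: {reason}")
```

Because each value is compared with the previous valid one, an isolated outlier is reported twice: once when the curve drops and once when it recovers. Visual inspection in the notebooks remains the more reliable check.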

How to calculate the statistical information of aging conditions for processed data:

  • First, run the aging_conditions.py script to generate name2agingConditionID.json, which stores the aging condition number for each battery.
  • Second, run the dataset_overview_calculation.py script to calculate the aging condition statistics for the processed data.
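
Once name2agingConditionID.json exists, a quick summary of it can be computed as sketched below. This assumes the file is a flat JSON object mapping battery names to aging condition IDs, which is inferred from the file name and not confirmed here. The battery names in the example are made up, and `summarize_aging_conditions` is a hypothetical helper, not part of dataset_overview_calculation.py.

```python
import json
from collections import Counter


def summarize_aging_conditions(name_to_id):
    """Count batteries per aging condition ID in a name -> ID mapping."""
    counts = Counter(name_to_id.values())
    return {
        "n_batteries": len(name_to_id),
        "n_conditions": len(counts),
        "per_condition": dict(counts),
    }


# Made-up content standing in for name2agingConditionID.json (assumed layout).
example_json = '{"cell_a": 1, "cell_b": 1, "cell_c": 2}'
summary = summarize_aging_conditions(json.loads(example_json))
print(summary)
```

For the real file, the mapping would be loaded with `json.load` from wherever aging_conditions.py writes its output.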

Train the model [tutorial]

Before you start training, please move all processed dataset folders (such as HUST and MATR) and the Life labels folder (downloaded from Hugging Face or Zenodo) into the ./dataset folder under the root folder.

After that, you can run any benchmark method. For example:

sh ./train_eval_scripts/CPTransformer.sh
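
Before launching a training script, it may help to confirm that ./dataset is laid out as described. The following sketch only checks that the expected subfolders exist. The helper name `missing_dataset_dirs` is hypothetical, and the list of expected folder names must be adapted to the datasets actually downloaded (HUST and MATR are taken from the text above; the exact name of the life labels folder is not assumed here).

```python
from pathlib import Path


def missing_dataset_dirs(root, expected):
    """Return the names in `expected` that are not subdirectories of `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Extend this list with every dataset folder you intend to train on.
    missing = missing_dataset_dirs("./dataset", ["HUST", "MATR"])
    if missing:
        print("Missing folders under ./dataset:", ", ".join(missing))
    else:
        print("All expected folders found.")
```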

Evaluate the model

If you want to evaluate a model in detail, you can use the provided evaluation script as follows:

sh ./train_eval_scripts/evaluate.sh

Fine-tuning [tutorial]

If you want to fine-tune a pretrained model on another dataset, you can use the provided fine-tuning script (see the tutorial) as follows:

sh ./train_eval_scripts/finetune_script.sh

Domain adaptation [tutorial]

If you want to perform domain adaptation to another dataset, you can use the provided domain adaptation script (see the tutorial) as follows:

sh ./train_eval_scripts/domain_adaptation_script.sh

Documentation

Welcome contributions

Advancing AI4Battery requires standardized datasets. However, the available battery life datasets are typically stored in different places and in different formats. We have put great efforts into integrating 13 previously available datasets and 3 of our datasets. BatteryLife aims to become a unified platform for sharing standardized battery aging and lifetime datasets. We warmly welcome contributions from the community—whether by sharing new datasets or standardizing existing ones according to the BatteryLife guidelines.

To further broaden the range of available resources, we list below several open-source but currently unprocessed datasets in the battery life domain:

| Index | Release Year | Data Download Link | Journals/Conferences | Preprocess Status |
|---|---|---|---|---|
| 1 | 2023 | Item - eVTOL Battery Dataset - Carnegie Mellon University - Figshare | Scientific Data | |
| 2 | 2024 | Dataset - Dynamic cycling enhances battery lifetime Stanford Digital Repository | Nature Energy | |
| 3 | 2025 | Aging matrix visualizes complexity of battery aging across hundreds of cycling protocols | Energy & Environmental Science | |
| 4 | 2025 | Degradation path prediction of lithium-ion batteries under dynamic operating sequences | Energy & Environmental Science | Ongoing |
| 5 | 2025 | Non-destructive degradation pattern decoupling for early battery trajectory prediction via physics-informed learning | Energy & Environmental Science | |
| 6 | 2025 | A dataset of over one thousand computed tomography scans of battery cells | ChemRxiv | |
| 7 | 2026 | Transfer from lithium to sodium: promoting battery lifetime prognosis application | EES Batteries | |
| 8 | 2026 | Large battery model for multi-state co-estimation and intelligent recommendation using mixed data sources | Energy Storage Materials | |
| 9 | 2026 | Discovery Learning predicts battery cycle life from minimal experiments | Nature | Ongoing |

If you are interested in contributing, please either submit a pull request or contact us via email at rtan474@connect.hkust-gz.edu.cn and whong719@connect.hkust-gz.edu.cn. To integrate your data into the BatteryLife repositories, please provide:

  • Raw datasets
  • Processed datasets
  • Preprocessing scripts (for reproducibility)
  • A list of contributors (for acknowledgment in the repo)
  • Papers related to the data generation (we will prompt users to cite these in the repository's Citation section).

Citation

If you use the benchmark, processed datasets, or the raw datasets produced by this work, you should cite the BatteryLife paper:

@inproceedings{10.1145/3711896.3737372,
author = {Tan, Ruifeng and Hong, Weixiang and Tang, Jiayue and Lu, Xibin and Ma, Ruijun and Zheng, Xiang and Li, Jia and Huang, Jiaqiang and Zhang, Tong-Yi},
title = {BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction},
year = {2025},
isbn = {9798400714542},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3711896.3737372},
doi = {10.1145/3711896.3737372},
booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2},
pages = {5789–5800},
numpages = {12},
location = {Toronto ON, Canada},
series = {KDD '25}
}
  • Additionally, please cite the original papers that conducted the experiments. Please cite BatteryArchive as the data source for the HNEI, SNL, MICH, MICH_EXP, and UL_PUR datasets.
  • Please cite BatteryML if you use the processed CALCE, MATR, HUST, HNEI, RWTH, SNL, and UL_PUR datasets. Our preprocessing for these 7 datasets relies heavily on BatteryML's preprocessing scripts.
  • Please cite the SDU paper if you use the SDU dataset.

Acknowledgement

This repo is constructed based on the following repos:

Star History


All thanks to our contributors