CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
June 13, 2025
Resources:
- Hugging Face Dataset
- arXiv Paper
- Project page
Large language models (LLMs) have demonstrated remarkable capabilities, and recent reasoning-focused models such as o1 and o3 have pushed the boundaries of AI further. Despite these impressive achievements in mathematics and coding, the reasoning abilities of LLMs in domains requiring cryptographic expertise remain underexplored.
News
2025.05: Congratulations: CipherBank was accepted to the Findings of ACL 2025.
Abstract
In this paper, we introduce CipherBank, a comprehensive benchmark designed to evaluate the reasoning capabilities of LLMs in cryptographic decryption tasks. CipherBank comprises 2,358 meticulously crafted problems, covering 262 unique plaintexts across 5 domains and 14 subdomains, with a focus on privacy-sensitive and real-world scenarios that necessitate encryption.
From a cryptographic perspective, CipherBank incorporates:
- 3 major categories of encryption methods
- 9 distinct algorithms, ranging from classical ciphers to custom cryptographic techniques
We evaluate state-of-the-art LLMs on CipherBank, including:
- GPT-4o | DeepSeek-V3 | Claude | Gemini
- Cutting-edge reasoning-focused models like o1 and DeepSeek-R1
Key Findings:
- Significant gaps in reasoning abilities between general-purpose and reasoning-focused LLMs
- Challenges in classical cryptographic decryption tasks
- Limitations in understanding and manipulating encrypted data
Data Introduction
| File | Description |
|---|---|
| `data/plaintext.jsonl` | Original plaintext from 5 domains and 14 subdomains |
| `data/shot_case.jsonl` | 3 case examples for few-shot testing |
| `data/test.jsonl` | Complete test data with plaintext and 9 encryption algorithms |
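
To inspect these files before running anything, a minimal sketch like the following may help. It assumes only that each file is in JSON Lines format (one JSON object per line), as the `.jsonl` extension suggests. The helper name `load_jsonl` is hypothetical and not part of this repository, and no field names are assumed: the script prints whatever keys the first record contains.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Path taken from the table above; adjust to your checkout location.
    path = Path("data/test.jsonl")
    if path.exists():
        records = load_jsonl(path)
        print(f"{len(records)} records loaded")
        if records:
            # Inspect the schema instead of assuming field names.
            print(sorted(records[0].keys()))
```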
Test Introduction
Encryption
python cipher/encryption.py --input_file ../data/plaintext.jsonl --output_file ../data/test.jsonl --mode cipher
Decryption (Test Reversibility)
python cipher/encryption.py --input_file ../data/test.jsonl --mode decrypt
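
The idea behind a reversibility test is that decrypting an encrypted plaintext must reproduce the original exactly. The sketch below illustrates that check with ROT13 (listed as a `--cipher_type` value in this README), implemented here through Python's standard `codecs` module. It is not the code in `cipher/encryption.py`; the function names `rot13` and `is_reversible` and the sample sentence are assumptions made for illustration.

```python
import codecs


def rot13(text):
    """Apply ROT13: letters shift by 13 places, other characters are unchanged."""
    return codecs.encode(text, "rot_13")


def is_reversible(plaintext, encrypt, decrypt):
    """Return True if decrypt(encrypt(plaintext)) reproduces the plaintext."""
    return decrypt(encrypt(plaintext)) == plaintext


if __name__ == "__main__":
    sample = "Account 4417, PIN reset requested."  # made-up example text
    print(rot13(sample))
    # ROT13 is its own inverse, so the same function serves as decrypt.
    print(is_reversible(sample, rot13, rot13))
```

The same `is_reversible` helper accepts any encrypt/decrypt pair, so it could be pointed at other ciphers as long as both directions are available as functions.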
Test Your Model
Predefined Models
We support API loading for:
- GPT | DeepSeek | Claude | Gemini (just pass your API key in `utils/tools.py`)
Custom Models
class YourModel:
    def __call__(self, prompt):
        # Your implementation here
        return response
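
As a concrete instance of this interface, the sketch below wraps any text-generation function in a callable object. It assumes, based only on the template above, that the test harness calls the model with a prompt string and expects a string back. The class name `CallableModel` and the `generate_fn` parameter are hypothetical and not part of this repository.

```python
class CallableModel:
    """Adapter that exposes any prompt -> text function via __call__."""

    def __init__(self, generate_fn):
        # generate_fn: any function taking a prompt string and returning text,
        # e.g. a thin wrapper around a local model or an HTTP client.
        self.generate_fn = generate_fn

    def __call__(self, prompt):
        response = self.generate_fn(prompt)
        # Trim surrounding whitespace so comparisons against plaintext are cleaner.
        return response.strip()


if __name__ == "__main__":
    # Dummy backend for a smoke test; replace with a real generation call.
    model = CallableModel(lambda prompt: "  decrypted text goes here \n")
    print(model("Decrypt the following ciphertext: ..."))
```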
Run Tests
# Basic test
bash run.sh --model model_name --shot_number 3
# With detailed prompts
bash run.sh --model model_name --shot_number 3 --is_hint True
# Specific algorithm test (e.g., Rot13)
python test.py --cipher_type Rot13 --model model_name
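
To make the `--shot_number` and `--is_hint` options more concrete, here is a rough sketch of how a few-shot decryption prompt could be assembled. It is not the prompt template used by `run.sh` or `test.py`. The function name `build_few_shot_prompt`, the `ciphertext` and `plaintext` keys, and the prompt wording are all assumptions made for illustration.

```python
def build_few_shot_prompt(shots, ciphertext, shot_number=3, hint=""):
    """Assemble a prompt from up to `shot_number` worked examples.

    `shots` is a list of dicts with hypothetical keys 'ciphertext' and
    'plaintext'; `hint` is an optional description of the cipher.
    """
    parts = []
    if hint:
        parts.append(hint)
    for example in shots[:shot_number]:
        parts.append(
            f"Ciphertext: {example['ciphertext']}\nPlaintext: {example['plaintext']}"
        )
    # The final block is left open for the model to complete.
    parts.append(f"Ciphertext: {ciphertext}\nPlaintext:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo_shots = [
        {"ciphertext": "Uryyb", "plaintext": "Hello"},
        {"ciphertext": "Jbeyq", "plaintext": "World"},
    ]
    print(build_few_shot_prompt(demo_shots, "Pvcure", shot_number=2))
```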
Citation
If you find CipherBank useful for your research and applications, please cite it using this BibTeX entry:
@misc{li2025cipherbankexploringboundaryllm,
title={CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges},
author={Yu Li and Qizhi Pei and Mengyuan Sun and Honglin Lin and Chenlin Ming and Xin Gao and Jiang Wu and Conghui He and Lijun Wu},
year={2025},
eprint={2504.19093},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2504.19093},
}
Contributing
PRs welcome! Please open an issue first to discuss changes.