๐Ÿ” CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges

June 13, 2025 ยท View on GitHub

Python License Research

๐Ÿ”— Resources:

๐Ÿ”ฅ Large language models (LLMs) have demonstrated remarkable capabilities, especially the recent advancements in reasoning, such as o1 and o3, pushing the boundaries of AI. Despite these impressive achievements in mathematics and coding, the reasoning abilities of LLMs in domains requiring cryptographic expertise remain underexplored.

๐Ÿ”ฅ News

2025.05: ๐ŸŽ‰๐ŸŽ‰ Congratulations: CipherBank was accepted by ACL-2025 finding conference.

๐Ÿ“œ Abstract

๐Ÿ” In this paper, we introduce CipherBank, a comprehensive benchmark designed to evaluate the reasoning capabilities of LLMs in cryptographic decryption tasks. CipherBank comprises 2,358 meticulously crafted problems, covering 262 unique plaintexts across 5 domains and 14 subdomains, with a focus on privacy-sensitive and real-world scenarios that necessitate encryption.

๐Ÿ” From a cryptographic perspective, CipherBank incorporates:

  • 3 major categories of encryption methods
  • 9 distinct algorithms, ranging from classical ciphers to custom cryptographic techniques

๐Ÿค– We evaluate state-of-the-art LLMs on CipherBank, including:

  • GPT-4o | DeepSeek-V3 | Claude | Gemini
  • Cutting-edge reasoning-focused models like o1 and DeepSeek-R1

๐Ÿ’ก Key Findings:

  • Significant gaps in reasoning abilities between general-purpose and reasoning-focused LLMs
  • Challenges in classical cryptographic decryption tasks
  • Limitations in understanding and manipulating encrypted data

๐Ÿ“‚ Data Introduction

FileDescription
data/plaintext.jsonl ๐Ÿ“„Original plaintext from 5 domains and 14 subdomains
data/shot_case.jsonl ๐ŸŽฏ3 case examples for few-shot testing
data/test.jsonl ๐Ÿ”Complete test data with plaintext and 9 encryption algorithms

๐Ÿงช Test Introduction

๐Ÿ”’ Encryption

python cipher/encryption.py --input_file ../data/plaintext.jsonl --output_file ../data/test.jsonl --mode cipher

๐Ÿ”“ Decryption (Test Reversibility)

python cipher/encryption.py --input_file ../data/test.jsonl --mode decrypt

๐Ÿค– Test Your Model

๐Ÿ› ๏ธ Predefined Models

We support API loading for:

  • GPT | DeepSeek | Claude | Gemini (Just pass your API key in utils/tools.py)

๐Ÿ—๏ธ Custom Models

class YourModel:
    def __call__(self, prompt):
        # Your implementation here
        return response

๐Ÿƒโ€โ™‚๏ธ Run Tests

# Basic test
bash run.sh --model model_name --shot_number 3

# With detailed prompts
bash run.sh --model model_name --shot_number 3 --is_hint True

# Specific algorithm test (e.g., Rot13)
python test.py --cipher_type Rot13 --model model_name

๐Ÿ“œ Citation

If you find CipherBank useful for your your research and applications, please kindly cite using this BibTeX:

@misc{li2025cipherbankexploringboundaryllm,
      title={CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges}, 
      author={Yu Li and Qizhi Pei and Mengyuan Sun and Honglin Lin and Chenlin Ming and Xin Gao and Jiang Wu and Conghui He and Lijun Wu},
      year={2025},
      eprint={2504.19093},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2504.19093}, 
}

๐Ÿค Contributing

PRs welcome! Please open an issue first to discuss changes.