SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

July 30, 2025 · View on GitHub


Introduction

SeniorTalk is a comprehensive, open-source Mandarin Chinese speech dataset specifically designed for research on elderly speakers aged 75 to 85. It addresses the critical lack of publicly available resources for this age group, enabling advances in automatic speech recognition (ASR), speaker verification (SV), speaker diarization (SD), speech editing, and other related fields. The dataset is released under a CC BY-NC-SA 4.0 license, meaning it is available for non-commercial use only.

Dataset Details

This dataset contains 55.53 hours of high-quality speech data collected from 202 elderly speakers across 16 provinces in China. Key features of the dataset include:

  • Age Range: 75-85 years old (inclusive). This is a crucial age range often overlooked in speech datasets.
  • Speakers: 202 unique elderly speakers.
  • Geographic Diversity: Speakers from 16 of China's 34 provincial-level administrative divisions, capturing a range of regional accents.
  • Gender Balance: A male-to-female ratio of approximately 7:13, largely reflecting the demographic imbalance between men and women in this age group.
  • Recording Conditions: Recordings were made in quiet environments using a variety of smartphones (both Android and iPhone devices) to ensure real-world applicability.
  • Content: Natural, conversational speech during age-appropriate activities. The content is unrestricted, promoting spontaneous and natural interactions.
  • Audio Format: WAV files with a 16kHz sampling rate.
  • Transcriptions: Carefully crafted, character-level manual transcriptions.
  • Annotations: The dataset provides annotations at the session, utterance, token, and speaker levels.
    • Session-level: sentence_start_time, sentence_end_time, overlapped speech.
    • Utterance-level: id, accent_level, text (transcription).
    • Token-level: special tokens ([SONANT], [MUSIC], [NOISE], ...).
    • Speaker-level: speaker_id, age, gender, location (province), device.
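As an illustration, speaker-level metadata like the above can be parsed into dictionaries. The tab-separated layout, column order, and sample values below are assumptions for this sketch, not the actual SPKINFO.txt format:

```python
# Hypothetical speaker-metadata parser. The real SPKINFO.txt delimiter
# and column order may differ; this assumes one tab-separated record
# per line: speaker_id, age, gender, province, device.
def parse_spkinfo(text):
    speakers = []
    for line in text.strip().splitlines():
        spk_id, age, gender, province, device = line.split("\t")
        speakers.append({
            "speaker_id": spk_id,
            "age": int(age),        # 75-85 for this dataset
            "gender": gender,
            "location": province,
            "device": device,
        })
    return speakers

sample = "S0001\t78\tfemale\tSichuan\tAndroid\nS0002\t81\tmale\tShandong\tiPhone"
print(parse_spkinfo(sample)[0]["age"])  # 78
```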

Dataset Structure

Dialogue Dataset

The dataset is split into two subsets:

| Split | # Speakers | # Dialogues | Duration (hrs) | Avg. Dialogue Length (hrs) |
|-------|------------|-------------|----------------|----------------------------|
| train | 182        | 91          | 49.83          | 0.54                       |
| test  | 20         | 10          | 5.70           | 0.57                       |
| Total | 202        | 101         | 55.53          | 0.55                       |

The dataset file structure is as follows.


dialogue_data/
├── wav
│   ├── train/*.tar
│   └── test/*.tar
└── transcript/*.txt
UTTERANCEINFO.txt  # annotation of topics and duration
SPKINFO.txt        # annotation of location, age, gender, and device

Each WAV file has a corresponding TXT file with the same name, containing its annotations.
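Since each WAV file pairs with a same-named TXT file, loading one utterance after extracting the tar archives might look like the following standard-library sketch (the exact directory layout inside the archives is an assumption):

```python
import wave
from pathlib import Path

def load_utterance(wav_path):
    """Read a 16 kHz WAV file and the same-named .txt annotation beside it."""
    with wave.open(str(wav_path), "rb") as w:
        sample_rate = w.getframerate()      # 16000 for this dataset
        pcm = w.readframes(w.getnframes())  # raw PCM bytes
    txt_path = Path(wav_path).with_suffix(".txt")
    transcript = txt_path.read_text(encoding="utf-8")
    return sample_rate, pcm, transcript
```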

For more details, please refer to our paper SeniorTalk.

ASR Dataset

The dataset is split into three subsets:

| Split      | # Speakers | # Utterances | Duration (hrs) | Avg. Utterance Length (s) |
|------------|------------|--------------|----------------|---------------------------|
| train      | 162        | 47,269       | 29.95          | 2.28                      |
| validation | 20         | 6,891        | 4.09           | 2.14                      |
| test       | 20         | 5,869        | 3.77           | 2.31                      |
| Total      | 202        | 60,029       | 37.81          | 2.27                      |
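The split statistics are internally consistent: duration × 3600 divided by the utterance count reproduces the reported average utterance length. A quick check:

```python
# Sanity-check the ASR split table: avg utterance length (s)
# = duration (hrs) * 3600 / number of utterances.
splits = {
    "train":      (47_269, 29.95, 2.28),
    "validation": ( 6_891,  4.09, 2.14),
    "test":       ( 5_869,  3.77, 2.31),
    "Total":      (60_029, 37.81, 2.27),
}
for name, (n_utt, hours, avg_s) in splits.items():
    assert round(hours * 3600 / n_utt, 2) == avg_s, name
```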

The dataset file structure is as follows.

sentence_data/
├── wav
│   ├── train/*.tar
│   ├── dev/*.tar
│   └── test/*.tar
└── transcript/*.txt
UTTERANCEINFO.txt  # annotation of topics and duration
SPKINFO.txt        # annotation of location, age, gender, and device

Each WAV file has a corresponding TXT file with the same name, containing its annotations.

For more details, please refer to our paper SeniorTalk.

📐 Experiments

We conducted experiments on automatic speech recognition (ASR), speaker verification (SV), speaker diarization (SD), and speech editing tasks to evaluate the dataset.

1️⃣ ASR Results

Models Trained from Scratch

CER (%) overall, by accent level, and by region:

| Encoder        | # Params | CER   | No    | Light | Moderate | Heavy | South | North |
|----------------|----------|-------|-------|-------|----------|-------|-------|-------|
| Transformer    | 14.1M    | 48.99 | 22.58 | 49.05 | 51.07    | 80.95 | 48.5  | 50.24 |
| Conformer      | 15.7M    | 34.61 | 21.23 | 34.21 | 37.62    | 59.52 | 34.55 | 34.74 |
| E-Branchformer | 16.9M    | 33.25 | 23.25 | 20.71 | 33.03    | 35.32 | 64.29 | 33.94 |

Fine-tuned Pre-trained Models

| Model            | # Params | Zero-shot | Fine-tuning |
|------------------|----------|-----------|-------------|
| Paraformer-large | 232M     | 14.91     | 14.41       |
| Whisper-tiny     | 39M      | 92.20     | 58.80       |
| Whisper-base     | 74M      | 64.02     | 38.17       |
| Whisper-small    | 244M     | 55.83     | 28.69       |
| Whisper-medium   | 769M     | 60.47     | 25.77       |
| Whisper-large-v3 | 1,550M   | 57.74     | 23.84       |
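All ASR numbers above are character error rates (CER): the character-level Levenshtein distance between reference and hypothesis, divided by the reference length. A minimal implementation for orientation:

```python
def cer(ref, hyp):
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1,                               # deletion
                cur[j - 1] + 1,                            # insertion
                prev[j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution
            )
        prev = cur
    return prev[n] / max(m, 1)

print(cer("abcde", "abde"))  # 0.2 -- one deletion over five reference chars
```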

2️⃣ SV Results

| Model       | # Params | Dim | Dev (%) | EER (%) | minDCF | EER (%) | minDCF |
|-------------|----------|-----|---------|---------|--------|---------|--------|
| X-vector    | 4.2M     | 512 | 12.04   | 14.63   | 0.9768 | 19.26   | 0.9598 |
| ResNet-TDNN | 15.5M    | 256 | 4.372   | 10.88   | 0.8450 | 11.50   | 0.9196 |
| ECAPA-TDNN  | 20.8M    | 192 | 8.86    | 11.54   | 10.24  | 0.9582  | 0.9582 |
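For reference, the equal error rate (EER) reported above is the operating point where the false-acceptance and false-rejection rates coincide. A simple threshold sweep (scores assumed higher = more likely same speaker):

```python
def eer(target_scores, nontarget_scores):
    """Equal error rate via a sweep over observed score thresholds."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_far = best_frr = 1.0
    best_gap = float("inf")
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:  # keep the point where FAR and FRR are closest
            best_gap, best_far, best_frr = gap, far, frr
    return (best_far + best_frr) / 2

print(eer([0.9, 0.8, 0.7, 0.6], [0.4, 0.3, 0.2, 0.1]))  # 0.0 (separable)
```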

3️⃣ SD Results

| Model        | # Params | Dim | DER (%) collar=0 | Confusion (%) collar=0 | DER (%) collar=0.25 | Confusion (%) collar=0.25 |
|--------------|----------|-----|------------------|------------------------|---------------------|---------------------------|
| ResNet-34-LM | 15.5M    | 256 | 33.14            | 16.82                  | 28.39               | 16.85                     |
| x-vector     | 4.2M     | 512 | 53.01            | 36.69                  | 49.82               | 38.28                     |
| ResNet-TDNN  | 15.5M    | 256 | 43.44            | 27.13                  | 39.58               | 28.03                     |
| ECAPA-TDNN   | 20.8M    | 192 | 27.84            | 11.52                  | 22.85               | 11.31                     |
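The diarization error rate (DER) sums missed speech, false-alarm, and speaker-confusion time over total reference speech time. A simplified frame-level version (it assumes hypothesis speakers are already optimally mapped to reference speakers, and omits the collar and overlap handling used in the actual scoring):

```python
def frame_der(ref, hyp):
    """Simplified DER over per-frame speaker labels; None = silence."""
    assert len(ref) == len(hyp)
    miss = false_alarm = confusion = speech = 0
    for r, h in zip(ref, hyp):
        if r is not None:
            speech += 1
            if h is None:
                miss += 1          # reference speech, hypothesis silent
            elif h != r:
                confusion += 1     # wrong speaker attributed
        elif h is not None:
            false_alarm += 1       # hypothesis speech during silence
    return (miss + false_alarm + confusion) / speech

print(frame_der(["A", "A", "B", "B", None],
                ["A", "B", "B", None, "C"]))  # 0.75
```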

4️⃣ Speech Editing Results

| Method       | MCD (↓) | STOI (↑) | PESQ (↑) |
|--------------|---------|----------|----------|
| CampNet      | 7.302   | 0.220    | 1.291    |
| EditSpeech   | 6.225   | 0.514    | 1.363    |
| A3T          | 5.851   | 0.586    | 1.455    |
| FluentSpeech | 5.811   | 0.627    | 1.645    |
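Mel-cepstral distortion (MCD, in dB) averages a scaled Euclidean distance between mel-cepstral frames of the reference and the edited speech. A bare-bones version of the formula (real evaluations typically exclude the 0th coefficient and DTW-align the frames first, both omitted here):

```python
import math

# Per-frame MCD: (10 / ln 10) * sqrt(2 * sum_d (c_ref[d] - c_syn[d])^2)
_K = (10.0 / math.log(10)) * math.sqrt(2.0)

def mcd(ref_frames, syn_frames):
    """Average mel-cepstral distortion over paired frames, in dB."""
    dists = [
        _K * math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))
        for r, s in zip(ref_frames, syn_frames)
    ]
    return sum(dists) / len(dists)
```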

🤗 Dataset Download

You can access the SeniorTalk dataset on HuggingFace Datasets:

https://huggingface.co/datasets/BAAI/SeniorTalk

Code Access Control

The code and dataset are available to researchers upon request for academic and non-commercial use. To request access, please follow these steps:

Submit Application via Email: Send an email to 2120230617@mail.nankai.edu.cn with the following information:

  • Subject: Dataset Access Request: [Your Name/Institution]
  • Body:
    • Your Hugging Face Username.
    • Your full name, title, and academic/institutional affiliation.
    • A link to your professional profile (e.g., university page, Google Scholar, LinkedIn).
    • A brief description of your research project and how you intend to use the dataset.

We will review your application and grant access on Hugging Face upon approval. Please allow 3-5 business days for processing.

📚 Citation

@misc{chen2025seniortalkchineseconversationdataset,
      title={SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors}, 
      author={Yang Chen and Hui Wang and Shiyao Wang and Junyang Chen and Jiabei He and Jiaming Zhou and Xi Yang and Yequan Wang and Yonghua Lin and Yong Qin},
      year={2025},
      eprint={2503.16578},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.16578}, 
}