Korean Wellness Chatbot Models
January 12, 2026
Formerly: WellnessConversation-LanguageModel
Korean README: README.ko.md
Korean wellness/counseling chatbot models built with PyTorch and Hugging Face Transformers. This repository is a research/learning snapshot and contains two main directions:
- Text classification (query → category): KoELECTRA, KoBERT
- Text generation (query → response): KoGPT2 (autoregressive)
What’s in this repo
1) Category classification (KoELECTRA / KoBERT)
Goal: given a user query, predict a wellness category (the dataset includes 359 category classes).
Typical training example format:
- input: a user query sentence
- label: an integer category id (0..358)
See:
- Training: train/run_koelectra.py, train/run_text_classification.py
- Models: model/koelectra.py, model/kobert.py
- Dataset loader: dataloader/wellness.py
- Example inference: example/koelectra-wellness-qa.py, example/kobert-wellness-qa.py
In the example scripts, the predicted category is used to retrieve a canned answer from the dataset (randomly sampled among answers for the category).
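The retrieval step described above can be sketched as follows. This is a minimal illustration, not the code from the example scripts: the `answers_by_category` mapping, the sample category ids, and the placeholder answers are all hypothetical, and the classifier is assumed to have already produced an integer category id.

```python
import random

# Hypothetical mapping: category id -> candidate answers for that category.
# In the repo the answers come from the wellness dataset; these are placeholders.
answers_by_category = {
    0: ["answer A for category 0", "answer B for category 0"],
    1: ["answer A for category 1"],
}


def pick_answer(category_id, answers, rng=random):
    """Return one randomly sampled answer for the predicted category."""
    candidates = answers.get(category_id)
    if not candidates:
        raise KeyError(f"no answers stored for category {category_id}")
    return rng.choice(candidates)


if __name__ == "__main__":
    predicted = 0  # stand-in for the classifier's predicted category id
    print(pick_answer(predicted, answers_by_category))
```

Passing a seeded `random.Random` instance as `rng` makes the sampled answer reproducible, which can help when comparing runs.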
2) Response generation (KoGPT2)
Goal: train an autoregressive model to generate the next response given a user query.
Typical training example format:
- input: question-answer pairs from the wellness dataset
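One common way to turn a question-answer pair into a single training sequence for an autoregressive model is to wrap each side in begin/end markers and concatenate them. The sketch below assumes that layout. The `<s>` and `</s>` marker strings and both helper functions are illustrative assumptions, not confirmed details of dataloader/wellness.py.

```python
BOS = "<s>"   # assumed begin-of-sequence marker
EOS = "</s>"  # assumed end-of-sequence marker


def build_sequence(question: str, answer: str) -> str:
    """Concatenate a (question, answer) pair into one training string."""
    return f"{BOS}{question.strip()}{EOS}{BOS}{answer.strip()}{EOS}"


def build_prompt(question: str) -> str:
    """Build the inference-time prefix: the question plus an open answer slot."""
    return f"{BOS}{question.strip()}{EOS}{BOS}"


if __name__ == "__main__":
    print(build_sequence("question text", "answer text"))
    print(build_prompt("question text"))
```

With this layout the inference prompt is a strict prefix of the training sequence, so the model is asked to continue exactly where the answer would begin.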
See:
- Training: train/run_auto_regressive.py
- Model: model/kogpt2.py
- Dataset loader: dataloader/wellness.py
- Example inference: example/kogpt2-text-generation.py
Data
This repo does not ship the AI Hub dataset itself.
- AI Hub mental health counseling dataset (Korean), requires signup and approval: http://www.aihub.or.kr/keti_data_board/language_intelligence
- Additional public dataset often used in experiments, songys/Chatbot_data: https://github.com/songys/Chatbot_data
The original wellness dataset is organized around:
- category / question / answer, with multiple candidate answers per category.
Dataset sizes used in this project (snapshot)
From the original Korean README:
- Number of category classes: 359
- Number of (query, category) pairs used for classification: 5231
Setup
Install dependencies:
pip install -r requirements.txt
Notes:
- The code was written around older transformers versions (see requirements.txt, e.g. transformers==3.0.2).
- Training scripts assume local folders like data/ and checkpoint/ under the repo root.
Environment used (snapshot)
- GPU: Colab Pro, P100
- Core packages: kogpt2-transformers, kobert-transformers, transformers==3.0.2, torch
Training (best-effort)
The training scripts use hard-coded paths (examples below). You may need to adjust them to match your local data layout.
- KoELECTRA classification: train/run_koelectra.py
  - expects data/wellness_dialog_for_text_classification_train.txt
  - saves checkpoint to checkpoint/koelectra-wellnesee-text-classification.pth
- KoBERT classification: train/run_text_classification.py
  - expects data/wellness_dialog_for_text_classification_train.txt
  - saves checkpoint to checkpoint/kobert-wellnesee-text-classification.pth
- KoGPT2 generation: train/run_auto_regressive.py
  - expects data/wellness_dialog_for_autoregressive_train.txt
  - saves checkpoint to checkpoint/kogpt2-wellnesee-auto-regressive.pth
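Because the paths are hard-coded, it can help to verify the local layout before starting a training run. The following is a small sketch, not part of the repo: the `check_layout` helper is hypothetical, and the file names are simply the ones listed above.

```python
from pathlib import Path

# Paths copied from the list above; adjust them if your layout differs.
EXPECTED_DATA_FILES = [
    "data/wellness_dialog_for_text_classification_train.txt",
    "data/wellness_dialog_for_autoregressive_train.txt",
]
CHECKPOINT_DIR = "checkpoint"


def check_layout(repo_root="."):
    """Create checkpoint/ if needed and return the data files that are missing."""
    root = Path(repo_root)
    (root / CHECKPOINT_DIR).mkdir(parents=True, exist_ok=True)
    return [p for p in EXPECTED_DATA_FILES if not (root / p).is_file()]


if __name__ == "__main__":
    missing = check_layout()
    if missing:
        print("Missing training files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected training files are present.")
```

Run it from the repo root. An empty result means the data files the training scripts expect are in place.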
Preprocessing
preprocess/training_data.py contains helper functions used during dataset preparation (Excel → text files, splitting train/test, etc.).
It is best treated as a reference script; you may need to adapt it depending on how you store the dataset locally.
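The train/test splitting step mentioned above can be sketched roughly as follows. This is not the code in preprocess/training_data.py. The `split_lines` helper, the 0.8 ratio, and the fixed seed are assumptions chosen for illustration.

```python
import random


def split_lines(lines, train_ratio=0.8, seed=42):
    """Shuffle lines reproducibly and split them into (train, test) lists."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(lines)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    examples = [f"example {i}" for i in range(10)]
    train, test = split_lines(examples)
    print(len(train), len(test))
```

Fixing the seed keeps the split stable across runs, so evaluation numbers stay comparable when the preprocessing is repeated.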
Serving (Flask)
A simple REST API is provided:
- service/api.py
  - /api/wellness/dialog/bert?s=...
  - /api/wellness/dialog/electra?s=...
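A client for these endpoints might look like the sketch below. The host and port (Flask's development default, 127.0.0.1:5000), the use of GET, and the `build_url` and `ask` helpers are assumptions. The response format is not documented here, so the body is returned as raw text.

```python
import urllib.parse
import urllib.request

# Assumed host and port (Flask's development default); change to match your setup.
BASE_URL = "http://127.0.0.1:5000"


def build_url(model: str, sentence: str, base_url: str = BASE_URL) -> str:
    """Build the request URL for the 'bert' or 'electra' endpoint."""
    if model not in ("bert", "electra"):
        raise ValueError("model must be 'bert' or 'electra'")
    # urlencode percent-encodes the query, which matters for Korean input.
    query = urllib.parse.urlencode({"s": sentence})
    return f"{base_url}/api/wellness/dialog/{model}?{query}"


def ask(model: str, sentence: str) -> str:
    """Send a GET request and return the raw response body as text."""
    with urllib.request.urlopen(build_url(model, sentence), timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(build_url("electra", "hello world"))
```

Calling `ask("bert", ...)` requires the Flask service to be running. `build_url` can be used on its own to inspect the request that would be sent.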
Project period
2020.06 ~ 2020.07
References
- KoBERT: https://github.com/SKTBrain/KoBERT
- KoBERT-Transformers: https://github.com/monologg/KoBERT-Transformers
- KoGPT2: https://github.com/SKT-AI/KoGPT2
- KoGPT2-Transformers: https://github.com/taeminlee/KoGPT2-Transformers/
- KoELECTRA: https://github.com/monologg/KoELECTRA
- enlipleai/kor_pretrain_LM: https://github.com/enlipleai/kor_pretrain_LM
- Hugging Face blog (text generation): https://huggingface.co/blog/how-to-generate