CLI / WebUI

December 19, 2025

Overview

The CLI (command-line interface) provides terminal-based access to the toolkit. Model training, inference, and evaluation tasks are driven by parameterized configuration files, which keeps runs flexible and easy to script.

WebUI (Web User Interface) offers a browser-based visual interface that allows users to perform model training, chatting, and deployment without coding or complex commands, making it ideal for non-technical users and rapid prototyping.

Features

This document details the usage of CLI tools and WebUI in the ERNIE model toolkit, covering core functionalities:

  • 📈 Model Fine-tuning: SFT/LoRA/DPO fine-tuning with built-in/custom datasets
  • 🗣️ Chat Interaction: Load models for multi-turn conversation testing
  • 📊 Performance Evaluation: Validate models on built-in/custom datasets
  • 📁 Model Export: Convert trained models to deployable formats

Whether you're a developer seeking script-based customization or prefer graphical interfaces for quick experimentation, both approaches are supported.

Quick Start

Installation

Run in the erniekit root directory:

python -m pip install -e .

Verify installation:

erniekit help

Expected output:

------------------------------------------------------------
| Usage:                                                     |
|   erniekit train -h: model finetuning                      |
|   erniekit export -h: model export                         |
|   erniekit split -h: model split                           |
|   erniekit eval -h: model evaluation                       |
|   erniekit server -h: model deployment                     |
|   erniekit chat -h: launch a chat interface in CLI         |
|   erniekit webui -h: launch webui                          |
|   erniekit version: show version info                      |
|   erniekit help: show helping info                         |
------------------------------------------------------------

GPU Configuration

By default, the CLI and WebUI use all available GPUs. To restrict them to specific devices, set CUDA_VISIBLE_DEVICES (or XPU_VISIBLE_DEVICES / ASCEND_RT_VISIBLE_DEVICES for XPUs and NPUs) before launching:

# Single GPU
export CUDA_VISIBLE_DEVICES=0
# Multi GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# Single XPU
export XPU_VISIBLE_DEVICES=0
# Multi XPUs
export XPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

# Single NPU
export ASCEND_RT_VISIBLE_DEVICES=0
# Multi NPUs
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
  • Note: In the Chat module, the number of GPUs exposed through CUDA_VISIBLE_DEVICES should equal tensor_parallel_degree in the config. Alternatively, leave CUDA_VISIBLE_DEVICES unset.
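
The Chat-module constraint in the note above can be checked before launching. The following is a minimal sketch, not part of erniekit: `visible_device_count` and `check_chat_devices` are hypothetical helper names, and the `tensor_parallel_degree` value is assumed to have been read from your chat config by other means.

```python
import os
from typing import Optional


def visible_device_count(env_value: Optional[str]) -> Optional[int]:
    """Count the device IDs in a CUDA_VISIBLE_DEVICES-style string.

    Returns None when the variable is unset, meaning "use all devices".
    """
    if env_value is None:
        return None
    ids = [part.strip() for part in env_value.split(",") if part.strip()]
    return len(ids)


def check_chat_devices(env_value: Optional[str], tensor_parallel_degree: int) -> bool:
    """True if the device list is unset or its length matches tensor_parallel_degree."""
    count = visible_device_count(env_value)
    return count is None or count == tensor_parallel_degree


# tensor_parallel_degree=1 is an illustrative value, not read from a real config.
current = os.environ.get("CUDA_VISIBLE_DEVICES")
print(check_chat_devices(current, tensor_parallel_degree=1))
```

The same check applies to XPU_VISIBLE_DEVICES or ASCEND_RT_VISIBLE_DEVICES if you pass the corresponding value instead.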

1. CLI Usage

The examples below use the ERNIE-4.5-0.3B model:

1.1. Chat

# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Load model and start service
erniekit server examples/configs/ERNIE-4.5-0.3B/run_chat.yaml
# Launch CLI chat interface
erniekit chat examples/configs/ERNIE-4.5-0.3B/run_chat.yaml
  • Note: For VL models, the command-line chat supports text-only input.

1.2. Model Fine-tuning

1.2.1. SFT & LoRA Fine-tuning

# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Example 1: 8K seq length, SFT
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_8k.yaml
# Example 2: 32K seq length, SFT
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_32k.yaml
# Example 3: 8K seq length, SFT-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_lora_8k.yaml
# Example 4: 32K seq length, SFT-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_lora_32k.yaml

1.2.2. DPO & LoRA Fine-tuning

# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Example 1: 8K seq length, DPO
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_8k.yaml
# Example 2: 32K seq length, DPO
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_32k.yaml
# Example 3: 8K seq length, DPO-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_lora_8k.yaml
# Example 4: 32K seq length, DPO-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_lora_32k.yaml

1.3. Model Evaluation

erniekit eval examples/configs/ERNIE-4.5-0.3B/run_eval.yaml

1.4. Model Export

erniekit export examples/configs/ERNIE-4.5-0.3B/run_export.yaml

1.5. Multi-Node Training

NNODES={num_nodes} \
MASTER_ADDR={your_master_addr} \
MASTER_PORT={your_master_port} \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
erniekit train examples/configs/ERNIE-4.5-300B-A47B/sft/run_sft_lora_8k.yaml
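
When the same command has to be issued from a launcher script on each node, it can help to assemble the environment and arguments programmatically. This is only a sketch: `build_multinode_launch` is a hypothetical helper, the address, port, and node count are placeholders, and the environment variable names are the ones used in the command above.

```python
import os
import shlex
from typing import Dict, List, Tuple


def build_multinode_launch(
    nnodes: int,
    master_addr: str,
    master_port: int,
    devices: List[int],
    config_path: str,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment, argv) mirroring the multi-node shell command."""
    env = dict(os.environ)
    env.update(
        {
            "NNODES": str(nnodes),
            "MASTER_ADDR": master_addr,
            "MASTER_PORT": str(master_port),
            "CUDA_VISIBLE_DEVICES": ",".join(str(d) for d in devices),
        }
    )
    argv = ["erniekit", "train", config_path]
    return env, argv


env, argv = build_multinode_launch(
    nnodes=2,                # placeholder node count
    master_addr="10.0.0.1",  # placeholder address
    master_port=8080,        # placeholder port
    devices=list(range(8)),
    config_path="examples/configs/ERNIE-4.5-300B-A47B/sft/run_sft_lora_8k.yaml",
)
print(shlex.join(argv))
# To launch on this node: subprocess.run(argv, env=env, check=True)
```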

2. WebUI Examples

Launch WebUI:

erniekit webui
# Specify port: GRADIO_SERVER_PORT=8080 erniekit webui

WebUI contains five modules: Basic Info, Training, Chat, Evaluation, and Export.

2.1. Basic Info

2.1.1 Model

The default model name is Customization. Custom models can be loaded from a local path (relative or absolute).

If using a multimodal model, you need to select Customization_VL.

2.1.2 Export Directory

If left empty, training auto-generates a path such as ./output/ERNIE-4.5-0.3B_SFT_LoRA_2025_06_29_12_03_36. Evaluation, chat, and export default to ./output.
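
The auto-generated path appears to combine the model name, training stage, fine-tuning method, and a timestamp. The sketch below reproduces that shape; `default_output_dir` is a hypothetical function, and the exact naming logic inside erniekit may differ.

```python
from datetime import datetime


def default_output_dir(model_name: str, stage: str, method: str, now: datetime) -> str:
    """Build a path shaped like the auto-generated example in this section."""
    stamp = now.strftime("%Y_%m_%d_%H_%M_%S")
    return f"./output/{model_name}_{stage}_{method}_{stamp}"


print(default_output_dir("ERNIE-4.5-0.3B", "SFT", "LoRA", datetime(2025, 6, 29, 12, 3, 36)))
# prints ./output/ERNIE-4.5-0.3B_SFT_LoRA_2025_06_29_12_03_36
```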

2.1.3 Available GPUs

Displays GPU count (read-only).

2.1.4 Training Method

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Fine-tuning | fine_tuning | LoRA or Full-parameter |
| Compute Type | compute_type | bf16, fp16, fp8 (NVIDIA H-series only), wint8, wint4/8 |
| AMP Master Grad | amp_master_grad | For AMP O2, uses fp32 weight gradients (default: keep unchanged) |
| Disable CKPT Quant | disable_ckpt_quant | Disables weight quantization |
| LoRA Rank | lora_rank | LoRA rank dimension |
| LoRA Alpha | lora_alpha | LoRA scaling factor |
| LoRA+ Scale | lora_plus_scale | LoRA B scale in LoRA+ |
| RSLoRA | rslora | Enable RSLoRA |

2.1.5 Distributed Parameters

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Tensor Parallel | tensor_parallel_degree | Tensor parallelism degree |
| Pipeline Parallel | pipeline_parallel_degree | Pipeline parallelism degree |
| Sharding Parallel | sharding_parallel_degree | Sharding parallelism degree |
| Pipeline Config | pipeline_parallel_config | Recommended: "disable_partial_send_recv enable_clear_every_step_cache enable_delay_scale_loss enable_overlap_p2p_comm best_unbalanced_scheduler" |
| PP Seg Method | pp_seg_method | Pipeline layer segmentation |
| Sharding | sharding | Sharding stage: stage1 (optimizer), stage2 (gradients), stage3 (model) |
| Use SP Callback | use_sp_callback | Skips redundant gradient calculations |
| MoE Group | moe_group | MoE communication group ("mp" or "dummy") |
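
A quick sanity check on the three parallelism degrees can catch misconfigurations early. The sketch below assumes the common hybrid-parallel convention that the device count must be divisible by tensor × pipeline × sharding, with the quotient acting as the data-parallel degree; this convention is an assumption here and is not confirmed by this document for erniekit. `data_parallel_degree` is a hypothetical helper.

```python
def data_parallel_degree(world_size: int, tensor: int, pipeline: int, sharding: int) -> int:
    """Derive the data-parallel degree left over after the other three degrees.

    Assumes world_size = tensor * pipeline * sharding * data_parallel.
    """
    if min(world_size, tensor, pipeline, sharding) < 1:
        raise ValueError("all degrees and world_size must be >= 1")
    combined = tensor * pipeline * sharding
    if world_size % combined != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor*pipeline*sharding={combined}"
        )
    return world_size // combined


# 16 devices with tensor=2, pipeline=2, sharding=2 leaves 2 data-parallel replicas.
print(data_parallel_degree(16, tensor=2, pipeline=2, sharding=2))
```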


2.2. Training Module

Default SFT/DPO configurations for ERNIE-4.5-0.3B-Paddle are provided under "Switch SFT/DPO Presets".

After setting dataset paths/probabilities, click "Preview Dataset" for visualization. Click "Preview" to show configurations, "Start" to begin training, and "Stop" to interrupt.

2.2.1 Data Parameters

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Max Sequence Length | max_seq_len | Token limit (adjust lower with larger GBS to avoid OOM) |
| Max Prompt Length | max_prompt_len | For DPO (max: max_seq_len-10) |
| Virtual Epoch Size | num_samples_each_epoch | Recommended default |
| Recompute | recompute | Gradient checkpointing to save memory |
| Training Epochs | num_train_epochs | Overridden by max_steps if both set |
| Max Steps | max_steps | Total training steps |
| Batch Size | batch_size | Micro batch size |
| Gradient Accumulation | gradient_accumulation_steps | Steps for gradient accumulation |
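
Two of the relationships in this table are simple arithmetic. The sketch below computes a global batch size (GBS) from the micro batch size and accumulation steps, and the upper bound on max_prompt_len for DPO. The `data_replicas` factor (number of data-parallel copies) is an assumption of this sketch based on the usual definition of GBS; both function names are hypothetical.

```python
def global_batch_size(
    batch_size: int,
    gradient_accumulation_steps: int,
    data_replicas: int = 1,
) -> int:
    """Samples consumed per optimizer step.

    batch_size is the micro batch size; data_replicas is an assumed
    multiplier for data-parallel copies of the model.
    """
    return batch_size * gradient_accumulation_steps * data_replicas


def max_prompt_len_limit(max_seq_len: int) -> int:
    """Largest max_prompt_len allowed for DPO: max_seq_len - 10."""
    return max_seq_len - 10


print(global_batch_size(batch_size=1, gradient_accumulation_steps=8, data_replicas=8))
print(max_prompt_len_limit(8192))
```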

2.2.2 Training Dataset

Choose built-in (demo/HuggingFace) or custom datasets (mixed by probability):

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Dataset Path | train_dataset_path | Training dataset path |
| Dataset Probability | train_dataset_prob | Sampling probability |
| Data Type | train_dataset_type | Supported: erniekit, alpaca |
  • Note: Multimodal models can additionally be configured with text-only datasets, allowing for mixed training with both multimodal and text-only data. You can adjust the data ratio through a sliding window interface.
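
To illustrate what mixing datasets by probability means, the sketch below draws the source dataset for each sample according to its weight. It is a generic illustration, not erniekit's sampler: the file names are hypothetical, and whether erniekit normalizes probabilities that do not sum to 1 is not stated in this document.

```python
import random
from collections import Counter
from typing import List, Sequence


def normalize(probs: Sequence[float]) -> List[float]:
    """Scale non-negative weights so they sum to 1."""
    total = sum(probs)
    if total <= 0 or any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative with a positive sum")
    return [p / total for p in probs]


def sample_sources(paths: Sequence[str], probs: Sequence[float], n: int, seed: int = 0) -> List[str]:
    """Pick the source dataset for each of n samples, weighted by probability."""
    rng = random.Random(seed)
    return rng.choices(list(paths), weights=normalize(probs), k=n)


paths = ["sft-a.jsonl", "sft-b.jsonl"]  # hypothetical dataset files
picks = sample_sources(paths, [0.8, 0.2], n=1000)
print(Counter(picks))  # roughly an 80/20 split
```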

2.2.3 Evaluation Dataset

Same options as training dataset:

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Dataset Path | eval_dataset_path | Evaluation dataset path |
| Dataset Probability | eval_dataset_prob | Sampling probability |
| Data Type | eval_dataset_type | Supported: erniekit, alpaca |

2.2.4 Dataloader

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Workers | dataloader_num_workers | Subprocess count (0 to disable) |
| Distributed | distributed_dataloader | Saves memory for large datasets |

2.2.5 Optimizer

| WebUI Param | Variable | Description |
| --- | --- | --- |
| LR Scheduler | lr_scheduler_type | linear/cosine/polynomial/constant/constant_with_warmup |
| Learning Rate | learning_rate | Suggested: 3e-5 (SFT), 1e-6 (DPO), 3e-4 (SFT-LoRA), 1e-5 (DPO-LoRA) |
| Min LR | min_lr | For cosine scheduler only |
| Layerwise Decay | layerwise_lr_decay_bound | (0, 1], 1=no decay |
| Warmup Steps | warmup_steps | Typically 1-10% of max_steps |
| Optimizer | optim | Default: adamw |
| Offload Optim | offload_optim | Offload to CPU |
| Release Grads | release_grads | Reduces peak memory (recommended: True) |
| Loss Scaling | scale_loss | For float16 training |
| Weight Decay | weight_decay | AdamW parameter |
| Adam Epsilon | adam_epsilon | AdamW parameter |
| Adam Beta1 | adam_beta1 | AdamW parameter |
| Adam Beta2 | adam_beta2 | AdamW parameter |
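
To show how learning_rate, min_lr, warmup_steps, and max_steps interact under a cosine schedule, here is a minimal sketch of the textbook formula: linear warmup followed by cosine decay down to min_lr. It is a generic formulation, and erniekit's scheduler may differ in details such as the warmup shape. `cosine_lr` is a hypothetical function.

```python
import math


def cosine_lr(step: int, max_steps: int, warmup_steps: int,
              learning_rate: float, min_lr: float) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return learning_rate * step / warmup_steps
    decay_steps = max(1, max_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (learning_rate - min_lr) * (1.0 + math.cos(math.pi * progress))


# 50 warmup steps is 5% of 1000, within the 1-10% guidance from the table.
for step in (0, 50, 525, 1000):
    print(step, cosine_lr(step, max_steps=1000, warmup_steps=50,
                          learning_rate=3e-5, min_lr=3e-6))
```

The rate peaks at the end of warmup, reaches the midpoint between learning_rate and min_lr halfway through the decay, and ends at min_lr.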

2.2.6 Output

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Logging Steps | logging_steps | Log interval |
| Eval Steps | eval_steps | Evaluation interval |
| Eval Strategy | evaluation_strategy | "steps" enables periodic evaluation |
| Save Steps | save_steps | Checkpoint interval (when save_strategy=="steps") |
| Save Strategy | save_strategy | Checkpoint saving method |
| Save Limit | save_total_limit | Max checkpoints to keep |
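
The interaction between save_steps and save_total_limit determines which checkpoints remain on disk at the end of a run. The sketch below assumes that a checkpoint is written every save_steps steps and that the oldest checkpoints are deleted first once the limit is exceeded; both behaviours are assumptions, and the function names are hypothetical.

```python
from typing import List, Optional


def saved_steps(max_steps: int, save_steps: int) -> List[int]:
    """Steps at which a checkpoint is written when save_strategy is "steps"."""
    return list(range(save_steps, max_steps + 1, save_steps))


def retained_checkpoints(max_steps: int, save_steps: int,
                         save_total_limit: Optional[int]) -> List[int]:
    """Checkpoints left at the end of training, assuming oldest-first deletion."""
    steps = saved_steps(max_steps, save_steps)
    if save_total_limit is None:
        return steps  # no limit configured: keep everything
    if save_total_limit < 1:
        raise ValueError("save_total_limit must be >= 1 when set")
    return steps[-save_total_limit:]


print(retained_checkpoints(max_steps=1000, save_steps=200, save_total_limit=2))
```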


2.3. Chat Module

The Chat module loads the model selected in the Basic Info section. Click "Verify Model Loading" to check its status, and "Unload" to release it.

  • Note: Full-parameter checkpoints in output_dir take priority for deployment.

After successful loading:

  • Enter prompts in the input box
  • Set roles/system prompts
  • [VL model] Select "Enable VL Thought Mode" to enable thinking mode
  • [VL model] Upload images or videos by drag and drop, by clicking to upload, or by entering a URL
  • Click "Submit" to start chatting
  • View history in "Chat History"
  • "Clear" resets conversation
  • "Stop" interrupts generation

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Max Length | max_model_len | Input+output token limit |
| Port | port | Service port |
| Max New Tokens | max_new_tokens | Generation limit |
| Top-p | top_p | Nucleus sampling (higher=more diverse) |
| Temperature | temperature | Controls randomness (higher=more creative) |
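
To make the effect of Top-p and Temperature concrete, here is a small self-contained sketch of temperature scaling followed by nucleus (top-p) filtering over a toy vocabulary. It illustrates the general technique only; the serving backend behind the Chat module may implement sampling differently, and the logits are made up.

```python
import math
import random
from typing import Dict, List, Tuple


def nucleus_candidates(logits: Dict[str, float], temperature: float,
                       top_p: float) -> List[Tuple[str, float]]:
    """Return the renormalized top-p candidate set after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    mass = sum(prob for _, prob in kept)
    return [(tok, prob / mass) for tok, prob in kept]


def sample_token(logits: Dict[str, float], temperature: float, top_p: float,
                 rng: random.Random) -> str:
    """Draw one token from the filtered, renormalized distribution."""
    tokens, weights = zip(*nucleus_candidates(logits, temperature, top_p))
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0}  # made-up scores
print(nucleus_candidates(toy_logits, temperature=1.0, top_p=0.5))
print(nucleus_candidates(toy_logits, temperature=10.0, top_p=0.5))
```

With these toy scores, raising the temperature flattens the distribution, so more tokens are needed to reach the same top_p mass and the candidate set grows.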


2.4. Evaluation Module

Select the model in Basic Info. By default, the latest checkpoint in the export directory is used.

Choose evaluation dataset (built-in/custom). Click "Preview Eval Dataset" for visualization.

"Preview Command" shows configurations. "Start" begins evaluation, "Stop" interrupts.

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Dataset Path | eval_dataset_path | Evaluation dataset path |
| Dataset Probability | eval_dataset_prob | Sampling probability |
| Data Type | eval_dataset_type | Supported: erniekit, alpaca |


2.5. Export Module

Two functions:

  1. LoRA weight merging
  2. Model weight splitting (safetensors format only)

LoRA Merging: Set the export directory to the training output directory. Click "Start Merge LoRA Weights" to merge the LoRA weights into the original model (saved in export_dir/export).

Weight Splitting: For large safetensors files, click "Start Split Model" to split the weights (saved in export_dir/split_export).

| WebUI Param | Variable | Description |
| --- | --- | --- |
| Max Shard Size (GB) | max_shard_size | Split file size limit |
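
To get a feel for how a shard size limit translates into output files, the sketch below packs tensors into shards greedily in iteration order, starting a new shard whenever the next tensor would exceed the limit. This is one common way to shard checkpoints and is an assumption here, not a description of erniekit's splitter; the tensor names and sizes are made up, and whether 1 GB means 10^9 or 2^30 bytes in erniekit is not stated (2^30 is assumed below).

```python
from typing import Dict, List

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def plan_shards(tensor_sizes: Dict[str, int], max_shard_size_gb: float) -> List[List[str]]:
    """Greedily group tensor names into shards no larger than the limit.

    A single tensor larger than the limit still gets a shard of its own.
    """
    limit = int(max_shard_size_gb * BYTES_PER_GB)
    shards: List[List[str]] = []
    current_size = 0
    for name, size in tensor_sizes.items():
        if not shards or current_size + size > limit:
            shards.append([])  # open a new shard
            current_size = 0
        shards[-1].append(name)
        current_size += size
    return shards


sizes = {  # made-up tensor sizes in bytes
    "layer0.weight": 3 * BYTES_PER_GB,
    "layer1.weight": 3 * BYTES_PER_GB,
    "lm_head.weight": 1 * BYTES_PER_GB,
}
print(plan_shards(sizes, max_shard_size_gb=5))
```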
