Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving
April 8, 2026
Zehao Wang¹, Huaide Jiang¹, Shuaiwu Dong¹, Yuping Wang¹,², Hang Qiu¹, Jiachen Li¹*
¹University of California, Riverside; ²University of Michigan; *Corresponding author
Abstract
Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectives or rely on fixed driving modes, lacking the ability to adapt to individual preferences or interpret natural language intent.
To address this gap, we propose Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users' long-term driving habits and adapts to real-time user instructions. DMW learns a user embedding from our personalized driving dataset collected across multiple real drivers and conditions the policy on this embedding during planning, while natural language instructions provide additional short-term guidance. Closed-loop evaluation on the Bench2Drive benchmark demonstrates that DMW improves style instruction adaptation, and user studies show that its generated behaviors are recognizable as each driver's own style, highlighting personalization as a key capability for human-centered autonomous driving.
Key Features
- Long-term preference learning — A contrastive preference encoder learns user embeddings from structured driver profiles and historical driving behavior, capturing stable individual driving habits.
- Short-term instruction alignment — Natural language instructions at runtime steer the policy toward the user's immediate intent (e.g., aggressive vs. conservative maneuvers).
- GRPO-based policy alignment — Group Relative Policy Optimization with style-aware rewards aligns the VLA policy to diverse user preferences without relying on explicit human feedback.
- Personalized Driving Dataset (PDD) — Real human driving demonstrations across diverse CARLA scenarios, collected with a steering wheel setup across multiple drivers and conditions.
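The group-relative step that gives GRPO its name can be illustrated with a small sketch. It assumes the commonly used formulation in which each rollout's reward is normalized against the mean and standard deviation of its own sampling group. The reward values are hypothetical stand-ins for style-aware rewards, and the trainer in this repository may differ in details such as the standard-deviation estimator.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the other rollouts in the same group.

    Assumption: population standard deviation is used here for simplicity.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout receives the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical style-aware rewards for four rollouts sampled from one scene
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that score above their group average receive a positive advantage and are reinforced, while below-average rollouts are pushed down, so no separate value network is needed for the baseline.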
Method Overview
Given camera observations and navigation goals, DMW fuses the driver's long-term preferences (via a learned user embedding) with real-time natural language instructions to produce adaptive, personalized actions.
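The user embedding mentioned above comes from a contrastive preference encoder. The exact objective is not given in this README, so the sketch below uses InfoNCE, a common contrastive loss, purely as an illustration. The pairing (one driver's profile embedding as the anchor, the same driver's behavior embedding as the positive, other drivers' behavior as negatives) and the toy 2-D vectors are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: -log softmax of the positive's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


# Hypothetical embeddings: one driver's profile and behavior, plus two other drivers
profile = [1.0, 0.0]
same_driver_behavior = [0.9, 0.1]
other_drivers = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = info_nce(profile, same_driver_behavior, other_drivers)
mismatched_loss = info_nce(profile, other_drivers[0],
                           [same_driver_behavior, other_drivers[1]])
print(aligned_loss, mismatched_loss)
```

The loss is small when a driver's profile sits closer to that driver's own behavior than to anyone else's, which is the property that lets the embedding capture stable individual habits.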
Personalized Driving Dataset (PDD)
PDD collects real human driving demonstrations across diverse scenarios in CARLA using a steering wheel setup. It covers a wide range of interactive scenarios: cut-ins, pedestrians, obstacle avoidance, merging, and more.
Download: PDD on Hugging Face
[Videos: sample drivers from the dataset, recorded at 2× speed]
Environment Setup
Directory Structure
DMW/
├── grpo/ # GRPO post-training (to be released)
├── checkpoints/ # Checkpoints (to be released)
├── model/ # Model arch
├── team_code/ # CARLA agent
├── leaderboard/ # CARLA leaderboard evaluation
├── scenario_runner/ # CARLA scenario runner
├── pretrained/ # Base VLM checkpoint (InternVL2-1B)
└── data/ # Route configs
Prerequisites
- Linux (Ubuntu 20.04+ recommended)
- Conda / Miniconda
- CUDA 12.1 (for PyTorch 2.2.0 + flash-attn)
- CARLA 0.9.15 simulator
Installation
1. Create the Conda Environment
conda env create -f environment.yaml
conda activate dmw
This installs Python 3.8 and base system packages. All Python dependencies are installed via pip inside the conda env.
2. Install Remaining pip Dependencies
pip install -r requirements.txt
Install flash-attention (optional but recommended for speed)
pip install flash-attn==2.7.0.post2 --no-build-isolation
3. Install the Custom TRL Library (GRPO)
This repo contains a stripped-down TRL fork with only GRPO training support.
cd grpo
pip install -e .
cd ..
The custom TRL requires:
accelerate >= 1.4.0
datasets >= 3.0.0
transformers >= 4.55.0
These are already covered by requirements.txt.
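To confirm that the environment meets these minimums, a short check along the following lines may help. It is a sketch that relies only on the standard library's importlib.metadata. The helper names are illustrative, and the version parsing is deliberately simple: it compares only the leading numeric components.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions listed above for the custom TRL
MINIMUMS = {"accelerate": "1.4.0", "datasets": "3.0.0", "transformers": "4.55.0"}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '4.55.0.dev0' or '2.2.0+cu121' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.55' compares equal to '4.55.0'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    print(check_installed() or "All minimum versions satisfied")
```

An empty result means all three packages satisfy the listed minimums. A non-empty dictionary names each package that needs upgrading.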
4. Set Up CARLA
Download CARLA 0.9.15
Download and extract CARLA 0.9.15 to your system (e.g., /home/<user>/carla0915).
Official download: https://github.com/carla-simulator/carla/releases/tag/0.9.15
Configure Environment Variables
Edit setup_carla.sh to match your paths, then source it:
# Edit these paths in setup_carla.sh
export CARLA_ROOT=/home/<user>/carla0915
export WORK_DIR=/home/<user>/Downloads/DMW
# Then source it
source setup_carla.sh
This sets the following PYTHONPATH entries:
- $CARLA_ROOT/PythonAPI/carla
- $WORK_DIR/scenario_runner_autopilot
- $WORK_DIR/leaderboard_autopilot
- $WORK_DIR/grpo
Add source /path/to/setup_carla.sh to your .bashrc / .zshrc to persist across sessions.
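One way to confirm that the script took effect is to compare the expected entries against Python's module search path. The sketch below assumes the four entries listed above and that CARLA_ROOT and WORK_DIR are exported. The function names are illustrative and not part of the repository.

```python
import os
import sys
from typing import Iterable, List, Mapping


def expected_entries(env: Mapping[str, str]) -> List[str]:
    """Build the four PYTHONPATH entries from CARLA_ROOT and WORK_DIR."""
    carla_root, work_dir = env["CARLA_ROOT"], env["WORK_DIR"]
    return [
        os.path.join(carla_root, "PythonAPI", "carla"),
        os.path.join(work_dir, "scenario_runner_autopilot"),
        os.path.join(work_dir, "leaderboard_autopilot"),
        os.path.join(work_dir, "grpo"),
    ]


def missing_entries(env: Mapping[str, str], search_path: Iterable[str]) -> List[str]:
    """Return the expected entries that are absent from the given search path."""
    present = {os.path.normpath(p) for p in search_path if p}
    return [e for e in expected_entries(env) if os.path.normpath(e) not in present]


if __name__ == "__main__":
    if "CARLA_ROOT" not in os.environ or "WORK_DIR" not in os.environ:
        print("CARLA_ROOT / WORK_DIR are not set; source setup_carla.sh first")
    else:
        # sys.path already includes everything from PYTHONPATH at startup
        print("Missing from PYTHONPATH:", missing_entries(os.environ, sys.path) or "nothing")
```

Running it in a fresh shell is also a quick way to test whether the .bashrc / .zshrc line persists the settings.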
5. Download Pretrained Model
The training pipeline uses InternVL2-1B as the base vision-language model.
# Expected path: pretrained/InternVL2-1B/
huggingface-cli download OpenGVLab/InternVL2-1B --local-dir pretrained/InternVL2-1B
6. Verify Installation
conda activate dmw
python -c "import trl; from trl import GRPOTrainer, GRPOConfig; print('TRL OK')"
python -c "import torch; print('PyTorch:', torch.__version__); print('CUDA:', torch.cuda.is_available())"
python -c "import transformers; print('Transformers:', transformers.__version__)"
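The three one-liners can also be folded into a single script that keeps going when one import fails, which makes it easier to see every problem at once. This is a minimal sketch. The probe helper is illustrative, and the module list (including carla) is an assumption based on the setup steps above.

```python
import importlib
from typing import Dict, Iterable


def probe(module_names: Iterable[str]) -> Dict[str, str]:
    """Try to import each module and report its version or the failure reason."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, or errors raised while a package initializes
            report[name] = f"FAILED ({type(exc).__name__}: {exc})"
        else:
            report[name] = getattr(module, "__version__", "unknown version")
    return report


if __name__ == "__main__":
    for name, status in probe(["torch", "transformers", "trl", "carla"]).items():
        print(f"{name:>14}: {status}")
```

Any line that starts with FAILED points to the matching entry under Common Issues.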
Common Issues
carla module not found
- Ensure setup_carla.sh is sourced and $CARLA_ROOT/PythonAPI/carla is on PYTHONPATH.
flash_attn build fails
- Match your CUDA version exactly. Use nvcc --version and python -c "import torch; print(torch.version.cuda)" to confirm alignment.
transformers version conflict
- TRL requires >= 4.55.0 while environment.yaml pins 4.46.3. After conda env create, upgrade via: pip install "transformers>=4.55.0"
DeepSpeed compilation errors
- Ensure ninja is installed: pip install ninja
- Set DS_BUILD_OPS=0 to disable custom CUDA kernel compilation during import.
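For the flash_attn build issue, the comparison between the nvcc release and PyTorch's CUDA build can be scripted. The sketch below assumes nvcc prints a line of the form "release 12.1, V12.1.105" and that torch.version.cuda is a string such as "12.1", or None on CPU-only builds. The helper names are illustrative.

```python
import re
import subprocess
from typing import Optional


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' release from the output of `nvcc --version`."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(nvcc_output: str, torch_cuda: Optional[str]) -> bool:
    """True only when nvcc reports a release and it equals PyTorch's CUDA version."""
    release = nvcc_release(nvcc_output)
    return release is not None and release == torch_cuda


if __name__ == "__main__":
    try:
        output = subprocess.run(["nvcc", "--version"], capture_output=True,
                                text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        output = ""  # nvcc is missing or failed; treated as a mismatch below
    try:
        import torch
        torch_cuda = torch.version.cuda
    except ImportError:
        torch_cuda = None
    print("nvcc:", nvcc_release(output), "| torch:", torch_cuda,
          "| match:", cuda_versions_match(output, torch_cuda))
```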
Acknowledgements
We sincerely thank the researchers and developers of SimLingo for their amazing work.
Citation
If you find this work useful, please cite:
@misc{wang2026drivewaypreferencealignment,
title={Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving},
author={Zehao Wang and Huaide Jiang and Shuaiwu Dong and Yuping Wang and Hang Qiu and Jiachen Li},
year={2026},
eprint={2603.25740},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.25740},
}