Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

April 8, 2026

Zehao Wang¹, Huaide Jiang¹, Shuaiwu Dong¹, Yuping Wang¹,², Hang Qiu¹, Jiachen Li¹*

¹University of California, Riverside ²University of Michigan *Corresponding author

Abstract

Human driving behavior is inherently personal, shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectives or rely on fixed driving modes, lacking the ability to adapt to individual preferences or interpret natural language intent.

To address this gap, we propose Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users' long-term driving habits and adapts to real-time user instructions. DMW learns a user embedding from our personalized driving dataset collected across multiple real drivers and conditions the policy on this embedding during planning, while natural language instructions provide additional short-term guidance. Closed-loop evaluation on the Bench2Drive benchmark demonstrates that DMW improves style instruction adaptation, and user studies show that its generated behaviors are recognizable as each driver's own style, highlighting personalization as a key capability for human-centered autonomous driving.

Key Features

Long-term preference learning — A contrastive preference encoder learns user embeddings from structured driver profiles and historical driving behavior, capturing stable individual driving habits.
Short-term instruction alignment — Natural language instructions at runtime steer the policy toward the user's immediate intent (e.g., aggressive vs. conservative maneuvers).
GRPO-based policy alignment — Group Relative Policy Optimization with style-aware rewards aligns the VLA policy to diverse user preferences without relying on explicit human feedback.
Personalized Driving Dataset (PDD) — Real human driving demonstrations across diverse CARLA scenarios, collected with a steering wheel setup across multiple drivers and conditions.
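
To make the GRPO item concrete: GRPO samples a group of rollouts for the same input and scores each one relative to the rest of its group, so no separate value model is needed. The sketch below shows only that normalization step in plain Python. The reward values are made up, `group_relative_advantages` is a hypothetical helper name, and the choice of population standard deviation is an assumption; DMW's actual style-aware rewards and training loop are not reproduced here.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division finite when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical style-aware rewards for four rollouts sampled from one scene
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. A group with identical rewards yields all-zero advantages and therefore no learning signal.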

Method Overview

DMW Architecture

Given camera observations and navigation goals, DMW fuses the driver's long-term preferences (via a learned user embedding) with real-time natural language instructions to produce adaptive, personalized actions.
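
The learned user embedding comes from a contrastive preference encoder, but this README does not state which contrastive loss is used. As a rough illustration only, the sketch below assumes an InfoNCE-style objective that pulls each driver's profile embedding toward that same driver's behavior embedding and pushes it away from other drivers'. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(
    profiles: List[Sequence[float]],
    behaviors: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Mean InfoNCE loss; index i in both lists refers to the same driver."""
    total = 0.0
    for i, profile in enumerate(profiles):
        logits = [cosine(profile, b) / temperature for b in behaviors]
        # Numerically stable log-sum-exp over all candidate drivers
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of picking the matching driver
        total += log_denominator - logits[i]
    return total / len(profiles)


# Toy embeddings for two drivers (hypothetical values)
profiles = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(profiles, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce(profiles, [[0.0, 1.0], [1.0, 0.0]])
```

When profile and behavior embeddings of the same driver agree, the loss is close to zero; when they are swapped between drivers, it is large.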

Personalized Driving Dataset (PDD)

PDD collects real human driving demonstrations across diverse scenarios in CARLA using a steering wheel setup. It covers a wide range of interactive scenarios: cut-ins, pedestrians, obstacle avoidance, merging, and more.

Download: PDD on Hugging Face

Sample drivers from the dataset, recorded at 2× speed:

Driver 01

Driver 02

Driver 14

Directory Structure

DMW/
├── grpo/                       # GRPO post-training (to be released)
├── checkpoints/                # Checkpoints (to be released)
├── model/                      # Model arch
├── team_code/                  # CARLA agent
├── leaderboard/                # CARLA leaderboard evaluation
├── scenario_runner/            # CARLA scenario runner
├── pretrained/                 # Base VLM checkpoint (InternVL2-1B)
├── data/                       # Route configs

Prerequisites

Linux (Ubuntu 20.04+ recommended)
Conda / Miniconda
CUDA 12.1 (for PyTorch 2.2.0 + flash-attn)
CARLA 0.9.15 simulator

Installation

1. Create the Conda Environment

conda env create -f environment.yaml
conda activate dmw

This installs Python 3.8 and base system packages. All Python dependencies are installed via pip inside the conda env.

2. Install Remaining pip Dependencies

pip install -r requirements.txt

Install flash-attention (optional but recommended for speed)

pip install flash-attn==2.7.0.post2 --no-build-isolation

3. Install the Custom TRL Library (GRPO)

This repo contains a stripped-down TRL fork with only GRPO training support.

cd grpo
pip install -e .
cd ..

The custom TRL requires:

accelerate >= 1.4.0
datasets >= 3.0.0
transformers >= 4.55.0

These are already covered by requirements.txt.
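
If you want to confirm these minimums in an existing environment, a small check like the following may help. It is a sketch, not part of the repo: `MINIMUMS` simply restates the three versions listed above, the helper names are hypothetical, and the version parsing is deliberately simple (it keeps only the leading numeric fields, so pre-release tags such as `rc1` are ignored).

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions listed above for the custom TRL fork
MINIMUMS = {"accelerate": "1.4.0", "datasets": "3.0.0", "transformers": "4.55.0"}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric fields, e.g. '2.2.0+cu121' -> (2, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions field by field, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Report 'ok', 'missing', or a 'too old' message for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_installed().items():
        print(f"{package}: {status}")
```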

4. Configure CARLA Paths

# Edit these paths in setup_carla.sh
export CARLA_ROOT=/home/<user>/carla0915
export WORK_DIR=/home/<user>/Downloads/DMW

# Then source it
source setup_carla.sh

This sets the following PYTHONPATH entries:

$CARLA_ROOT/PythonAPI/carla
$WORK_DIR/scenario_runner_autopilot
$WORK_DIR/leaderboard_autopilot
$WORK_DIR/grpo

Add source /path/to/setup_carla.sh to your .bashrc / .zshrc to persist across sessions.
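
To check that sourcing the script took effect, you could compare `PYTHONPATH` against the four entries listed above. The sketch below assumes only what this section states: that `CARLA_ROOT` and `WORK_DIR` are exported and that the listed sub-paths are appended to `PYTHONPATH`. The helper name `missing_pythonpath_entries` is hypothetical and not part of the repo.

```python
import os
from typing import List, Mapping

# Sub-paths listed above, relative to the variables exported by setup_carla.sh
EXPECTED = [
    ("CARLA_ROOT", "PythonAPI/carla"),
    ("WORK_DIR", "scenario_runner_autopilot"),
    ("WORK_DIR", "leaderboard_autopilot"),
    ("WORK_DIR", "grpo"),
]


def missing_pythonpath_entries(env: Mapping[str, str]) -> List[str]:
    """Return expected entries that are absent from env['PYTHONPATH']."""
    raw = env.get("PYTHONPATH", "")
    present = {os.path.normpath(p) for p in raw.split(os.pathsep) if p}
    missing: List[str] = []
    for var, sub in EXPECTED:
        root = env.get(var)
        if not root:
            message = f"${var} is not set"
            if message not in missing:  # report each unset variable once
                missing.append(message)
            continue
        wanted = os.path.normpath(os.path.join(root, sub))
        if wanted not in present:
            missing.append(wanted)
    return missing


if __name__ == "__main__":
    for item in missing_pythonpath_entries(os.environ):
        print("missing:", item)
```

Run it in a shell where `setup_carla.sh` has been sourced; no output means every listed entry is present.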

5. Download Pretrained Model

The training pipeline uses InternVL2-1B as the base vision-language model.

# Expected path: pretrained/InternVL2-1B/
huggingface-cli download OpenGVLab/InternVL2-1B --local-dir pretrained/InternVL2-1B
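
After the download finishes, a quick sanity check on the expected path can catch an interrupted or misplaced download. This is a minimal sketch: `missing_model_files` is a hypothetical helper, and treating `config.json` as the only required file is an assumption based on the usual Hugging Face model layout, not something this repo specifies.

```python
from pathlib import Path
from typing import List, Sequence


def missing_model_files(
    model_dir: str, required: Sequence[str] = ("config.json",)
) -> List[str]:
    """Return the directory itself if absent, else any required files not found."""
    root = Path(model_dir)
    if not root.is_dir():
        return [str(root)]
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    problems = missing_model_files("pretrained/InternVL2-1B")
    print("model directory looks complete" if not problems else f"missing: {problems}")
```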

6. Verify Installation

conda activate dmw
python -c "import trl; from trl import GRPOTrainer, GRPOConfig; print('TRL OK')"
python -c "import torch; print('PyTorch:', torch.__version__); print('CUDA:', torch.cuda.is_available())"
python -c "import transformers; print('Transformers:', transformers.__version__)"

Common Issues

carla module not found

Ensure setup_carla.sh is sourced and $CARLA_ROOT/PythonAPI/carla is on PYTHONPATH.

flash_attn build fails

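The CUDA toolkit used to compile flash-attn must match the CUDA version your PyTorch build was compiled against. Compare the output of nvcc --version with python -c "import torch; print(torch.version.cuda)" to confirm they agree.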
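
The comparison can also be scripted. The sketch below parses the release number from `nvcc --version` output and compares it with the string PyTorch reports in `torch.version.cuda`. The helper names are hypothetical, and `SAMPLE` is an example line in the format nvcc prints rather than output captured from this project.

```python
import re
from typing import Optional


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'major.minor' from the 'release X.Y' line of nvcc --version."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(nvcc_output: str, torch_cuda: Optional[str]) -> bool:
    """True only if nvcc's release equals PyTorch's CUDA version string."""
    release = nvcc_release(nvcc_output)
    return release is not None and release == torch_cuda


# Example line in the format printed by nvcc --version
SAMPLE = "Cuda compilation tools, release 12.1, V12.1.105"
```

In practice, obtain the first argument with `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and pass `torch.version.cuda` as the second; the latter is `None` on CPU-only PyTorch builds, which this check treats as a mismatch.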

transformers version conflict

The custom TRL requires transformers >= 4.55.0, while environment.yaml pins 4.46.3. After running conda env create, upgrade with:
```
pip install "transformers>=4.55.0"
```

DeepSpeed compilation errors

Ensure ninja is installed: pip install ninja
Set DS_BUILD_OPS=0 when installing DeepSpeed to skip pre-compiling its custom CUDA kernels.

Acknowledgements

We sincerely thank the researchers and developers of SimLingo for their amazing work.

Citation

If you find this work useful, please cite:

@misc{wang2026drivewaypreferencealignment,
      title={Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving}, 
      author={Zehao Wang and Huaide Jiang and Shuaiwu Dong and Yuping Wang and Hang Qiu and Jiachen Li},
      year={2026},
      eprint={2603.25740},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.25740}, 
}

More Works from TASL