ACTIVE: Action from Robotic View 🤖

February 5, 2026 · View on GitHub

This repository contains the official PyTorch implementation and dataset for our paper: "Recognizing Actions from Robotic View for Natural Human-Robot Interaction", accepted at ICCV 2025.

pipeline

Update (Feb 2026): We apologize that the initial release had an incorrect data split configuration due to code organization issues. This has now been corrected. We have also included the pre-generated meta file datasets/active.list for convenience.

📜 About The Project

Natural Human-Robot Interaction (N-HRI) requires robots to recognize human actions at varying distances and states, regardless of whether the robot itself is in motion or stationary. This setup is more flexible and practical than conventional human action recognition tasks. However, existing benchmarks designed for traditional action recognition fail to address the unique complexities in N-HRI due to limited data, modalities, task categories, and diversity of subjects and environments. To address these challenges, we introduce ACTIVE (Action from Robotic View), a large-scale dataset tailored specifically for perception-centric robotic views prevalent in mobile service robots. ACTIVE comprises 30 composite action categories, 80 participants, and 46,868 annotated video instances, covering both RGB and point cloud modalities. Participants performed various human actions in diverse environments at distances ranging from 3m to 50m, while the camera platform was also mobile, simulating real-world scenarios of robot perception with varying camera heights due to uneven ground. This comprehensive and challenging benchmark aims to advance action and attribute recognition research in N-HRI. Furthermore, we propose ACTIVE-PC, a method that accurately perceives human actions at long distances using Multilevel Neighborhood Sampling, Layered Recognizers, Elastic Ellipse Query, and precise decoupling of kinematic interference from human actions. Experimental results demonstrate the effectiveness of ACTIVE-PC.

🚀 Getting Started

Follow these steps to set up the environment and run our code.

Prerequisites

Python 3.8+
PyTorch 1.12.0+
CUDA 11.3+

Installation

Clone the repository:

git clone https://github.com/your-username/ACTIVE.git
cd ACTIVE

Create a virtual environment and install dependencies:

# We recommend using conda or venv
pip install -r requirements.txt

Compile custom CUDA operators: Our model requires custom CUDA extensions for efficient point cloud processing.

PointNet++ Layers:

cd modules/
python setup.py install
cd ..

k-Nearest Neighbors (kNN):

pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl

For a detailed environment specification, please see the requirements.yml file.

📊 Dataset

You can download the dataset from either Hugging Face or Baidu Netdisk:

File Naming Convention

The video files in the dataset follow the format PxxxSxxxAxxxRxxx. Each component represents:

Component	Meaning	Description
P	Person	Subject ID (e.g., `P002` is Person 2)
S	Scene	Scene ID
A	Action	Action Label
R	Repetition	Repetition Count

Example: P002S003A001R001 means Person 2, Scene 3, Action 1, Repetition 1.

Data Preparation (Meta Files)

The training script (active-train.py) requires a meta file path (passed via --data-meta). We provide a pre-generated meta file datasets/active.list in this repository.

Meta File Format:

It is a text file where each line contains the video name and the frame count, separated by a space.

P001S003A001R001 24
P001S003A001R002 16
...

Evaluation Protocols

For reproducibility, we follow a standard Cross-Subject Evaluation protocol. The 80 participants are split as follows:

Training Set: 57 subjects
Testing Set: 23 subjects

The participant IDs used for the training group are: 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 30, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 52, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 76, 77, 78, 79

The remaining 23 subjects are used for testing.

⚡️ Usage

Training

active-train.py trains on the training split and evaluates on the test split at the end of every epoch. CUDA is required (the script sets device='cuda').

python active-train.py \
  --data-path /path/to/ACTIVE \
  --data-meta datasets/active.list \
  --output-dir ./outputs/activepc

Resume training:

python active-train.py \
  --data-path /path/to/ACTIVE \
  --data-meta datasets/active.list \
  --output-dir ./outputs/activepc \
  --resume ./outputs/activepc/checkpoint.pth

Testing (during training)

No separate command is required to “enable testing.” The script always runs evaluation on the test split each epoch and logs:

Test-only (how to implement)

active-train.py does not include a built-in “test-only” CLI mode. If you want to evaluate a checkpoint without running another training epoch, add a tiny flag and early return:

Add an argument in parse_args():

parser.add_argument('--test-only', action='store_true',
                    help='Run evaluation on the test split and exit')

Short-circuit in main(args) after loading the model & checkpoint and after building data_loader_test:

if args.test_only:
    if not args.resume:
        logging.warning('No --resume provided; evaluating random-initialized weights.')
    evaluate(model, criterion, data_loader_test, device=device)
    return

Then you can run:

python active-train.py \
  --data-path /path/to/ACTIVE \
  --data-meta /path/to/split_or_meta_file \
  --model ACTIVEPC \
  --resume /path/to/model_or_checkpoint.pth \
  --test-only

This will load the checkpoint and only run the evaluation once on the test split, then exit.

🙏 Acknowledgements

This work is built upon the foundational codebase of PSTNet.

✍️ Citation

If you use our dataset or model in your research, please consider citing our paper:

@inproceedings{wang2025recognizing,
  title={Recognizing Actions from Robotic View for Natural Human-Robot Interaction},
  author={Wang, Ziyi and Li, Peiming and Liu, Hong and Deng, Zhichao and Wang, Can and Liu, Jun and Yuan, Junsong and Liu, Mengyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={14218--14227},
  year={2025}
}