Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval

February 7, 2026 · View on GitHub

Authors: Jing Zhang, Zhikai Li✉, Xuewen Liu, Qingyi Gu✉

(✉ denotes corresponding author.)

Introduction

This repository contains the official implementation for the ICLR 2026 paper "Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval".

Overview

Motivation

SAM2's perception pattern exhibits computational redundancy. i) The focused attention in the mask decoder vs. the broad attention span in the image encoder reveals unnecessary background computation. ii) In the memory bank, only a small subset of tokens contributes significantly to memory attention, and the salient regions exhibit temporal consistency.

Method

For the image encoder, we introduce object-aware Sparse Window Routing (SWR), which assigns object-irrelevant background windows to a lightweight shortcut branch based on the spatial-temporal consistency and perceptual saliency of the object, thus reducing encoding redundancy. For memory attention, we propose object-aware Sparse Memory Retrieval (SMR), which builds a FIFO mask queue to retrieve the most salient memory tokens, reusing the saliency patterns from their first recollection, thereby reducing the computational cost.
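To make the routing idea concrete, here is a minimal Python sketch of the SWR decision step, assuming a per-window object-saliency score is already available. `route_windows` and its `theta` threshold are illustrative names (theta loosely mirrors the `--WB_theta` flag below), not the repository's actual API.

```python
# Hypothetical sketch of object-aware Sparse Window Routing (SWR):
# windows with low object saliency are sent to a lightweight shortcut
# branch; the rest go through the full attention branch.
def route_windows(saliency, theta=0.7):
    """Split window indices into a full-attention set and a shortcut set.

    saliency: per-window object-saliency scores in [0, 1].
    theta: routing threshold (illustrative, akin to --WB_theta).
    """
    heavy = [i for i, s in enumerate(saliency) if s >= theta]   # full branch
    light = [i for i, s in enumerate(saliency) if s < theta]    # shortcut
    return heavy, light

heavy, light = route_windows([0.9, 0.1, 0.8, 0.3], theta=0.7)
# heavy -> [0, 2], light -> [1, 3]
```

In practice the saliency would come from the previous frame's mask (exploiting the temporal consistency noted above), so background windows can be skipped without re-scoring every frame.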

Performance

Efficient-SAM2 achieves a well-balanced accuracy–speed trade-off.


Create Environment

Prerequisites

The code requires python>=3.10, as well as torch>=2.5.1 and torchvision>=0.20.1. Please follow the instructions here to install both PyTorch and TorchVision dependencies. You can install SAM 2 on a GPU machine using:

git clone https://github.com/jingjing0419/Efficient-SAM2.git
cd Efficient-SAM2
pip install -e .

To use the SAM 2 predictor and run the example notebooks, jupyter and matplotlib are required and can be installed by:

pip install -e ".[notebooks]"

Prepare Models

All the model checkpoints can be downloaded by running:

cd checkpoints && \
./download_ckpts.sh && \
cd ..

or individually from:

Usage

Train Bypass

python tools/train_bypass_all.py \
    --apply_bypass \
    --apply_WB \
    --use_wandb \
    --train_epoch=5 \
    --train_step=32 \
    --lr=1e-4 \
    --base_video_dir=<PATH-TO-TRAINING-IMAGES> \
    --input_mask_dir=<PATH-TO-TRAINING-ANNOTATION> \
    --video_list_file=./train_sel_v1.txt \
    --output_mask_dir=./outputs/SAV_train/sav_train_pred_pngs \
    --dataset='sav_train' \
    --sam2_model='base+' \
    --bypass_type='bottleneck'

Inference

The vos_inference_main.py script can be used to generate predictions for semi-supervised video object segmentation (VOS) evaluation on datasets such as DAVIS, MOSE, or SA-V.

After installing SAM 2 and its dependencies, it can be used as follows (DAVIS 2017 dataset as an example). This script saves the prediction PNG files to the --output_mask_dir.

Run Efficient-SAM2 inference:

python tools/vos_inference_main.py \
--sam2_model='base+' --Mem_stride=1 --dataset='SAV_test' \
--apply_bypass --apply_WB --dilate_mask --WB_theta=0.7 \
--bypass_ckpt_base='./bypass/ckpt/bypass_bottleneck_base.pth' \
--prune_memory --topk_mask --set_drop_ratio=0.95 \
--output_mask_dir='./outputs2/'
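The `--prune_memory --topk_mask --set_drop_ratio=0.95` flags control how aggressively SMR prunes memory tokens. The following is a minimal sketch of drop-ratio-based top-k retrieval, assuming per-token saliency scores; `retrieve_memory_tokens` is a hypothetical helper for illustration, not the script's implementation.

```python
# Hypothetical sketch of Sparse Memory Retrieval (SMR) pruning:
# keep only the top (1 - drop_ratio) fraction of memory tokens,
# ranked by saliency, and drop the rest from memory attention.
def retrieve_memory_tokens(scores, drop_ratio=0.95):
    """Return sorted indices of the memory tokens to keep.

    scores: per-token saliency scores.
    drop_ratio: fraction of tokens to discard (akin to --set_drop_ratio).
    """
    keep = max(1, int(len(scores) * (1.0 - drop_ratio)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])

# With 20 tokens and drop_ratio=0.95, only a single token survives.
kept = retrieve_memory_tokens([0.05 * i for i in range(20)], drop_ratio=0.95)
# kept -> [19]
```

The FIFO mask queue in the paper lets these saliency rankings be reused across frames, so the selection cost itself stays small.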

Evaluation

Run SA-V evaluation:

python sav_evaluator.py \
--gt_root <PATH-TO-SAV-TEST/VAL-DATASET-GROUNDTRUTH> \
--pred_root <PATH-TO-MODEL-OUTPUT>

Star this repository if you find it helpful!