video_object_segmenting_mapper

February 4, 2026

Text-guided semantic segmentation of valid objects throughout the video (YOLOE + SAM2).

Type: mapper

Tags: gpu, hf, video

🔧 Parameter Configuration

| name | type | default | desc |
|---|---|---|---|
| `sam2_hf_model` | <class 'str'> | `'facebook/sam2.1-hiera-tiny'` | |
| `yoloe_path` | <class 'str'> | `'yoloe-11l-seg.pt'` | The path to the YOLOE model. |
| `yoloe_conf` | <class 'float'> | `0.5` | Confidence threshold for YOLOE object detection. |
| `torch_dtype` | <class 'str'> | `'bf16'` | The floating point type used for model inference. Can be one of ['fp32', 'fp16', 'bf16']. |
| `if_binarize` | <class 'bool'> | `True` | Whether the final mask requires binarization. If 'if_save_visualization' is set to True, 'if_binarize' is automatically set to True. |
| `if_save_visualization` | <class 'bool'> | `False` | Whether to save visualization results. |
| `save_visualization_dir` | <class 'str'> | `DATA_JUICER_ASSETS_CACHE` | The path for saving visualization results. |
| `args` | | `''` | |
| `kwargs` | | `''` | |
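
The interaction between these parameters can be sketched in plain Python. This is only an illustration of the rules stated in the table, not the operator's actual code: `resolve_params` is a hypothetical helper, the `[0, 1]` range check on `yoloe_conf` is an assumption, and `save_visualization_dir` is left out because its default comes from a library constant.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "sam2_hf_model": "facebook/sam2.1-hiera-tiny",
    "yoloe_path": "yoloe-11l-seg.pt",
    "yoloe_conf": 0.5,
    "torch_dtype": "bf16",
    "if_binarize": True,
    "if_save_visualization": False,
}
ALLOWED_DTYPES = ("fp32", "fp16", "bf16")


def resolve_params(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}

    # Documented constraint: only these three dtype names are accepted.
    if params["torch_dtype"] not in ALLOWED_DTYPES:
        raise ValueError(f"torch_dtype must be one of {ALLOWED_DTYPES}")

    # Assumption: a confidence threshold is a probability in [0, 1].
    if not 0.0 <= params["yoloe_conf"] <= 1.0:
        raise ValueError("yoloe_conf must lie in [0, 1]")

    # Documented rule: saving visualizations forces binarized masks.
    if params["if_save_visualization"]:
        params["if_binarize"] = True
    return params


params = resolve_params(if_binarize=False, if_save_visualization=True)
print(params["if_binarize"])
```

Here `if_binarize=False` is overridden because visualization output is requested, which mirrors the note in the `if_binarize` description.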

📊 Effect demonstration

test

VideoObjectSegmentingMapper(sam2_hf_model='facebook/sam2.1-hiera-tiny', yoloe_path='yoloe-11l-seg.pt', yoloe_conf=0.2, torch_dtype='bf16', if_binarize=True, if_save_visualization=False)

📥 input data

Sample 1: 1 video

video4.mp4:

main_character_list
['glasses', 'a woman', 'a window']

Sample 2: 1 video

video3.mp4:

main_character_list
['a laptop']
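
As a rough sketch, the two input samples above could be written as Python dictionaries and checked before being passed to the operator. The `main_character_list` field comes from this page; the `videos` key and the `validate_sample` helper are assumptions made for illustration only.

```python
# The two demo samples, written as plain dictionaries.
# Assumption: video paths live under a "videos" key.
samples = [
    {
        "videos": ["video4.mp4"],
        "main_character_list": ["glasses", "a woman", "a window"],
    },
    {
        "videos": ["video3.mp4"],
        "main_character_list": ["a laptop"],
    },
]


def validate_sample(sample, video_key="videos", prompt_key="main_character_list"):
    """Hypothetical check: return (number of videos, number of text prompts)."""
    videos = sample.get(video_key) or []
    prompts = sample.get(prompt_key) or []
    if not videos:
        raise ValueError("sample has no video")
    # Every prompt should be a non-empty string such as 'a laptop'.
    if not prompts or not all(isinstance(p, str) and p.strip() for p in prompts):
        raise ValueError("main_character_list must hold non-empty strings")
    return len(videos), len(prompts)


for sample in samples:
    print(validate_sample(sample))
```

Each entry in `main_character_list` acts as one text prompt, so the first sample asks for three objects and the second for one.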

📤 output data

Sample 1: empty

segment_data
[673, 3, 1, 360, 480]
cls_id_dict	3
object_cls_list
[3]
yoloe_conf_list
[3]

Sample 2: empty

segment_data
[1190, 1, 1, 640, 362]
cls_id_dict	1
object_cls_list
[1]
yoloe_conf_list
[1]
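
The values shown for `segment_data` look like array shapes. In both samples the second entry matches the number of prompts in `main_character_list` (3 and 1). A minimal sketch of reading those shapes follows; the axis names in `AXES` are an assumption inferred from the printed values, not a documented layout, and `describe_segment_data` is a hypothetical helper.

```python
# Assumed axis order for segment_data, inferred from the demo output only.
AXES = ("frames", "objects", "channels", "height", "width")


def describe_segment_data(shape, n_objects):
    """Hypothetical helper: label each axis and check the object count."""
    if len(shape) != len(AXES):
        raise ValueError(f"expected {len(AXES)} axes, got {len(shape)}")
    info = dict(zip(AXES, shape))
    # In the demo, the object axis equals the number of text prompts.
    if info["objects"] != n_objects:
        raise ValueError("object axis does not match the number of prompts")
    return info


sample1 = describe_segment_data([673, 3, 1, 360, 480], n_objects=3)
sample2 = describe_segment_data([1190, 1, 1, 640, 362], n_objects=1)
print(sample1["frames"], sample2["frames"])
```

Under this reading, the first video yields one mask per object for each of 673 frames, and the second yields a single mask track over 1190 frames.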
