Segment Anything, right on your iPhone.
Install · Quick Start · Demo App · Download Models
SAMKit brings Meta's Segment Anything Model to iOS as a native Swift package. Tap, draw, or describe any object to instantly segment it — all inference runs on-device with Core ML, no server required.
Features
- Point & Box — Tap a point or drag a bounding box to segment any object
- Text Prompt — Type "dog" or "red cup" to find and segment objects, powered by YOLO-World + CLIP
- Subject Lift — Long-press to lift the segmented object from the scene, then copy, save, or share as a transparent PNG
- Two Backbones — MobileSAM (fast, 23 MB) and SAM2 Tiny (accurate, 76 MB)
- Drop-in UI — Pre-built SwiftUI views for shipping a segmentation feature in minutes
- Fully On-Device — Neural Engine / GPU acceleration, FP16, zero network calls
Requirements
- iOS 15.0+
- Xcode 14.0+
- Swift 5.7+
Installation
1. Add the Swift Package
```swift
dependencies: [
    .package(url: "https://github.com/john-rocky/SamKit.git", from: "1.0.0")
]
```
| Product | What it does |
|---|---|
| SAMKit | Core segmentation engine (point / box) |
| SAMKitGrounding | Open-vocabulary text detection (YOLO-World + CLIP) |
| SAMKitUI | Ready-made SwiftUI views |
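If your app target is itself declared in a Package.swift, the products above map onto target dependencies roughly like this. A minimal sketch, assuming the package identity resolves to "SamKit" (the repository name); pull in only the products you actually use:

```swift
.target(
    name: "MyApp", // your app or feature target
    dependencies: [
        .product(name: "SAMKit", package: "SamKit"),
        .product(name: "SAMKitGrounding", package: "SamKit"), // only for text prompts
        .product(name: "SAMKitUI", package: "SamKit")         // only for the prebuilt views
    ]
)
```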
2. Download Models
Grab the .mlpackage files from Releases and drag them into your Xcode project.
MobileSAM — 23 MB (required)
| File | Size |
|---|---|
| mobile_sam_encoder.mlpackage | 13 MB |
| mobile_sam_decoder.mlpackage | 9.8 MB |
| mobile_sam_prompt_encoder_weights.json | 40 KB |
SAM2 Tiny — 76 MB (optional)
| File | Size |
|---|---|
| SAM2TinyImageEncoderFLOAT16.mlpackage | 64 MB |
| SAM2TinyPromptEncoderFLOAT16.mlpackage | 2.0 MB |
| SAM2TinyMaskDecoderFLOAT16.mlpackage | 9.8 MB |
Grounding (YOLO-World + CLIP) — 148 MB (optional)
| File | Size |
|---|---|
| clip_text_encoder.mlpackage | 121 MB |
| yoloworld_detector.mlpackage | 25 MB |
| clip_vocab.json | 1.6 MB |
| cv4_params.json | 4 KB |
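Xcode compiles each .mlpackage into an .mlmodelc resource at build time. If the session fails to load, a quick bundle check like the sketch below (hypothetical, not part of SAMKit's API) can confirm the models really made it into your app target:

```swift
import Foundation

// Hypothetical sanity check: confirm the compiled Core ML models are in the bundle.
// Resource names assume Xcode keeps the original base file names.
let requiredModels = ["mobile_sam_encoder", "mobile_sam_decoder"]
for name in requiredModels where Bundle.main.url(forResource: name, withExtension: "mlmodelc") == nil {
    print("Missing \(name).mlmodelc; check the file's target membership in Xcode")
}
```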
Quick Start
Point & Box Segmentation
```swift
import SAMKit

let session = try SamSession(
    model: .bundled(.mobileSam),
    config: .bestAvailable
)

try session.setImage(cgImage)

let result = try session.predict(
    points: [SamPoint(x: 100, y: 200, label: .positive)]
)

let mask = result.masks.first! // .cgImage, .alpha, .score
```
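A prediction can return several candidate masks. A common follow-up is to keep only the highest-scoring one for display, sketched here assuming each mask exposes the score and cgImage properties noted in the comment above:

```swift
import UIKit

// Keep the highest-scoring candidate mask and wrap it for display.
// Assumes each mask exposes `score` and a non-optional `cgImage`, as noted above.
if let best = result.masks.max(by: { $0.score < $1.score }) {
    let overlay = UIImage(cgImage: best.cgImage)
    // e.g. layer `overlay` over the source photo in a UIImageView
}
```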
SAM2 Tiny
```swift
import SAMKit

let session = try Sam2Session(
    modelName: "SAM2Tiny",
    config: .bestAvailable
)

try session.setImage(cgImage)

let result = try session.predict(
    points: [SamPoint(x: 100, y: 200, label: .positive)]
)
```
Text-Prompted Segmentation
```swift
import SAMKit
import SAMKitGrounding

let session = try TextSegmentationSession(
    groundingModel: .bundled(),
    samModel: .bundled(.mobileSam)
)

try session.setImage(cgImage)

let result = try session.segment(query: "dog, cat")
// result.masks      — segmentation masks
// result.detections — bounding boxes + labels
```
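If you need to relate each detection to its mask, for example to label an overlay, a loop like the sketch below works under two assumptions the README does not confirm: that detections and masks are index-aligned, and that a detection exposes a label property:

```swift
// Sketch only: assumes detections and masks are index-aligned and that a
// detection exposes a `label` property (neither is confirmed above).
for (detection, mask) in zip(result.detections, result.masks) {
    print("\(detection.label): mask score \(mask.score)")
}
```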
Subject Lifting
```swift
import SAMKit

// After segmentation, extract the object with transparency
let extracted = SamMask.extractObject(from: cgImage, masks: result.masks)
// Returns a CGImage with transparent background — ready for copy/save/share
```
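To keep the transparency when copying, saving, or sharing, encode the lifted subject as PNG with plain UIKit. The sketch assumes extractObject returns a non-optional CGImage, as the comment above suggests:

```swift
import UIKit

// Encode the lifted subject as PNG so the alpha channel survives copy/save/share.
let lifted = UIImage(cgImage: extracted)
if let pngData = lifted.pngData() {
    let url = FileManager.default.temporaryDirectory.appendingPathComponent("subject.png")
    try? pngData.write(to: url)
    // hand `url` (or `pngData`) to UIActivityViewController, UIPasteboard, etc.
}
```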
Architecture
```
SAMKit/
├── runtime/apple/
│   ├── SAMKit/            # Core inference engine
│   ├── SAMKitGrounding/   # YOLO-World + CLIP text detection
│   └── SAMKitUI/          # SwiftUI components
├── models/converters/     # PyTorch -> Core ML conversion scripts
├── samples/ios-sample/    # Full demo app
└── CLAUDE.md
```
Sample App
```bash
git clone https://github.com/john-rocky/SamKit.git
open samples/ios-sample/SAMKitDemo.xcodeproj
```
Download the models from Releases, add them to the project, and run on a physical device.
Model Conversion
Convert from PyTorch checkpoints yourself:
```bash
cd models/converters
pip install -r requirements.txt

# MobileSAM
python convert_to_coreml.py --model mobile_sam

# SAM2 Tiny
python convert_sam2_to_coreml.py

# YOLO-World (S/M/L/X)
python convert_yoloworld_to_coreml.py --size s
```
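Before dragging a converted package into the iOS project, you can optionally inspect it from a small Swift command-line tool on your Mac. This check is not part of the repo's scripts; the path below is illustrative:

```swift
import CoreML
import Foundation

// Optional check (not part of the repo's scripts): compile a converted .mlpackage
// and print its input names to confirm the conversion produced what you expect.
do {
    let packageURL = URL(fileURLWithPath: "mobile_sam_encoder.mlpackage")
    let compiledURL = try MLModel.compileModel(at: packageURL)
    let model = try MLModel(contentsOf: compiledURL)
    print(model.modelDescription.inputDescriptionsByName.keys.sorted())
} catch {
    print("Model check failed: \(error)")
}
```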
License
Apache 2.0 — see LICENSE for details.
Acknowledgments
- Segment Anything & SAM 2 — Meta AI
- MobileSAM — Chaoning Zhang et al.
- YOLO-World — Tencent AI Lab
- OpenAI CLIP