auge

July 1, 2026 · View on GitHub

Apple's on-device Vision framework from the command line — OCR, image classification, barcode detection, and face detection.

No API keys. No cloud. No network. The Vision framework is already on your Mac — auge lets you use it from the terminal.

What is this

Every Mac ships with Apple's Vision framework — a powerful on-device computer vision engine for text recognition, image classification, barcode scanning, and face detection. But it's only accessible through Swift/ObjC code. auge wraps it in a UNIX CLI — so you can use it from the terminal, shell scripts, and pipelines.

OCR — extract text from screenshots, scans, PDFs
Classification — identify what's in an image (1000+ categories)
Barcodes — scan QR codes, EAN, Code128, and more
Face detection — count faces and get bounding boxes
Pipe-friendly — works with jq, xargs, apfel, and shell scripts
Zero cost — no API keys, no cloud, no subscriptions, no dependencies

Requirements & Install

macOS 26 (Tahoe) — auge bundles Vision capabilities (document parsing, lens smudge, image aesthetics) that require the Tahoe baseline.
Building from source requires Command Line Tools with Swift 6.3. No Xcode required.

Homebrew (recommended):

brew tap Arthur-Ficial/tap
brew install Arthur-Ficial/tap/auge

Build from source:

git clone https://github.com/Arthur-Ficial/auge.git
cd auge
make install

Quick Start

# OCR — extract text from images
auge --ocr screenshot.png
auge --ocr scan.pdf
auge --ocr image1.png image2.png

# Classification — what's in this image?
auge --classify photo.jpg
auge --classify photo.jpg --top 5

# Barcodes — scan QR codes and barcodes
auge --barcode product.jpg

# Face detection — count and locate faces
auge --faces group.jpg

JSON output

auge --ocr screenshot.png -o json | jq .results.lines
auge --classify photo.jpg -o json | jq '.results.classifications[:3]'
auge --faces group.jpg -o json | jq .results.count

{
  "file" : "screenshot.png",
  "metadata" : {
    "on_device" : true,
    "schema" : "2",
    "version" : "1.8.0"
  },
  "mode" : "ocr",
  "results" : {
    "lines" : ["Hello", "World"],
    "text" : "Hello\nWorld"
  }
}

JSON keys are uniformly snake_case (schema 2). Compound keys are snaked too — e.g. feature_print, line_details, angle_radians, persons_mask.

Piping

# OCR a screenshot, summarize with apfel
auge --ocr screenshot.png | apfel "summarize this"

# OCR all PNGs in a directory
ls *.png | auge --ocr

# Pipe file paths via stdin
find . -name "*.jpg" | auge --classify --top 3

# Chain with jq for structured extraction
auge --ocr receipt.png -o json | jq -r .results.text

Demos

See demo/ for real-world shell scripts powered by auge.

screenshot — capture screen and extract text instantly:

demo/screenshot                    # full screen OCR
demo/screenshot -r                 # drag to select a region
demo/screenshot -c                 # copy text to clipboard
demo/screenshot | grep "error"     # find errors on screen

clipboard-ocr — OCR an image from the clipboard:

# Press Cmd+Ctrl+Shift+4 to screenshot a region to clipboard, then:
demo/clipboard-ocr                 # print extracted text
demo/clipboard-ocr -c              # replace clipboard image with text
demo/clipboard-ocr | apfel "summarize this"

describe — describe an image in natural language (auge + apfel):

demo/describe photo.jpg            # "A cat sleeping on a blue couch..."
demo/describe screenshot.png -c    # describe and copy

translate — OCR text from image and translate (auge + apfel):

demo/translate menu.jpg            # translate to English
demo/translate -l German sign.png  # translate to German
demo/translate -l Japanese doc.pdf

receipt — extract structured data from receipt photos (auge + apfel):

demo/receipt grocery.jpg           # vendor, date, total, items
demo/receipt -j scan.png | jq .total

explain-image — full image analysis (auge + apfel):

demo/explain-image screenshot.png  # classify + OCR + faces + barcodes → explanation
demo/explain-image error.png -c    # explain an error dialog

Also in demo/:

qr — read QR codes and barcodes, optionally open URLs
sort-images — classify all images in a directory, group by category
diff-text — OCR two images and diff the extracted text
faces — count faces across photos with per-file summary
monitor — watch mode: periodic screen OCR, alert on text changes or pattern match

CLI Reference

auge --ocr <image>              Extract text from image (OCR)
auge --classify <image>         Classify image content
auge --barcode <image>          Detect barcodes and QR codes
auge --faces <image>            Detect faces
auge --release                  Show detailed release and build info

Options

Flag	Description
`-o, --output <fmt>`	Output format: `plain`, `json`, `md`, or `ndjson`
`--plain` / `--json` / `--md` / `--ndjson`	Shorthand for `-o <fmt>`
`--compact`	Single-line compact JSON (when output is JSON)
`-q, --quiet`	Suppress non-essential output
`--no-color`	Disable ANSI colors
`--clipboard`	Read image from the macOS clipboard (NSPasteboard)
`--dpi <n>`	PDF rasterization DPI 72-600 (default: 200)
`--prefer-embedded`	Use PDF text layer when present (default)
`--no-prefer-embedded`	Force OCR even on searchable PDFs
`--langs <a,b,c>`	BCP-47 OCR language hints (e.g. `en-US,de-DE`)
`--enhance`	Upscale tiny images before OCR (helps small text)
`--clean`	FoundationModels post-pass: dehyphenate, reflow, fix OCR errors (macOS 26+)
`--top <n>`	Max classification results (default: 10)
`--min-confidence <n>`	Min confidence threshold 0-1 (default: 0.01)
`-v, --version`	Print version
`--release`	Show detailed version, build, and capability info
`-h, --help`	Show help

Exit Codes

Code	Meaning
0	Success (also: no text/results found — not an error)
1	Runtime error (bad file, invalid image, analysis failure)
2	Usage error (bad flags, missing arguments)
5	Vision framework unavailable

Environment Variables

Variable	Description
`NO_COLOR`	Disable colors (no-color.org)

Vision Capabilities

Mode	Framework Request	macOS	Output
`--ocr`	`VNRecognizeTextRequest`	10.15+	Text lines
`--classify`	`VNClassifyImageRequest`	12+	Labels with confidence
`--barcode`	`VNDetectBarcodesRequest`	10.13+	Payload + symbology
`--faces`	`VNDetectFaceRectanglesRequest`	10.13+	Count + bounding boxes

Supported Image Formats

PNG, JPEG, TIFF, BMP, GIF, HEIC, PDF

Architecture

CLI (--ocr/--classify/--barcode/--faces)
  │
  ├─→ ImageSource.validatePath()     — file validation (AugeCore)
  ├─→ Analyzer.recognizeText()       — VNRecognizeTextRequest
  ├─→ Analyzer.classifyImage()       — VNClassifyImageRequest
  ├─→ Analyzer.detectBarcodes()      — VNDetectBarcodesRequest
  └─→ Analyzer.detectFaces()         — VNDetectFaceRectanglesRequest
       │
       └─→ Vision framework (100% on-device, zero network)

Built with Swift 6.3 strict concurrency. Single Package.swift, three targets:

AugeCore — pure logic library (no Vision dependency, unit-testable)
auge — executable (CLI + Vision framework)
auge-tests — 115 unit tests, pure Swift runner (no XCTest)

No Xcode required. Builds and tests with Command Line Tools only.

Build & Test

# Build + install (auto-bumps patch version each time)
make install                    # build release + install to /usr/local/bin
make build                      # build release only (no install)

# Version management (zero manual editing)
make version                    # print current version
make release-minor              # bump minor: 0.0.x -> 0.1.0
make release-major              # bump major: 0.x.y -> 1.0.0

# Debug build (no version bump)
swift build                     # quick debug build

# Tests
swift run auge-tests            # 115 pure Swift unit tests (no XCTest needed)
bash Tests/integration/run.sh   # 17 integration tests (end-to-end CLI)

Every make build/make install automatically:

Bumps the patch version (.version file is the single source of truth)
Updates the README version badge
Generates build metadata (commit, date, Swift version) viewable via auge --release

Test Coverage

Suite	Tests	Covers
AugeErrorTests	10	Error classification, CLI labels, exit codes, messages
AugeErrorDeepTests	18	Every keyword variant, case insensitivity, passthrough, cross-type
ImageSourceTests	16	Extension validation, path validation
ImageSourceDeepTests	11	Unicode paths, URL correctness, error propagation, exhaustive formats
ResultFormatterTests	15	OCR, classification, barcode, face formatting + JSON encoding
ResultFormatterDeepTests	23	Exact formats, boundary values, round-trip JSON, key names, unicode
CLIParsingTests	22	Edge cases, dotfiles, special chars, stress tests
Integration	17	CLI basics, exit codes, OCR, classify, faces, barcodes, piping, quiet
Total	132

Part of the apfel ecosystem

Tool	What	Apple Framework	Repo
apfel	LLM (text generation)	FoundationModels	Arthur-Ficial/apfel
ohr	Speech-to-text	SpeechAnalyzer	Arthur-Ficial/ohr
kern	Text embeddings	NLContextualEmbedding	Arthur-Ficial/kern
auge	Vision / OCR	Vision	Arthur-Ficial/auge

Meta-repo: apfel-ecosystem

License

MIT