auge

June 1, 2026

Version 1.8.0 · Swift 6.3+ · macOS 26+ · No Xcode Required · License: MIT · 100% On-Device

Apple's on-device Vision framework from the command line — OCR, image classification, barcode detection, and face detection.

No API keys. No cloud. No network. The Vision framework is already on your Mac — auge lets you use it from the terminal.

What is this

Every Mac ships with Apple's Vision framework, a powerful on-device computer vision engine for text recognition, image classification, barcode scanning, and face detection. But it is only accessible from Swift or Objective-C code. auge wraps it in a UNIX CLI so you can use it from the terminal, shell scripts, and pipelines.

  • OCR — extract text from screenshots, scans, PDFs
  • Classification — identify what's in an image (1000+ categories)
  • Barcodes — scan QR codes, EAN, Code128, and more
  • Face detection — count faces and get bounding boxes
  • Pipe-friendly — works with jq, xargs, apfel, and shell scripts
  • Zero cost — no API keys, no cloud, no subscriptions, no dependencies

Requirements & Install

  • macOS 26 (Tahoe) — auge bundles Vision capabilities (document parsing, lens smudge, image aesthetics) that require the Tahoe baseline.
  • Building from source requires Command Line Tools with Swift 6.3. No Xcode required.

Homebrew (recommended):

brew tap Arthur-Ficial/tap
brew install Arthur-Ficial/tap/auge

Build from source:

git clone https://github.com/Arthur-Ficial/auge.git
cd auge
make install

Quick Start

# OCR — extract text from images
auge --ocr screenshot.png
auge --ocr scan.pdf
auge --ocr image1.png image2.png

# Classification — what's in this image?
auge --classify photo.jpg
auge --classify photo.jpg --top 5

# Barcodes — scan QR codes and barcodes
auge --barcode product.jpg

# Face detection — count and locate faces
auge --faces group.jpg

JSON output

auge --ocr screenshot.png -o json | jq .results.lines
auge --classify photo.jpg -o json | jq '.results.classifications[:3]'
auge --faces group.jpg -o json | jq .results.count

Example output of auge --ocr screenshot.png -o json:

{
  "file" : "screenshot.png",
  "metadata" : {
    "on_device" : true,
    "schema" : "2",
    "version" : "1.8.0"
  },
  "mode" : "ocr",
  "results" : {
    "lines" : ["Hello", "World"],
    "text" : "Hello\nWorld"
  }
}

JSON keys are uniformly snake_case (schema 2), including compound keys such as feature_print, line_details, angle_radians, and persons_mask.

Piping

# OCR a screenshot, summarize with apfel
auge --ocr screenshot.png | apfel "summarize this"

# OCR all PNGs in a directory
ls *.png | auge --ocr

# Pipe file paths via stdin
find . -name "*.jpg" | auge --classify --top 3

# Chain with jq for structured extraction
auge --ocr receipt.png -o json | jq -r .results.text
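
For batch jobs where each image should get its own output file, a per-file loop is an alternative to piping a path list. The sketch below is one possible approach, not something shipped with auge: ocr_dir is a hypothetical helper name, and it only relies on the documented auge --ocr <image> invocation.

```shell
#!/bin/sh
# Sketch: OCR every PNG in a directory into a sidecar .txt file.
# ocr_dir is a hypothetical helper, not part of auge.
ocr_dir() {
    dir=${1:-.}
    for img in "$dir"/*.png; do
        # Skip the literal pattern when the directory has no PNGs.
        [ -e "$img" ] || continue
        out="${img%.png}.txt"
        # Quoting "$img" keeps paths with spaces intact.
        if auge --ocr "$img" > "$out"; then
            printf 'ok   %s\n' "$img"
        else
            printf 'fail %s\n' "$img" >&2
            rm -f "$out"
        fi
    done
}
```

Calling ocr_dir ~/Screenshots would leave one .txt file next to each PNG and report failures on stderr.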

Demos

See demo/ for real-world shell scripts powered by auge.

screenshot — capture screen and extract text instantly:

demo/screenshot                    # full screen OCR
demo/screenshot -r                 # drag to select a region
demo/screenshot -c                 # copy text to clipboard
demo/screenshot | grep "error"     # find errors on screen

clipboard-ocr — OCR an image from the clipboard:

# Press Cmd+Ctrl+Shift+4 to screenshot a region to clipboard, then:
demo/clipboard-ocr                 # print extracted text
demo/clipboard-ocr -c              # replace clipboard image with text
demo/clipboard-ocr | apfel "summarize this"

describe — describe an image in natural language (auge + apfel):

demo/describe photo.jpg            # "A cat sleeping on a blue couch..."
demo/describe screenshot.png -c    # describe and copy

translate — OCR text from image and translate (auge + apfel):

demo/translate menu.jpg            # translate to English
demo/translate -l German sign.png  # translate to German
demo/translate -l Japanese doc.pdf

receipt — extract structured data from receipt photos (auge + apfel):

demo/receipt grocery.jpg           # vendor, date, total, items
demo/receipt -j scan.png | jq .total

explain-image — full image analysis (auge + apfel):

demo/explain-image screenshot.png  # classify + OCR + faces + barcodes → explanation
demo/explain-image error.png -c    # explain an error dialog

Also in demo/:

  • qr — read QR codes and barcodes, optionally open URLs
  • sort-images — classify all images in a directory, group by category
  • diff-text — OCR two images and diff the extracted text
  • faces — count faces across photos with per-file summary
  • monitor — watch mode: periodic screen OCR, alert on text changes or pattern match
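
To give a feel for how small these demos can be, here is a minimal sketch of the diff-text idea. It is not the script in demo/; diff_text is a hypothetical function name, and the only auge feature assumed is plain-text output from auge --ocr <image>.

```shell
#!/bin/sh
# Sketch: OCR two images and diff the extracted text.
# diff_text is a hypothetical stand-in, not the repo's demo/diff-text.
diff_text() {
    if [ "$#" -ne 2 ]; then
        echo "usage: diff_text <image-a> <image-b>" >&2
        return 2
    fi
    tmpdir=$(mktemp -d) || return 1
    status=0
    # OCR each image into its own temporary file; stop if either run fails.
    if auge --ocr "$1" > "$tmpdir/a.txt" && auge --ocr "$2" > "$tmpdir/b.txt"; then
        # diff exits 0 when the texts match and 1 when they differ.
        diff -u "$tmpdir/a.txt" "$tmpdir/b.txt" || status=$?
    else
        status=2
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

The return value follows diff's convention, so diff_text before.png after.png && echo "no text changes" works in scripts.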

CLI Reference

auge --ocr <image>              Extract text from image (OCR)
auge --classify <image>         Classify image content
auge --barcode <image>          Detect barcodes and QR codes
auge --faces <image>            Detect faces
auge --release                  Show detailed release and build info

Options

Flag                                Description
-o, --output <fmt>                  Output format: plain, json, md, or ndjson
--plain / --json / --md / --ndjson  Shorthand for -o <fmt>
--compact                           Single-line compact JSON (when output is JSON)
-q, --quiet                         Suppress non-essential output
--no-color                          Disable ANSI colors
--clipboard                         Read image from the macOS clipboard (NSPasteboard)
--dpi <n>                           PDF rasterization DPI, 72-600 (default: 200)
--prefer-embedded                   Use PDF text layer when present (default)
--no-prefer-embedded                Force OCR even on searchable PDFs
--langs <a,b,c>                     BCP-47 OCR language hints (e.g. en-US,de-DE)
--enhance                           Upscale tiny images before OCR (helps small text)
--clean                             FoundationModels post-pass: dehyphenate, reflow, fix OCR errors (macOS 26+)
--top <n>                           Max classification results (default: 10)
--min-confidence <n>                Min confidence threshold 0-1 (default: 0.01)
-v, --version                       Print version
--release                           Show detailed version, build, and capability info
-h, --help                          Show help
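
Several of these flags combine naturally for scanned PDFs. The following sketch wraps one such combination; ocr_scan is a hypothetical helper, the 300 DPI and en-US defaults are choices made here for illustration (auge's own DPI default is 200), and the range check simply mirrors the documented 72-600 limit before auge is called.

```shell
#!/bin/sh
# Sketch: force OCR on a PDF with a chosen DPI and language hints.
# ocr_scan is a hypothetical wrapper; only the flags come from the table above.
ocr_scan() {
    file=$1
    dpi=${2:-300}        # illustrative default, not auge's (which is 200)
    langs=${3:-en-US}
    # Reject non-integer DPI values.
    case $dpi in
        ''|*[!0-9]*) echo "dpi must be an integer" >&2; return 2 ;;
    esac
    # --dpi is documented as accepting 72-600.
    if [ "$dpi" -lt 72 ] || [ "$dpi" -gt 600 ]; then
        echo "dpi must be between 72 and 600" >&2
        return 2
    fi
    # Ignore any embedded text layer and emit single-line JSON.
    auge --ocr --no-prefer-embedded --dpi "$dpi" --langs "$langs" \
        -o json --compact "$file"
}
```

For example, ocr_scan contract.pdf 400 en-US,de-DE would rasterize at 400 DPI with English and German hints.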

Exit Codes

Code  Meaning
0     Success (also returned when no text or results are found; that is not an error)
1     Runtime error (bad file, invalid image, analysis failure)
2     Usage error (bad flags, missing arguments)
5     Vision framework unavailable
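
In scripts, these codes can be told apart with a case statement. The sketch below shows one way to do that; describe_exit and ocr_or_report are hypothetical helper names, and it assumes that a successful run with no recognized text prints nothing in plain output mode.

```shell
#!/bin/sh
# Sketch: map auge's documented exit codes to messages.
# describe_exit and ocr_or_report are hypothetical helpers, not part of auge.
describe_exit() {
    case $1 in
        0) echo "success" ;;
        1) echo "runtime error" ;;
        2) echo "usage error" ;;
        5) echo "Vision framework unavailable" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

ocr_or_report() {
    status=0
    text=$(auge --ocr "$1") || status=$?
    if [ "$status" -ne 0 ]; then
        echo "auge failed on $1: $(describe_exit "$status")" >&2
        return "$status"
    fi
    # Exit code 0 with empty output is treated as "no text found" (assumption).
    if [ -z "$text" ]; then
        echo "no text found in $1" >&2
    else
        printf '%s\n' "$text"
    fi
}
```

Because exit code 0 also covers the empty case, the check for empty output is what separates "nothing recognized" from "something recognized".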

Environment Variables

Variable  Description
NO_COLOR  Disable colors (no-color.org)

Vision Capabilities

Mode        Framework Request              macOS   Output
--ocr       VNRecognizeTextRequest         10.15+  Text lines
--classify  VNClassifyImageRequest         12+     Labels with confidence
--barcode   VNDetectBarcodesRequest        10.13+  Payload + symbology
--faces     VNDetectFaceRectanglesRequest  10.13+  Count + bounding boxes

Supported Image Formats

PNG, JPEG, TIFF, BMP, GIF, HEIC, PDF

Architecture

CLI (--ocr/--classify/--barcode/--faces)

  ├─→ ImageSource.validatePath()     — file validation (AugeCore)
  ├─→ Analyzer.recognizeText()       — VNRecognizeTextRequest
  ├─→ Analyzer.classifyImage()       — VNClassifyImageRequest
  ├─→ Analyzer.detectBarcodes()      — VNDetectBarcodesRequest
  └─→ Analyzer.detectFaces()         — VNDetectFaceRectanglesRequest

       └─→ Vision framework (100% on-device, zero network)

Built with Swift 6.3 strict concurrency. Single Package.swift, three targets:

  • AugeCore — pure logic library (no Vision dependency, unit-testable)
  • auge — executable (CLI + Vision framework)
  • auge-tests — 115 unit tests, pure Swift runner (no XCTest)

No Xcode required. Builds and tests with Command Line Tools only.

Build & Test

# Build + install (auto-bumps patch version each time)
make install                    # build release + install to /usr/local/bin
make build                      # build release only (no install)

# Version management (zero manual editing)
make version                    # print current version
make release-minor              # bump minor: 0.0.x -> 0.1.0
make release-major              # bump major: 0.x.y -> 1.0.0

# Debug build (no version bump)
swift build                     # quick debug build

# Tests
swift run auge-tests            # 115 pure Swift unit tests (no XCTest needed)
bash Tests/integration/run.sh   # 17 integration tests (end-to-end CLI)

Each make build or make install run automatically:

  • Bumps the patch version (.version file is the single source of truth)
  • Updates the README version badge
  • Generates build metadata (commit, date, Swift version) viewable via auge --release
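
The bump rules described above (patch increments, minor resets patch, major resets both) can be sketched in a few lines of shell. This is an illustration only, and the project's Makefile may implement it differently; bump_version is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: semantic-version bump rules for a MAJOR.MINOR.PATCH string.
# bump_version is a hypothetical helper; the real Makefile logic may differ.
bump_version() {
    part=$1
    version=$2
    # Accept only digits and dots so the split below is safe.
    case $version in
        ''|*[!0-9.]*) echo "invalid version: $version" >&2; return 2 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $version          # split on dots into $1 $2 $3
    IFS=$old_ifs
    if [ "$#" -ne 3 ]; then
        echo "expected MAJOR.MINOR.PATCH" >&2
        return 2
    fi
    major=$1 minor=$2 patch=$3
    case $part in
        patch) patch=$((patch + 1)) ;;
        minor) minor=$((minor + 1)); patch=0 ;;
        major) major=$((major + 1)); minor=0; patch=0 ;;
        *) echo "unknown part: $part" >&2; return 2 ;;
    esac
    echo "$major.$minor.$patch"
}
```

With this sketch, bump_version minor 0.0.7 prints 0.1.0 and bump_version major 0.3.2 prints 1.0.0, matching the patterns in the comments above.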

Test Coverage

Suite                     Tests  Covers
AugeErrorTests            10     Error classification, CLI labels, exit codes, messages
AugeErrorDeepTests        18     Every keyword variant, case insensitivity, passthrough, cross-type
ImageSourceTests          16     Extension validation, path validation
ImageSourceDeepTests      11     Unicode paths, URL correctness, error propagation, exhaustive formats
ResultFormatterTests      15     OCR, classification, barcode, face formatting + JSON encoding
ResultFormatterDeepTests  23     Exact formats, boundary values, round-trip JSON, key names, unicode
CLIParsingTests           22     Edge cases, dotfiles, special chars, stress tests
Integration               17     CLI basics, exit codes, OCR, classify, faces, barcodes, piping, quiet
Total                     132

Part of the apfel ecosystem

Tool   What                   Apple Framework        Repo
apfel  LLM (text generation)  FoundationModels       Arthur-Ficial/apfel
ohr    Speech-to-text         SpeechAnalyzer         Arthur-Ficial/ohr
kern   Text embeddings        NLContextualEmbedding  Arthur-Ficial/kern
auge   Vision / OCR           Vision                 Arthur-Ficial/auge

Meta-repo: apfel-ecosystem

License

MIT