UForm for JavaScript
January 3, 2025 ยท View on GitHub
UForm multimodal AI SDK offers a simple way to integrate multimodal AI capabilities into your JavaScript applications. Built around ONNX, the SDK is supposed to work with most runtimes and almost any hardware.
Installation
There are several ways to install the UForm JavaScript SDK from NPM.
pnpm add uform
npm add uform
yarn add uform
Quick Start
Embeddings
import { getModel, Modality, TextProcessor, TextEncoder, ImageEncoder, ImageProcessor } from '@unum-cloud/uform';
const { configPath, modalityPaths, tokenizerPath } = await getModel(
modelId: 'unum-cloud/uform3-image-text-english-small',
modalities: [Modality.TextEncoder, Modality.ImageEncoder]
);
const textProcessor = new TextProcessor(configPath, tokenizerPath);
await textProcessor.init();
const processedTexts = await textProcessor.process(["a small red panda in a zoo"]);
const textEncoder = new TextEncoder(modalityPaths.text_encoder, textProcessor);
await textEncoder.init();
const textOutput = await textEncoder.encode(processedTexts);
assert(textOutput.embeddings.dims.length === 2, "Output should be 2D");
await textEncoder.dispose();
const imageProcessor = new ImageProcessor(configPath);
await imageProcessor.init();
const processedImages = await imageProcessor.process("path/to/image.png");
const imageEncoder = new ImageEncoder(modalityPaths.image_encoder, imageProcessor);
await imageEncoder.init();
const imageOutput = await imageEncoder.encode(processedImages);
assert(imageOutput.embeddings.dims.length === 2, "Output should be 2D");
The textOutput and imageOutput would contain features and embeddings properties, which are the same as the features and embeddings properties in the Python SDK.
The embeddings can later be compared using the cosine similarity or other distance metrics.
Generative Models
Coming soon ...
Technical Details
Faster Search
Depending on the application, the embeddings can be down-casted to smaller numeric representations without losing much recall. Independent of the quantization level, native JavaScript functionality may be too slow for large-scale search. In such cases, consider using USearch or SimSimD.