Rawbbit

June 23, 2026

Rawbbit is a self-hosted in-app event tracking, ingestion, and raw-storage pipeline for product, application, and game analytics. It was created for teams that want to keep control of their event data, reduce vendor lock-in, and run analytics infrastructure without depending on heavyweight enterprise platforms. The system is designed to stay portable and maintainable for small teams operating their own stack.

It accepts batched events over HTTP, validates and enriches them in the collector, buffers them through NATS JetStream, writes partitioned Parquet files to object storage, and includes a small SQLMesh starter project for querying the raw layer through BigQuery external tables.


Current public runtime shape:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet in object storage

BigQuery query path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> BigQuery external table -> SQLMesh base model

ClickHouse serving path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> ClickHouse

ClickHouse MCP and Metabase path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> ClickHouse -> MCP / Metabase

AI agent access path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> ClickHouse -> MCP -> AI agents

Optional downstream analytics UI path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> BigQuery external table -> SQLMesh base model -> Metabase OSS

Storage note:

raw landing supports either GCS or an S3-compatible backend such as SeaweedFS
the documented BigQuery external-table path remains GCS-based

What is included
Architecture
Quickstart
Configuration
Repository layout
Project status
Documentation
License

What is included

This repository contains the public ingestion-to-raw-storage path:

backend/collector-api — HTTP ingestion service
backend/raw-writer — JetStream consumer that writes partitioned Parquet files
deploy/ — Docker Compose and environment scaffolding for local or simple self-hosted setups
sqlmesh_project/ — starter SQLMesh project for reading the raw external-table layer
clickhouse/ — optional downstream ClickHouse guide for loading/querying raw Parquet
clickhouse-mcp/ — optional ClickHouse-backed MCP server and combined MCP + Metabase deployment guide
(AI agents and MCP clients such as OpenCode or OpenClaw can connect to the ClickHouse MCP endpoint for read-only analytics exploration.)
metabase/ — optional downstream Metabase deployment guide

Architecture

The system is built around a few explicit boundaries:

the collector accepts and validates event batches
NATS JetStream separates request handling from storage writes
the raw writer lands durable Parquet files in object storage
raw Parquet is the system-of-record boundary for downstream analytics work
downstream modeling can evolve without changing the ingestion contract

For the deeper architecture note, see docs/architecture.md.
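
As an illustration of the raw-writer boundary, the sketch below groups events into time-based partitions before they would be written as Parquet. This is not Rawbbit's actual layout: the `event_time` field, the `raw/events` prefix, and the Hive-style `dt=.../hour=...` scheme are assumptions made for the example. See docs/architecture.md for the real partitioning rules.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(event: dict, prefix: str = "raw/events") -> str:
    """Return a Hive-style partition prefix for one event.

    Assumption: the event carries a timezone-aware ISO 8601 timestamp
    in a field named "event_time" (hypothetical field name).
    """
    ts = datetime.fromisoformat(event["event_time"]).astimezone(timezone.utc)
    return f"{prefix}/dt={ts:%Y-%m-%d}/hour={ts:%H}"


def group_by_partition(events: list[dict]) -> dict[str, list[dict]]:
    """Group a batch of events by the partition each one would land in."""
    groups = defaultdict(list)
    for event in events:
        groups[partition_key(event)].append(event)
    return dict(groups)
```

Each group would then become one Parquet file under its partition prefix. Normalising to UTC before partitioning keeps events from different producer time zones in consistent partitions.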

Quickstart

The shortest path to a working local setup is:

copy deploy/.env.example to deploy/.env
set API keys, object-storage bucket, and credentials
start the stack with Docker Compose
send a test batch to POST /v1/events:batch
verify that Parquet files land in object storage

For the full walkthrough, see docs/quickstart.md.
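
A minimal sketch of the "send a test batch" step, using only the Python standard library. Only the `POST /v1/events:batch` route comes from this README. The payload shape (`{"events": [...]}`), the `X-API-Key` header name, and the event field names are assumptions; check docs/quickstart.md for the actual request contract.

```python
import json
import urllib.request


def build_batch_request(
    base_url: str, api_key: str, events: list[dict]
) -> urllib.request.Request:
    """Build a POST request for the batch endpoint without sending it."""
    # Assumption: the collector expects a JSON object with an "events" list.
    body = json.dumps({"events": events}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/events:batch",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key travels in this header (hypothetical name).
            "X-API-Key": api_key,
        },
    )


def send_batch(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With the stack running, calling `send_batch(build_batch_request("http://localhost:8080", "<your-api-key>", [{"event_name": "test_event"}]))` would send one test event. The host and port are placeholders for whatever your Compose setup exposes.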

Configuration

The canonical environment-variable reference is deploy/.env.example.

Important configuration groups:

NATS and stream settings
collector API limits, API keys, CORS settings, and optional GeoIP-related attribution requirements
raw-writer batching and ACK behavior
object-storage bucket, prefix, and credentials

For the grouped configuration guide, see docs/configuration.md.
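
Before starting the stack, it can help to check that each configuration group has its values filled in. The sketch below parses a `.env`-style file and reports empty or missing keys per group. The variable names in `REQUIRED_BY_GROUP` are hypothetical placeholders, not Rawbbit's real names; replace them with the keys listed in deploy/.env.example.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical variable names, grouped as in the list above.
REQUIRED_BY_GROUP = {
    "nats": ["NATS_URL"],
    "collector": ["COLLECTOR_API_KEYS"],
    "raw-writer": ["RAW_WRITER_BATCH_SIZE"],
    "object storage": ["STORAGE_BUCKET"],
}


def missing_by_group(
    env: dict[str, str], required: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Return the keys that are absent or empty, keyed by group."""
    missing = {}
    for group, keys in required.items():
        absent = [key for key in keys if not env.get(key)]
        if absent:
            missing[group] = absent
    return missing
```

Reading deploy/.env with `open(...).read()` and passing the result through `parse_env` and `missing_by_group` gives a quick list of what still needs to be set.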

Repository layout

backend/
  collector-api/   HTTP ingestion service
  raw-writer/      Parquet landing worker
deploy/            Local and self-hosted runtime scaffolding
sqlmesh_project/   Starter downstream modeling project
clickhouse/        ClickHouse downstream query path
clickhouse-mcp/    ClickHouse MCP and optional Metabase deploy path
metabase/          Metabase OSS deployment instructions
docs/              OSS documentation

Component reference notes:

Project status

Current maturity:

ingestion path is implemented
raw Parquet landing path is implemented
raw storage backend selection is implemented for both GCS and S3-compatible targets
BigQuery external-table querying is supported
ClickHouse can be used as a downstream serving/query layer over the raw Parquet boundary or, more generally, as the main analytical database
ClickHouse MCP can expose a read-only analytical tool surface over a configured Rawbbit ClickHouse events table
AI agents and MCP clients can use that MCP surface without direct access to the ingestion runtime
Metabase can be deployed separately or together with the ClickHouse MCP package
SQLMesh is included as a starter downstream layer

The current release is intentionally narrow: it focuses on reliable ingestion, durable raw storage, and a simple first query path.

The included SQLMesh model is intentionally small. It reads from the BigQuery external table over the raw Parquet layer and serves as an optional starter path for downstream shaping rather than a large modeling system.

Inspired by

awesome-data-engineering — for the broader data engineering ecosystem
