Rawbbit

June 23, 2026

Rawbbit is a self-hosted in-app event tracking, ingestion, and raw-storage pipeline for product, application, and game analytics. It was created for teams that want to keep control of their event data, reduce vendor lock-in, and run analytics infrastructure without depending on heavyweight enterprise platforms. The system is designed to stay portable and maintainable for small teams operating their own stack.

It accepts batched events over HTTP, validates and enriches them in the collector, buffers them through NATS JetStream, writes partitioned Parquet files to object storage, and includes a small SQLMesh starter project for querying the raw layer through BigQuery external tables.
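As a rough illustration of the collector's validate-and-enrich step, the sketch below splits a batch into accepted and rejected events and stamps accepted ones with a server-side receive time. The batch limit, the required field names (`event_name`, `event_id`, `client_ts`) and the `received_at` enrichment field are hypothetical; the collector's real contract is defined in backend/collector-api.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Hypothetical limit and field names, for illustration only.
MAX_BATCH_SIZE = 500
REQUIRED_FIELDS = ("event_name", "event_id", "client_ts")


def validate_and_enrich(
    batch: list[dict], now: datetime | None = None
) -> tuple[list[dict], list[str]]:
    """Split a batch into enriched, accepted events and per-event error messages."""
    if len(batch) > MAX_BATCH_SIZE:
        # Reject oversized batches as a whole rather than truncating them.
        return [], [f"batch too large: {len(batch)} > {MAX_BATCH_SIZE}"]
    received_at = (now or datetime.now(timezone.utc)).isoformat()
    accepted: list[dict] = []
    errors: list[str] = []
    for i, event in enumerate(batch):
        missing = [name for name in REQUIRED_FIELDS if name not in event]
        if missing:
            errors.append(f"event {i}: missing {', '.join(missing)}")
            continue
        # Enrich with a server-side receive timestamp without mutating the input dict.
        accepted.append({**event, "received_at": received_at})
    return accepted, errors
```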


Current public runtime shape:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet in object storage

BigQuery query path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> BigQuery external table -> SQLMesh base model

ClickHouse serving path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> ClickHouse

ClickHouse MCP and Metabase path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> ClickHouse -> MCP / Metabase

AI agent access path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> ClickHouse -> MCP -> AI agents

Optional downstream analytics UI path:

Producer -> Collector API -> NATS JetStream -> Raw Writer -> Parquet -> BigQuery external table -> SQLMesh base model -> Metabase OSS

Storage note:

  • raw landing supports either GCS or an S3-compatible backend such as SeaweedFS
  • the documented BigQuery external-table path remains GCS-based
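One way to picture raw landing on either backend is a function that maps a backend choice, bucket, prefix, and event time to an object URI. The Hive-style `dt=`/`hour=` partition layout and the name `parquet_object_uri` are assumptions for illustration, not necessarily the layout backend/raw-writer produces. For an S3-compatible target such as SeaweedFS, the endpoint would be configured separately from the `s3://` URI.

```python
from datetime import datetime, timezone

# Hypothetical mapping from a backend setting to a URI scheme.
SCHEMES = {"gcs": "gs", "s3": "s3"}


def parquet_object_uri(
    backend: str, bucket: str, prefix: str, event_time: datetime, file_id: str
) -> str:
    """Build an object URI for one Parquet file under an assumed dt=/hour= layout."""
    if backend not in SCHEMES:
        raise ValueError(f"unsupported backend: {backend!r}")
    # Partition on UTC so every producer's events land in the same hourly folder.
    # (Naive datetimes are interpreted as local time by astimezone().)
    ts = event_time.astimezone(timezone.utc)
    partition = f"dt={ts:%Y-%m-%d}/hour={ts:%H}"
    # Skip an empty prefix so the key never starts with "/".
    parts = (prefix.strip("/"), partition, f"{file_id}.parquet")
    key = "/".join(p for p in parts if p)
    return f"{SCHEMES[backend]}://{bucket}/{key}"
```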


What is included

This repository contains the public ingestion-to-raw-storage path:

  • backend/collector-api — HTTP ingestion service
  • backend/raw-writer — JetStream consumer that writes partitioned Parquet files
  • deploy/ — Docker Compose and environment scaffolding for local or simple self-hosted setups
  • sqlmesh_project/ — starter SQLMesh project for reading the raw external-table layer
  • clickhouse/ — optional downstream ClickHouse guide for loading/querying raw Parquet
  • clickhouse-mcp/ — optional ClickHouse-backed MCP server and combined MCP + Metabase deployment guide; AI agents and MCP clients such as OpenCode or OpenClaw can connect to its endpoint for read-only analytics exploration
  • metabase/ — optional downstream Metabase deployment guide

Architecture

The system is built around a few explicit boundaries:

  • the collector accepts and validates event batches
  • NATS JetStream separates request handling from storage writes
  • the raw writer lands durable Parquet files in object storage
  • raw Parquet is the system-of-record boundary for downstream analytics work
  • downstream modeling can evolve without changing the ingestion contract
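The decoupling above can be sketched as an at-least-once handoff: the collector returns once a publish succeeds, and the writer acknowledges messages only after its write has succeeded, so a failed write leaves them pending for redelivery. This loosely mirrors a JetStream consumer with explicit acks; here an in-process `queue.Queue` stands in for the stream, redelivery is triggered manually instead of by an ack timeout, and `ToyStream` and `drain_once` are hypothetical names.

```python
from __future__ import annotations

import queue
from typing import Callable

# Toy, in-process stand-in for the collector -> JetStream -> raw writer boundary.
# queue.Queue is NOT durable; it only illustrates the publish/fetch/ack ordering.


class ToyStream:
    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self.pending: dict[int, dict] = {}  # delivered to the writer but not yet acked
        self._next_id = 0

    def publish(self, event: dict) -> None:
        # The collector can answer the client as soon as this returns.
        self._queue.put((self._next_id, event))
        self._next_id += 1

    def fetch(self, max_msgs: int) -> list[tuple[int, dict]]:
        msgs: list[tuple[int, dict]] = []
        while len(msgs) < max_msgs and not self._queue.empty():
            msg_id, event = self._queue.get_nowait()
            self.pending[msg_id] = event
            msgs.append((msg_id, event))
        return msgs

    def ack(self, msg_id: int) -> None:
        self.pending.pop(msg_id, None)

    def redeliver_unacked(self) -> None:
        # Stand-in for redelivery after an ack timeout: requeue everything still pending.
        for msg_id, event in sorted(self.pending.items()):
            self._queue.put((msg_id, event))
        self.pending.clear()


def drain_once(stream: ToyStream, sink: Callable[[list[dict]], None], max_msgs: int) -> int:
    """Fetch up to max_msgs, hand them to sink, and ack only if sink did not raise."""
    msgs = stream.fetch(max_msgs)
    if msgs:
        sink([event for _, event in msgs])  # if this raises, nothing below runs
        for msg_id, _ in msgs:
            stream.ack(msg_id)
    return len(msgs)
```

With this ordering, a crash between the write and the ack causes the same events to be written twice, which is the usual trade-off of at-least-once delivery.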

For the deeper architecture note, see docs/architecture.md.

Quickstart

The shortest path to a working local setup is:

  1. copy deploy/.env.example to deploy/.env
  2. set API keys, object-storage bucket, and credentials
  3. start the stack with Docker Compose
  4. send a test batch to POST /v1/events:batch
  5. verify that Parquet files land in object storage
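For step 4, a test request could look like the sketch below. It only builds the request with Python's standard library; the host and port, the `X-API-Key` header name, and the `{"events": [...]}` body shape are assumptions, so check docs/quickstart.md and deploy/.env.example for the actual values.

```python
import json
import urllib.request

# Hypothetical values: adjust to match deploy/.env and docs/quickstart.md.
COLLECTOR_URL = "http://localhost:8080"
API_KEY = "dev-key-change-me"


def build_test_batch_request(events: list) -> urllib.request.Request:
    """Build (but do not send) a POST to the collector's batch endpoint."""
    body = json.dumps({"events": events}).encode("utf-8")
    return urllib.request.Request(
        f"{COLLECTOR_URL}/v1/events:batch",
        data=body,
        method="POST",
        # The header name carrying the API key is an assumption.
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
    )
```

To send it against a running stack, pass the request to `urllib.request.urlopen(req, timeout=5)` and inspect the response status before moving on to step 5.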

For the full walkthrough, see docs/quickstart.md.

Configuration

The canonical environment-variable reference is deploy/.env.example.

Important configuration groups:

  • NATS and stream settings
  • collector API limits, API keys, CORS settings, and optional GeoIP-related attribution requirements
  • raw-writer batching and ACK behavior
  • object-storage bucket, prefix, and credentials

For the grouped configuration guide, see docs/configuration.md.

Repository layout

backend/
  collector-api/   HTTP ingestion service
  raw-writer/      Parquet landing worker
deploy/            Local and self-hosted runtime scaffolding
sqlmesh_project/   Starter downstream modeling project
clickhouse/        ClickHouse downstream query path
clickhouse-mcp/    ClickHouse MCP and optional Metabase deploy path
metabase/          Metabase OSS deployment instructions
docs/              OSS documentation

Component reference notes:

Project status

Current maturity:

  • ingestion path is implemented
  • raw Parquet landing path is implemented
  • raw storage backend selection is implemented for both GCS and S3-compatible targets
  • BigQuery external-table querying is supported
  • ClickHouse can be used as a downstream serving/query layer over the raw Parquet boundary and, more generally, as the main analytical database
  • ClickHouse MCP can expose a read-only analytical tool surface over a configured Rawbbit ClickHouse events table
  • AI agents and MCP clients can use that MCP surface without direct access to the ingestion runtime
  • Metabase can be deployed separately or together with the ClickHouse MCP package
  • SQLMesh is included as a starter downstream layer

The current release is intentionally narrow: it focuses on reliable ingestion, durable raw storage, and a simple first query path.

The included SQLMesh model is intentionally small. It reads from the BigQuery external table over the raw Parquet layer and serves as an optional starter path for downstream shaping rather than a large modeling system.

Documentation

License

This project is released under the Apache License 2.0. See LICENSE.


Inspired by