Voice Note Ragie Pipeline

November 30, 2025 · View on GitHub

This repository contains a small collection of synthetic context data about a non-existent individual, generated by large language models. The data is designed to be internally consistent, creating a coherent fictional persona for testing purposes.

Purpose

The objective of this repository is to test and validate a voice RAG (Retrieval-Augmented Generation) pipeline using Ragie.

Pipeline Architecture

The voice RAG pipeline consists of the following stages:

Voice Recording: Raw audio recordings containing personal context data (stored in voice-data/)
Transcription: Voice recordings are transcribed to text using speech-to-text processing
LLM Processing & Reformatting: Transcribed text passes through a large language model layer that structures the content optimally for retrieval as pieces of personal context data
Embedding & Storage: Processed text is embedded and ingested into a vector database via Ragie

Pipeline Flowchart

flowchart TD
    subgraph Input["Input"]
        A[Voice Recording<br/>MP3/WAV Audio Files]
    end

    subgraph STT["Speech-to-Text"]
        B[Transcription Service<br/>Whisper / Gemini / etc.]
    end

    subgraph LLM["LLM Processing"]
        C[OpenRouter API<br/>Claude Haiku / GPT-4o-mini]
        D[Context Standardization<br/>- Remove filler words<br/>- Structure content<br/>- Extract metadata<br/>- Categorize information]
    end

    subgraph RAG["RAG Storage"]
        E[Ragie API<br/>Document Upload]
        F[(Vector Database<br/>Embeddings + Metadata)]
    end

    subgraph Output["Output"]
        G[Ready for Retrieval<br/>Semantic Search Enabled]
    end

    A -->|Audio File| B
    B -->|Raw Transcript| C
    C --> D
    D -->|Structured JSON| E
    E -->|Embed & Index| F
    F --> G

    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#f3e5f5
    style E fill:#e8f5e9
    style F fill:#e8f5e9
    style G fill:#c8e6c9

Data Flow Summary

Stage	Input	Output	Tool/Service
1. Recording	Voice	MP3/WAV file	Any recorder
2. Transcription	Audio file	Raw text	Whisper, Gemini, etc.
3. Standardization	Raw transcript	Structured JSON	OpenRouter (LLM)
4. Embedding	Structured text	Vector embeddings	Ragie API

Repository Structure

.
├── voice-data/          # MP3 audio recordings of synthetic context data
│   ├── general-context.mp3
│   ├── 1.mp3
│   ├── 2.mp3
│   └── ...
├── texts/               # Text transcripts (for reference/validation)
│   ├── general.txt
│   ├── 1.txt
│   ├── 2.txt
│   └── ...
├── processed/           # LLM-processed structured outputs (generated)
├── pipeline.py          # Full pipeline script (STT -> LLM -> Ragie)
├── .env.example         # Example environment variables template
├── .env                 # Your API keys (create from .env.example, git-ignored)
└── README.md

Running the Pipeline

Prerequisites

Install dependencies:

pip install openai ragie python-dotenv

Copy the example environment file and add your API keys:

cp .env.example .env
# Edit .env with your actual API keys

You'll need two API keys:

OpenRouter: Get yours at https://openrouter.ai/keys
Ragie: Get yours at https://app.ragie.ai/settings/api-keys

Full Pipeline Usage

The pipeline.py script processes transcripts through the complete pipeline:

# Run the full pipeline on all transcripts in texts/
python pipeline.py

This will:

Read each .txt transcript from texts/
Send to OpenRouter LLM for context standardization
Upload structured content to Ragie with metadata
Save processed outputs to processed/

Simple Direct Upload

For simple direct upload without LLM processing:

import os
from ragie import Ragie

client = Ragie(auth="YOUR_RAGIE_API_KEY")

VOICE_DATA_DIR = "./voice-data"

for filename in os.listdir(VOICE_DATA_DIR):
    if filename.endswith(".mp3"):
        file_path = os.path.join(VOICE_DATA_DIR, filename)

        with open(file_path, "rb") as f:
            client.documents.create(
                file=f,
                metadata={"source": "voice-note", "filename": filename}
            )

        print(f"Uploaded: {filename}")

print("All voice notes uploaded to Ragie.")

Required API Keys

Service	Environment Variable	Purpose
OpenRouter	`OPENROUTER_API_KEY`	LLM for context standardization
Ragie	`RAGIE_API_KEY`	Vector storage and retrieval

Important Notes

All personal data in this repository is entirely fictional and generated by LLMs
The synthetic individual does not exist
This data is intended solely for pipeline testing and validation purposes
The contextual information is designed to be internally consistent across all recordings