Data Flow

January 12, 2026

How documents flow through IntellyWeave from upload to query response.

Overview

IntellyWeave processes data through three main flows:

  1. Document Ingestion - Upload to storage
  2. Query Processing - Question to answer
  3. Visualization - Data to visual representation

1. Document Ingestion Flow

flowchart LR
    Upload["Upload<br/>(PDF/TXT/DOCX)"] --> Parse["Parse<br/>(pypdf, python-docx)"]
    Parse --> Extract["Extract<br/>(GLiNER)"]
    Extract --> Chunk["Chunk<br/>(semantic)"]
    Chunk --> Vectorize["Vectorize<br/>(OpenAI)"]
    Vectorize --> Store["Store<br/>(Weaviate)"]

Stage Details

| Stage | Input | Output | Component |
|---|---|---|---|
| Upload | File (PDF/TXT/DOCX) | Raw bytes | api/routes/documents.py |
| Parse | Raw bytes | Plain text | pypdf, python-docx |
| Extract | Plain text | Entities + text | api/utils/ner.py (GLiNER) |
| Chunk | Text + entities | Semantic chunks | preprocessing/collection.py |
| Vectorize | Chunks | Embeddings | OpenAI text-embedding-3-small |
| Store | Embeddings + metadata | Weaviate objects | Weaviate client |
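
The stages in the table above can be sketched as plain Python functions that hand data to one another. This is a minimal, self-contained illustration, not IntellyWeave's code: `parse`, `chunk`, `vectorize`, `ingest` and the `Chunk` dataclass are hypothetical names, parsing handles only `.txt`, paragraph splitting stands in for semantic chunking, a toy two-number vector stands in for OpenAI embeddings, and the NER step is injected as a callable so a stub can replace GLiNER. The sketch also assumes each chunk keeps only the document-level entities that occur in its own text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Entities = Dict[str, List[str]]


@dataclass
class Chunk:
    text: str
    chunk_index: int
    entities: Entities = field(default_factory=dict)
    vector: List[float] = field(default_factory=list)


def parse(raw: bytes, filename: str) -> str:
    # Stand-in for pypdf / python-docx: this sketch only decodes .txt uploads.
    if not filename.lower().endswith(".txt"):
        raise NotImplementedError("PDF/DOCX parsing is not covered by this sketch")
    return raw.decode("utf-8")


def chunk(text: str, entities: Entities) -> List[Chunk]:
    # Stand-in for semantic chunking: one chunk per blank-line-separated paragraph.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    for i, para in enumerate(paragraphs):
        # Keep only the document-level entities whose text occurs in this chunk.
        local = {label: [e for e in found if e in para] for label, found in entities.items()}
        chunks.append(Chunk(text=para, chunk_index=i, entities=local))
    return chunks


def vectorize(chunks: List[Chunk]) -> List[Chunk]:
    # Stand-in for text-embedding-3-small: a toy 2-d vector (characters, words).
    for c in chunks:
        c.vector = [float(len(c.text)), float(len(c.text.split()))]
    return chunks


def ingest(raw: bytes, filename: str, ner: Callable[[str], Entities]) -> List[Chunk]:
    # Parse -> Extract -> Chunk -> Vectorize. The Store step (writing each
    # Chunk to Weaviate) is omitted.
    text = parse(raw, filename)
    entities = ner(text)  # Extract: GLiNER in the real pipeline
    return vectorize(chunk(text, entities))


# Usage with a hypothetical NER stub in place of GLiNER.
def stub_ner(text: str) -> Entities:
    return {"persons": ["Klaus Barbie"], "locations": ["Buenos Aires"], "dates": ["1951"]}


chunks = ingest(b"Klaus Barbie fled to Buenos Aires.\n\nHe arrived in 1951.", "notes.txt", stub_ner)
```

Keeping each stage a separate function with a narrow input and output mirrors the table: any stand-in can be swapped for the real component without touching the others.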

Entity Extraction Detail

Input Text: "Klaus Barbie fled to Buenos Aires in 1951..."

GLiNER Output:
├── persons: ["Klaus Barbie"]
├── locations: ["Buenos Aires"]
├── dates: ["1951"]
└── events: []

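Each label's matches are stored as a string array on the chunk object (see ELYSIA_CHUNKED_{collection_name} under Data Storage Schema).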
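
GLiNER's `predict_entities(text, labels, threshold=...)`, called on a model loaded with `GLiNER.from_pretrained(...)`, returns a flat list of dicts carrying (among other keys) `"text"` and `"label"`. The sketch below shows one way such a list could be folded into the per-label arrays above. The singular label strings and the `LABEL_TO_PROP` mapping are assumptions, not taken from `api/utils/ner.py`.

```python
from typing import Dict, List

# Hypothetical mapping from GLiNER label strings to chunk array properties.
LABEL_TO_PROP: Dict[str, str] = {
    "person": "persons",
    "organization": "organizations",
    "location": "locations",
    "date": "dates",
    "event": "events",
    "law": "laws",
    "cryptonym": "cryptonyms",
}


def group_entities(predictions: List[dict]) -> Dict[str, List[str]]:
    """Fold a flat prediction list into de-duplicated per-property arrays."""
    grouped: Dict[str, List[str]] = {prop: [] for prop in LABEL_TO_PROP.values()}
    for pred in predictions:
        prop = LABEL_TO_PROP.get(pred["label"])
        # Skip labels with no matching property, and repeated mentions.
        if prop is not None and pred["text"] not in grouped[prop]:
            grouped[prop].append(pred["text"])
    return grouped


# Predictions shaped like predict_entities output (other keys omitted).
predictions = [
    {"text": "Klaus Barbie", "label": "person"},
    {"text": "Buenos Aires", "label": "location"},
    {"text": "1951", "label": "date"},
]
arrays = group_entities(predictions)
```

For the example sentence, `arrays["events"]` stays empty, matching the GLiNER output tree above.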

2. Query Processing Flow

flowchart TB
    Query["Query<br/>(user)"] --> Router["Domain<br/>Router"]
    Router --> Selection["Agent<br/>Selection"]
    Selection --> Simple["Simple Query"]
    Selection --> Courthouse["Courthouse Debate"]
    Selection --> Intel["Intelligence Orchestrator"]
    Simple --> Response["Response<br/>(stream)"]
    Courthouse --> Response
    Intel --> Response

Agent Selection

| Query Type | Agent | Trigger |
|---|---|---|
| Simple factual | Default | Most queries |
| Interpretive | Courthouse Debate | "Debate...", ambiguous evidence |
| Comprehensive | Intelligence Orchestrator | "Full analysis...", entity discovery |
| Domain-specific | Custom Agent | Matches agent's domain |
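
One way to express the selection table as code is a small rule-based router. This is a hypothetical sketch: the `CustomAgent` type, its keyword list, the rule order (custom agents checked first) and the returned agent ids are all assumptions. It only covers the explicit phrase triggers; recognizing "ambiguous evidence" or a need for "entity discovery" would take more than keyword matching, for example an LLM classifier.

```python
import re
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class CustomAgent:
    name: str
    # Hypothetical: lowercase keywords that declare the agent's domain.
    domain_keywords: Tuple[str, ...]


def select_agent(query: str, custom_agents: Sequence[CustomAgent] = ()) -> str:
    q = query.lower().strip()
    words = set(re.findall(r"[a-z0-9']+", q))
    # 1. Domain-specific: a custom agent whose keywords appear as whole words.
    for agent in custom_agents:
        if words & set(agent.domain_keywords):
            return agent.name
    # 2. Interpretive: explicit "Debate..." phrasing.
    if q.startswith("debate"):
        return "courthouse_debate"
    # 3. Comprehensive: explicit "Full analysis..." phrasing.
    if q.startswith("full analysis"):
        return "intelligence_orchestrator"
    # 4. Simple factual: everything else.
    return "default"
```

Matching on whole words rather than substrings avoids false hits such as a "port" keyword firing on "supports".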

Courthouse Debate Flow

flowchart LR
    Q[Query] --> D[Defense]
    D --> P[Prosecution]
    P --> J[Judge]
    J -->|Repeat until consensus| D
    J --> R[Response]
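
The debate loop can be sketched with each role as a plain callable that sees the query and the transcript so far. The `Verdict` type, the callable signatures and the `max_rounds` cap are assumptions; the diagram only says the loop repeats until the judge reaches consensus, and a cap is one way to guarantee it terminates.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    consensus: bool
    ruling: str


# Each role sees the query and the transcript so far (hypothetical signatures).
Advocate = Callable[[str, List[str]], str]
JudgeFn = Callable[[str, List[str]], Verdict]


def courthouse_debate(
    query: str,
    defense: Advocate,
    prosecution: Advocate,
    judge: JudgeFn,
    max_rounds: int = 3,  # hypothetical cap; the diagram does not specify one
) -> Tuple[str, List[str]]:
    transcript: List[str] = []
    verdict = Verdict(consensus=False, ruling="")
    for _ in range(max_rounds):
        transcript.append(defense(query, transcript))
        transcript.append(prosecution(query, transcript))
        verdict = judge(query, transcript)
        if verdict.consensus:
            break  # Judge --> Response
    # Without consensus, the judge's last ruling is returned once the cap is hit.
    return verdict.ruling, transcript
```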

Intelligence Orchestrator Flow

flowchart LR
    Q[Query] --> E[Entity Extractor]
    E --> M[Relationship Mapper]
    M --> G[Geospatial Analyst]
    G --> N[Network Analyst]
    N --> P[Pattern Detector]
    P --> S[Synthesizer]
    S --> R[Response]
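
The orchestrator diagram is a linear chain, which suggests a shared state that each specialist reads and extends. A minimal sketch under that assumption follows; the state keys and the placeholder lambdas are hypothetical and stand in for the real agents.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Dict[str, Any]], Any]


def run_orchestrator(query: str, stages: List[Tuple[str, Stage]]) -> Dict[str, Any]:
    state: Dict[str, Any] = {"query": query}
    for name, stage in stages:
        # Each specialist reads earlier results from `state` and adds its own.
        state[name] = stage(state)
    return state


# Placeholder specialists in diagram order (hypothetical keys and outputs):
# Entity Extractor, Relationship Mapper, Geospatial Analyst,
# Network Analyst, Pattern Detector, Synthesizer.
PIPELINE: List[Tuple[str, Stage]] = [
    ("entities", lambda s: ["Klaus Barbie", "Buenos Aires"]),
    ("relationships", lambda s: [("Klaus Barbie", "fled_to", "Buenos Aires")]),
    ("places", lambda s: [e for e in s["entities"] if e == "Buenos Aires"]),
    ("network", lambda s: {"nodes": len(s["entities"]), "edges": len(s["relationships"])}),
    ("patterns", lambda s: []),
    ("answer", lambda s: f"{s['network']['nodes']} entities, {s['network']['edges']} link(s)"),
]

state = run_orchestrator("Full analysis of Klaus Barbie", PIPELINE)
```

Passing one accumulated dict lets later stages such as the Network Analyst and Synthesizer use every earlier result, at the cost of coupling stages to shared key names.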

3. Visualization Flow

flowchart TB
    QR["Query Response"] --> Process["Process<br/>(agent)"]
    Process --> Format["Format<br/>(type)"]
    Format --> Mapbox["Mapbox<br/>(geo data)"]
    Format --> VisNet["vis-network<br/>(entities)"]
    Format --> Charts["Charts<br/>(aggregate)"]

Visualization Mapping

| Data Type | Visualization | Component |
|---|---|---|
| Locations with coordinates | Mapbox 3D Map | MapboxMap |
| Entity relationships | Network Graph | NetworkGraph |
| Counts/distributions | Bar/Line Chart | BarChart |
| Structured data | Table | TableView |
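
The mapping table above can be read as a dispatch on the shape of the agent's output. The sketch assumes records are dicts and that the field names `lat`/`lon`, `source`/`target` and `label`/`count` signal each data type; those names are hypothetical, and only the returned component names come from the table.

```python
from typing import Any, Dict, List


def pick_component(records: List[Dict[str, Any]]) -> str:
    """Choose a frontend component from the shape of the agent's records."""
    if not records:
        return "TableView"
    # Keys present in every record (hypothetical field names below).
    keys = set.intersection(*(set(r) for r in records))
    if {"lat", "lon"} <= keys:
        return "MapboxMap"      # locations with coordinates
    if {"source", "target"} <= keys:
        return "NetworkGraph"   # entity relationships
    if {"label", "count"} <= keys:
        return "BarChart"       # counts / distributions
    return "TableView"          # generic structured data
```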

Data Storage Schema

Weaviate Collections

ELYSIA_UPLOADED_DOCUMENTS
├── filename: string
├── content: text
├── upload_date: date
└── user_id: string

ELYSIA_CHUNKED_{collection_name}
├── text: string (chunk content)
├── document_id: reference
├── chunk_index: int
├── persons: string[] (GLiNER)
├── organizations: string[] (GLiNER)
├── locations: string[] (GLiNER)
├── dates: string[] (GLiNER)
├── events: string[] (GLiNER)
├── laws: string[] (GLiNER)
└── cryptonyms: string[] (GLiNER)
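
To make the chunk schema concrete, here is a hypothetical dataclass mirroring the ELYSIA_CHUNKED_{collection_name} fields. The class, the helper `chunk_collection_name`, and modelling `document_id` as a plain UUID string are assumptions. Because `document_id` is a reference, `to_properties` leaves it out; Weaviate clients generally take references separately from scalar properties (in the v4 Python client, for example, through the `references` argument of `collection.data.insert`).

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


def chunk_collection_name(collection_name: str) -> str:
    return f"ELYSIA_CHUNKED_{collection_name}"


@dataclass
class ChunkedObject:
    text: str
    document_id: str  # hypothetical: UUID of the parent ELYSIA_UPLOADED_DOCUMENTS object
    chunk_index: int
    # GLiNER label arrays, one per entity type.
    persons: List[str] = field(default_factory=list)
    organizations: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    dates: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)
    laws: List[str] = field(default_factory=list)
    cryptonyms: List[str] = field(default_factory=list)

    def to_properties(self) -> Dict[str, Any]:
        # document_id is a cross-reference, so it is kept out of the plain properties.
        props = asdict(self)
        props.pop("document_id")
        return props
```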

Performance Considerations

| Stage | Bottleneck | Mitigation |
|---|---|---|
| Upload | Large files | Chunked upload |
| GLiNER | Model loading | Lazy initialization |
| Vectorize | API calls | Batch processing |
| Query | LLM latency | Streaming responses |
| Visualization | Large datasets | Pagination, aggregation |

See Also