Deployment Strategy for News Synthesizer

October 28, 2025

Deployment Philosophy

The application requires careful deployment because it depends on a locally hosted LLM, needs GPU resources for inference, and processes feeds in real time. The strategy focuses on isolated environments, reliable model loading, and scalable RSS processing.

Infrastructure Requirements

Hardware Requirements

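Minimum: CPU with AVX2, 8GB RAM, 10GB storage. These figures do not cover the 27B model listed under Software Environment: its IQ4_XS file is roughly 15GB on disk and needs a comparable amount of RAM or VRAM to load.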
Recommended: NVIDIA GPU (RTX 30-series+), 16GB+ RAM, SSD storage
Local Models: GPU acceleration for llama.cpp inference

Software Environment

Backend: Python 3.8+, FastAPI, llama.cpp compiled with GPU support
Frontend: Node.js 18+, Next.js build tools
Model: Pre-downloaded GGUF file (mlabonne_gemma-3-27b-it-abliterated-IQ4_XS.gguf)
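Because the backend depends on a pre-downloaded GGUF file, a startup check can fail fast when the file is missing or truncated instead of failing later inside llama.cpp. The following is a minimal sketch under stated assumptions, not the project's actual code: the function name check_model_file and the min_bytes threshold are hypothetical. The one fact it relies on is that GGUF files begin with the four ASCII bytes "GGUF".

```python
import os
from typing import List

GGUF_MAGIC = b"GGUF"  # GGUF files start with these four bytes


def check_model_file(path: str, min_bytes: int = 4) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if not os.path.isfile(path):
        problems.append(f"model file not found: {path}")
        return problems

    size = os.path.getsize(path)
    if size < min_bytes:
        # min_bytes is a hypothetical threshold for catching truncated downloads
        problems.append(f"model file is {size} bytes, expected at least {min_bytes}")

    with open(path, "rb") as handle:
        if handle.read(4) != GGUF_MAGIC:
            problems.append("file does not start with the GGUF magic bytes")
    return problems
```

A backend could call this once at startup and refuse to serve requests if the returned list is non-empty.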

Deployment Options

1. Local Development

Setup:

cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Make the compiled llama.cpp build available to the backend (source path is a placeholder)
cp -r path/to/compiled/llamacpp ./llama.cpp

Frontend (from the frontend directory): npm install && npm run dev
Backend (from the backend directory, with the virtual environment active): uvicorn main:app --reload

2. Docker Containerization

Backend Dockerfile:

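FROM python:3.11-slim
WORKDIR /app
# Install system dependencies for llama.cpp
RUN apt-get update && apt-get install -y --no-install-recommends cmake build-essential \
    && rm -rf /var/lib/apt/lists/*
# Copy llama.cpp and compile. This is a CPU-only build; a GPU build needs a CUDA
# base image and a GPU flag (for example -DGGML_CUDA=ON in recent llama.cpp versions).
COPY llama.cpp /llama.cpp
RUN cd /llama.cpp && mkdir build && cd build && cmake .. && make
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy model file
COPY mlabonne_gemma-3-27b-it-abliterated-IQ4_XS.gguf /models/
# Copy application code
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]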

Frontend Dockerfile:

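FROM node:18-alpine
WORKDIR /app
# Copy the manifests first so the dependency layer is cached between builds
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]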

Docker Compose:

version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    volumes:
      - ./backend:/app
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
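    environment:
      # NEXT_PUBLIC_* values are read by the browser, so the URL must be reachable
      # from the user's machine; the hostname "backend" resolves only inside the
      # Docker network. Next.js also inlines these values into the client bundle at
      # build time, so the value may need to be present when npm run build runs.
      - NEXT_PUBLIC_API_BASE_URL=http://localhost:8000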

3. Production Deployment

Cloud Options: GPU instances on AWS EC2 or Google Cloud Compute Engine
Cost Considerations: GPU instances cost $0.50+/hour; prefer spot instances where interruptions are tolerable
Scaling: Scale horizontally to serve multiple users, and optimize model loading so new instances become ready quickly
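To make the hourly figure concrete, the arithmetic can be sketched as follows. The $0.50/hour rate is the floor quoted above; the function name monthly_gpu_cost and the spot_discount parameter are hypothetical, and real spot discounts vary by provider, region, and time.

```python
def monthly_gpu_cost(hourly_rate: float, hours_per_day: float = 24.0,
                     days: int = 30, spot_discount: float = 0.0) -> float:
    """Estimate the cost in dollars of running one instance for `days` days.

    spot_discount is a fraction in [0, 1): 0.0 means on-demand pricing.
    """
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return round(hourly_rate * hours_per_day * days * (1.0 - spot_discount), 2)


print(monthly_gpu_cost(0.50))                   # 360.0 (always on)
print(monthly_gpu_cost(0.50, hours_per_day=8))  # 120.0 (8 hours per day)
```

At the quoted floor, an always-on instance comes to about $360 per 30-day month, which is why limiting uptime or using spot capacity matters for this workload.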

CI/CD Pipeline

GitHub Actions Configuration

name: CI/CD
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r backend/requirements.txt
      - name: Run tests
        run: cd backend && pytest
  deploy:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
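      - uses: actions/checkout@v4
      - name: Deploy to server
        # Placeholder: replace with the real deployment steps (Docker build and push, etc.)
        run: echo "Deployment steps not yet implemented"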

Automated Pipelines

Build Step: Compile llama.cpp if needed
Test Step: Run unit/integration tests
Security Scan: Dependency and code analysis
Deploy Step: Docker image push to registry
Rollback: Automated rollback on deployment failures
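One way to drive the rollback step is a post-deploy health gate: the pipeline polls the new deployment and rolls back if it never becomes healthy. The sketch below illustrates that idea and is not the project's pipeline. The names wait_until_healthy and http_probe are hypothetical, and so is the /health endpoint in the usage note, which the backend would need to expose.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 5,
                       delay_s: float = 2.0) -> bool:
    """Call `probe` up to `attempts` times; return True on the first success."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            # Connection errors and HTTP error statuses count as "not healthy yet"
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


def http_probe(url: str, timeout_s: float = 3.0) -> Callable[[], bool]:
    """Build a probe that succeeds when `url` answers with HTTP 200."""
    def probe() -> bool:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    return probe
```

A deploy job could run wait_until_healthy(http_probe("http://localhost:8000/health")) and exit with a non-zero status when it returns False, letting the pipeline trigger the rollback. Loading a large model can take a while, so the attempt count and delay would need tuning.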

Environment Configuration

Environment Variables

Backend:
- LLAMA_CPP_MODEL_PATH: Path to .gguf file
- DATABASE_URL: SQLite file location
- RSS_FETCH_CONCURRENCY: Number of concurrent feed fetches
Frontend:
- NEXT_PUBLIC_API_BASE_URL: Backend API URL
- NEXT_PUBLIC_TTS_VOICE: Default voice for TTS
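The backend variables above could be read and validated once at startup, and RSS_FETCH_CONCURRENCY could then cap the number of in-flight feed requests. This is a sketch under assumptions rather than the project's code: BackendSettings, load_settings, and fetch_all are hypothetical names, the default values are illustrative, and fetch_one stands in for whatever coroutine actually downloads a feed.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Mapping, Optional


@dataclass(frozen=True)
class BackendSettings:
    model_path: str
    database_url: str
    rss_fetch_concurrency: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> BackendSettings:
    """Read the backend variables, failing early on missing or invalid values."""
    env = os.environ if env is None else env
    model_path = env.get("LLAMA_CPP_MODEL_PATH")
    if not model_path:
        raise RuntimeError("LLAMA_CPP_MODEL_PATH must be set")
    raw = env.get("RSS_FETCH_CONCURRENCY", "4")  # default of 4 is an assumption
    try:
        concurrency = int(raw)
    except ValueError:
        raise RuntimeError(f"RSS_FETCH_CONCURRENCY is not an integer: {raw!r}") from None
    if concurrency < 1:
        raise RuntimeError("RSS_FETCH_CONCURRENCY must be at least 1")
    # The SQLite default below is illustrative only
    database_url = env.get("DATABASE_URL", "sqlite:///./news.db")
    return BackendSettings(model_path, database_url, concurrency)


async def fetch_all(urls: List[str],
                    fetch_one: Callable[[str], Awaitable[object]],
                    concurrency: int) -> List[object]:
    """Fetch every URL with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url: str) -> object:
        async with semaphore:
            return await fetch_one(url)

    # return_exceptions=True keeps one failing feed from cancelling the rest;
    # failures come back as exception objects in the result list.
    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)
```

Results come back in the same order as the input URLs, so a caller can pair each feed with its result or exception and report failed fetches.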

Monitoring and Operations

Model Performance: Monitor inference latency and memory usage
RSS Feed Health: Alert on failed fetches or parsing errors
User Experience: Track chat interaction quality and TTS generation
Logging: Structured logs for debugging RAG and persona composition
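Latency monitoring and structured logging can share one mechanism: a timing helper that emits one JSON log line per operation. The sketch below uses only the standard library. JsonFormatter, timed, the logger name, and the field names are all hypothetical choices, and the sleep call stands in for the real llama.cpp inference call.

```python
import json
import logging
import time
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra structured fields are passed via extra={"fields": {...}}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


@contextmanager
def timed(logger: logging.Logger, event: str, **fields):
    """Log `event` with its duration and an ok/error status when the block exits."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        elapsed_ms = round((time.perf_counter() - start) * 1000.0, 1)
        logger.info(event, extra={"fields": {**fields, "status": status,
                                             "elapsed_ms": elapsed_ms}})


logger = logging.getLogger("news_synthesizer.inference")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

with timed(logger, "llm_inference", persona="default"):
    time.sleep(0.01)  # stand-in for the actual inference call
```

The same helper could wrap feed fetches or TTS generation, so that alerts on failed fetches and slow inference can be built from one log stream.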

Checklist

Docker containers built and tested
CI/CD pipeline configured
Environment variables documented
Monitoring setup planned

Ledger

Component	Deployment Status	Notes
Backend Container	Not Tested	llama.cpp compilation needed
Frontend Build	Planned	Next.js static export options
CI Pipeline	Planned	GitHub Actions for automated tests
Production Scaling	Designed	GPU instance requirements