Deployment Strategy for News Synthesizer
October 28, 2025
Deployment Philosophy
The application needs careful deployment because it depends on a local LLM, has GPU requirements, and processes content in real time. The strategy focuses on isolated environments, reliable model loading, and scalable RSS processing.
Infrastructure Requirements
Hardware Requirements
- Minimum: CPU with AVX2, 8GB RAM, 10GB storage
- Recommended: NVIDIA GPU (RTX 30-series+), 16GB+ RAM, SSD storage
- Local Models: GPU acceleration for llama.cpp inference
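The minimum requirements above can be checked before startup. The following is a minimal sketch, assuming a Linux host (it reads `/proc/cpuinfo` and uses POSIX `sysconf`); the function names and the idea of a preflight script are illustrative and not part of the project. Note that a machine sold as "8GB" usually reports slightly less usable memory, so the threshold may need some slack.

```python
import os
import shutil


def has_avx2(cpuinfo_path="/proc/cpuinfo"):
    """Return True if a 'flags' line in cpuinfo lists avx2 (Linux only)."""
    try:
        with open(cpuinfo_path, encoding="utf-8") as handle:
            return any(
                "avx2" in line.split()
                for line in handle
                if line.startswith("flags")
            )
    except OSError:
        # File missing or unreadable (for example on non-Linux hosts).
        return False


def total_ram_gb():
    """Total physical memory in GiB, via POSIX sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def free_disk_gb(path="."):
    """Free space in GiB on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def check_minimums(avx2, ram_gb, disk_gb, min_ram_gb=8, min_disk_gb=10):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if not avx2:
        problems.append("CPU does not report AVX2 support")
    if ram_gb < min_ram_gb:
        problems.append(f"RAM {ram_gb:.1f} GB is below the {min_ram_gb} GB minimum")
    if disk_gb < min_disk_gb:
        problems.append(
            f"free disk {disk_gb:.1f} GB is below the {min_disk_gb} GB minimum"
        )
    return problems
```

A startup script could call `check_minimums(has_avx2(), total_ram_gb(), free_disk_gb())` and refuse to load the model if the returned list is not empty.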
Software Environment
- Backend: Python 3.8+, FastAPI, llama.cpp compiled with GPU support
- Frontend: Node.js 18+, Next.js build tools
- Model: Pre-downloaded GGUF file (mlabonne_gemma-3-27b-it-abliterated-IQ4_XS.gguf)
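Because the backend depends on a pre-downloaded GGUF file, it can help to fail fast when the file is missing or truncated. The sketch below relies only on the fact that GGUF files begin with the four ASCII bytes `GGUF`; the function name `validate_model_file` is hypothetical, and the check says nothing about whether the rest of the file is intact.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files start with these four bytes


def validate_model_file(path):
    """Return the path as a Path, or raise if it is missing or not GGUF."""
    model = Path(path)
    if not model.is_file():
        raise FileNotFoundError(f"model file not found: {model}")
    with model.open("rb") as handle:
        magic = handle.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(
            f"{model} does not look like a GGUF file (magic={magic!r})"
        )
    return model
```

Running this once at application startup, before handing the path to llama.cpp, turns a confusing load failure into a clear error message.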
Deployment Options
1. Local Development
- Setup:
    cd backend
    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    cp llama.cpp path/to/compiled/llamacpp
- Frontend:
    npm install && npm run dev
- Backend:
    uvicorn main:app --reload
2. Docker Containerization
- Backend Dockerfile:
    FROM python:3.11-slim
    # Work in /app so the Compose volume mount below lines up with the code
    WORKDIR /app
    # Install system dependencies for llama.cpp
    RUN apt-get update && apt-get install -y cmake build-essential
    # Copy llama.cpp and compile
    COPY llama.cpp /llama.cpp
    RUN cd /llama.cpp && mkdir build && cd build && cmake .. && make
    # Install Python dependencies
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    # Copy model file
    COPY mlabonne_gemma-3-27b-it-abliterated-IQ4_XS.gguf /models/
    COPY . .
    CMD ["uvicorn", "main:app", "--host", "0.0.0.0"]
- Frontend Dockerfile:
    FROM node:18-alpine
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build
    CMD ["npm", "start"]
- Docker Compose:
    version: '3.8'
    services:
      backend:
        build: ./backend
        ports:
          - "8000:8000"
        volumes:
          - ./backend:/app
      frontend:
        build: ./frontend
        ports:
          - "3000:3000"
        environment:
          - NEXT_PUBLIC_API_URL=http://backend:8000
3. Production Deployment
- Cloud Options: AWS EC2 with GPU instances, Google Cloud Compute Engine
- Cost Considerations: GPU instances cost $0.50/hour or more; prefer spot instances to reduce cost
- Scaling: Horizontal scaling for multiple users, model loading optimization
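To put the hourly figure above in perspective, the arithmetic can be sketched as follows. The $0.50/hour rate is the floor quoted above; the usage patterns (always-on versus 8 hours on 22 working days) are illustrative assumptions, and a spot price would simply be passed in as a lower `hourly_rate`.

```python
def monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Estimated monthly cost in dollars for one instance."""
    return round(hourly_rate * hours_per_day * days, 2)


# Always-on instance at the quoted floor rate: 0.50 * 24 * 30
always_on = monthly_cost(0.50)  # 360.0

# Hypothetical working-hours schedule: 0.50 * 8 * 22
working_hours = monthly_cost(0.50, hours_per_day=8, days=22)  # 88.0
```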
CI/CD Pipeline
GitHub Actions Configuration
name: CI/CD
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r backend/requirements.txt
      - name: Run tests
        run: cd backend && pytest
  deploy:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to server
        # Placeholder: replace with deployment steps (Docker build and push, etc.)
        run: echo "Add deployment steps here"
Automated Pipelines
- Build Step: Compile llama.cpp if needed
- Test Step: Run unit/integration tests
- Security Scan: Dependency and code analysis
- Deploy Step: Docker image push to registry
- Rollback: Automated rollback on deployment failures
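The deploy and rollback steps above could be orchestrated roughly as follows. This is a sketch of the control flow only: `deploy`, `health_check`, and `rollback` are hypothetical callables supplied by the caller (for example, wrappers around Docker commands and an HTTP health probe), not functions defined by the project.

```python
import time


def deploy_with_rollback(deploy, health_check, rollback,
                         attempts=3, delay_seconds=5.0):
    """Run deploy(), poll health_check(), and roll back if it never passes.

    Returns True if the deployment became healthy, False if it was rolled back.
    """
    deploy()
    for attempt in range(attempts):
        if health_check():
            return True
        # Wait between probes, but not after the final failed attempt.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    rollback()
    return False
```

Keeping the three actions as injected callables makes the flow easy to test without touching real infrastructure.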
Environment Configuration
Environment Variables
- Backend:
  - LLAMA_CPP_MODEL_PATH: Path to .gguf file
  - DATABASE_URL: SQLite file location
  - RSS_FETCH_CONCURRENCY: Number of concurrent feed fetches
- Frontend:
  - NEXT_PUBLIC_API_BASE_URL: Backend API URL
  - NEXT_PUBLIC_TTS_VOICE: Default voice for TTS
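One way the backend might read its variables is sketched below. The variable names come from the list above; the `BackendSettings` class, the default database URL, and the default concurrency of 4 are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendSettings:
    model_path: str
    database_url: str
    rss_fetch_concurrency: int

    @classmethod
    def from_env(cls, env=None):
        """Build settings from a mapping (defaults to os.environ)."""
        env = os.environ if env is None else env
        model_path = env.get("LLAMA_CPP_MODEL_PATH")
        if not model_path:
            raise RuntimeError("LLAMA_CPP_MODEL_PATH must be set")
        # Assumed default of 4 concurrent fetches when the variable is unset.
        concurrency = int(env.get("RSS_FETCH_CONCURRENCY", "4"))
        if concurrency < 1:
            raise ValueError("RSS_FETCH_CONCURRENCY must be at least 1")
        return cls(
            model_path=model_path,
            # Assumed default SQLite location; adjust to the real schema path.
            database_url=env.get("DATABASE_URL", "sqlite:///./news.db"),
            rss_fetch_concurrency=concurrency,
        )
```

Accepting a plain mapping keeps the loader testable: `BackendSettings.from_env({"LLAMA_CPP_MODEL_PATH": "/models/model.gguf"})` works without modifying the process environment.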
Monitoring and Operations
- Model Performance: Monitor inference latency and memory usage
- RSS Feed Health: Alert on failed fetches or parsing errors
- User Experience: Track chat interaction quality and TTS generation
- Logging: Structured logs for debugging RAG and persona composition
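The latency monitoring and structured logging items above can be combined in a small helper. This is a sketch using only the Python standard library; the `JsonFormatter` and `timed` names, and the choice of JSON lines as the log format, are assumptions and not the project's actual logging setup.

```python
import json
import logging
import time
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra key/value pairs passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


@contextmanager
def timed(logger, event, **fields):
    """Log the event with the wall-clock latency of the wrapped block in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info(
            event,
            extra={"fields": {**fields, "latency_ms": round(elapsed_ms, 2)}},
        )
```

Wrapping a model call as `with timed(logger, "inference", persona="default"): ...` would emit one JSON line containing the latency, which can then be aggregated or alerted on.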
Checklist
- Docker containers built and tested
- CI/CD pipeline configured
- Environment variables documented
- Monitoring setup planned
Ledger
| Component | Deployment Status | Notes |
|---|---|---|
| Backend Container | Not Tested | llama.cpp compilation needed |
| Frontend Build | Planned | Next.js static export options |
| CI Pipeline | Planned | GitHub Actions for automated tests |
| Production Scaling | Designed | GPU instance requirements |