BPDiscord

October 11, 2025

A full-stack TypeScript web application for scraping and analyzing Letterboxd user rating data. Features user comparison tools, hater rankings, and comprehensive rating statistics with a modern React frontend and Express.js backend.

Project Overview

BPDiscord provides powerful tools for analyzing Letterboxd movie rating data:

User Comparison: Compare rating statistics between any two users
Hater Rankings: Rank users by average rating (lowest = biggest "hater")
Profile Analysis: Display user profiles with followers, following, and lists
Rating Distributions: Visual histograms of rating patterns
Data Scraping: Automated extraction of Letterboxd profile data

Project Structure

bpdiscord/
├── src/
│   ├── server/           # Backend API (Express + TypeScript)
│   │   ├── config/       # Database and Supabase configuration
│   │   ├── controllers/  # Business logic controllers
│   │   │   ├── authController.ts
│   │   │   ├── comparisonController.ts
│   │   │   ├── dataController.ts        # Database operations
│   │   │   ├── filmUserController.ts    # Database-first user operations
│   │   │   ├── scraperController.ts     # Force-scraping endpoints (module exports)
│   │   │   └── userController.ts
│   │   ├── middleware/   # Authentication, validation, error handling
│   │   ├── routes/       # API route definitions
│   │   │   ├── authRoutes.ts
│   │   │   ├── comparisonRoutes.ts
│   │   │   ├── filmUserRoutes.ts        # Database-first endpoints
│   │   │   ├── scraperRoutes.ts         # Force-scraping endpoints
│   │   │   └── userRoutes.ts
│   │   ├── scraperFunctions.ts # Core scraping logic (browser, page management)
│   │   ├── utilities.ts  # Helper functions and parsers
│   │   ├── constants.ts  # Configuration constants
│   │   ├── types.ts      # Server-specific TypeScript definitions
│   │   ├── server.ts     # Main server file
│   │   ├── package.json  # Server dependencies
│   │   └── dist/         # Compiled TypeScript output
│   └── client/           # Frontend (Vite + React + TypeScript)
│       ├── components/   # React components
│       │   ├── Dashboard.tsx
│       │   ├── HaterRankings.tsx
│       │   ├── UserComparison.tsx
│       │   ├── PublicUserComparison.tsx
│       │   ├── ScraperInterface.tsx
│       │   └── UserProfile.tsx
│       ├── services/     # API service layer
│       ├── types.ts      # Client-specific TypeScript definitions
│       ├── index.tsx     # Main React app entry point
│       ├── vite.config.js # Vite configuration
│       ├── tailwind.config.js # Tailwind CSS configuration
│       ├── package.json  # Client dependencies
│       └── build/        # Vite build output
├── CLAUDE.md            # Comprehensive technical documentation
├── package.json         # Root orchestration + shared dependencies
├── vercel.json          # Deployment configuration
└── tsconfig.json       # TypeScript configuration

Features

Public Features (No Authentication Required)

User Comparison (`/compare`)

Compare rating statistics between any two Letterboxd users
Side-by-side profile metrics (followers, following, lists, total films)
Visual rating distribution comparison with percentages
Highlighting of higher values for easy comparison

Hater Rankings (`/hater-rankings`)

Rank all users by average movie rating (lowest first)
Visual rating distribution histograms
Trophy icon for the biggest "hater" (lowest average rating)
Display names with username fallbacks

Protected Features (Authentication Required)

Dashboard (`/dashboard`)

Profile Management: View and manage user account
Data Fetcher: Scrape Letterboxd user data
Enhanced Comparison: Full comparison tools
Private Rankings: Authenticated hater rankings view

Data Scraping

Extract user rating distributions from Letterboxd profiles
Scrape user profile data (display name, followers, following, lists)
Get complete film lists with ratings
Database storage with intelligent upserts

Backend API

Authentication & Security

JWT token-based authentication via Supabase
Row Level Security (RLS) with service role bypass
Rate limiting (100 requests/15min general, 20 requests/15min scraping)
CORS protection and security headers
Input validation and sanitization
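
To make the two limits above concrete, here is a minimal sketch of a fixed-window counter. It is not the express-rate-limit API and not this project's middleware; the class name `FixedWindowLimiter` and the idea of keying by client IP are assumptions made for illustration.

```typescript
// Hypothetical fixed-window limiter: at most `limit` requests per client per window.
class FixedWindowLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false once the client has used up its window.
  allow(clientKey: string, now: number): boolean {
    const entry = this.hits.get(clientKey);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      this.hits.set(clientKey, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count++;
    return true;
  }
}

const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;
const generalLimiter = new FixedWindowLimiter(100, FIFTEEN_MINUTES_MS); // general API
const scrapingLimiter = new FixedWindowLimiter(20, FIFTEEN_MINUTES_MS); // scraping endpoints
```

A request handler would call something like `scrapingLimiter.allow(clientIp, Date.now())` and respond with HTTP 429 when it returns false.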

Web Scraping

Puppeteer browser automation for data extraction
Cheerio HTML parsing for structured data extraction
Retry mechanisms and error handling
Data validation and normalization
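
A retry mechanism for a flaky scrape could look roughly like the sketch below. The helper name `withRetry`, the default of three attempts, and the exponential backoff are assumptions; the README does not say which retry policy the project uses.

```typescript
// Hypothetical helper: run `operation` up to `maxAttempts` times, doubling the wait each time.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000,
): Promise<T> {
  if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
        const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Wrapping only the network-facing step (for example page navigation) keeps retries from repeating database writes.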

Frontend

Modern React Application

TypeScript for type safety
Tailwind CSS for responsive design
React Router for navigation
Local storage for authentication state
Real-time feedback and loading states

User Experience

Responsive design for mobile and desktop
Visual highlighting of comparison data
Interactive histograms and charts
Progressive enhancement with fallbacks

Prerequisites

Node.js (v18 or higher)
Yarn package manager
Supabase account and project

Environment Variables

Backend (.env)

Create a .env file in the project root:

SUPABASE_URL=your_supabase_url
SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
PORT=3001
NODE_ENV=development
CRON_SECRET=your_random_secret_key  # For manual cron triggers

Frontend (.env)

Create a .env file in src/client/:

# VITE_API_URL=/api  # Uses proxy in development, override for production
VITE_HOT_RELOAD=true

Security Notice

⚠️ Important: Never commit your .env files to version control! Both .env files are in .gitignore.

For production deployment, set these values as environment variables on the hosting platform instead of using .env files. See DEPLOYMENT.md for details.

Installation

Quick Install (All Dependencies)

yarn install:all

Manual Installation

1. Install Root Dependencies

yarn install

2. Install Server Dependencies

cd src/server
yarn install
cd ../..

3. Install Client Dependencies

cd src/client
yarn install
cd ../..

Database Setup

Create a Supabase project
Set up the required tables:

Users Table

CREATE TABLE "Users" (
  "lbusername" VARCHAR PRIMARY KEY,
  "display_name" VARCHAR,
  "followers" INTEGER DEFAULT 0,
  "following" INTEGER DEFAULT 0,
  "number_of_lists" INTEGER DEFAULT 0,
  "created_at" TIMESTAMP DEFAULT NOW(),
  "updated_at" TIMESTAMP DEFAULT NOW()
);

UserRatings Table

CREATE TABLE "UserRatings" (
  "username" VARCHAR,
  "rating" DECIMAL(2,1),
  "count" INTEGER,
  "created_at" TIMESTAMP DEFAULT NOW(),
  PRIMARY KEY ("username", "rating")
);

Configure Row Level Security (RLS) policies as needed
Note your service role key for admin operations

Running the Application

Development Mode

Option 1: Start Both (Concurrent - Recommended)

yarn dev

Server runs on http://localhost:3001
Client runs on http://localhost:5173 (or 5174 if 5173 is in use)

Option 2: Start Individually

Start the Server

yarn dev:server

Server runs on http://localhost:3001

Start the Client (new terminal)

yarn dev:client

Client runs on http://localhost:5173

Production Mode

Build and Start

# Build both server and client
yarn build

# Or build individually
yarn build:server
yarn build:client

# Start production server
yarn start

API Endpoints

Public Endpoints

Film User API (`/api/film-users`) - Database-First

GET / - Get all users with display names
GET /:username/ratings - Get user's ratings (database only)
GET /:username/profile - Get user's profile (database only)
GET /:username/complete - Get complete user data (database only)
Add ?fallback=scrape to any endpoint to scrape if data missing
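
The database-first behavior with an optional scrape fallback can be sketched as follows. The function name `getWithFallback` and the `source` field are hypothetical; only the `?fallback=scrape` query value comes from the list above.

```typescript
// Hypothetical sketch: read from the database first, scrape only when asked to and data is missing.
async function getWithFallback<T>(
  readFromDatabase: () => Promise<T | null>, // returns null when nothing is stored
  scrape: () => Promise<T>,                  // assumed to fetch fresh data from Letterboxd
  fallback: string | undefined,              // value of the ?fallback= query parameter
): Promise<{ data: T | null; source: "database" | "scrape" | "none" }> {
  const stored = await readFromDatabase();
  if (stored !== null) return { data: stored, source: "database" };
  if (fallback === "scrape") return { data: await scrape(), source: "scrape" };
  return { data: null, source: "none" };
}
```

With this shape the scraper is never touched when the database already has the data, which matches the "database only" wording of the endpoints.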

Comparison API (`/api/comparison`)

GET /usernames - Get list of users with display names
POST /user-ratings - Get user's ratings and profile data
POST /compare - Compare two users' data
GET /hater-rankings - Get all users ranked by average rating

Protected Endpoints (Require Authentication)

Authentication (`/api/auth`)

POST /signup - User registration
POST /login - User login
POST /logout - Session termination
POST /password-reset - Password reset

Scraper API (`/api/scraper`) - Force-Scraping Only

POST /getUserRatings - Force scrape user's rating distribution
POST /getUserProfile - Force scrape complete profile + ratings
POST /getAllFilms - Force scrape user's film list
POST /getData - Generic scraping with custom selectors
Note: Disabled in production unless ENABLE_SCRAPER=true

Cron API (`/api/cron`) - Automated Data Refresh

POST /refresh-all-users - Refresh all users' data from Letterboxd
POST /refresh-user/:username - Refresh specific user's data
Authentication: Vercel Cron header or Bearer token
Schedule: Daily at 2 AM UTC (configurable in vercel.json)

User Management (`/api/users`)

GET / - Get all users
GET /me - Get current user profile
GET /:id - Get specific user
PUT /:id - Update user
DELETE /:id - Delete user

Health Check

GET /api/health - Server status

Usage

Public Access

Visit http://localhost:5173/compare for user comparison
Visit http://localhost:5173/hater-rankings for rankings
No authentication required for public features

Authenticated Access

Start the application (see Running section)
Navigate to http://localhost:5173 (redirects to login)
Sign up or log in to access dashboard
Use the Data Fetcher to scrape Letterboxd profiles
View comparisons and rankings with your data

Testing Cron Job Endpoints

The cron endpoints allow automated data refresh for all users. Here's how to test them:

Prerequisites

Make sure you have a CRON_SECRET in your .env file:
```
CRON_SECRET=your_random_secret_key
```

Generate a secure secret (optional):

# Using OpenSSL (Mac/Linux)
openssl rand -base64 32

# Or using Node.js
node -e "console.log(require('crypto').randomBytes(32).toString('base64'))"

Testing in Development

1. Start the development server:

yarn dev:server

2. Test refresh all users:

curl -X POST http://localhost:3001/api/cron/refresh-all-users \
  -H "Authorization: Bearer your_cron_secret" \
  -H "Content-Type: application/json"

3. Test refresh specific user:

curl -X POST http://localhost:3001/api/cron/refresh-user/username \
  -H "Authorization: Bearer your_cron_secret" \
  -H "Content-Type: application/json"

Expected Response:

{
  "message": "Refresh completed",
  "duration": "45.23s",
  "results": {
    "totalUsers": 5,
    "success": 5,
    "failed": 0,
    "skipped": 0,
    "errors": []
  }
}

Testing in Production (Vercel)

1. Set CRON_SECRET in Vercel Dashboard:

Go to your project → Settings → Environment Variables
Add CRON_SECRET with a secure value
Redeploy if needed

2. Test the production endpoint:

curl -X POST https://your-app.vercel.app/api/cron/refresh-all-users \
  -H "Authorization: Bearer your_cron_secret" \
  -H "Content-Type: application/json"

3. Check Vercel Logs:

Go to your project dashboard
Click Functions or Logs
Look for /api/cron/refresh-all-users executions
View detailed logs including:
- Start time and duration
- Success/failure counts
- Individual user refresh status
- Any error messages

Automatic Cron Execution (Production Only)

Once deployed to Vercel (Pro plan or higher):

Cron runs automatically at the scheduled time (default: 2 AM UTC daily)
No manual authentication is needed: Vercel adds the x-vercel-cron-signature header automatically
View cron status:
- Vercel Dashboard → Your Project → Settings → Cron Jobs
- See scheduled jobs, last run time, and execution history

Monitoring Cron Job Results

View detailed logs:

# In development - check terminal output
yarn dev:server

# In production - check Vercel logs:
# 1. Vercel Dashboard → Your Project → Functions
# 2. Filter by "/api/cron/refresh-all-users"
# 3. Click any execution to see full logs

Log output includes:

=== Starting scheduled refresh of all users ===
Found 5 users to refresh
[1/5] Refreshing data for: username1
[1/5] ✓ Successfully refreshed username1
[2/5] Refreshing data for: username2
...
=== Refresh Summary ===
Total users: 5
✓ Success: 5
✗ Failed: 0
Duration: 45.23s

Troubleshooting

401 Unauthorized Error:

Check that the CRON_SECRET in .env matches the token sent in your request
Ensure the Authorization header is formatted correctly: Bearer your_secret
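
When debugging a 401, it can help to see what a Bearer check typically does. This is a sketch under assumptions, not the project's middleware: the function name `isAuthorizedCronRequest` is hypothetical, and it covers only the manual Bearer-token path, not the Vercel cron header.

```typescript
// Hypothetical check: accept only "Bearer <token>" where <token> equals the configured secret.
function isAuthorizedCronRequest(
  authorizationHeader: string | undefined,
  cronSecret: string | undefined,
): boolean {
  // With no secret configured, reject everything rather than accept an empty token.
  if (!cronSecret) return false;
  if (!authorizationHeader) return false;

  const [scheme, token, ...rest] = authorizationHeader.trim().split(/\s+/);
  if (rest.length > 0 || scheme !== "Bearer" || token === undefined) return false;
  return token === cronSecret;
}
```

Under this logic the usual causes of a 401 are a missing `Bearer ` prefix, a stray extra word in the header, or a secret that differs between .env and the request.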

Timeout Errors:

The Vercel Hobby plan has a 10-second timeout (too short when there are many users)
The Vercel Pro plan has a 300-second timeout (5 minutes)
Consider reducing the number of users or optimizing scraping
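
A rough way to judge whether a refresh fits inside a timeout is to divide the limit by the time spent per user. The sketch below uses the figures from the sample response earlier in this section (45.23s for 5 users) purely as an illustration; real per-user times will vary, and the function name is hypothetical.

```typescript
// Hypothetical estimate: how many users can be refreshed sequentially within a timeout.
function maxUsersWithinTimeout(timeoutSeconds: number, secondsPerUser: number): number {
  if (secondsPerUser <= 0) throw new Error("secondsPerUser must be positive");
  return Math.floor(timeoutSeconds / secondsPerUser);
}

// Sample figures: 45.23 s for 5 users is about 9.05 s per user (including delays).
const sampleSecondsPerUser = 45.23 / 5;

const proPlanEstimate = maxUsersWithinTimeout(300, sampleSecondsPerUser);  // 33
const hobbyPlanEstimate = maxUsersWithinTimeout(10, sampleSecondsPerUser); // 1
```

At that pace a 300-second limit covers roughly 33 users, while a 10-second limit covers only one.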

Rate Limiting:

The refresh includes 2-second delays between users to avoid being blocked
If you have many users, the process will take time
Monitor logs to see progress
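
The sequential refresh with a pause between users can be sketched as below. The result shape mirrors the sample response, but the function signature, the injected `refreshUser` callback, and the rule that blank usernames count as skipped are all assumptions made for illustration.

```typescript
interface RefreshResults {
  totalUsers: number;
  success: number;
  failed: number;
  skipped: number;
  errors: string[];
}

// Hypothetical loop: refresh users one at a time, pausing between them to avoid being blocked.
async function refreshAllUsers(
  usernames: string[],
  refreshUser: (username: string) => Promise<void>, // assumed to scrape and store one user
  delayMs: number = 2000,
): Promise<RefreshResults> {
  const results: RefreshResults = {
    totalUsers: usernames.length,
    success: 0,
    failed: 0,
    skipped: 0,
    errors: [],
  };
  for (let i = 0; i < usernames.length; i++) {
    const username = usernames[i];
    if (username === undefined || username.trim() === "") {
      results.skipped++; // assumption: blank entries are skipped, not failed
      continue;
    }
    try {
      await refreshUser(username);
      results.success++;
    } catch (err) {
      // One failing user should not abort the whole run.
      results.failed++;
      results.errors.push(`${username}: ${err instanceof Error ? err.message : String(err)}`);
    }
    if (i < usernames.length - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```

Because the loop is sequential, total duration grows linearly with the number of users, which is why long user lists run into the timeouts described above.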

Data Processing

Scraping Architecture

The scraping system is organized into modular, reusable components:

Core Scraping Functions (`scraperFunctions.ts`)

Browser Management: Shared browser instances with automatic cleanup
Page Creation: Optimized Puppeteer page setup with stealth measures
User Profile Scraping: Extract display name, followers, following, lists
Rating Scraping: Parse rating histogram from user profile
Film Scraping: Multi-page film list extraction with progress tracking
Memory Management: Aggressive cleanup and garbage collection

Utility Functions (`utilities.ts`)

parseStarRating(): Self-contained star rating parser (works in browser context)
detectLikedStatus(): Detect liked films from DOM structure
parseNumberFromText(): Handle K/M suffixes (1.2K → 1200)
validateUserProfile(): Verify page content and detect 404s
Data Formatters: Consistent API response formatting
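
As an illustration of what a self-contained star parser might do, the sketch below converts strings such as "★★★½" to numbers. It is not the project's parseStarRating(): the name `starsToNumber` and the assumption that ratings appear as ★ and ½ characters are hypothetical. Keeping the function free of outer references matters because Puppeteer serializes functions passed to `page.evaluate`, so they cannot reach helpers defined in Node.

```typescript
// Hypothetical parser: "★★★½" -> 3.5, "½" -> 0.5, anything unrecognized -> null.
function starsToNumber(text: string): number | null {
  const trimmed = text.trim();
  // Accept up to five full stars, optionally followed by a single half star.
  if (trimmed === "" || !/^★{0,5}½?$/.test(trimmed)) return null;
  const fullStars = (trimmed.match(/★/g) ?? []).length;
  const value = fullStars + (trimmed.endsWith("½") ? 0.5 : 0);
  // Five stars plus a half would exceed the 5.0 maximum, so treat it as invalid.
  return value <= 5 ? value : null;
}
```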

Configuration (`constants.ts`)

LETTERBOXD_SELECTORS: CSS selectors for Letterboxd elements
BROWSER_CONFIG: Timeouts, delays, and resource limits
STAR_PATTERNS: Star rating patterns for parsing
BLOCKED_RESOURCES: Resources to block for performance

Scraping Algorithm

Input Validation: Verify Letterboxd username format
Browser Launch: Shared Puppeteer browser with stealth configuration
Page Navigation: Load user's Letterboxd profile with retry strategies
Data Extraction: Parse HTML using CSS selectors (Cheerio or browser context)
Data Validation: Ensure extracted data is valid and complete
Database Storage: Upsert data with conflict resolution
Cleanup: Close pages and manage memory

Rating Calculations

Average Rating: Σ(rating × count) / Σ(count)
Percentage Distribution: (count / total) × 100
Hater Rankings: Sort by ascending average rating
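
The three formulas above translate directly into code. This is a minimal sketch; the type and function names are hypothetical and not taken from the project's source.

```typescript
// One histogram bucket: a star rating (0.5 to 5.0) and how many films received it.
interface RatingCount {
  rating: number;
  count: number;
}

// Average rating: Σ(rating × count) / Σ(count). Returns 0 for an empty histogram.
function averageRating(buckets: RatingCount[]): number {
  const total = buckets.reduce((sum, b) => sum + b.count, 0);
  if (total === 0) return 0;
  const weighted = buckets.reduce((sum, b) => sum + b.rating * b.count, 0);
  return weighted / total;
}

// Percentage distribution: (count / total) × 100 for each bucket.
function percentageDistribution(buckets: RatingCount[]): { rating: number; percent: number }[] {
  const total = buckets.reduce((sum, b) => sum + b.count, 0);
  return buckets.map((b) => ({
    rating: b.rating,
    percent: total === 0 ? 0 : (b.count / total) * 100,
  }));
}

// Hater rankings: ascending by average, so the lowest average (biggest "hater") comes first.
function haterRankings(
  users: { username: string; buckets: RatingCount[] }[],
): { username: string; average: number }[] {
  return users
    .map((u) => ({ username: u.username, average: averageRating(u.buckets) }))
    .sort((a, b) => a.average - b.average);
}
```

For example, one film rated 2.0 and three films rated 4.0 give an average of (2 + 12) / 4 = 3.5, with 25% of ratings at 2.0 and 75% at 4.0.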

Number Parsing

Handles abbreviated formats: "1.2K" → 1200, "2.5M" → 2500000
Removes commas and normalizes text
Graceful fallback to 0 for invalid data
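
The parsing rules above can be sketched in a few lines. This is an illustrative version, not the project's parseNumberFromText(); the name `parseCount` is hypothetical.

```typescript
// Hypothetical parser: "1.2K" -> 1200, "2.5M" -> 2500000, "1,234" -> 1234, invalid -> 0.
function parseCount(text: string): number {
  // Remove thousands separators, trim whitespace, and normalize the suffix case.
  const cleaned = text.replace(/,/g, "").trim().toUpperCase();
  const match = cleaned.match(/^(\d+(?:\.\d+)?)([KM])?$/);
  if (!match) return 0; // graceful fallback for anything unrecognized

  const value = parseFloat(match[1] ?? "0");
  const suffix = match[2];
  const multiplier = suffix === "K" ? 1000 : suffix === "M" ? 1000000 : 1;
  // Round to avoid floating-point residue such as 1199.9999999999998.
  return Math.round(value * multiplier);
}
```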

Development

Available Scripts

Root Scripts

yarn dev - Start both server and client concurrently
yarn build - Build both server and client
yarn build:server - Build server only
yarn build:client - Build client only
yarn start - Start production server
yarn install:all - Install all dependencies
yarn clean - Clean build directories

Server Scripts (run from src/server/)

yarn dev - Development server with hot reload
yarn build - Build TypeScript to JavaScript
yarn start - Start production server
yarn watch - Build in watch mode
yarn clean - Clean build directory

Client Scripts (run from src/client/)

yarn dev - Vite development server with hot reload
yarn build - Build for production with Vite
yarn preview - Preview production build locally

Code Organization

Backend Architecture

Controllers: Handle business logic and HTTP request/response
- scraperController.ts: Force-scraping endpoints (module exports pattern)
- filmUserController.ts: Database-first operations with fallback
- dataController.ts: Database CRUD operations
- comparisonController.ts: User comparison logic
Core Modules:
- scraperFunctions.ts: Browser automation, page management, scraping logic
- utilities.ts: Helper functions (parsers, validators, formatters)
- constants.ts: Configuration constants (selectors, timeouts, patterns)
Routes: Define API endpoints and middleware
Middleware: Authentication, validation, error handling
Types: Shared TypeScript interfaces

Frontend Architecture

Components: Reusable React components
Services: API client and utility functions
Types: Client-specific TypeScript definitions

Technologies Used

Backend

Express.js - Web framework
TypeScript - Type safety and modular architecture
Supabase - PostgreSQL database and authentication
Puppeteer - Browser automation for web scraping
Cheerio - Server-side HTML parsing
JWT - Authentication tokens
Helmet - Security headers
Express Rate Limit - API rate limiting
@sparticuz/chromium - Serverless Chrome for Vercel deployment

Frontend

Vite - Fast build tool and dev server
React 18 - UI framework
TypeScript - Type safety
Tailwind CSS - Utility-first styling
React Router - Client-side routing
Heroicons - Icon library

Database

PostgreSQL (via Supabase) - Primary database
Row Level Security - Data access control
Real-time subscriptions - Live data updates

Error Handling

Backend

Global error middleware with structured responses
Retry mechanisms for scraping failures
Graceful degradation for external service issues
Detailed logging for debugging

Frontend

Error boundaries for React component errors
User-friendly error messages
Loading states and retry options
Fallback UI for missing data

Performance Optimizations

Database: Efficient indexes and parallel queries
Frontend: Component memoization and lazy loading
Scraping: Resource blocking and browser reuse
API: Rate limiting and response caching
Modular Architecture: Separated scraping logic into reusable functions

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Add tests if applicable
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Deployment to Vercel

Full-Stack Deployment Configuration

This project is configured for Vercel deployment with both frontend and backend. Here's the complete setup:

Project Structure for Deployment

project/
├── package.json              # Root orchestration
├── vercel.json               # Deployment configuration
├── tsconfig.json             # Root TypeScript config
└── src/
    ├── client/               # Frontend (React/Vite)
    │   ├── package.json
    │   ├── vite.config.js
    │   └── build/            # Generated by Vite
    └── server/               # Backend (Express/Node)
        ├── package.json
        ├── tsconfig.json     # Server-specific config
        └── dist/             # Generated by TypeScript

Critical Configuration Files

1. Root package.json Build Script

{
  "scripts": {
    "build": "cd src/server && tsc"
  }
}

Important: Root build script should ONLY build the server. Vercel handles client building separately.

2. Server tsconfig.json (src/server/tsconfig.json)

{
  "compilerOptions": {
    "outDir": "./dist",
    "rootDir": "./"
    // ... other options
  },
  "include": ["./**/*"],
  "exclude": ["node_modules", "dist", "**/*.test.ts"]
}

Critical: Server needs its own tsconfig.json to prevent compiling client files.

Route Configuration Explained

Routes are processed in order:

API Routes (/api/(.*)) → Server function
Asset Routes (/assets/(.*)) → Static files
Specific Static Files → favicon, manifest, etc.
Catch-All (/(.*)) → React app (SPA routing)
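
Why order matters can be shown with a small first-match resolver. This is a sketch of the idea, not Vercel's implementation; the `/favicon.ico` pattern and the target labels are assumptions chosen to mirror the list above.

```typescript
interface RouteRule {
  pattern: RegExp;
  target: string; // hypothetical label for where the request is sent
}

// Rules in the same order as the list above.
const orderedRules: RouteRule[] = [
  { pattern: /^\/api\/(.*)$/, target: "server function" },
  { pattern: /^\/assets\/(.*)$/, target: "static file" },
  { pattern: /^\/favicon\.ico$/, target: "static file" },
  { pattern: /^\/(.*)$/, target: "React app" },
];

// The first rule whose pattern matches wins; later rules are never consulted.
function resolveRoute(path: string, rules: RouteRule[]): string | null {
  for (const rule of rules) {
    if (rule.pattern.test(path)) return rule.target;
  }
  return null;
}
```

If the catch-all rule were moved to the front, `resolveRoute("/assets/app.js", ...)` would return "React app" instead of "static file", which is the misrouting described under the static assets problem in the next section.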

Common Deployment Issues and Solutions

Problem: 404 Errors on Deployment

Cause: Routes pointing to wrong file locations
Solution: Ensure routes match actual build output structure (/src/client/index.html)

Problem: Static Assets Getting 401/404 Errors

Cause: All requests routed to index.html instead of serving static files
Solution: Add specific routes for assets before catch-all route

Problem: Build Conflicts

Cause: Multiple vercel.json files or incorrect TypeScript compilation scope
Solution:
- Remove any vercel.json files from subdirectories
- Ensure server has its own tsconfig.json
- Root build script should only build server

Problem: Mysterious Build Artifacts

Cause: Root build script building both client and server
Solution: Let Vercel handle client build via @vercel/static-build

Deployment Process

Root Build: yarn build → Compiles server TypeScript only
Client Build: Vercel runs @vercel/static-build → Builds React app separately
Deployment: Routes configured to serve from correct locations

Environment Variables for Production

Set these in your Vercel dashboard:

SUPABASE_URL=your_supabase_url
SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
NODE_ENV=production
ENABLE_SCRAPER=true  # Optional: Enable scraping in production
CRON_SECRET=your_random_secret_key  # For manual cron triggers (optional)

Verification Steps

Check build logs: No conflicting configurations mentioned
Verify file structure: Static files in expected locations
Test routes: API calls work, static assets load, SPA routing works
No build artifacts: Clean deployment output

This configuration deploys both frontend and backend from a single repository while keeping their build processes separate.

Documentation

CLAUDE.md - Comprehensive technical documentation
DEPLOYMENT.md - Production deployment guide

License

MIT License - see LICENSE file for details