BPDiscord
October 11, 2025
A full-stack TypeScript web application for scraping and analyzing Letterboxd user rating data. Features user comparison tools, hater rankings, and comprehensive rating statistics with a modern React frontend and Express.js backend.
Project Overview
BPDiscord provides powerful tools for analyzing Letterboxd movie rating data:
- User Comparison: Compare rating statistics between any two users
- Hater Rankings: Rank users by average rating (lowest = biggest "hater")
- Profile Analysis: Display user profiles with followers, following, and lists
- Rating Distributions: Visual histograms of rating patterns
- Data Scraping: Automated extraction of Letterboxd profile data
Project Structure
bpdiscord/
├── src/
│ ├── server/ # Backend API (Express + TypeScript)
│ │ ├── config/ # Database and Supabase configuration
│ │ ├── controllers/ # Business logic controllers
│ │ │ ├── authController.ts
│ │ │ ├── comparisonController.ts
│ │ │ ├── dataController.ts # Database operations
│ │ │ ├── filmUserController.ts # Database-first user operations
│ │ │ ├── scraperController.ts # Force-scraping endpoints (module exports)
│ │ │ └── userController.ts
│ │ ├── middleware/ # Authentication, validation, error handling
│ │ ├── routes/ # API route definitions
│ │ │ ├── authRoutes.ts
│ │ │ ├── comparisonRoutes.ts
│ │ │ ├── filmUserRoutes.ts # Database-first endpoints
│ │ │ ├── scraperRoutes.ts # Force-scraping endpoints
│ │ │ └── userRoutes.ts
│ │ ├── scraperFunctions.ts # Core scraping logic (browser, page management)
│ │ ├── utilities.ts # Helper functions and parsers
│ │ ├── constants.ts # Configuration constants
│ │ ├── types.ts # Server-specific TypeScript definitions
│ │ ├── server.ts # Main server file
│ │ ├── package.json # Server dependencies
│ │ └── dist/ # Compiled TypeScript output
│ ├── client/ # Frontend (Vite + React + TypeScript)
│ │ ├── components/ # React components
│ │ │ ├── Dashboard.tsx
│ │ │ ├── HaterRankings.tsx
│ │ │ ├── UserComparison.tsx
│ │ │ ├── PublicUserComparison.tsx
│ │ │ ├── ScraperInterface.tsx
│ │ │ └── UserProfile.tsx
│ │ ├── services/ # API service layer
│ │ ├── types.ts # Client-specific TypeScript definitions
│ │ ├── index.tsx # Main React app entry point
│ │ ├── vite.config.js # Vite configuration
│ │ ├── tailwind.config.js # Tailwind CSS configuration
│ │ ├── package.json # Client dependencies
│ │ └── build/ # Vite build output
├── CLAUDE.md # Comprehensive technical documentation
├── package.json # Root orchestration + shared dependencies
├── vercel.json # Deployment configuration
└── tsconfig.json # TypeScript configuration
Features
Public Features (No Authentication Required)
User Comparison (/compare)
- Compare rating statistics between any two Letterboxd users
- Side-by-side profile metrics (followers, following, lists, total films)
- Visual rating distribution comparison with percentages
- Highlighting of higher values for easy comparison
Hater Rankings (/hater-rankings)
- Rank all users by average movie rating (lowest first)
- Visual rating distribution histograms
- Trophy icon for the biggest "hater" (lowest average rating)
- Display names with username fallbacks
Protected Features (Authentication Required)
Dashboard (/dashboard)
- Profile Management: View and manage user account
- Data Fetcher: Scrape Letterboxd user data
- Enhanced Comparison: Full comparison tools
- Private Rankings: Authenticated hater rankings view
Data Scraping
- Extract user rating distributions from Letterboxd profiles
- Scrape user profile data (display name, followers, following, lists)
- Get complete film lists with ratings
- Database storage with intelligent upserts
Backend API
Authentication & Security
- JWT token-based authentication via Supabase
- Row Level Security (RLS) with service role bypass
- Rate limiting (100 requests/15min general, 20 requests/15min scraping)
- CORS protection and security headers
- Input validation and sanitization
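To make the rate limits above concrete, here is a minimal sketch of fixed-window request counting in plain TypeScript. It is not the project's middleware (the stack lists Express Rate Limit for that); the class name `FixedWindowLimiter` and the two limiter instances are hypothetical and only mirror the documented numbers (100 and 20 requests per 15 minutes).

```typescript
// Hypothetical fixed-window limiter; illustrates the counting idea only.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, { count: number; windowStart: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the caller identified by `key` (e.g. an IP) may proceed at time `now` (ms).
  allow(key: string, now: number): boolean {
    const entry = this.hits.get(key);
    // First request, or the previous window has expired: start a new window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= this.limit) {
      return false; // limit reached inside the current window
    }
    entry.count += 1;
    return true;
  }
}

const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;
const generalLimiter = new FixedWindowLimiter(100, FIFTEEN_MINUTES_MS); // general API routes
const scraperLimiter = new FixedWindowLimiter(20, FIFTEEN_MINUTES_MS); // scraping routes
```

A request handler would call `allow(clientKey, Date.now())` and respond with HTTP 429 when it returns false.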
Web Scraping
- Puppeteer browser automation for data extraction
- Cheerio HTML parsing for structured data extraction
- Retry mechanisms and error handling
- Data validation and normalization
Frontend
Modern React Application
- TypeScript for type safety
- Tailwind CSS for responsive design
- React Router for navigation
- Local storage for authentication state
- Real-time feedback and loading states
User Experience
- Responsive design for mobile and desktop
- Visual highlighting of comparison data
- Interactive histograms and charts
- Progressive enhancement with fallbacks
Prerequisites
- Node.js (v18 or higher)
- Yarn package manager
- Supabase account and project
Environment Variables
Backend (.env)
Create a .env file in the project root:
SUPABASE_URL=your_supabase_url
SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
PORT=3001
NODE_ENV=development
CRON_SECRET=your_random_secret_key # For manual cron triggers
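A server can fail fast when one of these variables is missing. The sketch below is one way to validate them at startup; `loadConfig` and `ServerConfig` are hypothetical names, and the defaults (port 3001, `development`) are assumptions taken from the example values above. `CRON_SECRET` is treated as optional because it is only needed for manual cron triggers.

```typescript
// Variables the backend cannot run without (names taken from the .env example above).
const REQUIRED_VARS = ["SUPABASE_URL", "SUPABASE_ANON_KEY", "SUPABASE_SERVICE_ROLE_KEY"] as const;

interface ServerConfig {
  supabaseUrl: string;
  supabaseAnonKey: string;
  supabaseServiceRoleKey: string;
  port: number;
  nodeEnv: string;
  cronSecret?: string;
}

// Hypothetical helper: validates an env-like object and applies assumed defaults.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const port = Number(env["PORT"] ?? "3001");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT value: ${env["PORT"]}`);
  }
  return {
    supabaseUrl: env["SUPABASE_URL"] as string,
    supabaseAnonKey: env["SUPABASE_ANON_KEY"] as string,
    supabaseServiceRoleKey: env["SUPABASE_SERVICE_ROLE_KEY"] as string,
    port,
    nodeEnv: env["NODE_ENV"] ?? "development",
    cronSecret: env["CRON_SECRET"],
  };
}
```

In a Node.js entry point this would be called once as `loadConfig(process.env)`.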
Frontend (.env)
Create a .env file in src/client/:
# VITE_API_URL=/api # Uses proxy in development, override for production
VITE_HOT_RELOAD=true
Security Notice
⚠️ Important: Never commit your .env files to version control! Both .env files are in .gitignore.
For production deployment, use environment variables. See DEPLOYMENT.md for details.
Installation
Quick Install (All Dependencies)
yarn install:all
Manual Installation
1. Install Root Dependencies
yarn install
2. Install Server Dependencies
cd src/server
yarn install
cd ../..
3. Install Client Dependencies
cd src/client
yarn install
cd ../..
Database Setup
- Create a Supabase project
- Set up the required tables:
Users Table
CREATE TABLE "Users" (
"lbusername" VARCHAR PRIMARY KEY,
"display_name" VARCHAR,
"followers" INTEGER DEFAULT 0,
"following" INTEGER DEFAULT 0,
"number_of_lists" INTEGER DEFAULT 0,
"created_at" TIMESTAMP DEFAULT NOW(),
"updated_at" TIMESTAMP DEFAULT NOW()
);
UserRatings Table
CREATE TABLE "UserRatings" (
"username" VARCHAR,
"rating" DECIMAL(2,1),
"count" INTEGER,
"created_at" TIMESTAMP DEFAULT NOW(),
PRIMARY KEY ("username", "rating")
);
- Configure Row Level Security (RLS) policies as needed
- Note your service role key for admin operations
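For reference, the two tables above could be mirrored in TypeScript roughly as follows. This is a sketch, not the contents of the project's types.ts: the interface and function names are hypothetical, timestamps are assumed to arrive as ISO strings, and the numeric columns are assumed to be populated by their defaults.

```typescript
// Row shape mirroring the "Users" table.
interface UserRow {
  lbusername: string;
  display_name: string | null;
  followers: number;
  following: number;
  number_of_lists: number;
  created_at: string; // assumed ISO timestamp string
  updated_at: string; // assumed ISO timestamp string
}

// Row shape mirroring the "UserRatings" table: one row per (username, rating) pair.
interface UserRatingRow {
  username: string;
  rating: number; // DECIMAL(2,1), e.g. 3.5
  count: number;
  created_at: string; // assumed ISO timestamp string
}

// Letterboxd ratings are half-star steps from 0.5 to 5.0.
function isValidRating(rating: number): boolean {
  return rating >= 0.5 && rating <= 5 && Number.isInteger(rating * 2);
}

// Display name with username fallback, as used by the rankings view.
function displayNameOf(user: Pick<UserRow, "lbusername" | "display_name">): string {
  const name = user.display_name ? user.display_name.trim() : "";
  return name.length > 0 ? name : user.lbusername;
}
```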
Running the Application
Development Mode
Option 1: Start Both (Concurrent - Recommended)
yarn dev
- Server runs on http://localhost:3001
- Client runs on http://localhost:5173 (or 5174 if 5173 is in use)
Option 2: Start Individually
Start the Server
yarn dev:server
Server runs on http://localhost:3001
Start the Client (new terminal)
yarn dev:client
Client runs on http://localhost:5173
Production Mode
Build and Start
# Build both server and client
yarn build
# Or build individually
yarn build:server
yarn build:client
# Start production server
yarn start
API Endpoints
Public Endpoints
Film User API (/api/film-users) - Database-First
- GET / - Get all users with display names
- GET /:username/ratings - Get user's ratings (database only)
- GET /:username/profile - Get user's profile (database only)
- GET /:username/complete - Get complete user data (database only)
- Add ?fallback=scrape to any endpoint to scrape if data is missing
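A client might build these paths with a small helper such as the one below. `filmUserPath` is a hypothetical name, not part of the project's service layer; the route segments and the `fallback=scrape` query parameter are the ones listed above.

```typescript
type FilmUserResource = "ratings" | "profile" | "complete";

// Hypothetical client helper: builds a Film User API path, optionally asking the
// server to scrape Letterboxd when the database has no data for the user.
function filmUserPath(
  username: string,
  resource: FilmUserResource,
  scrapeIfMissing: boolean = false,
): string {
  const base = `/api/film-users/${encodeURIComponent(username)}/${resource}`;
  return scrapeIfMissing ? `${base}?fallback=scrape` : base;
}
```

The resulting path can be passed to `fetch` directly in development, where the Vite proxy forwards /api requests to the server.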
Comparison API (/api/comparison)
- GET /usernames - Get list of users with display names
- POST /user-ratings - Get user's ratings and profile data
- POST /compare - Compare two users' data
- GET /hater-rankings - Get all users ranked by average rating
Protected Endpoints (Require Authentication)
Authentication (/api/auth)
- POST /signup - User registration
- POST /login - User login
- POST /logout - Session termination
- POST /password-reset - Password reset
Scraper API (/api/scraper) - Force-Scraping Only
- POST /getUserRatings - Force scrape user's rating distribution
- POST /getUserProfile - Force scrape complete profile + ratings
- POST /getAllFilms - Force scrape user's film list
- POST /getData - Generic scraping with custom selectors
- Note: Disabled in production unless ENABLE_SCRAPER=true
Cron API (/api/cron) - Automated Data Refresh
- POST /refresh-all-users - Refresh all users' data from Letterboxd
- POST /refresh-user/:username - Refresh specific user's data
- Authentication: Vercel Cron header or Bearer token
- Schedule: Daily at 2 AM (configurable in vercel.json)
User Management (/api/users)
- GET / - Get all users
- GET /me - Get current user profile
- GET /:id - Get specific user
- PUT /:id - Update user
- DELETE /:id - Delete user
Health Check
- GET /api/health - Server status
Usage
Public Access
- Visit http://localhost:5173/compare for user comparison
- Visit http://localhost:5173/hater-rankings for rankings
- No authentication required for public features
Authenticated Access
- Start the application (see Running section)
- Navigate to http://localhost:5173 (redirects to login)
- Sign up or log in to access the dashboard
- Use the Data Fetcher to scrape Letterboxd profiles
- View comparisons and rankings with your data
Testing Cron Job Endpoints
The cron endpoints allow automated data refresh for all users. Here's how to test them:
Prerequisites
- Make sure you have a CRON_SECRET in your .env file:
CRON_SECRET=your_random_secret_key
- Generate a secure secret (optional):
# Using OpenSSL (Mac/Linux)
openssl rand -base64 32
# Or using Node.js
node -e "console.log(require('crypto').randomBytes(32).toString('base64'))"
Testing in Development
1. Start the development server:
yarn dev:server
2. Test refresh all users:
curl -X POST http://localhost:3001/api/cron/refresh-all-users \
-H "Authorization: Bearer your_cron_secret" \
-H "Content-Type: application/json"
3. Test refresh specific user:
curl -X POST http://localhost:3001/api/cron/refresh-user/username \
-H "Authorization: Bearer your_cron_secret" \
-H "Content-Type: application/json"
Expected Response:
{
"message": "Refresh completed",
"duration": "45.23s",
"results": {
"totalUsers": 5,
"success": 5,
"failed": 0,
"skipped": 0,
"errors": []
}
}
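The response above can be typed on the client side. The sketch below mirrors the fields shown in the example; the names `RefreshResponse` and `isCleanRefresh` are hypothetical, the element type of `errors` is left as `unknown` because this README does not document it, and the check assumes that success, failed, and skipped together account for every user.

```typescript
// Shape of the refresh response, taken from the example above.
interface RefreshResponse {
  message: string;
  duration: string; // e.g. "45.23s"
  results: {
    totalUsers: number;
    success: number;
    failed: number;
    skipped: number;
    errors: unknown[]; // element shape is not documented here
  };
}

// Hypothetical check: true when no user failed and every user is accounted for.
function isCleanRefresh(response: RefreshResponse): boolean {
  const r = response.results;
  return (
    r.failed === 0 &&
    r.errors.length === 0 &&
    r.success + r.skipped === r.totalUsers
  );
}
```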
Testing in Production (Vercel)
1. Set CRON_SECRET in Vercel Dashboard:
- Go to your project → Settings → Environment Variables
- Add CRON_SECRET with a secure value
- Redeploy if needed
2. Test the production endpoint:
curl -X POST https://your-app.vercel.app/api/cron/refresh-all-users \
-H "Authorization: Bearer your_cron_secret" \
-H "Content-Type: application/json"
3. Check Vercel Logs:
- Go to your project dashboard
- Click Functions or Logs
- Look for /api/cron/refresh-all-users executions
- View detailed logs including:
- Start time and duration
- Success/failure counts
- Individual user refresh status
- Any error messages
Automatic Cron Execution (Production Only)
Once deployed to Vercel (Pro plan or higher):
- Cron runs automatically at the scheduled time (default: 2 AM UTC daily)
- No authentication needed - Vercel adds the x-vercel-cron-signature header automatically
- View cron status:
- Vercel Dashboard → Your Project → Settings → Cron Jobs
- See scheduled jobs, last run time, and execution history
Monitoring Cron Job Results
View detailed logs:
# In development - check terminal output
yarn dev:server
# In production - check Vercel logs:
# 1. Vercel Dashboard → Your Project → Functions
# 2. Filter by "/api/cron/refresh-all-users"
# 3. Click any execution to see full logs
Log output includes:
=== Starting scheduled refresh of all users ===
Found 5 users to refresh
[1/5] Refreshing data for: username1
[1/5] ✓ Successfully refreshed username1
[2/5] Refreshing data for: username2
...
=== Refresh Summary ===
Total users: 5
✓ Success: 5
✗ Failed: 0
Duration: 45.23s
Troubleshooting
401 Unauthorized Error:
- Check that CRON_SECRET matches in .env and your request
- Ensure the Authorization header is formatted correctly: Bearer your_secret
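When debugging a 401, it can help to see what a Bearer check typically looks like. The function below is a sketch under assumptions, not the project's middleware: `isAuthorizedCronRequest` is a hypothetical name, and it only covers the manual Bearer-token path, not the header Vercel adds for scheduled runs.

```typescript
// Hypothetical check for the "Authorization: Bearer <secret>" format described above.
function isAuthorizedCronRequest(
  authorizationHeader: string | undefined,
  cronSecret: string | undefined,
): boolean {
  // Reject when either the header or the configured secret is missing.
  if (!authorizationHeader || !cronSecret) {
    return false;
  }
  const parts = authorizationHeader.trim().split(/\s+/);
  if (parts.length !== 2) {
    return false; // expected exactly "<scheme> <token>"
  }
  const scheme = parts[0] ?? "";
  const token = parts[1] ?? "";
  // HTTP auth scheme names are case-insensitive; the token itself is not.
  return scheme.toLowerCase() === "bearer" && token === cronSecret;
}
```

A production implementation would usually compare the token with a constant-time comparison (for example Node's `crypto.timingSafeEqual`) rather than `===`.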
Timeout Errors:
- The Vercel Hobby plan has a 10-second timeout (too short to refresh many users)
- The Vercel Pro plan has a 300-second timeout (5 minutes)
- Consider reducing the number of users or optimizing scraping
Rate Limiting:
- The refresh includes 2-second delays between users to avoid being blocked
- If you have many users, the process will take time
- Monitor logs to see progress
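A rough way to judge whether a refresh fits inside a given timeout is to add up per-user scraping time and the 2-second pauses. The sketch below assumes one pause between consecutive users (so n - 1 pauses for n users) and takes the per-user scraping time as an input, since it varies by profile; both function names are hypothetical.

```typescript
const DELAY_BETWEEN_USERS_S = 2; // delay between users, as described above

// Estimated total duration in seconds for refreshing `userCount` users.
function estimateRefreshSeconds(userCount: number, perUserSeconds: number): number {
  if (userCount <= 0) {
    return 0;
  }
  return userCount * perUserSeconds + (userCount - 1) * DELAY_BETWEEN_USERS_S;
}

// Largest user count whose estimated duration stays within `timeoutSeconds`.
// Derived from n * p + (n - 1) * d <= T, i.e. n <= (T + d) / (p + d).
function maxUsersWithin(timeoutSeconds: number, perUserSeconds: number): number {
  const n = Math.floor(
    (timeoutSeconds + DELAY_BETWEEN_USERS_S) / (perUserSeconds + DELAY_BETWEEN_USERS_S),
  );
  return Math.max(0, n);
}
```

For example, at an assumed 7 seconds of scraping per user, `maxUsersWithin(300, 7)` gives the number of users that fit in the 300-second limit, and `maxUsersWithin(10, 7)` shows why the 10-second limit is too tight.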
Data Processing
Scraping Architecture
The scraping system is organized into modular, reusable components:
Core Scraping Functions (scraperFunctions.ts)
- Browser Management: Shared browser instances with automatic cleanup
- Page Creation: Optimized Puppeteer page setup with stealth measures
- User Profile Scraping: Extract display name, followers, following, lists
- Rating Scraping: Parse rating histogram from user profile
- Film Scraping: Multi-page film list extraction with progress tracking
- Memory Management: Aggressive cleanup and garbage collection
Utility Functions (utilities.ts)
- parseStarRating(): Self-contained star rating parser (works in browser context)
- detectLikedStatus(): Detect liked films from DOM structure
- parseNumberFromText(): Handle K/M suffixes (1.2K → 1200)
- validateUserProfile(): Verify page content and detect 404s
- Data Formatters: Consistent API response formatting
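As an illustration of what a self-contained star parser can look like, the sketch below converts a star string such as "★★★½" into a number. It is not the project's parseStarRating() and does not use STAR_PATTERNS; the name `parseStarsSketch` and the assumption that ratings appear as ★ and ½ glyphs are hypothetical.

```typescript
// Hypothetical parser: converts a star string such as "★★★½" into 3.5.
// Returns null when the text is not a rating between 0.5 and 5 stars.
function parseStarsSketch(text: string): number | null {
  const trimmed = text.trim();
  // Either 1-5 full stars, or 0-4 full stars followed by a half star.
  if (!/^(?:★{1,5}|★{0,4}½)$/.test(trimmed)) {
    return null;
  }
  const fullStars = (trimmed.match(/★/g) ?? []).length;
  const half = trimmed.endsWith("½") ? 0.5 : 0;
  return fullStars + half;
}
```

Because the function has no outside dependencies, the same logic could be passed into a browser context during scraping, which is the property the utilities list highlights.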
Configuration (constants.ts)
- LETTERBOXD_SELECTORS: CSS selectors for Letterboxd elements
- BROWSER_CONFIG: Timeouts, delays, and resource limits
- STAR_PATTERNS: Star rating patterns for parsing
- BLOCKED_RESOURCES: Resources to block for performance
Scraping Algorithm
- Input Validation: Verify Letterboxd username format
- Browser Launch: Shared Puppeteer browser with stealth configuration
- Page Navigation: Load user's Letterboxd profile with retry strategies
- Data Extraction: Parse HTML using CSS selectors (Cheerio or browser context)
- Data Validation: Ensure extracted data is valid and complete
- Database Storage: Upsert data with conflict resolution
- Cleanup: Close pages and manage memory
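The retry strategy mentioned in the navigation step can be sketched as a generic helper that re-runs an async step with a growing delay. This is an assumption about the general shape, not the code in scraperFunctions.ts; `withRetry`, its default of 3 attempts, and the linear 1-second backoff are all hypothetical.

```typescript
// Hypothetical helper: runs `step` up to `maxAttempts` times, waiting longer after each failure.
async function withRetry<T>(
  step: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(resolve, ms);
    }),
): Promise<T> {
  let lastError: unknown = new Error("withRetry: no attempts were made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await step();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt; the delay grows linearly with the attempt number.
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * attempt);
      }
    }
  }
  throw lastError;
}
```

A page-navigation step would be wrapped as `withRetry(() => loadProfile(username))`, where `loadProfile` stands in for whatever function performs the navigation.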
Rating Calculations
- Average Rating: Σ(rating × count) / Σ(count)
- Percentage Distribution: (count / total) × 100
- Hater Rankings: Sort by ascending average rating
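The three formulas above translate directly into code. The sketch below works on (rating, count) pairs like the rows of the UserRatings table; the type and function names are hypothetical, and leaving users without ratings out of the ranking is an assumption.

```typescript
interface RatingBucket {
  rating: number; // e.g. 3.5
  count: number; // number of films given that rating
}

// Average rating: sum(rating * count) / sum(count). Returns null when there are no ratings.
function averageRating(buckets: RatingBucket[]): number | null {
  const total = buckets.reduce((sum, b) => sum + b.count, 0);
  if (total === 0) {
    return null;
  }
  const weighted = buckets.reduce((sum, b) => sum + b.rating * b.count, 0);
  return weighted / total;
}

// Percentage distribution: (count / total) * 100 for each bucket.
function percentageDistribution(buckets: RatingBucket[]): { rating: number; percent: number }[] {
  const total = buckets.reduce((sum, b) => sum + b.count, 0);
  return buckets.map((b) => ({
    rating: b.rating,
    percent: total === 0 ? 0 : (b.count / total) * 100,
  }));
}

// Hater rankings: ascending average rating, so the lowest average comes first.
function rankHaters(
  users: { username: string; buckets: RatingBucket[] }[],
): { username: string; average: number }[] {
  const ranked: { username: string; average: number }[] = [];
  for (const user of users) {
    const average = averageRating(user.buckets);
    if (average !== null) {
      ranked.push({ username: user.username, average }); // users without ratings are skipped
    }
  }
  return ranked.sort((a, b) => a.average - b.average);
}
```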
Number Parsing
- Handles abbreviated formats: "1.2K" → 1200, "2.5M" → 2500000
- Removes commas and normalizes text
- Graceful fallback to 0 for invalid data
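The parsing rules above can be sketched as follows. This is an illustration of the described behavior, not the project's parseNumberFromText(); the name `parseCount` is hypothetical, and rounding to a whole number is an assumption made because the values are follower and list counts.

```typescript
// Hypothetical parser for counts such as "1,234", "1.2K" or "2.5M".
// Falls back to 0 when the text is not a recognizable number.
function parseCount(text: string): number {
  // Remove thousands separators and normalize whitespace and case.
  const cleaned = text.replace(/,/g, "").trim().toUpperCase();
  const match = /^(\d+(?:\.\d+)?)([KM])?$/.exec(cleaned);
  if (!match) {
    return 0;
  }
  const value = Number(match[1]);
  const suffix = match[2];
  const multiplier = suffix === "K" ? 1000 : suffix === "M" ? 1000000 : 1;
  // Round to avoid floating-point residue after multiplication.
  return Math.round(value * multiplier);
}
```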
Development
Available Scripts
Root Scripts
- yarn dev - Start both server and client concurrently
- yarn build - Build both server and client
- yarn build:server - Build server only
- yarn build:client - Build client only
- yarn start - Start production server
- yarn install:all - Install all dependencies
- yarn clean - Clean build directories
Server Scripts (run from src/server/)
- yarn dev - Development server with hot reload
- yarn build - Build TypeScript to JavaScript
- yarn start - Start production server
- yarn watch - Build in watch mode
- yarn clean - Clean build directory
Client Scripts (run from src/client/)
- yarn dev - Vite development server with hot reload
- yarn build - Build for production with Vite
- yarn preview - Preview production build locally
Code Organization
Backend Architecture
- Controllers: Handle business logic and HTTP request/response
- scraperController.ts: Force-scraping endpoints (module exports pattern)
- filmUserController.ts: Database-first operations with fallback
- dataController.ts: Database CRUD operations
- comparisonController.ts: User comparison logic
- Core Modules:
- scraperFunctions.ts: Browser automation, page management, scraping logic
- utilities.ts: Helper functions (parsers, validators, formatters)
- constants.ts: Configuration constants (selectors, timeouts, patterns)
- Routes: Define API endpoints and middleware
- Middleware: Authentication, validation, error handling
- Types: Shared TypeScript interfaces
Frontend Architecture
- Components: Reusable React components
- Services: API client and utility functions
- Types: Client-specific TypeScript definitions
Technologies Used
Backend
- Express.js - Web framework
- TypeScript - Type safety and modular architecture
- Supabase - PostgreSQL database and authentication
- Puppeteer - Browser automation for web scraping
- Cheerio - Server-side HTML parsing
- JWT - Authentication tokens
- Helmet - Security headers
- Express Rate Limit - API rate limiting
- @sparticuz/chromium - Serverless Chrome for Vercel deployment
Frontend
- Vite - Fast build tool and dev server
- React 18 - UI framework
- TypeScript - Type safety
- Tailwind CSS - Utility-first styling
- React Router - Client-side routing
- Heroicons - Icon library
Database
- PostgreSQL (via Supabase) - Primary database
- Row Level Security - Data access control
- Real-time subscriptions - Live data updates
Error Handling
Backend
- Global error middleware with structured responses
- Retry mechanisms for scraping failures
- Graceful degradation for external service issues
- Detailed logging for debugging
Frontend
- Error boundaries for React component errors
- User-friendly error messages
- Loading states and retry options
- Fallback UI for missing data
Performance Optimizations
- Database: Efficient indexes and parallel queries
- Frontend: Component memoization and lazy loading
- Scraping: Resource blocking and browser reuse
- API: Rate limiting and response caching
- Modular Architecture: Separated scraping logic into reusable functions
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests if applicable
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Deployment to Vercel
Full-Stack Deployment Configuration
This project is configured for Vercel deployment with both frontend and backend. Here's the complete setup:
Project Structure for Deployment
project/
├── package.json # Root orchestration
├── vercel.json # Deployment configuration
├── tsconfig.json # Root TypeScript config
└── src/
├── client/ # Frontend (React/Vite)
│ ├── package.json
│ ├── vite.config.js
│ └── build/ # Generated by Vite
└── server/ # Backend (Express/Node)
├── package.json
├── tsconfig.json # Server-specific config
└── dist/ # Generated by TypeScript
Critical Configuration Files
1. Root package.json Build Script
{
"scripts": {
"build": "cd src/server && tsc"
}
}
Important: Root build script should ONLY build the server. Vercel handles client building separately.
2. Server tsconfig.json (src/server/tsconfig.json)
{
"compilerOptions": {
"outDir": "./dist",
"rootDir": "./"
// ... other options
},
"include": ["./**/*"],
"exclude": ["node_modules", "dist", "**/*.test.ts"]
}
Critical: Server needs its own tsconfig.json to prevent compiling client files.
Route Configuration Explained
Routes are processed in order:
- API Routes (/api/(.*)) → Server function
- Asset Routes (/assets/(.*)) → Static files
- Specific Static Files → favicon, manifest, etc.
- Catch-All (/(.*)) → React app (SPA routing)
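The reason order matters is that the first matching pattern wins. The sketch below models that first-match behavior in TypeScript; it is not the contents of vercel.json, the target labels are descriptive placeholders, and the file names favicon.ico and manifest.json are assumptions standing in for the "specific static files".

```typescript
interface RouteRule {
  pattern: RegExp;
  target: string;
}

// Rules are checked top to bottom; the catch-all must come last.
const routeRules: RouteRule[] = [
  { pattern: /^\/api\/(.*)$/, target: "server function" },
  { pattern: /^\/assets\/(.*)$/, target: "static asset" },
  { pattern: /^\/(favicon\.ico|manifest\.json)$/, target: "static file" }, // assumed file names
  { pattern: /^\/(.*)$/, target: "React app (index.html)" },
];

// Returns the target of the first rule whose pattern matches the path.
function resolveRoute(path: string): string | null {
  for (const rule of routeRules) {
    if (rule.pattern.test(path)) {
      return rule.target;
    }
  }
  return null;
}
```

If the catch-all were listed first, every request, including /api and /assets paths, would resolve to the React app, which is the failure mode described under the static asset problem in the next section.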
Common Deployment Issues and Solutions
Problem: 404 Errors on Deployment
- Cause: Routes pointing to wrong file locations
- Solution: Ensure routes match actual build output structure (/src/client/index.html)
Problem: Static Assets Getting 401/404 Errors
- Cause: All requests routed to index.html instead of serving static files
- Solution: Add specific routes for assets before catch-all route
Problem: Build Conflicts
- Cause: Multiple vercel.json files or incorrect TypeScript compilation scope
- Solution:
- Remove any vercel.json files from subdirectories
- Ensure server has its own tsconfig.json
- Root build script should only build server
Problem: Mysterious Build Artifacts
- Cause: Root build script building both client and server
- Solution: Let Vercel handle client build via @vercel/static-build
Deployment Process
- Root Build: yarn build → Compiles server TypeScript only
- Client Build: Vercel runs @vercel/static-build → Builds React app separately
- Deployment: Routes configured to serve from correct locations
Environment Variables for Production
Set these in your Vercel dashboard:
SUPABASE_URL=your_supabase_url
SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
NODE_ENV=production
ENABLE_SCRAPER=true # Optional: Enable scraping in production
CRON_SECRET=your_random_secret_key # For manual cron triggers (optional)
Verification Steps
- Check build logs: No conflicting configurations mentioned
- Verify file structure: Static files in expected locations
- Test routes: API calls work, static assets load, SPA routing works
- No build artifacts: Clean deployment output
This configuration enables true full-stack deployment where both frontend and backend are deployed from a single repository while maintaining proper separation and build processes.
Documentation
- CLAUDE.md - Comprehensive technical documentation
- DEPLOYMENT.md - Production deployment guide
License
MIT License - see LICENSE file for details