Cost Optimization Principles
February 4, 2026 · View on GitHub
Universal patterns for building cost-effective systems
1. Cost-First Infrastructure Design
Rule: Perform cost analysis during planning, before implementation.
Why:
- Prevents expensive surprises
- Catches expensive patterns early
Process:
1. Estimate monthly costs per component
2. Compare cost vs value
3. Document cost implications
4. Get approval for costs above threshold
THEN implement
Example:
# ❌ Bad - deploy first, discover costs later
Database:
DBInstanceClass: db.r5.24xlarge # \$13,000/month - no review
# ✅ Good - cost analysis before deployment
# COST ANALYSIS:
# - Dev: db.t3.micro @ \$15/month
# - Prod: db.t3.small @ \$30/month (start small)
# - Scale trigger: 80% CPU sustained > 1 hour
# - Annual: \$360-\$600
Database:
DBInstanceClass: !If [IsProduction, db.t3.small, db.t3.micro]
MultiAZ: !If [IsProduction, true, false]
2. Pay-Per-Use Default
Rule: Default to pay-per-use. Add fixed costs only when measured need exists.
Why:
- No monthly bills when not used
- Costs grow with usage/revenue
Strategy:
Start with pay-per-use
Monitor actual usage
IF usage consistent & high for 3+ months
THEN consider reserved capacity
Examples:
# ❌ Bad - always-on EC2 for batch jobs
EC2Instance:
InstanceType: t3.large # \$60/month even if used 1 hour/day
# ✅ Good - Lambda pay-per-invocation
BatchFunction:
Runtime: python3.11
# Cost: \$0.20/hour when running → \$6/month for 1 hour/day
# ❌ Bad - provisioned RDS always on
Database:
DBInstanceClass: db.t3.small # \$30/month always
# ✅ Good - Aurora Serverless scales to zero
Database:
Engine: aurora-postgresql
ServerlessV2ScalingConfiguration:
MinCapacity: 0.5 # Scales down when idle
# Cost: ~\$40/month prod, ~\$5/month dev
Switch to reserved when:
- Running 24/7 for 30+ days
- Pay-per-use > reserved for 3+ months
- Usage predictable
3. Environment-Appropriate Sizing
Rule: Don't use production resources for development.
Why: Dev/staging can be 90% cheaper.
Strategy:
Development: Smallest tier that works, single-AZ, < \$50/month
Staging: Small tier, production data subset
Production: Sized for actual load + headroom, multi-AZ
Example:
# ❌ Bad - same size everywhere
Database:
DBInstanceClass: db.r5.xlarge # \$365/month × 3 envs = \$1,095/month
# ✅ Good - environment-appropriate
Database:
DBInstanceClass: !If [IsProd, db.r5.xlarge, !If [IsStaging, db.t3.small, db.t3.micro]]
MultiAZ: !If [IsProd, true, false]
# Dev: \$15/month, Staging: \$30/month, Prod: \$365/month
# Total: \$410/month (62% savings)
4. Cost Analysis Gates
Rule: Validate costs before deployment. Require approval above threshold.
Why: Catches expensive mistakes before production.
Process:
estimate_monthly_cost()
IF cost > threshold
THEN require_approval() + document_justification()
validate_environment_sizing()
THEN deploy
Example CI/CD gate:
# GitHub Actions
cost-check:
steps:
- run: infracost breakdown --path . > costs.json
- run: |
monthly_cost=$(jq '.totalMonthlyCost' costs.json)
if (( $(echo "$monthly_cost > 100" | bc -l) )); then
echo "❌ Cost $monthly_cost exceeds threshold"
exit 1
fi
Thresholds:
- < $50/month: Auto-approve
- $50-200: Team lead approval
- $200-1000: Manager approval
-
$1000: Director + business justification
5. Right-Size for Actual Load
Rule: Size based on measured load, not guesses. Monitor and adjust continuously.
Why: Avoid over/under-provisioning, adapt to change.
Process:
Monitor actual usage
IF consistently < 20% utilized THEN downsize
IF consistently > 80% utilized THEN upsize
Document rationale
Example:
Initial: db.t3.small (\$30/month)
After 30 days:
- CPU: 15% avg, 45% peak
- Storage: 3GB used / 20GB allocated
→ Action: Downsize to db.t3.micro (\$15/month, 50% savings)
After 6 months:
- CPU: 65% avg, 92% peak
- Performance degrading during peaks
→ Action: Upsize to db.t3.medium (\$60/month, needed for growth)
Monitor:
- Utilization (CPU, memory, storage)
- Performance (latency, throughput)
- Cost trends
Review frequency:
- New resources: Weekly for first month
- Established: Monthly
- Large costs (>$500/mo): Weekly
6. Eliminate Waste
Rule: Actively find and eliminate wasteful spending.
Why: Hidden waste accumulates quickly.
Common waste sources:
# 1. Orphaned resources (unattached EBS volumes)
aws ec2 describe-volumes --filters Name=status,Values=available
# Cost: \$0.10/GB/month (50GB = \$5/month wasted)
# 2. Over-provisioned (instances with <10% CPU)
# → Downsize or consider serverless
# 3. Forgotten test resources
# → Tag with auto-expiration: AutoDelete: "2025-10-30"
# 4. Excessive data transfer
# → Add CDN/caching, reduce cross-region calls
Monthly review checklist:
- Review all resources > $10/month
- Check for orphaned/unused resources
- Verify utilization (< 20% = candidate for downsizing)
- Review data transfer costs
- Check for forgotten test/dev resources
Summary
- Cost-First Infrastructure Design → Know costs before commit
- Pay-Per-Use Default → Start variable, go fixed when proven
- Environment-Appropriate Sizing → Don't use prod resources for dev
- Cost Analysis Gates → Prevent expensive mistakes
- Right-Size for Actual Load → Monitor and adjust
- Eliminate Waste → Actively remove unnecessary costs
Questions? Contact info@happyhippo.ai