Chapter 3: Database Design Principles
September 28, 2025 · View on GitHub
The Importance of Good Design
Database design determines how efficiently your application stores and retrieves data. Poor design leads to:
- Data redundancy and inconsistencies
- Difficult maintenance and updates
- Poor performance at scale
- Complex queries for simple operations
Good design provides:
- Data integrity and consistency
- Efficient storage and retrieval
- Scalability and flexibility
- Clear relationships between entities
Entity-Relationship Modeling
Before creating tables, model your data conceptually:
Entities: Things you want to store (customers, products, orders) Attributes: Properties of entities (customer name, product price) Relationships: How entities connect (customers place orders)
Relationship Types
One-to-One: Each row in Table A relates to exactly one row in Table B
- User → UserProfile
- Employee → ParkingSpace
One-to-Many: Each row in Table A relates to zero or more rows in Table B
- Customer → Orders
- Category → Products
Many-to-Many: Rows in Table A relate to multiple rows in Table B and vice versa
- Students ↔ Courses
- Products ↔ Tags
Normalization
Normalization eliminates redundancy and ensures data integrity through progressive rules:
First Normal Form (1NF)
- Each column contains atomic (indivisible) values
- Each column contains values of a single type
- Each column has a unique name
- Order of rows doesn't matter
Violation:
-- Bad: Multiple phone numbers in one field
CREATE TABLE customers_bad (
id INT PRIMARY KEY,
name VARCHAR(100),
phones VARCHAR(255) -- "555-0001, 555-0002"
);
Fixed:
-- Good: Separate table for phones
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE customer_phones (
id INT PRIMARY KEY,
customer_id INT,
phone VARCHAR(20),
phone_type ENUM('home', 'work', 'mobile'),
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
Second Normal Form (2NF)
- Meets 1NF requirements
- All non-key columns depend on the entire primary key
Violation:
-- Bad: Product name depends only on product_id, not full composite key
CREATE TABLE order_items_bad (
order_id INT,
product_id INT,
product_name VARCHAR(100), -- Depends only on product_id!
quantity INT,
PRIMARY KEY (order_id, product_id)
);
Fixed:
-- Good: Product details in separate table
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
);
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (product_id) REFERENCES products(id)
);
Third Normal Form (3NF)
- Meets 2NF requirements
- No transitive dependencies (non-key columns depending on other non-key columns)
Violation:
-- Bad: City and state depend on zip_code, not customer_id
CREATE TABLE customers_bad (
id INT PRIMARY KEY,
name VARCHAR(100),
zip_code VARCHAR(10),
city VARCHAR(50), -- Depends on zip_code!
state VARCHAR(2) -- Depends on zip_code!
);
Fixed:
-- Good: Separate table for zip code data
CREATE TABLE zip_codes (
zip_code VARCHAR(10) PRIMARY KEY,
city VARCHAR(50),
state VARCHAR(2)
);
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
zip_code VARCHAR(10),
FOREIGN KEY (zip_code) REFERENCES zip_codes(zip_code)
);
Primary Keys
Every table needs a primary key to uniquely identify rows:
Natural vs Surrogate Keys
Natural Key: Meaningful data that uniquely identifies a row
-- Email as natural key
CREATE TABLE users (
email VARCHAR(100) PRIMARY KEY,
name VARCHAR(100)
);
Surrogate Key: Artificial identifier with no business meaning
-- Auto-increment ID as surrogate key
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(100) UNIQUE,
name VARCHAR(100)
);
Surrogate keys are usually preferred because:
- Natural keys can change (users change email addresses)
- Surrogate keys are more efficient (integers vs strings)
- They simplify foreign key relationships
Composite Keys
Sometimes multiple columns together form the primary key:
CREATE TABLE course_enrollments (
student_id INT,
course_id INT,
enrollment_date DATE,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id)
);
Foreign Keys and Referential Integrity
Foreign keys enforce relationships between tables:
CREATE TABLE orders (
id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
order_date DATETIME DEFAULT CURRENT_TIMESTAMP,
total DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
Cascade Options
Control what happens when referenced rows are updated or deleted:
CREATE TABLE order_items (
id INT AUTO_INCREMENT PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT,
FOREIGN KEY (order_id)
REFERENCES orders(id)
ON DELETE CASCADE -- Delete items when order deleted
ON UPDATE CASCADE, -- Update if order ID changes
FOREIGN KEY (product_id)
REFERENCES products(id)
ON DELETE RESTRICT -- Prevent product deletion if ordered
);
Options:
CASCADE: Propagate the changeSET NULL: Set foreign key to NULLRESTRICT/NO ACTION: Prevent the operationSET DEFAULT: Set to default value (not supported in MySQL)
Indexes
Indexes speed up data retrieval at the cost of storage space and write performance:
When to Create Indexes
Create indexes on columns that are:
- Frequently used in WHERE clauses
- Used in JOIN conditions
- Used in ORDER BY clauses
- Part of unique constraints
-- Single column index
CREATE INDEX idx_email ON customers(email);
-- Composite index (order matters!)
CREATE INDEX idx_name ON customers(last_name, first_name);
-- Unique index
CREATE UNIQUE INDEX idx_unique_email ON customers(email);
Index Types
B-Tree (default): Good for range queries and sorting Hash: Fast for equality comparisons (MEMORY engine only) Full-text: For text searching
-- Full-text index for searching
CREATE FULLTEXT INDEX idx_description ON products(description);
-- Use with MATCH...AGAINST
SELECT * FROM products
WHERE MATCH(description) AGAINST('organic coffee');
Denormalization
Sometimes controlled redundancy improves performance:
When to Denormalize
- Read-heavy workloads
- Complex queries requiring many joins
- Reporting and analytics
Denormalization Techniques
Redundant Columns:
-- Store calculated values
ALTER TABLE orders
ADD COLUMN item_count INT,
ADD COLUMN total_amount DECIMAL(10,2);
-- Update via trigger or application logic
Materialized Views (simulated in MySQL):
-- Summary table updated periodically
CREATE TABLE daily_sales_summary (
date DATE PRIMARY KEY,
total_sales DECIMAL(10,2),
order_count INT,
unique_customers INT
);
-- Refresh via scheduled job
INSERT INTO daily_sales_summary
SELECT
DATE(order_date),
SUM(total),
COUNT(*),
COUNT(DISTINCT customer_id)
FROM orders
WHERE DATE(order_date) = CURDATE() - INTERVAL 1 DAY
GROUP BY DATE(order_date)
ON DUPLICATE KEY UPDATE
total_sales = VALUES(total_sales),
order_count = VALUES(order_count),
unique_customers = VALUES(unique_customers);
Common Design Patterns
Hierarchical Data
Adjacency List (Simple parent-child):
CREATE TABLE categories (
id INT PRIMARY KEY,
name VARCHAR(100),
parent_id INT,
FOREIGN KEY (parent_id) REFERENCES categories(id)
);
Path Enumeration:
CREATE TABLE categories (
id INT PRIMARY KEY,
name VARCHAR(100),
path VARCHAR(255) -- "1/3/7/15"
);
Temporal Data
Track changes over time:
CREATE TABLE product_prices (
id INT AUTO_INCREMENT PRIMARY KEY,
product_id INT,
price DECIMAL(10,2),
valid_from DATE,
valid_to DATE,
INDEX idx_validity (product_id, valid_from, valid_to)
);
Many-to-Many Relationships
Use junction tables:
CREATE TABLE students (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE courses (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrolled_date DATE,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(id),
FOREIGN KEY (course_id) REFERENCES courses(id)
);
Real-World Example: E-commerce Database
-- Customers
CREATE TABLE customers (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(100) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
first_name VARCHAR(50),
last_name VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_email (email)
);
-- Products
CREATE TABLE products (
id INT AUTO_INCREMENT PRIMARY KEY,
sku VARCHAR(50) UNIQUE NOT NULL,
name VARCHAR(100) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL,
stock_quantity INT DEFAULT 0,
category_id INT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_sku (sku),
INDEX idx_category (category_id),
FULLTEXT idx_search (name, description)
);
-- Orders
CREATE TABLE orders (
id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
order_number VARCHAR(20) UNIQUE NOT NULL,
status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled'),
total_amount DECIMAL(10,2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (customer_id) REFERENCES customers(id),
INDEX idx_customer (customer_id),
INDEX idx_status (status),
INDEX idx_created (created_at)
);
-- Order Items
CREATE TABLE order_items (
id INT AUTO_INCREMENT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
price DECIMAL(10,2) NOT NULL, -- Store historical price
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,
FOREIGN KEY (product_id) REFERENCES products(id),
INDEX idx_order (order_id)
);
-- Shopping Cart
CREATE TABLE cart_items (
customer_id INT,
product_id INT,
quantity INT NOT NULL,
added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (customer_id, product_id),
FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE CASCADE,
FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE CASCADE
);
Best Practices
- Start with normalization, denormalize only when necessary
- Use appropriate data types - don't use VARCHAR(255) for everything
- Add constraints to enforce business rules
- Index foreign keys for better join performance
- Document your schema with comments
- Use consistent naming conventions
- Plan for growth but don't over-engineer
- Regular maintenance - analyze and optimize tables
Summary
Good database design is the foundation of efficient applications. Normalization reduces redundancy, foreign keys maintain integrity, and indexes improve performance. Understanding when to apply or break these rules comes with experience.