Chapter 3: Database Design Principles

September 28, 2025 · View on GitHub

The Importance of Good Design

Database design determines how efficiently your application stores and retrieves data. Poor design leads to:

Data redundancy and inconsistencies
Difficult maintenance and updates
Poor performance at scale
Complex queries for simple operations

Good design provides:

Data integrity and consistency
Efficient storage and retrieval
Scalability and flexibility
Clear relationships between entities

Entity-Relationship Modeling

Before creating tables, model your data conceptually:

Entities: Things you want to store (customers, products, orders) Attributes: Properties of entities (customer name, product price) Relationships: How entities connect (customers place orders)

Relationship Types

One-to-One: Each row in Table A relates to exactly one row in Table B

User → UserProfile
Employee → ParkingSpace

One-to-Many: Each row in Table A relates to zero or more rows in Table B

Customer → Orders
Category → Products

Many-to-Many: Rows in Table A relate to multiple rows in Table B and vice versa

Students ↔ Courses
Products ↔ Tags

Normalization

Normalization eliminates redundancy and ensures data integrity through progressive rules:

First Normal Form (1NF)

Each column contains atomic (indivisible) values
Each column contains values of a single type
Each column has a unique name
Order of rows doesn't matter

Violation:

-- Bad: Multiple phone numbers in one field
CREATE TABLE customers_bad (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    phones VARCHAR(255)  -- "555-0001, 555-0002"
);

Fixed:

-- Good: Separate table for phones
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE customer_phones (
    id INT PRIMARY KEY,
    customer_id INT,
    phone VARCHAR(20),
    phone_type ENUM('home', 'work', 'mobile'),
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);

Second Normal Form (2NF)

Meets 1NF requirements
All non-key columns depend on the entire primary key

Violation:

-- Bad: Product name depends only on product_id, not full composite key
CREATE TABLE order_items_bad (
    order_id INT,
    product_id INT,
    product_name VARCHAR(100),  -- Depends only on product_id!
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

Fixed:

-- Good: Product details in separate table
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);

Third Normal Form (3NF)

Meets 2NF requirements
No transitive dependencies (non-key columns depending on other non-key columns)

Violation:

-- Bad: City and state depend on zip_code, not customer_id
CREATE TABLE customers_bad (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    zip_code VARCHAR(10),
    city VARCHAR(50),       -- Depends on zip_code!
    state VARCHAR(2)        -- Depends on zip_code!
);

Fixed:

-- Good: Separate table for zip code data
CREATE TABLE zip_codes (
    zip_code VARCHAR(10) PRIMARY KEY,
    city VARCHAR(50),
    state VARCHAR(2)
);

CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    zip_code VARCHAR(10),
    FOREIGN KEY (zip_code) REFERENCES zip_codes(zip_code)
);

Primary Keys

Every table needs a primary key to uniquely identify rows:

Natural vs Surrogate Keys

Natural Key: Meaningful data that uniquely identifies a row

-- Email as natural key
CREATE TABLE users (
    email VARCHAR(100) PRIMARY KEY,
    name VARCHAR(100)
);

Surrogate Key: Artificial identifier with no business meaning

-- Auto-increment ID as surrogate key
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) UNIQUE,
    name VARCHAR(100)
);

Surrogate keys are usually preferred because:

Natural keys can change (users change email addresses)
Surrogate keys are more efficient (integers vs strings)
They simplify foreign key relationships

Composite Keys

Sometimes multiple columns together form the primary key:

CREATE TABLE course_enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id)
);

Foreign Keys and Referential Integrity

Foreign keys enforce relationships between tables:

CREATE TABLE orders (
    id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    total DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);

Cascade Options

Control what happens when referenced rows are updated or deleted:

CREATE TABLE order_items (
    id INT AUTO_INCREMENT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    FOREIGN KEY (order_id)
        REFERENCES orders(id)
        ON DELETE CASCADE      -- Delete items when order deleted
        ON UPDATE CASCADE,     -- Update if order ID changes
    FOREIGN KEY (product_id)
        REFERENCES products(id)
        ON DELETE RESTRICT     -- Prevent product deletion if ordered
);

Options:

CASCADE: Propagate the change
SET NULL: Set foreign key to NULL
RESTRICT / NO ACTION: Prevent the operation
SET DEFAULT: Set to default value (not supported in MySQL)

Indexes

Indexes speed up data retrieval at the cost of storage space and write performance:

When to Create Indexes

Create indexes on columns that are:

Frequently used in WHERE clauses
Used in JOIN conditions
Used in ORDER BY clauses
Part of unique constraints

-- Single column index
CREATE INDEX idx_email ON customers(email);

-- Composite index (order matters!)
CREATE INDEX idx_name ON customers(last_name, first_name);

-- Unique index
CREATE UNIQUE INDEX idx_unique_email ON customers(email);

Index Types

B-Tree (default): Good for range queries and sorting Hash: Fast for equality comparisons (MEMORY engine only) Full-text: For text searching

-- Full-text index for searching
CREATE FULLTEXT INDEX idx_description ON products(description);

-- Use with MATCH...AGAINST
SELECT * FROM products
WHERE MATCH(description) AGAINST('organic coffee');

Denormalization

Sometimes controlled redundancy improves performance:

When to Denormalize

Read-heavy workloads
Complex queries requiring many joins
Reporting and analytics

Denormalization Techniques

Redundant Columns:

-- Store calculated values
ALTER TABLE orders
ADD COLUMN item_count INT,
ADD COLUMN total_amount DECIMAL(10,2);

-- Update via trigger or application logic

Materialized Views (simulated in MySQL):

-- Summary table updated periodically
CREATE TABLE daily_sales_summary (
    date DATE PRIMARY KEY,
    total_sales DECIMAL(10,2),
    order_count INT,
    unique_customers INT
);

-- Refresh via scheduled job
INSERT INTO daily_sales_summary
SELECT
    DATE(order_date),
    SUM(total),
    COUNT(*),
    COUNT(DISTINCT customer_id)
FROM orders
WHERE DATE(order_date) = CURDATE() - INTERVAL 1 DAY
GROUP BY DATE(order_date)
ON DUPLICATE KEY UPDATE
    total_sales = VALUES(total_sales),
    order_count = VALUES(order_count),
    unique_customers = VALUES(unique_customers);

Common Design Patterns

Hierarchical Data

Adjacency List (Simple parent-child):

CREATE TABLE categories (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    parent_id INT,
    FOREIGN KEY (parent_id) REFERENCES categories(id)
);

Path Enumeration:

CREATE TABLE categories (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    path VARCHAR(255)  -- "1/3/7/15"
);

Temporal Data

Track changes over time:

CREATE TABLE product_prices (
    id INT AUTO_INCREMENT PRIMARY KEY,
    product_id INT,
    price DECIMAL(10,2),
    valid_from DATE,
    valid_to DATE,
    INDEX idx_validity (product_id, valid_from, valid_to)
);

Many-to-Many Relationships

Use junction tables:

CREATE TABLE students (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE courses (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    enrolled_date DATE,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(id),
    FOREIGN KEY (course_id) REFERENCES courses(id)
);

Real-World Example: E-commerce Database

-- Customers
CREATE TABLE customers (
    id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_email (email)
);

-- Products
CREATE TABLE products (
    id INT AUTO_INCREMENT PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    price DECIMAL(10,2) NOT NULL,
    stock_quantity INT DEFAULT 0,
    category_id INT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_sku (sku),
    INDEX idx_category (category_id),
    FULLTEXT idx_search (name, description)
);

-- Orders
CREATE TABLE orders (
    id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_number VARCHAR(20) UNIQUE NOT NULL,
    status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled'),
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (customer_id) REFERENCES customers(id),
    INDEX idx_customer (customer_id),
    INDEX idx_status (status),
    INDEX idx_created (created_at)
);

-- Order Items
CREATE TABLE order_items (
    id INT AUTO_INCREMENT PRIMARY KEY,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    price DECIMAL(10,2) NOT NULL,  -- Store historical price
    FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(id),
    INDEX idx_order (order_id)
);

-- Shopping Cart
CREATE TABLE cart_items (
    customer_id INT,
    product_id INT,
    quantity INT NOT NULL,
    added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (customer_id, product_id),
    FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE CASCADE
);

Best Practices

Start with normalization, denormalize only when necessary
Use appropriate data types - don't use VARCHAR(255) for everything
Add constraints to enforce business rules
Index foreign keys for better join performance
Document your schema with comments
Use consistent naming conventions
Plan for growth but don't over-engineer
Regular maintenance - analyze and optimize tables

Summary

Good database design is the foundation of efficient applications. Normalization reduces redundancy, foreign keys maintain integrity, and indexes improve performance. Understanding when to apply or break these rules comes with experience.

Next: Chapter 4: Advanced Queries and Joins