Chapter 2: SQL Fundamentals - The Language of Databases

September 28, 2025 · View on GitHub

Understanding SQL

SQL (Structured Query Language) is the standard language for interacting with relational databases. It's declarative – you describe what you want, not how to get it. This distinction is crucial: while programming languages require you to specify steps, SQL lets the database optimize the execution.

SQL consists of several sublanguages:

  • DDL (Data Definition Language): CREATE, ALTER, DROP
  • DML (Data Manipulation Language): SELECT, INSERT, UPDATE, DELETE
  • DCL (Data Control Language): GRANT, REVOKE
  • TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT

SELECT: The Heart of SQL

The SELECT statement retrieves data. Its basic structure:

SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column;

Column Selection

Select specific columns:

SELECT first_name, last_name FROM customers;

Select all columns (use sparingly in production):

SELECT * FROM customers;

Use aliases for readability:

SELECT
    first_name AS fname,
    last_name AS lname,
    CONCAT(first_name, ' ', last_name) AS full_name
FROM customers;

The WHERE Clause

WHERE filters rows based on conditions:

-- Equality
SELECT * FROM products WHERE category = 'coffee';

-- Inequality
SELECT * FROM products WHERE price != 10.00;
SELECT * FROM products WHERE price <> 10.00;  -- Alternative syntax

-- Comparison
SELECT * FROM products WHERE price > 5.00;
SELECT * FROM products WHERE price <= 10.00;

-- Range
SELECT * FROM products WHERE price BETWEEN 3.00 AND 8.00;

-- List membership
SELECT * FROM products WHERE category IN ('coffee', 'tea');

-- Pattern matching
SELECT * FROM customers WHERE email LIKE '%gmail.com';
SELECT * FROM customers WHERE last_name LIKE 'Sm_th';  -- _ matches single character

-- NULL checking
SELECT * FROM customers WHERE phone IS NULL;
SELECT * FROM customers WHERE phone IS NOT NULL;

Logical Operators

Combine conditions with AND, OR, NOT:

-- AND: All conditions must be true
SELECT * FROM products
WHERE category = 'coffee' AND price < 5.00;

-- OR: At least one condition must be true
SELECT * FROM products
WHERE category = 'coffee' OR category = 'tea';

-- NOT: Negates a condition
SELECT * FROM products
WHERE NOT category = 'pastry';

-- Complex combinations (use parentheses for clarity)
SELECT * FROM products
WHERE (category = 'coffee' OR category = 'tea')
  AND price < 3.00;

Sorting with ORDER BY

-- Ascending (default)
SELECT * FROM customers ORDER BY last_name;
SELECT * FROM customers ORDER BY last_name ASC;

-- Descending
SELECT * FROM products ORDER BY price DESC;

-- Multiple columns
SELECT * FROM customers
ORDER BY last_name, first_name;

-- Sort by expression
SELECT *, (price * 1.1) AS price_with_tax
FROM products
ORDER BY price_with_tax DESC;

Limiting Results

-- First 5 rows
SELECT * FROM products LIMIT 5;

-- Skip 10, take 5
SELECT * FROM products LIMIT 5 OFFSET 10;

-- Alternative syntax
SELECT * FROM products LIMIT 10, 5;  -- Skip 10, take 5

Aggregate Functions

Aggregate functions compute single values from multiple rows:

-- Count rows
SELECT COUNT(*) FROM customers;
SELECT COUNT(phone) FROM customers;  -- Counts non-NULL values
SELECT COUNT(DISTINCT city) FROM customers;

-- Sum values
SELECT SUM(price) FROM products;
SELECT SUM(quantity * price) AS total FROM order_items;

-- Average
SELECT AVG(price) FROM products;
SELECT AVG(price) FROM products WHERE category = 'coffee';

-- Min/Max
SELECT MIN(price) AS cheapest, MAX(price) AS most_expensive FROM products;

-- Standard deviation and variance
SELECT STD(price), VARIANCE(price) FROM products;

GROUP BY: Organizing Results

GROUP BY creates groups for aggregate functions:

-- Count by category
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

-- Average price per category
SELECT category, AVG(price) AS avg_price
FROM products
GROUP BY category
ORDER BY avg_price DESC;

-- Multiple grouping columns
SELECT category, available, COUNT(*) AS count
FROM products
GROUP BY category, available;

-- Group by expression
SELECT
    YEAR(created_at) AS year,
    MONTH(created_at) AS month,
    COUNT(*) AS signups
FROM customers
GROUP BY YEAR(created_at), MONTH(created_at);

HAVING: Filtering Groups

WHERE filters rows before grouping, HAVING filters after:

-- Categories with more than 3 products
SELECT category, COUNT(*) AS count
FROM products
GROUP BY category
HAVING COUNT(*) > 3;

-- Categories with average price over \$5
SELECT category, AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 5.00;

-- Combine WHERE and HAVING
SELECT category, COUNT(*) AS available_count
FROM products
WHERE available = TRUE
GROUP BY category
HAVING COUNT(*) >= 2;

INSERT: Adding Data

Basic INSERT

-- Single row
INSERT INTO customers (first_name, last_name, email)
VALUES ('Sarah', 'Connor', 'sarah@example.com');

-- Multiple rows
INSERT INTO products (name, category, price) VALUES
    ('Mocha', 'coffee', 4.50),
    ('Hot Chocolate', 'other', 3.00),
    ('Chai Latte', 'tea', 3.75);

INSERT with SELECT

-- Copy data from another table
INSERT INTO customers_archive
SELECT * FROM customers
WHERE created_at < '2023-01-01';

-- Insert calculated data
INSERT INTO daily_sales_summary (date, total_sales, order_count)
SELECT
    DATE(order_date),
    SUM(total),
    COUNT(*)
FROM orders
WHERE DATE(order_date) = CURDATE()
GROUP BY DATE(order_date);

INSERT IGNORE and ON DUPLICATE KEY

-- Ignore duplicate key errors
INSERT IGNORE INTO tags (name) VALUES ('coffee'), ('tea'), ('coffee');

-- Update on duplicate
INSERT INTO inventory (product_id, quantity)
VALUES (1, 100)
ON DUPLICATE KEY UPDATE quantity = quantity + 100;

UPDATE: Modifying Data

Basic UPDATE

-- Update single column
UPDATE products
SET price = price * 1.10
WHERE category = 'coffee';

-- Update multiple columns
UPDATE customers
SET email = 'newemail@example.com',
    phone = '555-9999'
WHERE id = 1;

UPDATE with JOIN

-- Update based on another table
UPDATE orders o
JOIN customers c ON o.customer_id = c.id
SET o.discount = 0.10
WHERE c.membership_level = 'gold';

Safe Updates

-- Always use WHERE clause!
-- This updates ALL rows:
UPDATE products SET price = 0;  -- Dangerous!

-- Enable safe updates mode
SET SQL_SAFE_UPDATES = 1;  -- Prevents UPDATE/DELETE without WHERE

DELETE: Removing Data

Basic DELETE

-- Delete specific rows
DELETE FROM customers WHERE id = 1;

-- Delete multiple rows
DELETE FROM orders WHERE order_date < '2020-01-01';

-- Delete all rows (but keeps table structure)
DELETE FROM temp_data;

TRUNCATE vs DELETE

-- DELETE: Row-by-row, can be rolled back, triggers fired
DELETE FROM large_table;

-- TRUNCATE: Faster, resets AUTO_INCREMENT, can't be rolled back
TRUNCATE TABLE large_table;

Data Types

Numeric Types

-- Integers
TINYINT     -- -128 to 127 (or 0 to 255 UNSIGNED)
SMALLINT    -- -32,768 to 32,767
MEDIUMINT   -- -8,388,608 to 8,388,607
INT         -- -2,147,483,648 to 2,147,483,647
BIGINT      -- -9,223,372,036,854,775,808 to ...

-- Decimals
DECIMAL(10,2)  -- Total 10 digits, 2 after decimal
FLOAT          -- Approximate, 4 bytes
DOUBLE         -- Approximate, 8 bytes

-- Boolean (alias for TINYINT(1))
BOOLEAN

String Types

-- Fixed/Variable length
CHAR(10)       -- Fixed 10 characters
VARCHAR(255)   -- Variable up to 255 characters

-- Text
TINYTEXT       -- Up to 255 bytes
TEXT           -- Up to 65,535 bytes
MEDIUMTEXT     -- Up to 16,777,215 bytes
LONGTEXT       -- Up to 4,294,967,295 bytes

-- Binary
BINARY(16)     -- Fixed binary
VARBINARY(255) -- Variable binary
BLOB           -- Binary large object

Date and Time

DATE           -- 'YYYY-MM-DD'
TIME           -- 'HH:MM:SS'
DATETIME       -- 'YYYY-MM-DD HH:MM:SS'
TIMESTAMP      -- Like DATETIME but timezone-aware
YEAR           -- Year in 2 or 4 digit format

Special Types

-- Enumeration (list of allowed values)
ENUM('small', 'medium', 'large')

-- Set (multiple values from list)
SET('read', 'write', 'execute')

-- JSON (MySQL 5.7+)
JSON

Working with Dates

-- Current date/time
SELECT NOW();                    -- Current datetime
SELECT CURDATE();                -- Current date
SELECT CURTIME();                -- Current time

-- Date extraction
SELECT
    DATE(created_at) AS date_only,
    YEAR(created_at) AS year,
    MONTH(created_at) AS month,
    DAY(created_at) AS day,
    HOUR(created_at) AS hour,
    DAYNAME(created_at) AS day_name,
    MONTHNAME(created_at) AS month_name
FROM orders;

-- Date arithmetic
SELECT
    DATE_ADD(NOW(), INTERVAL 1 DAY) AS tomorrow,
    DATE_SUB(NOW(), INTERVAL 1 WEEK) AS last_week,
    DATEDIFF(NOW(), '2024-01-01') AS days_since,
    TIMESTAMPDIFF(HOUR, '2024-01-01', NOW()) AS hours_since;

-- Formatting dates
SELECT DATE_FORMAT(NOW(), '%Y-%m-%d %H:%i:%s');  -- 2024-01-15 14:30:45
SELECT DATE_FORMAT(NOW(), '%M %d, %Y');          -- January 15, 2024

String Functions

-- Concatenation
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM customers;
SELECT CONCAT_WS(' - ', category, name) FROM products;  -- With separator

-- Case manipulation
SELECT UPPER(name), LOWER(email) FROM customers;

-- Substring
SELECT SUBSTRING(description, 1, 50) AS preview FROM products;
SELECT LEFT(phone, 3) AS area_code FROM customers;
SELECT RIGHT(email, 10) FROM customers;

-- Trimming
SELECT TRIM('  hello  ');                    -- Remove spaces
SELECT TRIM(LEADING '0' FROM '000123');      -- Remove leading zeros

-- Searching
SELECT LOCATE('@', email) FROM customers;     -- Position of @
SELECT REPLACE(phone, '-', '') FROM customers;  -- Remove dashes

-- Length
SELECT LENGTH(description) AS bytes,
       CHAR_LENGTH(description) AS characters
FROM products;

Numeric Functions

-- Rounding
SELECT ROUND(price, 1) FROM products;         -- Round to 1 decimal
SELECT CEILING(price), FLOOR(price) FROM products;

-- Mathematical
SELECT ABS(-10), SQRT(16), POW(2, 8);
SELECT MOD(10, 3);                            -- Modulo (remainder)

-- Random
SELECT RAND();                                -- Random 0-1
SELECT FLOOR(RAND() * 100);                  -- Random 0-99

-- Aggregates with math
SELECT
    SUM(quantity * price) AS total,
    ROUND(AVG(price), 2) AS avg_price
FROM order_items;

CASE Expressions

-- Simple CASE
SELECT name,
    CASE category
        WHEN 'coffee' THEN 'Beverage'
        WHEN 'tea' THEN 'Beverage'
        WHEN 'pastry' THEN 'Food'
        ELSE 'Other'
    END AS type
FROM products;

-- Searched CASE
SELECT name, price,
    CASE
        WHEN price < 3 THEN 'Budget'
        WHEN price < 5 THEN 'Standard'
        WHEN price < 8 THEN 'Premium'
        ELSE 'Luxury'
    END AS price_tier
FROM products;

-- CASE in WHERE
SELECT * FROM orders
WHERE
    CASE
        WHEN DAYOFWEEK(order_date) IN (1, 7) THEN total > 50
        ELSE total > 30
    END;

-- CASE in ORDER BY
SELECT * FROM products
ORDER BY
    CASE
        WHEN category = 'coffee' THEN 1
        WHEN category = 'tea' THEN 2
        ELSE 3
    END, price;

NULL Handling

-- COALESCE: First non-NULL value
SELECT COALESCE(phone, email, 'No contact') AS contact FROM customers;

-- IFNULL: Two-argument version
SELECT IFNULL(phone, 'No phone') FROM customers;

-- NULLIF: Returns NULL if arguments equal
SELECT NULLIF(quantity, 0) AS qty FROM inventory;  -- Avoids division by zero

-- IS NULL in CASE
SELECT
    CASE
        WHEN phone IS NULL THEN 'No phone'
        ELSE phone
    END AS phone_display
FROM customers;

Common Patterns and Best Practices

Finding Duplicates

-- Find duplicate emails
SELECT email, COUNT(*) AS count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- Find all duplicate rows
SELECT c1.*
FROM customers c1
JOIN customers c2 ON c1.email = c2.email
WHERE c1.id != c2.id;

Pagination

-- Page 1 (items 1-10)
SELECT * FROM products ORDER BY name LIMIT 10 OFFSET 0;

-- Page 2 (items 11-20)
SELECT * FROM products ORDER BY name LIMIT 10 OFFSET 10;

-- With total count
SELECT SQL_CALC_FOUND_ROWS * FROM products LIMIT 10;
SELECT FOUND_ROWS() AS total_count;

Running Totals

-- Running total with variables (MySQL < 8.0)
SELECT
    order_date,
    amount,
    @running_total := @running_total + amount AS cumulative
FROM orders, (SELECT @running_total := 0) AS init
ORDER BY order_date;

Exercises

  1. Basic Queries: Write queries to:

    • Find all products under $5
    • List customers who signed up this year
    • Show products sorted by price (highest first)
  2. Aggregation: Calculate:

    • Total revenue by product category
    • Average order value by month
    • Number of customers by city
  3. Complex Filters: Find:

    • Customers with Gmail or Yahoo email addresses
    • Products between $3-$7 in coffee or tea categories
    • Orders placed on weekends
  4. Data Manipulation:

    • Increase all coffee prices by 10%
    • Delete orders older than 5 years
    • Insert a new product with all fields

Summary

SQL is powerful yet approachable. You've learned:

  • SELECT queries with filtering, sorting, and limiting
  • Aggregate functions and grouping
  • INSERT, UPDATE, and DELETE operations
  • Data types and their uses
  • String, date, and numeric functions
  • CASE expressions and NULL handling

Next chapter, we'll explore database design principles – how to structure your tables for efficiency and maintainability.


Next: Chapter 3: Database Design Principles