Chapter 2: SQL Fundamentals - The Language of Databases
September 28, 2025 · View on GitHub
Understanding SQL
SQL (Structured Query Language) is the standard language for interacting with relational databases. It's declarative – you describe what you want, not how to get it. This distinction is crucial: while programming languages require you to specify steps, SQL lets the database optimize the execution.
SQL consists of several sublanguages:
- DDL (Data Definition Language): CREATE, ALTER, DROP
- DML (Data Manipulation Language): SELECT, INSERT, UPDATE, DELETE
- DCL (Data Control Language): GRANT, REVOKE
- TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT
SELECT: The Heart of SQL
The SELECT statement retrieves data. Its basic structure:
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column;
Column Selection
Select specific columns:
SELECT first_name, last_name FROM customers;
Select all columns (use sparingly in production):
SELECT * FROM customers;
Use aliases for readability:
SELECT
first_name AS fname,
last_name AS lname,
CONCAT(first_name, ' ', last_name) AS full_name
FROM customers;
The WHERE Clause
WHERE filters rows based on conditions:
-- Equality
SELECT * FROM products WHERE category = 'coffee';
-- Inequality
SELECT * FROM products WHERE price != 10.00;
SELECT * FROM products WHERE price <> 10.00; -- Alternative syntax
-- Comparison
SELECT * FROM products WHERE price > 5.00;
SELECT * FROM products WHERE price <= 10.00;
-- Range
SELECT * FROM products WHERE price BETWEEN 3.00 AND 8.00;
-- List membership
SELECT * FROM products WHERE category IN ('coffee', 'tea');
-- Pattern matching
SELECT * FROM customers WHERE email LIKE '%gmail.com';
SELECT * FROM customers WHERE last_name LIKE 'Sm_th'; -- _ matches single character
-- NULL checking
SELECT * FROM customers WHERE phone IS NULL;
SELECT * FROM customers WHERE phone IS NOT NULL;
Logical Operators
Combine conditions with AND, OR, NOT:
-- AND: All conditions must be true
SELECT * FROM products
WHERE category = 'coffee' AND price < 5.00;
-- OR: At least one condition must be true
SELECT * FROM products
WHERE category = 'coffee' OR category = 'tea';
-- NOT: Negates a condition
SELECT * FROM products
WHERE NOT category = 'pastry';
-- Complex combinations (use parentheses for clarity)
SELECT * FROM products
WHERE (category = 'coffee' OR category = 'tea')
AND price < 3.00;
Sorting with ORDER BY
-- Ascending (default)
SELECT * FROM customers ORDER BY last_name;
SELECT * FROM customers ORDER BY last_name ASC;
-- Descending
SELECT * FROM products ORDER BY price DESC;
-- Multiple columns
SELECT * FROM customers
ORDER BY last_name, first_name;
-- Sort by expression
SELECT *, (price * 1.1) AS price_with_tax
FROM products
ORDER BY price_with_tax DESC;
Limiting Results
-- First 5 rows
SELECT * FROM products LIMIT 5;
-- Skip 10, take 5
SELECT * FROM products LIMIT 5 OFFSET 10;
-- Alternative syntax
SELECT * FROM products LIMIT 10, 5; -- Skip 10, take 5
Aggregate Functions
Aggregate functions compute single values from multiple rows:
-- Count rows
SELECT COUNT(*) FROM customers;
SELECT COUNT(phone) FROM customers; -- Counts non-NULL values
SELECT COUNT(DISTINCT city) FROM customers;
-- Sum values
SELECT SUM(price) FROM products;
SELECT SUM(quantity * price) AS total FROM order_items;
-- Average
SELECT AVG(price) FROM products;
SELECT AVG(price) FROM products WHERE category = 'coffee';
-- Min/Max
SELECT MIN(price) AS cheapest, MAX(price) AS most_expensive FROM products;
-- Standard deviation and variance
SELECT STD(price), VARIANCE(price) FROM products;
GROUP BY: Organizing Results
GROUP BY creates groups for aggregate functions:
-- Count by category
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;
-- Average price per category
SELECT category, AVG(price) AS avg_price
FROM products
GROUP BY category
ORDER BY avg_price DESC;
-- Multiple grouping columns
SELECT category, available, COUNT(*) AS count
FROM products
GROUP BY category, available;
-- Group by expression
SELECT
YEAR(created_at) AS year,
MONTH(created_at) AS month,
COUNT(*) AS signups
FROM customers
GROUP BY YEAR(created_at), MONTH(created_at);
HAVING: Filtering Groups
WHERE filters rows before grouping, HAVING filters after:
-- Categories with more than 3 products
SELECT category, COUNT(*) AS count
FROM products
GROUP BY category
HAVING COUNT(*) > 3;
-- Categories with average price over \$5
SELECT category, AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 5.00;
-- Combine WHERE and HAVING
SELECT category, COUNT(*) AS available_count
FROM products
WHERE available = TRUE
GROUP BY category
HAVING COUNT(*) >= 2;
INSERT: Adding Data
Basic INSERT
-- Single row
INSERT INTO customers (first_name, last_name, email)
VALUES ('Sarah', 'Connor', 'sarah@example.com');
-- Multiple rows
INSERT INTO products (name, category, price) VALUES
('Mocha', 'coffee', 4.50),
('Hot Chocolate', 'other', 3.00),
('Chai Latte', 'tea', 3.75);
INSERT with SELECT
-- Copy data from another table
INSERT INTO customers_archive
SELECT * FROM customers
WHERE created_at < '2023-01-01';
-- Insert calculated data
INSERT INTO daily_sales_summary (date, total_sales, order_count)
SELECT
DATE(order_date),
SUM(total),
COUNT(*)
FROM orders
WHERE DATE(order_date) = CURDATE()
GROUP BY DATE(order_date);
INSERT IGNORE and ON DUPLICATE KEY
-- Ignore duplicate key errors
INSERT IGNORE INTO tags (name) VALUES ('coffee'), ('tea'), ('coffee');
-- Update on duplicate
INSERT INTO inventory (product_id, quantity)
VALUES (1, 100)
ON DUPLICATE KEY UPDATE quantity = quantity + 100;
UPDATE: Modifying Data
Basic UPDATE
-- Update single column
UPDATE products
SET price = price * 1.10
WHERE category = 'coffee';
-- Update multiple columns
UPDATE customers
SET email = 'newemail@example.com',
phone = '555-9999'
WHERE id = 1;
UPDATE with JOIN
-- Update based on another table
UPDATE orders o
JOIN customers c ON o.customer_id = c.id
SET o.discount = 0.10
WHERE c.membership_level = 'gold';
Safe Updates
-- Always use WHERE clause!
-- This updates ALL rows:
UPDATE products SET price = 0; -- Dangerous!
-- Enable safe updates mode
SET SQL_SAFE_UPDATES = 1; -- Prevents UPDATE/DELETE without WHERE
DELETE: Removing Data
Basic DELETE
-- Delete specific rows
DELETE FROM customers WHERE id = 1;
-- Delete multiple rows
DELETE FROM orders WHERE order_date < '2020-01-01';
-- Delete all rows (but keeps table structure)
DELETE FROM temp_data;
TRUNCATE vs DELETE
-- DELETE: Row-by-row, can be rolled back, triggers fired
DELETE FROM large_table;
-- TRUNCATE: Faster, resets AUTO_INCREMENT, can't be rolled back
TRUNCATE TABLE large_table;
Data Types
Numeric Types
-- Integers
TINYINT -- -128 to 127 (or 0 to 255 UNSIGNED)
SMALLINT -- -32,768 to 32,767
MEDIUMINT -- -8,388,608 to 8,388,607
INT -- -2,147,483,648 to 2,147,483,647
BIGINT -- -9,223,372,036,854,775,808 to ...
-- Decimals
DECIMAL(10,2) -- Total 10 digits, 2 after decimal
FLOAT -- Approximate, 4 bytes
DOUBLE -- Approximate, 8 bytes
-- Boolean (alias for TINYINT(1))
BOOLEAN
String Types
-- Fixed/Variable length
CHAR(10) -- Fixed 10 characters
VARCHAR(255) -- Variable up to 255 characters
-- Text
TINYTEXT -- Up to 255 bytes
TEXT -- Up to 65,535 bytes
MEDIUMTEXT -- Up to 16,777,215 bytes
LONGTEXT -- Up to 4,294,967,295 bytes
-- Binary
BINARY(16) -- Fixed binary
VARBINARY(255) -- Variable binary
BLOB -- Binary large object
Date and Time
DATE -- 'YYYY-MM-DD'
TIME -- 'HH:MM:SS'
DATETIME -- 'YYYY-MM-DD HH:MM:SS'
TIMESTAMP -- Like DATETIME but timezone-aware
YEAR -- Year in 2 or 4 digit format
Special Types
-- Enumeration (list of allowed values)
ENUM('small', 'medium', 'large')
-- Set (multiple values from list)
SET('read', 'write', 'execute')
-- JSON (MySQL 5.7+)
JSON
Working with Dates
-- Current date/time
SELECT NOW(); -- Current datetime
SELECT CURDATE(); -- Current date
SELECT CURTIME(); -- Current time
-- Date extraction
SELECT
DATE(created_at) AS date_only,
YEAR(created_at) AS year,
MONTH(created_at) AS month,
DAY(created_at) AS day,
HOUR(created_at) AS hour,
DAYNAME(created_at) AS day_name,
MONTHNAME(created_at) AS month_name
FROM orders;
-- Date arithmetic
SELECT
DATE_ADD(NOW(), INTERVAL 1 DAY) AS tomorrow,
DATE_SUB(NOW(), INTERVAL 1 WEEK) AS last_week,
DATEDIFF(NOW(), '2024-01-01') AS days_since,
TIMESTAMPDIFF(HOUR, '2024-01-01', NOW()) AS hours_since;
-- Formatting dates
SELECT DATE_FORMAT(NOW(), '%Y-%m-%d %H:%i:%s'); -- 2024-01-15 14:30:45
SELECT DATE_FORMAT(NOW(), '%M %d, %Y'); -- January 15, 2024
String Functions
-- Concatenation
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM customers;
SELECT CONCAT_WS(' - ', category, name) FROM products; -- With separator
-- Case manipulation
SELECT UPPER(name), LOWER(email) FROM customers;
-- Substring
SELECT SUBSTRING(description, 1, 50) AS preview FROM products;
SELECT LEFT(phone, 3) AS area_code FROM customers;
SELECT RIGHT(email, 10) FROM customers;
-- Trimming
SELECT TRIM(' hello '); -- Remove spaces
SELECT TRIM(LEADING '0' FROM '000123'); -- Remove leading zeros
-- Searching
SELECT LOCATE('@', email) FROM customers; -- Position of @
SELECT REPLACE(phone, '-', '') FROM customers; -- Remove dashes
-- Length
SELECT LENGTH(description) AS bytes,
CHAR_LENGTH(description) AS characters
FROM products;
Numeric Functions
-- Rounding
SELECT ROUND(price, 1) FROM products; -- Round to 1 decimal
SELECT CEILING(price), FLOOR(price) FROM products;
-- Mathematical
SELECT ABS(-10), SQRT(16), POW(2, 8);
SELECT MOD(10, 3); -- Modulo (remainder)
-- Random
SELECT RAND(); -- Random 0-1
SELECT FLOOR(RAND() * 100); -- Random 0-99
-- Aggregates with math
SELECT
SUM(quantity * price) AS total,
ROUND(AVG(price), 2) AS avg_price
FROM order_items;
CASE Expressions
-- Simple CASE
SELECT name,
CASE category
WHEN 'coffee' THEN 'Beverage'
WHEN 'tea' THEN 'Beverage'
WHEN 'pastry' THEN 'Food'
ELSE 'Other'
END AS type
FROM products;
-- Searched CASE
SELECT name, price,
CASE
WHEN price < 3 THEN 'Budget'
WHEN price < 5 THEN 'Standard'
WHEN price < 8 THEN 'Premium'
ELSE 'Luxury'
END AS price_tier
FROM products;
-- CASE in WHERE
SELECT * FROM orders
WHERE
CASE
WHEN DAYOFWEEK(order_date) IN (1, 7) THEN total > 50
ELSE total > 30
END;
-- CASE in ORDER BY
SELECT * FROM products
ORDER BY
CASE
WHEN category = 'coffee' THEN 1
WHEN category = 'tea' THEN 2
ELSE 3
END, price;
NULL Handling
-- COALESCE: First non-NULL value
SELECT COALESCE(phone, email, 'No contact') AS contact FROM customers;
-- IFNULL: Two-argument version
SELECT IFNULL(phone, 'No phone') FROM customers;
-- NULLIF: Returns NULL if arguments equal
SELECT NULLIF(quantity, 0) AS qty FROM inventory; -- Avoids division by zero
-- IS NULL in CASE
SELECT
CASE
WHEN phone IS NULL THEN 'No phone'
ELSE phone
END AS phone_display
FROM customers;
Common Patterns and Best Practices
Finding Duplicates
-- Find duplicate emails
SELECT email, COUNT(*) AS count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
-- Find all duplicate rows
SELECT c1.*
FROM customers c1
JOIN customers c2 ON c1.email = c2.email
WHERE c1.id != c2.id;
Pagination
-- Page 1 (items 1-10)
SELECT * FROM products ORDER BY name LIMIT 10 OFFSET 0;
-- Page 2 (items 11-20)
SELECT * FROM products ORDER BY name LIMIT 10 OFFSET 10;
-- With total count
SELECT SQL_CALC_FOUND_ROWS * FROM products LIMIT 10;
SELECT FOUND_ROWS() AS total_count;
Running Totals
-- Running total with variables (MySQL < 8.0)
SELECT
order_date,
amount,
@running_total := @running_total + amount AS cumulative
FROM orders, (SELECT @running_total := 0) AS init
ORDER BY order_date;
Exercises
-
Basic Queries: Write queries to:
- Find all products under $5
- List customers who signed up this year
- Show products sorted by price (highest first)
-
Aggregation: Calculate:
- Total revenue by product category
- Average order value by month
- Number of customers by city
-
Complex Filters: Find:
- Customers with Gmail or Yahoo email addresses
- Products between $3-$7 in coffee or tea categories
- Orders placed on weekends
-
Data Manipulation:
- Increase all coffee prices by 10%
- Delete orders older than 5 years
- Insert a new product with all fields
Summary
SQL is powerful yet approachable. You've learned:
- SELECT queries with filtering, sorting, and limiting
- Aggregate functions and grouping
- INSERT, UPDATE, and DELETE operations
- Data types and their uses
- String, date, and numeric functions
- CASE expressions and NULL handling
Next chapter, we'll explore database design principles – how to structure your tables for efficiency and maintainability.