Linear Regression
March 21, 2026
Overview & Motivation
Given a set of observations $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, we often want to find the best-fit linear relationship between features $\mathbf{x}$ and target $y$. Linear regression does this by finding the coefficients $\mathbf{w}$ that minimize the sum of squared residuals, where a residual is the gap between a predicted and an observed value.
The approach is fundamental because:
- It has a closed-form solution (the normal equation), so no iterative optimization is needed.
- It provides a baseline model against which more complex methods are measured.
- The solution is the maximum likelihood estimator under Gaussian noise assumptions.
This library solves the normal equation directly using Gaussian elimination, making it suitable for small-to-moderate feature counts typical in embedded estimation tasks.
Mathematical Theory
The Model
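For a single sample with $d$ features, the model predicts

$$\hat{y} = w_0 + w_1 x_1 + \dots + w_d x_d$$

In matrix form, augmenting $X$ with a column of ones for the intercept:

$$\hat{\mathbf{y}} = X\mathbf{w}$$

where $X \in \mathbb{R}^{N \times (d+1)}$, $\mathbf{w} \in \mathbb{R}^{d+1}$, $\hat{\mathbf{y}} \in \mathbb{R}^{N}$.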
Normal Equation
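Minimizing $\lVert \mathbf{y} - X\mathbf{w} \rVert^2$ with respect to $\mathbf{w}$ yields:

$$X^TX\,\mathbf{w} = X^T\mathbf{y}$$

This is solved in practice by forming $A = X^TX$ and $\mathbf{b} = X^T\mathbf{y}$, then using Gaussian elimination to solve the system $A\mathbf{w} = \mathbf{b}$.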
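As a rough illustration of the procedure above, the following Python sketch forms $X^TX$ and $X^T\mathbf{y}$ and solves the resulting system by Gaussian elimination. It is not this library's implementation: the names `fit_normal_equation` and `predict`, the use of partial pivoting, and the `1e-12` singularity threshold are all assumptions made for the example.

```python
def fit_normal_equation(features, targets):
    """Fit w by forming A = X^T X and b = X^T y, then solving A w = b."""
    # Augment each sample with a leading 1.0 for the intercept term.
    rows = [[1.0] + [float(v) for v in x] for x in features]
    n = len(rows[0])

    # Form the normal equations: A = X^T X (n x n) and b = X^T y (length n).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]

    # Forward elimination with partial pivoting (pivoting is an assumption here).
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("X^T X is singular or nearly singular")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for c in range(col, n):
                a[k][c] -= factor * a[col][c]
            b[k] -= factor * b[col]

    # Back-substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / a[i][i]
    return w


def predict(w, x):
    """Dot product of the weights with the bias-augmented sample."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

Calling `fit_normal_equation([[1], [2], [3]], [2, 4, 5])` on the walkthrough data returns an intercept close to 2/3 and a slope close to 1.5, up to floating-point rounding.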
Geometric Interpretation
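The prediction $\hat{\mathbf{y}} = X\mathbf{w}$ is the orthogonal projection of $\mathbf{y}$ onto the column space of $X$. The residual $\mathbf{y} - \hat{\mathbf{y}}$ is perpendicular to every column of $X$, that is, $X^T(\mathbf{y} - \hat{\mathbf{y}}) = \mathbf{0}$.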
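The orthogonality property can be checked numerically. The sketch below uses the three-sample data set from the walkthrough and its least-squares solution (intercept 2/3, slope 3/2). It uses exact rational arithmetic so that the dot products come out as exactly zero. The variable names are chosen for the example only.

```python
from fractions import Fraction

# Walkthrough data with the bias column already included in X.
X = [[1, 1], [1, 2], [1, 3]]
y = [2, 4, 5]
# Least-squares solution for this data: intercept 2/3, slope 3/2.
w = [Fraction(2, 3), Fraction(3, 2)]

# Prediction y_hat = X w and residual r = y - y_hat.
y_hat = [sum(xij * wj for xij, wj in zip(row, w)) for row in X]
residual = [yi - yhi for yi, yhi in zip(y, y_hat)]

# X^T r: the dot product of the residual with each column of X.
xt_r = [sum(row[j] * ri for row, ri in zip(X, residual)) for j in range(2)]
print(xt_r)  # [Fraction(0, 1), Fraction(0, 1)]
```

The residual here is $(-1/6,\ 1/3,\ -1/6)$. It sums to zero (orthogonal to the bias column) and has zero dot product with the feature column $(1, 2, 3)$.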
Complexity Analysis
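Here $N$ is the number of samples and $d$ the number of features.

| Phase | Time | Space | Notes |
|---|---|---|---|
| Form $X^TX$ | $O(Nd^2)$ | $O(d^2)$ | Dominated by matrix multiply |
| Form $X^T\mathbf{y}$ | $O(Nd)$ | $O(d)$ | Matrix-vector multiply |
| Solve | $O(d^3)$ | $O(d^2)$ | Gaussian elimination |
| Predict | $O(d)$ | $O(1)$ | Dot product |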
For embedded use with a small feature count $d$, the entire fit completes in microseconds.
Step-by-Step Walkthrough
Data: 3 samples, 1 feature.
| $x$ | $y$ |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
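Step 1. Design matrix (with bias column):

$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}$$

Step 2. Compute $X^TX$ and $X^T\mathbf{y}$:

$$X^TX = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \qquad X^T\mathbf{y} = \begin{bmatrix} 11 \\ 25 \end{bmatrix}$$

Step 3. Solve $X^TX\,\mathbf{w} = X^T\mathbf{y}$:

Forward elimination → back-substitution: $w_1 = 1.5$, $w_0 = 2/3 \approx 0.667$

Result: $\hat{y} = 0.667 + 1.5\,x$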
Prediction at a new input $x$: evaluate $\hat{y} = w_0 + w_1 x$ with the fitted intercept $w_0$ and slope $w_1$.
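The walkthrough can be reproduced step by step in a few lines of Python. This is a sketch that uses exact fractions so that each intermediate value matches the hand calculation. The query point `x_query = 4` is a hypothetical value chosen for illustration and is not taken from the original text.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [2, 4, 5]

# Step 1: design matrix with a leading bias column of ones.
X = [[1, x] for x in xs]

# Step 2: A = X^T X and b = X^T y.
A = [[Fraction(sum(r[i] * r[j] for r in X)) for j in range(2)] for i in range(2)]
b = [Fraction(sum(r[i] * y for r, y in zip(X, ys))) for i in range(2)]
# At this point A == [[3, 6], [6, 14]] and b == [11, 25].

# Step 3: forward elimination removes w0 from the second equation.
factor = A[1][0] / A[0][0]                             # 6 / 3 = 2
A[1] = [A[1][j] - factor * A[0][j] for j in range(2)]  # [0, 2]
b[1] = b[1] - factor * b[0]                            # 25 - 22 = 3

# Back-substitution.
w1 = b[1] / A[1][1]                    # 3 / 2
w0 = (b[0] - A[0][1] * w1) / A[0][0]   # (11 - 9) / 3 = 2 / 3

# Prediction at a hypothetical query point (x = 4, chosen for illustration).
x_query = 4
y_pred = w0 + w1 * x_query             # 2/3 + 6 = 20/3
```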
Pitfalls & Edge Cases
- Multicollinearity. If features are nearly linearly dependent, $X^TX$ is ill-conditioned and the solution is unstable. Consider regularization (Ridge/Lasso) or removing redundant features.
- Fewer samples than features ($N < d$). The system is underdetermined and $X^TX$ is singular. At a minimum, $N \ge d + 1$.
- Outliers have outsized influence because the squared loss amplifies large residuals. Robust alternatives (Huber loss, RANSAC) exist outside this library.
- Extrapolation danger. The linear model has no mechanism to detect when it is being queried far from the training data range.
- Fixed-point precision. For Q15/Q31 types, features should be scaled to $[-1, 1)$ to avoid overflow in $X^TX$.
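The multicollinearity pitfall is easy to demonstrate on a tiny example. In the sketch below, the second feature of the `collinear` data set is exactly twice the first, so the determinant of $X^TX$ is zero and Gaussian elimination would hit a zero pivot. The helper names `gram_matrix` and `det3` and both data sets are made up for illustration.

```python
def gram_matrix(rows):
    """Return X^T X for a design matrix given as a list of rows."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Columns: bias, feature 1, feature 2.
independent = [[1, 1, 4], [1, 2, 1], [1, 3, 3], [1, 4, 2]]
# Here feature 2 is exactly 2 * feature 1, so the columns are linearly dependent.
collinear = [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 8]]

det_independent = det3(gram_matrix(independent))  # nonzero: system is solvable
det_collinear = det3(gram_matrix(collinear))      # exactly 0: X^T X is singular
```

With nearly (rather than exactly) dependent features the determinant is small but nonzero, and the solve succeeds while producing coefficients that are very sensitive to noise in the data.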
Variants & Generalizations
| Variant | Key Difference |
|---|---|
| Ridge regression (L2) | Adds $\lambda \lVert \mathbf{w} \rVert_2^2$ to the cost; solves $(X^TX + \lambda I)\,\mathbf{w} = X^T\mathbf{y}$ |
| Lasso regression (L1) | Adds $\lambda \lVert \mathbf{w} \rVert_1$; promotes sparsity but requires iterative optimization |
| Polynomial regression | Adds powers of $x$ as new features; still linear in parameters |
| Weighted least squares | Weights each sample differently; solves $X^TWX\,\mathbf{w} = X^TW\mathbf{y}$ |
| Recursive least squares | Updates $\mathbf{w}$ incrementally as new data arrives; suited for online estimation |
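To make the ridge row concrete, here is a minimal sketch for the single-feature case, where the regularized system is 2x2 and can be solved with Cramer's rule. The function name `ridge_fit_1d` is hypothetical. The sketch adds $\lambda$ to both diagonal entries, so the intercept is penalized as well, which follows the formula as written but is a choice many implementations avoid.

```python
def ridge_fit_1d(xs, ys, lam):
    """Solve (X^T X + lam * I) w = X^T y for one feature plus a bias column."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Regularized 2x2 system. lam is added to both diagonal entries,
    # so the intercept is penalized too (an assumption of this sketch).
    a00, a01 = n + lam, sx
    a10, a11 = sx, sxx + lam
    det = a00 * a11 - a01 * a10

    # Cramer's rule for the 2x2 case.
    w0 = (sy * a11 - a01 * sxy) / det
    w1 = (a00 * sxy - a10 * sy) / det
    return w0, w1
```

With `lam = 0` the function reduces to ordinary least squares. On the walkthrough data (`xs = [1, 2, 3]`, `ys = [2, 4, 5]`), increasing `lam` to 1 shrinks the slope from 1.5 to 34/24, roughly 1.417.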
Applications
- Sensor calibration: fitting a linear transfer function between raw ADC counts and physical units.
- Trend estimation: extracting linear trends from noisy time series (temperature, voltage drift).
- System identification: estimating static gain or simple dynamic relationships.
- Feature importance: the magnitude of $w_j$ indicates the influence of feature $j$ (after feature scaling).
- Predictive maintenance: modeling degradation rate from operating-condition features.
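As an example of the sensor-calibration use case, the sketch below fits an offset and gain that map raw ADC counts to a temperature. For one feature, the normal equation reduces to the closed-form expressions used in `fit_line`. The calibration points, the function names, and the choice of degrees Celsius are all hypothetical and serve only to illustrate the idea.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = offset + gain * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return offset, gain


# Hypothetical calibration points: raw ADC counts vs. reference temperature (deg C).
adc_counts = [512, 1024, 2048, 3072]
reference_c = [10.0, 20.5, 39.8, 60.1]

offset, gain = fit_line(adc_counts, reference_c)


def counts_to_celsius(counts):
    """Apply the fitted linear transfer function to a raw reading."""
    return offset + gain * counts
```

On a real device the fit would typically run once during calibration, with only `offset` and `gain` stored for use at run time.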
Connections to Other Algorithms
graph LR
LR["Linear Regression"]
GE["Gaussian Elimination"]
YW["Yule-Walker"]
NN["Neural Network"]
LR --> GE
YW -.->|"similar normal-equation structure"| LR
NN -.->|"single linear layer = regression"| LR
| Algorithm | Relationship |
|---|---|
| Gaussian Elimination | Used to solve the normal equation system |
| Yule-Walker | Structurally similar — also solves a linear system derived from correlations |
| Neural Network | A single-layer neural network with no activation and MSE loss reduces to linear regression |
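The neural-network connection can be illustrated with a sketch that trains a single linear unit (no activation) by gradient descent on the mean squared error, using the walkthrough data. The learning rate and iteration count are assumed values that happen to converge for this data set. They are not recommendations.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 5.0]

# A single linear "layer" with one input: y_hat = bias + weight * x.
bias, weight = 0.0, 0.0
learning_rate = 0.05  # assumed value, small enough for this data set
n = len(xs)

for _ in range(5000):
    # Gradient of the mean squared error with respect to bias and weight.
    errors = [bias + weight * x - y for x, y in zip(xs, ys)]
    grad_bias = 2.0 * sum(errors) / n
    grad_weight = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    bias -= learning_rate * grad_bias
    weight -= learning_rate * grad_weight

# bias and weight approach the normal-equation solution (2/3, 3/2).
```

The iterative route reaches the same coefficients as the direct solve, but needs thousands of updates where Gaussian elimination needs a single pass. That is the practical benefit of the closed-form solution for small problems.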
References & Further Reading
- Hastie, T., Tibshirani, R. and Friedman, J., The Elements of Statistical Learning, 2nd ed., Springer, 2009 — Chapter 3.
- Bishop, C.M., Pattern Recognition and Machine Learning, Springer, 2006 — Chapter 3.
- Strang, G., Linear Algebra and Its Applications, 4th ed., Thomson, 2006 — Section 4.3.