README.md

May 12, 2025 · View on GitHub

MIT 18.06, Spring 2025
Linear Algebra

Welcome to MIT 18.06: Linear Algebra! The Spring 2025 course information, materials, and links are recorded below. Course materials for previous semesters are archived in the other branches of this repository. You can dive in right away by reading this introduction to the course by Professor Strang.

Catalog Description: Basic subject on matrix theory and linear algebra, emphasizing topics useful in other disciplines, including systems of equations, vector spaces, determinants, eigenvalues, singular value decomposition, and positive definite matrices. Applications to least-squares approximations, stability of differential equations, networks, Fourier transforms, and Markov processes. Uses linear algebra software. Compared with 18.700, more emphasis on matrix algorithms and many applications.

Instructor: Prof. Nike Sun

Textbook: Introduction to Linear Algebra: 6th Edition.

Detailed lecture notes are posted on Canvas (accessible only to registered students).

Lecture Material and Summaries

Lecture 1 (Mon Feb 3 2025)

  • Vectors in R2\mathbb{R}^2, and generalization to vectors in RN\mathbb{R}^N (N-dimensional space).
  • Vector operations: addition and scalar multiplication. Both operations together: linear combinations.
  • The span of a set of vectors {u1,,uk}\lbrace u_1,\ldots,u_k\rbrace is the set of all linear combinations of these vectors: we covered some examples in class.
  • Definition of matrix times vector: AxAx where AA is an M×NM \times N matrix, and xx is in RN\mathbb{R}^N.

Reading: Strang Chapter 1.

Lecture 2 (Wed Feb 5 2025)

  • Dot product, vector length, cosine formula.
  • The gaussian elimination algorithm for solving a system of linear equations Ax=b: reduce the matrix to row echelon form (REF), and do back-substitution. Worked through an example in class.
  • Definition of matrix times matrix: AB=XAB=X where AA is M×NM \times N, BB is N×PN \times P, XX is M×PM \times P.* We explained how to view the gaussian elimination operations as matrix multiplication operations: the steps of gaussian elimination correspond to changing Ax=bAx=b to (G1)Ax=(G1)b(G1)Ax=(G1)b, (G2)(G1)Ax=(G2)(G1)b(G2)(G1)Ax=(G2)(G1)b, etc.

Reading: Strang 2.1-2.2.

Lecture 3 (Fri Feb 7 2025)

  • Reviewed the gaussian elimination example using matrix multiplications to encode the operations.
  • Gauss-Jordan elimination has a few additional steps which brings the system to reduced row echelon form (RREF) — we did this in the same example, again using matrix multiplications.
  • In the example, the final RREF system was (G5)(G4)(G3)(G2)(G1)Ax=(G5)(G4)(G3)(G2)(G1)b=c(G5)(G4)(G3)(G2)(G1)Ax=(G5)(G4)(G3)(G2)(G1)b=c. Moreover we found (G5)(G4)(G3)(G2)(G1)A=I3(G5)(G4)(G3)(G2)(G1)A = I_3, the $3 \times 3identitymatrix.Inthisexampleitallowedustoreadoffidentity matrix. In this example it allowed us to read offx = c$.
  • We reviewed basic rules of matrix multiplication: associative A(BC)=(AB)CA(BC)=(AB)C, distributive A(B+C)=AB+ACA(B+C)=AB+AC, but not commutative: ABAB and BABA are generally not equal!
  • Inversion: if AA is an n×nn \times n matrix, it is invertible if there exists a matrix A1A^{-1}, also n×nn \times n, such that AA1=A1A=InAA^{-1} = A^{-1}A = I_n, the n×nn \times n identity matrix.
  • If AA is invertible, then Gauss-Jordan elimination converts (Ab)(A|b) to (Ic)(I|c). Moreover it converts (AI)(A|I) to (IA1)(I|A^{-1}).
  • Square matrices need not be invertible, and we covered some examples.

Reading: Strang 2.3.

Lecture 4 (Mon Feb 10 2025)

  • The columns of AA are linearly dependent if a non-trivial linear combination of the columns is zero: can write this as Ax=0Ax=0 for xx nonzero. If AA is a square matrix with linearly dependent columns, then AA cannot be invertible. We covered some examples.
  • We defined the column space C(A)C(A) of a matrix. An m×nm \times n matrix AA can be viewed as a function/map from Rn\mathbb{R}^n to Rm\mathbb{R}^m, sending input xx in Rn\mathbb{R}^n to output AxAx in Rm\mathbb{R}^m. The column space C(A)C(A) is exactly the image of this map. The equation Ax=bAx=b is solvable if and only if bb lies in C(A)C(A).
  • We defined general vector spaces and subspaces, and covered some examples.

Reading: Strang 3.1.

Lecture 5 (Wed Feb 12 2025)

  • Defined the null space N(A)N(A) as the set of vectors xx such that Ax=0Ax=0.
  • Note that if AA is m×nm \times n, then C(A)C(A) is a subspace of Rn\mathbb{R}^n, while N(A)N(A) is a subspace of Rm\mathbb{R}^m.
  • Invertible row operations (such as in Gauss-Jordan elimination) do not affect the null space, so if RR is the RREF of AA, then N(A)=N(R)N(A) = N(R).
  • We covered several examples of calculating N(A)N(A). We also noted that in all our examples, dim C(A) + dim N(A) = n.

Reading: Strang 3.2.

Lecture 6 (Fri Feb 14 2025)

  • In this class we covered the general solution for a system of linear equations Ax=b.
  • The basic principle: if bb is not in C(A)C(A), then there is no solution. If bb is in C(A)C(A), then there must exist at least one "particular solution," call it x0x_0. Then the set of all solutions to Ax=bAx=b is the set of vectors x0+xx_0 + x', where x0x_0 is the particular solution, and xx' is any vector from the null space N(A)N(A).
  • General recipe for solving:
    • given (Ab)(A|b), apply Gauss-Jordan elimination to transform to RREF system (Rc)(R|c).
    • If the RREF system contains a row that says 0 = nonzero, then we have a contradiction, and in this case bb is not in C(A)C(A) and there is no solution.
    • Otherwise, set the free variables to zero to find a particular solution x0x_0.
    • Separately, solve for the null space N(A)N(A).
    • Then the set of all solutions to Ax=bAx=b is the set of vectors x0+xx_0 + x', where x0x_0 is the particular solution, and xx' is any vector from N(A)N(A).

Reading: Strang 3.3.

Lecture 7 (Mon Feb 18 2025)

  • Throughout this class, we let v1,,vnv^1, \ldots, v^n be list of n vectors, each in the space Rm\mathbb{R}^m. Let AA be the m×nm \times n matrix with columns v1,,vnv^1, \ldots, v^n.
    • The vectors {v1,...,vn}\lbrace v^1, ..., v^n\rbrace are linearly dependent if a non-trivial linear combination of them equals zero: this corresponds to N(A)N(A) being strictly larger than {0}\lbrace 0\rbrace. Otherwise, we say they are linearly independent: this corresponds to N(A)={0}N(A) = \lbrace 0\rbrace.
    • A basis for a vector space VV is a list of vectors that span VV, and are linearly independent. We covered the standard basis {e1,...,en}\lbrace e^1, ..., e^n\rbrace for the space Rn\mathbb{R}^n.
    • Let V=span{v1,...,vn}V = \text{span} \lbrace v^1, ..., v^n\rbrace. Then VV is the same as C(A)C(A). If {v1,...,vn}\lbrace v^1, ..., v^n\rbrace are linearly independent, then they form a basis for VV.
  • More generally, perform Gauss-Jordan elimination, and let R=GAR = GA be the RREF of AA. Then C(R)=GC(A)C(R) = G C(A).
    • The pivot columns of RR form a basis for C(R)C(R), and the corresponding columns of AA form a basis for C(A)C(A).
    • Note that rank(A) = # pivots in R = dim C(R) = dim C(A). Meanwhile # free variables in R = dim N(A).
    • There are n columns total, and each column must be pivot or free, so n = # pivot + # free = dim C(A) + dim N(A): this is the rank-nullity theorem.
  • Lastly, we reviewed that if AA is an m×nm \times n matrix, then we view it as a map from Rn\mathbb{R}^n to Rm\mathbb{R}^m, sending xx in Rn\mathbb{R}^n to AxAx in Rm\mathbb{R}^m.

Reading: Strang 3.4.

Note: You should be able to do all of exam 1 using only the information covered in Lectures 1–6, together with the homework, recitations, and reading assigned before exam 1. However, since all the topics of the class are closely connected, material from Lecture 7 might also be helpful to know for the exam, and you are free to use that material on the exam as well. All exams in this class are closed book, closed notes.


Exam 1 (Wed Feb 19 2025)


Lecture 8 (Fri Feb 21 2025)

  • We started the lecture with the definition of the matrix transpose AtA^t.
    • Note in general (At)t=A(A^t)^t=A, and (AB)t=BtAt(AB)^t = B^t A^t.
    • If A=AtA=A^t, then we say that AA is symmetric. Only square matrices can be symmetric.
  • We covered the four fundamental subspaces of a matrix, and how to calculate them. Throughout, let A be an m×nm \times n matrix, and let R=GAR = GA be the RREF. Thus GG is an invertible m×mm \times m matrix that encodes the Gauss-Jordan row operations.
    • Column space C(A)=G1C(R)C(A) = G^{-1} C(R). This is a subspace of Rm\mathbb{R}^m.
    • Null space N(A)=N(R)N(A) = N(R). This is a subspace of Rn\mathbb{R}^n.
    • Row space C(At)=C(Rt)C(A^t) = C(\mathbb{R}^t). This is a subspace of Rn\mathbb{R}^n.
    • Left null space N(At)=GtN(Rt)N(A^t) = G^t N(\mathbb{R}^t). This is a subspace of Rm\mathbb{R}^m.

Formal reasoning for the above claims:

  1. Column space: C(A)=Ax:xRnC(A) = {Ax : x \in \mathbb{R}^n} and C(R)=GAx:xRnC(R) = {GAx : x \in \mathbb{R}^n}. Thus bC(R)b=GAx for some xG1b=Ax for some xG1bC(A)b' \in C(R) \Leftrightarrow b' = GAx \text{ for some } x \Leftrightarrow G^{-1}b' = Ax \text{ for some } x \Leftrightarrow G^{-1}b' \in C(A). This proves C(A)=G1C(R)C(A) = G^{-1} C(R).
  2. Null space: N(A)={x:Ax=0}N(A) = \lbrace x : Ax = 0\rbrace and N(R)={x:GAx=0}N(R) = \lbrace x : GAx = 0\rbrace. Since GG invertible, Ax=0GAx=0Ax = 0 \Leftrightarrow GAx = 0. This proves N(A)=N(R)N(A) = N(R).
  3. Row space: recall Rt=(GA)t=AtGt\mathbb{R}^t = (GA)^t = A^t G^t. C(At)={Atx:xRm}C(A^t) = \lbrace A^t x : x \in \mathbb{R}^m\rbrace and C(Rt)={AtGtx:xRm}C(\mathbb{R}^t) = \lbrace A^t G^t x : x \in \mathbb{R}^m\rbrace. Since GG is invertible, GtG^t is also invertible. As xx ranges over all of Rm\mathbb{R}^m, GtxG^t x also ranges over all of Rm\mathbb{R}^m. Therefore C(At)=C(Rt)C(A^t) = C(\mathbb{R}^t).
  4. Left null space: (There was a typo on the blackboard, so please read this one carefully.) N(At)={x:Atx=0}N(A^t) = \lbrace x : A^t x = 0\rbrace and N(Rt)={x:AtGtx=0}N(\mathbb{R}^t) = \lbrace x : A^t G^t x = 0\rbrace. Therefore xN(Rt)AtGtx=0GtxN(At)x \in N(\mathbb{R}^t) \Leftrightarrow A^t G^t x = 0 \Leftrightarrow G^t x \in N(A^t). This proves N(At)=GtN(Rt)N(A^t) = G^t N(\mathbb{R}^t).

In class, we calculated the four fundamental subspaces on a small example. We verified that the column space and left null space are orthogonal subspaces of Rm\mathbb{R}^m, while the row space and null space are orthogonal subspace of Rn\mathbb{R}^n.

Reading: Strang 3.5.

Lecture 9 (Mon Feb 24 2025)

  • In this class we reviewed the four fundamental subspaces of an m×nm \times n matrix AA.
  • We went over an example of how to calculate the four subspaces of AA, using the RREF R=GAR = GA.
  • Dimensions: both column space and row space have dimension r = rank(A). The null space has dimension nrn - r, while the left null space has dimension mrm - r.
  • We covered the fact that in Rn\mathbb{R}^n, the row space and null space are orthogonal complements of one another. In Rm\mathbb{R}^m, the column space and left null space are orthogonal complements of one another.

Reading: Strang 4.1.

Lecture 10 (Wed Feb 26 2025)

  • We covered what it means for two subspaces of Rn\mathbb{R}^n to be:
    • complementary
    • orthogonal
    • orthogonal complements.
  • In particular:
    • If VV and WW are complementary subspaces of Rn\mathbb{R}^n, then any xRnx \in \mathbb{R}^n can be uniquely written as x=v+wx = v + w with vv from VV, ww from WW.
    • If VV and WW are in additional orthogonal complements, then vv is the orthogonal projection of xx onto VV, while ww is the orthogonal projection of xx onto WW. Denoted v=projVxv = \text{proj}_V x and w=projWxw = \text{proj}_W x.
  • We discussed the geometric interpretation of orthogonal projection:
    • v=projVxv = \text{proj}_V x is the unique vector vv in VV that lies closest to xx.
    • equivalently, v=projVxv = \text{proj}_V x is the unique vector vv in VV such that (xv)(x-v) is perpendicular to VV.
    • We used the latter characterization to calculate projYx\text{proj}Y x where YY is the span of a single nonzero vector yy in Rn\mathbb{R}^n.

Reading: Strang 4.2.

Lecture 11 (Fri Feb 28 2025)

  • We covered the general formulas for orthogonal projection.
  • Projection onto a one-dimensional subspace Y=span{y}Y = \text{span}\lbrace y\rbrace, where yy is a unit vector in Rn\mathbb{R}^n: projY(x)=PYx\text{proj}Y(x) = P_Y x where PY=yytP_Y = yy^t. Note that PYP_Y is an n×nn \times n symmetric matrix. Its column space is exactly the one-dimensional space YY, therefore PYP_Y has rank one.
  • Projection onto a general subspace VV of Rn\mathbb{R}^n, where dim V=r<n\text{dim } V = r < n: first express V=C(A)V = C(A) where Aisann×rA is an n \times r matrix whose columns form a basis of VV. We showed in class that v=projV(b)=PVbv = \text{proj}V(b) = P_V b where PV=A(AtA)1AtP_V = A(A^t A)^{-1} A^t. This is an n×nn \times n symmetric matrix. Its column space is exactly V=C(A)V = C(A), therefore PVP_V has rank rr.
    • Claim: If AA is n×rn \times r with rank rr, then AtAA^t A is invertible. We stated this fact in class, and used it to define PVP_V. We did not yet give a justification of this fact, and will do so in a future lecture.
    • Note that v=Axv = A x where x=(AtA)1Atbx = (A^t A)^{-1} A^t b. This achieves the minimum distance Axb\Vert Ax-b \Vert, and we call this the least squares solution.
  • Lastly we went over some examples of the projection matrix formula:
    • In the one-dimensional case Y=span{y}Y = \text{span}\lbrace y\rbrace where yy is a unit vector, we take A=yA = y and recover the formula PY=yytP_Y = yy^t.
    • If we have an orthonormal basis {u1,...,ur}\lbrace u^1, ..., u^r\rbrace for VV, then PV=P1+...+PrP_V = P_1 + ... + P_r where Pj=uj(uj)tP_j = u^j(u^j)^t is the orthogonal projection onto span{uj}\text{span}\lbrace u^j\rbrace.

Reading: Strang 4.3.

Lecture 12 (Mon March 3 2025)

  • As we learned previously, the equation Ax=bAx=b does not have a solution if b does not lie in column space C(A)C(A). In this case, one can instead ask for the least squares (LS) solution: the choice of x that minimizes
Axb2=i[(Ax)ibi]2\|Ax-b\|^2 = \sum_i [(Ax)_i - b_i]^2
  • This means v=Axv=Ax should be precisely the projection of xx onto C(A)C(A), so from what we previously learned, we see that v=A(AtA)1Atbv = A(A^t A)^{-1}A^t b, and consequently x=(AtA)1Atbx=(A^t A)^{-1}A^t b.
  • Application: given a data set (ai,bi)(a_i,b_i) for $1\le i \le 1000$, we covered how to find:
    • The straight line with no intercept that achieves the least squares fit: b=xab=xa where xx is the slope;
    • The straight line with intercept that achieves the least squares fit: b=x0+x1ab = x_0 + x_1 a where x0x_0 is the intercept and x1x_1 is the slope;
    • The cubic function that achieves the least squares fit: b=x0+x1a+x2a2+x3a3b = x_0 + x_1 a + x_2 a^2 + x_3 a^3.

Reading: Strang 4.3.

Lecture 13 (Wed March 5 2025)

  • We learned the Gram-Schmidt procedure: given a basis (v1,,vr)(v_1,\ldots,v_r) for a subspace VV of Rn\mathbb{R}^n, it produces an orthonormal basis (u1,,ur)(u_1,\ldots,u_r) of VV.
  • The Gram-Schmidt procedure can be summarized by the QR factorization: A=QRA=QR where:
    • AA is the n×rn\times r matrix with columns v1,,vrv_1,\ldots,v_r;
    • QQ is the n×rn\times r matrix with columns u1,,uru_1,\ldots,u_r;
    • RR is the r×rr\times r matrix of the coefficients relating the vv's to the uu's. In particular, RR is upper triangular with non-zero diagonal entries, and can be inverted by back-substitution.

Reading: Strang 4.4.

Lecture 14 (Fri March 7 2025)

  • Let AA be an r×nr\times n matrix of rank rr, with r<nr<n. This means that the column space C(A)=RrC(A)=\mathbb{R}^r: therefore, for any bRrb\in\mathbb{R}^r, the equation Ax=bAx=b has at least one solution x~\tilde{x}. We also know that the null space N(A)N(A) is a subspace of Rn\mathbb{R}^n of dimension nr>0n-r>0. It follows that in fact Ax=bAx=b has infinitely many solutions, since x~+x\tilde{x}+x' is also a solution for any xx' from N(A)N(A). We can therefore ask, what is the minimum norm solution xx? Any solution xx can be decomposed as x+xx^\parallel + x^\perp where xN(A)x^\parallel \in N(A) while xN(A)=C(A)x^\perp\in N(A)^\perp = C(A^\top) (the row space of AA). We discussed in class that the minimum norm solution to Ax=bAx=b is exactly xx^\perp. If we have a QR factorization A=QRA^\top=QR, then Ax=bAx=b can be rearranged to give x=QQx=Q(R)1bx^\perp = QQ^\top x = Q(R^\top)^{-1}b.
  • If AA is an m×nm\times n matrix, then its matrix pseudoinverse is the n×mn\times m matrix A+A^+ which does the following:
    • Given yRny\in\mathbb{R}^n, let bb be the orthogonal projection of yy onto the column space C(A)C(A).
    • Let xx^\perp be the minimum norm solution to the equation Ax=bAx=b. Then A+A^+ is the n×mn\times m matrix which acts as A+y=xA^+y=x^\perp.
  • Two examples of calculating the pseudoinverse:
    • If AA is r×nr\times n with rank rr, then the above calculation tells us that if we have the QR factorization A=QRA^\top=QR, then A+=Q(R)1=A(AA)1A^+=Q(R^\top)^{-1} = A^\top (AA^\top)^{-1}.
    • (Corrected; this previously had a typo!) If AA is n×rn\times r with rank rr, then the pseudoinverse should first map yy to its orthogonal projection onto C(A)C(A), that is, b=A(AA)1Ayb = A(A^\top A)^{-1} A^\top y, which lies in C(A)C(A). As a result Ax=bAx=b has a unique solution, given by x=(AA)1Ayx=(A^\top A)^{-1} A^\top y. It follows that A+=(AA)1AA^+ = (A^\top A)^{-1} A^\top.

Reading: Strang 4.5.

Lecture 15 (Mon March 10 2025)

  • If AA is an n×nn\times n square determinant, its determinant is the signed factor by which the linear transformation A:RnRnA:\mathbb{R}^n\to\mathbb{R}^n scales nn-dimensional volumes.
  • Some key facts:
    • Product formula: det(AB)=(detA)(detB)\det(AB)=(\det A)(\det B).
    • We have detA0\det A\neq0 if and only if AA is invertible.
    • The determinant of an upper triangular matrix is the product of the diagonal entries.
  • We covered several cases of $2\times 2matricesmatricesA:theunitsquare: the unit square Smapstoaparallelogrammaps to a parallelogramAS,and, and \det Ais(uptosign)the2dimensionalvolume(area)ofis (up to sign) the 2-dimensional volume (area) ofAS$.
  • Two ways to calculate detA\det A up to sign:
    • Use a (generalized) QR factorization: A=QRA=QR where QQ is an n×nn\times n orthogonal matrix, and RR is upper triangular (possibly with zero entries on the diagonal). Then detQ=±1\det Q=\pm1, so detA=±(detR)\det A = \pm(\det R).
    • Use gaussian elimination: GA=A~GA=\tilde{A} where A~\tilde{A} is in row echelon form (REF), and GG is a product of row swap or elimination matrices. Then detG=±1\det G = \pm1, so detA=±(detA~)\det A = \pm(\det\tilde{A}).

Reading: Strang 5.1.

Lecture 16 (Wed March 12 2025)

  • We covered the "big formula" for the determinant of an n×nn\times n matrix, detA=σ(sgn σ)i=1nai,σ(i)\det A = \sum_\sigma (\textup{sgn }\sigma)\prod_{i=1}^n a_{i,\sigma(i)}. The sum goes over all n!n! permutations of {1,,n}\{1,\ldots,n\}, and sgn σ\textup{sgn }\sigma denotes the sign of the permutation σ\sigma: it is +1+1 if σ\sigma is a composition of an even number of swaps, and 1-1 if σ\sigma is a composition of an odd number of swaps. We explained that this formula can be derived from the multilinearity property of the determinant.
  • In most cases, the more efficient way to compute detA\det A will be by gaussian elimination: R=GkG1AR = G_k \cdots G_1 A. RR is in REF, so it is upper triangular and its determinant is simply the product of its diagonal entries. Each GiG_i encodes an elementary row operation: if GiG_i encodes a row swap, it follows from the big formula that detGi=1\det G_i=-1. Otherwise, if GiG_i encodes an elimination operation, then GiG_i is a lower triangular matrix with all $1salongthediagonal,andinthiscase's along the diagonal, and in this case \det G_i=1.Itfollowsthat. It follows that \det A=(-1)^s\det R,where, where s$ is the number of row swaps in the gaussian elimination.

Reading: Strang 5.2.

Lecture 17 (Fri March 14 2025)

  • We covered the Laplace expansion of the determinant, which can be viewed as a way to organize the "big formula" from last time.
  • We considered one example of a circulant matrix; see https://en.wikipedia.org/wiki/Circulant_matrix. Following the Wikipedia notation, our example had c0=1c_0=1, c1=zc_1=z, and all other cj=0c_j=0. We covered how to evaluate the determinant of this matrix using the "big formula", and also using Laplace expansion along the top row.

Reading: Strang 5.3.

Lecture 18 (Mon March 17 2025)

  • In this lecture we did some review and examples in preparation for the Wednesday exam.
  • If M=I+vvM=I+vv^\top, we explained how to calculate that M1=I(1+v2)1vvM^{-1} = I-(1+\|v\|^2)^{-1}vv^\top.
  • Let II be the r×rr\times r identity matrix, and vv a vector in Rr\mathbb{R}^r. We calculated the pseudoinverse of the matrix
A=(Iv00)A = \begin{pmatrix} I & v \\ 0 & 0 \end{pmatrix}

Exam 2 (Wed March 19 2025)


Lecture 19 (Fri March 21 2025)

  • In this lecture we reviewed the Laplace expansion (also called cofactor expansion) of the determinant.
  • Given an n×nn\times n matrix AA, the (i,j)(i,j) minor is the (n1)×(n1)(n-1)\times(n-1) matrix Mi,jM_{i,j} obtained by removing the ii-th row and jj-th column of AA.
  • We defined the (i,j)(i,j) cofactor as Ci,j=(1)i+jdetMi,jC_{i,j}=(-1)^{i+j}\det M_{i,j}. The cofactor matrix is the n×nn\times n matrix CC with entries Ci,jC_{i,j}.
  • The adjugate matrix is X=CX=C^\top. We derived in class that if AA is invertible, then A1=(1/detA)XA^{-1}=(1/\det A) X.
  • We also used this to derive Cramer's rule for solving a linear system Ax=bAx=b.

Reading: finish Strang Chapter 5.

Lecture 20 (Mon March 31 2025)

  • At the start of this class, we discussed that diagonal matrices act in a simple way.
  • A (square) matrix AA is diagonalizable if it can be related to a diagonal matrix via the equation A=EDE1A=EDE^{-1} where EE is n×nn\times n invertible, and DD is n×nn\times n diagonal.
  • Caution: not all square matrices are diagonalizable! Nevertheless, it is an important and useful concept.
  • Let EE have columns v1,,vnv^1,\ldots,v^n, and let DD have diagonal entries d1,,dnd_1,\ldots,d_n. We showed that vjv^j is an eigenvector of AA with eigenvalue djd_j.
  • Since EE is invertible (by definition), its columns form a basis of Rn\mathbb{R}^n, which is called the eigenbasis of AA. The action of AA in the eigenbasis is diagonal.
  • We explained that EE and E1E^{-1} may be viewed as implementing change of basis: E1E^{-1} maps from standard coordinates to eigenbasis coordinates, while EE maps from eigenbasis coordinates to standard basis coordinates.
  • We also covered some concrete examples. In future lectures we will learn how to compute matrix eigenvalues and eigenvectors.

Reading: start Strang Chapter 6.

Lecture 21 (Wed April 2 2025)

  • Let AA be a square n×nn\times n matrix. An eigenvector of AA is a non-zero vector vRnv\in\mathbb{R}^n such that Av=λvAv=\lambda v for a scalar λ\lambda (the eigenvalue). We will allow λ\lambda to be a real or complex number, so in general λC\lambda\in\mathbb{C}.
  • An eigenvector with eigenvalue λ=0\lambda=0 is just a null vector.
  • In general, for any λ\lambda, an eigenvector with eigenvalue λ\lambda is any non-zero vector in the null space N(AλI)N(A-\lambda I).
  • It follows that the eigenvalues of AA are exactly the roots of pA(λ)=det(AλI)p_A(\lambda)=\det(A-\lambda I), the characteristic polynomial of AA.
  • Let α\alpha denote the trace of AA (sum of its diagonal entries). It follows from the determinant formula that pA(λ)p_A(\lambda) is a polynomial in λ\lambda of degree nn, of the form
pA(λ)=(1)nλn+(1)n1αλn1++detA.p_A(\lambda) = (-1)^n\lambda^n +(-1)^{n-1} \alpha \lambda^{n-1} + \ldots + \det A\,.
  • The fundamental theorem of algebra tells us that pA(λ)p_A(\lambda) has nn roots λ1,,λn\lambda_1,\ldots,\lambda_n, and can be factorized as
pA(λ)=(1)nj=1n(λλj).p_A(\lambda) = (-1)^n \prod_{j=1}^n(\lambda-\lambda_j)\,.
  • The eigenvalues are exactly the roots λj\lambda_j. They may be complex-valued, and it is possible to have multiple roots.
  • The algebraic multiplicity of an eigenvalue λ\lambda is the number of times it appears as a root in the characteristic polynomial.
  • The geometric multiplicity of an eigenvalue λ\lambda is the dimension of its eigenspace, N(AλI)N(A-\lambda I).
  • In general, $1\le \textup{geo mult} \le \textup{alg mult}$. We will discuss this further in the next lecture.

Reading: Strang 6.1-6.2.

Lecture 22 (Fri April 4 2025)

  • Let AA be a square n×nn\times n matrix. We discussed last time that the characteristic polynomial pA(λ)p_A(\lambda) is a polynomial in λ\lambda of degree nn. The fundamental theorem of algebra then tells us that it has nn roots λ1,,λn\lambda_1,\ldots,\lambda_n, and these are precisely the eigenvalues of AA. The roots may be complex-valued, and it is possible to have multiple roots.
  • The algebraic multiplicity of an eigenvalue λ\lambda is the number of times it appears as a root of the characteristic polynomial.
  • The geometric multiplicity of an eigenvalue λ\lambda is the dimension of its eigenspace, N(AλI)N(A-\lambda I).
  • In general, $1 \le \textup{geo mult} \le \textup{alg mult}$.
  • The algebraic multiplicities sum up to the total number nn of roots. Eigenspaces for distinct eigenvalues are linearly independent, so the geometric multiplicities sum up to the combined dimension of all eigenspaces. The matrix AA is diagonalizable if and only if the latter sum equals nn, which means we must have geo mult=alg mult\textup{geo mult} = \textup{alg mult} for all the eigenvalues. This is not guaranteed in general.
  • A special case is that all nn roots are distinct. In this case we must have geo mult=alg mult=1\textup{geo mult} = \textup{alg mult} = 1 for all eigenvalues, so the matrix AA in this case is always diagonalizable. If the roots are not all distinct, then AA may or may not be diagonalizable.
  • If AA is not diagonalizable, then it has a Jordan canonical form (JCF), which can be viewed as a generalization of the diagonalization. (In the special case that AA is diagonalizable, the JCF is the same as the diagonalization.)

Reading: finish reading Strang 6.1-6.2.

Lecture 23 (Mon April 7 2025)

  • We discussed that eigenvalues and eigenvectors can be complex-valued, even for real matrices, and covered an example.
  • In general, if AA is a real n×nn\times n matrix, then its characteristic polynomial has real coefficients. This implies that the non-real eigenvalues of AA must occur in conjugate pairs. (For example, this also implies that if AA is n×nn\times n with nn odd, then AA must have at least one real eiganvalue).
  • We covered some basic concepts of complex numbers, including conjugate and modulus. We also covered complex vectors, the conjugate transpose operation, and the norm of a complex vector.
  • Lastly, we covered the spectral theorem, which says that is AA is n×nn\times n real symmetric, then it has all real eigenvalues and eigenvectors, and an orthonormal eigenbasis. We can write this as A=EDEA=EDE^\top where both DD and EE are real, and EE is an orthogonal matrix. In class we also gave a partial proof of the spectral theorem.

Reading: Strang 6.3, as well as the first three pages of Strang 6.4.

Lecture 24 (Wed April 9 2025)

  • A symmetric matrix is positive-definite (PD) if all its eigenvalues are strictly positive.
  • A symmetric matrix is positive semi-definite (PSD) if all its eigenvalues are nonnegative.
  • If AA is symmetric, it is PD if and only if xAx>0x^\top Ax>0 for every vector xx.
  • If AA is symmetric, it is PSD if and only if xAx0x^\top Ax\ge0 for every vector xx.
  • For any matrix MM (n×pn\times p), both MMMM^\top and MMM^\top M are PSD.
  • In this lecture we introduced the singular value decomposition (SVD), which applies to any matrix MM (n×pn\times p). More precisely we covered both the long SVD and the short SVD.
  • In the special case that MM is a square matrix of full rank, the long and short SVD are identical, and we covered in class the procedure to find this SVD.

Reading: Strang 7.1.

Lecture 25 (Fri April 11 2025)

  • We covered how to find the short and long SVD of a general matrix MM (n×pn\times p).
  • The matrix A=MMA=M^\top M is PSD, so we can find its spectral decomposition A=EDEA=EDE^\top. Moreover we can arrange that the diagonal entries of DD are d1dr>0=dr+1==dpd_1 \ge \ldots \ge d_r > 0 = d_{r+1} = \ldots = d_p, where rr is the rank of MM (and also the rank of AA).
  • Let VV be the p×rp\times r matrix formed from the first rr columns of EE.
  • Let Σ\Sigma be the r×rr\times r diagonal matrix with diagonal entries σi=(di)1/2\sigma_i=(d_i)^{1/2} for $1\le i\le r$.
  • Let UU be the n×rn\times r matrix defined by U=MVΣ1U=MV\Sigma^{-1}.
  • Then M=UΣVM=U\Sigma V^\top gives the short SVD of MM.
  • To convert from short to long SVD: expand Σ\Sigma from r×rr\times r to n×pn\times p by adding zeroes; expand UU from n×rn\times r to n×nn\times n so that its columns form an orthonormal basis of Rn\mathbb{R}^n; expand VV from p×rp\times r to p×pp\times p so that its columns form an orthonormal basis of Rp\mathbb{R}^p.
  • We also covered some examples and practice problems.

Reading: finish reading Strang 7.1.

Lecture 26 (Wed April 16 2025)

  • In this lecture we covered geometric interpretations of the SVD.
  • Throughout, suppose MM is n×pn\times p with rank rr, and that we rank its singular values in nondecreasing order σ1σr>0\sigma_1\ge \ldots \ge \sigma_r>0.
  • The maximum singular value σ1\sigma_1 is the operator norm or spectral norm of MM, usually denoted Mop\|M\|_\textup{op} or M\|M\|.
  • The operator norm of MM can be understood as the maximum value of Mv\|Mv\| attained as vv ranges over all unit vectors in Rp\mathbb{R}^p.
  • The SVD can be used to calculate the pseudoinverse: if the short SVD of MM is given by M=UΣVM=U\Sigma V^\top, then the pseudoinverse of MM is VΣ1UV\Sigma^{-1}U^\top.

Reading: Strang 7.2.

Lecture 27 (Fri April 18 2025)

  • In this lecture we covered the application of SVD to low-rank approximation and image compression.
  • Suppose MM is n×pn\times p with short SVD M=UΣVM=U\Sigma V^\top. As always, we rank the singular values (diagonal entries of Σ\Sigma) σ1σr>0\sigma_1\ge \ldots \ge \sigma_r>0.
  • The rank-kk approximation of MM is given by Mk=UkΣk(Vk)M_k = U_k \Sigma_k (V_k)^\top, where UkU_k is formed from the first kk columns of UU, VkV_k is formed from the first kk columns of VV, and Σk\Sigma_k is the upper left k×kk\times k submatrix of Σ\Sigma.
  • We discussed three matrix norms: (1) operator norm / spectral norm; (2) Frobenius norm / Hilbert--Schmidt norm, (3) nuclear norm.
  • Eckhart--Young theorem: among all matrices of rank at most kk, the best approximation to MM is given by MkM_k. It is best with respect to all three of the spectral norms listed above.
  • Application to image compression: if MM represents an n×pn\times p image, the original image consists of npnp pixels. Storing the compressed image MkM_k requires storing (n+p)k+k(n+p)k+k values. If n,pn,p are large and kk is relatively small, the compressed image requires much less storage.
  • See https://timbaumann.info/svd-image-compression-demo/ for examples.

Reading: Strang 7.2.

Lecture 28 (Wed April 23 2025)

  • We covered the application of SVD to PCA (principal components analysis).
  • Let XX be an n×pn\times p data matrix where nn is the number of individuals or samples, and pp is the number of attributes or features.
  • We assume the data is normalized, so that each column (feature) has mean zero and standard deviation one.
  • 2D PCA: choose the 2D projection of the data that shows the most variability.
  • We learned in class that this is achieved by taking u,vu,v to be the two top right singular vectors (corresponding to the first two columns of the VV matrix), resulting in the 2D scatterplot of the values (xiu,xiv)(x_i\cdot u,x_i\cdot v) for i=1,,ni=1,\ldots,n.
  • Lastly, we showed that 1D PCA and ordinary least squares (OLS, see Lecture 12) are not the same. This is the reason the textbook refers to 1D PCA as "perpendicular least squares."

Reading: Strang 7.3.

Lecture 29 (Fri April 25 2025)

  • In this lecture we finished our discussion of SVD and PCA by going over the application covered in the paper https://doi.org/10.1038/nature07331.
  • We reviewed the basics of complex numbers: real part, imaginary part, complex conjugate, modulus, polar form, Euler's formula.
  • Example: the permutation matrix AA corresponding to the permutation σ\sigma that maps $1\mapsto 2, 2\mapsto 3, \ldots, n-1\mapsto n, n\mapsto1.Theeigenvaluesarethe. The eigenvalues are the n$-th roots of unity.
  • Example: solving the differential equation f(t)=f(t)f''(t)=-f(t).
  • We defined the complex dot product (also called scalar product or inner product): for v,wCnv,w\in\mathbb{C}^n, we define vw=vwv\cdot w = \overline{v}^\top w.

Reading: start Strang 6.4.

Lecture 30 (Mon April 28 2025)

  • In this class we covered definitions of special classes of complex n×nn\times n matrices.
  • Unitary matrices: extension of definition of orthogonal matrices. Orthogonal matrices preserve the Rn\mathbb{R}^n scalar product, and unitary matrices preserve the Cn\mathbb{C}^n scalar product.
  • Hermitian matrices: extension of definition of symmetric matrices. Symmetric matrices are self-adjoint on Rn\mathbb{R}^n, and hermitian matrices are self-adjoint on Cn\mathbb{C}^n.
  • A matrix ACn×nA\in\mathbb{C}^{n\times n} is normal if AAˉ=AˉAA\bar{A}^\top=\bar{A}^\top A. Both sets of unitary matrices and hermitian matrices sit inside the set of normal matrices.
  • We covered the spectral theorem for three cases: symmetric matrices (covered previously), hermitian matrices, and normal matrices.
  • Example: same permutation matrix as discussed last time. This matrix is normal, but not symmetric. It has an orthogonal eigenbasis, which corresponds to a special basis of Cn\mathbb{C}^n called the Fourier basis.

Reading: continue Strang 6.4.

Lecture 31 (Wed April 30 2025)

  • Let PP be the n×nn\times n permutation matrix corresponding to the cyclic permutation σ\sigma that sends $1\mapsto2, 2\mapsto 3, \ldots, n-1\mapsto n, n\mapsto 1$.
  • We reviewed that the eigenvalues of PP are the nn-th roots of unity, and the Fourier basis (columns of the n×nn\times n Fourier matrix) is an eigenbasis of PP.
  • It follows that the Fourier basis is also an eigenbasis for any (nonnegative) power of PP.
  • A circulant matrix is any linear combination of powers of PP. It often arises in situations where some underlying system has a circular structure. For example we define the cycle graph TnT_n consisting of nn nodes connected by nn edges in a circle: a natural definition of discrete derivative applied to functions on TnT_n gives rise to a circulant matrix.
  • The Fourier basis is also an eigenbasis for any circulant matrix. This fact can be used for multiplying circulant matrices, and also implies that circulant matrices commute with one another.

Reading: continue Strang 6.4.

Lecture 32 (Fri May 2 2025)

  • In this lecture we continued our discussion of circulant matrices. We covered the cyclic convolution operation in detail.
  • We introduced TnT_n as the graph with nn vertices numbered $0uptoup ton-1,withcyclicindexingmod, with cyclic indexing mod n,suchthatthereisanedgebetween, such that there is an edge between iandandi+1(andthus,withcyclicindexing,thereisanedgebetween(and thus, with cyclic indexing, there is an edge betweenn-1 and \0).Avector). A vector f=(f_0,\ldots,f_{n-1})\in\mathbb{C}^nisequivalenttoafunctionis equivalent to a functionf:T_n\to\mathbb{C}thatsendselementthat sends elementj\in T_ntovalueto valuef(j)=f_j$.
  • We discussed that for functions f:TnCf:T_n\to\mathbb{C}, a natural notion of a discrete derivative of ff can be expressed as applying a circulant matrix, fDff\mapsto Df.
  • We also defined a discrete second derivative Δ\Delta, which we found was also a circulant matrix. Vectors in the kernel of this matrix correspond to what are called harmonic functions.

Reading: continue Strang 6.4.

Lecture 33 (Mon May 5)

  • In this lecture we discussed complex vector spaces.
  • We reviewed the basic notions of span, linear independence, basis, dimension in the context of complex vector spaces.
  • We reviewed the orthogonal projection calculation in the context of subspaces of Cn\mathbb{C}^n equipped with the Cn\mathbb{C}^n dot product.
  • In preparation for the next lecture, we reviewed the precise structure of the $6\times6$ Fourier matrix.

Reading: continue Strang 6.4.

Lecture 34 (Wed May 7)

  • In this lecture we reviewed the Fourier basis, and covered in detail its connection to the sines-cosines basis.
  • We covered the discrete Fourier transform (DFT), and explained how to regard it as a change of basis operation.
  • We discussed natural applications of DFT to compression and denoising in signal processing.

Reading: continue Strang 6.4.

Lecture 35 (Wed May 9)

  • This lecture focused on the fact that the DFT sends convolution to pointwise multiplication. This is a key property of the Fourier transform.
  • We used what we learned about circulant matrices to obtain an algebraic derivation of this identity.
  • We covered some of the uses of this identity: e.g., since convolution can be challenging (e.g. think about the nn-fold convolution of a vector xx), one can use DFT to convert convolution to pointwise multiplication --- which is easy --- then apply Fourier inversion to convert back.

Reading: continue Strang 6.4.

Lecture 36 (Mon May 12)

  • In this lecture we gave a probabilistic derivation of the same fact from last time, that the DFT sends convolution to pointwise multiplication.
  • We also reviewed some previous material about the SVD.

Reading: finish reading Strang 6.4.