A Deeper look at Vectors#
Learning Objectives#
By the end of this lecture, you should be able to:
Define a vector space and an inner product space.
Develop a geometric intuition for vectors and the dot product.
Understand basic vector operations, including addition, scalar multiplication, and the dot product.
Define a matrix and matrix multiplication.
Why Vectors#
At its most immediate, encoding an object in a list of numbers is a natural thing to do.
We might describe an object using multiple “features”, each of which has a numeric value. We might describe an apartment by its size in square feet, the number of bedrooms, and which floor it is on. Similarly, we might describe a student by their transcript: a number between 0 and 100 for every class they’ve taken.
A convenient way to store a function on a computer is by storing its values at a list of points.
To describe a molecule, we might write down its elemental composition: the number of carbons, the number of hydrogens, the number of oxygens, and so forth.
However, by itself a list of numbers is not very useful: to do science, we need ways to manipulate it. To manipulate our lists, we first consider three main operations (a short numerical sketch follows the list):
Addition: We can add two lists of numbers together by adding their corresponding entries.
Scalar Multiplication: We can multiply a list of numbers by a single number by multiplying each entry by that number.
Dot Product: We can multiply two lists of numbers together by multiplying their corresponding entries and summing the result.
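As a concrete illustration, here is a minimal sketch of these three operations in NumPy (assuming NumPy as the numerical library; the specific numbers are arbitrary):

```python
import numpy as np

# Two "lists of numbers" stored as NumPy arrays.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(u + v)         # addition: corresponding entries add -> [5. 7. 9.]
print(2.5 * u)       # scalar multiplication -> [2.5 5.  7.5]
print(np.dot(u, v))  # dot product: 1*4 + 2*5 + 3*6 = 32.0
```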
We can visualize vectors as arrows in space. The vector \(\begin{bmatrix} 1 \\ 2 \end{bmatrix}\), for instance, is an arrow that starts at the origin and ends at the point \((1,2)\).
A Formal Definition of a Vector#
Formally, we have defined a set of minimal operations we expect our list to obey.
We define a space of objects called vectors along with a second space of objects called scalars.
We define two operations: vector addition and scalar multiplication.
If, for any three vectors \(u,v,w\) and any two scalars \(a,b\), we require the following properties to hold,
Associativity: \((u+v)+w = u+(v+w)\).
Commutativity: \(u+v = v+u\).
Identity element for vector addition: There is a vector \(0\) such that \(v+0 = v\) for all \(v\).
Inverse: For every vector \(v\), there is another vector \(-v\) such that \(v+(-v) = 0\).
Compatibility with Scalars: \(a(bv) = (ab)v\).
Distributivity of Scalars: \(a(u+v) = au + av\).
Distributivity of Vectors: \((a+b)v = av + bv\).
Identity element for scalar multiplication: \(1v = v\).
If these properties hold, then our vectors and scalars form a vector space.
We then further introduce a dot product operation, which takes two vectors and returns a scalar. We require the dot product to satisfy the following properties:
Linearity in the first argument: \((au+v) \cdot w = a(u \cdot w) + v \cdot w\).
Conjugate symmetry: \(u \cdot v = \overline{v \cdot u}\), where the bar denotes complex conjugation.
Positive definiteness: \(u \cdot u \geq 0\), and \(u \cdot u = 0\) if and only if \(u = 0\).
Equipped with these operations, we have formed an inner product space. (Note: you might have heard of the term “Hilbert Space” in your quantum mechanics classes. A Hilbert space is an inner product space with a few more technical properties we won’t discuss here.)
These somewhat abstract properties allow us to impose a consistent structure on a wide variety of objects. For instance, we can define an inner product space of functions whose square-integral is finite. Our “vectors” add elementwise: \( (f+g)(x) = f(x) + g(x) \). Similarly, a function times a scalar is defined by multiplying the function by the scalar: \( (af)(x) = a f(x) \). For the dot product, we use the integral: \( (f \cdot g) = \int f^*(x) g(x) \, dx \), where \(f^*\) is the complex conjugate of \(f\). This structure shows up often in quantum mechanics, where the vectors are wavefunctions.

However, this is not the only inner product space we can define. For instance, in statistical mechanics, we might define our vectors and scalars the same way, but instead define an inner product using the average against the Boltzmann density: \( (f \cdot g) = \frac{1}{Z} \int f(x) g(x) e^{-\beta H(x)} \, dx \), where \(\beta\) is the inverse temperature, \(H\) is the Hamiltonian, and \(Z\) is the partition function.
The key point of these examples is that we can define similar structures on a wide variety of objects. This will be very useful computationally: we can approximate a function (e.g. a wavefunction) by a list of numbers but still expect much of the same mathematical structure to hold.
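For example, we can approximate the function inner product above by sampling each function on a grid and replacing the integral with a sum. A minimal sketch (the interval, grid size, and test functions are illustrative choices, not part of the lecture):

```python
import numpy as np

# Discretize two functions on [0, 2*pi] and approximate (f . g) = integral of f*(x) g(x) dx
# by a Riemann sum over the grid points.
x = np.linspace(0.0, 2.0 * np.pi, 1000)
dx = x[1] - x[0]

f = np.sin(x)
g = np.sin(x)

inner = np.sum(np.conj(f) * g) * dx
print(inner)  # approximately pi, since the integral of sin^2(x) over [0, 2*pi] is pi
```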
Geometrically understanding vectors#
Vectors have direct geometric interpretations: we can interpret a real vector with \(k\) entries as an arrow pointing from the origin to the corresponding point in \(k\)-dimensional space. For instance, the vector \(\begin{bmatrix} 1 \\ 2 \end{bmatrix}\) is an arrow that starts at the origin and ends at the point \((1,2)\). Multiplying a vector by a scalar changes the length of the arrow, while adding two vectors adds the arrows tip-to-tail.
The dot product also has a strong geometric interpretation. For a vector \(x\) with entries \(x_1, x_2, \ldots, x_N\), the dot product of the vector with itself is equal to its squared Euclidean length \(\|x\|^2\):
\[ x \cdot x = x_1^2 + x_2^2 + \cdots + x_N^2 = \|x\|^2 \]
We say a vector is normalized if its length is equal to 1. We can normalize a vector by dividing it by its length:
\[ \hat{x} = \frac{x}{\|x\|} \]
Moreover, the dot product of two vectors is proportional to the cosine of the angle between them:
\[ x \cdot y = \|x\| \, \|y\| \cos\theta \]
where \(\theta\) is the angle between the two vectors \(x\) and \(y\). If \(x \cdot y\) is zero, we say the vectors are orthogonal; vectors that are both normalized and mutually orthogonal are called orthonormal.
Not only does this give us a natural connection to trigonometry, it lets us generalize the notion of distance and angle between any two vectors.
Consider two wavefunctions \(\psi_1\) and \(\psi_2\). One definition of the “angle” between them comes from their normalized overlap integral:
\[ \cos\theta = \frac{\psi_1 \cdot \psi_2}{\|\psi_1\| \, \|\psi_2\|} = \frac{\int \psi_1^*(x) \, \psi_2(x) \, dx}{\sqrt{\int |\psi_1(x)|^2 \, dx} \, \sqrt{\int |\psi_2(x)|^2 \, dx}} \]
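As a numerical illustration, the same cosine formula applies to ordinary vectors and to discretized wavefunctions alike. A sketch (the Gaussian “wavefunctions” below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Angle between two ordinary vectors: cos(theta) = (x . y) / (|x| |y|).
x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))  # 45.0

# The same formula for two discretized, real-valued "wavefunctions":
# their normalized overlap integral plays the role of cos(theta).
xs = np.linspace(-5.0, 5.0, 2001)
dx = xs[1] - xs[0]
psi1 = np.exp(-xs**2)           # hypothetical Gaussian wavepacket
psi2 = np.exp(-(xs - 1.0)**2)   # a second wavepacket, shifted by 1
overlap = np.sum(psi1 * psi2) * dx
norm1 = np.sqrt(np.sum(psi1**2) * dx)
norm2 = np.sqrt(np.sum(psi2**2) * dx)
print(overlap / (norm1 * norm2))  # approximately exp(-1/2) = 0.61
```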
Introduction to Matrices#
We have a natural way of generalizing addition to vectors. We now turn to a second question: how should we generalize multiplication? In fact, we have already introduced one possible generalization: the dot product, which “multiplies” two vectors to produce a scalar. This gives us an idea: if we have a vector with \(M\) entries, we can perform \(K\) different dot products to get a new vector with \(K\) entries.
To help make this practical, we introduce a new notation. We write the first vector in a dot product as a row of numbers (a “row vector”) and the second vector as a column of numbers (a “column vector”).
To take the dot product of a row vector and a column vector, we march along the row vector and down the column vector, multiplying the corresponding entries and summing the result as we go.
We can extend this idea to a matrix, which is a rectangular array of numbers. Effectively, this is a set of row vectors stacked on top of each other.
As another example, if we had arbitrary entries in a 2-by-2 matrix and a vector with two entries, we could write the matrix-vector product as below.
\[\begin{split} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} \end{split}\]
Another way to think of a matrix-vector product is as a weighted sum of the matrix columns, where the weights are given by the vector entries.
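Both views of the matrix-vector product give the same answer, as a quick NumPy check shows (the entries here are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

# View 1: each output entry is the dot product of a row of A with v.
row_view = np.array([np.dot(A[0], v), np.dot(A[1], v)])

# View 2: the output is a weighted sum of the columns of A,
# with the weights given by the entries of v.
column_view = v[0] * A[:, 0] + v[1] * A[:, 1]

print(row_view)     # [17. 39.]
print(column_view)  # [17. 39.]
print(A @ v)        # NumPy's built-in product agrees with both views
```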
Range, rank, and kernel.#
This view tells us what the range of a matrix is: the set of all output vectors we can get by multiplying the matrix by any vector. The dimensionality of a matrix’s range is called its rank. For instance, the matrix \(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\) has rank 2, since we can reach any vector in \(\mathbb{R}^2\) by multiplying it by a suitable vector. However, the matrix \(\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\) has rank 1, since the second column is just the first column scaled by 2.

In general, the rank of a matrix is the number of linearly independent columns: that is, the size of the largest set of columns in which no column can be written as a linear combination of the others. One of the most important theorems in linear algebra is that the number of linearly independent columns equals the number of linearly independent rows: that is, the rank of a matrix is the same as the rank of its transpose.
The kernel of a matrix is the set of all vectors that get sent to zero by the matrix. Every kernel contains at least the zero vector, its “trivial” element. However, some matrices have nontrivial kernels: nonzero vectors that get sent to zero as well.
For instance, the kernel of the matrix \(\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\) is the set of all vectors of the form \(\begin{bmatrix} 2x \\ -x \end{bmatrix}\). The kernel of a matrix is always a vector space: it contains the zero vector and is closed under addition and scalar multiplication. The dimensionality of the kernel is called the nullity of the matrix, and the sum of the rank and the nullity equals the number of columns of the matrix.
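In practice we rarely compute ranks or kernels by hand. A short sketch using NumPy and SciPy (assuming SciPy is available) for the rank-1 example above:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.matrix_rank(A))  # 1: the two columns are not linearly independent

# The kernel is one-dimensional; null_space returns an orthonormal basis for it,
# here a multiple of [-2, 1] / sqrt(5) (the overall sign may vary).
print(null_space(A))

# Rank (1) + nullity (1) equals the number of columns (2).
```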
Matrices as Linear Transformations#
Another way of understanding matrices is as stretches, rotations, and reflections of space. For instance,
\(\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}\) stretches space by a factor of 2 in both the \(x\) and \(y\) directions: applying it to any vector doubles its length.
\(\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\) rotates space by 90 degrees counterclockwise: any vector gets rotated by 90 degrees.
\(\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\) reflects space across the \(x\)-axis: any vector gets flipped across the \(x\)-axis.
Matrices with a nontrivial kernel “squash” vectors into a subspace of lower dimensionality than the original space. For instance, the matrix \(\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\) squashes all vectors onto the line \(y = 2x\).
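A short numerical check of these geometric actions (the test vector is arbitrary):

```python
import numpy as np

v = np.array([1.0, 2.0])

stretch   = np.array([[2.0, 0.0], [0.0, 2.0]])
rotate_90 = np.array([[0.0, -1.0], [1.0, 0.0]])
reflect_x = np.array([[1.0, 0.0], [0.0, -1.0]])
squash    = np.array([[1.0, 2.0], [2.0, 4.0]])

print(stretch @ v)    # [2. 4.]: twice as long, same direction
print(rotate_90 @ v)  # [-2. 1.]: rotated 90 degrees counterclockwise
print(reflect_x @ v)  # [1. -2.]: flipped across the x-axis
print(squash @ v)     # [5. 10.]: lands on the line y = 2x
```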
Matrix Multiplication#
Just as we stacked row vectors vertically, we can stack our column vectors horizontally. This allows us to multiply two matrices together. For instance, if we have two 2-by-2 matrices, we can multiply them together as follows.
\[\begin{split} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix} \end{split}\]
Observe that the \(i,j\)’th entry is the dot product of the \(i\)’th row of the first matrix and the \(j\)’th column of the second matrix.
Matrices don’t have to be square or even have the same number of rows and columns. However, to multiply two matrices together, the number of columns in the first matrix must equal the number of rows in the second matrix. For instance, if we have a 2-by-3 matrix and a 3-by-2 matrix, we can multiply them together.
We could also multiply a 3-by-2 matrix by a 2-by-3 matrix.
We could also multiply a 3-by-2 matrix by a 2-by-2 matrix.
However, we could not multiply a 2-by-3 matrix by a 2-by-2 matrix.
Note that matrix multiplication has some key differences from scalar multiplication.
Matrix multiplication is associative: \((AB)C = A(BC)\).
Matrix multiplication is distributive: \(A(B+C) = AB + AC\).
Matrix multiplication is not commutative: in general, \(AB \neq BA\).
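The lack of commutativity is easy to see numerically. For instance, rotating and then reflecting is not the same as reflecting and then rotating (a sketch reusing the matrices from above):

```python
import numpy as np

rotate_90 = np.array([[0.0, -1.0],
                      [1.0,  0.0]])
reflect_x = np.array([[1.0,  0.0],
                      [0.0, -1.0]])

print(rotate_90 @ reflect_x)  # [[0. 1.], [1. 0.]]
print(reflect_x @ rotate_90)  # [[0. -1.], [-1. 0.]]  -- a different matrix, so AB != BA
```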
Basic Matrix Operations#
Matrices inherit many of the properties of vectors:
Addition: We can add two matrices of the same size together by adding their corresponding entries. For example,
\[\begin{split} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix} \end{split}\]
Scalar Multiplication: We can multiply a matrix by a single number by multiplying each entry by that number. For example,
\[\begin{split} 2 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 2 \cdot 1 & 2 \cdot 2 \\ 2 \cdot 3 & 2 \cdot 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix} \end{split}\]
In fact, we can consider matrices themselves as vectors: we can add them and multiply them by scalars. There are also multiple possible ways to define a “dot product” between two matrices. One simple way is to stack the entries into a bigger vector and take the dot product of the resulting vectors: this is known as the Frobenius inner product.
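A quick sketch of the Frobenius inner product in NumPy (the entries are arbitrary; the second line uses the transpose and trace operations introduced just below):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Stack each matrix's entries into one long vector, then take an ordinary dot product.
print(np.dot(A.ravel(), B.ravel()))  # 1*5 + 2*6 + 3*7 + 4*8 = 70.0

# The same quantity written compactly as tr(A^T B).
print(np.trace(A.T @ B))             # 70.0
```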
However, there are also some additional operations we might apply on matrices that are useful:
The transpose operation. The transpose of a matrix is the matrix with its rows and columns swapped. For instance,
\[\begin{split} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \end{split}\]
It should be easy to verify that the transpose operation satisfies the following properties:
\((A^T)^T = A\).
\((A + B)^T = A^T + B^T\).
\((cA)^T = cA^T\).
\((AB)^T = B^T A^T\).
The trace. The trace of a square matrix is the sum of its diagonal entries. For instance,
\[\begin{split} \text{tr} \left( \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \right) = 1 + 4 = 5 \end{split}\]
For a more general \(M \times M\) matrix \(A\), the trace obeys \(\text{tr}(A) = \sum_{i=1}^M A_{ii}\). The trace operation satisfies the following properties:
\(\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)\).
\(\text{tr}(cA) = c \text{tr}(A)\).
\(\text{tr}(AB) = \text{tr}(BA)\).
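A quick numerical check of two of these properties, \((AB)^T = B^T A^T\) and \(\text{tr}(AB) = \text{tr}(BA)\), with arbitrary matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [2.0, 3.0]])

print(np.allclose((A @ B).T, B.T @ A.T))  # True: transposing reverses the order of a product
print(np.trace(A @ B), np.trace(B @ A))   # 19.0 19.0: equal even though A @ B != B @ A
```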
Some Examples of Matrices in Chemistry#
First-order Rate Laws#
Consider a collection of first-order reactions,
The rate equations are given by
We can write this as a matrix equation:
Note that the kernel of this rate matrix corresponds to the steady states of the system: concentration vectors that do not change in time.
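As a concrete, hypothetical example (not the scheme from the lecture), consider the two-species system \(A \rightleftharpoons B\) with forward rate constant \(k_1\) and backward rate constant \(k_2\). The sketch below builds the rate matrix and finds its kernel, which gives the steady-state concentrations:

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical scheme A <-> B:
#   d[A]/dt = -k1*[A] + k2*[B]
#   d[B]/dt =  k1*[A] - k2*[B]
k1, k2 = 2.0, 1.0
K = np.array([[-k1,  k2],
              [ k1, -k2]])

# Vectors in the kernel of K satisfy d(concentration)/dt = 0: the steady state.
steady = null_space(K)[:, 0]
steady /= steady.sum()  # normalize so the concentrations sum to 1
print(steady)           # [1/3, 2/3]: at steady state, k1*[A] = k2*[B]
```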
A Quantum Mechanical Mixture of \(M\) wavefunctions#
Assume we have a mixture of \(M\) wavefunctions \(\psi_1, \psi_2, \ldots, \psi_M\) (typically eigenstates of some Hamiltonian), combined with expansion coefficients \(c_1, c_2, \ldots, c_M\):
\[ \psi = \sum_{i=1}^M c_i \psi_i \]
We now attempt to calculate the expectation value of an operator \(\hat{A}\). If the matrix entries \(A_{ij} = \langle \psi_i | \hat{A} | \psi_j \rangle\) are known, then the expectation value is given by
\[ \langle \hat{A} \rangle = \sum_{i,j} c_i^* A_{ij} c_j \]
This is a matrix-vector product followed by a dot product, of the form
\[ \langle \hat{A} \rangle = c^\dagger A c \]
where \(c\) is the column vector of coefficients and \(c^\dagger\) is its conjugate transpose.
Note that a short calculation shows that this is equivalent to
\[ \langle \hat{A} \rangle = \text{tr}(D A) \]
where \(D\) is the density matrix with entries \(D_{ij} = c_i c_j^*\).
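A small numerical check that the two expressions agree, using a made-up two-state example (the coefficients and matrix entries are illustrative only):

```python
import numpy as np

c = np.array([1.0, 1.0j]) / np.sqrt(2.0)  # made-up expansion coefficients c_i
A = np.array([[1.0, 0.5],
              [0.5, 2.0]])                # made-up Hermitian entries A_ij

# Form 1: conjugate(c) . (A c), i.e. a matrix-vector product followed by a dot product.
expectation_1 = np.conj(c) @ (A @ c)

# Form 2: density matrix D_ij = c_i * conj(c_j), then <A> = tr(D A).
D = np.outer(c, np.conj(c))
expectation_2 = np.trace(D @ A)

print(expectation_1.real, expectation_2.real)  # both 1.5
```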
Approximating a derivative#
Consider a function \(f(x)\) whose derivative we wish to approximate. We can approximate the derivative by a finite difference:
\[ f'(x) \approx \frac{f(x+h) - f(x)}{h} \]
for a small step size \(h\).
If we have evaluated \(f\) at \(N\) points \(x_1, x_2, \ldots, x_N\), we can write this as a matrix equation:
This is a matrix-vector product and forms the basis of many numerical methods for solving differential equations.
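A minimal sketch of such a finite-difference matrix in NumPy, using a forward difference on a uniform grid (the grid size and test function are arbitrary):

```python
import numpy as np

# Forward difference f'(x_i) ≈ (f(x_{i+1}) - f(x_i)) / h on a uniform grid.
N = 5
h = 0.1
x = np.arange(N) * h
f = x**2  # test function with known derivative 2x

# Build the (N-1) x N finite-difference matrix D so that D @ f approximates f'.
D = (np.eye(N - 1, N, k=1) - np.eye(N - 1, N, k=0)) / h

print(D @ f)       # [0.1 0.3 0.5 0.7]: equals 2x + h, the forward-difference estimate
print(2 * x[:-1])  # [0.  0.2 0.4 0.6]: exact derivative at the left grid points
```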
Multivariate Linear Regression#
Assume that we have a signal \(y\) that we are modelling as a linear function of \(k\) input variables, \(x_1, x_2, \ldots, x_k\). Moreover, we have \(N\) sampled input points, collected as the rows of an \(N \times k\) matrix \(X\), along with their corresponding outputs, collected in a vector \(Y\).
In linear regression we then attempt to find a vector \(\beta\) such that \(X \beta\) is as close to \(Y\) as possible.
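A sketch of how this is solved numerically with ordinary least squares (the synthetic data and the “true” coefficients here are made up for illustration):

```python
import numpy as np

# Synthetic data: N = 50 samples of k = 2 input variables.
rng = np.random.default_rng(0)
N, k = 50, 2
X = rng.normal(size=(N, k))
true_beta = np.array([1.5, -0.5])
Y = X @ true_beta + 0.01 * rng.normal(size=N)  # linear signal plus a little noise

# Find beta minimizing ||X @ beta - Y||^2.
beta, residuals, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)
print(beta)  # close to [1.5, -0.5]
```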
The Identity and the Matrix Inversion#
One of the most important matrices is the identity matrix, traditionally denoted \(I\), which is a square matrix with ones on the diagonal and zeros elsewhere. For instance, the 2-by-2 identity matrix is
\[\begin{split} I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{split}\]
The identity matrix has the property that multiplying any matrix (or vector) by it gives back the original matrix: \(IA = AI = A\).

Just as we can define the inverse of a number, we can (attempt to) define the inverse of a matrix. The inverse of a matrix \(A\) is a matrix \(A^{-1}\) such that \(AA^{-1} = A^{-1}A = I\).
Not all matrices have inverses: any matrix that sends a nonzero vector to zero does not have an inverse. To show this, let \(v\) be a nonzero vector such that \(Av = 0\). If an inverse existed, we would have \(v = I v = A^{-1} A v = A^{-1} 0 = 0\), which is a contradiction.
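Numerically, we can compute inverses with NumPy; asking for the inverse of a matrix with a nontrivial kernel raises an error. A sketch with two example matrices (neither is the exercise matrix below):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])  # full rank, so invertible
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # rank 1, so not invertible

A_inv = np.linalg.inv(A)
print(A_inv)      # [[-2.   1. ] [ 1.5 -0.5]]
print(A_inv @ A)  # the identity matrix, up to floating-point error

try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as err:
    print("B is singular:", err)
```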
Exercise
Is the matrix
invertible? If so, find its inverse. If not, find a vector \(v\) such that \(Av = 0\).