Matrices Explained

Linear Transformations and Matrices

Linear Transformations

Let's start with the basics: what is a transformation? In mathematics, a transformation is a function that maps one set of values to another. In linear algebra specifically, a transformation maps vectors from one vector space to another. If you're not familiar with vector spaces, think of them as collections of vectors that can be added together and scaled by scalars.

A transformation \(T\) is linear if it satisfies the following two properties:

  1. Additivity: For all vectors \(x, y\) in the domain of \(T\), we have \(T(x + y) = T(x) + T(y)\).
  2. Homogeneity: For all vectors \(x\) in the domain of \(T\) and all scalars \(c\), we have \(T(c x) = c T(x)\).

The properties of linear transformations can be understood intuitively:

  • Additivity means that applying a linear transformation to the sum of two vectors \(x\) and \(y\) is equivalent to applying the transformation to each vector separately and then adding the results.
  • Homogeneity means that applying a linear transformation to a scaled vector \(c x\) is equivalent to scaling the transformed vector \(T(x)\) by \(c\).
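
These properties are easy to check numerically at sample points (a check, not a proof, since the definitions quantify over all vectors and scalars). Below is a minimal Python sketch, with an arbitrary matrix chosen to define \(T\), that verifies both properties for a map of the form \(T(x) = Ax\):

```python
import numpy as np

# A concrete linear map: T(x) = A x for a fixed matrix A (arbitrary example).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

def T(x):
    return A @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
y = rng.standard_normal(2)
c = 2.5

print(np.allclose(T(x + y), T(x) + T(y)))  # additivity: True
print(np.allclose(T(c * x), c * T(x)))     # homogeneity: True
```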

The term "linear" comes from a key geometric property: a linear transformation maps straight lines to straight lines (in degenerate cases, such as the zero transformation, a line may collapse to a single point). It also preserves proportions along a line: if a point divides the segment from \(A\) to \(B\) in a given ratio, its image divides the segment from \(T(A)\) to \(T(B)\) in the same ratio. Finally, a linear transformation always maps the origin to the origin, which distinguishes it from the broader class of affine transformations (such as the shift \(T(x) = x + 1\) discussed below).

To better understand these properties, let's examine some counterexamples where they are violated:

Violating both additivity and homogeneity:

Consider the transformation \(T(x) = x + 1\). For this transformation:

  • \(T(x + y) = (x + y) + 1\)
  • \(T(x) + T(y) = (x + 1) + (y + 1) = x + y + 2\)

Clearly, \(T(x + y) \neq T(x) + T(y)\). Similarly:

  • \(T(c x) = c x + 1\)
  • \(c T(x) = c (x + 1) = c x + c\)

Therefore, \(T(c x) \neq c T(x)\). This transformation violates both additivity and homogeneity.
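
A quick Python check makes both failures concrete:

```python
def T(x):
    # T(x) = x + 1 shifts its input, so it is not linear.
    return x + 1

x, y, c = 2.0, 3.0, 4.0
print(T(x + y), T(x) + T(y))  # 6.0 vs 7.0: additivity fails
print(T(c * x), c * T(x))     # 9.0 vs 12.0: homogeneity fails
```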

Violating additivity only:

Consider a transformation \(T\) acting on sequences \(x[n]\) defined as:

\[ T(x[n]) = \begin{cases} x[n], & \text{if } x[0] \neq x[1] \\ 0, & \text{if } x[0] = x[1] \end{cases} \]

First, let's check that \(T\) satisfies homogeneity by considering every case for the scalar \(c\):

  1. Case 1: \(c = 0\). Then \(c x[n]\) is the zero sequence, whose first two samples are equal, so:
    • \(T(c x[n]) = 0\)
    • \(c T(x[n]) = 0 \cdot T(x[n]) = 0\)
  2. Case 2: \(c \neq 0\) and \(x[0] \neq x[1]\). Then \(c x[0] \neq c x[1]\), so:
    • \(T(c x[n]) = c x[n]\)
    • \(c T(x[n]) = c x[n]\)
  3. Case 3: \(c \neq 0\) and \(x[0] = x[1]\). Then \(c x[0] = c x[1]\), so:
    • \(T(c x[n]) = 0\)
    • \(c T(x[n]) = c \cdot 0 = 0\)

This transformation satisfies homogeneity since \(T(c x[n]) = c T(x[n])\) in all cases. However, let's see how it violates additivity:

Consider these example sequences:

  1. Sequence \(x_1[n]\): \[x_1[n] = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}\]
    • \(x_1[0] = 1\), \(x_1[1] = 0\), so \(x_1[0] \neq x_1[1]\)
    • Therefore, \(T(x_1[n]) = x_1[n]\)
  2. Sequence \(x_2[n]\): \[x_2[n] = \begin{cases} 1, & n = 1 \\ 0, & n \neq 1 \end{cases}\]
    • \(x_2[0] = 0\), \(x_2[1] = 1\), so \(x_2[0] \neq x_2[1]\)
    • Therefore, \(T(x_2[n]) = x_2[n]\)

When we add these sequences:

  • \((x_1 + x_2)[n] = x_1[n] + x_2[n]\) (sequence addition is element-wise)
  • Looking at the first two elements:
    • \((x_1 + x_2)[0] = x_1[0] + x_2[0] = 1 + 0 = 1\)
    • \((x_1 + x_2)[1] = x_1[1] + x_2[1] = 0 + 1 = 1\)
    • Thus, \((x_1 + x_2)[0] = (x_1 + x_2)[1] = 1\)
    • Since these are equal, \(T\) maps the sum to zero: \(T(x_1[n] + x_2[n]) = 0\)
  • Therefore, \(T(x_1[n] + x_2[n]) = 0\), while \(T(x_1[n]) + T(x_2[n]) = x_1[n] + x_2[n] \neq 0\), so additivity fails.
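
We can reproduce this counterexample in Python by modeling sequences as arrays (truncated to three samples here, which is enough to exhibit the failure):

```python
import numpy as np

def T(x):
    # Keep the sequence if its first two samples differ; otherwise map it to zero.
    return x.copy() if x[0] != x[1] else np.zeros_like(x)

x1 = np.array([1.0, 0.0, 0.0])  # x1[0] != x1[1], so T passes it through
x2 = np.array([0.0, 1.0, 0.0])  # x2[0] != x2[1], so T passes it through

print(T(x1 + x2))     # [0. 0. 0.]  (first two samples of the sum are equal)
print(T(x1) + T(x2))  # [1. 1. 0.]  (additivity fails)

c = 3.0
print(np.allclose(T(c * x1), c * T(x1)))  # True: homogeneity holds here
```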

Violating homogeneity only:

Consider the transformation \(T: \mathbb{C} \to \mathbb{C}\) given by \(T(x) = x^*\), where \(x^*\) is the complex conjugate of \(x\) (the complex number with the same real part but opposite imaginary part, e.g., \((a + bi)^* = a - bi\)). For this transformation:

  • \(T(x + y) = (x + y)^* = x^* + y^*\)
  • \(T(x) + T(y) = x^* + y^*\)

This satisfies additivity since \(T(x + y) = T(x) + T(y)\). However:

  • \(T(c x) = (c x)^* = c^* x^*\)
  • \(c T(x) = c x^*\)

These are not equal in general (they agree only when \(c\) is real), so homogeneity fails over the complex scalars. For a concrete example, take \(c = i\) and \(x = i\): then \(T(i \cdot i) = T(-1) = -1\), while \(i \, T(i) = i \cdot (-i) = 1\). A map like this, which is additive but conjugates scalars, is called conjugate-linear (antilinear) rather than linear.
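
Python's built-in complex numbers make this easy to check (a minimal sketch):

```python
def T(x):
    # Complex conjugation.
    return x.conjugate()

x, y, c = 1 + 2j, 3 - 1j, 1j

print(T(x + y) == T(x) + T(y))  # True: additivity holds
print(T(c * x), c * T(x))       # (-2-1j) vs (2+1j): homogeneity fails
```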

These examples demonstrate that both additivity and homogeneity are necessary conditions for a transformation to be linear - neither property alone is sufficient.

Matrices as Representations of Linear Transformations

Matrix-vector multiplication provides a concrete way to represent linear transformations:

\[ A x = \begin{pmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_1 \vec{a}_1 + x_2 \vec{a}_2 + \cdots + x_n \vec{a}_n = \begin{pmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{pmatrix} \]
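
In other words, \(Ax\) is a weighted sum of the columns of \(A\), with the entries of \(x\) as the weights. A small NumPy sketch (the matrix and vector are arbitrary examples) makes the identity explicit:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # 3x2 matrix with columns a_1, a_2
x = np.array([10.0, -1.0])

combo = x[0] * A[:, 0] + x[1] * A[:, 1]  # x_1 a_1 + x_2 a_2
print(np.allclose(A @ x, combo))         # True
```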

Let \(T: \mathbb{R}^n \to \mathbb{R}^m\) be a linear transformation, and let \(A\) be an \(m \times n\) matrix. The vectors \(e_1, e_2, \dots, e_n\) represent the standard basis vectors in \(\mathbb{R}^n\) (for example, in \(\mathbb{R}^3\), \(e_1 = (1, 0, 0)\), \(e_2 = (0, 1, 0)\), \(e_3 = (0, 0, 1)\)).

We can define the matrix \(A\) as:

\[ A = \begin{pmatrix} | & | & & | \\ T(e_1) & T(e_2) & \cdots & T(e_n) \\ | & | & & | \end{pmatrix}. \]

Then, the transformation \(T\) can be represented as matrix multiplication: \(T(x) = A x\).
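
This construction is mechanical: apply \(T\) to each standard basis vector and stack the results as columns. Here is a minimal sketch, using a horizontal shear as the example map:

```python
import numpy as np

def T(v):
    # An example linear map R^2 -> R^2: a horizontal shear.
    x, y = v
    return np.array([x + 2.0 * y, y])

# The columns of the standard matrix are T(e_1), ..., T(e_n).
A = np.column_stack([T(e) for e in np.eye(2)])
print(A)
# [[1. 2.]
#  [0. 1.]]

v = np.array([3.0, 4.0])
print(np.allclose(T(v), A @ v))  # True: T(v) and A v agree
```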

Let's break this down step by step:

  1. Expressing a vector in terms of basis vectors:

    Any vector \(x\) in \(\mathbb{R}^n\) can be written as a linear combination of the standard basis vectors:

    \[ x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n. \]
  2. Applying the linearity of \(T\):
    • By additivity: \[ T(x) = T(x_1 e_1 + x_2 e_2 + \cdots + x_n e_n) = T(x_1 e_1) + T(x_2 e_2) + \cdots + T(x_n e_n). \]
    • By homogeneity: \[ T(x_1 e_1) + T(x_2 e_2) + \cdots + T(x_n e_n) = x_1 T(e_1) + x_2 T(e_2) + \cdots + x_n T(e_n). \]
  3. Relating to matrix multiplication:

    The expression \(x_1 T(e_1) + x_2 T(e_2) + \cdots + x_n T(e_n)\) is exactly the matrix product \(A x\).

Therefore:

\[ T(x) = x_1 T(e_1) + x_2 T(e_2) + \cdots + x_n T(e_n) = A x. \]

The matrix \(A\) is called the standard matrix for the linear transformation \(T\). This establishes that linear transformations can be represented by matrices, and matrix multiplication is the mechanism for performing these transformations.
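
For a concrete example, consider the transformation \(T\) that rotates the plane counterclockwise by an angle \(\theta\). It sends the basis vectors to \(T(e_1) = (\cos\theta, \sin\theta)\) and \(T(e_2) = (-\sin\theta, \cos\theta)\), so its standard matrix is

\[ A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]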

Matrix Multiplication

Having established that matrices represent linear transformations, we can now explore how matrix multiplication corresponds to the composition of linear transformations.

Composition of Linear Transformations

If \(T: \mathbb{R}^n \to \mathbb{R}^m\) and \(S: \mathbb{R}^m \to \mathbb{R}^p\) are linear transformations, then their composition \(S \circ T\), defined by \((S \circ T)(x) = S(T(x))\), is also a linear transformation from \(\mathbb{R}^n\) to \(\mathbb{R}^p\). This follows from:

  • Additivity: \[ S(T(x + y)) = S(T(x) + T(y)) = S(T(x)) + S(T(y)). \]
  • Homogeneity: \[ S(T(c x)) = S(c T(x)) = c S(T(x)). \]

Defining Matrix Multiplication

Matrix multiplication is defined to reflect this composition of linear transformations.

Let \(A\) be a \(p \times m\) matrix representing the linear transformation \(S\), and \(B\) be an \(m \times n\) matrix representing the linear transformation \(T\). Writing \(B\) in terms of its column vectors:

\[ B = \begin{pmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_n \\ | & | & & | \end{pmatrix}. \]

The product \(A B\) is a \(p \times n\) matrix whose columns are \(A \vec{b}_1, A \vec{b}_2, \dots, A \vec{b}_n\):

\[ A B = \begin{pmatrix} | & | & & | \\ A \vec{b}_1 & A \vec{b}_2 & \cdots & A \vec{b}_n \\ | & | & & | \end{pmatrix}. \]

This definition aligns perfectly with composition of transformations:

  • Applying \(B\) to a vector \(x\): \[ B x = x_1 \vec{b}_1 + x_2 \vec{b}_2 + \cdots + x_n \vec{b}_n. \]
  • Applying \(A\) to \(B x\): \[ A (B x) = A ( x_1 \vec{b}_1 + x_2 \vec{b}_2 + \cdots + x_n \vec{b}_n ) = x_1 A \vec{b}_1 + x_2 A \vec{b}_2 + \cdots + x_n A \vec{b}_n. \]

This shows that:

\[ (A B) x = A (B x). \]

The properties of linear transformations make this possible:

  1. Additivity allows us to distribute \(A\) over the sum: \[ A (\vec{v}_1 + \vec{v}_2) = A \vec{v}_1 + A \vec{v}_2. \]
  2. Homogeneity allows us to factor out scalars: \[ A (c \vec{v}) = c A \vec{v}. \]

Thus, matrix multiplication is defined to correspond to the composition of linear transformations.
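
Both facts are easy to verify numerically: that the columns of \(AB\) are \(A \vec{b}_j\), and that \((AB)x = A(Bx)\). A minimal NumPy sketch with random matrices (the dimensions are arbitrary, chosen to match a \(p \times m\) times \(m \times n\) product):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))  # represents S: R^3 -> R^4
B = rng.standard_normal((3, 2))  # represents T: R^2 -> R^3
x = rng.standard_normal(2)

# Composition agrees with the matrix product:
print(np.allclose(A @ (B @ x), (A @ B) @ x))  # True

# The columns of A B are A applied to the columns of B:
cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
print(np.allclose(A @ B, cols))  # True
```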

Matrices as Representations of Data

Beyond representing transformations, matrices can also represent data directly. A common example appears in machine learning, where we often write \(X W\) with \(X\) being a matrix of features and \(W\) being a matrix of weights. In this context:

  • Each row of \(X\) represents a single data point
  • Each column of \(X\) represents a different feature
  • Each column of \(W\) contains the weights for a particular output

The product \(X W\) computes a weighted combination of features for each data point. Concretely, the \((i, j)\) entry of \(X W\) is the \(j\)-th output for the \(i\)-th data point, formed by weighting that point's features:

\[ (X W)_{ij} = X_{i1} W_{1j} + X_{i2} W_{2j} + \cdots + X_{in} W_{nj}. \]

Here, \(X\) serves not as a linear transformation but as a collection of data points being transformed by the weights in \(W\) to produce predictions or intermediate values in the model.
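
A short sketch (with made-up feature and weight values) shows the row-wise interpretation:

```python
import numpy as np

# Hypothetical data: 4 data points (rows), 3 features (columns), 2 outputs.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])
W = np.array([[0.1, -0.2],
              [0.3,  0.0],
              [0.5,  0.4]])

Y = X @ W       # each row of Y holds the outputs for one data point
print(Y.shape)  # (4, 2)

# Row i of Y is the i-th data point's features weighted by W:
print(np.allclose(Y[0], X[0] @ W))  # True
```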