
I used to think linear algebra was just another math class I had to get through. Then I started learning machine learning and realized it’s basically everywhere. Turns out all those matrix operations and vector spaces underpin the algorithms that power modern AI.
Why Everything Is Vectors and Matrices
Machine learning boils down to doing math on data, and that data gets represented as vectors and matrices. When you train a model, you’re really just doing a bunch of linear transformations on these mathematical objects.
Vector Spaces and Transformations
Say you have a dataset with $n$ features. Each data point lives in an $n$-dimensional vector space $\mathbb{R}^n$. When you apply a linear transformation, you multiply by a matrix $A$:
$$f(x) = Ax$$
This simple operation is basically what happens in every layer of a neural network, just with a nonlinear activation function added on top.
```python
import numpy as np
```
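To make that concrete, here’s a tiny sketch of $f(x) = Ax$ in NumPy, using the import above; the matrices and shapes are made up purely for illustration:

```python
# A batch of four data points, each living in R^3 (one row per point)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [0.5, 1.5, 2.5]])

# A linear map from R^3 to R^2, represented by a 2x3 matrix A
A = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]])

# f(x) = Ax applied to every row at once: (4, 3) @ (3, 2) -> (4, 2)
transformed = X @ A.T
print(transformed.shape)  # (4, 2)
```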
How Matrix Operations Power ML
Let me walk through how matrix operations show up in a few fundamental machine learning techniques.
Principal Component Analysis
PCA uses eigendecomposition to find the directions where your data varies the most. With the data mean-centered and stacked as the rows of $X$, you compute the covariance matrix:
$$C = \frac{1}{n}X^TX$$
Then solve the eigenvalue equation:
$$Cv = \lambda v$$
In code, this is only a few lines of NumPy.
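Here’s a minimal sketch of what a compute_pca helper might look like: it assumes $X$ is already mean-centered, uses np.linalg.eigh because the covariance matrix is symmetric, and returns the projected data along with the principal directions (one reasonable choice for the two return values):

```python
import numpy as np

def compute_pca(X: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Project mean-centered X onto its top principal components."""
    n_samples = X.shape[0]

    # Covariance matrix: C = (1/n) X^T X
    C = (X.T @ X) / n_samples

    # eigh works on symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(C)

    # Keep the eigenvectors belonging to the n_components largest eigenvalues
    top = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, top]   # shape: (n_features, n_components)

    # Project the data onto those principal directions
    return X @ components, components   # projected data, directions
```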
Neural Networks Are Just Matrix Math
Deep neural networks are basically just chains of linear transformations with nonlinear functions in between. Each layer computes:
$$h = \sigma(Wx + b)$$
where $\sigma$ is the activation function, $W$ is the weight matrix, and $b$ is the bias vector.
In code, a single layer really is just a matrix multiply, a bias, and a nonlinearity.
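Here’s a minimal sketch of a neural_network_layer function; the argument names, and ReLU as the activation, are just one reasonable choice:

```python
import numpy as np

def relu(z: np.ndarray) -> np.ndarray:
    # Elementwise nonlinearity; any activation with the same shape-in/shape-out works
    return np.maximum(z, 0.0)

def neural_network_layer(
    x: np.ndarray,          # input vector, shape (n_in,)
    W: np.ndarray,          # weight matrix, shape (n_out, n_in)
    b: np.ndarray,          # bias vector, shape (n_out,)
    activation=relu,        # argument names here are illustrative
) -> np.ndarray:
    # h = activation(Wx + b)
    return activation(W @ x + b)

# Example: a layer mapping 3 inputs to 2 hidden units
W = np.array([[0.1, -0.2, 0.3],
              [0.4, 0.5, -0.6]])
b = np.array([0.01, -0.02])
x = np.array([1.0, 2.0, 3.0])
h = neural_network_layer(x, W, b)
print(h)  # two activations, one per hidden unit
```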
Training Is Linear Algebra Too
Even the training process relies on linear algebra. Gradient descent updates each weight matrix using the partial derivatives of the loss with respect to it:
$$W_{t+1} = W_t - \alpha \frac{\partial L}{\partial W_t}$$
where $L$ is the loss function and $\alpha$ is the learning rate.
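As a toy illustration, here’s a sketch of that update for plain linear regression with a squared-error loss, where the gradient has a simple closed form (the gradient_descent_step helper is purely illustrative):

```python
import numpy as np

def gradient_descent_step(W: np.ndarray, X: np.ndarray, y: np.ndarray,
                          lr: float) -> np.ndarray:
    """One update W <- W - lr * dL/dW for a mean-squared-error loss."""
    n_samples = X.shape[0]
    residuals = X @ W - y
    # Gradient of L = (1/n) * ||X W - y||^2 with respect to W
    grad = (2.0 / n_samples) * (X.T @ residuals)
    return W - lr * grad

# Tiny synthetic problem: y is (approximately) X @ [2, -1]
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=100)

W = np.zeros(2)
for _ in range(500):
    W = gradient_descent_step(W, X, y, lr=0.1)
print(W)  # should land close to [2, -1]
```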
Beyond the Basics
More advanced techniques like Singular Value Decomposition (SVD) are crucial for things such as recommendation systems. SVD factorizes a matrix $A$ as:
$$A = U\Sigma V^T$$
where $U$ and $V$ are orthogonal matrices, and $\Sigma$ is a diagonal matrix containing the singular values.
Keeping only the top $k$ singular values gives a low-rank approximation of the original matrix, which is the core trick behind matrix-factorization recommenders.
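Here’s a minimal sketch of a truncated_svd helper built on np.linalg.svd, returning the top-$k$ factors:

```python
import numpy as np

def truncated_svd(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return the rank-k factors (U_k, S_k, Vt_k) of X."""
    # full_matrices=False gives the compact ("economy") factorization
    U, S, Vt = np.linalg.svd(X, full_matrices=False)

    # Keep only the k largest singular values and the matching singular vectors
    return U[:, :k], S[:k], Vt[:k, :]

# Example: a rank-2 matrix is reconstructed exactly from its top 2 factors
X = np.arange(12, dtype=float).reshape(4, 3)
U_k, S_k, Vt_k = truncated_svd(X, k=2)
X_approx = U_k @ np.diag(S_k) @ Vt_k
print(np.allclose(X, X_approx))  # True, since X has rank 2
```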
Why This Matters
Linear algebra lets you express complicated transformations using simple matrix operations. The more I work with machine learning, the more I appreciate how these mathematical foundations make everything possible. Next time you call model.fit() or torch.matmul(), remember there’s some beautiful math happening underneath that makes it all work.