
I used to think linear algebra was just another math class I had to get through. Then I started learning machine learning and realized it’s basically everywhere. Turns out all those matrix operations and vector spaces underpin the algorithms that power modern AI.
Why Everything Is Vectors and Matrices
Machine learning boils down to doing math on data, and that data gets represented as vectors and matrices. When you train a model, you’re really just doing a bunch of linear transformations on these mathematical objects.
Vector Spaces and Transformations
Say you have a dataset with $n$ features. Each data point lives in an $n$-dimensional vector space $\mathbb{R}^n$. When you apply a linear transformation, you multiply by a matrix $A$:
$$f(x) = Ax$$
This simple operation is basically what happens in every layer of a neural network, just with a nonlinear activation function added on top.
```python
import numpy as np
```
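To make that concrete, here’s a tiny sketch of $f(x) = Ax$ in NumPy, using the import above; the matrices and shapes are made up purely for illustration:

```python
# A batch of four data points, each living in R^3 (one row per point)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [0.5, 1.5, 2.5]])

# A linear map from R^3 to R^2, represented by a 2x3 matrix A
A = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]])

# f(x) = Ax applied to every row at once: (4, 3) @ (3, 2) -> (4, 2)
transformed = X @ A.T
print(transformed.shape)  # (4, 2)
```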
How Matrix Operations Power ML
Let me walk through how matrix operations show up in a few fundamental machine learning techniques.
Principal Component Analysis
PCA uses eigendecomposition to find the directions where your data varies the most. With the data mean-centered and stacked as the rows of $X$, you compute the covariance matrix:
$$C = \frac{1}{n}X^TX$$
Then solve the eigenvalue equation:
$$Cv = \lambda v$$
In code, this is only a few lines of NumPy.
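Here’s a minimal sketch of what a compute_pca helper might look like: it assumes $X$ is already mean-centered, uses np.linalg.eigh because the covariance matrix is symmetric, and returns the projected data along with the principal directions (one reasonable choice for the two return values):

```python
import numpy as np

def compute_pca(X: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Project mean-centered X onto its top principal components."""
    n_samples = X.shape[0]

    # Covariance matrix: C = (1/n) X^T X
    C = (X.T @ X) / n_samples

    # eigh works on symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(C)

    # Keep the eigenvectors belonging to the n_components largest eigenvalues
    top = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, top]   # shape: (n_features, n_components)

    # Project the data onto those principal directions
    return X @ components, components   # projected data, directions
```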
Neural Networks Are Just Matrix Math
Deep neural networks are basically just chains of linear transformations with nonlinear functions in between. Each layer computes:
$$h = \sigma(Wx + b)$$
where $\sigma$ is the activation function, $W$ is the weight matrix, and $b$ is the bias vector.
In code, a single layer really is just a matrix multiply, a bias, and a nonlinearity.
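Here’s a minimal sketch of a neural_network_layer function; the argument names, and ReLU as the activation, are just one reasonable choice:

```python
import numpy as np

def relu(z: np.ndarray) -> np.ndarray:
    # Elementwise nonlinearity; any activation with the same shape-in/shape-out works
    return np.maximum(z, 0.0)

def neural_network_layer(
    x: np.ndarray,          # input vector, shape (n_in,)
    W: np.ndarray,          # weight matrix, shape (n_out, n_in)
    b: np.ndarray,          # bias vector, shape (n_out,)
    activation=relu,        # argument names here are illustrative
) -> np.ndarray:
    # h = activation(Wx + b)
    return activation(W @ x + b)

# Example: a layer mapping 3 inputs to 2 hidden units
W = np.array([[0.1, -0.2, 0.3],
              [0.4, 0.5, -0.6]])
b = np.array([0.01, -0.02])
x = np.array([1.0, 2.0, 3.0])
h = neural_network_layer(x, W, b)
print(h)  # two activations, one per hidden unit
```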
Training Is Linear Algebra Too
Even the training process relies on linear algebra. Gradient descent updates each weight matrix using the partial derivatives of the loss with respect to it:
$$W_{t+1} = W_t - \alpha \frac{\partial L}{\partial W_t}$$
where $L$ is the loss function and $\alpha$ is the learning rate.
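As a toy illustration, here’s a sketch of that update for plain linear regression with a squared-error loss, where the gradient has a simple closed form (the gradient_descent_step helper is purely illustrative):

```python
import numpy as np

def gradient_descent_step(W: np.ndarray, X: np.ndarray, y: np.ndarray,
                          lr: float) -> np.ndarray:
    """One update W <- W - lr * dL/dW for a mean-squared-error loss."""
    n_samples = X.shape[0]
    residuals = X @ W - y
    # Gradient of L = (1/n) * ||X W - y||^2 with respect to W
    grad = (2.0 / n_samples) * (X.T @ residuals)
    return W - lr * grad

# Tiny synthetic problem: y is (approximately) X @ [2, -1]
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=100)

W = np.zeros(2)
for _ in range(500):
    W = gradient_descent_step(W, X, y, lr=0.1)
print(W)  # should land close to [2, -1]
```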
Beyond the Basics
More advanced techniques like Singular Value Decomposition (SVD) are crucial for things such as recommendation systems. SVD factorizes a matrix $A$ as:
$$A = U\Sigma V^T$$
where $U$ and $V$ are orthogonal matrices, and $\Sigma$ is a diagonal matrix containing the singular values.
Keeping only the top $k$ singular values gives a low-rank approximation of the original matrix, which is the core trick behind matrix-factorization recommenders.
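Here’s a minimal sketch of a truncated_svd helper built on np.linalg.svd, returning the top-$k$ factors:

```python
import numpy as np

def truncated_svd(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return the rank-k factors (U_k, S_k, Vt_k) of X."""
    # full_matrices=False gives the compact ("economy") factorization
    U, S, Vt = np.linalg.svd(X, full_matrices=False)

    # Keep only the k largest singular values and the matching singular vectors
    return U[:, :k], S[:k], Vt[:k, :]

# Example: a rank-2 matrix is reconstructed exactly from its top 2 factors
X = np.arange(12, dtype=float).reshape(4, 3)
U_k, S_k, Vt_k = truncated_svd(X, k=2)
X_approx = U_k @ np.diag(S_k) @ Vt_k
print(np.allclose(X, X_approx))  # True, since X has rank 2
```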
Why This Matters
Linear algebra lets you express complicated transformations using simple matrix operations. The more I work with machine learning, the more I appreciate how these mathematical foundations make everything possible. Next time you call model.fit() or torch.matmul(), remember there’s some beautiful math happening underneath that makes it all work.