Linear Regression: From Mathematical Principles to Code Implementation
Linear regression is the most fundamental machine learning algorithm and a building block for understanding more complex models. This article starts from the mathematical principles and implements linear regression step by step.
Mathematical Principles
Linear regression assumes a linear relationship between the target variable and the features:
$$\hat{y} = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b = \mathbf{w}^T\mathbf{x} + b$$
The goal is to minimize the mean squared error:
$$L(\mathbf{w}, b) = \frac{1}{2n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
The Normal Equation
Linear regression admits a closed-form solution. With $\mathbf{X}$ denoting the design matrix augmented with a leading column of ones (so the bias is absorbed into $\mathbf{w}$), it is:
$$\mathbf{w}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
```python
import numpy as np

class LinearRegressionNormal:
    """Linear regression solved via the normal equation."""

    def fit(self, X, y):
        # Prepend a column of ones so the bias is absorbed into theta
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        self.theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
        return self

    def predict(self, X):
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        return X_b @ self.theta

# Synthetic data: y = 4 + 3x + noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

model = LinearRegressionNormal()
model.fit(X, y)
# theta has shape (2, 1) because y is a column vector
print(f"Intercept: {model.theta[0, 0]:.4f}, Slope: {model.theta[1, 0]:.4f}")
```
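Inverting $\mathbf{X}^T\mathbf{X}$ explicitly can fail or lose precision when features are collinear. As a minimal alternative sketch (assuming the same `X` and `y` as above), `np.linalg.lstsq` solves the same least-squares problem without forming an explicit inverse:

```python
# Numerically safer alternative to inverting X^T X
X_b = np.c_[np.ones((X.shape[0], 1)), X]
theta, residuals, rank, sv = np.linalg.lstsq(X_b, y, rcond=None)
print(f"Intercept: {theta[0, 0]:.4f}, Slope: {theta[1, 0]:.4f}")
```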
Gradient Descent Implementation
For large datasets, the loss is minimized iteratively instead. Differentiating the loss gives the gradients used in the update rule:
$$\frac{\partial L}{\partial \mathbf{w}} = \frac{1}{n}\mathbf{X}^T(\hat{\mathbf{y}} - \mathbf{y}), \qquad \frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$$
```python
class LinearRegressionGD:
    """Linear regression trained with batch gradient descent."""

    def __init__(self, lr=0.01, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None
        self.losses = []

    def fit(self, X, y):
        n_samples, n_features = X.shape
        y = np.ravel(y)  # flatten so that error has shape (n_samples,)
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias
            error = y_pred - y

            # Gradients of the MSE loss
            dw = (1 / n_samples) * np.dot(X.T, error)
            db = (1 / n_samples) * np.sum(error)

            self.weights -= self.lr * dw
            self.bias -= self.lr * db

            # Record the loss for convergence monitoring
            loss = np.mean(error ** 2) / 2
            self.losses.append(loss)

        return self

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias
```
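As a quick sanity check, the sketch below fits the class on the synthetic data from the normal-equation section (assuming `X` and `y` are still in scope; the learning rate is bumped to 0.1 so 1000 iterations suffice on this scale). The learned parameters should approach the generating values $b = 4$ and $w = 3$, and the recorded losses should decrease:

```python
# Fit on the same synthetic data; fit() flattens the (100, 1) target internally
gd = LinearRegressionGD(lr=0.1, n_iters=1000).fit(X, y)
print(f"bias: {gd.bias:.4f}, weight: {gd.weights[0]:.4f}")
print(f"first loss: {gd.losses[0]:.4f}, last loss: {gd.losses[-1]:.4f}")
```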
Polynomial Regression
Linear regression can be extended to polynomial regression to fit nonlinear relationships: the features are expanded into polynomial terms, and an ordinary linear model is fitted on them:
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

poly_reg = Pipeline([
    # include_bias=False: the model below already learns its own bias term
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('linear', LinearRegressionGD(lr=0.01, n_iters=1000))
])
```
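As a usage sketch (the quadratic data below is made up for illustration), the pipeline is fitted and queried like a plain estimator:

```python
# Synthetic nonlinear data: y = 0.5 x^2 + x + 2 + noise
np.random.seed(0)
X_nl = 6 * np.random.rand(200, 1) - 3
y_nl = 0.5 * X_nl[:, 0] ** 2 + X_nl[:, 0] + 2 + np.random.randn(200)

poly_reg.fit(X_nl, y_nl)
print(poly_reg.predict(np.array([[0.0], [1.0], [2.0]])))
```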
Regularized Linear Regression
Ridge Regression (L2 Regularization)
$$L_{\text{Ridge}} = \frac{1}{2n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \frac{\lambda}{2n}\|\mathbf{w}\|_2^2$$
```python
class RidgeRegression:
    """Linear regression with L2 regularization, trained by gradient descent."""

    def __init__(self, alpha=1.0, lr=0.01, n_iters=1000):
        self.alpha = alpha
        self.lr = lr
        self.n_iters = n_iters

    def fit(self, X, y):
        n_samples, n_features = X.shape
        y = np.ravel(y)
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias
            error = y_pred - y

            # The L2 penalty adds alpha * w to the gradient; the bias is not regularized
            dw = (1 / n_samples) * (np.dot(X.T, error) + self.alpha * self.weights)
            db = (1 / n_samples) * np.sum(error)

            self.weights -= self.lr * dw
            self.bias -= self.lr * db

        return self

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias
```
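To illustrate the shrinkage effect, here is a small sketch on made-up data: as `alpha` grows, the learned weights are pulled toward zero:

```python
# Larger alpha -> smaller weights (shrinkage)
np.random.seed(1)
X_r = np.random.randn(200, 5)
y_r = X_r @ np.array([3.0, -2.0, 0.5, 1.0, -1.5]) + np.random.randn(200)

for alpha in (0.0, 10.0, 100.0):
    w = RidgeRegression(alpha=alpha, lr=0.05, n_iters=2000).fit(X_r, y_r).weights
    print(f"alpha={alpha:>5}: {np.round(w, 3)}")
```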
Lasso Regression (L1 Regularization)
L1 regularization produces sparse solutions and is therefore useful for feature selection:

$$L_{\text{Lasso}} = \frac{1}{2n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\|\mathbf{w}\|_1$$

Since the L1 term is not differentiable at zero, the implementation below uses its subgradient $\lambda\,\mathrm{sign}(\mathbf{w})$:
```python
class LassoRegression:
    """Linear regression with L1 regularization, trained by subgradient descent."""

    def __init__(self, alpha=1.0, lr=0.01, n_iters=1000):
        self.alpha = alpha
        self.lr = lr
        self.n_iters = n_iters

    def fit(self, X, y):
        n_samples, n_features = X.shape
        y = np.ravel(y)
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias
            error = y_pred - y

            # Subgradient of the L1 penalty: alpha * sign(w); the bias is not regularized
            dw = (1 / n_samples) * np.dot(X.T, error) + self.alpha * np.sign(self.weights)
            db = (1 / n_samples) * np.sum(error)

            self.weights -= self.lr * dw
            self.bias -= self.lr * db

        return self

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias
```
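To see the sparsity claim in action, consider a sketch with made-up data in which only the first two of five features carry signal. Plain subgradient descent does not produce exact zeros (that requires coordinate descent or proximal updates, as in scikit-learn's `Lasso`), but the irrelevant weights end up oscillating near zero:

```python
# Only features 0 and 1 carry signal; features 2-4 are pure noise
np.random.seed(2)
X_s = np.random.randn(300, 5)
y_s = X_s @ np.array([2.0, -3.0, 0.0, 0.0, 0.0]) + 0.1 * np.random.randn(300)

lasso = LassoRegression(alpha=0.5, lr=0.05, n_iters=2000).fit(X_s, y_s)
print(np.round(lasso.weights, 3))  # the last three weights should sit near zero
```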
Model Evaluation
```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegressionGD(lr=0.01, n_iters=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"MSE: {mean_squared_error(y_test, y_pred):.4f}")
print(f"R²: {r2_score(y_test, y_pred):.4f}")
```
Summary
Linear regression is simple, yet it embodies the core concepts of machine learning: loss functions, optimization methods, regularization, and model evaluation. The normal equation yields an exact solution, while gradient descent scales to large datasets. Ridge and Lasso regularization address overfitting and feature selection respectively, making them indispensable tools in practice.