Ordinary Least Squares


Suggested Prerequisites


Loss Function and Optimization Problem

The Ordinary Least Squares (OLS) loss function is simply the sum of squared error (SSE) error term:

\[ L(\mathbf{\beta}) = \|\mathbf{y} - \hat{\mathbf{y}}\|_2^2 = \|\mathbf{y} - \mathbf{X}\mathbf{\beta}\|_2^2 \]

Using this function to formulate a least-squares optimization problem yields:

\[ \hat{\mathbf{\beta}} = \arg\min_{\mathbf{\beta}} L(\mathbf{\beta}) = \arg\min_{\mathbf{\beta}} \frac{1}{2n} \|\mathbf{y}-\mathbf{X}\mathbf{\beta} \|_{2}^{2} \]

The \(\frac{1}{2n}\) term is added in order to simplify gradient solving (\(\frac{1}{2}\)) and allow objective function convergence to the expected value of model error by the Law of Large Numbers (\(\frac{1}{n}\)).

Model Estimator

By setting the gradient of the loss function equal to zero and solving for the coefficient vector, \( \hat{\mathbf{ \beta }} \), the OLS estimator is found:

\[ \hat{\mathbf{\beta}} = (\mathbf{X}^\mathbf{T}\mathbf{X})^{-1}(\mathbf{X}^\mathbf{T}\mathbf{y}) \]

Proving Uniqueness of the Estimator

The OLS estimator can be shown be unique by convexity as for any convex function will have a unique global minimum. The second-order convexity conditions state that a function is convex if it continuous, twice differentiable, and has an associated Hessian matrix that is positive semi-definite.

The OLS loss function satisfies the first two conditions due to its quadratic nature. The OLS Hessian matrix can be found as:

\[ \mathbf{H} = 2\mathbf{X}^\mathbf{T}\mathbf{X} \]

This Hessian can be shown to be positive semi-definite as:

\[ \mathbf{\beta}^\mathbf{T} (2\mathbf{X}^\mathbf{T}\mathbf{X}) \beta = 2(\mathbf{X}\beta)^\mathbf{T} \mathbf{X}\beta = 2 \|\mathbf{X}\mathbf{\beta}\|_2^2 \succeq 0 \: \: \: \forall \: \: \: \mathbf{\beta} \]

Thus, by second-order convexity conditions, the OLS loss function is convex implying that the OLS estimator is the unique global minimizer to the OLS problem [2][1].


Implementation

Sources

1

Uc berkeley fall 2020 cs189 (introduction to machine learning) note 2. Sep 2020. URL: https://www.eecs189.org/static/notes/n2.pdf.

2

Anil Aswani. Ieor 165 – engineering statistics, quality control, and forecasting lecture notes 3. Jan 2021. URL: http://courses.ieor.berkeley.edu/ieor165/lecture_notes/ieor165_lec3.pdf.


Contributions made by our wonderful GitHub Contributors: @wyattowalsh