Ordinary Least Squares
Suggested Prerequisites
Loss Function and Optimization Problem
The Ordinary Least Squares (OLS) loss function is simply the sum of squared errors (SSE):

\[ L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^\top \boldsymbol{\beta} \right)^2 = \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert_2^2 \]

Using this function to formulate a least-squares optimization problem yields:

\[ \hat{\boldsymbol{\beta}} = \operatorname*{arg\,min}_{\boldsymbol{\beta}} \; \frac{1}{2n} \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert_2^2 \]

The \(\frac{1}{2n}\) factor is included to simplify gradient calculations (the \(\frac{1}{2}\) cancels the factor of two produced by differentiation) and to let the objective converge to the expected model error by the Law of Large Numbers (the \(\frac{1}{n}\)).
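To make the objective concrete, it can be evaluated directly on data. The sketch below is a minimal illustration assuming NumPy; the function name `ols_objective` is hypothetical rather than part of any library:

```python
import numpy as np

def ols_objective(X: np.ndarray, y: np.ndarray, beta: np.ndarray) -> float:
    """Return the scaled OLS objective (1 / 2n) * ||y - X beta||_2^2."""
    n = X.shape[0]
    residuals = y - X @ beta          # vector of errors y_i - x_i^T beta
    return float(residuals @ residuals) / (2 * n)
```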
Model Estimator
By setting the gradient of the loss function equal to zero and solving for the coefficient vector \( \hat{\boldsymbol{\beta}} \), the OLS estimator is found:

\[ \nabla L(\boldsymbol{\beta}) = -\frac{1}{n} \mathbf{X}^\top \left( \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right) = \mathbf{0} \quad \Longrightarrow \quad \hat{\boldsymbol{\beta}} = \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{y} \]
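This closed form can be checked numerically. The following is a minimal sketch assuming NumPy (all data and variable names are illustrative); it solves the normal equations and verifies that the gradient vanishes at the solution:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # design matrix (full column rank here)
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Closed-form OLS estimator: solve the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# At the minimizer, the gradient -(1/n) X^T (y - X beta) should vanish.
gradient = -(X.T @ (y - X @ beta_hat)) / X.shape[0]
print(beta_hat)                          # close to beta_true
print(np.allclose(gradient, 0.0))        # True, up to floating-point error
```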
Proving Uniqueness of the Estimator
The OLS estimator can be shown to be unique using convexity: every stationary point of a convex function is a global minimizer, and a strictly convex function has at most one. The second-order convexity conditions state that a function is convex if it is continuous, twice differentiable, and has a positive semi-definite Hessian matrix.

The OLS loss function satisfies the first two conditions due to its quadratic nature. The OLS Hessian matrix can be found as:

\[ \nabla^2 L(\boldsymbol{\beta}) = \frac{1}{n} \mathbf{X}^\top \mathbf{X} \]

This Hessian can be shown to be positive semi-definite, since for any vector \( \mathbf{v} \):

\[ \mathbf{v}^\top \left( \frac{1}{n} \mathbf{X}^\top \mathbf{X} \right) \mathbf{v} = \frac{1}{n} \lVert \mathbf{X}\mathbf{v} \rVert_2^2 \geq 0 \]

Thus, by the second-order convexity conditions, the OLS loss function is convex. When \( \mathbf{X} \) additionally has full column rank, \( \mathbf{X}^\top \mathbf{X} \) is positive definite, the loss is strictly convex, and the OLS estimator is the unique global minimizer of the OLS problem [2][1].
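The convexity argument can also be inspected numerically. This sketch (again assuming NumPy, with illustrative synthetic data) checks that the Hessian's eigenvalues are nonnegative, and strictly positive when \( \mathbf{X} \) has full column rank:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))

# Hessian of the scaled loss: (1/n) X^T X.
H = (X.T @ X) / X.shape[0]

# A symmetric matrix is positive semi-definite iff all eigenvalues are >= 0;
# strictly positive eigenvalues (full-column-rank X) give strict convexity,
# hence a unique minimizer.
eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)
print(np.all(eigenvalues >= -1e-12))     # True, allowing for round-off
```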
Implementation
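A minimal sketch of an OLS fit in NumPy; the helper names `fit_ols` and `predict` are illustrative, not from any particular library. `np.linalg.lstsq` is used rather than forming \( \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \) explicitly, since its SVD-based solver remains numerically stable when \( \mathbf{X}^\top \mathbf{X} \) is ill-conditioned:

```python
import numpy as np

def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Illustrative OLS fit: returns beta_hat minimizing ||y - X beta||_2^2.

    np.linalg.lstsq solves the least-squares problem directly, avoiding the
    numerical pitfalls of explicitly inverting X^T X.
    """
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def predict(X: np.ndarray, beta_hat: np.ndarray) -> np.ndarray:
    """Illustrative prediction: y_hat = X beta_hat."""
    return X @ beta_hat

# Usage on synthetic data with an intercept column.
rng = np.random.default_rng(42)
n = 200
x = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x])     # intercept + one feature
y = 3.0 + 2.0 * x + rng.normal(scale=0.2, size=n)

beta_hat = fit_ols(X, y)
print(beta_hat)                          # approximately [3.0, 2.0]
```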
Sources
- 1
UC Berkeley Fall 2020 CS 189 (Introduction to Machine Learning), Note 2. Sep 2020. URL: https://www.eecs189.org/static/notes/n2.pdf.
- 2
Anil Aswani. IEOR 165 – Engineering Statistics, Quality Control, and Forecasting, Lecture Notes 3. Jan 2021. URL: http://courses.ieor.berkeley.edu/ieor165/lecture_notes/ieor165_lec3.pdf.
Contributions made by our wonderful GitHub Contributors: @wyattowalsh