Least-Squares Method

General Least-Squares Problem
Suppose [math]A[/math] is an m x n matrix and [math]b[/math] is a column vector in [math]\mathbb{R}^m[/math]. We consider the equation [math]Ax=b[/math]. It is well known that such an equation has no solution if [math]b[/math] is not in [math]\text{Col} \ A[/math]. Still, we would like to find a vector [math]x^*[/math] such that [math]Ax^*[/math] is [b]closest[/b] to [math]b[/math]. In other words, we need to minimize [math]\|b-Ax\|[/math] among all vectors [math]x[/math] in [math]\mathbb{R}^n[/math]. This is called the [b]general least-squares problem[/b]. The term "least-squares" comes from the fact that [math]\|b-Ax\|[/math] is the square root of a sum of squares.[br][br][u]Definition[/u]: A vector [math]x^*[/math] in [math]\mathbb{R}^n[/math] is a [b]least-squares solution[/b] of [math]Ax=b[/math] if [math]\|b-Ax^*\|\leq \|b-Ax\|[/math] for all [math]x[/math] in [math]\mathbb{R}^n[/math].[br][br]We can visualize the least-squares problem as follows: Let [math]W=\text{Col} \ A[/math], which is a subspace of [math]\mathbb{R}^m[/math]. Let [math]b[/math] be a vector not in [math]W[/math]. Thanks to the [url=https://www.geogebra.org/m/rsamte2c#material/kkn64rnj]Best Approximation Theorem[/url], [math]b^*=\text{proj}_{W}b[/math] is the vector in [math]W[/math] such that [br][br][math]\|b-b^*\|\leq\|b-w\|[/math], for any [math]w[/math] in [math]W[/math].[br][br]Since [math]b^*[/math] is in [math]\text{Col} \ A[/math], the equation [math]Ax=b^*[/math] must be consistent. Let [math]x^*[/math] be a solution, i.e. [math]Ax^*=b^*[/math]. Also, any [math]w[/math] in [math]\text{Col} \ A[/math] can be written as [math]w=Ax[/math] for some [math]x[/math] in [math]\mathbb{R}^n[/math]. Then the above inequality can be rewritten as follows:[br][br][math]\|b-Ax^*\|\leq\|b-Ax\|[/math], for any [math]x[/math] in [math]\mathbb{R}^n[/math]. Therefore, [math]x^*[/math] is a least-squares solution of [math]Ax=b[/math].[br][br]Instead of computing [math]b^*[/math] and then solving the equation [math]Ax=b^*[/math], there is a better way to compute least-squares solutions directly.[br][br]Since [math]b^*=\text{proj}_{W}b[/math], [math]b-b^*[/math] is in [math]W^\perp[/math], and [math]W^\perp=(\text{Col} \ A)^\perp=\text{Nul} \ A^T[/math]. Hence, we have [math]A^T(b-b^*)=0[/math]. Let [math]x^*[/math] be a least-squares solution, i.e. [math]Ax^*=b^*[/math]. Then[br][br][math]A^T(b-b^*)=0\Rightarrow A^Tb-A^Tb^*=0\Rightarrow A^TAx^*=A^Tb[/math][br][br]That is to say, a least-squares solution must satisfy the equation [math]A^TAx=A^Tb[/math]. This system is called the [b]normal equations[/b] for [math]Ax=b[/math].[br][br]Conversely, given any solution [math]x^*[/math] to the normal equations [math]A^TAx=A^Tb[/math], we have [math]A^T(b-Ax^*)=0[/math], i.e. [math]b-Ax^*[/math] is in [math]\text{Nul} \ A^T[/math], and [math]\text{Nul} \ A^T=(\text{Col} \ A)^\perp=W^\perp[/math]. Hence, [math]b=Ax^*+(b-Ax^*)[/math] is the orthogonal decomposition of [math]b[/math] such that [math]Ax^*[/math] is in [math]W[/math] and [math]b-Ax^*[/math] is in [math]W^\perp[/math]. By the uniqueness of the orthogonal decomposition, [math]Ax^*=\text{proj}_{W}b=b^*[/math], i.e. [math]x^*[/math] is a least-squares solution.[br][br]In short, we can solve the normal equations to find all the least-squares solutions.
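As a quick illustration (a minimal sketch added here, assuming Python with NumPy is available), the code below forms the normal equations [math]A^TAx=A^Tb[/math] for a small inconsistent system and solves them; the matrix [math]A[/math] and the vector [math]b[/math] are arbitrary example data.[br][br]
[code]
import numpy as np

# Hypothetical example data: an overdetermined system Ax = b with no exact solution
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([1.0, 0.0, 2.0])

# Form and solve the normal equations  A^T A x = A^T b
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print("least-squares solution x*:", x_star)

# Sanity check: b - A x* lies in (Col A)^perp = Nul A^T, so A^T (b - A x*) should be ~0
print("A^T (b - A x*):", A.T @ (b - A @ x_star))
[/code]
[br]The second printed vector should be (numerically) zero, reflecting the fact that [math]b-Ax^*[/math] lies in [math](\text{Col} \ A)^\perp=\text{Nul} \ A^T[/math].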
Exercise
Find a least-squares solution of the inconsistent linear system [math]\begin{pmatrix}3&-1\\1&2\\2&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}4\\0\\1\end{pmatrix}[/math].
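If you want to verify your hand computation numerically, here is a small NumPy sketch (an added check, not part of the original exercise) that solves the normal equations for this particular system and compares the result with NumPy's built-in least-squares routine.[br][br]
[code]
import numpy as np

# The matrix and right-hand side of the system in the exercise above
A = np.array([[3.0, -1.0],
              [1.0,  2.0],
              [2.0,  1.0]])
b = np.array([4.0, 0.0, 1.0])

# Solve the normal equations A^T A x = A^T b
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print("least-squares solution (x, y):", x_star)

# NumPy's built-in least-squares routine should give the same answer
print("check:", np.linalg.lstsq(A, b, rcond=None)[0])
[/code]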
An Application - Linear Regression
Suppose we have [math]n[/math] pairs of experimental data for the two variable quantities [math]x[/math] and [math]y[/math]: [math](x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)[/math]. We assume that [math]y[/math] is theoretically related to [math]x[/math] by a linear equation [math]y=a+bx[/math], where [math]a[/math] and [math]b[/math] are unknown parameters. [br][br][u]Question[/u]: How can we find the parameters [math]a[/math] and [math]b[/math] such that the line [math]y=a+bx[/math] best fits the experimental data?[br][br]Given the parameters [math]a[/math] and [math]b[/math], the [b]predicted y-value[/b] at [math]x_j[/math] given by the line [math]y=a+bx[/math] is [math]a+bx_j[/math], for [math]j=1,2,\ldots,n[/math]. Expressing these in matrix form, we have[br][br][math]\begin{pmatrix}1&x_1\\1&x_2\\ \vdots&\vdots \\1&x_n\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}=\begin{pmatrix}a+bx_1\\a+bx_2\\ \vdots \\a+bx_n\end{pmatrix}[/math][br][br]Let [math]X=\begin{pmatrix}1&x_1\\1&x_2\\ \vdots&\vdots \\1&x_n\end{pmatrix}[/math], [math]\beta=\begin{pmatrix}a\\b\end{pmatrix}[/math] and [math]y=\begin{pmatrix}y_1\\y_2\\ \vdots \\y_n\end{pmatrix}[/math]. [math]X, \beta[/math], and [math]y[/math] are called the [b]design matrix[/b], [b]parameter vector[/b], and [b]observation vector[/b] respectively. Each entry of the vector [math]y-X\beta[/math] is then the difference between an observed y-value and a predicted y-value, which is called a [b]residual[/b]. The usual way to find the "best-fitted" line is to minimize the sum of the squares of the residuals. Such a best-fitted line is called the [b]line of regression of [math]y[/math] on [math]x[/math][/b], and the parameters [math]a[/math] and [math]b[/math] are called the [b](linear) regression coefficients[/b]. Equivalently, this amounts to finding a least-squares solution to the equation [math]X\beta=y[/math]![br][br]In other words, we can find the regression coefficients by solving the normal equations [math]X^TX\beta=X^Ty[/math]:[br][br][math]\begin{pmatrix}1&1&\cdots&1\\x_1&x_2&\cdots&x_n\end{pmatrix}\begin{pmatrix}1&x_1\\1&x_2\\ \vdots&\vdots \\1&x_n\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}=\begin{pmatrix}1&1&\cdots&1\\x_1&x_2&\cdots&x_n\end{pmatrix}\begin{pmatrix}y_1\\y_2\\ \vdots \\y_n\end{pmatrix}[/math][br][br][math]\Rightarrow \begin{pmatrix}n&\sum x_i\\ \sum x_i& \sum x_i^2\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}=\begin{pmatrix}\sum y_i\\ \sum x_iy_i\end{pmatrix}[/math][br][br]Assuming [math]n\sum x_i^2-\left(\sum x_i\right)^2\ne 0[/math], the 2 x 2 matrix is invertible and the least-squares solution can be computed explicitly. The formulas for [math]a[/math] and [math]b[/math] are as follows:[br][br][math]\begin{eqnarray}a&=&\frac{(\sum y_i)(\sum x_i^2)-(\sum x_i)(\sum x_iy_i)}{n(\sum x_i^2)-(\sum x_i)^2}\\b&=&\frac{n(\sum x_iy_i)-(\sum x_i)(\sum y_i)}{n(\sum x_i^2)-(\sum x_i)^2}\end{eqnarray}[/math][br][br][u]Remark[/u]: [math]n\sum x_i^2-\left(\sum x_i\right)^2=n^2\left[\frac{\sum x_i^2}n-\left(\frac{\sum x_i}n\right)^2\right]=n^2\sigma_x^2\ [/math], where [math]\sigma_x^2[/math] is the variance of [math]x[/math] in the data. That is to say, the 2 x 2 matrix is invertible whenever not all [math]x_i[/math] are equal.[br]
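To illustrate the formulas above, here is a minimal Python/NumPy sketch (the data points are made up for demonstration) that computes the regression coefficients [math]a[/math] and [math]b[/math] both from the closed-form formulas and by solving the normal equations [math]X^TX\beta=X^Ty[/math] directly.[br][br]
[code]
import numpy as np

# Made-up sample data (x_i, y_i); replace with your own measurements
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

# Regression coefficients from the closed-form formulas above
denom = n * np.sum(x**2) - np.sum(x)**2
a = (np.sum(y) * np.sum(x**2) - np.sum(x) * np.sum(x * y)) / denom
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom
print("regression line: y =", a, "+", b, "* x")

# The same coefficients via the design matrix X and the normal equations
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
print("beta =", beta)  # should equal (a, b)
[/code]
[br]Both computations should print the same pair of coefficients.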
Example of Linear Regression