Suppose $A$ is an $m \times n$ matrix and $\mathbf{b}$ is a column vector in $\mathbb{R}^m$. We consider the equation $A\mathbf{x} = \mathbf{b}$. It is well known that such an equation has no solution if $\mathbf{b}$ is not in $\operatorname{Col} A$. But we would still like to find a vector $\mathbf{x}$ such that $A\mathbf{x}$ is closest to $\mathbf{b}$. In other words, we need to minimize $\|\mathbf{b} - A\mathbf{x}\|$ among all vectors $\mathbf{x}$ in $\mathbb{R}^n$. This is called the general least-squares problem. The term "least-squares" comes from the fact that $\|\mathbf{b} - A\mathbf{x}\|$ is the square root of a sum of squares.
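To make the name explicit: writing $b_i$ and $(A\mathbf{x})_i$ for the $i$-th entries of $\mathbf{b}$ and $A\mathbf{x}$, we have

$$\|\mathbf{b} - A\mathbf{x}\| = \sqrt{\bigl(b_1 - (A\mathbf{x})_1\bigr)^2 + \cdots + \bigl(b_m - (A\mathbf{x})_m\bigr)^2},$$

so minimizing $\|\mathbf{b} - A\mathbf{x}\|$ amounts to minimizing a sum of squared residual entries.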
Definition: A vector $\hat{\mathbf{x}}$ in $\mathbb{R}^n$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$ if $\|\mathbf{b} - A\hat{\mathbf{x}}\| \leq \|\mathbf{b} - A\mathbf{x}\|$ for all $\mathbf{x}$ in $\mathbb{R}^n$.
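As a quick numerical illustration of this definition, here is a minimal sketch in NumPy; the matrix $A$ and vector $\mathbf{b}$ below are made-up example data, not taken from the text:

```python
import numpy as np

# Illustrative example: A is 3x2 and b is not in Col A.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

def residual_norm(x):
    """Return ||b - A x||, the quantity the least-squares problem minimizes."""
    return np.linalg.norm(b - A @ x)

# A least-squares solution, computed here with numpy's solver for comparison.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

print(residual_norm(x_hat))                  # smallest achievable residual
print(residual_norm(np.array([1.0, 1.0])))   # any other x does no better
```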
We can visualize the least-squares problem as follows. Let $W = \operatorname{Col} A$, which is a subspace of $\mathbb{R}^m$, and let $\mathbf{b}$ be a vector not in $W$. Thanks to the Best Approximation Theorem, $\hat{\mathbf{b}} = \operatorname{proj}_W \mathbf{b}$ is the vector in $W$ such that $\|\mathbf{b} - \hat{\mathbf{b}}\| \leq \|\mathbf{b} - \mathbf{v}\|$ for any $\mathbf{v}$ in $W$.
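One concrete way to compute $\hat{\mathbf{b}} = \operatorname{proj}_W \mathbf{b}$ is sketched below, under the assumption that $A$ has linearly independent columns. The QR factorization is not discussed in the text; it is used here only because NumPy makes it a convenient source of an orthonormal basis for $\operatorname{Col} A$:

```python
import numpy as np

# Same illustrative A and b as in the earlier sketch.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# Assuming A has linearly independent columns, a reduced QR factorization
# A = QR gives a Q whose orthonormal columns span Col A.  The orthogonal
# projection of b onto Col A is then Q Q^T b.
Q, R = np.linalg.qr(A)     # reduced QR: Q is 3x2 here
b_hat = Q @ (Q.T @ b)      # b_hat = proj_{Col A} b

# b - b_hat is orthogonal to Col A, so A^T (b - b_hat) should be ~0.
print(b_hat)
print(A.T @ (b - b_hat))
```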
Since $\hat{\mathbf{b}}$ is in $\operatorname{Col} A$, the equation $A\mathbf{x} = \hat{\mathbf{b}}$ must be consistent. Let $\hat{\mathbf{x}}$ be a solution, i.e. $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$. Also, for any $\mathbf{v}$ in $W$, we have $\mathbf{v} = A\mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$. Then the above inequality can be rewritten as $\|\mathbf{b} - A\hat{\mathbf{x}}\| \leq \|\mathbf{b} - A\mathbf{x}\|$ for any $\mathbf{x}$ in $\mathbb{R}^n$. Therefore, $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$.
Instead of computing $\hat{\mathbf{b}} = \operatorname{proj}_{\operatorname{Col} A} \mathbf{b}$ and solving the equation $A\mathbf{x} = \hat{\mathbf{b}}$ for the least-squares solution, there is a better way to compute the least-squares solution directly. Since $\hat{\mathbf{b}} = \operatorname{proj}_{\operatorname{Col} A} \mathbf{b}$, the vector $\mathbf{b} - \hat{\mathbf{b}}$ is in $(\operatorname{Col} A)^{\perp}$, and $(\operatorname{Col} A)^{\perp} = \operatorname{Nul} A^T$. Hence, we have $A^T(\mathbf{b} - \hat{\mathbf{b}}) = \mathbf{0}$. Let $\hat{\mathbf{x}}$ be a least-squares solution, i.e. $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$; then

$$A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}, \quad \text{i.e.} \quad A^T A \hat{\mathbf{x}} = A^T \mathbf{b}.$$

That is to say, a least-squares solution must satisfy the equation $A^T A \mathbf{x} = A^T \mathbf{b}$. This system is called the normal equations for $A\mathbf{x} = \mathbf{b}$.
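In code, the normal equations give a direct route to $\hat{\mathbf{x}}$. A minimal sketch, reusing the illustrative $A$ and $\mathbf{b}$ from above and assuming $A^T A$ is invertible (which holds exactly when the columns of $A$ are linearly independent):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# Form and solve the normal equations A^T A x = A^T b.  np.linalg.solve
# requires A^T A to be invertible; when the columns of A are linearly
# dependent, the normal equations instead have infinitely many solutions.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)       # [1/3, 1/3] for this A and b
print(A @ x_hat)   # equals proj_{Col A} b
```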
Conversely, given any solution $\hat{\mathbf{x}}$ to the normal equations $A^T A \mathbf{x} = A^T \mathbf{b}$, we have $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, i.e. $\mathbf{b} - A\hat{\mathbf{x}}$ is in $\operatorname{Nul} A^T = (\operatorname{Col} A)^{\perp}$, and $A\hat{\mathbf{x}}$ is in $\operatorname{Col} A$. Hence, $\mathbf{b} = A\hat{\mathbf{x}} + (\mathbf{b} - A\hat{\mathbf{x}})$ is the orthogonal decomposition such that $A\hat{\mathbf{x}}$ is in $\operatorname{Col} A$ and $\mathbf{b} - A\hat{\mathbf{x}}$ is in $(\operatorname{Col} A)^{\perp}$. By the uniqueness of the orthogonal decomposition, $A\hat{\mathbf{x}} = \hat{\mathbf{b}} = \operatorname{proj}_{\operatorname{Col} A} \mathbf{b}$, i.e. $\hat{\mathbf{x}}$ is a least-squares solution.
In short, we can solve the normal equations to find all the least-squares solutions.
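As a sanity check, the normal-equation solution can be compared against a library least-squares routine; again the data is the illustrative example from above:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # via the normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # via numpy's solver

print(np.allclose(x_normal, x_lstsq))             # True
```

In exact arithmetic the two agree. In floating-point practice, forming $A^T A$ squares the condition number of the problem, so routines such as np.linalg.lstsq (built on QR or SVD factorizations) are usually preferred numerically; the normal equations remain the cleanest theoretical description.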