Skip to main content

Section 3.2 Least Squares Fitting: Noisy Data

Suppose we have \(n\) data points \(\{(x_{i},y_{i})\}_{i=1}^{n}\text{.}\) In the previous section we learned about interpolation, where the fitting function \(f(x)\) is required to satisfy \(y_{i}=f(x_{i})\) for \(i=1,\dots,n\text{.}\)Very often data has a significant amount of noise, so it makes more sense to only ask for \(y_{i}\approx f(x_{i})\text{.}\) The least squares fitting method produces such an \(f\text{.}\) The next illustration shows the effects noise can have and how least squares is used.

Subsection 3.2.1 Traffic flow model

Suppose you are interested in the time it takes to travel on a certain section of highway for the sake of planning. According to theory, assuming up to a moderate amount of traffic, the time should be approximately
\begin{equation*} T(x) = ax + b \end{equation*}
where \(b\) is the travel time when there is no other traffic and \(x\) is the current number of cars on the road (in hundreds). To determine the coefficients \(a\) and \(b\) you could run several experiments which consist of driving the highway at different times of day and also estimating the number of cars on the road using a counter. Of course both of these measurements will contain noise, i.e. random fluctuations.
We could simulate such data in MATLAB as follows:
>> x = 1:.1:6;
>> T = .1*x + 1;
>> Tn = T + .1*randn(size(x));
>> plot(x,Tn,'.')
The data should look like it lies on a line, but with noise. Click on the Tools button and choose Basic fitting. Then choose a linear fit. The resulting line should go through the data in what looks like a very reasonable way. Click on show equations. Compare the equation with \(T(x) = .1 x + 1\text{.}\) The coefficients should be pretty close considering the amount of noise in the plot. Next, try to fit the data with a spline. The result should be ugly. We can see from this example that splines are not suited to noisy data.
How does MATLAB obtain a very nice line to approximate noisy data? The answer is a very standard numerical/statistical method known as least squares.

Subsection 3.2.2 Least squares

The least squares method requires you to first select what type of \(f\) to consider (sometimes called a model) and what parameters it depends on. For example, you might choose
\begin{equation*} f(x) = a_{1} \exp({a_2 x})+a_{0} \quad\text{or}\quad f(x) = a_{2} x^{2} + a_{1} x +a_{0}\text{.} \end{equation*}
Given data \(\{(x_{i},y_{i})\}_{i=1}^{n}\text{,}\) we can then determine the error at each point as \(e_{i} = y_{i} - f(x_{i})\text{.}\) To make \(f\) reasonable, we wish to simultaneously minimize all the errors \(\{e_{1},e_{2}, \ldots, e_{n}\}\text{.}\) There are many possible ways one could go about this, but the standard one is to try to minimize the sum of the squares of the errors. That is, we denote by \(\mathcal{E}\) the sum of the squares
\begin{equation*} \mathcal{E}= \sum_{i=1}^{n} e_{i}^{2} = \sum_{i=1}^{n} \left( y_{i} - f(x_{i}) \right)^{2}\text{.} \end{equation*}
Since \(f\) depends on some parameters, \(\mathcal{E}\) is a function of those parameters. To minimize \(\mathcal{E}\text{,}\) we can compute the partial derivatives of it with respect to each of the parameters, set the results equal to zero, and solve the resulting system of equations.

Subsubsection 3.2.2.1 Linear least squares

If we choose \(f\) to depend linearly on the parameters, then the resulting system of equations will be a linear system, which is easy to solve. This does not mean we have to choose \(f\) to be a line. For example, the function \(f(x) = a_{2} x^{2} + a_{1} x +a_{0}\) is a quadratic in \(x\text{,}\) but depends linearly on each of \(a_{2}\text{,}\) \(a_{1}\text{,}\) and \(a_{0}\text{.}\) To illustrate the linear least squares process, we will go through it using this quadratic \(f\text{.}\)
The error is
\begin{equation*} \mathcal{E}(a_{0},a_{1},a_{2}) =\sum_{i=1}^{n} \left( y_{i} - (a_{2} x_{i}^{2} + a_{1} x_{i} +a_{0}) \right)^{2}\text{,} \end{equation*}
which has partial derivatives
\begin{align*} \frac{ \partial \mathcal{E}}{\partial a_{0}} \amp = -2 \sum_{i=1}^{n} \left( y_{i} - (a_{2} x_{i}^{2} + a_{1} x_{i} +a_{0}) \right)\,, \\ \frac{ \partial \mathcal{E}}{\partial a_{1}} \amp = -2 \sum_{i=1}^{n} \left( y_{i} - (a_{2} x_{i}^{2} + a_{1} x_{i} +a_{0}) \right)x_{i}\,,\quad\text{and} \\ \frac{ \partial \mathcal{E}}{\partial a_{2}}\amp = -2 \sum_{i=1}^{n} \left( y_{i} - (a_{2} x_{i}^{2} + a_{1} x_{i} +a_{0}) \right)x_{i}^{2} \text{.} \end{align*}
Setting these equal to zero and rearranging yields the linear system
\begin{equation*} \begin{pmatrix}\displaystyle\sum_{i=1}^{n} 1&\displaystyle\sum_{i=1}^{n} x_{i}&\displaystyle\sum_{i=1}^{n} x_{i}^{2}\\ \displaystyle\sum_{i=1}^{n} x_{i}&\displaystyle\sum_{i=1}^{n} x_{i}^{2}&\displaystyle\sum_{i=1}^{n} x_{i}^{3}\\ \displaystyle\sum_{i=1}^{n} x_{i}^{2}&\displaystyle\sum_{i=1}^{n} x_{i}^{3}&\displaystyle\sum_{i=1}^{n} x_{i}^{4}\\\end{pmatrix} \begin{pmatrix}a_{0} \\ a_{1} \\ a_{2}\end{pmatrix} = \begin{pmatrix}\displaystyle\sum_{i=1}^{n} y_{i} \\ \displaystyle\sum_{i=1}^{n} y_{i}x_{i} \\ \displaystyle\sum_{i=1}^{n} y_{i} x_{i}^{2}\end{pmatrix}\text{.} \end{equation*}
The entries in the matrix are determined from simple formulas using the data. Since this is a linear system, it is easily solved. The process is quick and easily automated, which is one reason it is very standard. MATLAB’s basic fitting tool uses this process to obtain a \(d\) degree polynomial fit whenever the number of data points is more than \(d+1\text{.}\)

Subsection 3.2.3 Drag coefficients

Drag due to air resistance is proportional to the square of the velocity, i.e. \(d= kv^{2}\text{.}\) In a wind tunnel experiment the velocity \(v\) can be varied by setting the speed of the fan and the drag can be measured directly (it is the force on the object). In this and every experiment some random noise will occur. The following sequence of commands replicates the data one might receive from a wind tunnel:
>> v = 0:1:60;
>> d = .1234*v.^2;
>> dn = d + .4*v.*randn(size(v));
>> plot(v,dn,'*')
The plot should look like a quadratic, but with some noise. Using the tools menu, add a quadratic fit and enable the “show equations” option. What is the coefficient of \(x^{2}\text{?}\) How close is it to \(0.1234\text{?}\)
Note that whenever you select a polynomial in MATLAB with a degree less than \(n-1\) MATLAB will produce a least squares fit.
You will notice that the quadratic fit includes both a constant and linear term. We know from the physical situation that these should not be there; they are remnants of noise and the fitting process. Since we know the curve should be \(k v^{2}\text{,}\) we can do better by employing that knowledge. For instance, we know that the graph of \(d\) versus \(v^{2}\) should be a straight line. We can produce this easily:
>> z = v.^2;
>> plot(z,dn,'*')
By changing the independent variable from \(v\) to \(z=v^{2}\) we produced a plot that looks like a line with noise. Add a linear fit. What is the linear coefficient? This should be closer to \(0.1234\) than using a quadratic fit.
The second fit still has a constant term, which we know should not be there. If there was no noise, then at every data point we would have \(k = d/v^{2}\text{.}\) We can express this as a linear system z'*k = dn', which is badly overdetermined since there are more equations than unknowns. Since there is noise, each point will give a different estimate for \(k\text{.}\) In other words, the overdetermined linear system is also inconsistent. When MATLAB encounters such systems, it automatically gives a least squares solution of the matrix problem, i.e. one that minimizes the sum of the squared errors, which is exactly what we want. To get the least squares estimate for \(k\text{,}\) do
>> k = z''
This will produce a number close to .1234.
Note that this is an application where we have physical knowledge. In this situation extrapolation would be meaningful. For instance we could use the plot to find the predicted drag at 80 mph.

Exercises 3.2.4 Exercises

1.

Find two tables of data in an engineering textbook or engineering website. Plot each (with '*' at data points) and decide if the data is best represented by a polynomial interpolant, spline interpolant, or least squares fit polynomial. Label the axes and include a title. Turn in the best plot of each. Indicate the source and meaning of the data.

2.

The following table contains information from a chemistry experiment in which the concentration of a product was measured at one minute intervals:
Time 0 1 2 3 4 5 6 7
Concentration 3.033 3.306 3.672 3.929 4.123 4.282 4.399 4.527
Plot this data. Suppose it is known that this chemical reaction should follow the law: \(c = a - b \exp(-0.2 t)\text{.}\) Following the example in the notes about the drag coefficients, change one of the variables so that the law is a linear function. Then plot the new variables and use the linear fit option to estimate \(a\) and \(b\text{.}\) What will be the eventual concentration? Finally, plot the graph of \(a - b \exp(-0.2 t)\) on the interval [0,15] (as a solid curve), along with the data from the table (using '*').