
Section 3.10 The Main Sources of Error

Subsection 3.10.1 Truncation Error

Truncation error is defined as the error caused directly by an approximation method. For instance, all numerical integration methods are approximations and so there is error, even if the calculations are performed exactly. Numerical differentiation also has a truncation error, as will the differential equations methods we will study in Part IV, which are based on numerical differentiation formulas. There are two ways to minimize truncation error: (1) use a higher order method, and (2) use a finer grid so that points are closer together. Unless the grid spacing is very small, truncation errors are usually much larger than roundoff errors. The obvious tradeoff is that a finer grid requires more calculations, which in turn produces more roundoff errors and requires more running time.
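The effect of a finer grid can be checked on an integral whose value is known. The following is a minimal sketch, not part of the original text: it assumes the test integrand \(\sin x\) on \([0,\pi]\) (exact value \(2\)) and uses MATLAB's built-in trapz in place of a hand-written trapezoid routine.

```matlab
% Sketch: truncation error of the trapezoid rule on an assumed test integral.
% The integral of sin(x) from 0 to pi is exactly 2.
exact = 2;
n = [10 20 40 80 160];               % number of subintervals; spacing halves each time
err = zeros(size(n));
for k = 1:numel(n)
    x = linspace(0, pi, n(k) + 1);   % grid with spacing pi/n(k)
    y = sin(x);
    err(k) = abs(trapz(x, y) - exact);
end
err                                  % errors shrink as the grid is refined
ratio = err(1:end-1) ./ err(2:end)   % close to 4 for a second-order method
```

Each halving of the spacing should cut the error by roughly a factor of four, which is what one expects from a second-order method. At these grid sizes roundoff is far too small to be visible.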

Subsection 3.10.2 Roundoff Error

Roundoff error occurs whenever only a finite number of digits is recorded after an operation. Fortunately, this error is extremely small. The standard measure of how small is called machine epsilon. It is commonly described as the smallest number that can be added to \(1\) to produce another number on the machine; more precisely, it is the gap between \(1\) and the next larger machine number, so adding a number of half that size or less to \(1\) gives a result that is rounded back down to \(1\text{.}\) In IEEE standard double precision (used by MATLAB and most serious software), machine epsilon is \(2^{-52}\) or about \(2.2 \times 10^{-16}\text{.}\) A different, but equivalent, way of thinking about this is that the machine records \(52\) floating binary digits or about \(15\) floating decimal digits. Thus there are never more than about \(15\) to \(16\) significant digits in any calculation. This is more than adequate for most applications. However, there are ways in which this very small error can cause problems.
You can test that machine epsilon is \(2^{-52}\text{:}\)
>> format long
>> (1 + 2^(-52)) - 1
>> (1 + 2^(-53)) - 1
MATLAB has a command that produces machine epsilon:
>> eps
To see an unexpected occurrence of roundoff try
>> (2^52+1) - 2^52
>> (2^53+1) - 2^53
Thus roundoff isn’t always small! It is just small compared with the scale of the numbers you are calculating. A number of magnitude \(10^{p}\) will have roundoff of magnitude about \(10^{p}\cdot 10^{-16}=10^{p-16}\text{.}\)
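MATLAB's eps also accepts an argument: eps(x) returns the spacing between x and the next larger double, which makes the scaling above easy to inspect. A short sketch (the particular values of x are chosen here only for illustration):

```matlab
format long
eps(1)        % same as eps, i.e. 2^(-52)
eps(2^52)     % spacing is 1 at this magnitude
eps(2^53)     % spacing is 2, so 2^53 + 1 cannot be represented exactly
eps(1e10)     % roughly 1e10 * 1e-16
```

The spacing grows in proportion to the magnitude of x, so the relative size of roundoff stays near \(10^{-16}\) while its absolute size does not.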

Subsection 3.10.3 Loss of Precision (also called Loss of Significance)

Suppose we had some way to compute \(\pi\) that effectively did the calculation \((e\cdot10^{9}+\pi)-e\cdot10^{9}\text{.}\) Rounding everything to 16 digits, we are computing
\begin{equation*} (2718281828.459045 +3.141592653589793) -2718281828.459045\text{.} \end{equation*}
The addition is performed first and the result rounded to 16 digits, giving
\begin{equation*} 2718281831.600637 -2718281828.459045 \text{.} \end{equation*}
Although roundoff caused some error, it is about a factor of \(10^{-16}\) smaller than the numbers shown. In other words, all but perhaps the last digit shown is correct. Next the subtraction is performed, giving
\begin{equation*} 0000000003.141592 \text{.} \end{equation*}
The leading zeros are not significant, so we lost 9 significant digits and have only 7 left. This type of error, where common leading significant digits cancel, is loss-of-precision (also called loss-of-significance) error. Computers will not display these leading zeros. Instead, the subtraction above yielded
\begin{equation*} 3.141592025756836\text{.} \end{equation*}
Although it is displayed with 16 digits, only the first 7 are correct. Usually, if you add two numbers of magnitude \(10^{p}\) each with roundoff error \(10^{p-16}\text{,}\) then the result is also of magnitude \(10^{p}\) and the relative error due to roundoff is \(10^{p-16}/10^{p}=10^{-16}\text{.}\) In this example, cancellation made the result of magnitude \(10^{p-q}\text{,}\) so the relative error due to roundoff is \(10^{p-16}/10^{p-q}=10^{q-16}\text{,}\) and so we lost \(q=9\) digits of precision.
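The calculation can be tried directly in MATLAB. This is a sketch under the assumption that \(e\cdot 10^{9}\) is formed as exp(1)*1e9; the variable names are illustrative, and the exact trailing digits printed may differ from those shown above, since they depend on how the large number is rounded in binary.

```matlab
format long
big = exp(1) * 1e9;                  % about 2.718e9, stored to roughly 16 significant digits
approxpi = (big + pi) - big          % the sum is rounded before the subtraction
relerr = abs(approxpi - pi) / pi     % far larger than eps
eps(big)                             % spacing of doubles near big
```

The absolute error in approxpi is no larger than the spacing eps(big), which is tiny relative to big but not relative to \(\pi\text{.}\)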
This type of loss of precision can happen by accident, with catastrophic results, if you are not careful. For example in \(f'(x)\approx (f(x+h)-f(x))/h\) you will lose precision when \(h\) gets too small. Try
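format long
format compact
f = @(x) x^2                   % test function with exact derivative f'(1) = 2
for i = 1:30
  h = 10^(-i)                  % step size shrinks by a factor of 10 each pass
  df = (f(1+h)-f(1))/h         % forward difference approximation of f'(1)
  relerr = (2-df)/2            % relative error against the exact value 2
end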
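At first the relative error decreases since truncation error is reduced. Then loss of precision takes over and the relative error increases to 1. This happens because when \(f(1)\) and \(f(1+h)\) become close, the subtraction “cancels” digits. Once \(h\) is below about \(10^{-16}\text{,}\) \(1+h\) rounds to \(1\text{,}\) the computed difference is exactly zero, and the relative error is exactly 1.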
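The same experiment is easier to read as a plot. The following sketch vectorizes the loop and uses a log-log plot; the names hbest, minerr and imin are introduced here only for illustration.

```matlab
f = @(x) x.^2;
h = 10.^(-(1:30));                   % step sizes 1e-1 down to 1e-30
df = (f(1 + h) - f(1)) ./ h;         % forward differences at x = 1
relerr = abs(2 - df) / 2;            % exact derivative is f'(1) = 2
loglog(h, relerr, 'o-')
xlabel('h'), ylabel('relative error')
[minerr, imin] = min(relerr);        % smallest error and where it occurs
hbest = h(imin)
```

The plot should show a V shape: truncation error dominates on the right, loss of precision on the left. For a forward difference the best step size is typically near the square root of machine epsilon, around \(10^{-8}\text{.}\)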

Subsection 3.10.4 Bad Conditioning

We encountered bad conditioning in Part II, when we talked about solving linear systems. Bad conditioning means that the problem is unstable in the sense that small input errors can produce large output errors. This can be a problem in a couple of ways. First, the measurements used to get the inputs cannot be completely accurate. Second, the computations along the way have roundoff errors. Errors in the computations near the beginning especially can be magnified by a factor close to the condition number of the matrix. Thus what was a very small problem with roundoff can become a very big problem.
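As a reminder of how this looks in practice, here is a small sketch using the Hilbert matrix, a standard ill-conditioned example. The size n = 8, the perturbation scale 1e-10, and all variable names are assumptions made for illustration.

```matlab
n = 8;
A = hilb(n);                              % Hilbert matrix, badly conditioned
xtrue = ones(n, 1);
b = A * xtrue;
db = 1e-10 * randn(n, 1);                 % small perturbation standing in for input error
x = A \ (b + db);
relin  = norm(db) / norm(b);              % relative error of the input
relout = norm(x - xtrue) / norm(xtrue);   % relative error of the output
magnification = relout / relin
kappa = cond(A)                           % condition number, roughly 1.5e10 for n = 8
```

The perturbation is random, so the magnification varies from run to run, but it is usually many orders of magnitude larger than 1.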
It turns out that matrix equations are not the only place where condition numbers occur. In any problem one can define the condition number as the maximum ratio of the relative errors in the output versus input, i.e.
\begin{equation*} \textrm{condition \# of a problem}= \max \left( \frac{\text{Relative error of output}}{\text{Relative error of inputs}}\right)\text{.} \end{equation*}
An easy example is solving a simple equation
\begin{equation*} f(x) = 0\text{.} \end{equation*}
Suppose that \(f'\) is close to zero at the solution \(x^{*}\text{.}\) Then a very small change in \(f\) (caused perhaps by an inaccurate measurement of some of the coefficients in \(f\)) can cause a large change in \(x^{*}\text{.}\) It can be shown that the condition number of this problem is \(1/|f'(x^{*})|\text{.}\)
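A rough sketch of why, assuming \(f\) is differentiable with \(f'(x^{*})\neq 0\) and that the change in \(f\) is a small constant shift \(\delta\) (a symbol introduced here for illustration): the perturbed equation has a solution \(x^{*}+\Delta x\) satisfying
\begin{equation*} f(x^{*}+\Delta x)+\delta = 0\text{.} \end{equation*}
Linearizing about \(x^{*}\) and using \(f(x^{*})=0\) gives
\begin{equation*} f'(x^{*})\,\Delta x+\delta \approx 0 \quad\Longrightarrow\quad \Delta x \approx -\frac{\delta}{f'(x^{*})}\text{,} \end{equation*}
so the change in the solution is about \(1/|f'(x^{*})|\) times the size of the change in \(f\text{.}\)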

Subsection 3.10.5 Summary

Table 3.10.1. A summary of the main sources of error.
| Error type | Whose fault is it? | How to mitigate it? |
| --- | --- | --- |
| Truncation | the method | higher-order method or finer grid |
| Roundoff | the computer | usually okay, higher precision arithmetic |
| Loss of Precision | the programmer | avoid cancellation of significant digits |
| Bad Conditioning | the problem | check answers, redesign if possible |

Exercises 3.10.6 Exercises

1.

Identify the (main) source of error in each case and propose a way to reduce this error if possible.
  1. If we do \((\sqrt{3})^{2}\) we should get 3, but if we check
    >> mythree = (sqrt(3))^2
    >> mythree-3
    
    we find the error is ans = -4.4409e-16.
  2. Since it is a quarter of a circle of radius 2, we should have \(\int_{0}^{2} \sqrt{4-x^{2}}\,dx=\frac{1}{4}\pi\cdot 2^{2}=\pi\text{.}\) We try to use mytrap from Section 3.3 and do
    >> x = 0:.2:2;
    >> y = sqrt(4-x.^2);
    >> mypi = mytrap(x,y)
    >> mypi-pi
    
    and find the error is ans = -0.0371.

2.

(From Numerical Linear Algebra by L. Trefethen and D. Bau, SIAM, 1997.)
The function \(f(x)=(x-2)^{9}\) could be evaluated as written, or first expanded as \(f(x)=x^{9}-18x^{8}+\cdots\) and then evaluated. To find the expanded version, type
>> syms x
>> expand((x-2)^9)
>> clear
To evaluate it without expansion, type
>> f1 = @(x) (x-2).^9
>> x = 1.92:.001:2.08;
>> y1 = f1(x);
>> plot(x,y1,'blue')
To do it with expansion, convert the symbolic expansion above to an anonymous function f2 and then type
>> y2 = f2(x);
>> hold on
>> plot(x,y2,'red')
Carefully study the resulting graphs. Should the graphs be the same? Which is more correct? MATLAB does calculations using approximately 16 significant decimal digits. What is the largest error in the graph, and how big is it relative to \(10^{-16}\text{?}\) Which source of error is causing this problem?