2. Significant Figures
The significant figures of a measurement are the digits that carry meaning
about its precision.
Consider three measurements for the length of a table:
L1 = 3.2 m
L2 = 3.27 m
L3 = 3.270 m
The number of significant figures is two for L1, three for L2, and four for L3.
The first digit is the most significant figure, and the last digit is the least
significant figure in the measurement.
We can assign an error to each measurement:
L1 = 3.2 ± 0.2 m
L2 = 3.27 ± 0.01 m
L3 = 3.270 ± 0.003 m
Any digit beyond the error-carrying digit is meaningless.
Leading zeros are not significant. They are only used to show the location of
the decimal point; e.g., 0.00052 has only two significant digits. To avoid
confusion, scientists prefer scientific notation (e.g., $5.2 \times 10^{-4}$).
3. Accuracy & Precision
Accuracy refers to how closely a computed or measured value agrees with the
true value. Inaccuracy (also called bias) is a systematic deviation from the
truth.
Precision refers to how closely individual computed or measured values agree
with each other. Imprecision (also called uncertainty) refers to the magnitude
of the scatter.
Accuracy and precision are independent of each other.
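The distinction can be sketched numerically by treating bias as the offset of the mean from the true value and scatter as the sample standard deviation. The readings below are hypothetical values invented for illustration, and `bias_and_scatter` is an illustrative helper name, not something from the slides.

```python
import statistics

def bias_and_scatter(samples, true_value):
    """Return (bias, scatter) for repeated measurements of one quantity."""
    bias = statistics.mean(samples) - true_value   # inaccuracy: systematic offset
    scatter = statistics.stdev(samples)            # imprecision: spread of the readings
    return bias, scatter

TRUE_LENGTH = 3.270  # assumed true table length in metres (hypothetical)

# Hypothetical readings: tightly grouped but offset from the truth.
precise_but_biased = [3.301, 3.299, 3.300, 3.302, 3.298]
# Hypothetical readings: centred on the truth but widely spread.
accurate_but_scattered = [3.20, 3.34, 3.27, 3.31, 3.23]

bias_a, scatter_a = bias_and_scatter(precise_but_biased, TRUE_LENGTH)
bias_b, scatter_b = bias_and_scatter(accurate_but_scattered, TRUE_LENGTH)
print(f"precise but biased:     bias={bias_a:+.4f} m, scatter={scatter_a:.4f} m")
print(f"accurate but scattered: bias={bias_b:+.4f} m, scatter={scatter_b:.4f} m")
```

The first data set has a small scatter and a clear bias, the second has almost no bias and a large scatter, which is what "independent" means here.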
4. Error Definitions
In numerical methods, both accuracy and precision are required for
a particular problem. We will use the collective term error to
represent both inaccuracy and imprecision in our predictions.
Numerical errors arise from the use of approximations to
represent exact mathematical operations or quantities. Consider
the approximation we made in the falling object in air problem:
we observed some error between the exact (true) and numerical
(approximate) solutions.
The relationship between them:
True value = approximation + error
or
$E_t$ = true value - approximation
Note that this equation includes all factors contributing to the
error, so we use the subscript t to designate that this is the
true error.
To account for the different magnitudes of different
measurements, we prefer to normalize the error. We then
define the fractional relative error:
Fractional relative error = (true value - approximation) / (true value)
or the percent relative error:
$\varepsilon_t$ = (true value - approximation) / (true value) x 100%
Most of the time, we just say “error” to mean percent relative error.
So, we define the true percent relative error as:
$$\varepsilon_t = \frac{\text{true value} - \text{approximate value}}{\text{true value}} \times 100\%$$
In most cases we do not know the “true value”,
so we define the approximate error as
$$\varepsilon_a = \frac{\text{approximate error}}{\text{approximate value}} \times 100\%$$
The approximate error can be defined in different ways depending
on the problem. For example, in iterative methods, the error is
defined with respect to the previous calculation:
$$\varepsilon_a = \frac{\text{current approx.} - \text{previous approx.}}{\text{current approx.}} \times 100\%$$
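As a sketch of how the two error measures behave in an iterative calculation, the snippet below sums the Maclaurin series of e^x term by term and stops once the approximate error falls below a tolerance. The choice of series, the value x = 0.5, the 0.05% tolerance, and all function names are assumptions made for illustration.

```python
import math

def percent_true_error(true_value, approximation):
    """eps_t: needs the true value, which is usually unknown."""
    return (true_value - approximation) / true_value * 100.0

def percent_approx_error(current, previous):
    """eps_a: uses only two successive approximations."""
    return (current - previous) / current * 100.0

def exp_series(x, tolerance_percent):
    """Sum 1 + x + x**2/2! + ... until |eps_a| drops below the tolerance."""
    total, term, n_terms = 1.0, 1.0, 1
    eps_a = float("inf")
    while abs(eps_a) >= tolerance_percent:
        term *= x / n_terms                  # next series term x**n / n!
        previous, total = total, total + term
        n_terms += 1
        eps_a = percent_approx_error(total, previous)
    return total, n_terms, eps_a

approx, n_terms, eps_a = exp_series(0.5, tolerance_percent=0.05)
eps_t = percent_true_error(math.exp(0.5), approx)
print(f"{n_terms} terms: approx={approx:.9f}, eps_a={eps_a:.4f}%, eps_t={eps_t:.5f}%")
```

In this run the approximate error is larger than the true error, so stopping on the approximate error is a conservative criterion here; that is an observation about this example, not a general guarantee.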
7. Round-off Errors
Round-off errors result from the omission of the significant figures.
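Positional notation in the base-10 (decimal) versus base-2 (binary) system:
Base-10: the digits a b c d sit at the places $10^3$, $10^2$, $10^1$, $10^0$:
a b c d = $a \times 10^3 + b \times 10^2 + c \times 10^1 + d \times 10^0$
Base-2: the digits a b a b sit at the places $2^3$, $2^2$, $2^1$, $2^0$:
a b a b = $a \times 2^3 + b \times 2^2 + a \times 2^1 + b \times 2^0$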
Computers know only two states (on/off), so they can only store
numbers in the binary (base-2) system, e.g. 100101.
Each binary digit is called a bit; 1 byte = 8 bits.
A computer uses 6 bits to store this number.
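Positional notation can be sketched in a few lines: each digit is weighted by a power of the base. The helper name `positional_value` is illustrative, and it only handles bases up to 10 because it reads each character as a decimal digit.

```python
def positional_value(digits, base):
    """Evaluate a digit string by positional notation (bases 2 to 10 only)."""
    value = 0
    for digit in digits:
        # Shifting the running value one place left multiplies it by the base.
        value = value * base + int(digit)
    return value

binary_example = positional_value("100101", 2)    # 1*2^5 + 1*2^2 + 1*2^0
decimal_example = positional_value("8642", 10)
print(binary_example, decimal_example)
print(len("100101"), "bits are needed to store", binary_example)
```

Python's built-in `int("100101", 2)` performs the same conversion and can be used as a cross-check.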
8. Integer Representation:
The first bit is used to store the sign (0 for “+” and 1 for “-”);
the remaining bits are used to store the number.
In integer representation, numbers can be stored exactly, but
only a limited range of numbers fits in a limited
memory. Also, fractional quantities cannot be represented.
Ex: How is -3 stored in a computer in integer representation?
Ex: Find the range of numbers that you can store in a 16-bit
computer in integer representation. (-32767 to 32767)
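The two exercises can be sketched with the sign-and-magnitude scheme described above (one sign bit followed by the magnitude). The function names are illustrative. Note as a side remark that most real hardware uses two's complement instead, which changes the bit pattern of negative numbers and extends the lower limit to -32768.

```python
def sign_magnitude(value, bits=16):
    """Bit string of an integer in sign-and-magnitude form."""
    limit = 2 ** (bits - 1) - 1              # largest storable magnitude
    if abs(value) > limit:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    sign_bit = "1" if value < 0 else "0"     # 0 for "+", 1 for "-"
    return sign_bit + format(abs(value), f"0{bits - 1}b")

def sign_magnitude_range(bits=16):
    """Smallest and largest integers storable in sign-and-magnitude form."""
    limit = 2 ** (bits - 1) - 1
    return -limit, limit

minus_three = sign_magnitude(-3)
lowest, highest = sign_magnitude_range(16)
print(minus_three)
print(lowest, highest)
```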
9. Floating Point Representation:
FPR allows a much wider range of numbers than integer
representation.
It allows storing fractional quantities.
It is similar to the scientific notation.
A number is stored as $m \times b^e$, where m is the mantissa (significand),
b is the base, and e is the exponent.
e.g. $0.015678 = 1.5678 \times 10^{-2}$ (base-10)
Ex: Assume you have a hypothetical base-10 computer with a 5-digit word size
(one digit for the sign, two for the exponent with its sign, two for the mantissa).
a) Find the range of values that can be represented. b) Calculate the error of
representing $2^{-5}$ using this representation.
10. IEEE floating point representation standards:
32-bit (single precision) word format: 1 sign bit, 8 exponent bits, 23 mantissa bits.
64-bit (double precision) word format: 1 sign bit, 11 exponent bits, 52 mantissa bits.
The mantissa holds only a limited number of significant digits,
which causes round-off error.
Increasing the number of digits (32-bit versus 64-bit) decreases
the round-off error.
11. Range:
In FPR there is still a limit on the numbers that can be represented, but the
range is much bigger.
In the 64-bit representation in IEEE format, 52 bits are used for the mantissa
and 11 bits for the exponent:
Max value = $+1.111\ldots1111 \times 2^{+1023} = 1.7977 \times 10^{+308}$
Min value = $1.000\ldots0000 \times 2^{-1022} = 2.2251 \times 10^{-308}$
Numbers larger than the max. value cannot be represented by
the computer: overflow error.
>> realmax
ans=
1.7976931e+308
Any value bigger than this is set to infinity
Numbers smaller in magnitude than the min. value cannot be represented:
underflow error. There is a “hole” at zero.
>> realmin
ans=
2.2250739e-308
Any value smaller than this is set to zero
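The same limits can be inspected from Python, assuming the interpreter stores floats as IEEE 754 doubles (true for CPython on common platforms). This is a rough sketch rather than a translation of the MATLAB session. One caveat it makes visible: values slightly below the smallest normalized number are not flushed to zero immediately but are kept as subnormal numbers with reduced precision, and only much smaller values become exactly zero.

```python
import math
import sys

largest = sys.float_info.max          # counterpart of MATLAB's realmax
smallest_normal = sys.float_info.min  # counterpart of MATLAB's realmin

overflowed = largest * 10.0              # too large to store: becomes inf
subnormal = smallest_normal / 2.0        # still stored, with reduced precision
underflowed = smallest_normal * 1e-20    # far too small to store: becomes 0.0

print(largest, smallest_normal)
print(overflowed, subnormal, underflowed)
```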
12. Precision:
The 52 bits used for the mantissa correspond to about 15-16 base-10
significant digits.
>> pi
ans=
3.141593
>> format long
>> pi
ans=
3.1415926535898
32-bit representation (single precision)
64-bit representation (double precision)
Ex: Find the smallest possible positive floating point number for a hypothetical
base-2 machine that stores information using 7-bit words (first bit for the sign
of the number, next three for the sign and magnitude of the exponent, and the
last three for the magnitude of the mantissa). (Answer: $1 \times 2^{-3}$)
13. Chopping versus Rounding:
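Assume a computer that can store 7 significant digits:
Chopping: 4.2428576428...... is stored as 4.242857
Rounding: 4.2428576428...... is stored as 4.242858
In each case the error is the difference between the original number and the
stored one.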
Rounding is the better choice, since the sign of the error can be either
positive or negative, which leads to a smaller total numerical error,
whereas the error in chopping is always positive and adds up.
Rounding costs the computer extra processing, so most
computers just chop off the number.
The error associated with rounding/chopping is called quantization error.
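The 7-digit example can be reproduced with Python's `decimal` module, which lets the number of significant digits and the rounding mode be set explicitly. This is only a sketch of the idea in base 10; real hardware works in base 2. Only the digits printed on the slide are used as the "exact" value.

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_EVEN

x = Decimal("4.2428576428")  # the digits shown on the slide

# Two hypothetical 7-significant-digit machines.
chopping_machine = Context(prec=7, rounding=ROUND_DOWN)
rounding_machine = Context(prec=7, rounding=ROUND_HALF_EVEN)

chopped = chopping_machine.plus(x)   # plus() applies the context's rounding
rounded = rounding_machine.plus(x)

chop_error = x - chopped    # always >= 0 for a positive number
round_error = x - rounded   # can have either sign

print(chopped, chop_error)
print(rounded, round_error)
```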
14. Machine epsilon:
As a result of the quantization of numbers, there is a finite interval $\Delta x$
between two adjacent numbers in floating point representation.
Machine epsilon (or machine precision) is the upper bound on
the relative error due to chopping/rounding in floating point
arithmetic:
$$\frac{|\Delta x|}{|x|} \le \varepsilon$$
The machine epsilon can be computed as
$$\varepsilon = b^{1-t}$$
where b = number base and t = number of digits in the mantissa.
For a 64-bit representation, b = 2 and t = 53, so
$\varepsilon = 2^{-52} = 2.22044\ldots \times 10^{-16}$.
>> eps
ans=
2.2204460e-16
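Machine epsilon can also be found experimentally by halving a candidate value until adding half of it to 1 no longer changes the result. The sketch below assumes IEEE 754 double precision with the default round-to-nearest mode; `machine_epsilon` is an illustrative name.

```python
import sys

def machine_epsilon():
    """Smallest power of two eps for which 1.0 + eps is still greater than 1.0."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps

eps = machine_epsilon()
print(eps)
print(eps == 2.0 ** -52, eps == sys.float_info.epsilon)
```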
15. Arithmetic operations:
Besides the limitations of the computer in storing numbers,
arithmetic operations on these numbers also contribute to the
round-off error.
Consider a hypothetical base-10 computer with a 4-digit mantissa and a 1-digit
exponent:
$1.345 + 0.03406 = 0.1345 \times 10^1 + 0.003406 \times 10^1 = 0.137906 \times 10^1$
In arithmetic operations the numbers are first converted to the same exponent.
The result is then chopped to 4 digits, $0.1379 \times 10^1$; the trailing
digits 06 are chopped off.
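The addition above can be imitated with the `decimal` module by limiting the result to 4 significant digits with chopping. This is a sketch of the hypothetical machine only; the exponent-size limit is not modelled.

```python
from decimal import Decimal, Context, ROUND_DOWN

# Hypothetical base-10 machine: 4-digit mantissa, results chopped.
machine = Context(prec=4, rounding=ROUND_DOWN)

a = Decimal("1.345")
b = Decimal("0.03406")

exact_sum = a + b                  # default context keeps all digits here
stored_sum = machine.add(a, b)     # result limited to 4 significant digits
lost = exact_sum - stored_sum      # round-off error of this one operation

print(exact_sum, stored_sum, lost)
```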
Ex: a) Evaluate the polynomial
$$y = x^3 - 5x^2 + 6x + 0.55$$
at x=1.73. Use 3-digit arithmetic with chopping. Evaluate the error.
b) If the function is expressed as
$$y = [x(x - 5) + 6]x + 0.55$$
what is the percent relative error? Compare with part a.
16. Subtractive cancellation:
Subtractive cancellation (also called loss of significance) occurs when
subtracting two nearly equal numbers:
$0.7549 \times 10^3 - 0.7548 \times 10^3 = 0.0001 \times 10^3$
Both operands have 4 significant digits (S.D.), but the result has only 1 S.D.
Many problems in numerical analysis are prone to subtractive
cancellation error. It can be mitigated by reformulating the
problem or by increasing the precision.
Consider finding the roots of a 2nd order polynomial:
$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Subtractive cancellation occurs when $b^2 \gg 4ac$. In that case we
- can use double precision, or
- can use an alternative formulation:
$$x_{1,2} = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}}$$
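The effect can be made visible even in double precision by choosing coefficients with b² far larger than 4ac. The coefficients below (a = 1, b = 10⁸, c = 1) are an assumed test case, and the function names are illustrative; the small root is very close to -c/b = -10⁻⁸.

```python
import math

def small_root_standard(a, b, c):
    """Root of smaller magnitude from the standard formula (assumes b > 0)."""
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def small_root_alternative(a, b, c):
    """Same root from the alternative form, which adds instead of subtracts."""
    return -2.0 * c / (b + math.sqrt(b * b - 4.0 * a * c))

a, b, c = 1.0, 1.0e8, 1.0      # assumed coefficients with b*b >> 4*a*c
reference = -c / b             # accurate to about 16 digits for these values

x_standard = small_root_standard(a, b, c)
x_alternative = small_root_alternative(a, b, c)

rel_err_standard = abs((x_standard - reference) / reference)
rel_err_alternative = abs((x_alternative - reference) / reference)
print(x_standard, rel_err_standard)
print(x_alternative, rel_err_alternative)
```

The standard formula subtracts two numbers that agree in almost all of their digits, so most significant digits of the result are lost; the alternative form avoids that subtraction for this root.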
17. Truncation Errors
Truncation errors result from using an approximation in place of an
exact mathematical representation. Remember the
approximation in the falling object in air problem:
$$\frac{dv}{dt} \approx \frac{\Delta v}{\Delta t} = \frac{v(t_{i+1}) - v(t_i)}{t_{i+1} - t_i}$$
Taylor's theorem:
Taylor’s theorem gives us insight for estimating the truncation
error in a numerical approximation.
Taylor’s theorem states that if the function f and its first n+1 derivatives
are continuous on an interval containing a and x, then the value of
the function at x is given by
$$f(x) = f(a) + f'(a)(x - a) + \frac{f^{(2)}(a)}{2!}(x - a)^2 + \ldots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_n$$
In other words, any smooth function can be approximated as a
polynomial of order n within a given interval.
The error gets smaller as n increases.
[Figure: exact solution compared with Taylor approximations; base point (a) = 1]
Ex: Use a second order Taylor series expansion to approximate the
function
$$f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$$
at x=1 from a=0. Calculate the truncation error from this
approximation.
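A worked sketch of this exercise follows. It assumes the polynomial carries the signs f(x) = -0.1x⁴ - 0.15x³ - 0.5x² - 0.25x + 1.2 (the usual textbook form of this example), and the derivatives are written out by hand. The function names are illustrative.

```python
def f(x):
    """Assumed form of the example polynomial."""
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

def f_prime(x):
    return -0.4 * x**3 - 0.45 * x**2 - 1.0 * x - 0.25

def f_second(x):
    return -1.2 * x**2 - 0.9 * x - 1.0

def taylor_second_order(x, a):
    """f(a) + f'(a)(x - a) + f''(a)/2! (x - a)^2"""
    h = x - a
    return f(a) + f_prime(a) * h + f_second(a) / 2.0 * h**2

true_value = f(1.0)
approximation = taylor_second_order(1.0, a=0.0)
truncation_error = true_value - approximation   # E_t = true value - approximation

print(true_value, approximation, truncation_error)
```

Under these assumptions the second order estimate is 0.45 against a true value of 0.2, so the truncation error is about -0.25.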
Suppose you have $f(x_i)$ and want to evaluate $f(x_{i+1})$:
$$f(x_{i+1}) = f(x_i) + f'(x_i)h + \frac{f''(x_i)}{2!}h^2 + \ldots + \frac{f^{(n)}(x_i)}{n!}h^n + R_n$$
where $h = x_{i+1} - x_i$ is the step size and
$$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!}h^{n+1}, \qquad \xi \in [x_i, x_{i+1}]$$
Here $R_n$ represents the remainder (or the error) from the n-th
order approximation of the function. It provides an exact
determination of the error.
We can estimate the order of magnitude of the error in
terms of the step size (h):
$$R_n = O(h^{n+1})$$
We can change h to control the magnitude of the error in the calculation!
20. Falling object in air problem:
We can evaluate the truncation error for the “falling object in
air” problem. Express $v(t_{i+1})$ as a Taylor series:
$$v(t_{i+1}) = v(t_i) + v'(t_i)h + \frac{v''(t_i)}{2!}h^2 + \ldots + \frac{v^{(n)}(t_i)}{n!}h^n + R_n$$
where $h = t_{i+1} - t_i$.
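Taylor series to n=1:
$$v(t_{i+1}) = v(t_i) + v'(t_i)h + R_1, \qquad R_1 = O(h^2)$$
or
$$v'(t_i) = \frac{v(t_{i+1}) - v(t_i)}{h} - \frac{R_1}{h}$$
The first term on the right is the finite difference approximation and the
second is the truncation error:
$$\frac{R_1}{h} = \frac{O(h^2)}{h} = O(h)$$
Then the error associated with the finite difference approximation
is of the order of h.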
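The O(h) behaviour can be checked numerically: halving the step size should roughly halve the error of the finite difference. The sketch below assumes the usual analytical velocity of a falling object with linear drag, v(t) = (gm/c)(1 - e^(-(c/m)t)); the parameter values g, m, c and the time t = 2 s are assumptions chosen for illustration.

```python
import math

G, M, C = 9.81, 68.1, 12.5   # assumed gravity (m/s^2), mass (kg), drag coefficient (kg/s)

def velocity(t):
    """Assumed analytical solution for a falling object with linear drag."""
    return G * M / C * (1.0 - math.exp(-C / M * t))

def acceleration_exact(t):
    """Exact derivative dv/dt of the expression above."""
    return G * math.exp(-C / M * t)

def forward_difference(func, t, h):
    """Finite difference approximation (v(t+h) - v(t)) / h."""
    return (func(t + h) - func(t)) / h

t = 2.0
step_sizes = [0.4, 0.2, 0.1, 0.05]
errors = [abs(forward_difference(velocity, t, h) - acceleration_exact(t))
          for h in step_sizes]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]

for h, err in zip(step_sizes, errors):
    print(f"h={h:<5} error={err:.6f}")
print("error ratios when h is halved:", ratios)
```

Ratios close to 2 are the signature of a first order (O(h)) error.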
21. Error Propagation
Error propagation concerns how an error in x is propagated to the
function f(x).
x = true value, $x^o$ = approximate value
$$\Delta x^o = |x - x^o| \quad \rightarrow \quad \Delta f(x^o) = |f(x) - f(x^o)| \quad \text{(propagation of error)}$$
A Taylor expansion can be used to estimate the error propagation.
Let's evaluate f(x) near $f(x^o)$:
$$f(x) = f(x^o) + f'(x^o)(x - x^o) + \ldots$$
Dropping the 2nd and higher order terms:
$$f(x) - f(x^o) \approx f'(x^o)(x - x^o)$$
$$\Delta f(x^o) = |f'(x^o)|\,\Delta x^o$$
Ex 2.3: Given a measured value of $x^o = 2.5 \pm 0.01$, estimate the
resulting error in the function $f(x) = x^3$.
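This example can be worked through with the first order estimate, the error in f being approximately |f'(x^o)| times the error in x. The snippet is a sketch with illustrative names; it also compares the estimate with the actual change in f for a shift of 0.01.

```python
def propagated_error(derivative, x0, dx):
    """First order estimate: |f'(x0)| * dx."""
    return abs(derivative(x0)) * dx

def f(x):
    return x**3

def f_prime(x):
    return 3.0 * x**2

x0, dx = 2.5, 0.01
estimate = propagated_error(f_prime, x0, dx)   # 3 * 2.5^2 * 0.01
actual = f(x0 + dx) - f(x0)                    # change in f for a +0.01 shift

print(f"f(x0) = {f(x0)} +/- {estimate}")
print(f"actual change for +dx: {actual:.6f}")
```

The estimate gives f(2.5) = 15.625 ± 0.1875, and the small gap to the actual change comes from the dropped higher order terms.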
22. Functions of more than one variable:
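Error in: $\Delta x^o = |x - x^o|$, $\Delta y^o = |y - y^o|$, $\Delta z^o = |z - z^o|$, ...
Propagation of error through $f(x^o, y^o, z^o, \ldots)$
Error out: $\Delta f(x^o, y^o, z^o, \ldots)$
Error propagation for functions of more than one variable can be
understood as the generalization of the case of functions with a
single variable:
$$\Delta f(x^o, y^o, z^o, \ldots) \approx \left|\frac{\partial f}{\partial x}\right|\Delta x^o + \left|\frac{\partial f}{\partial y}\right|\Delta y^o + \left|\frac{\partial f}{\partial z}\right|\Delta z^o + \ldots$$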
Ex 2.3: The open channel flow formula for a rectangular channel is given by:
$$Q = \frac{1}{n}\,\frac{(bh)^{5/3}}{(b + 2h)^{2/3}}\,\sqrt{s}$$
(Q = flow rate, n = roughness coefficient, b = width, h = depth, s = slope)
Assume that b=20 m and h=0.3 m for the channel. If you know that
n=0.030±0.002 and s=0.05±0.01, what is the resulting error in the calculation of Q?
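A sketch of this exercise follows. It assumes the formula reads Q = (1/n)(bh)^(5/3) / (b + 2h)^(2/3) · √s, with a plus sign in the wetted perimeter term as in Manning's equation, and it treats b and h as exact. The partial derivatives are taken analytically: ∂Q/∂n = -Q/n and ∂Q/∂s = Q/(2s).

```python
import math

def flow_rate(n, s, b=20.0, h=0.3):
    """Assumed open channel formula for a rectangular channel."""
    return (1.0 / n) * (b * h) ** (5.0 / 3.0) / (b + 2.0 * h) ** (2.0 / 3.0) * math.sqrt(s)

n0, dn = 0.030, 0.002   # roughness coefficient and its uncertainty
s0, ds = 0.05, 0.01     # slope and its uncertainty

q0 = flow_rate(n0, s0)
dq_dn = -q0 / n0            # Q is proportional to 1/n
dq_ds = q0 / (2.0 * s0)     # Q is proportional to sqrt(s)

# First order propagation: sum of |partial derivative| * input error.
dq = abs(dq_dn) * dn + abs(dq_ds) * ds

print(f"Q = {q0:.3f} +/- {dq:.3f}")
print(f"relative error = {dq / q0:.1%}")
```

Because Q is a product of powers of n and s, the relative error reduces to Δn/n + Δs/(2s), so the slope uncertainty contributes more than the roughness uncertainty for these values.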
23. Condition and stability:
The condition of a mathematical computation is its sensitivity to the
input values. It is a measure of how much an uncertainty is
magnified by the computation.
$$\text{Condition Number} = \frac{\text{Error in the output}}{\text{Error in the input}} = \frac{\dfrac{f(x) - f(x^o)}{f(x^o)}}{\dfrac{x - x^o}{x^o}} \approx \frac{\dfrac{f'(x^o)(x - x^o)}{f(x^o)}}{\dfrac{x - x^o}{x^o}} = \frac{x^o f'(x^o)}{f(x^o)}$$
If the uncertainty in the input results in gross changes in the
output, we say that the problem is unstable or ill-conditioned.
C.N.1
C.N.>>1 (ill-conditioned)
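The condition number formula x^o f'(x^o) / f(x^o) can be tried on two functions. The choice of tan(x) near π/2 as an ill-conditioned case and √x as a well-conditioned case is an assumption made for illustration, as are the function names.

```python
import math

def condition_number(f, f_prime, x0):
    """x0 * f'(x0) / f(x0): ratio of relative output error to relative input error."""
    return x0 * f_prime(x0) / f(x0)

def tan_prime(x):
    return 1.0 / math.cos(x) ** 2

def sqrt_prime(x):
    return 0.5 / math.sqrt(x)

# tan(x) becomes increasingly ill-conditioned as x approaches pi/2.
cn_tan_far = condition_number(math.tan, tan_prime, math.pi / 2 * 1.1)
cn_tan_near = condition_number(math.tan, tan_prime, math.pi / 2 * 1.01)
# sqrt(x) halves the relative error of its input.
cn_sqrt = condition_number(math.sqrt, sqrt_prime, 4.0)

print(cn_tan_far, cn_tan_near, cn_sqrt)
```

The magnitude matters more than the sign: a value near -101 means a 1% input error becomes roughly a 100% output error.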
24. Total numerical error:
Total Numerical Error = Round-off Error + Truncation Error
Round-off errors can be minimized by increasing the number of
significant digits. Subtractive cancellations and a larger number of
computations increase the round-off error.
Truncation errors can be reduced by decreasing the step size (h),
but this may increase the subtractive cancellation error.
So, there is a trade-off between truncation error and round-off error in
terms of the step size (h).
Note that there is no systematic and general approach to evaluating
numerical errors for all problems.
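The trade-off in step size can be observed with a forward difference in double precision: the total error first falls as h shrinks (truncation dominates) and then rises again (round-off dominates). The test function e^x at x = 1 and the range of step sizes are assumptions chosen for illustration.

```python
import math

def forward_difference(f, x, h):
    return (f(x + h) - f(x)) / h

def error_sweep(f, f_prime, x, exponents=range(1, 16)):
    """Total error of the forward difference for h = 10^-1 ... 10^-15."""
    results = []
    for k in exponents:
        h = 10.0 ** -k
        error = abs(forward_difference(f, x, h) - f_prime(x))
        results.append((h, error))
    return results

# exp is its own derivative, so the exact value is known.
sweep = error_sweep(math.exp, math.exp, 1.0)
best_h, best_error = min(sweep, key=lambda pair: pair[1])

for h, error in sweep:
    print(f"h={h:.0e}  total error={error:.3e}")
print(f"smallest error near h={best_h:.0e}")
```

The smallest error appears at an intermediate step size; both much larger and much smaller steps give worse results.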
25. Formulation Errors & Data Uncertainty
These errors are totally independent of numerical errors, and
are not directly connected to most numerical methods.
Blunders:
In other words: stupid mistakes. They can only be mitigated by
experience, or by consulting experienced persons.
Formulation (model) errors:
Formulation (or model) errors are caused by an incomplete
formulation of the mathematical model (e.g., in “the falling object
in air problem”, not taking the effect of air friction into account).
Data uncertainty:
If your data contain large inaccuracies or imprecisions (perhaps
due to problems with the measurement device), this will directly
affect the quality of the results.
Statistical analysis of the data helps to minimize these errors.