matlab-CSCC43

CSCC43 – Summary

1 What is Numerical Analysis

Physical systems are typically turned into mathematical models by engineers, chemists, physicists, and so on. A mathematical model rarely has a closed-form solution; most of the time only a numerical solution is available, and obtaining it is the business of numerical analysis.

2 Floating Point Arithmetic

2.1 Representation of non-negative integers

Decimal system (base 10): 350 = (350)_10 = 3·10^2 + 5·10^1 + 0·10^0

Binary system (base 2): 350 = (101011110)_2 = 1·2^8 + 0·2^7 + ... + 1·2^1 + 0·2^0

In general, in a base-b system with b > 0, b ∈ N:
    x = (d_n d_{n-1} ... d_0)_b = d_n·b^n + d_{n-1}·b^{n-1} + ... + d_0·b^0
where x ≥ 0, x ∈ N, and 0 ≤ d_i < b for i ∈ [0, n].

Hexadecimal system (base 16): symbols 0, 1, ..., 8, 9, A, B, C, D, E, F.
    (8FA)_16 = 8·16^2 + F·16^1 + A·16^0 = 8·16^2 + 15·16^1 + 10·16^0 = (2298)_10

Converting decimal to base b. Example: (350)_10 in base b = 2.

    Numerator   Denominator   Quotient   Remainder
    350         2             175        0
    175         2             87         1
    87          2             43         1
    43          2             21         1
    21          2             10         1
    10          2             5          0
    5           2             2          1
    2           2             1          0
    1           2             0          1

Once we hit 0 in the Quotient column, we simply read the Remainder column from the bottom up and we have our conversion. Thus (350)_10 = (101011110)_2. This algorithm is safe (it always terminates); overflow is the only potential problem.

2.2 Representation of real numbers

If x ∈ R then x = ±(x_I . x_F)_b = ±(d_n d_{n-1} ... d_0 . d_{-1} d_{-2} ...)_b, where x_I is the integral part and x_F is the fractional part. The sign (+ or −) is a single bit, 0 or 1. x_I is the non-negative integer discussed above, and x_F can have infinitely many digits.

Example: (0.777...)_10 = 7·10^{-1} + 7·10^{-2} + ...

Binary system (base 2): (0.1)_10 = (0.000110011...)_2, where the block 0011 repeats forever, = 0·2^{-1} + 0·2^{-2} + 0·2^{-3} + 1·2^{-4} + 1·2^{-5} + ...

In general, in a base-b system with b > 0, b ∈ N:
    x_F = (.d_{-1} d_{-2} d_{-3} ...)_b = d_{-1}·b^{-1} + d_{-2}·b^{-2} + ... = Σ_{i=1}^∞ d_{-i}·b^{-i}

A binary fraction with n digits is terminating if it has no repeating cycle and is finite; such a fraction also has a terminating decimal representation.

Converting decimal fractions to base b. Example: (0.625)_10 in base b = 2.

    Multiplier   Base   Product   Integral   Fraction
    0.625        2      1.25      1          0.25
    0.25         2      0.5       0          0.5
    0.5          2      1.0       1          0

Once we hit 0 in the Fraction column, we simply read the Integral column from the top down and we have our conversion. Thus (0.625)_10 = (.101)_2.

However, we may never hit 0: this algorithm is not safe (it won't always terminate). For example, what is (0.1)_10 in binary?

    Multiplier   Base   Product   Integral   Fraction
    0.1          2      0.2       0          0.2
    0.2          2      0.4       0          0.4
    0.4          2      0.8       0          0.8
    0.8          2      1.6       1          0.6
    0.6          2      1.2       1          0.2

We see that 0.2 occurs again as a fraction, so there must be a cycle. Thus (0.1)_10 = (0.000110011...)_2, with the block 0011 repeating forever.
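Both conversion algorithms above are easy to express in MATLAB. A minimal sketch follows (dec2base_int and dec2base_frac are made-up names for illustration, each saved as its own .m file or kept as local functions in a script); the fractional routine simply stops after maxDigits digits since, as shown above, it need not terminate.

    % Convert a non-negative integer x to base b by repeated division.
    function digits = dec2base_int(x, b)
        digits = [];
        while x > 0
            digits = [mod(x, b), digits];   % remainders, read bottom-up
            x = floor(x / b);
        end
        if isempty(digits), digits = 0; end
    end

    % Convert a fraction 0 <= f < 1 to base b by repeated multiplication.
    % This may never terminate, so stop after maxDigits digits.
    function digits = dec2base_frac(f, b, maxDigits)
        digits = zeros(1, maxDigits);
        for i = 1:maxDigits
            f = f * b;
            digits(i) = floor(f);           % integral parts, read top-down
            f = f - digits(i);
            if f == 0, digits = digits(1:i); break; end
        end
    end

For example, dec2base_int(350, 2) returns 1 0 1 0 1 1 1 1 0 and dec2base_frac(0.625, 2, 20) returns 1 0 1, matching the tables above; dec2base_frac(0.1, 2, 12) shows the repeating 0011 pattern (up to the fact that 0.1 is itself already rounded when stored in double precision).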
2.3 Machine Representation of Reals

Reals are represented in computers as floating point numbers (FP for short). An FP number x in base b has the form
    x = ±(F)_b · b^{(e)_b},
where F = (.d_1 d_2 ... d_t)_b is the fraction, also called the mantissa, and e = ±(c_s c_{s-1} ... c_1)_b is called the exponent.

An FP number is normalized if d_1 ≠ 0, unless d_1 = d_2 = ... = d_t = 0 (the number zero).

Significant digits (of a nonzero FP number) are the digits following and including the first nonzero digit. For a normalized FP number, all digits of the mantissa are significant (all store useful information).

The absolute value of the mantissa is always ≥ 0 and < 1 (for a normalized nonzero number, b^{-1} ≤ |F| < 1).

The exponent is limited: e ∈ [−M, M], where M = (a_s a_{s-1} ... a_1)_b with each a_i = b − 1.

The largest FP number in absolute value in this system is (.aa...a)_b · b^{(aa...a)_b}, where each a = b − 1. The smallest nonzero normalized FP number in absolute value is (.10...0)_b · b^{−(aa...a)_b}, where each a = b − 1. Non-normalized FP numbers allow us to get very, very close to 0; consider (.00...01)_b · b^{−(aa...a)_b}.

R_b(t, s) denotes the set of all floating point numbers with a t-digit mantissa and an s-digit exponent. Any R_b(t, s) is finite, while R is infinite.

Overflow or underflow occurs whenever a nonzero floating point number whose absolute value lies outside these ranges must be stored on the computer. Note that when the number is too close to zero it is called underflow, not when it is largely negative. Furthermore, R is dense (between any two distinct real numbers there are infinitely many reals), whereas R_b(t, s) is not.

Take a real number x = ±(x_I . x_F)_b = ±(d_k d_{k-1} ... d_0 . d_{-1} d_{-2} ...)_b. Converting a real x to an FP number is written x ∈ R → Fl(x) ∈ R_b(t, s), and is done by the following algorithm:

1. Normalize the mantissa: shift the radix point and readjust with the exponent,
       x = ±(d_k d_{k-1} ... d_0 . d_{-1} d_{-2} ...)_b = ±(.D_1 D_2 ...)_b · b^{k+1}
2. Chop or round the mantissa:
   (a) chopping: chop after digit t of the mantissa;
   (b) rounding: chop after digit t, then round D_t up or not depending on whether D_{t+1} ≥ b/2 or D_{t+1} < b/2. A more efficient way to round is to add b/2 to D_{t+1}, then simply chop.

Example: Fl(2/3) in R_10(2, 4) is +0.66·10^0 if we chop and +0.67·10^0 if we round.

Round-off error. Precisely, this is the difference between an x ∈ R and Fl(x) ∈ R_b(t, s). It is typically measured relative to x:
    δ = (x − Fl(x)) / x,   or equivalently   Fl(x) = x(1 − δ),
where δ is the relative round-off error. δ can be bounded independently of x: |δ| < b^{1−t} for chopping normalized FP numbers, and |δ| ≤ (1/2)·b^{1−t} for rounding.

2.4 Machine Arithmetic

Let x, y ∈ R and Fl(x), Fl(y) ∈ R_b(t, s). For an operation ⊙ ∈ {+, −, ×, /}, the computer gives
    x ⊙ y → Fl(Fl(x) ⊙ Fl(y)).

Example in R_10(2, 4): let x = 2, y = 0.0000058 ∈ R, so x + y = 2.0000058. Then Fl(x) = (+0.20·10^1)_10 and Fl(y) = (+0.58·10^{-5})_10; note there is no round-off error in Fl(x) or Fl(y). Next, Fl(x) + Fl(y) = 0.20·10^1 + 0.58·10^{-5} = 0.20000058·10^1, stored temporarily in a longer register. Finally, Fl(Fl(x) + Fl(y)) = (+0.20·10^1)_10: the small addend is lost entirely.
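A minimal MATLAB sketch of a toy decimal floating point system that simulates Fl(·) in R_10(t, s) with chopping or rounding (fl10 is a made-up helper name, and exponent range checks are left out); it reproduces the numbers used above.

    % Simulate Fl(x) in R_10(t, s): keep t significant decimal digits of
    % the normalized mantissa. mode is 'chop' or 'round' (default 'round').
    function y = fl10(x, t, mode)
        if nargin < 3, mode = 'round'; end
        if x == 0, y = 0; return; end
        e = floor(log10(abs(x))) + 1;    % exponent so that x = +-(.D1 D2 ...)*10^e
        m = x / 10^e;                    % normalized mantissa, 0.1 <= |m| < 1
        if strcmp(mode, 'chop')
            m = fix(m * 10^t) / 10^t;    % chop after digit t
        else
            m = round(m * 10^t) / 10^t;  % round after digit t
        end
        y = m * 10^e;
    end

For example, fl10(2/3, 2, 'chop') gives 0.66 and fl10(2/3, 2, 'round') gives 0.67, while fl10(fl10(2, 2) + fl10(0.0000058, 2), 2) returns 2, exactly as in the R_10(2, 4) addition example (the computations themselves run in double precision, so tiny errors of order machine eps remain underneath).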
2.5 Machine Precision

Definition: machine epsilon (eps) is the smallest positive floating point number such that Fl(1 + eps) > 1. This number is important because
    eps = b^{1−t} for chopping,   eps = (1/2)·b^{1−t} for rounding.
Recall that δ = (x − Fl(x))/x is the relative round-off error, which can be bounded independently of x: 0 ≤ δ ≤ eps for chopping, and |δ| ≤ eps for rounding.

How exactly is error propagated by each operation ⊙ ∈ {+, −, ×, /}? Let x, y ∈ R, so Fl(x) = x(1 − δ_x) and Fl(y) = y(1 − δ_y).

Multiplication: for x·y the computer gives Fl(Fl(x)·Fl(y)). We have
    [x(1 − δ_x) · y(1 − δ_y)](1 − δ_xy) = x·y(1 − δ_x)(1 − δ_y)(1 − δ_xy) ≈ x·y(1 − δ_x − δ_y − δ_xy) = x·y(1 − δ_×).
Dropping the products of δ's is a good approximation: each δ is a small fraction bounded by machine epsilon, so products of δ's are far smaller than machine epsilon itself. In the worst case, |δ_×| ≤ 3·eps.

Addition: for x + y the computer gives Fl(Fl(x) + Fl(y)). We have
    (x(1 − δ_x) + y(1 − δ_y))(1 − δ_xy) = x(1 − δ_x)(1 − δ_xy) + y(1 − δ_y)(1 − δ_xy)
    ≈ x(1 − δ_x − δ_xy) + y(1 − δ_y − δ_xy)
    = (x + y)[ x(1 − δ_x − δ_xy)/(x + y) + y(1 − δ_y − δ_xy)/(x + y) ]
    = (x + y)(1 − δ_+),
where
    |δ_+| ≤ |x/(x + y)|·2·eps + |y/(x + y)|·2·eps = (|x| + |y|)/|x + y| · 2·eps,
i.e.
    |δ_+| ≤ 2·eps                          if x and y have the same sign,
    |δ_+| ≤ 2·eps · (|x| + |y|)/|x + y|    if x and y have different signs.
When x ≈ −y, this error bound can approach infinity. This is called subtractive cancellation.

Example of subtractive cancellation in R_10(3, 1) with rounding. Compute a^2 − 2ab + b^2 with a = 15.6, b = 15.7 ∈ R. First, fl(a) = +0.156·10^2 and fl(b) = +0.157·10^2, with no initial round-off. In R_10(3, 1),
    fl(a^2)  = fl(243.36) = +0.243·10^3
    fl(2ab)  = fl(489.84) = +0.490·10^3
    fl(b^2)  = fl(246.49) = +0.246·10^3
    fl(a^2 − 2ab + b^2) = fl(243 − 490 + 246) = −0.100·10^1
But this is a problem: a^2 − 2ab + b^2 = (a − b)^2 = 0.01, which is positive, so even the sign of the computed result is wrong.

3 Linear Systems

Solve A~x = ~b, where A ∈ R^{n×n} and ~x, ~b ∈ R^n: given A and ~b, calculate ~x.

General solution technique:
1. Reduce the problem to an equivalent one that is easier to solve.
2. Solve the reduced problem.

Example. Given
    3x_1 − 5x_2 = 1
    6x_1 − 7x_2 = 5
i.e. [3 −5; 6 −7][x_1; x_2] = [1; 5], reduce it to [3 −5; 0 3][x_1; x_2] = [1; 3], which gives x_2 = 1, x_1 = 2.

Generalize this to n equations in n unknowns. Let the matrix A = [a_ij], where a_11 is the top-left element. We have:
    Equation 1: a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
    Equation 2: a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2
    ...
    Equation n: a_n1 x_1 + a_n2 x_2 + ... + a_nn x_n = b_n

Reduce the system to triangular form:

1. Assume a_11 ≠ 0. Multiply Equation 1 by a_21/a_11 and subtract it from Equation 2; multiply Equation 1 by a_31/a_11 and subtract it from Equation 3; ...; multiply Equation 1 by a_n1/a_11 and subtract it from Equation n. This results in an equivalent system in which x_1 has been eliminated from Equations 2 to n:
    Equation 1:  a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
    Equation 2': 0 + a'_22 x_2 + ... + a'_2n x_n = b'_2
    ...
    Equation n': 0 + a'_n2 x_2 + ... + a'_nn x_n = b'_n

2. Assume a'_22 ≠ 0. Multiply Equation 2' by a'_32/a'_22 and subtract it from Equation 3'; ...; multiply Equation 2' by a'_n2/a'_22 and subtract it from Equation n'.
...
Finally, at step n − 1, assume the current a_{n−1,n−1} ≠ 0, multiply equation n − 1 by a_{n,n−1}/a_{n−1,n−1} and subtract it from Equation n. This results in an upper triangular system:
    Equation 1:  a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
    Equation 2': 0 + a'_22 x_2 + ... + a'_2n x_n = b'_2
    ...
    Equation n:  0 + 0 + ... + a_nn x_n = b_n   (with the updated entries)

Another way of looking at this problem is matrix-vector notation: solve A~x = ~b where A ∈ R^{n×n}, ~b, ~x ∈ R^n. Triangulating such an A requires the following steps.

1. Eliminate the first column of A below a_11: L_1 A ~x = L_1 ~b, where
    L_1 = [ 1            0  ...  0
            −a_21/a_11   1  ...  0
            −a_31/a_11   0  ...  0
            ...
            −a_n1/a_11   0  ...  1 ]
This is the identity matrix except that the first column below the diagonal holds the (negated) multipliers used in the first step.

2. Eliminate the second column of L_1 A below a'_22: L_2(L_1 A)~x = L_2(L_1 ~b), where
    L_2 L_1 A = [ a_11  a_12   a_13   ...  a_1n
                  0     a'_22  a'_23  ...  a'_2n
                  0     0      a''_33 ...  a''_3n
                  ...
                  0     0      a''_n3 ...  a''_nn ]
and
    L_2 = [ 1  0             ...  0
            0  1             ...  0
            0  −a'_32/a'_22  ...  0
            ...
            0  −a'_n2/a'_22  ...  1 ]
...
We continue until L_{n−1} L_{n−2} ... L_1 A ~x = L_{n−1} L_{n−2} ... L_1 ~b. Let L_{n−1} L_{n−2} ... L_1 A = U, where U is an upper triangular matrix. This system becomes very easy to solve.
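A minimal MATLAB sketch of the elimination just described, followed by back substitution; it uses no pivoting, so it assumes every pivot it meets is nonzero, and gauss_elim_solve is a made-up name for illustration.

    % Solve A*x = b by Gaussian elimination without pivoting,
    % then back substitution. Assumes all pivots are nonzero.
    function x = gauss_elim_solve(A, b)
        n = length(b);
        for k = 1:n-1                       % stage k: zero out column k below A(k,k)
            for i = k+1:n
                m = A(i,k) / A(k,k);        % multiplier
                A(i,k:n) = A(i,k:n) - m * A(k,k:n);
                b(i)     = b(i)     - m * b(k);
            end
        end
        x = zeros(n,1);                     % back substitution on the triangular system
        for i = n:-1:1
            x(i) = (b(i) - A(i,i+1:n) * x(i+1:n)) / A(i,i);
        end
    end

For example, gauss_elim_solve([3 -5; 6 -7], [1; 5]) returns [2; 1], matching the 2×2 example above.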
3.1 LU Factorization

We have L_{n−1} L_{n−2} ... L_1 A = U, hence A = L_1^{−1} L_2^{−1} ... L_{n−1}^{−1} U.

Lemma 1: if L_i is a Gauss transform, then L_i^{−1} exists and is also a Gauss transform.
Lemma 2: if L_i and L_j are Gauss transforms and i < j, then L_i L_j = L_i + L_j − I.

Therefore A = L_1^{−1} L_2^{−1} ... L_{n−1}^{−1} U = LU.

Using A = LU to solve A~x = ~b: A~x = ~b ⇔ LU~x = ~b. Let ~d = U~x. Then solve L~d = ~b for the lower triangular L, followed by U~x = ~d for the upper triangular U.

We use the LU factorization because it is far cheaper when we have several linear systems with the same coefficient matrix A but different right-hand sides, i.e. A~x = ~b, A~x = ~c, A~x = ~d, ...: the same L and U can be reused for every solve.

Example of LU factorization. Let
    A = [ 2 1 1 ; −2 2 −2 ; 2 4 −1 ]
Then
    L_1 = [ 1 0 0 ; 1 1 0 ; −1 0 1 ],   L_1 A = [ 2 1 1 ; 0 3 −1 ; 0 3 −2 ]
    L_2 = [ 1 0 0 ; 0 1 0 ; 0 −1 1 ],   L_2 L_1 A = [ 2 1 1 ; 0 3 −1 ; 0 0 −1 ] = U
so A = L_1^{−1} L_2^{−1} U, with
    L_1^{−1} = [ 1 0 0 ; −1 1 0 ; 1 0 1 ],   L_2^{−1} = [ 1 0 0 ; 0 1 0 ; 0 1 1 ],
    L = L_1^{−1} L_2^{−1} = L_1^{−1} + L_2^{−1} − I = [ 1 0 0 ; −1 1 0 ; 1 1 1 ]

3.2 Gaussian Elimination with Pivoting

From the last lecture we have L_2 P_2 L_1 P_1 A = U. Since P_2 P_2 = I,
    L_2 P_2 L_1 P_1 A = L_2 P_2 L_1 P_2 P_2 P_1 A = L_2 (P_2 L_1 P_2) P_2 P_1 A.
Let L'_1 = P_2 L_1 P_2.

Claim: L'_1 is a Gauss transform. Example:
    P_2 = [ 1 0 0 ; 0 0 1 ; 0 1 0 ],   L_1 = [ 1 0 0 ; l_21 1 0 ; l_31 0 1 ],   P_2 L_1 P_2 = [ 1 0 0 ; l_31 1 0 ; l_21 0 1 ]
(the multipliers are swapped).

Then L_2 L'_1 P_2 P_1 A = U ⇔ P_2 P_1 A = L'^{−1}_1 L_2^{−1} U ⇔ PA = LU.

Now, to solve A~x = ~b given PA = LU: A~x = ~b ⇔ PA~x = P~b ⇔ LU~x = P~b; note that P~b is just ~b with two elements swapped. Let ~d = U~x. Then L~d = P~b is easy to solve because L is lower triangular, and U~x = ~d is easy to solve because U is upper triangular.

What happens if, at some k-th stage, the diagonal element and everything below it in column k are 0? In L_{k−1} ... L_2 L_1 A we cannot divide by the (k, k) element because it is 0, and every element below it is 0 as well. But remember that our goal at this stage is only to make every element in column k below row k zero, which is already the case, so we just continue.

This results in a matrix U with a 0 on the diagonal, i.e. a singular U, but the factorization still exists. While U can be singular, L cannot: being a product of Gauss transforms, it is invertible and thus non-singular.

In the last step we have U~x = ~d with U singular, so we could have no solution or infinitely many solutions. Example:
    U~x = ~d:   [ 2 5 4 ; 0 0 1 ; 0 0 2 ] [x_1; x_2; x_3] = [b_1; b_2; b_3]
gives x_3 = b_2 and 2x_3 = b_3. If b_3 = 2b_2, the two equations are consistent and x_2 is a free parameter (infinitely many solutions); otherwise there are no solutions.

Now suppose that during a numerical factorization in floating point, at the k-th stage all elements in column k from the diagonal down (a_kk and below) have magnitude ≤ eps · max_{j ∈ [1,k]} |u_jj|. We call this numerical singularity (near singularity).
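A minimal MATLAB sketch of solving A~x = ~b through an LU factorization with row partial pivoting. It leans on MATLAB's built-in lu (which returns L, U, P with P*A = L*U) and on backslash for the two triangular solves; the right-hand sides b and c are arbitrary illustrative vectors.

    A = [2 1 1; -2 2 -2; 2 4 -1];      % matrix from the worked example above
    b = [1; 2; 3];                     % an arbitrary right-hand side
    [L, U, P] = lu(A);                 % row partial pivoting: P*A = L*U
    d = L \ (P*b);                     % forward solve  L*d = P*b
    x = U \ d;                         % back solve     U*x = d

    c = [0; 1; 0];                     % a second right-hand side reuses the same factors
    x2 = U \ (L \ (P*c));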
3.3 Complexity of Gaussian Elimination

Let the matrix A be of order n×n. We count multiplication/addition pairs, i.e. operations of the form m·x + b, as one flop (floating point operation).

Factorization (computing the LU factorization):
- 1st stage: zeroing out the first column below the diagonal takes (n − 1)^2 flops, as we add multiples of row 1 to rows 2 through n.
- 2nd stage: zeroing out the second column takes (n − 2)^2 flops, as we add multiples of row 2 to rows 3 through n.
- ...
- (n − 1)-th stage: 1 flop.
In total, (n − 1)^2 + (n − 2)^2 + ... + 1 flops. Using Σ_{k=1}^n k^2 = n(n + 1)(2n + 1)/6,
    (n − 1)^2 + (n − 2)^2 + ... + 1 = (n − 1)n(2n − 1)/6 = n^3/3 + O(n^2).

Forward solve, L~d = ~b, where L is a unit lower triangular matrix:
    [ 1    0    ...  0      [ d_1       [ b_1
      l_21 1    ...  0        d_2         b_2
      ...            0        ...    =    ...
      l_n1 l_n2 ...  1 ]      d_n ]       b_n ]
so d_1 = b_1, d_2 = b_2 − l_21 d_1, d_3 = b_3 − l_31 d_1 − l_32 d_2, ...
    Total = 0 + 1 + 2 + ... + (n − 1) = n^2/2 + O(n) flops.
Back-solving is similar, resulting in another n^2/2 + O(n) flops.
Total forward/backward solve: n^2 + O(n) flops, so the n^3/3 factorization dominates.

3.4 Round-Off Error of Gaussian Elimination

Recall we have a factorization PA = LU computed in a floating point system. Because of machine round-off error, we actually get P(A + E) = L* U*, where L*, U* are the computed factors. Then solving A~x = ~b becomes solving (A + E)~x* = ~b, where ~x* is the computed solution. Equivalently, let E~x* = ~r; then (A + E)~x* = ~b becomes A~x* + ~r = ~b, or
    ~r = ~b − A~x*,
where ~r is the residual (we would like this to be ~0).

One can show that if row partial pivoting is used, ||E|| ≲ k·eps·||A||, where k is not too large and depends on n, and consequently ||~r|| ≲ k·eps·||~b||, i.e. ||~r||/||~b|| ≲ k·eps. However, this does not mean that ||~x − ~x*|| or ||~x − ~x*||/||~x|| is small.

Example:
    [ 0.780 0.563 ; 0.913 0.659 ] ~x = [ 0.217 ; 0.254 ],
whose true solution is ~x = [1; −1]. Consider two computed solutions,
    ~x*_1 = [0.999; −1.001],   ~x*_2 = [0.341; −0.087].
Their residuals are
    ~r_1 = ~b − A~x*_1 = [0.001343; 0.001572],   ~r_2 = ~b − A~x*_2 = [0.000001; 0].
We see that ||~r_2||/||~b|| is much smaller than ||~r_1||/||~b||, yet ~x*_2 is a terrible solution. But why is ||~x − ~x*_1||/||~x|| so much smaller than ||~x − ~x*_2||/||~x||? We need the relationship between relative error and relative residual.

We have A~x* = ~b − ~r and A~x = ~b; subtracting the first from the second yields
    A(~x − ~x*) = ~r  ⇔  ~x − ~x* = A^{−1}~r.
Taking norms of both sides:
    ||~x − ~x*|| = ||A^{−1}~r|| ≤ ||A^{−1}|| ||~r||        (1)
From the original ~b = A~x,
    ||~b|| = ||A~x|| ≤ ||A|| ||~x||                        (2)
Combining (1) and (2):
    ||~x − ~x*|| / (||A|| ||~x||) ≤ ||A^{−1}|| ||~r|| / ||~b||  ⇔  ||~x − ~x*||/||~x|| ≤ ||A|| ||A^{−1}|| · ||~r||/||~b||,
where ||A|| ||A^{−1}|| = cond(A).

We can also derive a lower bound. Again, A~x = ~b and A~x* = ~b − ~r; subtracting gives A(~x − ~x*) = ~r, so ||A(~x − ~x*)|| = ||~r||. Since ||A(~x − ~x*)|| ≤ ||A|| ||~x − ~x*||,
    ||A|| ||~x − ~x*|| ≥ ||~r||                            (1)
From A~x = ~b, ~x = A^{−1}~b, so
    ||~x|| ≤ ||A^{−1}|| ||~b||                             (2)
Combining (1) and (2) yields
    ||~x − ~x*||/||~x|| ≥ ||~r|| / (||~b|| ||A|| ||A^{−1}||)
Finally, combining the two bounds, we have
    (1/cond(A)) · ||~r||/||~b|| ≤ ||~x − ~x*||/||~x|| ≤ cond(A) · ||~r||/||~b||

If cond(A) is very large, the problem is poorly conditioned: a small relative residual does not imply a small relative error. If cond(A) is not too large, the problem is well conditioned and a small relative residual is a reliable indicator of a small relative error. Conditioning is a continuous spectrum; how large "very large" is depends on context.

Recall the previous example:
    A = [ 0.780 0.563 ; 0.913 0.659 ],   A^{−1} = 10^6 [ 0.659 −0.563 ; −0.913 0.780 ],
where 10^6 = 1/det(A). (Determinant formula: for A = [a b; c d], A^{−1} = (1/(ad − bc)) [d −b; −c a]; if det(A) = 0, then A^{−1} does not exist.)

||A||_∞ = 1.572 and ||A^{−1}||_∞ = 1.693·10^6, so cond_∞(A) = ||A||_∞ ||A^{−1}||_∞ = 2.66·10^6. Hence
    ||~x − ~x*||/||~x|| ≤ 2.66·10^6 · ||~r||/||~b||,
i.e. the relative error in ~x could be as big as 2.66·10^6 times the relative residual. Thus this A is a poorly conditioned matrix, and the relative residual is not a reliable indicator of the relative error.
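A short MATLAB check of the example above: it computes both residuals, both relative errors, and cond_∞(A), making the gap between "small residual" and "small error" concrete.

    A  = [0.780 0.563; 0.913 0.659];
    b  = [0.217; 0.254];
    x  = [1; -1];                          % true solution
    x1 = [0.999; -1.001];                  % accurate approximate solution
    x2 = [0.341; -0.087];                  % inaccurate approximate solution

    r1 = b - A*x1;   r2 = b - A*x2;        % residuals
    rel_res = [norm(r1,inf), norm(r2,inf)] / norm(b,inf)
    rel_err = [norm(x-x1,inf), norm(x-x2,inf)] / norm(x,inf)
    condA   = cond(A, inf)                 % about 2.66e6: poorly conditioned

rel_res shows that ~x*_2 has by far the smaller relative residual, while rel_err shows it is by far the worse solution; the factor cond(A) ≈ 2.66·10^6 is what allows the two measures to disagree so badly.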
3.5 Iterative Refinement / Iterative Improvement

Can we improve ~x*? One easy (but expensive) way is to increase the mantissa length; iterative refinement improves ~x* without doing that.

We want to solve A~x = ~b, having already solved (A + E)~x* = ~b. Subtracting the two equations yields A(~x − ~x*) = ~r. Let ~z = ~x − ~x*, and solve A~z = ~r; then ~x = ~x* + ~z. However, there is a catch: we cannot solve for ~z exactly either, so we get ~z*, not ~z. But if we got 2 leading digits correct in ~x*, then we will get about 2 leading digits correct in ~z*, and ~x* + ~z* will then have about 4 correct leading digits.

The algorithm:
    Compute ~x*(0) by solving A~x = ~b in the floating point system.
    For i = 0, 1, 2, ... until the solution is good enough:
        compute the residual ~r(i) = ~b − A~x*(i)
        solve A~z(i) = ~r(i) for ~z*(i), reusing the PA = LU factors
        update ~x*(i+1) = ~x*(i) + ~z*(i)

4 Non Linear Equations

Problem: given F : R → R, find x* ∈ R such that F(x*) = 0 (a root finding problem). Typically we cannot find a "closed form" (analytic) solution of a nonlinear F, which is why we need numerical techniques: iterative methods that generate approximations x_k, k = 0, 1, 2, ..., such that x_k → x* as k → ∞.

4.1 Fixed Point Methods

A root finding problem F(x*) = 0 is equivalent to a fixed point problem x* = g(x*), where x* is a fixed point of g.

Example: F(x) = x − e^{−x}; finding its root is equivalent to finding the fixed point of x = e^{−x}.

We can always use g(x) = x − F(x) to obtain a working g(x). Alternatively, we can use g(x) = x − h(x)F(x). We refer to x − F(x) as the first form and to x − h(x)F(x) as the second form. We use an algebraically equivalent fixed point problem to solve the root finding problem because there is an obvious iteration algorithm for fixed points.

First form: F(x*) = 0 ⇔ x* = g(x*). Second form: F(x*) = 0 ⇒ x* = g(x*), but we can have a fixed point x* = g(x*) such that F(x*) ≠ 0 (namely when h(x*) = 0 but F(x*) ≠ 0). The advantage of the second form is the flexibility in designing g(x) to make the iteration converge faster.

4.2 Fixed Point Iteration

Start with an approximate solution x_0, then iterate
    x_{k+1} = g(x_k),   k = 0, 1, 2, ...
until convergence (or failure).

Example: let F(x) = x^2 + 2x − 3, with exact roots x* = 1, −3. Consider the fixed point iteration (FPI)
    x_{k+1} = x_k + (x_k^2 + 2x_k − 3)/(x_k^2 − 5)
for the fixed point problem x = g(x) = x + (x^2 + 2x − 3)/(x^2 − 5). This is the second form of the fixed point problem, with h(x) = −1/(x^2 − 5); since this h(x) ≠ 0 for all x ∈ R, we don't need to check F at the limit after convergence.

Fixed point iteration from x_0 = −5 produces x_k that converge to −3. However, fixed point iteration from x_0 = +5 produces x_k that do not converge. We could keep trying starting points, but that can be a significant waste of time.

4.3 Fixed Point Theorem

If there is an interval [a, b] such that g(x) ∈ [a, b] for all x ∈ [a, b], and if |g'(x)| ≤ L < 1 for all x ∈ [a, b], then g(x) has a unique fixed point in [a, b].

Proof. Start with any x_0 ∈ [a, b] and iterate x_{k+1} = g(x_k), k = 0, 1, 2, ...; then all x_k ∈ [a, b]. Moreover,
    x_{k+1} − x_k = g(x_k) − g(x_{k−1}) = g'(ξ_k)(x_k − x_{k−1})
for some ξ_k ∈ [x_{k−1}, x_k] ⊆ [a, b], by the MVT (Mean Value Theorem). Thus
    |x_{k+1} − x_k| = |g'(ξ_k)| |x_k − x_{k−1}| ≤ L |x_k − x_{k−1}| ≤ ... ≤ L^k |x_1 − x_0|.
Since L < 1, the differences shrink geometrically, so the iterates converge to some x* ∈ [a, b]. To complete the proof, one must show:
1. x* = g(x*) (i.e. x* is a fixed point), and
2. x* is unique in [a, b].

Using the Fixed Point Theorem, example: F(x) = x − e^{−x} = 0 and g(x) = e^{−x} for x = g(x). What is the maximum interval of guaranteed convergence of the x_k?

4.4 Rate of Convergence

Definition: if
    lim_{x_k→x*} |x* − x_{k+1}| / |x* − x_k|^p = c ≠ 0,
we have p-th order convergence to the fixed point x*.

Example: table of absolute errors of the iterates, |x* − x_k|:

    k    p = 1, c = 1/2    p = 2, c = 1
    0    10^{-1}           10^{-1}
    1    5·10^{-2}         10^{-2}
    2    2.5·10^{-2}       10^{-4}
    3    1.25·10^{-2}      10^{-8}
    4    6.25·10^{-3}      10^{-16}

We see that despite the second column having c = 1, the p = 2 iteration converges much, much faster.
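A minimal MATLAB sketch of fixed point iteration (fixed_point is a made-up name, and tol and maxIter are arbitrary illustrative choices), applied to g(x) = exp(−x) from the example above.

    % Fixed point iteration x_{k+1} = g(x_k), stopping when successive
    % iterates agree to within tol or after maxIter steps.
    function [x, k] = fixed_point(g, x0, tol, maxIter)
        x = x0;
        for k = 1:maxIter
            xnew = g(x);
            if abs(xnew - x) < tol
                x = xnew;
                return
            end
            x = xnew;
        end
    end

For example, [xstar, k] = fixed_point(@(x) exp(-x), 0.5, 1e-10, 100) converges to x* ≈ 0.5671, the root of F(x) = x − e^{−x}; the convergence is linear, since |g'(x*)| = e^{−x*} = x* ≈ 0.567 < 1.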
4.5 Rate of Convergence Theorem

For the fixed point iteration x_{k+1} = g(x_k): if g'(x*) = g''(x*) = ... = g^{(p−1)}(x*) = 0 but g^{(p)}(x*) ≠ 0, then we have p-th order convergence.

Proof. Expand g about x*:
    x_{k+1} = g(x_k) = g(x* + (x_k − x*))
            = g(x*) + (x_k − x*) g'(x*) + (x_k − x*)^2/2! · g''(x*) + ... + (x_k − x*)^{p−1}/(p−1)! · g^{(p−1)}(x*) + (x_k − x*)^p/p! · g^{(p)}(ξ_k),
where ξ_k ∈ [x*, x_k]. If g'(x*) = g''(x*) = ... = g^{(p−1)}(x*) = 0, the Taylor expansion reduces to
    x_{k+1} = g(x*) + (x_k − x*)^p/p! · g^{(p)}(ξ_k),
i.e. (x_{k+1} − x*)/(x_k − x*)^p = g^{(p)}(ξ_k)/p!, using g(x*) = x*. As k → ∞, x_k → x*, and since ξ_k ∈ [x*, x_k], also ξ_k → x*. We can therefore write
    lim_{x_k→x*} |x_{k+1} − x*| / |x_k − x*|^p = |g^{(p)}(x*)| / p!.
Thus, when the p-th derivative of g at x* is nonzero (and the lower-order ones vanish), we reach p-th order convergence.

We can now use the second form of the fixed point iteration, g(x) = x − h(x)F(x), and pick h(x) so that the low-order derivatives of g vanish at x*, to accelerate the iteration.

4.6 Newton's Method

Recall Newton's method for F(x) = 0:
    x_{k+1} = x_k − F(x_k)/F'(x_k).
It takes the root finding problem F(x) = 0 to the fixed point problem x = g(x) with g(x) = x − F(x)/F'(x); this is the second form with h(x) = 1/F'(x).

Speed of Newton's method. Suppose F'(x*) ≠ 0. Then
    g'(x) = 1 − [F'(x)F'(x) − F(x)F''(x)] / (F'(x))^2 = F(x)F''(x)/(F'(x))^2,
so g'(x*) = 0 for any such F, because F(x*) = 0. By the Rate of Convergence Theorem, Newton's method has at least quadratic convergence.

Geometric interpretation of Newton's method. We want to solve F(x) = 0. At the current guess x_k, approximate (model) F(x) by a linear polynomial p_k(x) that satisfies the conditions p_k(x_k) = F(x_k) and p'_k(x_k) = F'(x_k), i.e.
    p_k(x) = F(x_k) + (x − x_k)F'(x_k).
Then x_{k+1} is the root of p_k(x): p_k(x_{k+1}) = 0 ⇒ F(x_k) + (x_{k+1} − x_k)F'(x_k) = 0 ⇒ x_{k+1} = x_k − F(x_k)/F'(x_k).

Will Newton's method always converge? No. For example, Newton's method on F(x) = x − e^{−x} gives
    x_{k+1} = x_k − (x_k − e^{−x_k})/(1 + e^{−x_k}) = (1 + x_k) e^{−x_k} / (1 + e^{−x_k}).

4.7 Secant Method

Recall NM: x_{k+1} = x_k − F(x_k)/F'(x_k); notice that we have to supply the first derivative of F. We can instead approximate F'(x_k) by (F(x_k) − F(x_{k−1}))/(x_k − x_{k−1}), which gives
    x_{k+1} = x_k − F(x_k)(x_k − x_{k−1}) / (F(x_k) − F(x_{k−1})).
Another way to derive this is from the geometric interpretation, replacing the tangent line by the secant through the last two iterates.

However, this is not a fixed point iteration of the form x_{k+1} = g(x_k): the new iterate depends on two previous values, which no single-argument g can express. We cannot directly use the FPT or the Rate of Convergence Theorem to analyze this method, but with some adjustment one can prove that its rate of convergence is p = (1 + √5)/2 ≈ 1.62: superlinear convergence, though perhaps not as fast as Newton (p = 2).

    Newton's method:  x_{k+1} = x_k − F(x_k)/F'(x_k)
    Secant method:    x_{k+1} = x_k − F(x_k)(x_k − x_{k−1})/(F(x_k) − F(x_{k−1}))

Newton's method requires 2 function evaluations per step, since F and F' are different functions; the secant method requires only 1, because although F(x_k) needs to be evaluated, F(x_{k−1}) was already computed during the previous iteration. The effective rate of convergence takes cost per iteration into account as well as speed, and in that sense the secant method is "effectively" faster than Newton's method.
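A minimal MATLAB sketch of Newton's method and the secant method side by side (newton1d and secant1d are made-up names; the stopping test on |F| and the iteration cap are illustrative choices), run on F(x) = x − exp(−x) from the example above.

    % Newton's method: needs F and its derivative dF.
    function x = newton1d(F, dF, x0, tol, maxIter)
        x = x0;
        for k = 1:maxIter
            x = x - F(x) / dF(x);
            if abs(F(x)) < tol, return; end
        end
    end

    % Secant method: needs two starting guesses, one new F evaluation per step.
    function x = secant1d(F, x0, x1, tol, maxIter)
        f0 = F(x0);  f1 = F(x1);
        for k = 1:maxIter
            x  = x1 - f1 * (x1 - x0) / (f1 - f0);
            x0 = x1;  f0 = f1;
            x1 = x;   f1 = F(x);
            if abs(f1) < tol, return; end
        end
    end

For example, with F = @(x) x - exp(-x) and dF = @(x) 1 + exp(-x), both newton1d(F, dF, 1, 1e-12, 50) and secant1d(F, 0, 1, 1e-12, 50) return x* ≈ 0.5671; Newton uses an F and a dF evaluation per step, the secant method only one F evaluation.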
4.8 Bisection Method

Often, if we start Newton's or the secant method with the wrong initial guess, they will not converge (wrong side of the hill). The bisection method, however, has guaranteed convergence: it always finds a root, but slowly.

We need a and b such that F(a) ≤ 0 ≤ F(b) or F(b) ≤ 0 ≤ F(a); this means there is at least one root of F in [a, b]. Assume F(a) ≤ 0 ≤ F(b).

    Loop until b − a is small enough:
        let m = (a + b)/2
        if F(m) ≤ 0 then set a = m, else set b = m
        (i.e. repeat on whichever of [m, b] or [a, m] still brackets a root)

This gives a linear rate of convergence, p = 1 with c = 1/2, and guaranteed convergence.

4.9 Hybrid Method

We can start off with bisection, then switch to a faster method later on. Example: combine bisection and Newton's method.

4.10 Systems of Non-Linear Equations

Problem: solve F(~x) = ~0, for example
    F(~x) = [ F_1(x_1, x_2) ; F_2(x_1, x_2) ] = [ 4x_1^2 + 9x_2^2 − 16x_1 − 54x_2 + 61 ; x_1 x_2 − 2x_1 − 1 ] = [ 0 ; 0 ].

Extending Newton's method to systems. Recall Newton's method for a single nonlinear equation, F : R → R: x_{k+1} = x_k − F(x_k)/F'(x_k). For F : R^n → R^n the analogue is
    ~x_{k+1} = ~x_k − [F'(~x_k)]^{−1} F(~x_k),
where F'(~x_k) ∈ R^{n×n} is the Jacobian matrix of F. Equivalently,
    [F'(~x_k)] (~x_{k+1} − ~x_k) = −F(~x_k),
which is simply a linear system of the form A~x = ~b at every iteration.

This is very expensive (a new Jacobian and a new factorization every step). We can use the pseudo Newton method, which holds the Jacobian matrix fixed for a few iterations, so the same PA = LU factorization can be reused; this is all right so long as we are still converging.
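A minimal MATLAB sketch of Newton's method for the 2×2 system above. The Jacobian entries are differentiated by hand from F_1 and F_2; the starting guess, tolerance, and iteration cap are arbitrary illustrative choices, and each step solves the linear system with backslash rather than forming the inverse.

    F = @(x) [4*x(1)^2 + 9*x(2)^2 - 16*x(1) - 54*x(2) + 61;
              x(1)*x(2) - 2*x(1) - 1];
    J = @(x) [8*x(1) - 16, 18*x(2) - 54;      % Jacobian of F, worked out by hand
              x(2) - 2,    x(1)];

    x = [1; 4];                               % arbitrary starting guess
    for k = 1:20
        s = J(x) \ (-F(x));                   % Newton step: J(x_k) * s = -F(x_k)
        x = x + s;
        if norm(s, inf) < 1e-12, break; end
    end
    x                                         % computed root of the system

From this starting guess the iterates converge to roughly (0.37, 4.68), one of the two points where the ellipse F_1 = 0 meets the hyperbola F_2 = 0.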
5 Approximation/Interpolation

Truncated Taylor series:
    p(x) = F(a) + F'(a)(x − a) + ... + F^{(n)}(a)/n! · (x − a)^n.
This is a polynomial because of the powers (x − a)^i, i = 0, 1, 2, .... The error is
    e(x) = p(x) − F(x) = −F^{(n+1)}(ξ)/(n + 1)! · (x − a)^{n+1}.

Other approximations:
1. Interpolation: find a polynomial p such that p(x_i) = F(x_i), i = 0, 1, 2, .... Here F is the function we are trying to approximate; it could simply be a set of data.
2. Least squares: find a polynomial p that minimizes
       ||F − p||_2 = ( ∫_a^b (F(x) − p(x))^2 dx )^{1/2}.
Other norms we can use for approximation:
       ||F − p||_∞ = max_{a ≤ x ≤ b} |F(x) − p(x)|
       ||F − p||_1 = ∫_a^b |F(x) − p(x)| dx

5.1 Polynomial Interpolation

Consider P_n, the set of all polynomials of degree ≤ n. This is a function space of dimension n + 1, so it requires a basis of n + 1 functions. The most common basis is the monomial basis {x^i}_{i=0}^n.

Weierstrass' Theorem (existence): if a function F is continuous on an interval [a, b], then for any ε > 0 there exists a polynomial p_ε such that ||F − p_ε||_∞ < ε. That is, any continuous function on a closed interval [a, b] can be approximated by some polynomial as closely as we like.

Numerical methods for polynomial interpolation:

1. Vandermonde: method of undetermined coefficients.
Theorem: for any sets {x_i : i = 0, 1, ..., n} and {y_i : i = 0, 1, ..., n} with distinct x_i (the y_i need not be distinct), there exists a unique p ∈ P_n such that p(x_i) = y_i, i = 0, 1, ..., n.
Proof: if such a p(x) exists, it can be written in the monomial basis, p(x) = Σ_{i=0}^n a_i x^i, with n + 1 unknowns {a_i}, i = 0, 1, ..., n. The conditions p(x_i) = y_i, i = 0, 1, ..., n, convert this into a matrix problem with the Vandermonde matrix:
    [ x_0^0 x_0^1 x_0^2 ... x_0^n     [ a_0       [ y_0
      x_1^0 x_1^1 x_1^2 ... x_1^n       a_1         y_1
      ...                               ...    =    ...
      x_n^0 x_n^1 x_n^2 ... x_n^n ]     a_n ]       y_n ]
Is the Vandermonde matrix non-singular? Returning to the polynomial problem, we are asking whether there is a nonzero p ∈ P_n with p(x_i) = 0, i = 0, 1, 2, ..., n. This is impossible, as no nonzero polynomial in P_n has n + 1 roots; the only interpolant of zero data is the zero polynomial (the x-axis, all coefficients 0), so the matrix is non-singular and the interpolant is unique. Alternatively,
    det(V) = Π_{i=1}^n Π_{j=0}^{i−1} (x_i − x_j),
which is nonzero whenever the nodes are distinct.
The Vandermonde approach, however, does not lead to the best algorithm: the matrix can be poorly conditioned, and the PV = LU factorization plus forward/backward solve costs n^3/3 + O(n^2) flops. It yields p(x) in the monomial basis.

2. Lagrange basis. Consider the simple interpolation problem p(x_i) = y_i, i = 0, 1, ..., n, and the basis {l_i(x)} where
    l_i(x) = Π_{j=0, j≠i}^n (x − x_j)/(x_i − x_j),   i = 0, 1, ..., n.
Note that l_i(x) ∈ P_n for all i, and l_i(x_j) = 1 if i = j and 0 if i ≠ j. With these basis functions we can write out the interpolating polynomial for free:
    p(x) = Σ_{i=0}^n l_i(x) y_i   ⇒   p(x) ∈ P_n and p(x_i) = y_i, i = 0, 1, ..., n.

3. Newton (divided difference) basis. For simple interpolation p(x_i) = y_i, i = 0, 1, ..., n, we look for an interpolating p(x) of the form
    p(x) = a_0 + a_1(x − x_0) + a_2(x − x_0)(x − x_1) + ... + a_n(x − x_0)(x − x_1)···(x − x_{n−1}).
Converting the conditions into a matrix problem gives
    [ 1   0            0                       ...  0
      1  (x_1 − x_0)   0                       ...  0
      1  (x_2 − x_0)  (x_2 − x_0)(x_2 − x_1)   ...  0
      ...
      1  (x_n − x_0)   ...                          Π_{i=0}^{n−1}(x_n − x_i) ]  [a_0; a_1; a_2; ...; a_n] = [y_0; y_1; y_2; ...; y_n]
This is a lower triangular system, which means no factorization is involved (a forward solve suffices).

5.2 Divided Differences

Definition: y[x_i] = y(x_i) = y_i, and
    y[x_{i+k}, ..., x_i] = ( y[x_{i+k}, ..., x_{i+1}] − y[x_{i+k−1}, ..., x_i] ) / (x_{i+k} − x_i).
Example: y[x_2, x_1, x_0] = ( y[x_2, x_1] − y[x_1, x_0] ) / (x_2 − x_0).

Newton's polynomial:
    p(x) = y[x_0] + (x − x_0) y[x_1, x_0] + ... + (x − x_0)(x − x_1)···(x − x_{n−1}) y[x_n, ..., x_0].
Then p(x) ∈ P_n and p(x_i) = y_i, i = 0, 1, ..., n.

Proof by induction (write p(x) ≡ p_n(x)). Basis, n = 0 and n = 1:
    p_0(x) = y[x_0], so p_0(x_0) = y[x_0] = y_0.
    p_1(x) = y[x_0] + (x − x_0) y[x_1, x_0], so p_1(x_0) = y[x_0] = y_0 and
    p_1(x_1) = y[x_0] + (x_1 − x_0)·(y_1 − y_0)/(x_1 − x_0) = y_0 + y_1 − y_0 = y_1.
Induction step: omitted here.

How are divided differences and derivatives related?
Example: y[x_1, x_0] = (y(x_1) − y(x_0))/(x_1 − x_0); what happens when x_1 → x_0?
    lim_{x_1→x_0} y[x_1, x_0] = lim_{x_1→x_0} (y(x_1) − y(x_0))/(x_1 − x_0) = y'(x_0),
provided y' exists.
Example: y[x_2, x_1, x_0] = (y[x_2, x_1] − y[x_1, x_0])/(x_2 − x_0); then
    lim_{x_2→x_0, x_1→x_0} y[x_2, x_1, x_0] = y''(x_0)/2!.
In general, one can show that
    lim_{x_k→x_0, x_{k−1}→x_0, ..., x_1→x_0} y[x_k, x_{k−1}, ..., x_0] = y^{(k)}(x_0)/k!.
This helps with osculatory interpolation (i.e. interpolation with derivatives): a repeated node carries derivative values at that node.

Example: find p ∈ P_4 such that
    p(0) = 0;   p(1) = 1, p'(1) = 1, p''(1) = 2;   p(2) = 6,
using a divided difference table on the nodes 0, 1, 1, 1, 2 (the repeated node 1 carries the derivative conditions).
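A minimal MATLAB sketch of building the divided difference coefficients and evaluating the Newton form by nested multiplication; newton_divdiff and newton_eval are made-up names, simple (non-repeated) nodes are assumed, and the data in the usage example are arbitrary.

    % Coefficients a(i) = y[x_1,...,x_i] of the Newton form, computed
    % in place from a divided difference table (1-based indexing).
    function a = newton_divdiff(x, y)
        n = length(x);
        a = y(:);
        for k = 2:n
            for i = n:-1:k
                a(i) = (a(i) - a(i-1)) / (x(i) - x(i-k+1));
            end
        end
    end

    % Evaluate p(t) = a(1) + a(2)(t - x(1)) + ... + a(n)(t - x(1))...(t - x(n-1)).
    function p = newton_eval(a, x, t)
        n = length(a);
        p = a(n) * ones(size(t));
        for i = n-1:-1:1
            p = a(i) + (t - x(i)) .* p;
        end
    end

For example, with x = [0 1 2 3] and y = x.^3 - 2*x + 1, a = newton_divdiff(x, y) gives the coefficients [1; -1; 3; 1], and newton_eval(a, x, 1.5) returns 1.375, the value of the cubic at 1.5.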
6 Numerical Integration

Problem: approximate I(F) = ∫_a^b F(x) dx.

Recall Riemann sums of a function F:
    R(F) = Σ_{i=0}^{n−1} F(ξ_i)(x_{i+1} − x_i),   ξ_i ∈ [x_i, x_{i+1}].

Composite rule: apply a simple formula on many short integrals, i.e. on pieces such as ∫_{x_1}^{x_2} F(x) dx, and add them up.

Basic formulas: these are based on interpolation; instead of integrating F we integrate an interpolating polynomial. We approximate F(x) by some p(x) on [a, b]; then
    I(F) ≈ Q(F) = I(p) = ∫_a^b p(x) dx = ∫_a^b [ Σ_{i=0}^n F(x_i) l_i(x) ] dx = Σ_{i=0}^n F(x_i) ∫_a^b l_i(x) dx.
We arrive at the general form of a quadrature rule:
    Q(F) = Σ_{i=0}^n A_i F(x_i),   A_i = ∫_a^b l_i(x) dx.

6.1 Theorem: Existence of Interpolatory Quadrature

Given any set of n + 1 distinct nodes, we can choose the weights A_i such that the quadrature rule is exact for all polynomials of degree ≤ n; moreover, the A_i are unique.

Proof: since Q(F) is linear, it is enough to make Q exact on the monomial basis. For each k = 0, 1, ..., n we require
    Q(x^k) = Σ_{i=0}^n A_i (x_i)^k = ∫_a^b x^k dx = (b^{k+1} − a^{k+1})/(k + 1).
This results in n + 1 equations with n + 1 unknowns:
    [ 1      1      1      ...  1        [ A_0       [ b − a
      x_0    x_1    x_2    ...  x_n        A_1         (b^2 − a^2)/2
      x_0^2  x_1^2  x_2^2  ...  x_n^2      A_2    =    (b^3 − a^3)/3
      ...                                  ...         ...
      x_0^n  x_1^n  x_2^n  ...  x_n^n ]    A_n ]       (b^{n+1} − a^{n+1})/(n + 1) ]
The coefficient matrix is simply the transposed Vandermonde matrix; by its invertibility (for distinct nodes), the weights exist and are unique.

6.2 Definition of Precision

If m is the largest natural number such that Q integrates all polynomials of degree ≤ m exactly, we say the precision of Q is m.

Midpoint rule:   I(F) ≈ Q(F) = M(F) = I(p_0) = (b − a) F((a + b)/2)
Trapezoid rule:  I(F) ≈ Q(F) = T(F) = I(p_1) = (b − a)/2 · [F(a) + F(b)]
Simpson's rule:  I(F) ≈ Q(F) = S(F) = I(p_2) = (b − a)/6 · [F(a) + 4F((a + b)/2) + F(b)]

6.3 Error in Interpolatory Quadrature

E_n = I(F) − Q_n(F), where Q_n integrates exactly all polynomials of degree ≤ n:
    E_n = ∫_a^b [F(x) − p_n(x)] dx = ∫_a^b F^{(n+1)}(ξ(x))/(n + 1)! · Π_{i=0}^n (x − x_i) dx.

Will interpolatory quadrature rules converge to I(F) as n → ∞?
Theorem: let F be continuous on [a, b] and Q_n(F) = Σ_{i=0}^n A_i^{(n)} F(x_i^{(n)}). Then
    lim_{n→∞} Q_n(F) = I(F)   ⇔   ∃ k ∈ R such that Σ_{i=0}^n |A_i^{(n)}| ≤ k for all n ∈ N.
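A minimal MATLAB sketch of composite trapezoid and Simpson rules built from the basic formulas above; comp_trap and comp_simpson are made-up names, and the test integrand in the usage note is an arbitrary smooth function.

    % Composite trapezoid rule on n equal subintervals of [a, b].
    function Q = comp_trap(F, a, b, n)
        x = linspace(a, b, n+1);
        y = F(x);
        h = (b - a) / n;
        Q = h * (sum(y) - (y(1) + y(end))/2);
    end

    % Composite Simpson rule: the basic Simpson formula applied on each
    % of n equal subintervals, using the midpoint of each.
    function Q = comp_simpson(F, a, b, n)
        x = linspace(a, b, n+1);
        m = (x(1:end-1) + x(2:end)) / 2;      % subinterval midpoints
        h = (b - a) / n;
        Q = (h/6) * sum(F(x(1:end-1)) + 4*F(m) + F(x(2:end)));
    end

For example, comp_trap(@(x) exp(x), 0, 1, 100) and comp_simpson(@(x) exp(x), 0, 1, 100) both approximate e − 1 ≈ 1.71828, with Simpson's rule correct to many more digits for the same n.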