Statistics - STA347H1S

Lecture Notes of STA347H1S

Ziteng Cheng

July 31, 2022

Contents

1 Axioms and Basic Properties of Probabilities
2 Random Variables
3 Distribution as Induced Measure
4 Expectation as Lebesgue Integral
5 Lebesgue Measure and Density Function
6 Independence and Product Measures
7 Change of Variables
8 Selections of Inequalities
9 Convergence of Random Variables
10 Limit Theorems
11 Relations between Convergences
12 Laws of Large Numbers
13 Conditional Expectation
14 Weak Convergence of Probability
A Preliminaries

1 Axioms and Basic Properties of Probabilities

Let X and Ω be non-empty abstract spaces with no special structure. We will use X and Ω interchangeably; in particular, we use Ω to emphasize its role as the sample space. A subset A ⊆ Ω is sometimes called an event. The elements of Ω are denoted by ω.

2^X is called the power set of X; it is the set of all subsets of X. A ⊆ X and A ∈ 2^X share the same meaning.

Definition 1.1. 𝒜 ⊆ 2^X is a σ-algebra if
(i) ∅ ∈ 𝒜, X ∈ 𝒜;
(ii) [closed under complement] A ∈ 𝒜 ⇒ A^c ∈ 𝒜;
(iii) [closed under countable union] (A_n)_{n∈N} ⊆ 𝒜 ⇒ ⋃_{n∈N} A_n ∈ 𝒜.
For 𝒞 ⊆ 2^X, we write σ(𝒞) for the smallest σ-algebra containing 𝒞.

Remark 1.2. In view of Theorem A.2 (h), a σ-algebra is also closed under countable intersection, that is, (A_n)_{n∈N} ⊆ 𝒜 ⇒ ⋂_{n∈N} A_n ∈ 𝒜. An intersection of σ-algebras is still a σ-algebra, but a union of σ-algebras need not be.

Example 1.3. Here are some examples of σ-algebras:
1. {∅, X} is the trivial σ-algebra.
2. 2^X is a σ-algebra.
3. For A ⊆ X, σ(A) = {∅, A, A^c, X}.
4. Let X = {a, b, c, d}. 𝒜 = {∅, {a, b, c, d}, {a}, {b}, {c, d}, {a, b}, {a, c, d}, {b, c, d}} is a σ-algebra. Moreover, σ({{a}, {b}, {c, d}}) = 𝒜.

Remark 1.4. In general, a σ-algebra is not quite convenient to describe via the definition alone. An alternative description, which turns out to be more convenient in many cases, is the monotone class theorem (cf. [A&B, Section 4.4]).

Definition 1.5. We define the Borel σ-algebra on R^n as B(R^n) := σ({B ⊆ R^n : B is open}). For the rest of the course, we always pair R^n with its Borel σ-algebra B(R^n). The notion of Borel σ-algebra extends to, say, a metric space: if X is a metric space, then B(X) := σ({B ⊆ X : B is open}).

Lemma 1.6. Let X be a metric space with metric d. For any x ∈ X, we have {x} ∈ B(X).

Proof. For r > 0, we let B_r(x) := {x′ ∈ X : d(x, x′) < r}, i.e., B_r(x) is the open ball centered at x with radius r. Since {x} = ⋂_{n∈N} B_{1/n}(x) and B(X) is closed under countable intersection (cf. Remark 1.2), we conclude {x} ∈ B(X).

Remark 1.7. It can be shown by combining [A&B, Section 4.9, Theorem 4.44] and the monotone class theorem (cf. Remark 1.4) that
B(R^n) = σ({A_1 × ··· × A_n : A_k ∈ B(R), k = 1, ..., n})
= σ({[a_1, b_1] × ··· × [a_n, b_n] : a_k ≤ b_k, a_k, b_k ∈ R, k = 1, ..., n})
= σ({[a_1, b_1) × ··· × [a_n, b_n) : a_k ≤ b_k, a_k, b_k ∈ R, k = 1, ..., n})
= σ({(a_1, b_1] × ··· × (a_n, b_n] : a_k ≤ b_k, a_k, b_k ∈ R, k = 1, ..., n})
= σ({(a_1, b_1) × ··· × (a_n, b_n) : a_k ≤ b_k, a_k, b_k ∈ R, k = 1, ..., n}).

Definition 1.8. μ : 𝒜 → [0, ∞] is a measure if
(i) μ(∅) = 0;
(ii) [countable additivity]¹ for any (A_n)_{n∈N} ⊆ 𝒜 with A_i ∩ A_j = ∅ for i ≠ j, we have μ(⋃_{n∈N} A_n) = ∑_{n∈N} μ(A_n).
We say μ is a probability if μ(X) = 1. We usually use P to denote a probability. If μ(X) = ∞, we call μ an infinite measure.

¹ This is also called σ-additivity.

Remark 1.9. 1. Regarding the construction of a measure, we refer to the procedure called Carathéodory extension (cf. [D, Theorem 1.1.9], [B, Section 4.5] and [A&B, Section 10.23]).
2. Let μ and μ′ be measures on σ(𝒞) with μ(A) = μ′(A) for A ∈ 𝒞; then μ(A) = μ′(A) for all A ∈ σ(𝒞). This can be proved by showing {A ∈ σ(𝒞) : μ(A) = μ′(A)} = σ(𝒞) using the monotone class theorem (cf. Remark 1.4).
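For a finite space X, the smallest σ-algebra σ(𝒞) containing a family 𝒞 can be computed by brute force, closing the family under complements and (finite, which here means countable) unions. The following minimal Python sketch is our own illustration, using the space and generators of Example 1.3 (4); the function name is hypothetical.

```python
def generate_sigma_algebra(space, generators):
    """Close a family of subsets of a finite space under complement and
    pairwise union, starting from the generators; on a finite space this
    closure is exactly the generated sigma-algebra."""
    family = {frozenset(), frozenset(space)} | {frozenset(g) for g in generators}
    while True:
        new = set(family)
        new |= {frozenset(space) - a for a in family}   # complements
        new |= {a | b for a in family for b in family}  # pairwise unions
        if new == family:
            return family
        family = new

space = {'a', 'b', 'c', 'd'}
sigma = generate_sigma_algebra(space, [{'a'}, {'b'}, {'c', 'd'}])
print(len(sigma))  # 8, matching the sigma-algebra in Example 1.3 (4)
print(sorted(sorted(s) for s in sigma))
```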
Example 1.10. 1. If X is finite or countable, we can easily construct a measure μ on 2^X by assigning a non-negative number α_x to each x ∈ X and defining μ(A) := ∑_{x∈A} α_x for A ∈ 2^X. If ∑_{x∈X} α_x = 1, then μ is a probability.
2. A Dirac measure at x, denoted by δ_x, is a measure that satisfies
δ_x(A) = 1 if x ∈ A, and δ_x(A) = 0 if x ∉ A.
(X, 𝒳, δ_x) is a measure space.
3. Let (α_n)_{n∈N} ⊆ [0, ∞] and (x_n)_{n∈N} ⊆ X. Then μ(A) := ∑_{n∈N} α_n δ_{x_n}(A) is a measure on (X, 𝒳). Such a μ is called discrete. If ∑_{n∈N} α_n = 1, then μ is a probability on (X, 𝒳).
4. Lebesgue measure on (R, B(R)). Lebesgue measure on ([0, 1], B([0, 1])) is a probability. See Section 5 for more discussion.

Definition 1.11. Let 𝒳 ⊆ 2^X be a σ-algebra and μ be a measure on 𝒳. We call (X, 𝒳) a measurable space and (X, 𝒳, μ) a measure space. If μ is a probability, (X, 𝒳, μ) is called a probability space; we usually write (Ω, 𝒜, P) for a probability space. On a measure space (X, 𝒳, μ), we say N is a null set if there is A ∈ 𝒳 such that μ(A) = 0 and N ⊆ A. Note that N may not belong to 𝒳. We say (X, 𝒳, μ) is a complete measure space if 𝒳 contains all null sets, i.e., for all A ∈ 𝒳 with μ(A) = 0, we have N ∈ 𝒳 as long as N ⊆ A. We say A ∈ 𝒜 is true μ-almost surely if A^c is a null set.

Definition 1.12. Let A ∈ 2^X. The indicator function 1_A : X → R is defined as
1_A(x) := 1 if x ∈ A, and 1_A(x) := 0 if x ∉ A.
When no confusion arises, we will omit x and simply write 1_A.

Lemma 1.13. The indicator function has the following properties:
(a) 1_{A∩B} = 1_A 1_B; in particular, if A ⊆ B, then 1_A = 1_A 1_B;
(b) if A ∩ B = ∅, then 1_{A∪B} = 1_A + 1_B; in particular, 1_A + 1_{A^c} = 1.

Definition 1.14. Let A, A_1, A_2, ... ∈ 2^X. We say (A_n)_{n∈N} increases to A if A_1 ⊆ A_2 ⊆ ... and ⋃_{n∈N} A_n = A. We say (A_n)_{n∈N} decreases to A if A_1 ⊇ A_2 ⊇ ... and ⋂_{n∈N} A_n = A. We say (A_n)_{n∈N} converges to A if lim_{n→∞} 1_{A_n}(x) = 1_A(x) for all x ∈ X. For abbreviation, we write A_n ↑ A, A_n ↓ A and lim_{n→∞} A_n = A, respectively. We also define
lim sup_{n→∞} A_n := ⋂_{n∈N} ⋃_{k≥n} A_k and lim inf_{n→∞} A_n := ⋃_{n∈N} ⋂_{k≥n} A_k.
Note that for any n, m ∈ N we have ⋂_{k≥n} A_k ⊆ ⋃_{k≥m} A_k, and thus lim inf_{n→∞} A_n ⊆ lim sup_{n→∞} A_n.

Remark 1.15. We have
lim sup_{n→∞} 1_{A_n}(x) = 1_{lim sup_{n→∞} A_n}(x) and lim inf_{n→∞} 1_{A_n}(x) = 1_{lim inf_{n→∞} A_n}(x), x ∈ X.
To see this, we first note that sup_{k≥n} 1_{A_k} = 1_{⋃_{k≥n} A_k}. Note additionally that 1_{⋃_{k≥n} A_k}(x) is non-increasing in n and bounded from below by 0, therefore lim_{n→∞} 1_{⋃_{k≥n} A_k}(x) is well-defined. Next, suppose x ∈ X satisfies lim_{n→∞} 1_{⋃_{k≥n} A_k}(x) = 1; then there must be N_x ∈ N such that 1_{⋃_{k≥n} A_k}(x) = 1 for n ≥ N_x, and thus x ∈ ⋃_{k≥n} A_k for n ≥ N_x. Since ⋃_{k≥n} A_k is decreasing in n, we have x ∈ ⋃_{k≥n} A_k for all n ∈ N, i.e., x ∈ lim sup_{n→∞} A_n. If x ∈ X satisfies lim_{n→∞} 1_{⋃_{k≥n} A_k}(x) = 0, there must be N_x ∈ N such that 1_{⋃_{k≥n} A_k}(x) = 0 for n ≥ N_x, and thus x ∉ lim sup_{n→∞} A_n. The statement for lim inf can be argued similarly.

The theorem below regards the basic properties of measures, and in particular of probabilities.

Theorem 1.16. Let (X, 𝒳, μ) be a measure space. Then, for any A, B, A_1, A_2, ... ∈ 𝒳,
(a) A ⊆ B ⇒ μ(A) ≤ μ(B);
(b) A ⊆ ⋃_{n∈N} A_n ⇒ μ(A) ≤ ∑_{n∈N} μ(A_n);
(c) A_n ↑ A ⇒ lim_{n→∞} μ(A_n) = μ(A);
(d) if μ = P is a probability, then P(A) + P(A^c) = 1;
(e) if μ = P is a probability, then A_n ↓ A ⇒ lim_{n→∞} P(A_n) = P(A).

Proof. (a) Note B = A ∪ (B ∩ A^c) and A ∩ (B ∩ A^c) = ∅. Then, by Definition 1.8 (ii), μ(B) = μ(A) + μ(B ∩ A^c) ≥ μ(A).
(b) Define B_1 := A_1 and B_n := A_n ∩ (⋃_{k=1}^{n−1} A_k)^c for n ≥ 2. Note that (B_n)_{n∈N} are mutually disjoint. Additionally, ⋃_{k=1}^n A_k = ⋃_{k=1}^n B_k, and thus ⋃_{k∈N} A_k = ⋃_{k∈N} B_k. This together with statement (a) and Definition 1.8 (ii) implies that μ(A) ≤ μ(⋃_{n∈N} A_n) = μ(⋃_{n∈N} B_n) = ∑_{n∈N} μ(B_n) ≤ ∑_{n∈N} μ(A_n).
(c) Let (B_n)_{n∈N} be defined as above, and note B_n = A_n ∩ A_{n−1}^c for n ≥ 2. It follows from Definition 1.8 (ii) that μ(A) = ∑_{n∈N} μ(B_n) = lim_{m→∞} ∑_{n=1}^m μ(B_n) = lim_{m→∞} μ(⋃_{n=1}^m B_n) = lim_{m→∞} μ(A_m).
(d)&(e) DIY.
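Items (a) and (c) of Theorem 1.16 are easy to sanity-check numerically for the discrete measures of Example 1.10 (1). The sketch below is our own illustration; the geometric weights α_x = 2^{−(x+1)} and the truncation at 60 points are arbitrary choices.

```python
import random

# A discrete measure as in Example 1.10 (1): weights alpha_x on a countable
# space, with mu(A) = sum of alpha_x over x in A.
alpha = {x: 2.0 ** -(x + 1) for x in range(60)}   # ~ geometric, total mass ~ 1

def mu(A):
    return sum(alpha.get(x, 0.0) for x in A)

random.seed(0)
for _ in range(100):
    A = {x for x in range(60) if random.random() < 0.3}
    B = A | {x for x in range(60) if random.random() < 0.3}
    assert mu(A) <= mu(B) + 1e-12          # (a) monotonicity, since A is a subset of B

# (c) continuity from below: A_n = {0, ..., n} increases to the whole space.
print([round(mu(range(n + 1)), 6) for n in (1, 5, 20, 59)])   # approaches 1
```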
Example 1.17. This is a non-example for Theorem 1.16 (e) when μ is an infinite measure. On the measurable space (N, 2^N), we let μ be the counting measure, that is, μ(A) is the number of elements in A. It can be verified that μ indeed satisfies Definition 1.8. Let A_n := {n, n+1, ...}. Then A_n ⊇ A_{n+1} and μ(A_n) = ∞. On the other hand, note that ⋂_{n∈N} A_n = ∅ and thus μ(⋂_{n∈N} A_n) = 0.

The next theorem regards the continuity of probability.

Theorem 1.18 (Continuity of Probability). Let 𝒜 ⊆ 2^Ω be a σ-algebra. Suppose (A_n)_{n∈N} ⊆ 𝒜 and lim_{n→∞} A_n = A. Then A ∈ 𝒜 and lim_{n→∞} P(A_n) = P(A).

Proof. In view of Definition 1.1, we have lim sup_{n→∞} A_n ∈ 𝒜 and lim inf_{n→∞} A_n ∈ 𝒜. Next, note that by hypothesis, ω ∈ A if and only if there is N_ω ∈ N such that ω ∈ A_n for all n ≥ N_ω (why?). Therefore, A ⊆ lim sup_{n→∞} A_n and A ⊆ lim inf_{n→∞} A_n. On the other hand, if ω ∈ lim sup_{n→∞} A_n, then for any n ∈ N there exists k ≥ n such that ω ∈ A_k. Note additionally that lim inf_{n→∞} A_n ⊆ lim sup_{n→∞} A_n. It follows from the hypothesis that ω ∈ A (why?), and thus
A = lim sup_{n→∞} A_n = lim inf_{n→∞} A_n,   (1.1)
which proves A ∈ 𝒜. In order to finish the proof, we let B_n := ⋂_{k≥n} A_k and C_n := ⋃_{k≥n} A_k. Note that B_n ↑ A and C_n ↓ A due to (1.1). By Theorem 1.16 (c)(e), we get P(A) = lim_{n→∞} P(C_n) = lim_{n→∞} P(B_n). Moreover, we have B_n ⊆ A_n ⊆ C_n, so it follows from Theorem 1.16 (a) that P(B_n) ≤ P(A_n) ≤ P(C_n). Finally, we conclude lim_{n→∞} P(A_n) = P(A).

2 Random Variables

Definition 2.1. (i) Let (X, 𝒳) and (Y, 𝒴) be two measurable spaces. We say a function f : X → Y is 𝒳-𝒴 measurable if {x ∈ X : f(x) ∈ B} ∈ 𝒳 for any B ∈ 𝒴, and we write f : (X, 𝒳) → (Y, 𝒴) for abbreviation. Sometimes it is convenient to write f⁻¹(B) := {x ∈ X : f(x) ∈ B}. We also define σ(f) := f⁻¹(𝒴), where we note f⁻¹(𝒴) is a σ-algebra (why?).
(ii) If we set X = Ω and consider Y : (Ω, 𝒜) → (Y, 𝒴), to emphasize that Y maps from the sample space, we call Y an 𝒜-𝒴 random variable.
(iii) For Y : (Ω, 𝒜) → (R^n, B(R^n)), we may call Y an R^n-valued 𝒜-random variable. If n = 1, we also call Y a real-valued 𝒜-random variable. When no confusion arises, we simply call Y a real-valued random variable.
(iv) Let f and g be 𝒳-B(R) measurable. We say f = g, μ-almost surely, if μ({x ∈ X : f(x) = g(x)}^c) = 0. We write μ-a.s. for abbreviation. For the rest of this course, unless specified otherwise, the '=' relationship between functions is understood in the almost sure sense. The same is true for '<', '>', '≤' and '≥'. When no confusion arises we will omit μ.
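For a finite space, σ(f) := f⁻¹(2^Y) from Definition 2.1 (i) can be computed exhaustively, since 2^Y is the largest σ-algebra on the codomain. The following sketch is our own toy example (a die roll and its parity); the helper names are hypothetical.

```python
from itertools import chain, combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def sigma_of(f, domain, codomain):
    """sigma(f) = f^{-1}(2^codomain): the preimages of all subsets of the
    codomain form a sigma-algebra on the domain (Definition 2.1 (i))."""
    return {frozenset(x for x in domain if f(x) in B) for B in powerset(codomain)}

X = {1, 2, 3, 4, 5, 6}          # outcomes of a die roll
f = lambda x: x % 2             # parity of the outcome
print(sorted(sorted(A) for A in sigma_of(f, X, {0, 1})))
# the four sets: empty set, X, the odds {1, 3, 5} and the evens {2, 4, 6}
```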
Remark 2.2. 1. All functions f : X → Y are 2^X-𝒴 measurable, regardless of 𝒴. It is tempting to always use 2^X when possible, but it turns out that 2^X has some pathology when X is uncountable, say, X = R. We defer to Remark 5.2 for more discussion.
2. f : X → Y is 𝒳-σ(𝒞) measurable if and only if {x ∈ X : f(x) ∈ C} ∈ 𝒳 for any C ∈ 𝒞. The 'if' direction can be proved by showing {B ⊆ Y : f⁻¹(B) ∈ 𝒳} ⊇ σ(𝒞) using the monotone class theorem (cf. Remark 1.4). The 'only if' direction is clear from the definition.
3. Composition preserves measurability. More precisely, consider f : (X, 𝒳) → (Y, 𝒴) and g : (Y, 𝒴) → (Z, 𝒵); then the composition of g and f, defined as g∘f(x) := g(f(x)), is 𝒳-𝒵 measurable.
4. Suppose X and Y are metric spaces. Then any continuous f : X → Y is B(X)-B(Y) measurable. This is a consequence of the fact that f : X → Y is continuous if and only if f⁻¹(U) is open for any open U ⊆ Y.
5. Consider X_k : (Ω, 𝒜) → (X_k, 𝒳_k) for k = 1, ..., n. Then (X_1, ..., X_n), as a mapping from Ω to X_1 × ··· × X_n, is 𝒜-(𝒳_1 ⊗ ··· ⊗ 𝒳_n) measurable, where 𝒳 ⊗ 𝒴 := σ({A × B : A ∈ 𝒳, B ∈ 𝒴}).
6. Suppose f and g are real-valued 𝒳-measurable functions. Then so are cf (for c ∈ R), f + g, fg, f/g (if g ≠ 0), max{f, g} and min{f, g}.
7. Let (f_n)_{n∈N} be a sequence of real-valued 𝒳-measurable functions. Using point 2, we can show that lim inf_{n→∞} f_n(x) := lim_{n→∞} inf_{k≥n} f_k(x) and lim sup_{n→∞} f_n(x) := lim_{n→∞} sup_{k≥n} f_k(x) are 𝒳-B(R̄) measurable. Moreover, if (f_n(x))_{n∈N} converges as n → ∞ for each x ∈ X, then f(x) := lim_{n→∞} f_n(x) is 𝒳-B(R) measurable.
8. For any f : (X, 𝒳) → (Y, 𝒴), σ(f) is the smallest σ-algebra on X such that f is measurable, and we have f : (X, σ(f)) → (Y, 𝒴). Moreover, if 𝒴 = σ(𝒞), then f⁻¹(σ(𝒞)) = σ(f⁻¹(𝒞)).
9. Let I be an uncountable index set and consider f_i : (X, 𝒳) → (R, B(R)); then sup_{i∈I} f_i(x) may fail to be measurable (cf. …….).

The next lemma can be proved using the element-chasing method.

Lemma 2.3. Let f : X → Y, B ∈ 2^Y and (B_i)_{i∈I} ⊆ 2^Y, where I is an index set (possibly uncountable). We have f⁻¹(B^c) = (f⁻¹(B))^c, f⁻¹(⋃_{i∈I} B_i) = ⋃_{i∈I} f⁻¹(B_i) and f⁻¹(⋂_{i∈I} B_i) = ⋂_{i∈I} f⁻¹(B_i).

Definition 2.4. A function f : X → R is called simple if it takes only finitely many values. In particular, it can be written as
f(x) = ∑_{k=1}^n r_k 1_{A_k}(x), x ∈ X,   (2.1)
for some n ∈ N, distinct r_1, ..., r_n ∈ R and A_k = f⁻¹({r_k}) for k = 1, ..., n.

Lemma 2.5. Let f be the simple function in (2.1). Then A_i ∩ A_j = ∅ for i ≠ j. Moreover, A_1, ..., A_n ∈ 𝒳 if and only if f is 𝒳-B(R) measurable.

Proof. The first statement follows from Lemma 2.3 and the convention that f⁻¹(∅) = ∅. Regarding the second statement, if f is measurable, then in view of Lemma 1.6, A_k = f⁻¹({r_k}) ∈ 𝒳 due to Definition 2.1. If A_1, ..., A_n ∈ 𝒳, then for any B ∈ B(R), we have
f⁻¹(B) = f⁻¹(⋃_{k=1,...,n; r_k∈B} {r_k}) = ⋃_{k=1,...,n; r_k∈B} f⁻¹({r_k}) ∈ 𝒳,
where we have used Lemma 2.3 in the last equality.

Theorem 2.6 (Simple Function Approximation). For any f : (X, 𝒳) → (R, B(R)), there is a sequence of simple functions (f_n)_{n∈N} such that f_n is 𝒳-B(R) measurable, |f_n(x)| ≤ |f(x)| and lim_{n→∞} f_n(x) = f(x) for any x ∈ X. In particular, we can construct f_n as
f_n(x) = −n 1_{f⁻¹((−∞, −n])}(x) + ∑_{k=−n2^n}^{−1} ((k+1)/2^n) 1_{f⁻¹((k/2^n, (k+1)/2^n])}(x) + ∑_{k=0}^{n2^n−1} (k/2^n) 1_{f⁻¹([k/2^n, (k+1)/2^n))}(x) + n 1_{f⁻¹([n, ∞))}(x), x ∈ X.
Moreover, if f is non-negative, then f_n(x) ≤ f_{n+1}(x) for all x ∈ X.

Theorem 2.7. Consider f : (X, 𝒳) → (Y, 𝒴) and g : (X, 𝒳) → (R, B(R)). Then σ(g) ⊆ σ(f) if and only if there is h : (Y, 𝒴) → (R, B(R)) such that g = h∘f.

Proof. Regarding the 'if' direction, we have σ(g) = g⁻¹(B(R)) = (h∘f)⁻¹(B(R)) = f⁻¹(h⁻¹(B(R))) (why?). Since h⁻¹(B(R)) ⊆ 𝒴 due to the measurability of h, we conclude σ(g) ⊆ σ(f).

Now we prove the 'only if' direction. We first assume g is simple and suppose g(x) = ∑_{k=1}^n r_k 1_{A_k}(x) for some r_1, ..., r_n ∈ R and A_1, ..., A_n ∈ 𝒳. Without loss of generality, we assume r_i ≠ r_j and A_i ∩ A_j = ∅ for i ≠ j, where we note A_k = g⁻¹({r_k}) ∈ σ(g). Since σ(g) ⊆ σ(f), we must have A_k ∈ σ(f) for k = 1, ..., n. It follows that there are B_1, ..., B_n ∈ 𝒴 such that f⁻¹(B_k) = A_k; replacing B_k by B_k ∩ (⋃_{j<k} B_j)^c if necessary (which does not change f⁻¹(B_k), since the A_k are disjoint), we may take B_i ∩ B_j = ∅ for i ≠ j. Then h(y) := ∑_{k=1}^n r_k 1_{B_k}(y) is the desired function. Now we consider a generic g. There is a sequence of simple functions (g_n)_{n∈N} that approximates g in the sense of Theorem 2.6, with σ(g_n) ⊆ σ(g) ⊆ σ(f). For each n ∈ N, there is a real-valued 𝒴-measurable h_n such that g_n = h_n∘f. Let L := {y ∈ Y : lim inf_{n→∞} h_n(y) = lim sup_{n→∞} h_n(y)}. Because lim_{n→∞} h_n(f(x)) = lim_{n→∞} g_n(x) = g(x) for x ∈ X, we have f(X) ⊆ L. Define h(y) := lim_{n→∞} h_n(y) 1_L(y); in view of Remark 2.2 (7), the proof is complete.
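The construction in Theorem 2.6 (truncate at ±n, then round towards 0 on the dyadic grid of mesh 2^{−n}) is straightforward to implement and test numerically. The sketch below is our own; the test function and evaluation grid are arbitrary choices.

```python
import math

def simple_approx(f, n):
    """The n-th simple-function approximation from Theorem 2.6:
    truncate f at +-n, then round towards 0 at dyadic resolution 2^-n."""
    def f_n(x):
        v = f(x)
        if v <= -n:
            return float(-n)
        if v >= n:
            return float(n)
        if v >= 0:                                # v in [k/2^n, (k+1)/2^n) -> k/2^n
            return math.floor(v * 2 ** n) / 2 ** n
        return math.ceil(v * 2 ** n) / 2 ** n     # v in (k/2^n, (k+1)/2^n] -> (k+1)/2^n
    return f_n

f = math.sin                                      # an arbitrary test function
for n in (1, 2, 5, 10, 20):
    fn = simple_approx(f, n)
    err = max(abs(fn(x / 100) - f(x / 100)) for x in range(-300, 301))
    print(n, err)                                 # sup-error on [-3, 3] shrinks like 2^-n
```

Rounding towards 0 (rather than always down) is what gives |f_n| ≤ |f| in the statement of the theorem.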
3 Distribution as Induced Measure

Definition 3.1. Let (X, 𝒳, μ) be a measure space and consider f : (X, 𝒳) → (Y, 𝒴). For the rest of the course, we will use the following abbreviations/notations:
μ({x ∈ X : f(x) ∈ B}) = μ(f ∈ B) = μ(f⁻¹(B)) = μ∘f⁻¹(B) =: μ^f(B), B ∈ 𝒴,
where we note that for B ∉ 𝒴 the left-hand side does not make sense. μ^f is also called the measure induced by f. On a probability space (Ω, 𝒜, P), for Y : (Ω, 𝒜) → (Y, 𝒴), P^Y is called the (probabilistic) distribution of Y.

Using Lemma 2.3, we obtain the result below.

Theorem 3.2. μ^f is a measure on (Y, 𝒴).

Definition 3.3. Let P be a probability on (R, B(R)). The cumulative distribution function (CDF) induced by P is defined as F(r) := P((−∞, r]). If P = P^Y for some real-valued random variable Y, we call F the CDF of Y.

Remark 3.4. Using Remark 1.7 and Remark 1.9 (2), we can show that if two probability measures induce the same distribution function, then the two measures must coincide.

Remark 3.5. One important reason to adopt this framework is that it justifies the existence of continuous-time random processes, via the result known as the Kolmogorov extension theorem (cf. [A&B, Section 15.6]).

Theorem 3.6. Let F be a CDF on R. Then,
(a) F is non-decreasing;
(b) F is right-continuous on R, that is, lim_{z→r+} F(z) = F(r) for r ∈ R;
(c) lim_{r→−∞} F(r) = 0 and lim_{r→∞} F(r) = 1;
(d) F has left limits on R, that is, for any r ∈ R and (r_n)_{n∈N} increasing to r, (F(r_n))_{n∈N} converges; additionally, F(r−) := lim_{z→r−} F(z) = P((−∞, r));
(e) F has at most countably many jumps.

Proof. DIY.

Remark 3.7. In view of Remark 1.7, using the Carathéodory extension theorem (cf. Remark 1.9 (1)), we can show that a function F satisfying conditions (a)(b)(c) above characterizes a probability measure P on (R, B(R)).

The result below is an immediate consequence of Theorem 3.6.

Corollary 3.8. Let P be a probability measure on (R, B(R)) and F be the corresponding CDF. Then, for any real numbers x < y,
(a) P((x, y]) = F(y) − F(x);
(b) P([x, y]) = F(y) − F(x−);
(c) P([x, y)) = F(y−) − F(x−);
(d) P((x, y)) = F(y−) − F(x);
(e) P({x}) = F(x) − F(x−).

4 Expectation as Lebesgue Integral

In what follows, we consider the extended real line R̄ := R ∪ {−∞, ∞} with the following conventions: 0 × ∞ = 0, 0 × (−∞) = 0, a ± ∞ = ±∞ for a ∈ R, and a × (±∞) = sgn(a) · (±∞) for a ∈ R with a ≠ 0.

Let (X, 𝒳, μ) be a measure space. We want to define an integral of a real-valued 𝒳-measurable function with respect to μ. We first define the integral for simple functions.

Definition 4.1. Suppose f is a simple function of the form f(x) = ∑_{k=1}^n r_k 1_{A_k}(x) with r_k ∈ R and A_k ∈ 𝒳 for k = 1, ..., n. We define
∫_X f(x) μ(dx) := ∑_{k=1}^n r_k μ(A_k).
Note, in particular, μ(A) = ∫_X 1_A(x) μ(dx) for A ∈ 𝒳.

The lemma below shows that ∫_X f(x) μ(dx) is defined unambiguously, i.e., the value does not depend on the chosen representation of f.

Lemma 4.2. Suppose f(x) = ∑_{k=1}^n r_k 1_{A_k}(x) = ∑_{k=1}^m s_k 1_{B_k}(x) for some r_1, ..., r_n, s_1, ..., s_m ∈ R and A_1, ..., A_n, B_1, ..., B_m ∈ 𝒳. Then ∑_{k=1}^n r_k μ(A_k) = ∑_{k=1}^m s_k μ(B_k).

Proof. DIY.
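Definition 4.1 and Lemma 4.2 are easy to see in action for a discrete measure: two different representations of the same simple function give the same integral. A minimal sketch, with weights of our own choosing (picked to be exact in binary floating point):

```python
# Integral of a simple function against a discrete measure (Definition 4.1),
# using the kind of measure built in Example 1.10 (1).
mu_weights = {0: 0.25, 1: 0.25, 2: 0.5}        # a probability on X = {0, 1, 2}

def mu(A):
    return sum(mu_weights[x] for x in A)

def integral_simple(terms):
    """terms: list of (r_k, A_k) pairs representing sum_k r_k * 1_{A_k}."""
    return sum(r * mu(A) for r, A in terms)

# Two representations of the same simple function f = 1_{0,1} + 2 * 1_{1,2}:
rep1 = [(1.0, {0, 1}), (2.0, {1, 2})]
rep2 = [(1.0, {0}), (3.0, {1}), (2.0, {2})]    # canonical disjoint-level form
print(integral_simple(rep1), integral_simple(rep2))   # equal, as in Lemma 4.2
```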
Definition 4.3. Suppose f ≥ 0 (here f may take values in [0, +∞]). We define
∫_X f(x) μ(dx) := sup { ∫_X g(x) μ(dx) : g is a simple real-valued 𝒳-measurable function and 0 ≤ g ≤ f }.

Definition 4.4. Let f be a real-valued 𝒳-measurable function. We write f⁺ := f 1_{f≥0} and f⁻ := −f 1_{f<0}. If ∫_X f⁺(x) μ(dx) < ∞ or ∫_X f⁻(x) μ(dx) < ∞, the Lebesgue integral (of f with respect to μ) is defined as
∫_X f(x) μ(dx) := ∫_X f⁺(x) μ(dx) − ∫_X f⁻(x) μ(dx).
We say f is integrable if both ∫_X f⁺(x) μ(dx) < ∞ and ∫_X f⁻(x) μ(dx) < ∞, or equivalently, ∫_X |f(x)| μ(dx) < ∞. We use L¹(X, 𝒳, μ) for the set of integrable functions. Furthermore, for p ∈ (0, ∞), we let L^p(X, 𝒳, μ) be the set of real-valued 𝒳-measurable functions f such that ∫_X |f(x)|^p μ(dx) < ∞, and L^∞(X, 𝒳, μ) the set of real-valued 𝒳-measurable functions f such that μ({x : |f(x)| > M}) = 0 for some M > 0.

Remark 4.5. If f = u + iv is a complex-valued function and ∫_X (|u(x)| + |v(x)|) μ(dx) is finite, we define
∫_X f(x) μ(dx) := ∫_X u(x) μ(dx) + i ∫_X v(x) μ(dx).

Definition 4.6. Let A ∈ 𝒳. We write
∫_A f(x) μ(dx) := ∫_X f(x) 1_A(x) μ(dx).

Definition 4.7. Following Definition 4.4, set (X, 𝒳, μ) = (Ω, 𝒜, P), a probability space. For a real-valued 𝒜-random variable Y, the expectation of Y is defined as the Lebesgue integral
E_P(Y) := ∫_Ω Y(ω) P(dω).
When no confusion arises, we simply write E(Y).

The proposition below follows immediately from the definitions above.

Proposition 4.8. Let (X, 𝒳, μ) be a measure space and let f and g be real-valued 𝒳-measurable functions. Then the following are true:
(a) if f and g are integrable and f ≤ g, then ∫_X f(x) μ(dx) ≤ ∫_X g(x) μ(dx);
(b) if f is integrable, then ∫_X c f(x) μ(dx) = c ∫_X f(x) μ(dx) for c ∈ R;
(c) if A ∈ 𝒳 satisfies μ(A) = 0 and f ≥ 0, then ∫_X f(x) 1_A(x) μ(dx) = 0.

The theorem below is one of the most important results concerning the Lebesgue integral.

Theorem 4.9 (Monotone Convergence). Let (f_n)_{n∈N} be a sequence of non-negative real-valued 𝒳-measurable functions such that f_{n′} ≥ f_n for n′ ≥ n and lim_{n→∞} f_n(x) = f(x) for x ∈ X. Then
lim_{n→∞} ∫_X f_n(x) μ(dx) = ∫_X f(x) μ(dx).

Proof. By Definition 4.3, (∫_X f_n(x) μ(dx))_{n∈N} is a non-decreasing sequence and ∫_X f_n(x) μ(dx) ≤ ∫_X f(x) μ(dx). Let L be the limit. We thus have ∫_X f(x) μ(dx) ≥ L. What is left to prove is ∫_X f(x) μ(dx) ≤ L. To this end, let g(x) = ∑_{k=1}^m r_k 1_{A_k}(x) be a simple 𝒳-measurable function such that 0 ≤ g ≤ f. Let c ∈ (0, 1) and B_n := {x ∈ X : f_n(x) ≥ c g(x)}. Note that (B_n)_{n∈N} increases to X. It follows that
L ≥ ∫_X f_n(x) μ(dx) ≥ ∫_{B_n} f_n(x) μ(dx) ≥ c ∫_{B_n} g(x) μ(dx) = c ∑_{k=1}^m r_k μ(A_k ∩ B_n).
Note that (A_k ∩ B_n)_{n∈N} increases to A_k. Applying Theorem 1.16 (c) to the right-hand side above, we have L ≥ c ∫_X g(x) μ(dx). Since c ∈ (0, 1) is arbitrary, we have L ≥ ∫_X g(x) μ(dx). In view of Definition 4.3, the proof is complete.

Thanks to the monotone convergence theorem, we are now in a position to establish the linearity of the Lebesgue integral.

Lemma 4.10. Let f and g be real-valued non-negative 𝒳-measurable functions. Then
∫_X (f(x) + g(x)) μ(dx) = ∫_X f(x) μ(dx) + ∫_X g(x) μ(dx).

Proof. First suppose f and g are simple, say f(x) = ∑_{k=1}^m r_k 1_{A_k}(x) and g(x) = ∑_{k=1}^n s_k 1_{B_k}(x). Then (f + g)(x) = ∑_{k=1}^{m+n} t_k 1_{C_k}(x), where t_k = r_k, C_k = A_k for k = 1, ..., m, and t_k = s_{k−m}, C_k = B_{k−m} for k = m+1, ..., m+n. It follows (using Lemma 4.2) that
∫_X (f(x) + g(x)) μ(dx) = ∑_{k=1}^{m+n} t_k μ(C_k) = ∑_{k=1}^m r_k μ(A_k) + ∑_{k=1}^n s_k μ(B_k) = ∫_X f(x) μ(dx) + ∫_X g(x) μ(dx).
Next, we suppose f and g are non-negative. In view of Theorem 2.6, we let (f_n)_{n∈N} and (g_n)_{n∈N} be sequences of simple functions increasing to f and g, respectively. Note that (f_n + g_n)_{n∈N} also increases to f + g. Invoking monotone convergence (Theorem 4.9) on all three integrals, the proof is complete.
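Monotone convergence (Theorem 4.9) can be watched numerically on ([0, 1], B([0, 1]), λ). In the sketch below, our own illustration, we take f(x) = x and f_n the dyadic approximation of Theorem 2.6; each f_n is simple, f_n increases to f, and the integrals increase to ∫₀¹ x dx = 1/2.

```python
# f_n equals k/2^n on the dyadic interval [k/2^n, (k+1)/2^n), which has
# Lebesgue measure 2^-n, so Definition 4.1 gives a finite sum.
def integral_fn(n):
    return sum((k / 2 ** n) * (1 / 2 ** n) for k in range(2 ** n))

for n in (1, 2, 4, 8, 16):
    print(n, integral_fn(n))   # 0.25, 0.375, ..., increasing to 0.5
```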
The results above imply that the Lebesgue integral is a linear functional on L¹(X, 𝒳, μ). We formulate this linearity in the theorem below.

Theorem 4.11. Suppose f, g ∈ L¹(X, 𝒳, μ). Then, for any a, b ∈ R we have
∫_X (a f(x) + b g(x)) μ(dx) = a ∫_X f(x) μ(dx) + b ∫_X g(x) μ(dx).

Proof. DIY.

The next theorem is a vital tool for calculating Lebesgue integrals. It shows in particular that the expectation of g(Y) depends only on the distribution of Y.

Theorem 4.12. Consider f : (X, 𝒳) → (Y, 𝒴) and g : (Y, 𝒴) → (R, B(R)).
(a) g∘f ∈ L¹(X, 𝒳, μ) if and only if g ∈ L¹(Y, 𝒴, μ^f);
(b) if either g ≥ 0, or the equivalent conditions in (a) are satisfied, then
∫_X g(f(x)) μ(dx) = ∫_Y g(y) μ^f(dy).

Proof. Recall Definitions 3.1 and 4.1. Then, for B ∈ 𝒴,
∫_X 1_{f⁻¹(B)}(x) μ(dx) = μ(f ∈ B) = μ^f(B) = ∫_Y 1_B(y) μ^f(dy).
By linearity, this proves (b) for g a simple function. Suppose g ≥ 0. In view of Theorem 2.6, we let (g_n)_{n∈N} be a sequence of simple 𝒴-measurable functions increasing to g. Note g_n∘f also increases to g∘f. By monotone convergence (Theorem 4.9),
∫_X g∘f(x) μ(dx) = lim_{n→∞} ∫_X g_n∘f(x) μ(dx) = lim_{n→∞} ∫_Y g_n(y) μ^f(dy) = ∫_Y g(y) μ^f(dy).
This proves (b) for g ≥ 0, and (a) follows immediately by substituting |g| for g. Finally, for g ∈ L¹(Y, 𝒴, μ^f), invoking the decomposition g = g⁺ − g⁻ finishes the proof.

Proposition 4.13. We have L^p(Ω, 𝒜, P) ⊇ L^q(Ω, 𝒜, P) for 1 ≤ p ≤ q ≤ ∞.

Proof. Let Y ∈ L^q. If q = ∞, there is M > 0 such that P(|Y| ≤ M) = 1 and thus P(|Y|^p ≤ M^p) = 1, which implies that |Y|^p is integrable. Now suppose q < ∞. Since 1_{|Y|≤1} + 1_{|Y|>1} = 1, by Lemma 4.10,
E(|Y|^p) = E(|Y|^p 1_{|Y|≤1}) + E(|Y|^p 1_{|Y|>1}) ≤ 1 + E(|Y|^q 1_{|Y|>1}) ≤ 1 + E(|Y|^q) < ∞,
which implies Y ∈ L^p and thus completes the proof.

Remark 4.14. The same is in general not true for an infinite measure (why?).

5 Lebesgue Measure and Density Function

Definition 5.1. The Lebesgue measure on (R^n, B(R^n)), denoted by λ_n, is the measure satisfying
λ_n([a_1, b_1] × ··· × [a_n, b_n]) = (b_1 − a_1) × ··· × (b_n − a_n), a_k ≤ b_k, k = 1, ..., n.
When no confusion arises, we write λ for λ_n.

Remark 5.2. 1. In view of Remarks 1.7 and 1.9 (2), we know Definition 5.1 defines a measure uniquely (if it exists). The existence is a consequence of the Carathéodory extension theorem (cf. Remark 1.9 (1)). In fact, we can define Lebesgue measure on a σ-algebra larger than B(R), and such a σ-algebra is called the Lebesgue σ-algebra.
2. One may wonder whether we can define Lebesgue measure on 2^{R^n}. This turns out to be impossible. A counterexample on R is available at [B, Section 4.4].

Example 5.3. In this example, we show that on (R^n, B(R^n)) with n ≥ 2, λ({a} × R × ··· × R) = 0. To this end, let A_{k,ℓ} := [a − 1/k, a + 1/k] × [−ℓ, ℓ] × ··· × [−ℓ, ℓ], and note λ(A_{k,ℓ}) = (2/k)(2ℓ)^{n−1}. Letting k → ∞ and invoking Theorem 1.16 (e), we have λ({a} × [−ℓ, ℓ] × ··· × [−ℓ, ℓ]) = 0. It then follows from Theorem 1.16 (c), letting ℓ → ∞, that λ({a} × R × ··· × R) = 0. A similar (but more tedious) argument shows that a hyperplane has zero Lebesgue measure.

The proposition below shows that the Lebesgue integral extends the Riemann integral.

Proposition 5.4. Suppose f : (R^n, B(R^n)) → (R, B(R)) is Riemann integrable on [a_1, b_1] × ··· × [a_n, b_n]. Then f is also Lebesgue integrable on the rectangle, and the Riemann integral coincides with the Lebesgue integral.

Proof. See Section 7 of 'Lebesgue Integration on Euclidean Space' by Frank Jones.
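Proposition 5.4 can be illustrated numerically by computing the same integral twice: once by the Riemann recipe (slice the domain vertically), and once in the Lebesgue spirit (slice the range and measure level sets, as in Definitions 4.1 and 4.3). The sketch below is our own; f(x) = x² on [0, 1] is chosen because its level sets are intervals whose Lebesgue measure we know in closed form.

```python
import math

f = lambda x: x * x
n = 12                                   # dyadic resolution 2^-n

# Riemann: sum of f(left endpoint) * mesh over a uniform partition of [0, 1].
riemann = sum(f(k / 2 ** n) * 2 ** -n for k in range(2 ** n))

# Lebesgue-style: sum over levels k/2^n of level * lambda({x : f(x) in [k, k+1)/2^n}).
# For f(x) = x^2 on [0, 1] that level set is [sqrt(k/2^n), sqrt((k+1)/2^n)).
lebesgue = sum((k / 2 ** n) * (math.sqrt((k + 1) / 2 ** n) - math.sqrt(k / 2 ** n))
               for k in range(2 ** n))

print(riemann, lebesgue)                 # both close to 1/3
```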
Remark 5.5. On the other hand, not every Lebesgue integral makes sense as a Riemann integral. An example will be provided in HW.

From now on, we will omit λ from λ(dx) when writing a Lebesgue integral with respect to Lebesgue measure. Note that for an integral on R^n with n > 1, the dummy variable x ∈ R^n is an n-dimensional vector, i.e., x = (x_1, ..., x_n). The following notations are equivalent:
∫_{R^n} f(x) dx = ∫_{R^n} f(x_1, ..., x_n) dx_1 ... dx_n.
The notation on the right-hand side deserves more discussion in later sections on product measures and independence.

Definition 5.6. Let μ be a measure on (R^n, B(R^n)). We say μ is absolutely continuous with respect to Lebesgue measure if there is a non-negative f : (R^n, B(R^n)) → (R, B(R)) such that
μ(A) = ∫_A f(x) dx, A ∈ B(R^n).
In this case, we call f the density function of μ. If μ = P is a probability, we call f the probability density function (PDF) of P. If μ = P^X, we call f the PDF of X.

Remark 5.7. 1. The notion of absolute continuity between measures is studied in a broader setup. Consider a measurable space (X, 𝒳) with measures μ and ν. We say μ is absolutely continuous with respect to ν if for any A ∈ 𝒳 with ν(A) = 0 we have μ(A) = 0. By the Radon-Nikodym theorem, if μ is absolutely continuous with respect to ν, there is a non-negative real-valued 𝒳-measurable f such that
μ(A) = ∫_A f(x) ν(dx), A ∈ 𝒳,
where f is unique up to a set of measure 0 under ν and is called the Radon-Nikodym derivative. We refer to [B, Section 13] for the detailed statement and proof.
2. A measure μ on (R^n, B(R^n)) need not be absolutely continuous w.r.t. λ, and a density function may not exist. In general, we have the decomposition
μ = μ_D + μ_C + μ_S,
where μ_D is a measure with atoms only, μ_C is a measure that is absolutely continuous w.r.t. λ, and μ_S is a measure with no atoms but not absolutely continuous w.r.t. λ. We refer to ……. for further discussion.

The next result is immediate from Definitions 1.8 and 5.6.

Proposition 5.8. Let f be a PDF of some R^n-valued random variable Y. Then f ≥ 0, λ-a.s., and
∫_{R^n} f(x) dx = 1.   (5.1)

Proof. We claim that for k ∈ N and A_k := {x ∈ R^n : f(x) < −1/k}, we must have λ(A_k) = 0. Indeed, suppose otherwise; then P^Y(A_k) = ∫_{A_k} f(x) dx ≤ −k⁻¹ λ(A_k) < 0, contradicting the hypothesis that P^Y is a probability. Let A := {x ∈ R^n : f(x) < 0}. Note (A_k)_{k∈N} increases to A. By Theorem 1.16 (c), we have λ(A) = 0. Regarding ∫_{R^n} f(x) dx = 1, it follows immediately from Definition 5.6 and the fact that P^Y(R^n) = 1.

Conversely, we can use an f satisfying (5.1) to define a probability measure on (R^n, B(R^n)).

Proposition 5.9. Suppose f : (R^n, B(R^n)) → (R, B(R)) satisfies (5.1). Then
μ(A) := ∫_A f(x) dx, A ∈ B(R^n),
is a probability on (R^n, B(R^n)).

Proof. DIY.

From now on, we call f a PDF as long as f satisfies (5.1). Below is a continuation of Theorem 4.12.

Theorem 5.10. Let g : (R, B(R)) → (R, B(R)) and suppose the real-valued random variable Y admits a PDF f. If g ≥ 0, or fg ∈ L¹(R, B(R), λ), then
E(g(Y)) = ∫_R g(r) f(r) dr.

Proof. DIY.

The proposition below provides an alternative expression for the expectation of a non-negative real-valued random variable.

Proposition 5.11. Let F be the CDF of a non-negative real-valued random variable Y. Then
E(Y) = ∫_{R_+} (1 − F(r)) dr,
where the right-hand side is understood as an integral with respect to Lebesgue measure (see Definition 5.1).

Proof. DIY.

Remark 5.12. In fact, a similar formula holds for a random variable that is not necessarily non-negative; it can be proved with the notion of the Lebesgue-Stieltjes integral. To derive the expression heuristically, we may assume Y is bounded and has a PDF; then we obtain
E(Y) = ∫_{R_+} (1 − F(r)) dr − ∫_{R_−} F(r) dr.
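Proposition 5.11 is easy to verify numerically for a distribution whose mean we know. A minimal sketch, assuming Y ~ Exponential with rate 2 (our own choice of example, so F(r) = 1 − e^{−2r} and E(Y) = 1/2):

```python
import math

F = lambda r: 1.0 - math.exp(-2.0 * r)   # CDF of Exponential(rate = 2)

# Integrate 1 - F(r) = exp(-2r) over [0, T] by a fine Riemann sum; the tail
# beyond T = 20 is negligible for this example.
T, m = 20.0, 200000
h = T / m
tail_integral = sum((1.0 - F(i * h)) * h for i in range(m))
print(tail_integral)                     # close to 0.5 = E(Y)
```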
6 Independence and Product Measures

Definition 6.1. Consider the probability space (Ω, 𝒜, P). Two events A, A′ ∈ 𝒜 are independent if P(A ∩ A′) = P(A)P(A′). A sequence of events (A_n)_{n∈N} ⊆ 𝒜 is pairwise independent if P(A_i ∩ A_j) = P(A_i)P(A_j) for i ≠ j, and is mutually independent if for any finite subset I ⊆ N we have
P(⋂_{n∈I} A_n) = ∏_{n∈I} P(A_n).
Two random variables Y : (Ω, 𝒜) → (Y, 𝒴) and Z : (Ω, 𝒜) → (Z, 𝒵) are independent if {ω ∈ Ω : Y(ω) ∈ B} and {ω ∈ Ω : Z(ω) ∈ C} are independent for any B ∈ 𝒴 and C ∈ 𝒵. Equivalently, we write
P(Y ∈ B, Z ∈ C) = P(Y ∈ B) P(Z ∈ C), B ∈ 𝒴, C ∈ 𝒵.
A sequence of random variables (Y_n)_{n∈N} is pairwise independent if Y_i and Y_j are independent for any i ≠ j, and is mutually independent if for any finite I ⊆ N,
P(⋂_{n∈I} {Y_n ∈ B_n}) = ∏_{n∈I} P(Y_n ∈ B_n), B_n ∈ 𝒴_n, n ∈ I.

Theorem 6.2. Let Y : (Ω, 𝒜) → (Y, 𝒴) and Z : (Ω, 𝒜) → (Z, 𝒵) be random variables. Then Y and Z are independent if and only if E(f(Y) g(Z)) = E(f(Y)) E(g(Z)) for any non-negative f : (Y, 𝒴) → (R, B(R)) and g : (Z, 𝒵) → (R, B(R)).

Proof. The 'if' direction is immediate when we take f and g to be indicators. Regarding the 'only if' direction, an application of simple function approximation and monotone convergence finishes the proof.

Remark 6.3. If we replace the f and g above by bounded measurable functions, the theorem is still true. But be careful when dealing with integrable functions in a similar setting, as the product of integrable functions need not be integrable.

Remark 6.4. Suppose Y and Z are metric spaces endowed with the corresponding Borel σ-algebras. For Y and Z to be independent, it is sufficient to have E(f(Y) g(Z)) = E(f(Y)) E(g(Z)) for any bounded continuous f and g. The proof of this statement involves a more delicate treatment of the related σ-algebras. We refer to ........

The following technical result will be useful later.

Lemma 6.5 (Borel-Cantelli). On (Ω, 𝒜, P), let (A_n)_{n∈N} ⊆ 𝒜. The following are true:
(a) if ∑_{n∈N} P(A_n) < ∞, then P(⋂_{n∈N} ⋃_{k≥n} A_k) = 0;
(b) if P(⋂_{n∈N} ⋃_{k≥n} A_k) = 0 and (A_n)_{n∈N} is mutually independent, then ∑_{n∈N} P(A_n) < ∞.

Proof. (a) By Theorem 1.16 (a)(b),
P(⋂_{n∈N} ⋃_{k≥n} A_k) ≤ P(⋃_{k≥m} A_k) ≤ ∑_{k≥m} P(A_k), m ∈ N.
By the hypothesis that ∑_{n∈N} P(A_n) < ∞, the right-hand side above tends to 0 as m → ∞. The proof is complete.
(b) Note that, by Theorem 1.16 (c)(e)(d),
P(⋂_{n∈N} ⋃_{k≥n} A_k) = lim_{n→∞} P(⋃_{k≥n} A_k) = lim_{n→∞} lim_{m→∞} P(⋃_{k=n}^m A_k) = lim_{n→∞} lim_{m→∞} (1 − P(⋂_{k=n}^m A_k^c)) = lim_{n→∞} lim_{m→∞} (1 − ∏_{k=n}^m P(A_k^c)) = 1 − lim_{n→∞} lim_{m→∞} ∏_{k=n}^m (1 − P(A_k)).
This together with the hypothesis that P(⋂_{n∈N} ⋃_{k≥n} A_k) = 0 implies
lim_{n→∞} lim_{m→∞} ∏_{k=n}^m (1 − P(A_k)) = 1.
By taking logarithms, we have
lim_{n→∞} lim_{m→∞} ∑_{k=n}^m log(1 − P(A_k)) = lim_{n→∞} ∑_{k≥n} log(1 − P(A_k)) = 0.
It follows that ∑_{k∈N} log(1 − P(A_k)) is a converging sum with non-positive summands. Because |log(1 − z)| ≥ z for z ∈ [0, 1), we conclude the proof.
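A simulation makes the dichotomy of Lemma 6.5 tangible. Below is our own toy experiment with mutually independent events A_n of probability p_n: for the summable choice p_n = 1/n², only finitely many A_n occur almost surely, while for p_n = 1/n infinitely many occur almost surely; the function name and parameters are hypothetical.

```python
import random

def last_occurrence(p, n_max, rng):
    """Simulate independent events A_n with P(A_n) = p(n) for n <= n_max
    and return the index of the last one that occurs."""
    last = 0
    for n in range(1, n_max + 1):
        if rng.random() < p(n):
            last = n
    return last

rng = random.Random(1)
for trial in range(3):
    print("p_n = 1/n^2:", last_occurrence(lambda n: 1 / n ** 2, 10 ** 5, rng),
          " p_n = 1/n:", last_occurrence(lambda n: 1 / n, 10 ** 5, rng))
# Typically the 1/n^2 column stays small across trials, while the 1/n column
# keeps producing occurrences all the way up to n_max.
```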
In view of Definitions 3.1 and 6.1, for independent Y, Z and B ∈ 𝒴, C ∈ 𝒵, we have
P^{(Y,Z)}(B × C) = P^Y(B) P^Z(C).
This motivates the following notion of product measures.

Definition 6.6. Consider two measurable spaces (X, 𝒳) and (S, 𝒮). Let μ and ν be measures on (X, 𝒳) and (S, 𝒮), respectively. The (Cartesian) product of sets A and B is defined as A × B := {(a, b) : a ∈ A, b ∈ B}; in particular, X × S = {(x, s) : x ∈ X, s ∈ S}. The product σ-algebra of 𝒳 and 𝒮 is defined as
𝒳 ⊗ 𝒮 := σ({A × B : A ∈ 𝒳, B ∈ 𝒮}).
The product measure of μ and ν, denoted by μ ⊗ ν, is defined as the measure on (X × S, 𝒳 ⊗ 𝒮) that satisfies
μ ⊗ ν(A × B) = μ(A) ν(B) for any A ∈ 𝒳 and B ∈ 𝒮.

Remark 6.7. 1. One way to establish the existence of the product measure is to use the Carathéodory extension theorem (cf. Remark 1.9). Alternatively, we can define μ ⊗ ν(C) as ∫_S μ(C_s) ν(ds) or ∫_X ν(C_x) μ(dx) for C ∈ 𝒳 ⊗ 𝒮, where C_s := {x ∈ X : (x, s) ∈ C} and C_x := {s ∈ S : (x, s) ∈ C}. Note that it is not trivial to justify this definition.
2. We can also show the uniqueness separately using the monotone class theorem (cf. Remark 1.4), in case some version of the Carathéodory extension theorem does not cover uniqueness.
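For discrete measures, the section-based formula of Remark 6.7 (1) reduces to a finite sum and can be checked directly against the defining rectangle property μ ⊗ ν(A × B) = μ(A) ν(B). A minimal sketch with toy weights of our own choosing:

```python
# Product measure of two discrete measures (Definition 6.6, Remark 6.7 (1)).
mu = {0: 0.25, 1: 0.75}                  # a probability on X = {0, 1}
nu = {0: 0.5, 1: 0.25, 2: 0.25}          # a probability on S = {0, 1, 2}

def product_measure(C):
    """mu (x) nu(C) = sum_s mu(C_s) * nu({s}), where C_s = {x : (x, s) in C}
    is the section of C at s."""
    return sum(sum(mu[x] for x in mu if (x, s) in C) * nu[s] for s in nu)

A, B = {1}, {0, 2}
rectangle = {(a, b) for a in A for b in B}
print(product_measure(rectangle), mu[1] * (nu[0] + nu[2]))   # both 0.5625
```

For a rectangle the two evaluations agree by construction; the point of Remark 6.7 (1) is that the section formula also makes sense for non-rectangular C ∈ 𝒳 ⊗ 𝒮.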