10 Construction of the Lebesgue Integral

The classical Riemann integral runs into difficulties with certain functions.

Example 10.1

The integrand of \(\int_0^1 x^{-1/2}\,dx\) is unbounded near \(0\), so the Riemann integral does not apply directly; one has to take the limit \(\lim_{\varepsilon\downarrow 0}\int_\varepsilon^1 x^{-1/2}\,dx\).
The Riemann integral also lacks some completeness. If the functions \(f_n\) are continuous on \([0,1]\) with \(|f_n(x)| \le 1\) for all \(n\) and \(x\), and \(f_n(x)\) converges for all \(x\) to some \(f(x)\), then the Riemann integrals \(\int_0^1 f_n(x)\,dx\) always converge, but the Riemann integral \(\int_0^1 f(x)\,dx\) may not be defined.

The Lebesgue integral will make \(\int_0^1 x^{-1/2}\,dx\) defined without any special, ad hoc limit process. The integral \(\int_0^1 f(x)\,dx\) of the limit function above will always be defined as a Lebesgue integral and will be the limit of the integrals of the \(f_n\). Moreover, the Lebesgue integrals of Riemann integrable functions equal their Riemann integrals.

The Lebesgue integral also applies to functions on spaces much more general than \(\mathbb R\), and with respect to general measures.

10.1 Simple Functions

Definition 10.1 A measurable space is a pair \((X, \mathcal S)\) where \(X\) is a set and \(\mathcal S\) is a \(\sigma\)-algebra of subsets of \(X\).

Definition 10.2 A simple function on \(X\) is any finite sum \[ f = \sum_i a_i 1_{B(i)}, \text{ where } a_i\in \mathbb R \text{ and } B(i) \in \mathcal S. \tag{10.1}\]

If \(\mu\) is a measure on \(\mathcal S\), we call \(f\) \(\mu\)-simple iff it is simple and can be written in the form (10.1) with \(\mu(B(i)) < \infty\) for all \(i\).

Example 10.2 If \(\mu(X) = \infty\), then \(0 = a_1 1_{B(1)} + a_2 1_{B(2)}\) for \(B(1) = B(2) = X\), \(a_1=1\) and \(a_2=-1\), a representation in which both sets have infinite measure. Nevertheless \(0\) is a \(\mu\)-simple function, since it can also be written as \(0 \cdot 1_{\varnothing}\). Thus the definition of \(\mu\)-simple only requires the existence of some \(a_i\) and \(B(i)\) of finite measure for which (10.1) holds, not that every representation has this property.

Any finite collection of sets \(B(1), ..., B(n)\) generates an algebra \(\mathcal A\). A non-empty set \(A\) is called an atom of an algebra \(\mathcal A\) iff \(A \in \mathcal A\) and for all \(C \in \mathcal A\), either \(A\subset C\) or \(A \cap C = \varnothing\).

The following proposition gives a method to break down a finite collection of sets into atoms.

Proposition 10.1 Let \(X\) be any set, \(B(1),...,B(n)\) any subsets of \(X\). Let \(\mathcal A\) be the smallest algebra of subsets of \(X\) containing \(B(i)\) for \(i = 1,...,n\).

Let \(\mathcal C\) be the collection of all intersections \(\bigcap_{1\le i\le n}C(i)\) where for each \(i\), either \(C(i) = B(i)\) or \(C(i) = X \setminus B(i)\). Then \(\mathcal C \setminus \{\varnothing\}\) is the set of all atoms of \(\mathcal A\), and every set in \(\mathcal A\) is the union of these atoms.

Proof

Any two distinct elements of \(\mathcal C\) are disjoint (for some \(i\), one is included in \(B(i)\), the other in \(X \setminus B(i)\)). The union of \(\mathcal C\) is \(X\). Thus the set of all unions of members of \(\mathcal C\) is an algebra \(\mathcal B\).

Each \(B(i)\) is the union of all the intersections in \(\mathcal C\) with \(C(i) = B(i)\). Thus \(B(i) \in \mathcal B\) and \(\mathcal A \subset \mathcal B\). Clearly \(\mathcal B \subset \mathcal A\), so \(\mathcal B = \mathcal A\).

Each non-empty set in \(\mathcal C\) is thus an atom of \(\mathcal A\). A union of two or more distinct atoms is not an atom, so \(\mathcal C \setminus \{\varnothing\}\) is the set of all atoms of \(\mathcal A\), and the rest follows.
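The construction in Proposition 10.1 can be tried out on a small finite set. The following is a minimal sketch, assuming \(X\) is finite and sets are represented as Python sets; the function name `atoms` and the sample sets are hypothetical choices made for illustration.

```python
from itertools import product

def atoms(X, sets):
    """Non-empty intersections C(1) & ... & C(n), where each C(i) is
    either B(i) or its complement in X (Proposition 10.1)."""
    X = frozenset(X)
    result = set()
    for choice in product([True, False], repeat=len(sets)):
        cell = X
        for keep, B in zip(choice, sets):
            B = frozenset(B)
            cell = cell & (B if keep else X - B)
        if cell:  # the empty intersection is not an atom
            result.add(cell)
    return result

# Illustrative data: X = {1,...,6}, B(1) = {1,2,3}, B(2) = {3,4}
B1, B2 = {1, 2, 3}, {3, 4}
A = atoms(range(1, 7), [B1, B2])
# The atoms are {1,2}, {3}, {4}, {5,6}; each B(i) is a union of atoms.
```

With \(n = 2\) generating sets there are at most \(2^2 = 4\) atoms, and in this example all four intersections happen to be non-empty.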

Now, any simple function \(f\) can be written as \(\sum_{1 \le j \le M}b_j 1_{A(j)}\) where the \(A(j)\) are disjoint atoms of the algebra \(\mathcal A\) generated by the \(B(i)\)’s, and by Proposition 10.1, we have \(M \le 2^n\). Thus in (10.1) we may assume that the \(B(i)\) are disjoint and non-empty. Then, if \(f(x) \ge 0\) for all \(x\), we will have \(a_i\ge 0\) for all \(i\).
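To illustrate the rewriting of a simple function over disjoint atoms, here is a small sketch under the assumption that \(X\) is finite. The helper name `disjoint_form` and the coefficients are hypothetical; the coefficient on each atom is the sum of those \(a_i\) whose \(B(i)\) contains the atom.

```python
from itertools import product

def disjoint_form(X, terms):
    """terms is a list of pairs (a_i, B_i) representing f = sum_i a_i 1_{B_i}.
    Returns a dict {atom: b_j} with f = sum_j b_j 1_{A(j)}, atoms disjoint."""
    X = frozenset(X)
    sets = [frozenset(B) for _, B in terms]
    out = {}
    for choice in product([True, False], repeat=len(sets)):
        cell = X
        for keep, B in zip(choice, sets):
            cell = cell & (B if keep else X - B)
        if cell:
            # b_j is the sum of the a_i whose B(i) contains this atom
            out[cell] = sum(a for (a, _), keep in zip(terms, choice) if keep)
    return out

# Illustrative data: f = 2 * 1_{{1,2,3}} + 5 * 1_{{3,4}} on X = {1,...,6}
terms = [(2.0, {1, 2, 3}), (5.0, {3, 4})]
form = disjoint_form(range(1, 7), terms)
# f equals 2 on {1,2}, 7 on {3}, 5 on {4} and 0 on {5,6}
```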

10.2 Integration of Simple Functions

If \((X, \mathcal S,\mu)\) is any measure space, \(f\) any simple function on \(X\) with \(a_i \ge 0\) for all \(i\), the integral of \(f\) with respect to \(\mu\) is defined by

\[ \int f\, d\mu := \sum_i a_i \mu(B(i)) \in [0, \infty] \tag{10.2}\]

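where \(0 \cdot \infty\) is taken to be \(0\). Since a simple function can be written in the form (10.1) in more than one way, we must first check that the value of the sum does not depend on the representation chosen.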

Proposition 10.2 For any nonnegative simple function \(f\), \(\int f\, d\mu\) is well-defined.

Proof

Suppose \(f = \sum_{i \in F} a_i 1_{E(i)} = \sum_{j \in G} b_j 1_{H(j)}\), where all \(a_i, b_j\) are nonnegative, \(F, G\) are finite, and \(E(i), H(j) \in \mathcal S\). Write \(E_i := E(i)\) and \(H_j := H(j)\).

By Proposition 10.1 and finite additivity of \(\mu\), replacing the \(H_j\) by the atoms of the algebra generated by the \(H_j\)’s and \(E_i\)’s changes neither \(f\) nor the sum \(\sum_j b_j \mu(H_j)\). So we may assume that the \(H_j\) are these disjoint atoms. In that case, \(b_j = \sum\{a_i : E_i \supset H_j\}\).

Thus \[ \begin{align*} \sum_j b_j \mu(H_j) &= \sum_j \mu(H_j) \sum \{a_i: E_i \supset H_j\} \\ &= \sum_i a_i \sum \{\mu(H_j) : H_j \subset E_i\} = \sum_i a_i \mu(E_i), \end{align*} \] so both representations give the same value.
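Proposition 10.2 can be checked numerically in a simple case. The sketch below assumes a measure given by point masses on a finite set, which is only one convenient special case; the names `mu`, `integral_simple`, `rep1` and `rep2` and all numbers are hypothetical.

```python
import math

def mu(B, weight):
    """Measure of B for a measure made of point masses weight[x]."""
    return sum(weight[x] for x in B)

def integral_simple(terms, weight):
    """Sum of a_i * mu(B_i) as in (10.2), for nonnegative coefficients."""
    total = 0.0
    for a, B in terms:
        if a == 0:  # convention: 0 * infinity is taken to be 0
            continue
        total += a * mu(B, weight)
    return total

weight = {1: 0.5, 2: 1.0, 3: 2.0, 4: 0.25, 5: 1.0, 6: 1.0}

# Two representations of the same function f
rep1 = [(2.0, {1, 2, 3}), (5.0, {3, 4})]           # overlapping sets
rep2 = [(2.0, {1, 2}), (7.0, {3}), (5.0, {4})]     # disjoint atoms

value1 = integral_simple(rep1, weight)
value2 = integral_simple(rep2, weight)
# Both sums equal 18.25
```

The explicit test for a zero coefficient mirrors the convention \(0 \cdot \infty = 0\), which matters only when some \(B(i)\) has infinite measure.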

If \(f, g\) are simple, then clearly \(f+g\), \(fg\), \(\max(f,g)\) and \(\min(f,g)\) are all simple.

Corollary 10.1 It follows directly from (10.2) and Proposition 10.2 that if \(f, g \ge 0\) are simple functions and \(c > 0\) is a constant, then

\(\int f+g\,d\mu = \int f\,d\mu + \int g\,d\mu\).
\(\int cf\,d\mu = c\int f\,d\mu\).
If \(0 \le f \le g\), then \(\int f\,d\mu \le \int g\,d\mu\).
For \(E \in \mathcal S\), let \(\int_E f\,d\mu := \int f 1_E\,d\mu\). Then, for example, \(\int_E 1_A\,d\mu = \mu(A\cap E)\).

10.3 Measurability

Definition 10.3 If \((X, \mathcal S)\) and \((Y, \mathcal B)\) are measurable spaces, \(f: X\to Y\), then \(f\) is called measurable iff \(f^{-1}(B) \in \mathcal S\) for all \(B \in \mathcal B\).

If \(Y = \mathbb R\) or \([-\infty, \infty]\), then the \(\sigma\)-algebra \(\mathcal B\) for measurability of functions into \(Y\) will be the \(\sigma\)-algebra of Borel sets generated by all (bounded or unbounded) intervals or open sets. If \(X = \mathbb R\), with \(\sigma\)-algebras \(\mathcal B\) of Borel sets and \(\mathcal L\) of Lebesgue measurable sets, \(f\) is called

Borel measurable iff it is measurable on \((\mathbb R, \mathcal B)\)
Lebesgue measurable iff it is measurable on \((\mathbb R, \mathcal L)\)

For example, if \(X = Y\) and \(f\) is the identity function, measurability means that \(\mathcal B \subset \mathcal S\).

Similarly, in general, for measurability, the \(\sigma\)-algebra \(\mathcal S\) on the domain space needs to be large enough, and/or the \(\sigma\)-algebra \(\mathcal B\) on the range space not too large.

Definition 10.4 Given any measure space \((X, \mathcal S, \mu)\) and any measurable function \(f: X \to [0, \infty]\), we define \[ \int f\,d\mu := \sup \bigg\{\int g\,d\mu: 0 \le g\le f, g\text{ simple} \bigg\} \]

For \(a_n \in [-\infty,\infty]\), \(a_n \uparrow\) means \(a_n \le a_{n+1}\) for all \(n\), and \(a_n \uparrow a\) means that \(a_n \uparrow\) and \(a_n \to a\). If \(a = \infty\), this means that for all \(M < \infty\), there is a \(K < \infty\) such that \(a_n > M\) for all \(n > K\).

The following fact gives a handy approach to the integral of a nonnegative measurable function as the limit of a sequence, rather than a more general supremum.

Proposition 10.3 For any measurable \(f \ge 0\), there exist simple \(f_n\) with \(0 \le f_n \uparrow f\), meaning that \(0 \le f_n(x) \uparrow f(x)\) for all \(x\). For any such sequence \(f_n\), \(\int f_n\,d\mu \uparrow \int f\,d\mu\).
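One common way to obtain such a sequence, not stated in the text above and used here as an assumption, is the dyadic truncation \(f_n := \min(n, \lfloor 2^n f\rfloor / 2^n)\). The sketch below applies it on a finite set with point masses, where every function is already simple, so it only illustrates the monotone convergence of the integrals. All names and numbers are hypothetical.

```python
import math

def dyadic_approx(fx, n):
    """Value at one point of f_n = min(n, floor(2^n f) / 2^n), for fx in [0, inf]."""
    if fx >= n:
        return float(n)
    return math.floor(fx * 2**n) / 2**n

# Illustrative measure (point masses) and nonnegative function
weight = {"a": 1.0, "b": 0.5, "c": 2.0}
f = {"a": 0.3, "b": math.pi, "c": 2.5}

def integral_of_approx(n):
    """Integral of f_n with respect to the point-mass measure."""
    return sum(dyadic_approx(f[x], n) * weight[x] for x in f)

integrals = [integral_of_approx(n) for n in range(1, 31)]
exact = sum(f[x] * weight[x] for x in f)
# integrals is nondecreasing and approaches exact from below
```

Each \(f_n\) takes finitely many values, each on a measurable set of the form \(\{k/2^n \le f < (k+1)/2^n\}\) or \(\{f \ge n\}\), which is why it is simple whenever \(f\) is measurable.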

Definition 10.5 A \(\sigma\)-ring is a collection \(\mathcal R\) of sets, with

\(\varnothing \in \mathcal R\)
\(A, B \in \mathcal R\) implies \(A \setminus B \in \mathcal R\)
\(\bigcup_{j \ge 1} A_j \in \mathcal R\) whenever \(A_j \in \mathcal R\) for \(j = 1,2,...\)

Any \(\sigma\)-algebra is a \(\sigma\)-ring. Conversely, a \(\sigma\)-ring \(\mathcal R\) of subsets of a set \(X\) is a \(\sigma\)-algebra iff \(X \in \mathcal R\).

Example 10.3 The collection of all countable subsets of \(\mathbb R\) is a \(\sigma\)-ring which is not a \(\sigma\)-algebra, since \(\mathbb R\) itself is uncountable.

A \(\sigma\)-ring \(\mathcal R\) is said to be generated by \(\mathcal C\) iff \(\mathcal R\) is the smallest \(\sigma\)-ring including \(\mathcal C\). The following theorem gives an easier criterion for measurability of functions.

Theorem 10.1 Let \((X, \mathcal S)\) and \((Y, \mathcal B)\) be measurable spaces. Let \(\mathcal B\) be generated by \(\mathcal C\). Then a function \(f: X \to Y\) is measurable iff \(f^{-1}(C) \in \mathcal S\) for all \(C \in \mathcal C\).

The same is true if \(X\) is a set, \(\mathcal S\) is a \(\sigma\)-ring of subsets of \(X\), \(Y = \mathbb R\) and \(\mathcal B\) is the \(\sigma\)-ring of Borel subsets of \(\mathbb R\) not containing \(0\).

Proof

“Only if” is clear.

To prove “if”, let \(\mathcal D := \{D \in \mathcal B: f^{-1}(D) \in \mathcal S\}\). We are assuming \(\mathcal C \subset \mathcal D\). If \(D_n \in \mathcal D\) for all \(n\), then \(f^{-1}(\bigcup_n D_n) = \bigcup_n f^{-1}(D_n)\), so \(\bigcup_n D_n \in \mathcal D\).

If \(D \in \mathcal D\) and \(E \in \mathcal D\), then \(f^{-1}(E\setminus D) = f^{-1}(E) \setminus f^{-1}(D) \in \mathcal S\), so \(E\setminus D \in \mathcal D\).

Thus \(\mathcal D\) is a \(\sigma\)-ring, and if \(\mathcal S\) is a \(\sigma\)-algebra, we have \(f^{-1}(Y) = X \in \mathcal S\), so \(Y\in \mathcal D\). In either case, \(\mathcal B \subset \mathcal D\) and so \(\mathcal B = \mathcal D\).

Example 10.4 A reasonably small collection \(\mathcal C\) of subsets of \(\mathbb R\) which generates the whole Borel \(\sigma\)-algebra is the set of all half-lines \((t, \infty)\) for \(t\in \mathbb R\). Hence, to show that a real-valued function \(f\) is measurable, it is enough to show that \(\{x: f(x) > t\}\) is measurable for each real \(t\).
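The half-line criterion of Example 10.4 can be illustrated on a finite set. The sketch assumes the \(\sigma\)-algebra on \(X\) is the one generated by a finite partition, so its members are exactly the unions of blocks; the function names and data are hypothetical.

```python
def in_sigma_algebra(A, blocks):
    """A belongs to the sigma-algebra generated by a partition
    iff every block is either inside A or disjoint from A."""
    A = set(A)
    return all(block <= A or not (block & A) for block in map(set, blocks))

def is_measurable(f, blocks):
    """Check that {x : f(x) > t} is measurable for each threshold t.
    On a finite set only thresholds t among the values of f can
    produce a set other than X, so those are the ones tested."""
    X = set().union(*blocks)
    return all(
        in_sigma_algebra({x for x in X if f[x] > t}, blocks)
        for t in set(f.values())
    )

blocks = [{1, 2}, {3}, {4, 5}]
f = {1: 0.0, 2: 0.0, 3: 1.5, 4: -2.0, 5: -2.0}   # constant on each block
g = {1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}     # separates 1 from 2

f_ok = is_measurable(f, blocks)   # True
g_ok = is_measurable(g, blocks)   # False: {g > 0} = {2,3,4,5} splits the block {1,2}
```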

Proposition 10.4 Let \((X, \mathcal A), (Y, \mathcal B), (Z, \mathcal C)\) be measurable spaces. If \(f: X\to Y\) and \(g: Y\to Z\) are measurable, then \(g\circ f\) is measurable from \(X\) into \(Z\).

Proof

For any \(C \in \mathcal C\), \(g^{-1}(C) \in \mathcal B\) since \(g\) is measurable, and hence \((g\circ f)^{-1}(C) = f^{-1}(g^{-1}(C)) \in \mathcal A\) since \(f\) is measurable. The argument is similar to the proof of Theorem 4.5.

Definition 10.6 On the Cartesian product \(Y\times Z\), let \(\mathcal B \otimes \mathcal C\) be the \(\sigma\)-algebra generated by the set of all “rectangles” \(B \times C\) with \(B \in \mathcal B\) and \(C\in \mathcal C\). Then \(\mathcal B \otimes \mathcal C\) is called the product \(\sigma\)-algebra on \(Y \times Z\).

A function \(h: X \to Y\times Z\) is of the form \(h(x) = (f(x), g(x))\) for some \(f: X\to Y\) and \(g: X\to Z\). By Theorem 10.1, \(h\) is measurable iff both \(f\) and \(g\) are measurable, considering rectangles \(B \times Z\) and \(Y \times C\) for \(B \in \mathcal B\) and \(C \in \mathcal C\).

Recall that a second-countable topology has a countable base (see Proposition 2.1) and that a Borel \(\sigma\)-algebra is the \(\sigma\)-algebra generated by a topology. The next fact will be especially useful when \(X=Y=\mathbb R\).

Proposition 10.5 Let \((X, \mathcal T), (Y, \mathcal U)\) be any two topological spaces, and their Borel \(\sigma\)-algebras be \(\mathcal B(X, \mathcal T)\) and \(\mathcal B(Y, \mathcal U)\).

Then the Borel \(\sigma\)-algebra \(\mathcal C\) of the product topology on \(X\times Y\) includes the product \(\sigma\)-algebra \(\mathcal B(X, \mathcal T) \otimes \mathcal B(Y, \mathcal U)\).

If both \((X, \mathcal T)\) and \((Y, \mathcal U)\) are second-countable, then the two \(\sigma\)-algebras on \(X\times Y\) are equal.

Proof

For any set \(A \subset X\), let \(\mathcal U(A)\) be the set of all \(B \subset Y\) such that \(A \times B \in \mathcal C\). Suppose \(A\) is open. Then \(A \times Y\) is open in the product topology, so \(A \times Y \in \mathcal C\) and \(Y \in \mathcal U(A)\). Now \(B \mapsto A \times B\) preserves set operations, specifically: for any \(B \subset Y\), \(A\times (Y\setminus B) = (A \times Y)\setminus (A\times B)\), and for any \(B_n \subset Y\), \(\bigcup_n (A\times B_n) = A \times \bigcup_n B_n\). It follows that \(\mathcal U(A)\) is a \(\sigma\)-algebra of subsets of \(Y\). It includes \(\mathcal U\), since \(A \times B\) is open for every open \(B\), and hence it includes \(\mathcal B(Y, \mathcal U)\).

Then, for \(B \in \mathcal B(Y, \mathcal U)\), let \(\mathcal T(B)\) be the set of all \(A\subset X\) such that \(A \times B \in \mathcal C\). By the previous step, \(\mathcal T(B)\) includes \(\mathcal T\); in particular \(X \in \mathcal T(B)\). Since \(A \mapsto A \times B\) also preserves set operations, \(\mathcal T(B)\) is a \(\sigma\)-algebra, and hence it includes \(\mathcal B(X, \mathcal T)\).

Thus \(A \times B \in \mathcal C\) for all Borel sets \(A\) and \(B\), so the product \(\sigma\)-algebra of the Borel \(\sigma\)-algebras is included in the Borel \(\sigma\)-algebra \(\mathcal C\) of the product.

In the other direction, suppose \((X, \mathcal T)\) and \((Y, \mathcal U)\) are second-countable. The product topology has a base \(\mathcal W\) consisting of all sets \(A \times B\) where \(A\) belongs to a countable base of \(\mathcal T\) and \(B\) to a countable base of \(\mathcal U\). Since \(\mathcal W\) is countable, every open set in the product is a countable union of members of \(\mathcal W\), so the \(\sigma\)-algebra generated by \(\mathcal W\) is the Borel \(\sigma\)-algebra of the product topology. Each member of \(\mathcal W\) is a rectangle with open, hence Borel, sides, so this \(\sigma\)-algebra is included in the product \(\sigma\)-algebra.

The usual topology on \(\mathbb R\) is second-countable, by Proposition 2.1 (or since the intervals \((a, b)\) for \(a, b\) rational form a base).

Thus, any continuous function from \(\mathbb R \times \mathbb R\) to \(\mathbb R\) (or any topological space), being measurable for the Borel \(\sigma\)-algebras, is measurable for the product \(\sigma\)-algebra on \(\mathbb R \times \mathbb R\) by Proposition 10.5.

In particular, addition and multiplication are measurable from \(\mathbb R \times \mathbb R\) into \(\mathbb R\). Thus, for any measurable space \((X, \mathcal S)\) and any two measurable real-valued functions \(f\) and \(g\) on \(X\), \(f+g\) and \(fg\) are measurable.

10.4 Integrals of Nonnegative Measurable Functions

Let \(\mathcal L^0(X, \mathcal S)\) denote the set of all measurable real-valued functions on \(X\) for \(\mathcal S\). Then since constant functions are measurable, \(\mathcal L^0(X, \mathcal S)\) is a vector space over \(\mathbb R\) for the usual operations of addition and multiplication by constants, \((f+g)(x) := f(x)+g(x)\) and \((cf)(x) := cf(x)\) for any constant \(c\).

Proposition 10.6 For any measure space \((X, \mathcal S, \mu)\) and any two measurable functions \(f, g: X \to [0, \infty]\), \[ \int f+g\,d\mu = \int f\,d\mu + \int g\,d\mu \]

Proof

First, \((f+g)(x) = \infty\) iff at least one of \(f(x)\) or \(g(x)\) is \(\infty\). The set where this happens is measurable, and \(f+g\) is measurable on it.

Restricted to the set where both \(f\) and \(g\) are finite, \(f+g\) is measurable by the argument made above. Thus \(f+g\) is measurable.

By Proposition 10.3, take simple functions \(f_n, g_n\) with \(0\le f_n \uparrow f\) and \(0 \le g_n \uparrow g\). Then the \(f_n + g_n\) are simple with \(0 \le f_n + g_n \uparrow f + g\), and for each \(n\), \(\int f_n + g_n\,d\mu = \int f_n\,d\mu + \int g_n\,d\mu\) by Corollary 10.1.

Then, by Proposition 10.3, \[ \begin{align*} \int f+g\,d\mu &= \lim_{n\to\infty}\int f_n + g_n\,d\mu \\ &= \lim_{n\to\infty}\bigg(\int f_n\,d\mu + \int g_n\,d\mu\bigg) \\&= \int f\,d\mu + \int g\,d\mu \end{align*} \]

Proposition 10.6 extends, by induction, to any finite sum of nonnegative measurable functions.

10.5 Integrals of Measurable Functions

Given any measure space \((X, \mathcal S, \mu)\) and measurable function \(f: X \to [-\infty, \infty]\), let

\(f^+ := \max(f, 0)\),
\(f^- := -\min(f, 0)\).

Then both \(f^+\) and \(f^-\) are nonnegative and measurable, since \(\max\) and \(\min\) are continuous from \(\mathbb R\times \mathbb R\) to \(\mathbb R\). For all \(x\), either \(f^+(x) = 0\) or \(f^-(x) = 0\), and \(f(x) = f^+(x) - f^-(x)\), where this difference is always defined (i.e. not \(\infty - \infty\)).

Definition 10.7 We say the integral \(\int f\,d\mu\) is defined if and only if \(\int f^+\,d\mu\) and \(\int f^-\,d\mu\) are not both infinite. Then we define

\[ \int f\,d\mu := \int f^+\,d\mu - \int f^-\,d\mu \]

Integrals are often written with variables, for example, \(\int f(x)\,d\mu(x) := \int f\,d\mu\). Also, if \(\mu\) is the Lebesgue measure \(\lambda\), then \(d\lambda(x)\) is written as \(dx\).
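Definition 10.7 can be followed step by step in a small case. The sketch below assumes a finite set with point masses, possibly infinite, and returns `None` when the integral is undefined; the names `integral_nonneg` and `integral` and the sample data are hypothetical.

```python
import math

def integral_nonneg(g, weight):
    """Integral of a nonnegative g for a point-mass measure.
    Terms with a zero factor are skipped, so that 0 * infinity counts as 0."""
    return sum(g[x] * weight[x] for x in g if g[x] != 0 and weight[x] != 0)

def integral(f, weight):
    """Integral of f as the difference of the integrals of f+ and f-,
    or None when both of those are infinite (integral undefined)."""
    f_plus = {x: max(v, 0.0) for x, v in f.items()}
    f_minus = {x: -min(v, 0.0) for x, v in f.items()}
    i_plus = integral_nonneg(f_plus, weight)
    i_minus = integral_nonneg(f_minus, weight)
    if math.isinf(i_plus) and math.isinf(i_minus):
        return None
    return i_plus - i_minus

weight = {1: 1.0, 2: 2.0, 3: math.inf}
r1 = integral({1: 3.0, 2: -1.0, 3: 0.0}, weight)    # 3 - 2 = 1
r2 = integral({1: 3.0, 2: -1.0, 3: 1.0}, weight)    # +infinity, still defined
r3 = integral({1: 1.0, 2: -1.0}, {1: math.inf, 2: math.inf})  # undefined
```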

Lemma 10.1 For any measure space \((X, \mathcal S, \mu)\) and two measurable functions \(f \le g\) from \(X\) into \([-\infty, \infty]\), only the following cases are possible:

\(\int f\,d\mu \le \int g\,d\mu\) (both integrals defined)
\(\int f\,d\mu\) undefined, \(\int g\,d\mu = +\infty\)
\(\int f\,d\mu = -\infty\), \(\int g\,d\mu\) undefined
Both integrals undefined.

Definition 10.8 A measurable function \(f: X \to \mathbb R\) is called integrable if \[ \int |f|\, d\mu < +\infty \] The set of all integrable functions for \(\mu\) is called \(\mathcal L^1(X, \mathcal S, \mu)\). This set may also be called \(\mathcal L^1(\mu)\) or just \(\mathcal L^1\).

Theorem 10.2 On \(\mathcal L^1(X, \mathcal S, \mu)\), \(f\mapsto \int f\,d\mu\) is linear, i.e. for any \(f, g \in \mathcal L^1(X, \mathcal S, \mu)\) and \(c \in \mathbb R\),

\(\int cf\,d\mu = c \int f\,d\mu\)
\(\int f+g\,d\mu = \int f\,d\mu + \int g\,d\mu\)

The latter also holds if \(f \in \mathcal L^1(X, \mathcal S, \mu)\) and \(g\) is any nonnegative measurable function.

10.6 Integral of Transformations

Functions, especially if they are not real-valued, may be called transformations, mappings, or maps.

Definition 10.9 Let \((X, \mathcal S, \mu)\) be a measure space and \((Y, \mathcal B)\) a measurable space. Let \(T\) be a measurable transformation from \(X\) into \(Y\). Then \((\mu \circ T^{-1})(A) := \mu(T^{-1}(A))\) for all \(A \in\mathcal B\) is called the image measure of \(\mu\) by \(T\). Since \(A\mapsto T^{-1}(A)\) preserves all set operations, such as countable unions, and preserves disjointness, \(\mu\circ T^{-1}\) is a countably additive measure.

The image measure is finite if \(\mu\) is, but it need not be \(\sigma\)-finite when \(\mu\) is \(\sigma\)-finite (for example, if \(T\) is a constant map and \(\mu(X) = \infty\)).

Example 10.5 If \(\mu\) is Lebesgue measure, \(T(x) \equiv 2x\), then \(\mu \circ T^{-1} = \mu/2\).
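As a check of Example 10.5 on intervals, take \(A = (a, b)\) with \(a < b\). Then

\[ T^{-1}(A) = \{x : a < 2x < b\} = (a/2,\, b/2), \qquad (\mu\circ T^{-1})(A) = \frac{b}{2} - \frac{a}{2} = \frac{\mu(A)}{2}. \]

This worked case is only an illustration on one family of sets, not a full proof of the identity for all Borel sets.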

Integrals with respect to a measure and with respect to its image measure are related by a “change of variables” theorem.

Theorem 10.3 Let \(f: Y \to [-\infty,\infty]\) be any measurable function. Then \(\int f\,d(\mu \circ T^{-1}) = \int f\circ T\,d\mu\) if either integral is defined (possibly infinite).

Proof

The result is clear if \(f = c1_A\) for some \(A \in \mathcal B\) and \(c \ge 0\), since then both sides equal \(c\,\mu(T^{-1}(A))\). Thus, by Proposition 10.6, it holds for any nonnegative simple function.

It follows for any measurable \(f \ge 0\) by Proposition 10.3. Then, taking \(f^+\) and \(f^-\), it holds for any measurable \(f\) from Definition 10.7, since \((f \circ T)^+ = f^+ \circ T\) and \((f\circ T)^- = f^- \circ T\).
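Theorem 10.3 can be verified directly for a measure made of finitely many point masses. This is a minimal sketch under that assumption; the helper `image_measure`, the map `T`, the function `f` and the weights are all hypothetical choices for illustration.

```python
from collections import defaultdict

def image_measure(weight, T):
    """Point masses of mu o T^{-1}: the mass at y is mu(T^{-1}({y}))."""
    image = defaultdict(float)
    for x, w in weight.items():
        image[T(x)] += w
    return dict(image)

# Illustrative data: mu has point masses at -2, -1, 1, 2
weight = {-2: 0.5, -1: 1.0, 1: 2.0, 2: 0.25}

def T(x):
    return x * x          # not injective, so masses are merged

def f(y):
    return 3.0 * y + 1.0

nu = image_measure(weight, T)                       # masses 0.75 at 4 and 3.0 at 1

lhs = sum(f(y) * m for y, m in nu.items())          # integral of f d(mu o T^{-1})
rhs = sum(f(T(x)) * w for x, w in weight.items())   # integral of (f o T) d mu
# Both sides equal 21.75
```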