10  Random Variables. Cumulative and Survival Distribution Function

A random variable is a mathematical formalization of a quantity or object which depends on random events.

## Random Variable

A random variable is defined as a function that maps each outcome in the sample space to a real number: \(X: \Omega \to \mathbb{R}\).

Example: The sample space of the experiment of tossing a coin 3 times is given by \(\Omega = \{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT\}\).

Let \(X = \text{\# of Heads in 3 tosses}\). Find the range of \(X\).

Solution: We have:
\[\begin{matrix}X(HHH) = 3 & X(HHT) = 2 & X(HTH) = 2 & X(HTT) = 1 \\ X(THH) = 2 & X(THT) = 1 & X(TTH) = 1 & X(TTT) = 0\end{matrix}\]

Hence the range of \(X\) is \(\{0,1,2,3\}\).
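The same enumeration can be sketched in Python. This is only an illustration of the example above; the names `omega`, `X`, `values` and `support` are chosen here and are not part of the original text.

```python
from itertools import product

# Sample space of three coin tosses: all length-3 strings over {H, T}.
omega = ["".join(t) for t in product("HT", repeat=3)]

def X(outcome):
    """Random variable: number of heads in the outcome string."""
    return outcome.count("H")

# Evaluate X on every outcome, then collect the distinct values (the range).
values = {w: X(w) for w in omega}
support = sorted(set(values.values()))

print(values)
print(support)
```

The set of distinct values taken by `X` over `omega` is exactly the range asked for in the example.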

The support of a random variable is the set of numbers that are possible values of the random variable, i.e. the image (range) of that random variable.

## Probability

Let \(X: \Omega \to \mathbb R\) be a random variable and let \(P\) be a probability measure, that is, \(P(E)\) is defined for every event \(E\) of the sample space. Let \(A\) be a subset of \(\mathbb R\). Then the probability \(P(X \in A)\) is defined as: \[P(X\in A) = P(X^{-1}(A))\] where \(X^{-1}(A)\) is the set of all outcomes \(\omega\in \Omega\) such that \(X(\omega) \in A\).

Example: For the experiment of tossing a coin 3 times, let \(X\) be the number of Heads in 3 tosses. Assume outcomes are equally likely, determine \(P(X \ge 2)\).

Solution: Note that \(P(X\ge 2) = P(X\in \{2,3\})\) since the range of \(X\) is \(\{0,1,2,3\}\). The outcomes are equally likely, hence \(P(\{\omega\}) = \frac{1}{8}\) for each \(\omega\in \Omega\). Four outcomes (HHH, HHT, HTH, THH) satisfy \(X \in \{2,3\}\), so \(P(X\in \{2,3\}) = \frac{1}{8} \times 4 = \frac{1}{2}\).

## Cumulative Distribution Function

The cumulative distribution function (cdf) of a random variable \(X\) gives, for each real number \(x\), the probability that \(X\) takes a value less than or equal to \(x\). That is, \[F(x) = P(X \le x)\]

### Properties

Propositions: The cumulative distribution function satisfies:

* \(\lim_{x\to -\infty} F(x)= 0\) and \(\lim_{x\to\infty}F(x) = 1\)
* \(P(a < X \le b) = F(b) - F(a)\)
* \(\lim_{x\to c^+} F(x) = F(c)\), that is, \(F\) is right-continuous
* \(F\) is a non-decreasing function.
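For the three-toss example, the definitions \(P(X\in A) = P(X^{-1}(A))\) and \(F(x) = P(X \le x)\) can be checked by brute-force enumeration. The sketch below assumes equally likely outcomes, as in the example; the helper names `prob` and `cdf` are illustrative, and the set \(A\) is represented by a predicate on the values of \(X\).

```python
from fractions import Fraction
from itertools import product

# Sample space and the (assumed) uniform probability of each outcome.
omega = ["".join(t) for t in product("HT", repeat=3)]
p_outcome = Fraction(1, len(omega))

def X(outcome):
    """Number of heads in the outcome string."""
    return outcome.count("H")

def prob(in_A):
    """P(X in A) = P(X^{-1}(A)), with A given as a predicate on values of X."""
    preimage = [w for w in omega if in_A(X(w))]
    return p_outcome * len(preimage)

def cdf(x):
    """F(x) = P(X <= x)."""
    return prob(lambda v: v <= x)

print(prob(lambda v: v >= 2))   # P(X >= 2)
print(cdf(2) - cdf(0))          # P(0 < X <= 2) via F(b) - F(a)
```

Using `Fraction` keeps the probabilities exact, so the results can be compared directly with the hand computation.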

Every probability distribution supported on the real numbers is uniquely identified by a right-continuous, non-decreasing function \(F: \mathbb R\to [0,1]\) where: \[\lim_{x\to -\infty} F(x)= 0 \hskip6em \lim_{x\to\infty}F(x) = 1 \]

Example: The distribution function of a random variable \(X\) is given by \[ F(x) = \begin{cases}0, & x < 0 \\ x/2, & 0 \le x < 1 \\ 2/3, & 1 \le x < 2 \\ 11/12, & 2 \le x < 3 \\ 1, & 3 \le x \\ \end{cases}\]

1. Compute \(P(X < 3)\)
2. Compute \(P(X = 1)\)
3. Compute \(P(X > 1/2)\)
4. Compute \(P(2 < X \le 4)\)

Solution:

1. \(P(X < 3) = F(3^-) = 11/12\)
2. \(P(X=1) = F(1) - F(1^-) = 2/3 - 1/2 = 1/6\)
3. \(P(X > 1/2) = 1 - F(1/2) = 1 - 1/4 = 3/4\)
4. \(P(2 < X \le 4) = F(4) - F(2) = 1 - 11/12 = 1/12\)

### Discrete and Continuous

A discrete random variable is a random variable whose cdf has jumps that sum to 1. A continuous random variable is a random variable whose cdf is continuous on \(\mathbb R\).
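The four probabilities in the piecewise cdf example above, and the total size of its jumps, can be reproduced with exact arithmetic. This is a sketch for that specific \(F\) only: the left-limit helper `F_left` is written by hand for this cdf (it is not a general-purpose limit routine), and all names are illustrative.

```python
from fractions import Fraction as Fr

def F(x):
    """The piecewise cdf from the example."""
    x = Fr(x)
    if x < 0:
        return Fr(0)
    if x < 1:
        return x / 2
    if x < 2:
        return Fr(2, 3)
    if x < 3:
        return Fr(11, 12)
    return Fr(1)

def F_left(x):
    """Left limit F(x^-), hand-coded for this particular F."""
    x = Fr(x)
    if x <= 0:
        return Fr(0)
    if x <= 1:
        return x / 2
    if x <= 2:
        return Fr(2, 3)
    if x <= 3:
        return Fr(11, 12)
    return Fr(1)

p1 = F_left(3)                # P(X < 3)
p2 = F(1) - F_left(1)         # P(X = 1)
p3 = 1 - F(Fr(1, 2))          # P(X > 1/2)
p4 = F(4) - F(2)              # P(2 < X <= 4)

# F only jumps at 1, 2 and 3; add up the jump sizes.
total_jump = sum(F(c) - F_left(c) for c in (1, 2, 3))
print(p1, p2, p3, p4, total_jump)
```

The jumps add up to 1/2, which is neither 0 nor 1, so by the definitions above this \(X\) is neither discrete nor continuous.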

Example: State whether the random variables are discrete, continuous or neither.
1. A coin is tossed ten times. The random variable \(X\) is the number of tails that are noted.
2. A light bulb is burned until it burns out. The random variable \(Y\) is its lifetime in hours.

Solution:

1. \(X\) can only take the values 0, 1, …, 10, so the cdf of \(X\) is a step function that increases only at these integer values, and its jumps sum to 1. Therefore, \(X\) is a discrete random variable.
2. \(Y\) can take any non-negative real value and, under the usual modeling assumption that no single lifetime has positive probability, the cdf of \(Y\) has no jumps. Therefore, \(Y\) is a continuous random variable.

Theorem: Any cdf may be written as a weighted sum of a discrete cdf and a continuous cdf.

Let \(F\) be a cdf. Let \(A \equiv \{x : p(x) \equiv F(x) - F(x^-) > 0\}\) be the set of jump points of \(F\). Then, \(A\) is at most countable. Write \(\alpha = \sum_{y\in A} p(y)\) and let \(\tilde{F}_d(x) = \sum_{y\in A} p(y) I_{(-\infty, x]}(y)\), the total size of the jumps at points \(y \le x\), and \(\tilde{F}_c(x) = F(x) - \tilde{F}_d(x)\). It is easy to verify that \(\tilde F_c\) is continuous on \(\mathbb R\). If \(\alpha = 0\), then \(F(x) = \tilde F_c(x)\); and if \(\alpha = 1\), then \(F(x) = \tilde F_d(x)\). If \(0 < \alpha < 1\), then \[F(x) = \alpha F_d(x) + (1-\alpha)F_c(x)\] where \(F_d \equiv \alpha^{-1} \tilde F_d\) and \(F_c \equiv (1-\alpha)^{-1} \tilde F_c\) are both cdfs, with \(F_d\) being discrete and \(F_c\) being continuous.
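As a concrete check of this construction, the sketch below applies it to the piecewise cdf from the earlier example (values 0, \(x/2\), 2/3, 11/12, 1). The jump sizes in `jumps` are entered by hand from that cdf, and the function names mirror the symbols in the argument; they are illustrative only.

```python
from fractions import Fraction as Fr

def F(x):
    """Piecewise cdf from the earlier example."""
    x = Fr(x)
    if x < 0:
        return Fr(0)
    if x < 1:
        return x / 2
    if x < 2:
        return Fr(2, 3)
    if x < 3:
        return Fr(11, 12)
    return Fr(1)

# Jump points y and sizes p(y) = F(y) - F(y^-), computed by hand for this F.
jumps = {1: Fr(1, 6), 2: Fr(1, 4), 3: Fr(1, 12)}
alpha = sum(jumps.values(), Fr(0))

def F_d_tilde(x):
    """Sum of the jumps at points y <= x."""
    return sum((p for y, p in jumps.items() if y <= x), Fr(0))

def F_c_tilde(x):
    """What is left of F after removing the jumps."""
    return F(x) - F_d_tilde(x)

def F_d(x):
    return F_d_tilde(x) / alpha

def F_c(x):
    return F_c_tilde(x) / (1 - alpha)

grid = [Fr(k, 4) for k in range(-4, 17)]
ok = all(alpha * F_d(x) + (1 - alpha) * F_c(x) == F(x) for x in grid)
print(alpha, ok)
```

For this \(F\), \(\alpha = 1/2\); the discrete part \(F_d\) puts masses 1/3, 1/2 and 1/6 at the points 1, 2 and 3, and the continuous part \(F_c\) equals \(x\) on \([0,1)\) and 1 afterwards.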

10.1 Survival Distribution Function

The survival distribution function (sdf) of a random variable \(X\) gives, for each real number \(x\), the probability that \(X\) takes a value greater than \(x\): \[S_X(x) = P(X > x) = 1 - F(x)\]

### Properties

Propositions: The survival distribution function satisfies:

* \(\lim_{x\to -\infty} S_X(x)= 1\) and \(\lim_{x\to\infty}S_X(x) = 0\)
* \(P(a < X \le b) = S_X(a) - S_X(b)\)
* \(\lim_{x\to c^+} S_X(x) = S_X(c)\), that is, \(S_X\) is right-continuous, since \(S_X = 1 - F\) and \(F\) is right-continuous
* \(S_X\) is a non-increasing function.
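The relation \(S_X(x) = 1 - F(x)\) and the interval formula can be verified numerically for the number of heads in three tosses. The sketch assumes a fair coin, so that \(P(X = k) = \binom{3}{k}/8\); the names `pmf`, `cdf` and `sdf` are illustrative.

```python
from fractions import Fraction
from math import comb

# pmf of X = number of heads in 3 tosses (assumes equally likely outcomes).
pmf = {k: Fraction(comb(3, k), 8) for k in range(4)}

def cdf(x):
    """F(x) = P(X <= x)."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

def sdf(x):
    """S(x) = P(X > x)."""
    return sum((p for k, p in pmf.items() if k > x), Fraction(0))

points = [-1, 0, 0.5, 1, 2, 3, 4]
complement_ok = all(sdf(x) == 1 - cdf(x) for x in points)

# P(0 < X <= 2) computed two ways.
via_sdf = sdf(0) - sdf(2)
via_cdf = cdf(2) - cdf(0)
print(complement_ok, via_sdf, via_cdf)
```

Both routes give the same interval probability, as the second property states.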

Note: When there’s no ambiguity on which random variable is being discussed, we can drop the subscript \(X\).