21  Hypergeometric Distribution

21.1 Univariate Hypergeometric Distribution

Consider a population of \(N\) objects, each of which belongs to exactly one of two types: Type A and Type B. If the number of objects of Type A is \(K\), then the number of objects of Type B is \(N - K\).

A random sample of size \(n\) is selected without replacement in such a way that each subset of size \(n\) is equally likely to be chosen. The hypergeometric random variable \(X\) counts the number of objects of Type A in the sample. We write \(X \sim \text{Hypergeometric}(N, K, n)\).

The support of \(X\) is the set of integers between \(\max\{0, n + K - N\}\) and \(\min\{n, K\}\), inclusive. The probability that we choose exactly \(k\) objects of Type A when we choose \(n\) objects from the population is \[p(k) = P(X = k) = \frac{{K \choose k}{N-K\choose n-k}}{N \choose n}\] and this is the pmf of the hypergeometric distribution.
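As a minimal sketch, the pmf above can be evaluated with Python's standard library. The function name `hypergeom_pmf` and the parameters \(N = 10\), \(K = 4\), \(n = 3\) are illustrative assumptions, not taken from the text.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for X ~ Hypergeometric(N, K, n)."""
    lo, hi = max(0, n + K - N), min(n, K)
    if not lo <= k <= hi:
        return 0.0  # k lies outside the support
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Illustrative parameters (an assumption, not from the text).
N, K, n = 10, 4, 3
support = range(max(0, n + K - N), min(n, K) + 1)
probs = {k: hypergeom_pmf(k, N, K, n) for k in support}
total = sum(probs.values())

print(probs)  # one probability per point of the support
print(total)  # should equal 1 up to floating-point rounding
```

Using exact integer binomials from `math.comb` and dividing only once keeps the rounding error small for moderate \(N\).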

Properties: For a hypergeometric random variable \(X \sim \text{Hypergeometric}(N, K, n)\):

* \(\text E[X] = n \frac{K}{N}\)
* \(\text{Var}(X) = n\frac{N-n}{N-1}\frac{K}{N}\left(1-\frac{K}{N}\right)\)
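The mean and variance formulas can be checked numerically by summing over the pmf directly. The sketch below does this for one illustrative choice of parameters (\(N = 10\), \(K = 4\), \(n = 3\), an assumption made only for this check); the helper name `moments` is likewise hypothetical.

```python
from math import comb


def moments(N: int, K: int, n: int) -> tuple[float, float]:
    """Mean and variance of Hypergeometric(N, K, n), computed from the pmf."""
    lo, hi = max(0, n + K - N), min(n, K)
    pmf = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n)
           for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var


# Illustrative parameters (an assumption, not from the text).
N, K, n = 10, 4, 3
mean, var = moments(N, K, n)

# Closed-form expressions from the properties above.
mean_formula = n * K / N
var_formula = n * (N - n) / (N - 1) * (K / N) * (1 - K / N)

print(mean, mean_formula)
print(var, var_formula)
```

Each printed pair should agree up to floating-point rounding.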

Example: A barn houses 13 cows, 12 pigs, and 8 horses. A group of 8 animals will be selected at random to participate in the city fair. What is the probability that exactly 5 of the group will be cows?

Solution: Let \(X\) be the number of cows in the group. Then \(X\) is a hypergeometric random variable with parameters \(N = 33\), \(K = 13\), and \(n = 8\), and we want the probability that \(X = k\) with \(k = 5\). Thus, \[P(X = 5) = \frac{{K \choose k}{N-K\choose n-k}}{N \choose n} = \frac{{13 \choose 5}{20\choose 3}}{33 \choose 8} \approx 0.10567\]

21.2 Multivariate Hypergeometric Distribution

Suppose the population of \(N\) objects is divided into \(m\) types, with \(K_i\) objects of type \(i\), so that \(K_1 + K_2 + \cdots + K_m = N\). If \(n\) objects are drawn at random without replacement, then the numbers of objects of each type in the sample, \((X_1, X_2, \ldots, X_m)\), have the multivariate hypergeometric distribution: for nonnegative integers \(k_1, k_2, \ldots, k_m\) with \(k_1 + k_2 + \cdots + k_m = n\), \[P\{X_1=k_1, X_2=k_2, \ldots, X_m=k_m\} = \frac{\prod_{i=1}^m {K_i\choose k_i}}{N\choose n}\]
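To illustrate the multivariate pmf, the sketch below applies it to the barn example (13 cows, 12 pigs, 8 horses, with 8 animals drawn). The function name `multi_hypergeom_pmf` and the particular outcome (5 cows, 2 pigs, 1 horse) are illustrative assumptions. Summing the joint pmf over every pig/horse split of the remaining 3 animals should recover the univariate answer of about 0.10567 for exactly 5 cows.

```python
from math import comb, prod


def multi_hypergeom_pmf(ks, Ks) -> float:
    """P(X_1 = k_1, ..., X_m = k_m), drawing n = sum(ks) from N = sum(Ks)."""
    if len(ks) != len(Ks):
        raise ValueError("ks and Ks must have the same length")
    if any(k < 0 or k > K for k, K in zip(ks, Ks)):
        return 0.0  # impossible count for some type
    N, n = sum(Ks), sum(ks)
    return prod(comb(K, k) for K, k in zip(Ks, ks)) / comb(N, n)


Ks = (13, 12, 8)  # cows, pigs, horses

# Joint probability of one specific composition (illustrative choice).
p_5_2_1 = multi_hypergeom_pmf((5, 2, 1), Ks)

# Marginal probability of exactly 5 cows: sum over the pig/horse splits
# of the remaining 3 animals.
p_five_cows = sum(multi_hypergeom_pmf((5, pigs, 3 - pigs), Ks)
                  for pigs in range(4))

print(p_5_2_1)
print(round(p_five_cows, 5))
```

The marginal sum collapses to the univariate formula because \(\sum_{j} {12 \choose j}{8 \choose 3-j} = {20 \choose 3}\) (Vandermonde's identity).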

Properties: For each type \(i\), and for each pair of distinct types \(i \neq j\):

* \(\text E[X_i] = n\frac{K_i}{N}\)
* \(\text{Var}(X_i) = n\frac{N-n}{N-1}\frac{K_i}{N}\left(1-\frac{K_i}{N}\right)\)
* \(\text{Cov}(X_i, X_j) = -n \frac{N-n}{N-1}\frac{K_i}{N}\frac{K_j}{N}\)
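These properties can be checked by brute force on a small case. The sketch below enumerates every possible composition of the sample in the barn example (13 cows, 12 pigs, 8 horses, \(n = 8\)) and compares the resulting moments with the formulas; the helper `expect` and the variable names are assumptions made for illustration.

```python
from itertools import product
from math import comb, prod

Ks = (13, 12, 8)  # cows, pigs, horses
N, n = sum(Ks), 8

# Enumerate all count vectors (k_1, k_2, k_3) that sum to n, with their pmf.
outcomes = {}
for ks in product(*(range(min(K, n) + 1) for K in Ks)):
    if sum(ks) == n:
        outcomes[ks] = prod(comb(K, k) for K, k in zip(Ks, ks)) / comb(N, n)


def expect(f) -> float:
    """Expectation of f(k_1, ..., k_m) under the enumerated distribution."""
    return sum(f(ks) * p for ks, p in outcomes.items())


total = sum(outcomes.values())
mean = [expect(lambda ks, i=i: ks[i]) for i in range(len(Ks))]
var0 = expect(lambda ks: (ks[0] - mean[0]) ** 2)
cov01 = expect(lambda ks: (ks[0] - mean[0]) * (ks[1] - mean[1]))

# Closed-form expressions from the properties above.
scale = n * (N - n) / (N - 1)
mean_formula = [n * K / N for K in Ks]
var0_formula = scale * (Ks[0] / N) * (1 - Ks[0] / N)
cov01_formula = -scale * (Ks[0] / N) * (Ks[1] / N)

print(mean, mean_formula)
print(var0, var0_formula)
print(cov01, cov01_formula)
```

The negative covariance reflects the fixed sample size: every extra cow in the group leaves one fewer slot for pigs and horses.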