The expected value, \(\mathbb E[X]\), is the weighted average of the possible values of a random variable, with weights given by their respective probabilities.

The variance, \(\mathbb D[X]\) (also written \(\sigma^2\) or \(\text{Var}(X)\)), is the expectation of the squared deviation of a random variable from its mean, \(\mathbb D[X] = \mathbb E\left[(X - \mathbb E[X])^2\right]\).
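For a discrete random variable both quantities are easy to compute directly from these definitions; here is a minimal Python sketch (the values and probabilities below are arbitrary, chosen just for illustration).

```python
import numpy as np

# A made-up discrete distribution: possible values and their probabilities.
values = np.array([0, 1, 2, 3])
probs = np.array([0.1, 0.4, 0.3, 0.2])

mean = np.sum(values * probs)                    # E[X]: probability-weighted average
variance = np.sum((values - mean) ** 2 * probs)  # D[X]: expected squared deviation
print(mean, variance)                            # 1.6 0.84
```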

Bernoulli distribution

The Bernoulli distribution essentially models a single flip of a weighted coin. It is the probability distribution of a random variable taking only two values, \(1\) and \(0\), with complementary probabilities \(p\) and \(1-p\) respectively. The Bernoulli distribution can be denoted as \(\text{Be}(p)\).

Directly from the definitions, $$\mathbb E[X] = 1 \cdot p + 0 \cdot (1-p) = p$$ and $$\mathbb D[X] = \mathbb E[(X - \mathbb E[X])^2] = (1-p)^2\,p + (0-p)^2(1-p) = p(1-p).$$
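As a quick sanity check of these formulas, one can simulate weighted coin flips, e.g. with numpy (the choice \(p = 0.3\) and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.3
flips = rng.binomial(n=1, p=p, size=200_000)  # Be(p) is Binomial(1, p)

print(flips.mean())  # close to p = 0.3
print(flips.var())   # close to p * (1 - p) = 0.21
```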

Binomial distribution

The binomial distribution with parameters \(n\) and \(p\) is the discrete probability distribution of the number of successes in a sequence of \(n\) independent experiments, where each experiment succeeds with probability \(p\) and fails with probability \(1-p\). Thus, $$P(X=k) = \binom{n}{k}p^k(1-p)^{n-k},$$ and the binomial distribution can be denoted as \(\text{B}(n,p)\).

Writing \(X = X_1 + \ldots + X_n\) as a sum of independent Bernoulli variables, linearity of expectation together with \(\mathbb E[X_i] = p\) gives $$\mathbb E[X] = np.$$ For the variance, we use the fact that for independent random variables the variance of the sum equals the sum of the variances. Thus, $$\mathbb D[X] = np(1-p).$$
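Both formulas can also be confirmed by summing directly over the probability mass function; a small sketch with arbitrary parameters \(n = 10\) and \(p = 0.3\):

```python
from math import comb

# Exact mean and variance of B(n, p) computed from the PMF; parameters are arbitrary.
n, p = 10, 0.3
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))

print(mean, n * p)           # both approximately 3.0
print(var, n * p * (1 - p))  # both approximately 2.1
```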

Geometric distribution

The geometric distribution is the discrete probability distribution of the number of experiments needed to get the first success, i.e. the probability that the first success occurs on the \(k\)th experiment. Thus, $$P(X=k) = (1-p)^{k-1}p.$$ The geometric distribution can be denoted as \(\text{Geom}(p)\). Writing \(q = 1-p\), we have $$p\mathbb E[X] = (1-q)\mathbb E[X] = \sum_{k = 1}^\infty kq^{k-1}p - \sum_{k = 1}^\infty kq^{k}p = p\sum_{k = 0}^\infty q^k = \frac{p}{1-q} = 1,$$ thus \(\mathbb E[X] = \frac{1}{p}\). For the variance, $$ \begin{aligned} \tfrac{1}{p}\mathbb E[X^2] &= \sum_{k=1}^\infty k^2(1-p)^{k-1} = \sum_{k=0}^\infty (k+1)^2(1-p)^k\\ &=1+\sum_{k=1}^\infty (k+1)^2(1-p)^k\\ &=\sum_{k=1}^\infty k^2(1-p)^k + 2\sum_{k=1}^\infty k(1-p)^k + \sum_{k=0}^\infty(1-p)^k\\ &=(1-p)\frac{\mathbb E[X^2]}{p}+2(1-p)\frac{\mathbb E[X]}{p}+\frac{1}{p},\end{aligned} $$ which yields \(\mathbb E[X^2] = \frac{2-p}{p^2}\). Therefore, $$\mathbb D[X] = \mathbb E[X^2] - \mathbb E[X]^2 = \frac{1-p}{p^2}.$$
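The series manipulations above can be checked numerically by truncating the sums, e.g. for an arbitrary \(p = 0.25\):

```python
# Truncated sums for the geometric distribution; p is arbitrary.
p = 0.25
q = 1 - p
cutoff = 10_000  # truncation point; the tail beyond it is negligible here

mean = sum(k * q ** (k - 1) * p for k in range(1, cutoff))
second_moment = sum(k ** 2 * q ** (k - 1) * p for k in range(1, cutoff))
var = second_moment - mean ** 2

print(mean, 1 / p)            # both approximately 4.0
print(var, (1 - p) / p ** 2)  # both approximately 12.0
```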


Excursion: Taylor and Binomial Series

The Taylor series of a real- or complex-valued function \(f(x)\) that is infinitely differentiable at a real or complex number \(a\) is the power series $$f(a)+\frac{f'(a)}{1!}(x-a)+\frac{f''(a)}{2!}(x-a)^2+\ldots$$ Applying the Taylor series to the function \(f(x)=(1+x)^\alpha\) centered at \(a=0\) gives the binomial series, $$(1+x)^\alpha =\sum_{k=0}^\infty \binom{\alpha}{k}x^k,$$ where \(\tbinom{\alpha}{k}\) stands for the generalized binomial coefficient, $$\binom{\alpha}{k} = \frac{\alpha(\alpha-1)\cdots(\alpha-k+1)}{k!}.$$
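A short numerical sketch of the binomial series, with the generalized binomial coefficient computed from the falling-factorial formula above (the values of \(\alpha\) and \(x\) are arbitrary, with \(|x|<1\) so that the series converges):

```python
def gen_binom(alpha: float, k: int) -> float:
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-k+1) / k!."""
    result = 1.0
    for i in range(k):
        result *= (alpha - i) / (i + 1)
    return result

alpha, x = 0.5, 0.2
partial_sum = sum(gen_binom(alpha, k) * x ** k for k in range(50))

print(partial_sum, (1 + x) ** alpha)  # both approximately 1.0954
```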


Negative binomial distribution

The negative binomial distribution describes the number of failures \(k\) before the \(m\)th success in a sequence of independent experiments, each with success probability \(p\) (so \(m+k\) trials are needed in total). Thus, $$P(X=k) = \binom{m + k - 1}{m - 1}p^m(1-p)^k.$$ The negative binomial distribution can be denoted as \(\text{NB}(m,p)\).

This is a valid distribution, since $$ \begin{aligned} \sum_{k=0}^\infty P(X=k) &= p^m\sum_{k=0}^\infty \binom{m + k - 1}{m - 1}(1-p)^k \\ &= p^m\sum_{k=0}^\infty \binom{-m}{k} (-1)^k (1-p)^k \\ &= p^m\sum_{k=0}^\infty \binom{-m}{k}(p-1)^k = p^m\cdot p^{-m} = 1, \end{aligned} $$ using the identity \(\binom{m+k-1}{m-1} = \binom{m+k-1}{k} = (-1)^k\binom{-m}{k}\) and the binomial series. For the expectation, $$ \begin{aligned} \mathbb E[X] &= \sum_{k=0}^\infty k\binom{m + k - 1}{m - 1}p^m(1-p)^k \\ &= \frac{m(1-p)}{p}\sum_{k=1}^\infty\binom{m + k -1}{m}p^{m+1} (1-p)^{k-1}\\ &= \frac{m(1-p)}{p}, \end{aligned} $$ since the last sum adds up the probabilities of \(\text{NB}(m+1,p)\). Similarly, $$ \begin{aligned} \mathbb E[X(X-1)] &= \sum_{k=0}^\infty k(k-1)\binom{m + k - 1}{m - 1}p^m(1-p)^k \\ &= \frac{m(m+1)(1-p)^2}{p^2}\sum_{k=2}^\infty\binom{m + k -1}{m+1}p^{m+2} (1-p)^{k-2}\\ &= \frac{m(m+1)(1-p)^2}{p^2}, \end{aligned} $$ thus $$\begin{aligned} \mathbb D[X]& = \mathbb E[X(X-1)]+\mathbb E[X]-\mathbb E[X]^2\\&= \frac{m(m+1)(1-p)^2+m(1-p)p-m^2(1-p)^2}{p^2} \\ &= \frac{m(1-p)}{p^2}\left((m+1)(1-p)+p-m(1-p)\right)\\&=\frac{m(1-p)}{p^2}. \end{aligned}$$
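Normalization and both moments can be checked numerically by truncating the sums over \(k\); a sketch with arbitrary parameters \(m = 5\) and \(p = 0.4\):

```python
from math import comb

# Truncated sums for NB(m, p); the parameters are arbitrary.
m, p = 5, 0.4
cutoff = 2_000  # the tail beyond this point is negligible here

pmf = [comb(m + k - 1, m - 1) * p ** m * (1 - p) ** k for k in range(cutoff)]

total = sum(pmf)
mean = sum(k * pmf[k] for k in range(cutoff))
var = sum((k - mean) ** 2 * pmf[k] for k in range(cutoff))

print(total)                      # approximately 1
print(mean, m * (1 - p) / p)      # both approximately 7.5
print(var, m * (1 - p) / p ** 2)  # both approximately 18.75
```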

Hypergeometric distribution

The hypergeometric distribution is a discrete probability distribution that gives the probability of \(k\) successes in \(n\) draws, without replacement, from a finite population of size \(N\) that contains exactly \(K\) objects with the success feature. Therefore, $$P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}.$$ First, let us verify that these probabilities indeed sum up to \(1\); this is precisely Vandermonde’s identity, so let us hop on the Vandermonde wagon.


Vandermonde’s identity

Vandermonde’s identity is an identity for binomial coefficients:

$$\binom{m+n}{r} = \sum_{k=0}^r \binom{m}{k}\binom{n}{r-k}.$$

Wikipedia gives three proofs (algebraic, combinatorial, geometric). All of them are fairly easy once you have come across the respective techniques (which is true of most things). Hints for the proofs, respectively: compare coefficients, double count, and consider an \(r\times (m+n-r)\) grid. If you are interested in the proofs, you can find them on Wikipedia.
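A one-line numerical check of the identity with Python's math.comb (the particular \(m\), \(n\), \(r\) are arbitrary):

```python
from math import comb

m, n, r = 7, 5, 6
lhs = comb(m + n, r)
rhs = sum(comb(m, k) * comb(n, r - k) for k in range(r + 1))

print(lhs, rhs)  # both 924
```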


The expected value is $$ \begin{aligned}\mathbb E[X] &= \sum_{k=0}^n k\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \\&= \frac{nK}{N} \sum_{k=1}^n \frac{(K-1)!\,(N-K)!\,(n-1)!\,(N-n)!}{(k-1)!\,(K-k)!\,(n-k)!\,(N-K-n+k)!\,(N-1)!} \\ &=\frac{nK}{N}\sum_{k=1}^n \frac{\binom{K-1}{k-1}\binom{N-K}{n-k}}{\binom{N-1}{n-1}} =\frac{nK}{N},\end{aligned}$$ where the last sum equals \(1\) by Vandermonde’s identity. Also, we get that $$\begin{aligned}\mathbb E[X(X-1)]&= \sum_{k=0}^n k(k-1)\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \\ &= \frac{K(K-1)n(n-1)}{N(N-1)} \sum_{k=2}^n \frac{\binom{K-2}{k-2}\binom{N-K}{n-k}}{\binom{N-2}{n-2}} \\ &= \frac{K(K-1)n(n-1)}{N(N-1)}, \end{aligned}$$ thus $$\begin{aligned} \mathbb D[X] &= \mathbb E[X(X-1)]+\mathbb E[X] - \mathbb E[X]^2\\ &= \frac{K(K-1)n(n-1)}{N(N-1)} + \frac{nK}{N} -\left(\frac{nK}{N}\right)^2\\ &= \frac{K(K-1)n(n-1)N +nKN(N-1)-n^2K^2(N-1)}{N^2(N-1)}\\ &= \frac{n^2K^2N -nNK^2-n^2KN+nKN+nKN^2-nKN-n^2K^2N+n^2K^2}{N^2(N-1)}\\ &=\frac{n^2K^2 +nKN^2-nNK^2-n^2KN}{N^2(N-1)}\\ &=\frac{nK(N-K)(N-n)}{N^2(N-1)}. \end{aligned}$$
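Both formulas can be verified directly from the probability mass function; a sketch with an arbitrary population \(N = 50\), \(K = 20\), \(n = 10\):

```python
from math import comb

# Exact mean and variance of the hypergeometric distribution from its PMF.
N, K, n = 50, 20, 10
pmf = [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)]

mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))

print(mean, n * K / N)                                     # both 4.0
print(var, n * K * (N - K) * (N - n) / (N ** 2 * (N - 1))) # both approximately 1.959
```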

Poisson distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, provided that these events occur with a known constant mean rate and independently of the time since the last event. It is defined by $$P(X=k)=\frac{\lambda^k}{k!}e^{-\lambda}, \qquad k = 0, 1, 2, \ldots$$ The Poisson distribution can be denoted as \(\text{Po}(\lambda)\). This distribution is special in that \(\mathbb E[X]=\mathbb D[X] = \lambda\). By the Taylor series, $$e^\lambda = \sum_{k=0}^\infty \frac{\lambda^k}{k!},$$ thus indeed, $$\mathbb E[X] = \sum_{k=0}^\infty k\frac{\lambda^k}{k!}e^{-\lambda}=e^{-\lambda}\lambda \sum_{k=1}^\infty\frac{\lambda^{k-1}}{(k-1)!}=\lambda.$$ We also get $$ \begin{aligned}\mathbb E[X(X-1)]&=\sum_{k=0}^\infty k(k-1)\frac{\lambda^k}{k!}e^{-\lambda}\\&=e^{-\lambda}\lambda^2\sum_{k=2}^\infty \frac{\lambda^{k-2}}{(k-2)!}=\lambda^2, \end{aligned}$$ thus \(\mathbb D[X] = \mathbb E[X(X-1)]+\mathbb E[X] - \mathbb E[X]^2=\lambda\).
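The fact that mean and variance coincide is also easy to observe empirically; a short simulation sketch with an arbitrary \(\lambda = 4\):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
lam = 4.0
samples = rng.poisson(lam=lam, size=200_000)

print(samples.mean())  # close to lambda = 4
print(samples.var())   # also close to lambda = 4
```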