# Discrete

## Bernoulli distribution

• pmf

$f_{X}(x)=P(X=x)=\left\{\begin{array}{cl} (1-p)^{1-x} p^{x} & \text { for } \mathrm{x}=0 \text { or } 1 \\ 0 & \text { otherwise } \end{array}\right.$

• expectation
• $E(X) = p$
• variance
• $var(X) = (1-p)p$
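A quick sanity check of the moments directly from the pmf (a minimal sketch in exact rational arithmetic; $p = 3/10$ is an arbitrary choice):

```python
from fractions import Fraction

def bernoulli_pmf(x, p):
    """pmf: p^x (1-p)^(1-x) for x in {0, 1}, 0 otherwise."""
    return p**x * (1 - p)**(1 - x) if x in (0, 1) else 0

p = Fraction(3, 10)  # arbitrary choice of p
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean)**2 * bernoulli_pmf(x, p) for x in (0, 1))
assert mean == p             # E(X) = p
assert var == (1 - p) * p    # var(X) = (1-p)p
```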

## Binomial distribution

• pmf

$f_{X}(k)=P(X=k)=\left\{\begin{aligned} C_{n}^{k} p^{k}(1-p)^{n-k} & \text { for } \mathrm{k}=0,1, \ldots, \mathrm{n} \\ 0 & \text { otherwise } \end{aligned}\right.$

• expectation
• $E(X) = np$
• variance
• $var(X) = np(1-p)$
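A sanity check of the pmf and moments in exact arithmetic (a sketch; $n$ and $p$ are arbitrary):

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """pmf: C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, Fraction(1, 4)  # arbitrary parameters
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
assert sum(pmf) == 1                                    # pmf sums to one
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean)**2 * q for k, q in enumerate(pmf))
assert mean == n * p             # E(X) = np
assert var == n * p * (1 - p)    # var(X) = np(1-p)
```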

## Geometric distribution

• pmf

$f_{X}(k)=P(X=k)=\left\{\begin{aligned} p(1-p)^{k-1} & \text { for } \mathrm{k}=1,2,3 \ldots \\ 0 & \text { otherwise } \end{aligned}\right.$

• expectation
• $E(X) = \frac{1}{p}$
• variance
• $var(X) = \frac{1-p}{p^2}$
• memoryless
• $P(X>m+n|X>m) = P(X>n)$
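Since $P(X>k) = (1-p)^k$ (the first $k$ trials all fail), memorylessness is a one-line check (a sketch with arbitrary $p$, $m$, $n$):

```python
from fractions import Fraction

def tail(k, p):
    """P(X > k) for Geometric(p): the first k trials all fail."""
    return (1 - p)**k

p, m, n = Fraction(1, 3), 4, 7  # arbitrary parameters
# P(X > m+n | X > m) = P(X > m+n) / P(X > m) = P(X > n)
assert tail(m + n, p) / tail(m, p) == tail(n, p)
```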

## Negative binomial distribution (Pascal)

• The negative binomial distribution arises as a generalization of the geometric distribution.
• Suppose that a sequence of independent trials each with probability of success $p$ is performed until there are $r$ successes in all.
• the event $\{X=k\}$ requires $r-1$ successes in the first $k-1$ trials followed by a success on trial $k$, so $P(X=k) = C_{k-1}^{r-1} p^{r-1}(1-p)^{(k-1)-(r-1)} \cdot p$
• $X\sim NB(r,p)$
• pmf

$f_{X}(k)=P(X=k)=\left\{\begin{aligned} C_{k-1}^{r-1} p^{r}(1-p)^{k-r} & \text { for } \mathrm{k}=\mathrm{r}, \mathrm{r}+1, \mathrm{r}+2 \ldots \\ 0 & \text { otherwise } \end{aligned}\right.$

• expectation
• $E(X) = \frac{r}{p}$
• variance
• $var(X) = \frac{r(1-p)}{p^2}$
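The moments can be spot-checked by truncating the infinite support (a sketch; $r$ and $p$ are arbitrary):

```python
from math import comb, isclose

def nb_pmf(k, r, p):
    """pmf of NB(r, p): C(k-1, r-1) p^r (1-p)^(k-r) for k = r, r+1, ..."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

r, p = 3, 0.4      # arbitrary parameters
ks = range(r, 200) # truncate the infinite support; the tail is negligible here
pmf = [nb_pmf(k, r, p) for k in ks]
assert isclose(sum(pmf), 1.0, rel_tol=1e-9)
mean = sum(k * q for k, q in zip(ks, pmf))
var = sum((k - mean)**2 * q for k, q in zip(ks, pmf))
assert isclose(mean, r / p, rel_tol=1e-9)              # E(X) = r/p
assert isclose(var, r * (1 - p) / p**2, rel_tol=1e-6)  # var(X) = r(1-p)/p^2
```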

## Hypergeometric distribution

• Suppose that an urn contains $n$ balls, of which $r$ are black and $n-r$ are white. Let $X$ denote the number of black balls drawn when taking $m$ balls without replacement.
• denoted as $X\sim h(m,n,r)$
• pmf

$f_{X}(k)=P(X=k)=\left\{\begin{array}{cl} \frac{C_{r}^{k} C_{n-r}^{m-k}}{C_{n}^{m}} & \text { for } \max (0, m-n+r) \leq \mathrm{k} \leq \min (m, r) \\ 0 & \text { otherwise } \end{array}\right.$

• expectation
• $E(X) = m\frac{r}{n}$
• variance
• $var(X) = \frac{mr(n-m)(n-r)}{n^2(n-1)}$
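A quick exact check of the pmf and mean (a sketch; the urn sizes are arbitrary):

```python
from fractions import Fraction
from math import comb

def hyper_pmf(k, m, n, r):
    """P(X = k): choose k of the r black balls and m-k of the n-r white ones."""
    return Fraction(comb(r, k) * comb(n - r, m - k), comb(n, m))

m, n, r = 5, 20, 8  # draw 5 balls from 20, of which 8 are black (arbitrary numbers)
ks = range(max(0, m - (n - r)), min(m, r) + 1)
pmf = {k: hyper_pmf(k, m, n, r) for k in ks}
assert sum(pmf.values()) == 1          # pmf sums to one (Vandermonde's identity)
mean = sum(k * q for k, q in pmf.items())
assert mean == Fraction(m * r, n)      # E(X) = mr/n
```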

## Poisson distribution

• can be derived as the limit of a binomial distribution as the number of trials approaches infinity and the probability of success on each trial approaches zero in such a way that $np = \lambda$; here $\lambda$ can be interpreted as the expected number of successes
• pmf
• $P(X = k) = \frac{\lambda^k }{k!} e^{-\lambda} \quad k = 0,1,2...$
• expectation
• $E(X) = \lambda$
• variance
• $var(X) = \lambda$
• Property
• Let $X$ and $Y$ be independent Poisson r.v.s with parameters $\theta_1$ and $\theta_2$; then $X+Y \sim Poisson(\theta_1 + \theta_2)$
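Both the binomial-limit derivation and the sum property can be spot-checked numerically (a sketch with arbitrary parameters):

```python
from math import comb, exp, factorial, isclose

def pois_pmf(k, lam):
    """Poisson pmf: lam^k / k! * e^{-lam}."""
    return lam**k / factorial(k) * exp(-lam)

lam, k = 2.0, 3
# binomial limit: large n with np = lam held fixed
n = 10**6
p = lam / n
binom = comb(n, k) * p**k * (1 - p)**(n - k)
assert isclose(binom, pois_pmf(k, lam), rel_tol=1e-4)

# sum of independent Poissons: the convolution of the pmfs is again Poisson
t1, t2 = 1.5, 2.5
conv = sum(pois_pmf(j, t1) * pois_pmf(k - j, t2) for j in range(k + 1))
assert isclose(conv, pois_pmf(k, t1 + t2), rel_tol=1e-9)
```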

# Continuous

## Uniform distribution

• A uniform r.v. on the interval $[a,b]$ is a model for what we mean when we say "choose a number at random between $a$ and $b$"
• pdf

$f_{X}(x)=\left\{\begin{aligned} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text { otherwise } \end{aligned}\right.$

• cdf (easy to get by integrating the pdf)

$F_{X}(x)=\left\{\begin{array}{rl} 0 & x \leq a \\ \frac{x-a}{b-a} & a \leq x \leq b \\ 1 & b \leq x \end{array}\right.$

• expectation
• $E(X) = \frac{a+b}{2}$
• variance
• $var(X) = \frac{(b-a)^2}{12}$
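The cdf also gives a standard way to simulate: if $U \sim$ Uniform$(0,1)$, then $a+(b-a)U \sim$ Uniform$(a,b)$ (a sketch; the interval and sample size are arbitrary, so the mean check uses a loose tolerance):

```python
import random
from statistics import fmean

# inverse-cdf sampling: if U ~ Uniform(0,1), then a + (b-a)U ~ Uniform(a,b)
random.seed(0)
a, b = 2.0, 5.0  # arbitrary interval
xs = [a + (b - a) * random.random() for _ in range(100_000)]
assert all(a <= x <= b for x in xs)
assert abs(fmean(xs) - (a + b) / 2) < 0.05  # E(X) = (a+b)/2 = 3.5
```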

## Exponential distribution

• Exponential distribution is often used to model lifetimes or waiting times, in which context it is conventional to replace $x$ by $t$.
• pdf

$f_{X}(x)=\left\{\begin{array}{rl} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & \text { otherwise } \end{array}\right.$

• cdf (easy to get by integrating the pdf)

$F_{X}(x)=\left\{\begin{array}{rl} 1-e^{-\lambda x} & x \geq 0 \\ 0 & \text { otherwise } \end{array}\right.$

• expectation
• $E(X) = \frac{1}{\lambda}$
• variance
• $var(X) = \frac{1}{\lambda^2}$

### property


• Memoryless

• $P(X > s+t | X> s) = P(X>t)$
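Memorylessness follows directly from the cdf, since $P(X>x) = e^{-\lambda x}$; a numeric spot check (arbitrary values):

```python
from math import exp, isclose

lam, s, t = 0.7, 1.2, 3.4        # arbitrary rate and times
tail = lambda x: exp(-lam * x)   # P(X > x) = 1 - F(x) = e^{-lam x}
# P(X > s+t | X > s) = tail(s+t) / tail(s) = tail(t)
assert isclose(tail(s + t) / tail(s), tail(t), rel_tol=1e-12)
```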

## Gamma distribution

• pdf

$g(t)=\left\{\begin{array}{rl} \frac{\lambda^{\alpha}}{\Gamma(\alpha)} t^{\alpha-1} e^{-\lambda t} & t \geq 0 \\ 0 & \text { otherwise } \end{array}\right.$

• $\Gamma(x) = \int _0^\infty u^{x-1}e^{-u}du,\quad x>0$
• expectation
• $E(X) = \frac{\alpha}{\lambda}$
• variance
• $Var(X)= \frac{\alpha}{\lambda ^2}$

### Property

• $Ga(1,\lambda) = Exp(\lambda)$, the exponential distribution
• $Ga(\frac{n}{2},\frac{1}{2}) = \chi ^2 (n)$; if $X \sim \chi^2(n)$, then
• $E(X) = n$
• $Var(X) = 2n$
• $X\sim Ga(\alpha,\lambda) \to kX\sim Ga(\alpha,\frac{\lambda}{k}),k>0$
• if $X\sim Ga(\alpha,\lambda)$ and $Y\sim Ga(\beta,\lambda)$ are independent, then $X+Y \sim Ga(\alpha+\beta ,\lambda)$
• derivation of the normalizing constant
• $\because \Gamma(\alpha ) =\int_{0}^{\infty} x^{\alpha-1}e^{-x}dx$
• substituting $x = \lambda t$: $\Gamma (\alpha) = \lambda^\alpha \int _{0}^{\infty} t^{\alpha-1}e^{-\lambda t}dt$
• $\therefore \frac{\lambda^\alpha}{\Gamma (\alpha)} \int _{0}^{\infty} t^{\alpha-1}e^{-\lambda t}dt = 1$
• $\therefore g(t) =\frac{\lambda^\alpha}{\Gamma(\alpha)}t^{\alpha-1}e^{-\lambda t}$ integrates to 1 on $[0,\infty)$
• $\alpha$ is called a shape parameter for the gamma density,
• Varying $\alpha$ changes the shape of the density
• $\lambda$ is called a scale parameter
• Varying $\lambda$ corresponds to changing the units of measurement and does not affect the shape of the density
• intuition: for integer $\alpha$, a $Ga(\alpha,\lambda)$ r.v. can be viewed as the waiting time until the $\alpha$-th event in a Poisson process with rate $\lambda$, i.e. a sum of $\alpha$ i.i.d. $Exp(\lambda)$ waiting times
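The exponential special case and the scale property can be checked pointwise (a sketch using `math.gamma`; all parameter values are arbitrary):

```python
from math import exp, gamma, isclose

def gamma_pdf(t, alpha, lam):
    """Ga(alpha, lam) density: lam^alpha / Gamma(alpha) * t^(alpha-1) * e^(-lam t)."""
    return lam**alpha / gamma(alpha) * t**(alpha - 1) * exp(-lam * t)

lam = 2.0  # arbitrary rate
# Ga(1, lam) is the Exp(lam) distribution: the densities agree pointwise
for t in (0.1, 1.0, 3.7):
    assert isclose(gamma_pdf(t, 1.0, lam), lam * exp(-lam * t), rel_tol=1e-9)

# scale property: if X ~ Ga(alpha, lam), then kX ~ Ga(alpha, lam/k)
# (density of kX at t is f_X(t/k)/k)
alpha, k, t = 2.5, 3.0, 1.4
assert isclose(gamma_pdf(t / k, alpha, lam) / k,
               gamma_pdf(t, alpha, lam / k), rel_tol=1e-9)
```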

## Normal distribution

• pdf

$f_{X}(x)=\frac{1}{\sigma \sqrt{2 \pi}} e^{-(x-\mu)^{2} /\left(2 \sigma^{2}\right)}, \quad -\infty<x<\infty$

• $\mu$ is the mean
• $\sigma$ is the standard deviation
• If $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, then $Y \sim N(a\mu+b,a^2\sigma^2)$
• especially, if $X \sim N(\mu,\sigma^2)$, then $Z = \frac{X-\mu}{\sigma}\sim N(0,1)$
• if $X$ and $Y$ are jointly normal with correlation $\rho$, then $aX+bY \sim N(a\mu_X+b\mu_Y,a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho \sigma_X\sigma_Y)$

### property

• if $X,Y \sim N(0,1)$ are independent, then $U = \frac{X}{Y}$ is a Cauchy r.v. (lec3)
• $f_U(u) = \frac{1}{\pi (u^2+1)}$
• if $X_1,\ldots,X_n\sim N(0,1)$, i.i.d., then
• $X_1^2 + \cdots + X_n^2 \sim \chi^2(n)$
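The standardization rule can be checked at the density level: the $N(\mu,\sigma^2)$ pdf equals the $N(0,1)$ pdf of $z=(x-\mu)/\sigma$, rescaled by $1/\sigma$ (a sketch with arbitrary parameters):

```python
from math import exp, isclose, pi, sqrt

def normal_pdf(x, mu, sigma):
    """N(mu, sigma^2) density."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 3.0, 2.0  # arbitrary parameters
# f_X(x) = phi((x - mu)/sigma) / sigma, where phi is the N(0,1) density
for x in (-1.0, 3.0, 6.5):
    z = (x - mu) / sigma
    assert isclose(normal_pdf(x, mu, sigma),
                   normal_pdf(z, 0.0, 1.0) / sigma, rel_tol=1e-12)
```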

## Logistic distribution

• consider the standard logistic distribution (location 0, scale 1):
• $F_X(x) = \frac{1}{1+e^{-x}}$

## Exponential family

• A family of pdfs or pmfs is called an exponential family if it can be expressed as:
• $p(x,\theta) = H(x)\exp(\theta^T \phi(x) - A(\theta))$
• $H(x)$ is the base measure and $A(\theta)$ is the log-partition function; $e^{A(\theta)}$ is the normalization factor
• It is very helpful for modeling heterogeneous data in the era of big data.
• Bernoulli, Gaussian, Binomial, Poisson, Exponential, Weibull, Laplace, Gamma, Beta, Multinomial, Wishart distributions are all exponential families
• for Bernoulli:
• $X\sim p^x(1-p)^{1-x}$, for $x\in \{0,1\}$
• $p^x(1-p)^{1-x} = \exp\{x\ln p + (1-x)\ln (1-p)\} = \exp\{\ln \frac{p}{1-p} x + \ln (1-p)\}$
• $\theta =\ln \frac{p}{1-p}, \phi(x) = x,A(\theta ) = \ln\frac{1}{1-p},H(x) = 1$
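Plugging these choices back into the canonical form recovers the Bernoulli pmf (a sketch; $p = 0.3$ is arbitrary):

```python
from math import exp, isclose, log

p = 0.3                    # arbitrary Bernoulli parameter
theta = log(p / (1 - p))   # natural parameter
A = log(1 / (1 - p))       # log-partition function A(theta)
H = lambda x: 1            # base measure

for x in (0, 1):
    canonical = H(x) * exp(theta * x - A)  # H(x) exp(theta * phi(x) - A(theta))
    direct = p**x * (1 - p)**(1 - x)       # Bernoulli pmf
    assert isclose(canonical, direct, rel_tol=1e-12)
```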

# Sample

• $Var(\bar{X} ) = \frac{\sigma^2}{n}$

• $(n-1)S^2 = \sum X^2 - n\bar{X}^2$

• $\bar{X}$ and $S^2$ are mutually independent (when sampling from a normal population)

• $\bar{X} \sim N(\mu,\frac{\sigma^2}{n})$

• $\frac{(n-1)S^2}{\sigma^2}\sim \chi^2(n-1)$
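The identity $(n-1)S^2 = \sum X^2 - n\bar{X}^2$ is easy to confirm numerically (a sketch on an arbitrary sample, using the `statistics` module's $n-1$ sample variance):

```python
from math import isclose
from statistics import fmean, variance

xs = [2.0, 4.5, 3.1, 5.2, 4.4, 2.8]  # arbitrary sample
n, xbar = len(xs), fmean(xs)
s2 = variance(xs)  # sample variance, with the n-1 divisor
assert isclose((n - 1) * s2, sum(x * x for x in xs) - n * xbar**2, rel_tol=1e-9)
```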

# Property

• $E(X) = E(E(X|Y))$
• intuition: computing expectations within groups first and then averaging over groups gives the same result as taking the expectation directly
• $Var(X) = E(Var(X|Y)) + Var(E(X|Y))$
• intuition: the expectation of the within-group variance plus the variance of the group means
• if r.v.s $X$ and $Y$ are independent, then $E(X|Y) = E(X)$
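Both laws can be verified exactly on a small hypothetical joint pmf (all values below are made up for illustration):

```python
from fractions import Fraction as F

# a small made-up joint pmf on (x, y); probabilities sum to 1
joint = {(0, 0): F(1, 8), (1, 0): F(3, 8), (0, 1): F(1, 4), (2, 1): F(1, 4)}

ex = sum(x * q for (x, y), q in joint.items())
var_x = sum((x - ex)**2 * q for (x, y), q in joint.items())

# marginal of Y, then E(X|Y=y) and Var(X|Y=y) for each y
p_y = {}
for (x, y), q in joint.items():
    p_y[y] = p_y.get(y, 0) + q
e_x_given_y = {y0: sum(x * q for (x, y), q in joint.items() if y == y0) / p_y[y0]
               for y0 in p_y}
var_x_given_y = {y0: sum((x - e_x_given_y[y0])**2 * q
                         for (x, y), q in joint.items() if y == y0) / p_y[y0]
                 for y0 in p_y}

# law of total expectation: E(X) = E(E(X|Y))
assert ex == sum(e_x_given_y[y] * p_y[y] for y in p_y)
# law of total variance: Var(X) = E(Var(X|Y)) + Var(E(X|Y))
e_var = sum(var_x_given_y[y] * p_y[y] for y in p_y)
var_e = sum((e_x_given_y[y] - ex)**2 * p_y[y] for y in p_y)
assert var_x == e_var + var_e
```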

# Inequality

## Markov's inequality

For a nonnegative r.v. $X$ and any $a>0$:

$P(X\ge a) \le \frac{E(X)}{a}$

## Chebyshev's inequality

For any $a>0$:

$P(|X-E(X)| \ge a) \le \frac{Var(X)}{a^2}$

## Chernoff bounds

The generic Chernoff bound requires only the moment generating function of $X$, defined as $M_X(t) = E(e^{tX})$, provided it exists.

$P(X\ge a) \le \frac{E(e^{tX})}{e^{t a}} \quad \text{for any } t>0$

The tightest bound is obtained by minimizing the right-hand side over $t$.
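For a concrete comparison, the three bounds can be evaluated for $X \sim Exp(1)$ and $a = 5$ (a sketch; the grid search over $t$ stands in for an analytic minimization):

```python
from math import exp

# X ~ Exp(1): E(X) = 1, Var(X) = 1, M_X(t) = 1/(1-t) for t < 1
a = 5.0
true_tail = exp(-a)              # P(X >= a) = e^{-a}

markov = 1.0 / a                 # E(X)/a
chebyshev = 1.0 / (a - 1.0)**2   # P(X >= a) <= P(|X - 1| >= a - 1) <= Var/(a-1)^2
chernoff = min(exp(-t * a) / (1 - t)  # minimize M_X(t)/e^{ta} over 0 < t < 1
               for t in [i / 100 for i in range(1, 100)])

assert true_tail <= chernoff and true_tail <= chebyshev and true_tail <= markov
assert chernoff < markov         # the optimized Chernoff bound beats Markov here
```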
