Probability & Statistics

Probability axioms, random variables, standard distributions, limit theorems, and statistical inference for GATE preparation with full proofs and worked examples.



Probability Axioms & Conditional Probability

The axiomatic foundation of probability theory and Bayes' theorem for updating beliefs based on evidence.

◆ Kolmogorov Axioms

Definition — Probability Measure A probability measure \(P\) on a sample space \(\Omega\) with \(\sigma\)-algebra \(\mathcal{F}\) satisfies:

\(P(A) \ge 0\) for all \(A \in \mathcal{F}\)
\(P(\Omega) = 1\)
For pairwise disjoint \(A_1, A_2, \ldots\): \(P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)\)

◆ Conditional Probability & Bayes’ Theorem

Conditional Probability \[P(A\mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0\]

Bayes’ Theorem If \(\{B_1, B_2, \ldots, B_n\}\) is a partition of \(\Omega\) with \(P(B_i) > 0\), then: \[P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}\]

★ Example

A test for a disease has 99% sensitivity and 95% specificity. If 1% of the population has the disease, what is the probability a person who tests positive actually has it?

Let \(D\) = disease, \(+\) = positive test. Given: \(P(+|D) = 0.99\), \(P(-|D^c) = 0.95\), \(P(D) = 0.01\).
Since \(P(+|D^c) = 1 - P(-|D^c) = 0.05\) and \(P(D^c) = 0.99\): \(P(+) = P(+|D)P(D) + P(+|D^c)P(D^c) = 0.99(0.01) + 0.05(0.99) = 0.0099 + 0.0495 = 0.0594\)
\[P(D|+) = \frac{0.0099}{0.0594} \approx 0.1667 \approx 16.7\%\] Despite the accurate test, only about 1 in 6 positive results is a true positive, due to the low base rate.
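The arithmetic above can be checked with a short Python sketch. The function name `posterior_positive` and its parameter names are illustrative choices, not part of the original notes.

```python
def posterior_positive(sensitivity, specificity, prevalence):
    """Return P(D | +) for a binary test via Bayes' theorem."""
    false_positive_rate = 1.0 - specificity               # P(+ | D^c)
    true_pos = sensitivity * prevalence                   # P(+ | D) P(D)
    false_pos = false_positive_rate * (1.0 - prevalence)  # P(+ | D^c) P(D^c)
    return true_pos / (true_pos + false_pos)

p = posterior_positive(sensitivity=0.99, specificity=0.95, prevalence=0.01)
print(round(p, 4))  # 0.1667
```

Raising the prevalence argument shows how quickly the posterior grows once the base rate is no longer small.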

Random Variables — PMF, PDF, CDF

Discrete and continuous random variables, their distributions, expectations, and variances.

◆ Discrete Random Variables

Probability Mass Function (PMF) For a discrete random variable \(X\) taking values \(x_1, x_2, \ldots\): \[p_X(x) = P(X = x), \quad \sum_{x} p_X(x) = 1\]

Expectation & Variance \[E[X] = \sum_x x\,p_X(x), \quad \text{Var}(X) = E[X^2] - (E[X])^2\]

◆ Continuous Random Variables

Probability Density Function (PDF) & CDF A continuous random variable \(X\) has PDF \(f_X(x) \ge 0\) with \(\int_{-\infty}^{\infty} f_X(x)\,dx = 1\). The CDF is: \[F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt\] and \(f_X(x) = F_X'(x)\) wherever the derivative exists.

★ Example

Let \(X\) have PDF \(f(x) = 2x\) for \(0 \le x \le 1\), and 0 otherwise. Find \(E[X]\) and \(\text{Var}(X)\).

\(E[X] = \int_0^1 x \cdot 2x\,dx = 2\int_0^1 x^2\,dx = \frac{2}{3}\).
\(E[X^2] = \int_0^1 x^2 \cdot 2x\,dx = 2\int_0^1 x^3\,dx = \frac{1}{2}\).
\(\text{Var}(X) = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{2} - \frac{4}{9} = \frac{1}{18}\).
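As a numerical cross-check of these integrals, here is a minimal sketch that uses a midpoint rule. The helper `integrate` and the step count are assumptions made for illustration; any quadrature routine would serve.

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def pdf(x):
    return 2.0 * x  # f(x) = 2x on [0, 1]

mean = integrate(lambda x: x * pdf(x), 0.0, 1.0)              # E[X]
second_moment = integrate(lambda x: x**2 * pdf(x), 0.0, 1.0)  # E[X^2]
variance = second_moment - mean**2                            # Var(X)
print(mean, second_moment, variance)
```

The printed values should agree with \(2/3\), \(1/2\), and \(1/18\) up to the quadrature error.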

Standard Distributions

The most important probability distributions for GATE: Binomial, Poisson, Normal, Exponential, and Uniform.

◆ Discrete Distributions

Binomial Distribution \(X \sim \text{Bin}(n,p)\) \[P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k=0,1,\ldots,n\] \(E[X] = np\), \(\text{Var}(X) = np(1-p)\).

Poisson Distribution \(X \sim \text{Poi}(\lambda)\) \[P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0,1,2,\ldots\] \(E[X] = \text{Var}(X) = \lambda\). Arises as the limit of \(\text{Bin}(n,p)\) as \(n \to \infty\) and \(p \to 0\) with \(np \to \lambda\).
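The Poisson limit of the binomial can be seen numerically. The sketch below fixes \(\lambda = 3\) (an arbitrary choice for illustration), sets \(p = \lambda/n\), and records the largest pointwise gap between the two PMFs over \(k = 0, \ldots, 10\).

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0
gaps = {}
for n in (10, 100, 10_000):
    p = lam / n  # keeps n * p equal to lam as n grows
    gaps[n] = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
                  for k in range(11))
    print(n, gaps[n])
```

The gap should shrink as \(n\) grows, which is the practical reason Poisson is used for rare events over many trials.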

◆ Continuous Distributions

Normal Distribution \(X \sim N(\mu, \sigma^2)\) \[f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}\] \(E[X] = \mu\), \(\text{Var}(X) = \sigma^2\). The standard normal \(Z = \frac{X-\mu}{\sigma} \sim N(0,1)\).

Exponential Distribution \(X \sim \text{Exp}(\lambda)\) \[f(x) = \lambda e^{-\lambda x}, \quad x \ge 0\] \(E[X] = 1/\lambda\), \(\text{Var}(X) = 1/\lambda^2\). Key property: memoryless — \(P(X > s+t \mid X > s) = P(X > t)\).

Uniform Distribution \(X \sim U(a,b)\) \[f(x) = \frac{1}{b-a}, \quad a \le x \le b\] \(E[X] = \frac{a+b}{2}\), \(\text{Var}(X) = \frac{(b-a)^2}{12}\).

★ Example

If \(X \sim N(5, 4)\), find \(P(X > 7)\).

\(Z = \frac{X-5}{2}\). So \(P(X>7) = P\!\left(Z > \frac{7-5}{2}\right) = P(Z > 1) = 1 - \Phi(1) \approx 1 - 0.8413 = 0.1587\).
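The same tail probability can be obtained without tables. This sketch uses `statistics.NormalDist` from the Python standard library; note that \(N(5, 4)\) specifies the variance, so the standard deviation passed in is 2.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=2)  # variance 4 means sigma = 2
tail = 1 - X.cdf(7)            # P(X > 7) = 1 - Phi(1)
print(round(tail, 4))  # 0.1587
```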

Limit Theorems

The Law of Large Numbers and Central Limit Theorem are the cornerstones of statistical theory, justifying the use of sample means as estimators.

◆ Law of Large Numbers

Weak Law of Large Numbers (WLLN) Let \(X_1, X_2, \ldots\) be i.i.d. with mean \(\mu\) and finite variance. Then for all \(\varepsilon > 0\): \[P\!\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) \to 0 \quad\text{as } n \to \infty\] where \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\).

Strong Law of Large Numbers (SLLN) Under the same conditions: \[P\!\left(\lim_{n\to\infty}\bar{X}_n = \mu\right) = 1\] Almost sure convergence implies convergence in probability (WLLN).

◆ Central Limit Theorem

CLT (Lindeberg-Lévy) Let \(X_1, X_2, \ldots\) be i.i.d. with mean \(\mu\) and variance \(\sigma^2 < \infty\). Then: \[\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \quad\text{as } n \to \infty\] Equivalently, \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)\).
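A small simulation can make the statement concrete. The sketch below draws i.i.d. \(\text{Exp}(1)\) samples (a skewed distribution chosen here only for illustration, with \(\mu = \sigma = 1\)), standardizes the sample mean as in the theorem, and measures how often it falls in \([-1, 1]\). The sample size, replication count, and seed are assumptions.

```python
import random
from math import sqrt

rng = random.Random(0)   # fixed seed so the run is repeatable
n, reps = 200, 5_000     # sample size and number of replications
mu, sigma = 1.0, 1.0     # mean and standard deviation of Exp(1)

def standardized_mean():
    x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (x_bar - mu) / (sigma / sqrt(n))

z_values = [standardized_mean() for _ in range(reps)]
frac_within_one = sum(abs(z) <= 1 for z in z_values) / reps
print(frac_within_one)   # compare with 2*Phi(1) - 1, roughly 0.68
```

If the CLT approximation is good at this \(n\), the printed fraction should be close to the standard normal probability of \([-1, 1]\).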

★ Example

A fair coin is tossed 400 times. Approximate \(P(190 \le S_{400} \le 210)\) where \(S_{400}\) is the number of heads.

\(S_{400} \sim \text{Bin}(400, 0.5)\), so \(\mu = 200\), \(\sigma = \sqrt{100} = 10\).
By CLT: \(P(190 \le S_{400} \le 210) \approx P\!\left(\frac{190-200}{10} \le Z \le \frac{210-200}{10}\right) = P(-1 \le Z \le 1)\)
\(= \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 \approx 2(0.8413) - 1 = 0.6827 \approx 68.3\%\).
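To see how close the approximation is, the sketch below compares it with the exact binomial sum. It also evaluates a continuity-corrected version over \([189.5, 210.5]\); that correction is not used in the worked solution above and is included here only as a point of comparison.

```python
from math import comb
from statistics import NormalDist

n = 400
# Exact P(190 <= S <= 210) for a fair coin: count outcomes, divide by 2^n
exact = sum(comb(n, k) for k in range(190, 211)) / 2**n

Z = NormalDist()
clt = Z.cdf(1.0) - Z.cdf(-1.0)        # plain CLT approximation
clt_cc = Z.cdf(1.05) - Z.cdf(-1.05)   # continuity-corrected endpoints
print(exact, clt, clt_cc)
```

The plain approximation slightly underestimates the exact value because the integer endpoints 190 and 210 each carry probability mass that the interval \([-1, 1]\) only half covers.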

Statistical Inference

Estimation and hypothesis testing: the core tools of statistical inference including MLE, method of moments, confidence intervals, and regression.

◆ Maximum Likelihood Estimation (MLE)

Definition — MLE Given i.i.d. observations \(x_1, \ldots, x_n\) from a distribution with parameter \(\theta\), the MLE is: \[\hat{\theta}_{\text{MLE}} = \arg\max_\theta L(\theta) = \arg\max_\theta \prod_{i=1}^n f(x_i;\theta)\] Equivalently, maximize the log-likelihood \(\ell(\theta) = \sum_{i=1}^n \ln f(x_i;\theta)\).

★ Example

Find the MLE of \(\lambda\) for a Poisson sample \(x_1, \ldots, x_n\).

\(\ell(\lambda) = \sum(-\lambda + x_i\ln\lambda - \ln x_i!) = -n\lambda + \ln\lambda\sum x_i - \text{const}\).
Setting \(\ell'(\lambda) = -n + \frac{\sum x_i}{\lambda} = 0\) gives \(\hat{\lambda}_{\text{MLE}} = \bar{x}\).
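The closed-form answer can be checked against a brute-force maximization of the log-likelihood. The data below are hypothetical counts made up for illustration, and the grid search is a deliberately naive stand-in for a proper optimizer.

```python
from math import lgamma, log

data = [2, 0, 3, 1, 4, 2, 2]  # hypothetical Poisson counts

def log_likelihood(lam, xs):
    # sum of ln(e^{-lam} lam^x / x!), using lgamma(x + 1) = ln(x!)
    return sum(-lam + x * log(lam) - lgamma(x + 1) for x in xs)

x_bar = sum(data) / len(data)
grid = [i / 1000 for i in range(1, 10_001)]  # candidate lambdas in (0, 10]
lam_hat = max(grid, key=lambda lam: log_likelihood(lam, data))
print(x_bar, lam_hat)
```

The grid maximizer should coincide with the sample mean up to the grid spacing of 0.001.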

◆ Method of Moments & Confidence Intervals

Method of Moments Equate population moments to sample moments: \[E[X^k] = \frac{1}{n}\sum_{i=1}^n X_i^k, \quad k = 1, 2, \ldots\] Solve for the unknown parameters.

Confidence Interval for Normal Mean If \(X_1,\ldots,X_n \sim N(\mu,\sigma^2)\) with \(\sigma\) known, a \(100(1-\alpha)\%\) CI for \(\mu\) is: \[\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\] With \(\sigma\) unknown, replace \(z_{\alpha/2}\) by \(t_{\alpha/2,n-1}\) and \(\sigma\) by \(S\).
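For the known-\(\sigma\) case, the interval is a one-line computation. In this sketch the function name `z_interval` and the input values are hypothetical; `NormalDist().inv_cdf` supplies \(z_{\alpha/2}\).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% CI for a normal mean with sigma known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lo, hi = z_interval(x_bar=50.0, sigma=5.0, n=25)  # hypothetical inputs
print(lo, hi)
```

With \(\sigma/\sqrt{n} = 1\), the half-width is just \(z_{0.025} \approx 1.96\), so the interval is roughly \(50 \pm 1.96\).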

◆ Hypothesis Testing & Regression

Hypothesis Testing Framework

Null hypothesis \(H_0\) vs. alternative \(H_1\)
Type I error (reject true \(H_0\)): probability \(\alpha\) (significance level)
Type II error (fail to reject false \(H_0\)): probability \(\beta\); Power = \(1 - \beta\)
p-value: smallest significance level \(\alpha\) at which \(H_0\) would be rejected for the observed data

Simple Linear Regression Model: \(Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\), \(\varepsilon_i \sim N(0,\sigma^2)\). The OLS estimators are: \[\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(Y_i - \bar{Y})}{\sum(x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}\]
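The OLS formulas translate directly into code. The function name `ols_fit` and the data points below are hypothetical, chosen to lie near the line \(y = 1 + 2x\).

```python
def ols_fit(xs, ys):
    """Return (beta0_hat, beta1_hat) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx               # slope
    beta0 = y_bar - beta1 * x_bar   # intercept
    return beta0, beta1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]    # hypothetical observations
b0, b1 = ols_fit(xs, ys)
print(b0, b1)
```

By hand, \(\sum(x_i - \bar{x})(Y_i - \bar{Y}) = 19.7\) and \(\sum(x_i - \bar{x})^2 = 10\), so the slope is 1.97 and the intercept is \(7 - 1.97 \times 3 = 1.09\).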

★ Key Takeaways

Bayes' theorem inverts conditional probabilities: base rate matters significantly.
For Poisson, mean equals variance; the Exponential is the only continuous distribution with the memoryless property.
CLT justifies the normal approximation for sample means of i.i.d. observations with finite variance, whatever the shape of the underlying distribution.
MLE is asymptotically efficient and consistent under regularity conditions.
Confidence intervals and two-sided hypothesis tests are dual: a level-\(\alpha\) test rejects \(H_0: \mu = \mu_0\) iff \(\mu_0\) lies outside the \(100(1-\alpha)\%\) CI.

📝 Practice Problems

Problem 1

If \(P(A) = 0.3\), \(P(B) = 0.4\), and \(P(A \cap B) = 0.12\), are \(A\) and \(B\) independent?

Solution

\(P(A)P(B) = 0.3 \times 0.4 = 0.12 = P(A \cap B)\). Yes, \(A\) and \(B\) are independent.

Problem 2

Find the MGF of \(X \sim \text{Exp}(\lambda)\) and use it to compute \(E[X^2]\).

Solution

\(M_X(t) = E[e^{tX}] = \int_0^\infty \lambda e^{-(\lambda-t)x}\,dx = \frac{\lambda}{\lambda-t}\) for \(t < \lambda\). \(M_X'(t) = \frac{\lambda}{(\lambda-t)^2}\), \(M_X''(t) = \frac{2\lambda}{(\lambda-t)^3}\). So \(E[X^2] = M_X''(0) = \frac{2}{\lambda^2}\).

Problem 3

Find the MLE of \(p\) for a Bernoulli sample \(x_1, \ldots, x_n\).

Solution

\(L(p) = p^{\sum x_i}(1-p)^{n-\sum x_i}\). \(\ell(p) = \sum x_i \ln p + (n - \sum x_i)\ln(1-p)\). Setting \(\ell'(p) = 0\): \(\frac{\sum x_i}{p} - \frac{n - \sum x_i}{1-p} = 0\), giving \(\hat{p} = \bar{x}\).

Problem 4

Using Chebyshev's inequality, bound \(P(|X - \mu| \ge 3\sigma)\).

Solution

Chebyshev: \(P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}\). With \(k=3\): \(P(|X-\mu| \ge 3\sigma) \le \frac{1}{9} \approx 0.111\).
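Chebyshev's bound holds for every distribution with finite variance, so it is usually loose for any particular one. As an illustration only, this sketch compares the bound at \(k = 3\) with the exact two-sided tail of a normal distribution.

```python
from statistics import NormalDist

k = 3
chebyshev_bound = 1 / k**2                    # distribution-free bound
normal_tail = 2 * (1 - NormalDist().cdf(k))   # exact P(|Z| >= 3) for Z ~ N(0,1)
print(chebyshev_bound, normal_tail)
```

The normal tail is far below \(1/9\), which is consistent with the bound and shows how much it gives away when the distribution is known.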

Problem 5

A 95% CI for \(\mu\) based on \(n=25\) observations has width 4. What sample size is needed to halve the width?

Solution

Width \(= 2z_{0.025}\frac{\sigma}{\sqrt{n}}\). Halving the width requires doubling \(\sqrt{n}\), i.e., quadrupling \(n\). So \(n = 4 \times 25 = 100\).

🎯 Quiz

1. If \(X \sim \text{Poi}(3)\), then \(\text{Var}(X)\) equals:

A 9

B 3

C \(\sqrt{3}\)

D \(1/3\)

2. The memoryless property is characteristic of which continuous distribution?

A Normal

B Uniform

C Exponential

D Gamma

3. By the Central Limit Theorem, \(\bar{X}_n\) is approximately normal for large \(n\) provided:

A The population is normally distributed

B The population has finite variance

C The population is symmetric

D \(n > 30\) exactly

4. The MLE is known to be:

A Always unbiased

B Consistent and asymptotically efficient

C Always minimum variance

D Always inferior to method of moments

5. In a hypothesis test at level \(\alpha = 0.05\), a p-value of 0.03 means:

A Reject \(H_0\) since \(p < \alpha\)

B Fail to reject \(H_0\)

C There is a 3% chance \(H_0\) is true

D The test is inconclusive