
Probability & Statistics

Probability axioms, random variables, standard distributions, limit theorems, and statistical inference for GATE preparation, with key definitions, theorems, and worked examples.

01
Probability Axioms & Conditional Probability

The axiomatic foundation of probability theory and Bayes' theorem for updating beliefs based on evidence.

Kolmogorov Axioms
Definition — Probability Measure A probability measure \(P\) on a sample space \(\Omega\) with \(\sigma\)-algebra \(\mathcal{F}\) satisfies:
  • \(P(A) \ge 0\) for all \(A \in \mathcal{F}\)
  • \(P(\Omega) = 1\)
  • For pairwise disjoint \(A_1, A_2, \ldots\): \(P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)\)
Conditional Probability & Bayes’ Theorem
Conditional Probability \[P(A\mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0\]
Bayes’ Theorem If \(\{B_1, B_2, \ldots, B_n\}\) is a partition of \(\Omega\) with \(P(B_i) > 0\), then: \[P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}\]
★ Example
A test for a disease has 99% sensitivity and 95% specificity. If 1% of the population has the disease, what is the probability a person who tests positive actually has it?
Let \(D\) = disease, \(+\) = positive test. Given: \(P(+|D) = 0.99\), \(P(-|D^c) = 0.95\), \(P(D) = 0.01\).
Since \(P(+|D^c) = 1 - P(-|D^c) = 0.05\), the law of total probability gives \(P(+) = P(+|D)P(D) + P(+|D^c)P(D^c) = 0.99(0.01) + 0.05(0.99) = 0.0099 + 0.0495 = 0.0594\).
\[P(D|+) = \frac{0.0099}{0.0594} \approx 0.1667 \approx 16.7\%\] Despite the accurate test, only about 1 in 6 positive results is a true positive, due to the low base rate.
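As a quick numerical check, the short Python sketch below recomputes \(P(D \mid +)\) from the law of total probability and Bayes' theorem. The function name `posterior` and its parameter names are our own illustrative choices, not part of any library.

```python
def posterior(prior, sensitivity, specificity):
    """P(D | +) for a binary diagnostic test, via Bayes' theorem."""
    false_positive_rate = 1 - specificity  # P(+ | D^c)
    # Law of total probability: P(+) = P(+|D)P(D) + P(+|D^c)P(D^c)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

p = posterior(prior=0.01, sensitivity=0.99, specificity=0.95)
print(f"P(D | +) = {p:.4f}")  # 0.1667
```

Raising the prior to 0.10 with the same test lifts the posterior to 0.6875, which shows how strongly the base rate drives the result.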
02
Random Variables — PMF, PDF, CDF

Discrete and continuous random variables, their distributions, expectations, and variances.

Discrete Random Variables
Probability Mass Function (PMF) For a discrete random variable \(X\) taking values \(x_1, x_2, \ldots\): \[p_X(x) = P(X = x), \quad \sum_{x} p_X(x) = 1\]
Expectation & Variance \[E[X] = \sum_x x\,p_X(x), \quad \text{Var}(X) = E[X^2] - (E[X])^2\]
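The PMF formulas translate directly into code. This sketch uses a fair six-sided die as an illustrative PMF (not from the text) and exact `fractions.Fraction` arithmetic, so the results match hand computation: \(E[X] = 7/2\) and \(\text{Var}(X) = 35/12\).

```python
from fractions import Fraction

def expectation_and_variance(pmf):
    """pmf maps each value x to P(X = x); returns (E[X], Var(X))."""
    if sum(pmf.values()) != 1:
        raise ValueError("PMF must sum to 1")
    mean = sum(x * p for x, p in pmf.items())               # E[X] = sum of x p(x)
    second_moment = sum(x * x * p for x, p in pmf.items())  # E[X^2]
    return mean, second_moment - mean ** 2                  # Var(X) = E[X^2] - (E[X])^2

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die
mean, var = expectation_and_variance(die)
print(mean, var)  # 7/2 35/12
```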
Continuous Random Variables
Probability Density Function (PDF) & CDF A continuous random variable \(X\) has PDF \(f_X(x) \ge 0\) with \(\int_{-\infty}^{\infty} f_X(x)\,dx = 1\). The CDF is: \[F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt\] and \(f_X(x) = F_X'(x)\) wherever the derivative exists.
★ Example
Let \(X\) have PDF \(f(x) = 2x\) for \(0 \le x \le 1\), and 0 otherwise. Find \(E[X]\) and \(\text{Var}(X)\).
\(E[X] = \int_0^1 x \cdot 2x\,dx = 2\int_0^1 x^2\,dx = \frac{2}{3}\).
\(E[X^2] = \int_0^1 x^2 \cdot 2x\,dx = 2\int_0^1 x^3\,dx = \frac{1}{2}\).
\(\text{Var}(X) = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{2} - \frac{4}{9} = \frac{1}{18}\).
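The integrals in this example can be checked numerically. The sketch below applies a composite midpoint rule (a simple quadrature chosen for illustration; any numerical integrator would do) to \(f(x) = 2x\) on \([0, 1]\).

```python
def integrate(g, a, b, n=100_000):
    """Composite midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def pdf(x):
    return 2 * x  # f(x) = 2x on [0, 1], zero elsewhere

total = integrate(pdf, 0, 1)                     # should be 1
ex = integrate(lambda x: x * pdf(x), 0, 1)       # E[X], exact value 2/3
ex2 = integrate(lambda x: x**2 * pdf(x), 0, 1)   # E[X^2], exact value 1/2
var = ex2 - ex**2                                # exact value 1/18
print(f"total={total:.6f}  E[X]={ex:.6f}  Var(X)={var:.6f}")
```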
03
Standard Distributions

The most important probability distributions for GATE: Binomial, Poisson, Normal, Exponential, and Uniform.

Discrete Distributions
Binomial Distribution \(X \sim \text{Bin}(n,p)\) \[P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k=0,1,\ldots,n\] \(E[X] = np\), \(\text{Var}(X) = np(1-p)\).
Poisson Distribution \(X \sim \text{Poi}(\lambda)\) \[P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0,1,2,\ldots\] \(E[X] = \text{Var}(X) = \lambda\). Arises as the limit of \(\text{Bin}(n,p)\) as \(n \to \infty\) and \(p \to 0\) with \(np = \lambda\) held fixed.
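The Poisson limit can be seen numerically. This sketch holds \(np = \lambda = 2\) fixed (an arbitrary illustrative value) and lets \(n\) grow, so the largest gap between the \(\text{Bin}(n, \lambda/n)\) and \(\text{Poi}(\lambda)\) PMFs over \(k = 0, \ldots, 10\) should shrink. It uses only the standard library, and the helper names are hypothetical.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0
gaps = {}
for n in (10, 100, 10_000):
    p = lam / n  # keep np = lam fixed while n grows
    gaps[n] = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    print(f"n = {n:>6}: max |Bin - Poi| = {gaps[n]:.2e}")
```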
Continuous Distributions
Normal Distribution \(X \sim N(\mu, \sigma^2)\) \[f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}\] \(E[X] = \mu\), \(\text{Var}(X) = \sigma^2\). The standard normal \(Z = \frac{X-\mu}{\sigma} \sim N(0,1)\).
Exponential Distribution \(X \sim \text{Exp}(\lambda)\) \[f(x) = \lambda e^{-\lambda x}, \quad x \ge 0\] \(E[X] = 1/\lambda\), \(\text{Var}(X) = 1/\lambda^2\). Key property: memoryless — \(P(X > s+t \mid X > s) = P(X > t)\).
Uniform Distribution \(X \sim U(a,b)\) \[f(x) = \frac{1}{b-a}, \quad a \le x \le b\] \(E[X] = \frac{a+b}{2}\), \(\text{Var}(X) = \frac{(b-a)^2}{12}\).
★ Example
If \(X \sim N(5, 4)\), find \(P(X > 7)\).
\(Z = \frac{X-5}{2}\). So \(P(X>7) = P\!\left(Z > \frac{7-5}{2}\right) = P(Z > 1) = 1 - \Phi(1) \approx 1 - 0.8413 = 0.1587\).
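Python's standard library (3.8+) includes `statistics.NormalDist`, which can stand in for the table lookup. Note that its constructor takes the standard deviation \(\sigma = 2\), not the variance 4.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=2)     # N(5, 4): NormalDist takes sigma, not the variance
p_direct = 1 - X.cdf(7)           # P(X > 7) without standardizing
p_standard = 1 - NormalDist().cdf((7 - 5) / 2)   # P(Z > 1) after standardizing
print(f"{p_direct:.4f} {p_standard:.4f}")        # 0.1587 0.1587
```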
04
Limit Theorems

The Law of Large Numbers and Central Limit Theorem are the cornerstones of statistical theory, justifying the use of sample means as estimators.

Law of Large Numbers
Weak Law of Large Numbers (WLLN) Let \(X_1, X_2, \ldots\) be i.i.d. with mean \(\mu\) and finite variance. Then for all \(\varepsilon > 0\): \[P\!\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) \to 0 \quad\text{as } n \to \infty\] where \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\).
Strong Law of Large Numbers (SLLN) Under the same conditions: \[P\!\left(\lim_{n\to\infty}\bar{X}_n = \mu\right) = 1\] Since almost sure convergence implies convergence in probability, the SLLN implies the WLLN.
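A simulation cannot prove either law, but it can illustrate one sample path. This sketch tracks the running mean \(\bar{X}_n\) of simulated fair-coin flips (Bernoulli(0.5), so \(\mu = 0.5\)); the seed and the sample sizes are arbitrary choices.

```python
import random

def running_means(n, p=0.5, seed=0):
    """Running sample means of n simulated Bernoulli(p) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total, means = 0, []
    for i in range(1, n + 1):
        total += rng.random() < p  # True (1) counts as heads
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: mean = {means[n - 1]:.4f}")
```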
Central Limit Theorem
CLT (Lindeberg-Lévy) Let \(X_1, X_2, \ldots\) be i.i.d. with mean \(\mu\) and variance \(\sigma^2 < \infty\). Then: \[\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \quad\text{as } n \to \infty\] Equivalently, \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)\).
★ Example
A fair coin is tossed 400 times. Approximate \(P(190 \le S_{400} \le 210)\) where \(S_{400}\) is the number of heads.
\(S_{400} \sim \text{Bin}(400, 0.5)\), so \(\mu = np = 200\) and \(\sigma = \sqrt{np(1-p)} = \sqrt{100} = 10\).
By the CLT (without continuity correction): \(P(190 \le S_{400} \le 210) \approx P\!\left(\frac{190-200}{10} \le Z \le \frac{210-200}{10}\right) = P(-1 \le Z \le 1)\)
\(= \Phi(1) - \Phi(-1) = 2\Phi(1) - 1 \approx 2(0.8413) - 1 = 0.6827 \approx 68.3\%\).
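To gauge the approximation, the sketch below compares the exact binomial probability with the CLT value from the example and with the standard continuity-corrected version (bounds widened to 189.5 and 210.5, giving about 0.706). The exact sum uses integer arithmetic via `math.comb`, which works here because \(p = 0.5\) makes \(P(S = k) = \binom{n}{k}/2^n\).

```python
from math import comb
from statistics import NormalDist

n, p = 400, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5   # 200 and 10
Z = NormalDist()

# Exact: with p = 0.5, P(S = k) = C(n, k) / 2^n; integer sum, then one division
exact = sum(comb(n, k) for k in range(190, 211)) / 2**n
# CLT approximation as in the worked example (no continuity correction)
plain = Z.cdf((210 - mu) / sigma) - Z.cdf((190 - mu) / sigma)
# Continuity correction: widen each bound by 0.5
corrected = Z.cdf((210.5 - mu) / sigma) - Z.cdf((189.5 - mu) / sigma)
print(f"exact = {exact:.4f}, CLT = {plain:.4f}, CLT + correction = {corrected:.4f}")
```

For discrete counts like \(S_{400}\), the continuity-corrected value lands noticeably closer to the exact probability than the plain CLT figure.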
05
Statistical Inference

Estimation and hypothesis testing: the core tools of statistical inference including MLE, method of moments, confidence intervals, and regression.

Maximum Likelihood Estimation (MLE)
Definition — MLE Given i.i.d. observations \(x_1, \ldots, x_n\) from a distribution with parameter \(\theta\), the MLE is: \[\hat{\theta}_{\text{MLE}} = \arg\max_\theta L(\theta) = \arg\max_\theta \prod_{i=1}^n f(x_i;\theta)\] Equivalently, maximize the log-likelihood \(\ell(\theta) = \sum_{i=1}^n \ln f(x_i;\theta)\).
★ Example
Find the MLE of \(\lambda\) for a Poisson sample \(x_1, \ldots, x_n\).
\(\ell(\lambda) = \sum(-\lambda + x_i\ln\lambda - \ln x_i!) = -n\lambda + \ln\lambda\sum x_i - \text{const}\).
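Setting \(\ell'(\lambda) = -n + \frac{\sum x_i}{\lambda} = 0\) gives \(\hat{\lambda}_{\text{MLE}} = \bar{x}\). Since \(\ell''(\lambda) = -\frac{\sum x_i}{\lambda^2} < 0\) whenever \(\sum x_i > 0\), this critical point is a maximum.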
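A brute-force check of the result: maximizing \(\ell(\lambda)\) over a fine grid should land on the sample mean. The sample below is hypothetical, and `math.lgamma(x + 1)` supplies \(\ln x!\).

```python
from math import lgamma, log

def poisson_log_likelihood(lam, xs):
    # l(lambda) = sum(-lambda + x_i ln(lambda) - ln(x_i!)), with ln(x!) = lgamma(x + 1)
    return sum(-lam + x * log(lam) - lgamma(x + 1) for x in xs)

xs = [2, 4, 3, 0, 5, 1, 3, 2]                  # hypothetical sample, mean 2.5
grid = [i / 1000 for i in range(1, 10_001)]    # candidate lambdas in (0, 10]
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, xs))
print(lam_hat, sum(xs) / len(xs))              # 2.5 2.5
```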
Method of Moments & Confidence Intervals
Method of Moments Equate population moments to sample moments: \[E[X^k] = \frac{1}{n}\sum_{i=1}^n X_i^k, \quad k = 1, 2, \ldots, m,\] where \(m\) is the number of unknown parameters, then solve these \(m\) equations for the parameters.
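As an illustration (the choice of distribution is ours), for \(U(a, b)\) the first two moment equations give \(\bar{X} = \frac{a+b}{2}\) and \(\frac{1}{n}\sum X_i^2 - \bar{X}^2 = \frac{(b-a)^2}{12}\), so \(\hat{a}, \hat{b} = \bar{X} \mp \sqrt{3\hat{v}}\), where \(\hat{v}\) is the second sample central moment. A minimal sketch, tried on simulated data:

```python
import random
from math import sqrt

def uniform_mom(xs):
    """Method-of-moments estimates (a_hat, b_hat) for X ~ U(a, b)."""
    n = len(xs)
    m1 = sum(xs) / n                  # first sample moment, matches (a + b) / 2
    m2 = sum(x * x for x in xs) / n   # second sample moment, matches E[X^2]
    v = m2 - m1 ** 2                  # matches Var(X) = (b - a)^2 / 12
    half_width = sqrt(3 * v)          # (b - a) / 2 = sqrt(3 * Var(X))
    return m1 - half_width, m1 + half_width

rng = random.Random(1)                # fixed seed for reproducibility
sample = [rng.uniform(2.0, 8.0) for _ in range(50_000)]
a_hat, b_hat = uniform_mom(sample)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")   # near the true (2, 8)
```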
Confidence Interval for Normal Mean If \(X_1,\ldots,X_n \sim N(\mu,\sigma^2)\) with \(\sigma\) known, a \(100(1-\alpha)\%\) CI for \(\mu\) is: \[\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\] With \(\sigma\) unknown, replace \(z_{\alpha/2}\) by \(t_{\alpha/2,\,n-1}\) and \(\sigma\) by the sample standard deviation \(S\).
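The known-\(\sigma\) interval takes a few lines with `statistics.NormalDist().inv_cdf`, which supplies \(z_{\alpha/2}\). The data and \(\sigma = 0.4\) below are hypothetical. The standard library has no Student-\(t\) quantile, so the unknown-\(\sigma\) case would need something like SciPy's `scipy.stats.t.ppf`.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(xs, sigma, alpha=0.05):
    """100(1 - alpha)% CI for a normal mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    xbar = fmean(xs)
    half = z * sigma / sqrt(len(xs))
    return xbar - half, xbar + half

xs = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.5, 10.2]  # hypothetical data
lo, hi = z_interval(xs, sigma=0.4)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```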
Hypothesis Testing & Regression
Hypothesis Testing Framework
  • Null hypothesis \(H_0\) vs. alternative \(H_1\)
  • Type I error (reject true \(H_0\)): probability \(\alpha\) (significance level)
  • Type II error (fail to reject false \(H_0\)): probability \(\beta\); Power = \(1 - \beta\)
  • p-value: smallest \(\alpha\) at which \(H_0\) would be rejected
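To connect these definitions, here is a sketch of a two-sided one-sample z-test with \(\sigma\) known (an example chosen for illustration, not taken from the text). It computes the p-value \(2(1 - \Phi(|z|))\) and rejects \(H_0\) exactly when \(p < \alpha\). The data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_test_two_sided(xs, mu0, sigma):
    """Two-sided z-test of H0: mu = mu0 vs H1: mu != mu0, sigma known."""
    z = (fmean(xs) - mu0) / (sigma / sqrt(len(xs)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # P(|Z| >= |z|) under H0
    return z, p_value

xs = [10.1, 10.5, 10.2, 10.4, 10.3, 10.6, 10.0, 10.3]  # hypothetical sample
alpha = 0.05
z, p = z_test_two_sided(xs, mu0=10.0, sigma=0.4)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p = {p:.4f}: {decision}")
```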
Simple Linear Regression Model: \(Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\), \(\varepsilon_i \sim N(0,\sigma^2)\). The OLS estimators are: \[\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(Y_i - \bar{Y})}{\sum(x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}\]
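The OLS formulas map directly onto code. This sketch (function name hypothetical, data made up) computes \(\hat{\beta}_1\) and \(\hat{\beta}_0\) from centered sums; it fails loudly when all \(x_i\) are equal, since the denominator \(\sum(x_i - \bar{x})^2\) is then zero.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates (beta0_hat, beta1_hat) for y = beta0 + beta1 * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # sum (x_i - xbar)(Y_i - Ybar)
    sxx = sum((x - xbar) ** 2 for x in xs)                      # sum (x_i - xbar)^2
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    beta1 = sxy / sxx
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
b0, b1 = ols_fit(xs, ys)
print(f"beta0_hat = {b0:.3f}, beta1_hat = {b1:.3f}")
```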
📝 Practice Problems
Problem 1
If \(P(A) = 0.3\), \(P(B) = 0.4\), and \(P(A \cap B) = 0.12\), are \(A\) and \(B\) independent?
\(P(A)P(B) = 0.3 \times 0.4 = 0.12 = P(A \cap B)\). Yes, \(A\) and \(B\) are independent.
Problem 2
Find the MGF of \(X \sim \text{Exp}(\lambda)\) and use it to compute \(E[X^2]\).
\(M_X(t) = E[e^{tX}] = \int_0^\infty \lambda e^{-(\lambda-t)x}\,dx = \frac{\lambda}{\lambda-t}\) for \(t < \lambda\). Then \(M_X'(t) = \frac{\lambda}{(\lambda-t)^2}\) and \(M_X''(t) = \frac{2\lambda}{(\lambda-t)^3}\), so \(E[X^2] = M_X''(0) = \frac{2}{\lambda^2}\).
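The MGF result can be cross-checked numerically for one rate (here \(\lambda = 1.5\), an arbitrary choice) by two independent routes: a central finite difference of \(M_X\) at 0, and direct midpoint-rule integration of \(x^2 \lambda e^{-\lambda x}\) over a truncated range. Both should be close to \(2/\lambda^2\).

```python
from math import exp

lam = 1.5  # arbitrary rate for the check

def mgf(t):
    """M_X(t) = lam / (lam - t) for X ~ Exp(lam); valid only for t < lam."""
    return lam / (lam - t)

# Route 1: central second difference approximates M''(0) = E[X^2]
h = 1e-4
from_mgf = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2

# Route 2: midpoint-rule integral of x^2 * f(x); the tail beyond x = 40 is negligible
n, upper = 200_000, 40.0
dx = upper / n
from_integral = dx * sum(
    ((i + 0.5) * dx) ** 2 * lam * exp(-lam * (i + 0.5) * dx) for i in range(n)
)

print(from_mgf, from_integral, 2 / lam**2)
```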
Problem 3
Find the MLE of \(p\) for a Bernoulli sample \(x_1, \ldots, x_n\).
\(L(p) = p^{\sum x_i}(1-p)^{n-\sum x_i}\). \(\ell(p) = \sum x_i \ln p + (n - \sum x_i)\ln(1-p)\). Setting \(\ell'(p) = 0\): \(\frac{\sum x_i}{p} - \frac{n - \sum x_i}{1-p} = 0\), giving \(\hat{p} = \bar{x}\).
Problem 4
Using Chebyshev's inequality, bound \(P(|X - \mu| \ge 3\sigma)\).
Chebyshev: \(P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}\). With \(k=3\): \(P(|X-\mu| \ge 3\sigma) \le \frac{1}{9} \approx 0.111\).
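Chebyshev's bound holds for every distribution with finite variance, so for any particular distribution it is usually loose. For a normal \(X\), the true value is about 0.0027, as this comparison using `statistics.NormalDist` shows.

```python
from statistics import NormalDist

k = 3
chebyshev_bound = 1 / k**2                    # valid for any finite-variance X
normal_tail = 2 * (1 - NormalDist().cdf(k))   # exact P(|X - mu| >= k sigma) if X is normal
print(f"Chebyshev bound: {chebyshev_bound:.4f}, normal value: {normal_tail:.4f}")
```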
Problem 5
A 95% CI for \(\mu\) based on \(n=25\) observations has width 4. What sample size is needed to halve the width?
Width \(= 2z_{0.025}\frac{\sigma}{\sqrt{n}}\). Halving the width requires doubling \(\sqrt{n}\), i.e., quadrupling \(n\). So \(n = 4 \times 25 = 100\).
🎯 Interactive Quiz
1. If \(X \sim \text{Poi}(3)\), then \(\text{Var}(X)\) equals:
A 9
B 3
C \(\sqrt{3}\)
D \(1/3\)
2. The memoryless property is characteristic of which continuous distribution?
A Normal
B Uniform
C Exponential
D Gamma
3. By the Central Limit Theorem, \(\bar{X}_n\) is approximately normal for large \(n\) provided:
A The population is normally distributed
B The population has finite variance
C The population is symmetric
D \(n > 30\) exactly
4. Under standard regularity conditions, the MLE is:
A Always unbiased
B Consistent and asymptotically efficient
C Always minimum variance
D Always inferior to method of moments
5. In a hypothesis test at level \(\alpha = 0.05\), a p-value of 0.03 means:
A Reject \(H_0\) since \(p < \alpha\)
B Fail to reject \(H_0\)
C There is a 3% chance \(H_0\) is true
D The test is inconclusive