Statistical Inference

A rigorous treatment of the theory of statistical inference — from sufficiency and completeness to Bayesian methods and the general linear model.

Sufficiency & Completeness

Data reduction through sufficient statistics, minimal sufficiency, completeness, and the exponential family.

📐 Sufficient Statistics

Definition — Sufficient Statistic A statistic \(T(\mathbf{X})\) is sufficient for \(\theta\) if the conditional distribution of \(\mathbf{X}\) given \(T(\mathbf{X}) = t\) does not depend on \(\theta\). Intuitively, \(T\) captures all the information in \(\mathbf{X}\) about \(\theta\).

Theorem — Fisher-Neyman Factorization \(T(\mathbf{X})\) is sufficient for \(\theta\) if and only if the joint density/pmf can be factored as: \[f(\mathbf{x}|\theta) = g(T(\mathbf{x}), \theta) \cdot h(\mathbf{x})\] where \(g\) depends on \(\mathbf{x}\) only through \(T(\mathbf{x})\) and \(h\) does not depend on \(\theta\).

Definition — Minimal Sufficiency A sufficient statistic \(T\) is minimal sufficient if it is a function of every other sufficient statistic. A practical criterion (due to Lehmann and Scheffe): if \(T(\mathbf{x}) = T(\mathbf{y})\) holds exactly when the ratio \(f(\mathbf{x}|\theta)/f(\mathbf{y}|\theta)\) is constant in \(\theta\), then \(T\) is minimal sufficient.

Definition — Completeness A statistic \(T\) is complete for the family \(\{f_\theta\}\) if: \[E_\theta[g(T)] = 0 \;\text{ for all }\theta \implies P_\theta(g(T) = 0) = 1 \;\text{ for all }\theta\] In words, the only unbiased estimator of zero based on \(T\) is the zero function. A complete sufficient statistic is automatically minimal sufficient.

Theorem — Basu's Theorem If \(T\) is a complete sufficient statistic and \(V\) is an ancillary statistic (its distribution does not depend on \(\theta\)), then \(T\) and \(V\) are independent.

Definition — Exponential Family A family of distributions belongs to the k-parameter exponential family if: \[f(x|\boldsymbol{\theta}) = h(x)c(\boldsymbol{\theta})\exp\!\left(\sum_{j=1}^k \eta_j(\boldsymbol{\theta})T_j(x)\right)\] The statistic \(\mathbf{T} = \left(\sum T_1(X_i), \ldots, \sum T_k(X_i)\right)\) is complete sufficient for \(\boldsymbol{\theta}\) (under regularity conditions on the natural parameter space).

Example

For \(X_1, \ldots, X_n\) i.i.d. \(N(\mu, \sigma^2)\) with both parameters unknown, find a complete sufficient statistic.

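The joint density is \(\propto (\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum x_i^2 + \frac{\mu}{\sigma^2}\sum x_i - \frac{n\mu^2}{2\sigma^2}\right)\). This is a 2-parameter exponential family with natural sufficient statistics \(T_1 = \sum X_i\) and \(T_2 = \sum X_i^2\), so \((T_1, T_2)\) is complete sufficient. Because \((\bar{X}, S^2)\) is a one-to-one function of \((T_1, T_2)\), it is also complete sufficient for \((\mu, \sigma^2)\). Independence of \(\bar{X}\) and \(S^2\) follows from Basu's theorem applied with \(\sigma^2\) held fixed: in that sub-family \(\bar{X}\) is complete sufficient for \(\mu\) and \(S^2\) is ancillary. Since this holds for every \(\sigma^2\), the two are independent in the full model.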
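
The data reduction in this example can be checked numerically. The sketch below (the function name `mean_var_from_sums` and the sample values are illustrative assumptions, not part of the notes) recovers \(\bar{x}\) and \(s^2\) from \(n\), \(\sum x_i\) and \(\sum x_i^2\) alone.

```python
import statistics

def mean_var_from_sums(n, t1, t2):
    """Recover (x_bar, s^2) from n, T1 = sum(x_i) and T2 = sum(x_i^2)."""
    x_bar = t1 / n
    # Identity: sum (x_i - x_bar)^2 = T2 - n * x_bar^2
    s2 = (t2 - n * x_bar ** 2) / (n - 1)
    return x_bar, s2

data = [4.1, 5.3, 3.8, 6.0, 5.2]  # hypothetical sample
n = len(data)
t1 = sum(data)
t2 = sum(x * x for x in data)
x_bar, s2 = mean_var_from_sums(n, t1, t2)
```

Once \((T_1, T_2)\) is stored, the raw observations are no longer needed to compute \(\bar{x}\) or \(s^2\), which is what sufficiency promises.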

Point Estimation

UMVUE via Rao-Blackwell and Lehmann-Scheffe, method of moments, MLE, and Fisher information.

🎯 UMVUE & Optimality

Theorem — Rao-Blackwell If \(T\) is an unbiased estimator of \(\tau(\theta)\) and \(S\) is a sufficient statistic, then \(T^* = E[T|S]\) satisfies:

\(E[T^*] = \tau(\theta)\) (unbiased)
\(\text{Var}(T^*) \le \text{Var}(T)\) for all \(\theta\), with equality iff \(T\) is already a function of \(S\)

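The improvement comes from averaging out the variation in \(T\) that remains once \(S\) is fixed. Because \(S\) is sufficient, this conditional variation carries no information about \(\theta\), and \(E[T|S]\) does not depend on \(\theta\), so it is a genuine statistic.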

Theorem — Lehmann-Scheffe If \(S\) is a complete sufficient statistic and \(T^* = g(S)\) is any unbiased estimator of \(\tau(\theta)\) that is a function of \(S\), then \(T^*\) is the unique UMVUE (Uniformly Minimum Variance Unbiased Estimator) of \(\tau(\theta)\).

Theorem — Cramer-Rao Lower Bound Under regularity conditions, for any unbiased estimator \(T\) of \(\tau(\theta)\): \[\text{Var}_\theta(T) \ge \frac{[\tau'(\theta)]^2}{nI(\theta)}\] where \(I(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\ln f(X|\theta)\right)^2\right] = -E\!\left[\frac{\partial^2}{\partial\theta^2}\ln f(X|\theta)\right]\) is the Fisher information for a single observation. Equality holds iff \(T - \tau(\theta) = a(\theta)\frac{\partial}{\partial\theta}\ln f(\mathbf{x}|\theta)\).
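
As a small illustration of the bound, the following sketch evaluates the Fisher information of a Bernoulli(\(p\)) observation directly from the definition and compares the resulting bound with the variance of the sample proportion. The Bernoulli model, the function names and the values of \(p\) and \(n\) are assumptions chosen for this example.

```python
def bernoulli_fisher_information(p):
    """I(p) = E[(d/dp ln f(X|p))^2] for X ~ Bernoulli(p), summed over x in {0, 1}."""
    total = 0.0
    for x, prob in ((0, 1 - p), (1, p)):
        score = x / p - (1 - x) / (1 - p)  # derivative of x ln p + (1 - x) ln(1 - p)
        total += prob * score ** 2
    return total

def crlb(p, n):
    """Cramer-Rao bound for unbiased estimators of p (tau(p) = p, so tau'(p) = 1)."""
    return 1.0 / (n * bernoulli_fisher_information(p))

p, n = 0.3, 40  # hypothetical values
var_sample_mean = p * (1 - p) / n  # exact variance of the sample proportion
bound = crlb(p, n)
```

Here the bound and the variance of \(\bar{X}\) coincide, so the sample proportion attains the CRLB in this model.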

📊 MLE & Method of Moments

Definition — Maximum Likelihood Estimator The MLE \(\hat{\theta}_{MLE} = \arg\max_\theta L(\theta|\mathbf{x})\) where \(L(\theta|\mathbf{x}) = \prod_{i=1}^n f(x_i|\theta)\). Properties of the MLE (under regularity conditions):

Consistency: \(\hat{\theta}_n \xrightarrow{P} \theta_0\)
Asymptotic normality: \(\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, 1/I(\theta_0))\)
Asymptotic efficiency: achieves the CRLB asymptotically
Invariance: if \(\hat{\theta}\) is the MLE of \(\theta\), then \(g(\hat{\theta})\) is the MLE of \(g(\theta)\)
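
To make the definition concrete, here is a minimal sketch that maximizes a Poisson log-likelihood numerically and then uses invariance to estimate \(P(X = 0) = e^{-\lambda}\). The Poisson model, the ternary-search helper `maximize_unimodal`, and the data are assumptions made for illustration; in this model the maximizer is known in closed form (the sample mean), which gives a check on the numerical answer.

```python
import math

def poisson_log_likelihood(lam, data):
    """Log-likelihood of i.i.d. Poisson(lam) observations."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in data)

def maximize_unimodal(f, lo, hi, tol=1e-9):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

data = [2, 0, 3, 1, 4, 2, 2]  # hypothetical counts, sample mean = 2
lam_hat = maximize_unimodal(lambda lam: poisson_log_likelihood(lam, data), 1e-6, 20.0)
prob_zero_hat = math.exp(-lam_hat)  # MLE of P(X = 0) by invariance
```

The search works because the Poisson log-likelihood is concave in \(\lambda\) when at least one count is positive; for likelihoods with several local maxima a different optimizer would be needed.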

Definition — Method of Moments Equate population moments \(\mu_k' = E[X^k]\) to sample moments \(m_k' = \frac{1}{n}\sum X_i^k\) and solve for parameters. For \(k\) parameters, use the first \(k\) moment equations. Method of moments estimators are consistent but generally less efficient than MLEs.

Example

For \(X_1, \ldots, X_n \sim \text{Poisson}(\lambda)\), find the UMVUE of \(P(X = 0) = e^{-\lambda}\).

The complete sufficient statistic is \(T = \sum X_i \sim \text{Poisson}(n\lambda)\). We need an unbiased estimator of \(e^{-\lambda}\) that is a function of \(T\).
Consider \(\delta(T) = E[I(X_1 = 0)|T = t] = \frac{P(X_1=0, \sum_{i=2}^n X_i = t)}{P(T = t)}\).
\(= \frac{e^{-\lambda} \cdot e^{-(n-1)\lambda}[(n-1)\lambda]^t/t!}{e^{-n\lambda}(n\lambda)^t/t!} = \left(\frac{n-1}{n}\right)^t = \left(1 - \frac{1}{n}\right)^T\).
By Lehmann-Scheffe, \(\delta(T) = (1 - 1/n)^T\) is the UMVUE of \(e^{-\lambda}\).
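
The unbiasedness of \(\delta(T)\) can be verified numerically by summing \((1 - 1/n)^t\) against the Poisson(\(n\lambda\)) pmf. The sketch below does this with a truncated series; the truncation point `t_max` and the values of \(\lambda\) and \(n\) are assumptions for illustration.

```python
import math

def poisson_pmf(t, mean):
    """P(T = t) for T ~ Poisson(mean), computed on the log scale for stability."""
    return math.exp(-mean + t * math.log(mean) - math.lgamma(t + 1))

def expected_umvue(lam, n, t_max=400):
    """E[(1 - 1/n)^T] with T ~ Poisson(n * lam), truncating the series at t_max."""
    mean = n * lam
    return sum((1 - 1 / n) ** t * poisson_pmf(t, mean) for t in range(t_max + 1))

lam, n = 1.3, 5  # hypothetical values
approx = expected_umvue(lam, n)
target = math.exp(-lam)  # the quantity being estimated, P(X = 0)
```

Analytically the series sums to \(e^{-n\lambda}e^{n\lambda(1 - 1/n)} = e^{-\lambda}\), which is what the numerical value reproduces.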

Testing of Hypotheses

Neyman-Pearson theory, UMP tests, likelihood ratio tests, and error analysis.

⚡ Neyman-Pearson Framework

Theorem — Neyman-Pearson Lemma For testing simple hypotheses \(H_0: \theta = \theta_0\) vs \(H_1: \theta = \theta_1\), the most powerful test of size \(\alpha\) has the rejection region: \[\left\{\mathbf{x}: \frac{L(\theta_1|\mathbf{x})}{L(\theta_0|\mathbf{x})} > k\right\}\] where \(k\) is chosen so that \(P_{\theta_0}(\text{reject}) = \alpha\). This test maximizes the power \(\beta(\theta_1) = P_{\theta_1}(\text{reject})\) among all tests of size \(\le \alpha\).

Definition — UMP Tests & Monotone Likelihood Ratio A family \(\{f(x|\theta)\}\) has monotone likelihood ratio (MLR) in \(T(x)\) if \(f(x|\theta_1)/f(x|\theta_0)\) is non-decreasing in \(T(x)\) whenever \(\theta_1 > \theta_0\). For MLR families:

The test rejecting when \(T > c\) is UMP for \(H_0: \theta \le \theta_0\) vs \(H_1: \theta > \theta_0\)
The power function \(\beta(\theta)\) is non-decreasing
No UMP test exists for two-sided alternatives in general

Definition — Unbiased Tests A test \(\phi\) is unbiased if its power function satisfies \(\beta(\theta) \ge \alpha\) for all \(\theta \in \Theta_1\). For two-sided testing in a one-parameter exponential family, a UMP unbiased (UMPU) test exists; it rejects when \(T < c_1\) or \(T > c_2\), with \(c_1\) and \(c_2\) fixed by the size and unbiasedness conditions.

📉 Likelihood Ratio Tests

Theorem — Generalized Likelihood Ratio Test & Wilks' Theorem The generalized likelihood ratio statistic is: \[\Lambda = \frac{\sup_{\theta \in \Theta_0}L(\theta|\mathbf{x})}{\sup_{\theta \in \Theta}L(\theta|\mathbf{x})}\] Reject \(H_0\) when \(\Lambda \le c_\alpha\). Under \(H_0\) and regularity conditions (Wilks' theorem): \[-2\ln\Lambda \xrightarrow{d} \chi^2_r\] where \(r = \dim(\Theta) - \dim(\Theta_0)\). This provides a large-sample test for composite hypotheses.

Definition — Power Function & Error Types The power function is \(\beta(\theta) = P_\theta(\text{reject } H_0)\).

Type I error (rejecting a true \(H_0\)): the size of the test is \(\alpha = \sup_{\theta \in \Theta_0}\beta(\theta)\).
Type II error (failing to reject a false \(H_0\)): its probability at a specific \(\theta_1 \in \Theta_1\) is \(1 - \beta(\theta_1)\).
p-value: the smallest \(\alpha\) at which the test rejects. For a test that rejects for large \(T\) under a simple null \(\theta_0\), \(p = P_{\theta_0}(T \ge t_{obs})\).

Example

For \(X_1, \ldots, X_n\) i.i.d. \(N(\mu, \sigma_0^2)\) with known \(\sigma_0^2\), derive the LRT for \(H_0: \mu = \mu_0\) vs \(H_1: \mu \neq \mu_0\).

Under \(H_0\): \(L(\mu_0) = \prod \phi((x_i - \mu_0)/\sigma_0)/\sigma_0\). Under \(\Theta\): MLE is \(\hat{\mu} = \bar{x}\). \[-2\ln\Lambda = \frac{n(\bar{x} - \mu_0)^2}{\sigma_0^2} = z^2\] where \(z = \sqrt{n}(\bar{x} - \mu_0)/\sigma_0 \sim N(0,1)\) under \(H_0\). So \(-2\ln\Lambda \sim \chi^2_1\). Reject when \(|z| > z_{\alpha/2}\). This is the standard two-sided z-test.
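
The test derived above can be sketched in a few lines. The function name `lrt_known_variance` and the data are hypothetical; the two-sided p-value uses the identity \(P(|Z| \ge |z|) = \text{erfc}(|z|/\sqrt{2})\) for \(Z \sim N(0,1)\).

```python
import math

def lrt_known_variance(data, mu0, sigma0):
    """Return (-2 ln Lambda, z, two-sided p-value) for H0: mu = mu0, sigma0 known."""
    n = len(data)
    x_bar = sum(data) / n
    z = math.sqrt(n) * (x_bar - mu0) / sigma0
    stat = z * z  # -2 ln Lambda = n (x_bar - mu0)^2 / sigma0^2
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for Z ~ N(0, 1)
    return stat, z, p_value

data = [10.2, 9.6, 11.1, 10.8, 10.3, 9.9, 10.9, 10.4]  # hypothetical sample
stat, z, p_value = lrt_known_variance(data, mu0=10.0, sigma0=0.5)
reject_at_5_percent = p_value < 0.05
```

Rejecting when the p-value is below \(\alpha\) is the same decision as rejecting when \(|z| > z_{\alpha/2}\).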

Confidence Intervals

Pivotal quantities, exact and large-sample confidence intervals, and confidence regions.

📏 Pivotal Quantities & Exact Intervals

Definition — Pivotal Quantity A random variable \(Q(\mathbf{X}, \theta)\) is a pivot if its distribution does not depend on \(\theta\) or any other unknown parameters. If \(P(a \le Q \le b) = 1 - \alpha\), then inverting the inequality gives a \((1-\alpha)\) confidence interval for \(\theta\).

Theorem — Exact Confidence Intervals for Normal Populations For \(X_1, \ldots, X_n\) i.i.d. \(N(\mu, \sigma^2)\):

CI for \(\mu\) (\(\sigma^2\) known): Pivot \(Z = \sqrt{n}(\bar{X}-\mu)/\sigma \sim N(0,1)\). CI: \(\bar{X} \pm z_{\alpha/2}\sigma/\sqrt{n}\).
CI for \(\mu\) (\(\sigma^2\) unknown): Pivot \(T = \sqrt{n}(\bar{X}-\mu)/S \sim t_{n-1}\). CI: \(\bar{X} \pm t_{\alpha/2,n-1} S/\sqrt{n}\).
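CI for \(\sigma^2\): Pivot \(\chi^2 = (n-1)S^2/\sigma^2 \sim \chi^2_{n-1}\). CI: \(\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}},\; \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)\), where \(\chi^2_{a,\,n-1}\) denotes the upper-\(a\) point (the value exceeded with probability \(a\)).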

Definition — Confidence Interval for Difference of Means For two independent normal samples with unknown but equal variances: \[(\bar{X} - \bar{Y}) \pm t_{\alpha/2, n_1+n_2-2} \cdot S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\] where \(S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}\) is the pooled variance.

For unequal variances, the Welch-Satterthwaite approximation is used with approximate df \(\nu = \frac{(S_1^2/n_1 + S_2^2/n_2)^2}{(S_1^2/n_1)^2/(n_1-1) + (S_2^2/n_2)^2/(n_2-1)}\).
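
Both two-sample quantities are simple functions of the summary statistics. A minimal sketch follows; the function names and the numbers are hypothetical.

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled variance S_p^2 under the equal-variance assumption."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def welch_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

sp2 = pooled_variance(4.0, 10, 9.0, 15)  # hypothetical summary statistics
nu = welch_df(4.0, 10, 9.0, 15)
```

The Welch degrees of freedom always lie between \(\min(n_1 - 1, n_2 - 1)\) and \(n_1 + n_2 - 2\), and in practice \(\nu\) is often rounded down before looking up a \(t\) critical value.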

Example

From a sample of size \(n = 25\) from \(N(\mu, \sigma^2)\), \(\bar{x} = 50\) and \(s = 8\). Construct a 95% CI for \(\mu\) and a 95% CI for \(\sigma^2\).

For \(\mu\): \(t_{0.025, 24} = 2.064\). CI: \(50 \pm 2.064 \times 8/\sqrt{25} = 50 \pm 3.30 = (46.70, 53.30)\).
For \(\sigma^2\): \(\chi^2_{0.025, 24} = 39.364\), \(\chi^2_{0.975, 24} = 12.401\). CI: \(\left(\frac{24 \times 64}{39.364},\; \frac{24 \times 64}{12.401}\right) = (39.02, 123.86)\).
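
The arithmetic in this example can be reproduced with a short script. The critical values are taken from the worked solution above and passed in as inputs rather than computed; the function name `normal_cis` is an assumption.

```python
import math

def normal_cis(n, x_bar, s, t_crit, chi2_upper, chi2_lower):
    """Exact CIs for mu and sigma^2; critical values are supplied from tables."""
    half_width = t_crit * s / math.sqrt(n)
    ci_mu = (x_bar - half_width, x_bar + half_width)
    ss = (n - 1) * s ** 2  # (n - 1) S^2
    # Dividing by the larger critical value gives the lower endpoint.
    ci_var = (ss / chi2_upper, ss / chi2_lower)
    return ci_mu, ci_var

ci_mu, ci_var = normal_cis(n=25, x_bar=50, s=8, t_crit=2.064,
                           chi2_upper=39.364, chi2_lower=12.401)
```

Note that the interval for \(\sigma^2\) is not symmetric about \(s^2 = 64\), because the chi-squared distribution is skewed.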

Bayesian Inference

Prior distributions, posterior analysis, Bayes estimators, credible intervals, and predictive distributions.

🔮 The Bayesian Framework

Theorem — Bayes' Theorem (Continuous Parameter) Given prior \(\pi(\theta)\) and likelihood \(f(\mathbf{x}|\theta)\), the posterior is: \[\pi(\theta|\mathbf{x}) = \frac{f(\mathbf{x}|\theta)\pi(\theta)}{m(\mathbf{x})} \propto f(\mathbf{x}|\theta)\pi(\theta)\] where \(m(\mathbf{x}) = \int f(\mathbf{x}|\theta)\pi(\theta)\,d\theta\) is the marginal likelihood (normalizing constant).

Definition — Conjugate & Non-Informative Priors

Conjugate prior: posterior belongs to the same family as the prior. Key pairs:
- Binomial \(+\) Beta(\(\alpha,\beta\)) \(\to\) Beta(\(\alpha+x, \beta+n-x\))
- Normal (known \(\sigma^2\)) \(+\) Normal \(\to\) Normal posterior
- Poisson \(+\) Gamma(\(\alpha,\beta\)) \(\to\) Gamma(\(\alpha+\sum x_i, \beta+n\))
- Exponential \(+\) Gamma(\(\alpha,\beta\)) \(\to\) Gamma(\(\alpha+n, \beta+\sum x_i\))
Jeffreys' prior: \(\pi(\theta) \propto \sqrt{I(\theta)}\), invariant under reparametrization.
Non-informative (flat) prior: \(\pi(\theta) \propto 1\), may be improper.
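
Conjugate updating amounts to adding sufficient statistics to the prior parameters. The sketch below implements two of the pairs listed above; the function names and data are hypothetical, and the Gamma prior is assumed to use the rate parametrization, which is what the update \(\beta + n\) implies.

```python
def beta_binomial_update(alpha, beta, x, n):
    """Beta(alpha, beta) prior with x successes in n trials -> Beta posterior parameters."""
    return alpha + x, beta + n - x

def gamma_poisson_update(alpha, beta, counts):
    """Gamma(alpha, rate=beta) prior with Poisson counts -> Gamma posterior parameters."""
    return alpha + sum(counts), beta + len(counts)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Jeffreys' prior for a binomial proportion is Beta(1/2, 1/2).
post_a, post_b = beta_binomial_update(0.5, 0.5, x=7, n=10)
post_mean = beta_mean(post_a, post_b)
gamma_post = gamma_poisson_update(2.0, 1.0, [3, 1, 4])
```

With 7 successes in 10 trials the posterior mean is \(7.5/11 \approx 0.68\), slightly closer to 1/2 than the MLE 0.7.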

Theorem — Bayes Estimators Under Different Loss Functions

Squared error loss \(L(\theta, a) = (\theta-a)^2\): Bayes estimator = posterior mean \(E[\theta|\mathbf{x}]\).
Absolute error loss \(L(\theta, a) = |\theta-a|\): Bayes estimator = posterior median.
0-1 loss (for estimation): Bayes estimator = posterior mode (MAP estimator).

Definition — Credible Intervals & Bayes Factors A \((1-\alpha)\) credible interval \((a, b)\) satisfies \(P(\theta \in (a,b)|\mathbf{x}) = 1-\alpha\). The Highest Posterior Density (HPD) interval is the shortest such interval.

The Bayes factor for comparing \(H_0: \theta \in \Theta_0\) vs \(H_1: \theta \in \Theta_1\) is: \[B_{01} = \frac{m_0(\mathbf{x})}{m_1(\mathbf{x})} = \frac{\int_{\Theta_0} f(\mathbf{x}|\theta)\pi_0(\theta)\,d\theta}{\int_{\Theta_1} f(\mathbf{x}|\theta)\pi_1(\theta)\,d\theta}\] \(B_{01} > 1\) favors \(H_0\); \(B_{01} < 1\) favors \(H_1\).

Definition — Predictive Distribution The posterior predictive distribution of a future observation \(X_{n+1}\) given data \(\mathbf{x}\) is: \[f(x_{n+1}|\mathbf{x}) = \int f(x_{n+1}|\theta)\pi(\theta|\mathbf{x})\,d\theta\] This integrates out parameter uncertainty.

Example

Let \(X_1, \ldots, X_n\) i.i.d. \(N(\mu, 1)\) with prior \(\mu \sim N(\mu_0, \tau^2)\). Find the posterior distribution and the Bayes estimator under squared error loss.

Posterior: \(\mu|\mathbf{x} \sim N(\mu_n, \tau_n^2)\) where: \[\tau_n^2 = \frac{1}{n + 1/\tau^2} = \frac{\tau^2}{n\tau^2 + 1}, \quad \mu_n = \tau_n^2\left(\frac{\mu_0}{\tau^2} + n\bar{x}\right) = \frac{\mu_0/\tau^2 + n\bar{x}}{1/\tau^2 + n}\] The Bayes estimator is \(\hat{\mu} = \mu_n\), a weighted average of the prior mean \(\mu_0\) and the sample mean \(\bar{x}\) with weights proportional to \(1/\tau^2\) and \(n\). As \(n \to \infty\), the weight on \(\bar{x}\) tends to 1, so \(\hat{\mu} - \bar{x} \to 0\) (the data dominate the prior).
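
The posterior formulas translate directly into code. This is a minimal sketch for the unit-variance model of this example; the function name `normal_posterior` and the input values are hypothetical.

```python
def normal_posterior(x_bar, n, mu0, tau2):
    """Posterior mean and variance of mu for N(mu, 1) data and a N(mu0, tau2) prior."""
    precision = n + 1.0 / tau2  # posterior precision = data precision + prior precision
    tau_n2 = 1.0 / precision
    mu_n = tau_n2 * (mu0 / tau2 + n * x_bar)
    return mu_n, tau_n2

mu_n, tau_n2 = normal_posterior(x_bar=2.5, n=4, mu0=0.0, tau2=1.0)
```

With these inputs the posterior is \(N(2.0, 0.2)\): the estimate is pulled from \(\bar{x} = 2.5\) toward the prior mean 0 with weight \(1/(n+1) = 0.2\) on the prior.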

Linear Models & Regression

Simple and multiple linear regression, Gauss-Markov theorem, ANOVA, and ANCOVA as special cases of the general linear model.

📈 Regression Theory

Definition — General Linear Model The general linear model is: \[\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I_n)\] where \(\mathbf{Y}\) is \(n \times 1\), \(X\) is \(n \times p\) (design matrix), \(\boldsymbol{\beta}\) is \(p \times 1\). The OLS estimator: \[\hat{\boldsymbol{\beta}} = (X^TX)^{-1}X^T\mathbf{Y}\] The hat matrix \(H = X(X^TX)^{-1}X^T\) gives fitted values \(\hat{\mathbf{Y}} = H\mathbf{Y}\). Residuals: \(\mathbf{e} = (I - H)\mathbf{Y}\).

Theorem — Gauss-Markov Under the assumptions \(E[\boldsymbol{\varepsilon}] = \mathbf{0}\) and \(\text{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 I\) (no normality required), the OLS estimator \(\hat{\boldsymbol{\beta}}\) is BLUE:

\(E[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}\) (unbiased)
For any other linear unbiased estimator \(\tilde{\boldsymbol{\beta}}\): \(\text{Var}(\mathbf{a}^T\tilde{\boldsymbol{\beta}}) \ge \text{Var}(\mathbf{a}^T\hat{\boldsymbol{\beta}})\) for all \(\mathbf{a}\)

\(\text{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2(X^TX)^{-1}\). The unbiased estimator of \(\sigma^2\) is \(s^2 = \frac{\|\mathbf{e}\|^2}{n-p} = \frac{SS_E}{n-p}\).

Definition — Simple Linear Regression For \(Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\): \[\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(Y_i - \bar{Y})}{\sum(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}\] \[R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} = r_{xy}^2\] where \(r_{xy}\) is the sample correlation. The F-test for \(H_0: \beta_1 = 0\) uses \(F = \frac{SS_R/1}{SS_E/(n-2)}\).
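
The closed-form estimates can be computed without any matrix algebra. The sketch below uses a small hypothetical data set; the function name `simple_linear_regression` is an assumption.

```python
def simple_linear_regression(x, y):
    """Return (beta0_hat, beta1_hat, r_squared) using beta1_hat = S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    ss_r = beta1 * s_xy  # regression sum of squares, equal to beta1^2 * S_xx
    return beta0, beta1, ss_r / ss_t

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
beta0, beta1, r_squared = simple_linear_regression(x, y)
n = len(x)
# F statistic for H0: beta1 = 0, written in terms of R^2.
f_stat = (r_squared / 1) / ((1 - r_squared) / (n - 2))
```

For these data \(\hat{\beta}_1 = 0.6\), \(\hat{\beta}_0 = 2.2\), \(R^2 = 0.6\) and \(F = 4.5\) on \((1, 3)\) degrees of freedom.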

🔬 ANOVA & ANCOVA as Linear Models

Theorem — ANOVA as a Linear Model The one-way ANOVA model \(Y_{ij} = \mu + \tau_i + \varepsilon_{ij}\) is a special case of the linear model with indicator (dummy) variables. The design matrix \(X\) has columns for the intercept and treatment indicators. The F-test for \(H_0: \tau_1 = \cdots = \tau_k = 0\) is equivalent to comparing the full model to the reduced model \(Y_{ij} = \mu + \varepsilon_{ij}\): \[F = \frac{(SS_E^{reduced} - SS_E^{full})/(p_{full} - p_{reduced})}{SS_E^{full}/(n - p_{full})}\]

Definition — Analysis of Covariance (ANCOVA) ANCOVA combines ANOVA and regression: \[Y_{ij} = \mu + \tau_i + \beta(x_{ij} - \bar{x}_{\cdot\cdot}) + \varepsilon_{ij}\] It adjusts treatment means for the effect of a continuous covariate \(x\). The adjusted treatment means are \(\bar{Y}_{i\cdot} - \hat{\beta}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})\). ANCOVA is more powerful than ANOVA when the covariate explains substantial variation.

Example

In a multiple regression with \(p = 3\) predictors and \(n = 20\), \(R^2 = 0.75\). Test the overall significance of the regression.

\(F = \frac{R^2/p}{(1-R^2)/(n-p-1)} = \frac{0.75/3}{0.25/16} = \frac{0.25}{0.015625} = 16\). With df \(= (3, 16)\), \(F_{0.05, 3, 16} = 3.24\). Since \(16 > 3.24\), the regression is highly significant. At least one predictor has a non-zero coefficient.
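
The same computation as a short script, with the tabulated critical value from the solution supplied by hand; the function name `overall_f_statistic` is hypothetical.

```python
def overall_f_statistic(r2, n, p):
    """F statistic for H0: all p slope coefficients are zero (model has an intercept)."""
    df1, df2 = p, n - p - 1
    f_stat = (r2 / df1) / ((1 - r2) / df2)
    return f_stat, df1, df2

f_stat, df1, df2 = overall_f_statistic(r2=0.75, n=20, p=3)
reject = f_stat > 3.24  # 3.24 is the tabulated F_{0.05, 3, 16} from the solution
```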

Key Takeaways

The Fisher-Neyman factorization theorem is the primary tool for identifying sufficient statistics; completeness ensures uniqueness of UMVUE via Lehmann-Scheffe.
Rao-Blackwell improves any unbiased estimator by conditioning on a sufficient statistic; the CRLB provides the efficiency benchmark.
The Neyman-Pearson lemma gives the most powerful test for simple hypotheses; MLR families yield UMP one-sided tests.
Pivotal quantities provide exact confidence intervals; Wilks' theorem gives large-sample confidence sets by inverting the LRT.
Conjugate priors lead to tractable Bayesian updates; Jeffreys' prior provides objective, reparametrization-invariant inference.
The Gauss-Markov theorem guarantees OLS is BLUE without normality; ANOVA and ANCOVA are special cases of the general linear model.

Practice Problems

Problem 1

For \(X_1, \ldots, X_n\) i.i.d. \(\text{Exp}(\theta)\) with rate \(\theta\) (density \(\theta e^{-\theta x}\), \(x > 0\)) and \(n \ge 2\), find a complete sufficient statistic and the UMVUE of \(\theta\).

Solution

\(f(\mathbf{x}|\theta) = \theta^n e^{-\theta\sum x_i}\) for \(x_i > 0\). By factorization, \(T = \sum X_i\) is sufficient, and since this is a one-parameter exponential family, \(T\) is also complete. \(E[\bar{X}] = 1/\theta\), so by Lehmann-Scheffe the UMVUE of \(1/\theta\) is \(\bar{X}\). For \(\theta\) itself: \(2\theta T \sim \chi^2_{2n}\) and \(E[1/\chi^2_\nu] = 1/(\nu - 2)\) for \(\nu > 2\), so \(E[1/T] = 2\theta/(2n - 2) = \theta/(n-1)\) when \(n \ge 2\). Hence \((n-1)/T = (n-1)/(n\bar{X})\) is unbiased and a function of \(T\), so it is the UMVUE of \(\theta\). The MLE \(1/\bar{X} = n/T\), by contrast, is biased upward.

Problem 2

For \(X \sim \text{Bin}(n, p)\), show that the family has MLR in \(X\) and find the UMP test for \(H_0: p \le p_0\) vs \(H_1: p > p_0\).

Solution

For \(p_1 > p_0\): \(\frac{f(x|p_1)}{f(x|p_0)} = \left(\frac{p_1(1-p_0)}{p_0(1-p_1)}\right)^x \cdot \left(\frac{1-p_1}{1-p_0}\right)^n\). Since \(p_1 > p_0\) implies \(\frac{p_1(1-p_0)}{p_0(1-p_1)} > 1\), the ratio is increasing in \(x\). So the family has MLR in \(X\). The UMP test rejects when \(X > c\) where \(c\) satisfies \(P_{p_0}(X > c) = \alpha\). For exact size \(\alpha\), randomization at \(X = c\) may be needed.
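
The critical value and the randomization probability can be found by accumulating the upper tail of the Bin(\(n, p_0\)) distribution. The sketch below is one way to do this; the function names and the example values \(n = 10\), \(p_0 = 0.5\), \(\alpha = 0.05\) are assumptions for illustration.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def ump_binomial_test(n, p0, alpha):
    """Return (c, gamma): reject if X > c, reject with probability gamma if X == c."""
    tail = 0.0  # running value of P_{p0}(X > c)
    for c in range(n, -1, -1):
        pmf_c = binom_pmf(c, n, p0)
        if tail + pmf_c > alpha:
            gamma = (alpha - tail) / pmf_c  # randomization probability at X = c
            return c, gamma
        tail += pmf_c
    return -1, 0.0  # only reached when alpha >= 1: always reject

c, gamma = ump_binomial_test(n=10, p0=0.5, alpha=0.05)
size = sum(binom_pmf(k, 10, 0.5) for k in range(c + 1, 11)) + gamma * binom_pmf(c, 10, 0.5)
```

For these values the test rejects outright when \(X \ge 9\) and with probability about 0.89 when \(X = 8\), which brings the size to exactly 0.05. Without randomization the size would be \(11/1024 \approx 0.011\).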

Problem 3

Derive Jeffreys' prior for the Bernoulli parameter \(p\) and identify the resulting posterior distribution given \(x\) successes in \(n\) trials.

Solution

Fisher information: \(I(p) = \frac{1}{p(1-p)}\). Jeffreys' prior: \(\pi(p) \propto \sqrt{I(p)} = [p(1-p)]^{-1/2}\), which is \(\text{Beta}(1/2, 1/2)\). Posterior: \(p|x \sim \text{Beta}(x + 1/2, n - x + 1/2)\). The posterior mean is \(\frac{x + 1/2}{n + 1}\), which shrinks toward 1/2 compared to the MLE \(x/n\).

Problem 4

In simple linear regression with \(n = 12\), \(\hat{\beta}_1 = 2.5\), \(SE(\hat{\beta}_1) = 0.6\). Construct a 95% confidence interval for \(\beta_1\) and test \(H_0: \beta_1 = 0\).

Solution

df = \(n - 2 = 10\). \(t_{0.025, 10} = 2.228\). CI: \(2.5 \pm 2.228 \times 0.6 = 2.5 \pm 1.337 = (1.163, 3.837)\). Test: \(t = 2.5/0.6 = 4.167\). Since \(|t| = 4.167 > 2.228\), reject \(H_0\) at \(\alpha = 0.05\). The slope is significantly different from zero.

Problem 5

Show that Basu's theorem implies \(\bar{X}\) and \(S^2\) are independent when sampling from \(N(\mu, \sigma^2)\) with \(\sigma^2\) known.

Solution

When \(\sigma^2\) is known, the family is \(\{N(\mu, \sigma^2): \mu \in \mathbb{R}\}\). \(\bar{X}\) is a complete sufficient statistic for \(\mu\) (one-parameter exponential family). \(S^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2\) has distribution \(\frac{\sigma^2}{n-1}\chi^2_{n-1}\), which does not depend on \(\mu\), so \(S^2\) is ancillary. By Basu's theorem, \(\bar{X}\) and \(S^2\) are independent. (When \(\sigma^2\) is also unknown, \(S^2\) is no longer ancillary for the full family, so Basu's theorem cannot be applied to \((\mu, \sigma^2)\) directly. The argument above still holds for every fixed \(\sigma^2\), so \(\bar{X}\) and \(S^2\) remain independent.)

Self-Assessment Quiz

1. The Lehmann-Scheffe theorem states that an unbiased estimator that is a function of a complete sufficient statistic is:

A Consistent

B The unique UMVUE

C The MLE

D Admissible

2. The MLE is invariant under reparametrization. If \(\hat{\theta}\) is the MLE of \(\theta\), then the MLE of \(e^\theta\) is:

A \(e^{\hat{\theta}}\)

B \(\hat{\theta} \cdot e\)

C Requires re-maximizing the likelihood

D Cannot be determined without the delta method

3. In the Neyman-Pearson framework, a UMP test for \(H_0: \theta \le \theta_0\) vs \(H_1: \theta > \theta_0\) exists when:

A The likelihood ratio test is used

B The family has monotone likelihood ratio

C The distribution is normal

D The sample size is sufficiently large

4. The Bayes estimator under squared error loss is the:

A Posterior mode

B Posterior median

C Posterior mean

D Prior mean

5. The Gauss-Markov theorem requires which of the following assumptions?

A Normality of errors

B Uncorrelated errors with constant variance

C Independent errors with known distribution

D i.i.d. errors