Statistical Inference
A rigorous treatment of the theory of statistical inference — from sufficiency and completeness to Bayesian methods and the general linear model.
Sufficiency & Completeness
Point Estimation
Hypothesis Testing
Confidence Intervals
Bayesian Inference
Linear Models
01
Sufficiency & Completeness
Data reduction through sufficient statistics, minimal sufficiency, completeness, and the exponential family.
Sufficient Statistics
Definition — Sufficient Statistic
A statistic \(T(\mathbf{X})\) is sufficient for \(\theta\) if the conditional distribution of \(\mathbf{X}\) given \(T(\mathbf{X}) = t\) does not depend on \(\theta\). Intuitively, \(T\) captures all the information in \(\mathbf{X}\) about \(\theta\).
Theorem — Fisher-Neyman Factorization
\(T(\mathbf{X})\) is sufficient for \(\theta\) if and only if the joint density/pmf can be factored as:
\[f(\mathbf{x}|\theta) = g(T(\mathbf{x}), \theta) \cdot h(\mathbf{x})\]
where \(g\) depends on \(\mathbf{x}\) only through \(T(\mathbf{x})\) and \(h\) does not depend on \(\theta\).
Definition — Minimal Sufficiency
A sufficient statistic \(T\) is minimal sufficient if it is a function of every other sufficient statistic. A convenient criterion: if, for all sample points \(\mathbf{x}, \mathbf{y}\), the ratio \(f(\mathbf{x}|\theta)/f(\mathbf{y}|\theta)\) is constant in \(\theta\) if and only if \(T(\mathbf{x}) = T(\mathbf{y})\), then \(T\) is minimal sufficient.
Definition — Completeness
A statistic \(T\) is complete for the family \(\{f_\theta\}\) if:
\[E_\theta[g(T)] = 0 \;\text{ for all }\theta \implies P_\theta(g(T) = 0) = 1 \;\text{ for all }\theta\]
In words, the only unbiased estimator of zero based on \(T\) is the zero function. A complete sufficient statistic is automatically minimal sufficient (provided a minimal sufficient statistic exists).
Theorem — Basu's Theorem
If \(T\) is a complete sufficient statistic and \(V\) is an ancillary statistic (its distribution does not depend on \(\theta\)), then \(T\) and \(V\) are independent.
Definition — Exponential Family
A family of distributions belongs to the k-parameter exponential family if:
\[f(x|\boldsymbol{\theta}) = h(x)c(\boldsymbol{\theta})\exp\!\left(\sum_{j=1}^k \eta_j(\boldsymbol{\theta})T_j(x)\right)\]
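The statistic \(\mathbf{T} = \left(\sum_{i=1}^n T_1(X_i), \ldots, \sum_{i=1}^n T_k(X_i)\right)\) is sufficient for \(\boldsymbol{\theta}\), and it is complete provided the set of natural parameters \(\{(\eta_1(\boldsymbol{\theta}), \ldots, \eta_k(\boldsymbol{\theta}))\}\) contains an open set in \(\mathbb{R}^k\) (a full-rank family).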
Example
For \(X_1, \ldots, X_n\) i.i.d. \(N(\mu, \sigma^2)\) with both parameters unknown, find a complete sufficient statistic.
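The joint density is \(\propto (\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum x_i^2 + \frac{\mu}{\sigma^2}\sum x_i - \frac{n\mu^2}{2\sigma^2}\right)\). This is a 2-parameter exponential family with natural sufficient statistics \(T_1 = \sum X_i\) and \(T_2 = \sum X_i^2\), and the natural parameters \((\mu/\sigma^2, -1/(2\sigma^2))\) range over an open subset of \(\mathbb{R}^2\), so \((T_1, T_2)\) is complete sufficient. Since \((\bar{X}, S^2)\) is a one-to-one function of \((T_1, T_2)\), it is also complete sufficient for \((\mu, \sigma^2)\). Basu's theorem cannot be applied directly to this two-parameter family, because \(S^2\) is not ancillary when \(\sigma^2\) is unknown; instead, fix \(\sigma^2\) and apply it to the family indexed by \(\mu\) alone (see Problem 5), which shows that \(\bar{X}\) and \(S^2\) are independent for every \((\mu, \sigma^2)\).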
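A small sketch, assuming plain Python and a made-up sample, of the one-to-one map between the natural sufficient statistics \((T_1, T_2)\) and \((\bar{X}, S^2)\) used above. The helper names are hypothetical; the identity \(S^2 = (T_2 - n\bar{X}^2)/(n-1)\) is exact algebra, though for large or badly scaled data it can lose precision to cancellation, where a two-pass variance is preferable.

```python
def normal_sufficient_stats(xs):
    """Natural sufficient statistics (T1, T2) = (sum x_i, sum x_i^2) for N(mu, sigma^2)."""
    return sum(xs), sum(x * x for x in xs)


def mean_and_sample_variance(t1, t2, n):
    """Recover (xbar, S^2) from (T1, T2); the map is one-to-one for fixed n >= 2."""
    xbar = t1 / n
    s2 = (t2 - n * xbar**2) / (n - 1)
    return xbar, s2


xs = [4.0, 7.0, 5.0, 8.0]                       # hypothetical sample
t1, t2 = normal_sufficient_stats(xs)
xbar, s2 = mean_and_sample_variance(t1, t2, len(xs))
print(t1, t2, xbar, s2)   # T1 = 24, T2 = 154, xbar = 6, S^2 = 10/3
```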
02
Point Estimation
UMVUE via Rao-Blackwell and Lehmann-Scheffe, method of moments, MLE, and Fisher information.
UMVUE & Optimality
Theorem — Rao-Blackwell
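If \(T\) is an unbiased estimator of \(\tau(\theta)\) and \(S\) is a sufficient statistic, then \(T^* = E[T|S]\) (a genuine statistic, since by sufficiency the conditional distribution given \(S\) does not involve \(\theta\)) satisfies: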
- \(E[T^*] = \tau(\theta)\) (unbiased)
- \(\text{Var}(T^*) \le \text{Var}(T)\) for all \(\theta\), with equality iff \(T\) is already a function of \(S\)
Theorem — Lehmann-Scheffe
If \(S\) is a complete sufficient statistic and \(T^* = g(S)\) is an unbiased estimator of \(\tau(\theta)\) with finite variance, then \(T^*\) is the unique (up to almost-sure equality) UMVUE (Uniformly Minimum Variance Unbiased Estimator) of \(\tau(\theta)\).
Theorem — Cramer-Rao Lower Bound
Under regularity conditions, for \(X_1, \ldots, X_n\) i.i.d. \(f(x|\theta)\) and any unbiased estimator \(T\) of \(\tau(\theta)\):
\[\text{Var}_\theta(T) \ge \frac{[\tau'(\theta)]^2}{nI(\theta)}\]
where \(I(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta}\ln f(X|\theta)\right)^2\right] = -E\!\left[\frac{\partial^2}{\partial\theta^2}\ln f(X|\theta)\right]\) is the Fisher information for a single observation. Equality holds iff \(T - \tau(\theta) = a(\theta)\frac{\partial}{\partial\theta}\ln f(\mathbf{x}|\theta)\).
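To make the two expressions for \(I(\theta)\) concrete, the sketch below evaluates both for a single Bernoulli(\(p\)) observation by summing over \(x \in \{0, 1\}\); the function names are illustrative. Both give \(1/(p(1-p))\), so the CRLB for unbiased estimators of \(p\) from \(n\) draws is \(p(1-p)/n\), which is exactly \(\text{Var}(\bar{X})\): the sample proportion attains the bound.

```python
def bernoulli_fisher_information(p):
    """Fisher information of one Bernoulli(p) observation, computed two ways."""
    pmf = {0: 1 - p, 1: p}

    def score(x):
        # d/dp log f(x|p) for f(x|p) = p^x (1 - p)^(1 - x)
        return x / p - (1 - x) / (1 - p)

    def second_derivative(x):
        # d^2/dp^2 log f(x|p)
        return -x / p**2 - (1 - x) / (1 - p) ** 2

    info_score = sum(pmf[x] * score(x) ** 2 for x in (0, 1))          # E[score^2]
    info_curv = -sum(pmf[x] * second_derivative(x) for x in (0, 1))   # -E[second derivative]
    return info_score, info_curv


def crlb_for_p(p, n):
    """CRLB for unbiased estimators of tau(p) = p from n i.i.d. Bernoulli(p) draws."""
    info, _ = bernoulli_fisher_information(p)
    return 1.0 / (n * info)


p, n = 0.3, 50
print(bernoulli_fisher_information(p))    # both approximately 1/0.21 = 4.7619...
print(crlb_for_p(p, n), p * (1 - p) / n)  # the bound equals Var(X-bar)
```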
MLE & Method of Moments
Definition — Maximum Likelihood Estimator
The MLE \(\hat{\theta}_{MLE} = \arg\max_\theta L(\theta|\mathbf{x})\) where \(L(\theta|\mathbf{x}) = \prod_{i=1}^n f(x_i|\theta)\). Properties of the MLE (under regularity conditions):
- Consistency: \(\hat{\theta}_n \xrightarrow{P} \theta_0\)
- Asymptotic normality: \(\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, 1/I(\theta_0))\)
- Asymptotic efficiency: achieves the CRLB asymptotically
- Invariance: if \(\hat{\theta}\) is the MLE of \(\theta\), then \(g(\hat{\theta})\) is the MLE of \(g(\theta)\)
Definition — Method of Moments
Equate population moments \(\mu_k' = E[X^k]\) to sample moments \(m_k' = \frac{1}{n}\sum X_i^k\) and solve for parameters. For \(k\) parameters, use the first \(k\) moment equations. Method of moments estimators are consistent but generally less efficient than MLEs.
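As an illustration not taken from the text, the Gamma(\(\alpha, \beta\)) family (shape-rate form) shows the method of moments with \(k = 2\): matching \(E[X] = \alpha/\beta\) and \(E[X^2] = \alpha/\beta^2 + (\alpha/\beta)^2\) to \(m_1'\) and \(m_2'\) gives closed-form estimates, whereas the Gamma MLE of \(\alpha\) has no closed form and must be found numerically. A minimal sketch, with a hypothetical function name:

```python
def gamma_method_of_moments(xs):
    """Method-of-moments estimates (shape alpha, rate beta) for a Gamma sample.

    E[X] = alpha/beta and Var(X) = alpha/beta^2, so solving the first two
    moment equations gives alpha = m1^2/(m2 - m1^2), beta = m1/(m2 - m1^2).
    """
    n = len(xs)
    m1 = sum(xs) / n                    # first sample moment m_1'
    m2 = sum(x * x for x in xs) / n     # second raw sample moment m_2'
    v = m2 - m1 * m1                    # divisor-n sample variance
    if v <= 0:
        raise ValueError("sample variance must be positive")
    return m1 * m1 / v, m1 / v


print(gamma_method_of_moments([1.0, 2.0, 3.0, 4.0]))  # (5.0, 2.0)
```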
Example
For \(X_1, \ldots, X_n \sim \text{Poisson}(\lambda)\), find the UMVUE of \(P(X = 0) = e^{-\lambda}\).
A complete sufficient statistic is \(T = \sum X_i \sim \text{Poisson}(n\lambda)\). We need an unbiased estimator of \(e^{-\lambda}\) that is a function of \(T\). Start from the crude unbiased estimator \(I(X_1 = 0)\), since \(E[I(X_1 = 0)] = P(X_1 = 0) = e^{-\lambda}\), and Rao-Blackwellize it.
Consider \(\delta(T) = E[I(X_1 = 0)|T = t] = \frac{P(X_1=0, \sum_{i=2}^n X_i = t)}{P(T = t)}\), where \(X_1\) and \(\sum_{i=2}^n X_i \sim \text{Poisson}((n-1)\lambda)\) are independent, so
\(= \frac{e^{-\lambda} \cdot e^{-(n-1)\lambda}[(n-1)\lambda]^t/t!}{e^{-n\lambda}(n\lambda)^t/t!} = \left(\frac{n-1}{n}\right)^t = \left(1 - \frac{1}{n}\right)^t\).
By Lehmann-Scheffe, \(\delta(T) = (1 - 1/n)^T\) is the UMVUE of \(e^{-\lambda}\).
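The sketch below checks this example numerically under assumed values \(\lambda = 2\), \(n = 10\) (chosen only for illustration). It sums the Poisson(\(n\lambda\)) pmf to confirm \(E[(1-1/n)^T] = e^{-\lambda}\) and compares three variances: the crude estimator \(I(X_1 = 0)\), its Rao-Blackwellization, and the CRLB \(\lambda e^{-2\lambda}/n\) for \(\tau(\lambda) = e^{-\lambda}\) (using \(I(\lambda) = 1/\lambda\)). Since \(\text{Var}((1-1/n)^T) = e^{-2\lambda}(e^{\lambda/n} - 1)\) and \(e^x - 1 > x\) for \(x > 0\), the UMVUE here has variance strictly above the CRLB, so the bound is not attained. Function names are hypothetical.

```python
import math


def poisson_pmf(t, mean):
    """P(T = t) for T ~ Poisson(mean), computed on the log scale for stability."""
    return math.exp(-mean + t * math.log(mean) - math.lgamma(t + 1))


def umvue_check(lam, n, t_max=500):
    """Moments of the UMVUE (1 - 1/n)^T of exp(-lam), plus comparison variances."""
    s = 1 - 1 / n
    m = n * lam                                    # T = sum X_i ~ Poisson(n * lam)
    pmf = [poisson_pmf(t, m) for t in range(t_max)]
    mean_umvue = sum(s**t * p for t, p in enumerate(pmf))
    second_moment = sum(s ** (2 * t) * p for t, p in enumerate(pmf))
    var_umvue = second_moment - mean_umvue**2
    # Closed form: E[(s^2)^T] = exp(n*lam*(s^2 - 1)), giving e^{-2 lam}(e^{lam/n} - 1)
    var_closed = math.exp(-2 * lam) * (math.exp(lam / n) - 1)
    # The crude unbiased estimator I(X_1 = 0) is Bernoulli(e^{-lam})
    var_naive = math.exp(-lam) * (1 - math.exp(-lam))
    # CRLB for tau(lam) = e^{-lam}: tau'(lam)^2 / (n * I(lam)) with I(lam) = 1/lam
    crlb = lam * math.exp(-2 * lam) / n
    return mean_umvue, var_umvue, var_closed, var_naive, crlb


lam, n = 2.0, 10
mean_umvue, var_umvue, var_closed, var_naive, crlb = umvue_check(lam, n)
print(mean_umvue, math.exp(-lam))      # agree up to rounding error
print(var_naive, var_umvue, crlb)      # var_naive > var_umvue > crlb
```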
03
Testing of Hypotheses
Neyman-Pearson theory, UMP tests, likelihood ratio tests, and error analysis.
Neyman-Pearson Framework
Theorem — Neyman-Pearson Lemma
For testing simple hypotheses \(H_0: \theta = \theta_0\) vs \(H_1: \theta = \theta_1\), the most powerful test of size \(\alpha\) has the rejection region:
\[\left\{\mathbf{x}: \frac{L(\theta_1|\mathbf{x})}{L(\theta_0|\mathbf{x})} > k\right\}\]
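where \(k\) is chosen so that \(P_{\theta_0}(\text{reject}) = \alpha\) (for discrete data, randomizing on the boundary \(\{L(\theta_1|\mathbf{x})/L(\theta_0|\mathbf{x}) = k\}\) may be needed to attain size exactly \(\alpha\)). This test maximizes the power \(\beta(\theta_1) = P_{\theta_1}(\text{reject})\) among all tests of size \(\le \alpha\).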
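A sketch of the lemma in an assumed setting: \(N(\mu, \sigma^2)\) data with known \(\sigma\) and \(\mu_1 > \mu_0\). There the likelihood ratio is increasing in \(\bar{x}\), so \(\{L(\mu_1)/L(\mu_0) > k\}\) is the event \(\{\bar{x} > c\}\), and \(c\) is set from the \(N(\mu_0, \sigma^2/n)\) distribution of \(\bar{X}\). It uses Python's standard-library `statistics.NormalDist`; the function name and the numbers in the usage line are illustrative.

```python
from statistics import NormalDist


def np_test_normal_mean(mu0, mu1, sigma, n, alpha):
    """Most powerful size-alpha test of H0: mu = mu0 vs H1: mu = mu1, with mu1 > mu0.

    For mu1 > mu0 the likelihood ratio L(mu1)/L(mu0) is increasing in xbar,
    so the Neyman-Pearson region {LR > k} is {xbar > c}. Returns (c, size, power).
    """
    if mu1 <= mu0:
        raise ValueError("this sketch assumes mu1 > mu0")
    se = sigma / n**0.5                               # standard deviation of xbar
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * se    # critical value for xbar
    size = 1 - NormalDist(mu0, se).cdf(c)             # P_{mu0}(reject)
    power = 1 - NormalDist(mu1, se).cdf(c)            # P_{mu1}(reject)
    return c, size, power


c, size, power = np_test_normal_mean(mu0=0.0, mu1=1.0, sigma=2.0, n=16, alpha=0.05)
print(c, size, power)
```

With these numbers the size is 0.05 up to rounding and the power is about 0.64.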
Definition — UMP Tests & Monotone Likelihood Ratio
A family \(\{f(x|\theta)\}\) has monotone likelihood ratio (MLR) in \(T(x)\) if \(f(x|\theta_1)/f(x|\theta_0)\) is non-decreasing in \(T(x)\) whenever \(\theta_1 > \theta_0\). For MLR families:
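- The test rejecting when \(T > c\), with \(c\) chosen so that \(P_{\theta_0}(T > c) = \alpha\) (randomizing at \(T = c\) if \(T\) is discrete), is UMP of size \(\alpha\) for \(H_0: \theta \le \theta_0\) vs \(H_1: \theta > \theta_0\)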
- The power function \(\beta(\theta)\) is non-decreasing
- No UMP test exists for two-sided alternatives in general
Definition — Unbiased Tests
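A level-\(\alpha\) test \(\phi\) is unbiased if its power function satisfies \(\beta(\theta) \le \alpha\) for all \(\theta \in \Theta_0\) and \(\beta(\theta) \ge \alpha\) for all \(\theta \in \Theta_1\). For testing \(H_0: \theta = \theta_0\) vs \(H_1: \theta \neq \theta_0\) in a one-parameter exponential family, a UMP unbiased (UMPU) test exists and rejects when \(T < c_1\) or \(T > c_2\), with \(c_1, c_2\) chosen so that the test has size \(\alpha\) and its power function has zero derivative at \(\theta_0\).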
Likelihood Ratio Tests
Theorem — Generalized Likelihood Ratio Test & Wilks' Theorem
The generalized likelihood ratio statistic is:
\[\Lambda = \frac{\sup_{\theta \in \Theta_0}L(\theta|\mathbf{x})}{\sup_{\theta \in \Theta}L(\theta|\mathbf{x})}\]
Reject \(H_0\) when \(\Lambda \le c_\alpha\). Under \(H_0\) and regularity conditions (Wilks' theorem):
\[-2\ln\Lambda \xrightarrow{d} \chi^2_r\]
where \(r = \dim(\Theta) - \dim(\Theta_0)\). This provides a large-sample test for composite hypotheses.
Definition — Power Function & Error Types
The power function is \(\beta(\theta) = P_\theta(\text{reject } H_0)\).
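- Size (maximum Type I error probability): \(\alpha = \sup_{\theta \in \Theta_0}\beta(\theta)\), where a Type I error means rejecting a true \(H_0\)
- Type II error probability at a specific \(\theta_1 \in \Theta_1\): \(1 - \beta(\theta_1)\), where a Type II error means failing to reject a false \(H_0\)
- p-value: the smallest \(\alpha\) at which the test rejects. For a test that rejects for large values of \(T\) and a simple null \(H_0: \theta = \theta_0\), \(p = P_{\theta_0}(T \ge t_{obs})\); for a composite null, take the supremum of this probability over \(\Theta_0\)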
Example
For \(X_1, \ldots, X_n\) i.i.d. \(N(\mu, \sigma_0^2)\) with known \(\sigma_0^2\), derive the LRT for \(H_0: \mu = \mu_0\) vs \(H_1: \mu \neq \mu_0\).
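Under \(H_0\): \(L(\mu_0) = \prod \phi((x_i - \mu_0)/\sigma_0)/\sigma_0\), where \(\phi\) is the standard normal density. Over \(\Theta\), the MLE is \(\hat{\mu} = \bar{x}\). Using \(\sum(x_i - \mu_0)^2 = \sum(x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2\), the ratio simplifies to \(\Lambda = \exp\left(-\frac{n(\bar{x} - \mu_0)^2}{2\sigma_0^2}\right)\), so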
\[-2\ln\Lambda = \frac{n(\bar{x} - \mu_0)^2}{\sigma_0^2} = z^2\]
where \(z = \sqrt{n}(\bar{x} - \mu_0)/\sigma_0 \sim N(0,1)\) under \(H_0\). So \(-2\ln\Lambda \sim \chi^2_1\) exactly here, with no large-sample approximation needed. Reject when \(|z| > z_{\alpha/2}\). This is the standard two-sided z-test.
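As a numerical check of this example, the sketch below computes \(-2\ln\Lambda\) directly from the two maximized log-likelihoods and compares it with \(z^2\). It also reports the two-sided p-value \(2(1 - \Phi(|z|))\), which equals the upper-tail \(\chi^2_1\) probability of \(z^2\). The data values and function names are made up for illustration.

```python
import math
from statistics import NormalDist


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of i.i.d. N(mu, sigma^2) observations."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in xs
    )


def lrt_known_variance(xs, mu0, sigma0):
    """Return (-2 ln Lambda, z, two-sided p-value) for H0: mu = mu0 with sigma0 known."""
    n = len(xs)
    xbar = sum(xs) / n                                  # unrestricted MLE of mu
    stat = -2 * (normal_loglik(xs, mu0, sigma0) - normal_loglik(xs, xbar, sigma0))
    z = math.sqrt(n) * (xbar - mu0) / sigma0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return stat, z, p_value


xs = [10.2, 9.8, 11.1, 10.5, 10.9]
stat, z, p_value = lrt_known_variance(xs, mu0=10.0, sigma0=0.5)
print(stat, z * z, p_value)   # stat equals z^2 = 5 up to rounding; p is about 0.025
```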
04
Confidence Intervals
Pivotal quantities, exact and large-sample confidence intervals, and confidence regions.
Pivotal Quantities & Exact Intervals
Definition — Pivotal Quantity
A random variable \(Q(\mathbf{X}, \theta)\) is a pivot if its distribution does not depend on \(\theta\) or any other unknown parameters. If \(P(a \le Q \le b) = 1 - \alpha\), then the set \(\{\theta : a \le Q(\mathbf{X}, \theta) \le b\}\) has coverage \(1 - \alpha\); when \(Q\) is monotone in \(\theta\), this set is an interval, giving a \((1-\alpha)\) confidence interval for \(\theta\).
Theorem — Exact Confidence Intervals for Normal Populations
For \(X_1, \ldots, X_n\) i.i.d. \(N(\mu, \sigma^2)\):
- CI for \(\mu\) (\(\sigma^2\) known): Pivot \(Z = \sqrt{n}(\bar{X}-\mu)/\sigma \sim N(0,1)\). CI: \(\bar{X} \pm z_{\alpha/2}\sigma/\sqrt{n}\).
- CI for \(\mu\) (\(\sigma^2\) unknown): Pivot \(T = \sqrt{n}(\bar{X}-\mu)/S \sim t_{n-1}\). CI: \(\bar{X} \pm t_{\alpha/2,n-1} S/\sqrt{n}\).
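- CI for \(\sigma^2\): Pivot \(\chi^2 = (n-1)S^2/\sigma^2 \sim \chi^2_{n-1}\). CI: \(\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2, n-1}},\; \frac{(n-1)S^2}{\chi^2_{1-\alpha/2, n-1}}\right)\), where \(\chi^2_{q, n-1}\) denotes the upper-\(q\) critical value of \(\chi^2_{n-1}\) (the same upper-tail convention applies to \(z_{\alpha/2}\) and \(t_{\alpha/2, n-1}\)).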
Definition — Confidence Interval for Difference of Means
For two independent normal samples with unknown but equal variances:
\[(\bar{X} - \bar{Y}) \pm t_{\alpha/2, n_1+n_2-2} \cdot S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\]
where \(S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}\) is the pooled variance.
For unequal variances, the Welch interval \((\bar{X} - \bar{Y}) \pm t_{\alpha/2, \nu}\sqrt{S_1^2/n_1 + S_2^2/n_2}\) is used, with the Welch-Satterthwaite approximate df \(\nu = \frac{(S_1^2/n_1 + S_2^2/n_2)^2}{(S_1^2/n_1)^2/(n_1-1) + (S_2^2/n_2)^2/(n_2-1)}\).
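A sketch of the pooled variance and the Welch quantities, with hypothetical helper names. For the Welch interval the half-width is \(t_{\alpha/2, \nu}\sqrt{S_1^2/n_1 + S_2^2/n_2}\). A useful sanity check: \(\nu\) always lies between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\), reaching the upper end when \(n_1 = n_2\) and \(S_1^2 = S_2^2\).

```python
import math


def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled variance S_p^2 for the equal-variance two-sample interval."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def welch_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite approximate degrees of freedom nu."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


def welch_se(s1_sq, n1, s2_sq, n2):
    """Standard error sqrt(S1^2/n1 + S2^2/n2) used in the Welch interval."""
    return math.sqrt(s1_sq / n1 + s2_sq / n2)


print(welch_df(1.0, 10, 1.0, 10))   # approximately 18 = n1 + n2 - 2
print(welch_df(4.0, 10, 1.0, 30))   # between 9 and 38
print(pooled_variance(4.0, 10, 1.0, 30), welch_se(4.0, 10, 1.0, 30))
```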
Example
From a sample of size \(n = 25\) from \(N(\mu, \sigma^2)\), \(\bar{x} = 50\) and \(s = 8\). Construct a 95% CI for \(\mu\) and a 95% CI for \(\sigma^2\).
For \(\mu\): \(t_{0.025, 24} = 2.064\). CI: \(50 \pm 2.064 \times 8/\sqrt{25} = 50 \pm 3.30 = (46.70, 53.30)\).
For \(\sigma^2\): \(\chi^2_{0.025, 24} = 39.364\), \(\chi^2_{0.975, 24} = 12.401\). CI: \(\left(\frac{24 \times 64}{39.364},\; \frac{24 \times 64}{12.401}\right) = (39.02, 123.86)\).
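The example can be reproduced from its summary statistics with the sketch below (hypothetical function name). To avoid assuming any statistics library, the critical values are passed in from the table values quoted above; if SciPy is available, `scipy.stats.t.ppf(0.975, 24)` and `scipy.stats.chi2.ppf(q, 24)` should return the same quantiles. Note that `ppf` takes a lower-tail probability, so the upper-tail \(\chi^2_{0.025, 24}\) corresponds to `chi2.ppf(0.975, 24)`.

```python
import math


def normal_cis(n, xbar, s, t_crit, chi2_upper, chi2_lower):
    """Confidence intervals for mu and sigma^2 from summary statistics.

    Critical values are supplied by the caller (e.g. from a table):
    t_crit = t_{alpha/2, n-1}, chi2_upper = chi^2_{alpha/2, n-1},
    chi2_lower = chi^2_{1-alpha/2, n-1} (upper-tail convention).
    """
    half_width = t_crit * s / math.sqrt(n)
    ci_mu = (xbar - half_width, xbar + half_width)
    ss = (n - 1) * s**2                     # (n - 1) S^2
    ci_var = (ss / chi2_upper, ss / chi2_lower)
    return ci_mu, ci_var


ci_mu, ci_var = normal_cis(25, 50.0, 8.0, t_crit=2.064, chi2_upper=39.364, chi2_lower=12.401)
print([round(v, 2) for v in ci_mu], [round(v, 2) for v in ci_var])
# [46.7, 53.3] [39.02, 123.86]
```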
05
Bayesian Inference
Prior distributions, posterior analysis, Bayes estimators, credible intervals, and predictive distributions.
The Bayesian Framework
Theorem — Bayes' Theorem (Continuous Parameter)
Given prior \(\pi(\theta)\) and likelihood \(f(\mathbf{x}|\theta)\), the posterior is:
\[\pi(\theta|\mathbf{x}) = \frac{f(\mathbf{x}|\theta)\pi(\theta)}{m(\mathbf{x})} \propto f(\mathbf{x}|\theta)\pi(\theta)\]
where \(m(\mathbf{x}) = \int f(\mathbf{x}|\theta)\pi(\theta)\,d\theta\) is the marginal likelihood (normalizing constant).
Definition — Conjugate & Non-Informative Priors
- Conjugate prior: posterior belongs to the same family as the prior. Key pairs (Gamma in shape-rate form, Exponential parametrized by its rate):
  - Binomial \(+\) Beta(\(\alpha,\beta\)) \(\to\) Beta(\(\alpha+x, \beta+n-x\))
  - Normal (known \(\sigma^2\)) \(+\) Normal \(\to\) Normal posterior
  - Poisson \(+\) Gamma(\(\alpha,\beta\)) \(\to\) Gamma(\(\alpha+\sum x_i, \beta+n\))
  - Exponential \(+\) Gamma(\(\alpha,\beta\)) \(\to\) Gamma(\(\alpha+n, \beta+\sum x_i\))
- Jeffreys' prior: \(\pi(\theta) \propto \sqrt{I(\theta)}\), invariant under reparametrization.
- Non-informative (flat) prior: \(\pi(\theta) \propto 1\), may be improper.
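The conjugate updates listed above reduce to parameter arithmetic, sketched below with hypothetical function names. Gamma parameters are in shape-rate form and the Exponential is parametrized by its rate, matching the formulas above; the normal-normal pair is not included here.

```python
def beta_binomial_update(a, b, x, n):
    """Beta(a, b) prior + x successes in n Bernoulli trials -> Beta posterior."""
    return a + x, b + n - x


def gamma_poisson_update(shape, rate, counts):
    """Gamma(shape, rate) prior + i.i.d. Poisson counts -> Gamma posterior."""
    return shape + sum(counts), rate + len(counts)


def gamma_exponential_update(shape, rate, times):
    """Gamma(shape, rate) prior + i.i.d. Exponential(rate theta) data -> Gamma posterior."""
    return shape + len(times), rate + sum(times)


print(beta_binomial_update(0.5, 0.5, 7, 10))            # Jeffreys prior: (7.5, 3.5)
print(gamma_poisson_update(2.0, 1.0, [3, 1, 4]))         # (10.0, 4.0)
print(gamma_exponential_update(2.0, 1.0, [0.5, 1.5]))    # (4.0, 3.0)
```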
Theorem — Bayes Estimators Under Different Loss Functions
- Squared error loss \(L(\theta, a) = (\theta-a)^2\): Bayes estimator = posterior mean \(E[\theta|\mathbf{x}]\).
- Absolute error loss \(L(\theta, a) = |\theta-a|\): Bayes estimator = posterior median.
- 0-1 loss \(L(\theta, a) = I(|\theta - a| > \epsilon)\): for continuous \(\theta\), in the limit \(\epsilon \to 0\), the Bayes estimator = posterior mode (MAP estimator).
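One way to see the first two results is to minimize posterior expected loss directly. The sketch approximates a posterior by a small set of equally weighted draws (made-up numbers, purely illustrative) and searches a grid of actions; squared error loss picks out the mean of the draws and absolute error loss picks out their median. All names are hypothetical.

```python
def expected_loss(draws, action, loss):
    """Posterior expected loss, approximating the posterior by equally weighted draws."""
    return sum(loss(theta, action) for theta in draws) / len(draws)


def bayes_action(draws, loss, grid):
    """Grid point minimizing the approximate posterior expected loss."""
    return min(grid, key=lambda a: expected_loss(draws, a, loss))


def squared_loss(theta, a):
    return (theta - a) ** 2


def absolute_loss(theta, a):
    return abs(theta - a)


draws = [1.0, 2.0, 2.0, 3.0, 10.0]         # hypothetical posterior draws
grid = [i / 100 for i in range(1201)]      # candidate actions 0.00, 0.01, ..., 12.00

print(bayes_action(draws, squared_loss, grid))    # 3.6, the mean of the draws
print(bayes_action(draws, absolute_loss, grid))   # 2.0, the median of the draws
```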
Definition — Credible Intervals & Bayes Factors
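A \((1-\alpha)\) credible interval \((a, b)\) satisfies \(P(\theta \in (a,b)|\mathbf{x}) = 1-\alpha\). The Highest Posterior Density (HPD) region \(\{\theta : \pi(\theta|\mathbf{x}) \ge k\}\), with \(k\) chosen to give posterior probability \(1-\alpha\), is the shortest such region; for a unimodal posterior it is an interval.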
The Bayes factor for comparing \(H_0: \theta \in \Theta_0\) vs \(H_1: \theta \in \Theta_1\) is: \[B_{01} = \frac{m_0(\mathbf{x})}{m_1(\mathbf{x})} = \frac{\int_{\Theta_0} f(\mathbf{x}|\theta)\pi_0(\theta)\,d\theta}{\int_{\Theta_1} f(\mathbf{x}|\theta)\pi_1(\theta)\,d\theta}\] \(B_{01} > 1\) favors \(H_0\); \(B_{01} < 1\) favors \(H_1\).
Definition — Predictive Distribution
The posterior predictive distribution of a future observation \(X_{n+1}\) given data \(\mathbf{x}\) is:
\[f(x_{n+1}|\mathbf{x}) = \int f(x_{n+1}|\theta)\pi(\theta|\mathbf{x})\,d\theta\]
This averages the sampling density over the posterior, so predictions reflect parameter uncertainty as well as sampling variability.
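For the Beta-Binomial pair, the predictive probability that the next trial is a success is \(\int_0^1 p\,\pi(p|\mathbf{x})\,dp = E[p|\mathbf{x}] = a/(a+b)\) for a Beta(\(a, b\)) posterior. The sketch below evaluates that integral with a simple midpoint rule and compares it with the closed form; the posterior Beta(3, 5) is an assumed example (a uniform prior with 2 successes in 6 trials), and the function names are hypothetical.

```python
import math


def beta_pdf(p, a, b):
    """Beta(a, b) density on (0, 1), with the normalizing constant via log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def predictive_success_prob(a, b, m=20000):
    """P(X_{n+1} = 1 | x) = integral of p * pi(p | x) dp, by the midpoint rule on (0, 1)."""
    h = 1.0 / m
    total = 0.0
    for k in range(m):
        p = (k + 0.5) * h                      # midpoint of the k-th subinterval
        total += p * beta_pdf(p, a, b) * h     # f(x_{n+1} = 1 | p) = p
    return total


a_post, b_post = 3.0, 5.0    # assumed posterior: Beta(1, 1) prior, 2 successes in 6 trials
print(predictive_success_prob(a_post, b_post), a_post / (a_post + b_post))
```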
Example
Let \(X_1, \ldots, X_n\) i.i.d. \(N(\mu, 1)\) with prior \(\mu \sim N(\mu_0, \tau^2)\). Find the posterior distribution and the Bayes estimator under squared error loss.
Posterior: \(\mu|\mathbf{x} \sim N(\mu_n, \tau_n^2)\) where:
\[\tau_n^2 = \frac{1}{n + 1/\tau^2} = \frac{\tau^2}{n\tau^2 + 1}, \quad \mu_n = \tau_n^2\left(\frac{\mu_0}{\tau^2} + n\bar{x}\right) = \frac{\mu_0/\tau^2 + n\bar{x}}{1/\tau^2 + n}\]
The Bayes estimator is \(\hat{\mu} = \mu_n\), a weighted average of prior mean \(\mu_0\) and sample mean \(\bar{x}\) with weights \(1/\tau^2\) and \(n\). As \(n \to \infty\), \(\hat{\mu} \to \bar{x}\) (data dominates).
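A direct transcription of these formulas, as a sketch with a hypothetical function name. It generalizes the text slightly by allowing a known sampling variance `sigma2` (an assumption; the text uses 1), in which case the data precision \(n\) becomes \(n/\sigma^2\).

```python
def normal_normal_posterior(mu0, tau2, xbar, n, sigma2=1.0):
    """Posterior N(mu_n, tau_n^2) for mu, given n obs from N(mu, sigma2) and prior N(mu0, tau2)."""
    precision = n / sigma2 + 1 / tau2            # posterior precision 1 / tau_n^2
    tau_n2 = 1 / precision
    mu_n = tau_n2 * (mu0 / tau2 + n * xbar / sigma2)
    return mu_n, tau_n2


print(normal_normal_posterior(mu0=0.0, tau2=1.0, xbar=2.0, n=4))   # about (1.6, 0.2)
```

With prior variance 1 and four observations, the data receive weight 4 and the prior weight 1, so the posterior mean \(1.6\) sits four fifths of the way from \(\mu_0 = 0\) to \(\bar{x} = 2\).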
06
Linear Models & Regression
Simple and multiple linear regression, Gauss-Markov theorem, ANOVA, and ANCOVA as special cases of the general linear model.
Regression Theory
Definition — General Linear Model
The general linear model is:
\[\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I_n)\]
where \(\mathbf{Y}\) is \(n \times 1\), \(X\) is the \(n \times p\) design matrix (assumed to have full column rank \(p\), so that \(X^TX\) is invertible), and \(\boldsymbol{\beta}\) is \(p \times 1\). The OLS estimator:
\[\hat{\boldsymbol{\beta}} = (X^TX)^{-1}X^T\mathbf{Y}\]
The hat matrix \(H = X(X^TX)^{-1}X^T\) gives fitted values \(\hat{\mathbf{Y}} = H\mathbf{Y}\). Residuals: \(\mathbf{e} = (I - H)\mathbf{Y}\).
Theorem — Gauss-Markov
Under the assumptions \(E[\boldsymbol{\varepsilon}] = \mathbf{0}\) and \(\text{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 I\) (no normality required), the OLS estimator \(\hat{\boldsymbol{\beta}}\) is BLUE:
- \(E[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}\) (unbiased)
- For any other linear unbiased estimator \(\tilde{\boldsymbol{\beta}}\): \(\text{Var}(\mathbf{a}^T\tilde{\boldsymbol{\beta}}) \ge \text{Var}(\mathbf{a}^T\hat{\boldsymbol{\beta}})\) for all \(\mathbf{a}\)
Definition — Simple Linear Regression
For \(Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\):
\[\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(Y_i - \bar{Y})}{\sum(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}\]
\[R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} = r_{xy}^2\]
where \(r_{xy}\) is the sample correlation. The F-test for \(H_0: \beta_1 = 0\) uses \(F = \frac{SS_R/1}{SS_E/(n-2)}\).
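The formulas above translate directly into code. The sketch below (hypothetical function name, made-up data) returns \(\hat{\beta}_0\), \(\hat{\beta}_1\), \(R^2\), and the F statistic, using \(SS_R = \hat{\beta}_1 S_{xy} = \hat{\beta}_1^2 S_{xx}\).

```python
def simple_linear_regression(xs, ys):
    """OLS fit of y = b0 + b1*x; returns (b0, b1, R^2, F for H0: b1 = 0)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_t = sum((y - ybar) ** 2 for y in ys)   # total sum of squares
    ss_r = b1 * sxy                           # regression SS = b1^2 * Sxx
    ss_e = ss_t - ss_r                        # residual SS
    r2 = ss_r / ss_t
    f_stat = (ss_r / 1) / (ss_e / (n - 2))
    return b0, b1, r2, f_stat


print(simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
# approximately (2.2, 0.6, 0.6, 4.5)
```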
ANOVA & ANCOVA as Linear Models
Theorem — ANOVA as a Linear Model
The one-way ANOVA model \(Y_{ij} = \mu + \tau_i + \varepsilon_{ij}\) is a special case of the linear model with indicator (dummy) variables. The design matrix \(X\) has columns for the intercept and treatment indicators; since the indicators sum to the intercept column, an identifiability constraint such as \(\sum \tau_i = 0\) (or dropping one indicator) is needed, leaving \(k\) free mean parameters. The F-test for \(H_0: \tau_1 = \cdots = \tau_k = 0\) is equivalent to comparing the full model to the reduced model \(Y_{ij} = \mu + \varepsilon_{ij}\):
\[F = \frac{(SS_E^{reduced} - SS_E^{full})/(p_{full} - p_{reduced})}{SS_E^{full}/(n - p_{full})}\]
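The full-versus-reduced comparison can be computed without building a design matrix, because the full model's fitted values are the group means and the reduced model's fitted value is the grand mean. In the sketch below (hypothetical function name, toy data), \(p_{full} = k\) is the rank of the one-way design matrix and \(p_{reduced} = 1\).

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic as a full-vs-reduced linear model comparison."""
    all_y = [y for g in groups for y in g]
    n, k = len(all_y), len(groups)
    grand_mean = sum(all_y) / n
    # Reduced model Y = mu + eps: every fitted value is the grand mean
    sse_reduced = sum((y - grand_mean) ** 2 for y in all_y)
    # Full model Y = mu + tau_i + eps: fitted values are the group means
    sse_full = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    p_full, p_reduced = k, 1
    return ((sse_reduced - sse_full) / (p_full - p_reduced)) / (sse_full / (n - p_full))


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```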
Definition — Analysis of Covariance (ANCOVA)
ANCOVA combines ANOVA and regression:
\[Y_{ij} = \mu + \tau_i + \beta(x_{ij} - \bar{x}_{\cdot\cdot}) + \varepsilon_{ij}\]
It adjusts treatment means for the effect of a continuous covariate \(x\). The adjusted treatment means are \(\bar{Y}_{i\cdot} - \hat{\beta}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})\). ANCOVA is more powerful than ANOVA when the covariate explains substantial variation.
Example
In a multiple regression with \(p = 3\) predictors and \(n = 20\), \(R^2 = 0.75\). Test the overall significance of the regression.
\(F = \frac{R^2/p}{(1-R^2)/(n-p-1)} = \frac{0.75/3}{0.25/16} = \frac{0.25}{0.015625} = 16\).
With df \(= (3, 16)\), \(F_{0.05, 3, 16} = 3.24\). Since \(16 > 3.24\) (by a wide margin), we reject \(H_0: \beta_1 = \beta_2 = \beta_3 = 0\) at the 5% level and conclude that at least one predictor has a non-zero coefficient.
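The arithmetic in this example as a small helper (hypothetical name), for a model with \(p\) predictors plus an intercept. If SciPy is available, `scipy.stats.f.sf(16.0, 3, 16)` would give the corresponding p-value; that call is not needed for the statistic itself.

```python
def overall_f_from_r2(r2, n, p):
    """Overall regression F statistic from R^2, with p predictors plus an intercept."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


f_stat = overall_f_from_r2(0.75, n=20, p=3)
print(f_stat)   # 16.0
```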
Key Takeaways
- The Fisher-Neyman factorization theorem is the primary tool for identifying sufficient statistics; completeness ensures uniqueness of UMVUE via Lehmann-Scheffe.
- Rao-Blackwell improves any unbiased estimator by conditioning on a sufficient statistic; the CRLB provides the efficiency benchmark.
- The Neyman-Pearson lemma gives the most powerful test for simple hypotheses; MLR families yield UMP one-sided tests.
- Pivotal quantities provide exact confidence intervals; Wilks' theorem gives large-sample intervals via the LRT.
- Conjugate priors lead to tractable Bayesian updates; Jeffreys' prior provides objective, reparametrization-invariant inference.
- The Gauss-Markov theorem guarantees OLS is BLUE without normality; ANOVA and ANCOVA are special cases of the general linear model.
Practice Problems
Problem 1
For \(X_1, \ldots, X_n\) i.i.d. \(\text{Exp}(\theta)\), find a complete sufficient statistic and the UMVUE of \(\theta\).
Solution
Using the rate parametrization (mean \(1/\theta\)), \(f(\mathbf{x}|\theta) = \theta^n e^{-\theta\sum x_i}\). By factorization, \(T = \sum X_i\) is sufficient. Since the exponential distributions form a one-parameter exponential family whose natural parameter space contains an open interval, \(T\) is complete. \(E[\bar{X}] = 1/\theta\), so the UMVUE of \(1/\theta\) is \(\bar{X}\). For \(\theta\) itself: since \(2\theta T \sim \chi^2_{2n}\) and \(E[1/\chi^2_{2n}] = 1/(2n-2)\), we get \(E[1/T] = \theta/(n-1)\) for \(n \ge 2\), so the UMVUE of \(\theta\) is \((n-1)/T = (n-1)/(n\bar{X})\).
Problem 2
For \(X \sim \text{Bin}(n, p)\), show that the family has MLR in \(X\) and find the UMP test for \(H_0: p \le p_0\) vs \(H_1: p > p_0\).
Solution
For \(p_1 > p_0\): \(\frac{f(x|p_1)}{f(x|p_0)} = \left(\frac{p_1(1-p_0)}{p_0(1-p_1)}\right)^x \cdot \left(\frac{1-p_1}{1-p_0}\right)^n\). Since \(p_1 > p_0\) implies \(\frac{p_1(1-p_0)}{p_0(1-p_1)} > 1\), the ratio is increasing in \(x\). So the family has MLR in \(X\). The UMP test rejects when \(X > c\) and, to attain size exactly \(\alpha\), rejects with probability \(\gamma\) when \(X = c\), where \(c\) and \(\gamma \in [0, 1)\) satisfy \(P_{p_0}(X > c) + \gamma P_{p_0}(X = c) = \alpha\).
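A sketch of how the randomized UMP test can be computed: reject when \(X > c\), reject with probability \(\gamma\) when \(X = c\), with \(P_{p_0}(X > c) + \gamma P_{p_0}(X = c) = \alpha\). The code scans \(c\) downward from \(n\), accumulating the upper tail until adding \(P_{p_0}(X = c)\) would exceed \(\alpha\). The function names and the setting \(n = 20\), \(p_0 = 0.3\), \(\alpha = 0.05\) are illustrative assumptions. For these values \(P_{0.3}(X \ge 10) \approx 0.048\) and \(P_{0.3}(X \ge 9) \approx 0.113\), so the test rejects outright for \(X \ge 10\) and randomizes at \(X = 9\).

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def ump_binomial_test(n, p0, alpha):
    """Randomized UMP test of H0: p <= p0 vs H1: p > p0 for X ~ Bin(n, p).

    Returns (c, gamma): reject if X > c; if X == c, reject with probability gamma.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    tail = 0.0                               # running value of P_{p0}(X > c)
    for c in range(n, -1, -1):
        pc = binom_pmf(c, n, p0)
        if tail + pc > alpha:                # rejecting at X = c outright would exceed alpha
            return c, (alpha - tail) / pc
        tail += pc
    raise RuntimeError("unreachable for 0 < alpha < 1")


def exact_size(n, p0, c, gamma):
    """P_{p0}(reject) = P(X > c) + gamma * P(X = c)."""
    tail = sum(binom_pmf(x, n, p0) for x in range(c + 1, n + 1))
    return tail + gamma * binom_pmf(c, n, p0)


c, gamma = ump_binomial_test(n=20, p0=0.3, alpha=0.05)
print(c, gamma, exact_size(20, 0.3, c, gamma))   # c = 9, size equal to 0.05 up to rounding
```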
Problem 3
Derive Jeffreys' prior for the Bernoulli parameter \(p\) and identify the resulting posterior distribution given \(x\) successes in \(n\) trials.
Solution
Fisher information: \(I(p) = \frac{1}{p(1-p)}\). Jeffreys' prior: \(\pi(p) \propto \sqrt{I(p)} = [p(1-p)]^{-1/2}\), which is \(\text{Beta}(1/2, 1/2)\). Posterior: \(p|x \sim \text{Beta}(x + 1/2, n - x + 1/2)\). The posterior mean is \(\frac{x + 1/2}{n + 1}\), which shrinks toward 1/2 compared to the MLE \(x/n\).
Problem 4
In simple linear regression with \(n = 12\), \(\hat{\beta}_1 = 2.5\), \(SE(\hat{\beta}_1) = 0.6\). Construct a 95% confidence interval for \(\beta_1\) and test \(H_0: \beta_1 = 0\).
Solution
df = \(n - 2 = 10\). \(t_{0.025, 10} = 2.228\). CI: \(2.5 \pm 2.228 \times 0.6 = 2.5 \pm 1.337 = (1.163, 3.837)\). Test: \(t = 2.5/0.6 = 4.167\). Since \(|t| = 4.167 > 2.228\), reject \(H_0\) at \(\alpha = 0.05\). The slope is significantly different from zero.
Problem 5
Show that Basu's theorem implies \(\bar{X}\) and \(S^2\) are independent when sampling from \(N(\mu, \sigma^2)\) with \(\sigma^2\) known.
Solution
When \(\sigma^2\) is known, the family is \(\{N(\mu, \sigma^2): \mu \in \mathbb{R}\}\). \(\bar{X}\) is a complete sufficient statistic for \(\mu\) (one-parameter exponential family). \(S^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2\) has distribution \(\frac{\sigma^2}{n-1}\chi^2_{n-1}\), which does not depend on \(\mu\), so \(S^2\) is ancillary. By Basu's theorem, \(\bar{X}\) and \(S^2\) are independent. (Since \(\sigma^2\) was arbitrary, the independence holds for every \((\mu, \sigma^2)\), so it remains true when \(\sigma^2\) is unknown. What fails in that case is only the direct application of Basu's theorem to the two-parameter family, because \(S^2\) is then not ancillary.)
Self-Assessment Quiz
1. The Lehmann-Scheffe theorem states that an unbiased estimator that is a function of a complete sufficient statistic is:
A Consistent
B The unique UMVUE
C The MLE
D Admissible
2. The MLE is invariant under reparametrization. If \(\hat{\theta}\) is the MLE of \(\theta\), then the MLE of \(e^\theta\) is:
A \(e^{\hat{\theta}}\)
B \(\hat{\theta} \cdot e\)
C Requires re-maximizing the likelihood
D Cannot be determined without the delta method
3. In the Neyman-Pearson framework, a UMP test for \(H_0: \theta \le \theta_0\) vs \(H_1: \theta > \theta_0\) is guaranteed to exist when:
A The likelihood ratio test is used
B The family has monotone likelihood ratio
C The distribution is normal
D The sample size is sufficiently large
4. The Bayes estimator under squared error loss is the:
A Posterior mode
B Posterior median
C Posterior mean
D Prior mean
5. The Gauss-Markov theorem requires which of the following assumptions?
A Normality of errors
B Uncorrelated errors with constant variance
C Independent errors with known distribution
D i.i.d. errors