---
title: "Statistical Inference with femtograd"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Statistical Inference with femtograd}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(femtograd)
```

## Overview

After fitting a model with `fit()`, femtograd provides a full suite of
inference tools:

- **Standard errors** from the observed Fisher information
- **Wald confidence intervals** and hypothesis tests
- **Likelihood ratio tests** for nested model comparison
- **Profile likelihood** confidence intervals
- **Bootstrap inference** for non-asymptotic settings
- **Model diagnostics** to verify inference reliability

This vignette demonstrates each approach using a running example.

## Running Example

We'll fit a gamma distribution to waiting time data:

```{r}
set.seed(42)
waiting <- rgamma(150, shape = 3, rate = 0.5)

# Fit gamma model (both parameters positive, so use log-transform)
gamma_fit <- fit(
  function(log_shape, log_rate) {
    shape <- exp(log_shape)
    rate <- exp(log_rate)
    loglik_gamma(shape, rate, waiting)
  },
  params = c(log_shape = 0, log_rate = 0)
)

summary(gamma_fit)
```

The estimates are on the log scale. To interpret, exponentiate:

```{r}
exp(coef(gamma_fit))
```

## Wald Inference

### Standard Errors

Standard errors come from the inverse of the observed Fisher information
matrix (the negative Hessian of the log-likelihood at the MLE):

$$\text{SE}(\hat{\theta}_j) = \sqrt{[\mathcal{I}(\hat{\theta})^{-1}]_{jj}}$$

```{r}
se(gamma_fit)
```

### Confidence Intervals

Wald confidence intervals assume asymptotic normality of the MLE:

$$\hat{\theta}_j \pm z_{\alpha/2} \cdot \text{SE}(\hat{\theta}_j)$$

```{r}
# 95% CIs (default)
confint(gamma_fit)

# 99% CIs
confint(gamma_fit, level = 0.99)
```

These are CIs for the log-transformed parameters. For CIs on the original
scale, exponentiate the endpoints (this works because exp is monotone):

```{r}
exp(confint(gamma_fit))
```

### Wald Test

Test whether a parameter equals a hypothesized value:

```{r}
# Test H0: log_shape = 0 (i.e., shape = 1, which is exponential)
wt <- wald_test(gamma_fit, parm = "log_shape", null_value = 0)
wt
```

Access individual components:

```{r}
test_stat(wt)
pval(wt)
is_significant_at(wt, alpha = 0.05)
```

## Likelihood Ratio Tests

For comparing nested models, the likelihood ratio test (LRT) is often
more powerful than the Wald test. The test statistic is:

$$\Lambda = -2[\ell(\hat{\theta}_0) - \ell(\hat{\theta}_1)] \sim \chi^2_{df}$$

where $df$ is the difference in number of parameters.

```{r}
# Null model: exponential (shape = 1, equivalent to log_shape = 0)
exp_fit <- fit(
  function(log_rate) {
    rate <- exp(log_rate)
    loglik_exponential(rate, waiting)
  },
  params = c(log_rate = 0)
)

# LRT: Is gamma significantly better than exponential?
lr <- lrt(exp_fit, gamma_fit)
lr
```

The `anova()` function provides the same test in a familiar format:

```{r}
anova(exp_fit, gamma_fit)
```

## Model Comparison

When models are not nested, use information criteria instead:

```{r}
# Compare gamma vs exponential
compare(exp_fit, gamma_fit, names = c("Exponential", "Gamma"))
```

The `dAIC` column shows how much worse each model is than the best.
AIC weights give the relative probability that each model is best
(assuming one of them is true).

## Profile Likelihood

Profile likelihood provides confidence intervals that don't rely on the
normality assumption. They are based on the likelihood ratio:

$$\{\ \theta_j : 2[\ell(\hat{\theta}) - \ell_P(\theta_j)] \leq \chi^2_{1, \alpha} \}$$

```{r}
# Profile likelihood for log_shape
prof <- profile_loglik(gamma_fit, parm = "log_shape")
prof

# Profile-based confidence intervals
suppressWarnings(confint_profile(gamma_fit))
```

Profile CIs are generally preferred over Wald CIs for small samples or
when the likelihood surface is asymmetric.

## Bootstrap Inference

For situations where asymptotic theory may not hold (small samples,
boundary parameters, complex models), bootstrap inference provides a
nonparametric alternative.

The `bootstrap_fit()` function uses the `loglik_maker` pattern:
a function that takes data and returns a log-likelihood function.

```{r}
boot <- bootstrap_fit(
  loglik_maker = function(d) {
    function(log_shape, log_rate) {
      shape <- exp(log_shape)
      rate <- exp(log_rate)
      loglik_gamma(shape, rate, d)
    }
  },
  data = waiting,
  params = c(log_shape = 0, log_rate = 0),
  n_boot = 200,
  progress = FALSE
)

boot
```

Bootstrap confidence intervals come in three flavors:

```{r}
# Percentile method (most common)
confint(boot, type = "percentile")

# Basic bootstrap
confint(boot, type = "basic")

# Normal approximation (uses bootstrap SE)
confint(boot, type = "normal")
```

## Model Diagnostics

Before trusting inference results, verify that the model is well-behaved:

```{r}
# Check Hessian properties
check_hessian(gamma_fit)

# Check optimization convergence
check_convergence(gamma_fit)

# Full diagnostics
diagnostics(gamma_fit)
```

The quick check:

```{r}
# Are standard errors reliable?
se_reliable(gamma_fit)
```

### What Can Go Wrong

- **Non-positive-definite Hessian**: The MLE may be at a saddle point,
  or the model may be unidentifiable
- **Large condition number**: Parameters are highly correlated, SEs may
  be unstable
- **Non-zero gradient**: Optimization may not have converged

## Inference Summary

| Method | When to use | Assumptions |
|--------|------------|-------------|
| Wald CI/test | Large samples, well-behaved likelihood | Asymptotic normality |
| Likelihood ratio test | Nested models, moderate samples | Regularity conditions |
| Profile likelihood | Asymmetric likelihood, moderate samples | Smooth likelihood |
| Bootstrap | Small samples, complex models | Resampling validity |

For routine analysis with moderate-to-large samples, Wald inference
(the default from `confint()` and `summary()`) is usually sufficient.
Use profile likelihood or bootstrap when you suspect the asymptotic
approximation may be poor.