---
title: "Survival Analysis from Scratch"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Survival Analysis from Scratch}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(femtograd)
```

## Introduction

This vignette demonstrates femtograd's compositional philosophy: rather
than providing a pre-built survival analysis function, we build one from
primitives. This shows how the log-likelihood functions, parameter
transforms, and inference tools compose to solve real statistical problems.

We cover:

1. Fitting a Weibull survival model
2. Handling right-censored data
3. Building a custom censored log-likelihood
4. Adding covariate effects (accelerated failure time model)
5. Full inference: standard errors, CIs, hypothesis tests

## Parametric Survival: The Weibull Model

The Weibull distribution is a workhorse of survival analysis. Its hazard
function can model increasing, decreasing, or constant risk:

- $k > 1$: increasing hazard (aging, wear-out)
- $k = 1$: constant hazard (exponential, memoryless)
- $k < 1$: decreasing hazard (infant mortality, burn-in)

### Fitting Without Censoring

For complete (uncensored) data, we can use the built-in `loglik_weibull()`:

```{r}
# Simulate survival times from Weibull(shape=1.5, scale=10)
set.seed(42)
n <- 100
true_shape <- 1.5
true_scale <- 10
times <- rweibull(n, shape = true_shape, scale = true_scale)

# Fit Weibull model (both parameters positive -> log-transform)
weibull_fit <- fit(
  function(log_k, log_lambda) {
    k <- exp(log_k)
    lambda <- exp(log_lambda)
    loglik_weibull(k, lambda, times)
  },
  params = c(log_k = 0, log_lambda = 1)
)

# Estimates on original scale
estimates <- exp(coef(weibull_fit))
names(estimates) <- c("shape", "scale")
estimates
```

The estimates should be close to the true values (shape = `r true_shape`,
scale = `r true_scale`).

```{r}
summary(weibull_fit)
```

## Handling Right Censoring

In real survival studies, some subjects haven't experienced the event by the
end of observation (right censoring). The log-likelihood for censored data is:

$$\ell(\theta) = \sum_{i: \delta_i = 1} \log f(t_i; \theta) + \sum_{i: \delta_i = 0} \log S(t_i; \theta)$$

where $\delta_i = 1$ for observed events and $\delta_i = 0$ for censored
observations. $f$ is the density and $S$ is the survival function.

For the Weibull with shape $k$ and scale $\lambda$:

- Density: $f(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} \exp\left[-\left(\frac{t}{\lambda}\right)^k\right]$
- Survival: $S(t) = \exp\left[-\left(\frac{t}{\lambda}\right)^k\right]$
- Log-survival: $\log S(t) = -\left(\frac{t}{\lambda}\right)^k$

Let's build a custom censored Weibull log-likelihood:

```{r}
weibull_censored_loglik <- function(log_k, log_lambda, times, status) {
  k <- exp(log_k)
  lambda <- exp(log_lambda)

  # For observed events: log f(t) = log(k) - k*log(lambda) + (k-1)*log(t)
  #                                 - (t/lambda)^k
  # For censored: log S(t) = -(t/lambda)^k
  #
  # Combined:
  #   event_term: d*log(k) - d*k*log(lambda) + (k-1)*sum_log_t_events
  #   surv_term:  -sum( exp(k * (log(t_i) - log(lambda))) )  [all obs]

  n_events <- sum(status)
  sum_log_t_events <- sum(log(times[status == 1]))

  # Event contribution (only uncensored observations)
  event_term <- log(k) * n_events -
    k * log(lambda) * n_events +
    (k - 1) * sum_log_t_events

  # Survival contribution for ALL observations:
  # (t/lambda)^k = exp(k * (log(t) - log(lambda)))
  # We use this form because log(t) is plain data (numeric),
  # and k, log(lambda) are parameters tracked by AD.
  log_times <- log(times)
  surv_term <- k * 0  # creates a zero of the correct type (value or dual)
  for (i in seq_along(times)) {
    surv_term <- surv_term - exp(k * (log_times[i] - log(lambda)))
  }

  event_term + surv_term
}
```

Note the use of `exp(k * (log(t) - log(lambda)))` instead of `(t/lambda)^k`.
This avoids creating temporary `val()` objects for data, which is important
for compatibility with femtograd's Hessian computation (which uses dual numbers
internally).

### Simulating Censored Data

```{r}
set.seed(123)
n <- 150

# True survival times from Weibull
true_times <- rweibull(n, shape = 1.5, scale = 10)

# Censoring times (uniform)
censor_times <- runif(n, min = 5, max = 20)

# Observed data
obs_times <- pmin(true_times, censor_times)
status <- as.integer(true_times <= censor_times)

cat("Events:", sum(status), "/ Censored:", sum(1 - status), "\n")
```

### Fitting the Censored Model

```{r}
censored_fit <- fit(
  function(log_k, log_lambda) {
    weibull_censored_loglik(log_k, log_lambda, obs_times, status)
  },
  params = c(log_k = 0, log_lambda = 1)
)

# Estimates on original scale
est <- exp(coef(censored_fit))
names(est) <- c("shape", "scale")
est
```

```{r}
# CIs on original scale
exp(confint(censored_fit))
```

## Testing Shape = 1 (Is Exponential Adequate?)

A natural question: is the simpler exponential model sufficient? This
corresponds to testing $H_0: k = 1$ (i.e., $\log k = 0$).

```{r}
# Wald test
wt <- wald_test(censored_fit, parm = "log_k", null_value = 0)
wt
```

We can also use a likelihood ratio test by fitting the nested exponential model:

```{r}
# Exponential model (shape fixed at 1, i.e., log_k = 0)
exp_censored_fit <- fit(
  function(log_lambda) {
    weibull_censored_loglik(0, log_lambda, obs_times, status)
  },
  params = c(log_lambda = 1)
)

# LRT
anova(exp_censored_fit, censored_fit)
```

## Adding Covariates: Accelerated Failure Time Model

The accelerated failure time (AFT) model links covariates to the scale
parameter:

$$\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}$$

This is straightforward to build from our primitives:

```{r}
set.seed(42)
n <- 200

# Covariate: treatment group (0 = control, 1 = treatment)
treatment <- rbinom(n, 1, 0.5)

# True model: treatment increases scale (longer survival)
true_scale <- exp(2 + 0.5 * treatment)  # beta0=2, beta1=0.5
true_shape <- 1.5

# Simulate survival times
true_times <- rweibull(n, shape = true_shape, scale = true_scale)
censor_times <- runif(n, min = 5, max = 30)
obs_times2 <- pmin(true_times, censor_times)
status2 <- as.integer(true_times <= censor_times)

cat("Events:", sum(status2), "/ Censored:", sum(1 - status2), "\n")
```

```{r}
# AFT Weibull with treatment covariate
aft_loglik <- function(log_k, beta0, beta1, times, status, x) {
  k <- exp(log_k)
  log_times <- log(times)

  # Per-subject log-scale: log(lambda_i) = beta0 + beta1 * x_i
  n_obs <- length(times)
  ll <- k * 0  # zero of the correct type

  for (i in seq_len(n_obs)) {
    log_lambda_i <- beta0 + beta1 * x[i]

    # log(t/lambda) = log(t) - log(lambda)
    log_ratio <- log_times[i] - log_lambda_i

    # Survival contribution: -(t/lambda)^k = -exp(k * log(t/lambda))
    ll <- ll - exp(k * log_ratio)

    if (status[i] == 1) {
      # Event contribution: log(k) - log(lambda) + (k-1)*log(t/lambda)
      ll <- ll + log(k) - log_lambda_i + (k - 1) * log_ratio
    }
  }
  ll
}

aft_fit <- fit(
  function(log_k, beta0, beta1) {
    aft_loglik(log_k, beta0, beta1, obs_times2, status2, treatment)
  },
  params = c(log_k = 0, beta0 = 1, beta1 = 0)
)

summary(aft_fit)
```

### Interpreting the Treatment Effect

```{r}
# Shape parameter
cat("Shape (k):", exp(coef(aft_fit)["log_k"]), "\n")

# Treatment effect: exp(beta1) is the scale ratio
cat("Scale ratio (treatment/control):", exp(coef(aft_fit)["beta1"]), "\n")

# 95% CI for treatment effect
beta1_ci <- confint(aft_fit, parm = "beta1")
cat("95% CI for beta1:", beta1_ci, "\n")
cat("95% CI for scale ratio:", exp(beta1_ci), "\n")
```

### Is Treatment Significant?

```{r}
# Wald test
wald_test(aft_fit, parm = "beta1", null_value = 0)
```

## Summary

This vignette demonstrated femtograd's compositional approach:

1. **Built-in primitives** (`loglik_weibull`, `exp`, `log`) handle
   the uncensored case directly
2. **Custom log-likelihoods** are straightforward to write---just compose
   operations on `value` objects
3. **Parameter transforms** (`exp(log_k)`) handle constraints naturally
4. **Inference tools** (`summary`, `confint`, `wald_test`, `anova`)
   work identically on custom models
5. **Covariate models** (AFT) are built by linking parameters to covariates
   inside the likelihood

The same approach extends to other survival models (log-normal, log-logistic,
piecewise exponential) and other statistical domains---define a log-likelihood
from femtograd's differentiable operations, and the full inference pipeline
follows automatically.