---
title: "Getting Started with femtograd"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with femtograd}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

`femtograd` is an R package for automatic differentiation (AD) designed for
classical statistics. Unlike packages focused on deep learning, femtograd
emphasizes:

- **Pedagogy**: Clear, readable code that demonstrates how AD works
- **Statistics**: Built-in support for MLE, Hessians, and statistical inference
- **R integration**: Standard R generics (`coef`, `vcov`, `confint`, etc.)

This vignette introduces the core concepts and workflow.

## Creating Differentiable Values

The foundation of femtograd is the `value` object, which represents a node in
a computational graph. Create values using `val()`:

```{r}
library(femtograd)

# Create a differentiable value
x <- val(3)
x
```

Values can be scalars, vectors, or matrices:

```{r}
# Scalar
a <- val(2.5)

# Vector (stored as column matrix)
v <- val(c(1, 2, 3))

# Matrix
m <- val(matrix(1:4, 2, 2))
```

## Building Computations

Standard mathematical operations work with value objects and automatically
build a computational graph:

```{r}
x <- val(3)
y <- val(4)

# Build computation
z <- x^2 + x * y

# Access the result
get_data(z)
```

Femtograd supports many operations:

- **Arithmetic**: `+`, `-`, `*`, `/`, `^`
- **Transcendental**: `log`, `exp`, `sqrt`, `sin`, `cos`, `tanh`
- **Special functions**: `lgamma`, `digamma`, `log1p`
- **Activations**: `sigmoid`, `relu`, `softplus`, `logit`

## Computing Gradients

Call `backward()` on any value to compute gradients via reverse-mode AD
(backpropagation). Then use `grad()` to access the gradient:

```{r}
x <- val(3)
y <- val(4)
z <- x^2 + x * y  # z = 9 + 12 = 21

# Compute gradients
backward(z)

# Access gradients
grad(x)  # dz/dx = 2x + y = 10
grad(y)  # dz/dy = x = 3
```

The gradient represents how much `z` changes when we change each input.

### Resetting Gradients

When reusing parameters across multiple computations, reset gradients with
`zero_grad()`:

```{r}
x <- val(2)

# First computation
y1 <- x^2
backward(y1)
grad(x)  # 4

# Reset and recompute
zero_grad(y1)
y2 <- x^3
backward(y2)
grad(x)  # 12
```

## Maximum Likelihood Estimation

The primary use case for femtograd is maximum likelihood estimation. The
`fit()` function combines optimization and inference:

```{r}
# Generate data from a normal distribution
set.seed(42)
data <- rnorm(100, mean = 5, sd = 2)

# Define log-likelihood with named parameters
# Use log(sigma) to ensure sigma > 0
result <- fit(
  function(mu, log_sigma) {
    sigma <- exp(log_sigma)
    loglik_normal(mu, sigma, data)
  },
  params = c(mu = 0, log_sigma = 0)
)

result
```

## Using Base R Generics

The fitted model supports standard R generics:

```{r}
# Parameter estimates
coef(result)

# Standard errors
se(result)

# Variance-covariance matrix
vcov(result)

# Confidence intervals
confint(result)

# Log-likelihood (enables AIC/BIC)
logLik(result)
AIC(result)
```

For a detailed summary with p-values:

```{r}
summary(result)
```

## Built-in Distributions

Femtograd includes log-likelihood functions for common distributions:

| Distribution | Function | Parameters |
|--------------|----------|------------|
| Normal | `loglik_normal(mu, sigma, x)` | mean, sd |
| Exponential | `loglik_exponential(rate, x)` | rate |
| Poisson | `loglik_poisson(lambda, x)` | mean |
| Binomial | `loglik_binomial(p, n, x)` | probability, trials |
| Gamma | `loglik_gamma(shape, rate, x)` | shape, rate |
| Beta | `loglik_beta(alpha, beta, x)` | shape1, shape2 |
| Weibull | `loglik_weibull(shape, scale, x)` | shape, scale |
| Pareto | `loglik_pareto(alpha, x_min, x)` | shape, scale |

### Example: Exponential Distribution

```{r}
# Generate exponential data
set.seed(123)
waiting_times <- rexp(50, rate = 0.5)

# Fit exponential model
exp_fit <- fit(
  function(rate) loglik_exponential(rate, waiting_times),
  params = c(rate = 1)
)

# True rate was 0.5, MLE should be close to 1/mean
coef(exp_fit)
1 / mean(waiting_times)
```

## Parameter Constraints

For parameters with constraints (e.g., variance > 0, probability in [0,1]),
use transformations:

```{r}
# sigma > 0: use log(sigma), then exp() in likelihood
# 0 < p < 1: use logit(p), then sigmoid() in likelihood

# Example: Beta distribution with shape constraints (both > 0)
set.seed(456)
beta_data <- rbeta(50, shape1 = 2, shape2 = 5)

beta_fit <- fit(
  function(log_alpha, log_beta) {
    alpha <- exp(log_alpha)  # Ensures alpha > 0
    beta <- exp(log_beta)    # Ensures beta > 0
    loglik_beta(alpha, beta, beta_data)
  },
  params = c(log_alpha = 0, log_beta = 0)
)

# Transform back to original scale
exp(coef(beta_fit))
```

Femtograd also provides helper functions: `positive()`, `probability()`,
and `bounded()`.

## Model Comparison

Compare multiple models using AIC:

```{r}
# Fit two models: fixed vs estimated sigma
m1 <- fit(function(mu) loglik_normal(mu, 2, data), c(mu = 0))
m2 <- fit(
  function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), data),
  c(mu = 0, log_sigma = 0)
)

compare(m1, m2, names = c("Fixed sigma", "Free sigma"))
```

For nested models, use `anova()` for likelihood ratio tests:

```{r}
anova(m1, m2)
```

## Next Steps

- See `vignette("computational-graphs")` to understand how AD works
- See `vignette("inference")` for advanced inference (bootstrap, profile likelihood)
- See `vignette("survival-analysis")` for a complete worked example
- See `vignette("architecture")` for how AD engines are built and what makes femtograd different from production tools