--- title: "Getting Started with femtograd" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with femtograd} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction `femtograd` is an R package for automatic differentiation (AD) designed for classical statistics. Unlike packages focused on deep learning, femtograd emphasizes: - **Pedagogy**: Clear, readable code that demonstrates how AD works - **Statistics**: Built-in support for MLE, Hessians, and statistical inference - **R integration**: Standard R generics (`coef`, `vcov`, `confint`, etc.) This vignette introduces the core concepts and workflow. ## Creating Differentiable Values The foundation of femtograd is the `value` object, which represents a node in a computational graph. Create values using `val()`: ```{r} library(femtograd) # Create a differentiable value x <- val(3) x ``` Values can be scalars, vectors, or matrices: ```{r} # Scalar a <- val(2.5) # Vector (stored as column matrix) v <- val(c(1, 2, 3)) # Matrix m <- val(matrix(1:4, 2, 2)) ``` ## Building Computations Standard mathematical operations work with value objects and automatically build a computational graph: ```{r} x <- val(3) y <- val(4) # Build computation z <- x^2 + x * y # Access the result get_data(z) ``` Femtograd supports many operations: - **Arithmetic**: `+`, `-`, `*`, `/`, `^` - **Transcendental**: `log`, `exp`, `sqrt`, `sin`, `cos`, `tanh` - **Special functions**: `lgamma`, `digamma`, `log1p` - **Activations**: `sigmoid`, `relu`, `softplus`, `logit` ## Computing Gradients Call `backward()` on any value to compute gradients via reverse-mode AD (backpropagation). Then use `grad()` to access the gradient: ```{r} x <- val(3) y <- val(4) z <- x^2 + x * y # z = 9 + 12 = 21 # Compute gradients backward(z) # Access gradients grad(x) # dz/dx = 2x + y = 10 grad(y) # dz/dy = x = 3 ``` The gradient represents how much `z` changes when we change each input. ### Resetting Gradients When reusing parameters across multiple computations, reset gradients with `zero_grad()`: ```{r} x <- val(2) # First computation y1 <- x^2 backward(y1) grad(x) # 4 # Reset and recompute zero_grad(y1) y2 <- x^3 backward(y2) grad(x) # 12 ``` ## Maximum Likelihood Estimation The primary use case for femtograd is maximum likelihood estimation. The `fit()` function combines optimization and inference: ```{r} # Generate data from a normal distribution set.seed(42) data <- rnorm(100, mean = 5, sd = 2) # Define log-likelihood with named parameters # Use log(sigma) to ensure sigma > 0 result <- fit( function(mu, log_sigma) { sigma <- exp(log_sigma) loglik_normal(mu, sigma, data) }, params = c(mu = 0, log_sigma = 0) ) result ``` ## Using Base R Generics The fitted model supports standard R generics: ```{r} # Parameter estimates coef(result) # Standard errors se(result) # Variance-covariance matrix vcov(result) # Confidence intervals confint(result) # Log-likelihood (enables AIC/BIC) logLik(result) AIC(result) ``` For a detailed summary with p-values: ```{r} summary(result) ``` ## Built-in Distributions Femtograd includes log-likelihood functions for common distributions: | Distribution | Function | Parameters | |--------------|----------|------------| | Normal | `loglik_normal(mu, sigma, x)` | mean, sd | | Exponential | `loglik_exponential(rate, x)` | rate | | Poisson | `loglik_poisson(lambda, x)` | mean | | Binomial | `loglik_binomial(p, n, x)` | probability, trials | | Gamma | `loglik_gamma(shape, rate, x)` | shape, rate | | Beta | `loglik_beta(alpha, beta, x)` | shape1, shape2 | | Weibull | `loglik_weibull(shape, scale, x)` | shape, scale | | Pareto | `loglik_pareto(alpha, x_min, x)` | shape, scale | ### Example: Exponential Distribution ```{r} # Generate exponential data set.seed(123) waiting_times <- rexp(50, rate = 0.5) # Fit exponential model exp_fit <- fit( function(rate) loglik_exponential(rate, waiting_times), params = c(rate = 1) ) # True rate was 0.5, MLE should be close to 1/mean coef(exp_fit) 1 / mean(waiting_times) ``` ## Parameter Constraints For parameters with constraints (e.g., variance > 0, probability in [0,1]), use transformations: ```{r} # sigma > 0: use log(sigma), then exp() in likelihood # 0 < p < 1: use logit(p), then sigmoid() in likelihood # Example: Beta distribution with shape constraints (both > 0) set.seed(456) beta_data <- rbeta(50, shape1 = 2, shape2 = 5) beta_fit <- fit( function(log_alpha, log_beta) { alpha <- exp(log_alpha) # Ensures alpha > 0 beta <- exp(log_beta) # Ensures beta > 0 loglik_beta(alpha, beta, beta_data) }, params = c(log_alpha = 0, log_beta = 0) ) # Transform back to original scale exp(coef(beta_fit)) ``` Femtograd also provides helper functions: `positive()`, `probability()`, and `bounded()`. ## Model Comparison Compare multiple models using AIC: ```{r} # Fit two models: fixed vs estimated sigma m1 <- fit(function(mu) loglik_normal(mu, 2, data), c(mu = 0)) m2 <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), data), c(mu = 0, log_sigma = 0) ) compare(m1, m2, names = c("Fixed sigma", "Free sigma")) ``` For nested models, use `anova()` for likelihood ratio tests: ```{r} anova(m1, m2) ``` ## Next Steps - See `vignette("computational-graphs")` to understand how AD works - See `vignette("inference")` for advanced inference (bootstrap, profile likelihood) - See `vignette("survival-analysis")` for a complete worked example - See `vignette("architecture")` for how AD engines are built and what makes femtograd different from production tools