| Title: | Automatic Differentiation for Statistical Computing |
|---|---|
| Description: | Provides automatic differentiation via reverse-mode AD (backpropagation) for first-order gradients and forward-over-reverse AD for Hessian computation. Includes log-likelihood functions for exponential family distributions (normal, exponential, Poisson, binomial, gamma, beta, negative binomial, Weibull, Pareto), MLE optimization via fit() with base R generics (coef, vcov, confint, logLik, AIC/BIC), hypothesis testing (likelihood ratio, Wald), profile likelihood, bootstrap inference, and model diagnostics. Designed for pedagogy and modern statistics rather than large-scale ML. |
| Authors: | Alexander Towell [aut, cre] (ORCID: <https://orcid.org/0000-0001-6443-9897>) |
| Maintainer: | Alexander Towell <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.3.1 |
| Built: | 2026-05-14 07:31:53 UTC |
| Source: | https://github.com/queelius/femtograd |
Element-wise subtraction or unary negation. Supports broadcasting when one operand is a 1x1 scalar matrix.
## S3 method for class 'value' x - y = NULL## S3 method for class 'value' x - y = NULL
x |
A value object or numeric (minuend) |
y |
A value object or numeric (subtrahend), or NULL for unary negation |
A new value object representing x - y or -x
Element-wise multiplication of two value objects or a value and numeric. Supports broadcasting when one operand is a 1x1 scalar matrix.
## S3 method for class 'value' e1 * e2## S3 method for class 'value' e1 * e2
e1 |
A value object or numeric |
e2 |
A value object or numeric |
A new value object representing the element-wise product
Element-wise division. Supports broadcasting when one operand is a 1x1 scalar matrix.
## S3 method for class 'value' x / y## S3 method for class 'value' x / y
x |
A value object or numeric (numerator) |
y |
A value object or numeric (denominator) |
A new value object representing x / y
Element-wise power operation. Supports broadcasting when one operand is a 1x1 scalar matrix.
## S3 method for class 'value' b ^ e## S3 method for class 'value' b ^ e
b |
A value object or numeric (base) |
e |
A value object or numeric (exponent) |
A new value object representing the element-wise power b^e
Element-wise addition of two value objects or a value and numeric. Both operands are converted to matrices internally. Supports broadcasting when one operand is a 1x1 scalar matrix.
## S3 method for class 'value' e1 + e2## S3 method for class 'value' e1 + e2
e1 |
A value object or numeric |
e2 |
A value object or numeric |
A new value object representing the element-wise sum
Element-wise absolute value.
## S3 method for class 'value' abs(x)## S3 method for class 'value' abs(x)
x |
A value object |
A new value object representing abs(x)
Performs likelihood ratio tests comparing nested models. When given a single model, reports the log-likelihood and degrees of freedom. When given multiple models, performs sequential likelihood ratio tests.
## S3 method for class 'femtofit' anova(object, ...)## S3 method for class 'femtofit' anova(object, ...)
object |
A |
... |
Additional |
Models should be provided in order from simplest (fewest parameters) to most complex. The likelihood ratio test compares each model to the previous one.
An anova.femtofit object containing the comparison table.
compare for AIC-based comparison,
lrt for explicit likelihood ratio tests
## Not run: # Compare nested models m1 <- fit(function(mu) loglik_normal(mu, 1, x), c(mu = 0)) m2 <- fit(function(mu, sigma) loglik_normal(mu, sigma, x), c(mu = 0, sigma = 1)) anova(m1, m2) # LRT comparing m1 vs m2 ## End(Not run)## Not run: # Compare nested models m1 <- fit(function(mu) loglik_normal(mu, 1, x), c(mu = 0)) m2 <- fit(function(mu, sigma) loglik_normal(mu, sigma, x), c(mu = 0, sigma = 1)) anova(m1, m2) # LRT comparing m1 vs m2 ## End(Not run)
e). In other words, it is responsible for computing the gradient with
respect to e.This generic function should be implemented for specific classes. We provide
an implementation for value objects.
backward(e, ...)backward(e, ...)
e |
An object for which the backward pass should be performed |
... |
additional arguments to pass |
Default implementation does not propagate gradients. For instance, if we have a constant, then the partial of the constant is not meaningful.
## Default S3 method: backward(e, ...)## Default S3 method: backward(e, ...)
e |
An object for which the backward pass should be performed |
... |
pass additional arguments |
Performs the backward pass (gradient computation) for a value object in the computational graph with automatic differentiation. Initializes the gradient of the output node to a matrix of ones (same shape as data).
## S3 method for class 'value' backward(e, ...)## S3 method for class 'value' backward(e, ...)
e |
A value object for which the backward pass should be performed |
... |
pass additional arguments |
Finds the optimum using the BFGS algorithm, which approximates the inverse Hessian using gradient information. More efficient than Newton-Raphson when Hessian computation is expensive.
bfgs( objective_fn, params, max_iter = 1000, tol = 1e-06, maximize = FALSE, line_search_fn = NULL, verbose = 0 )bfgs( objective_fn, params, max_iter = 1000, tol = 1e-06, maximize = FALSE, line_search_fn = NULL, verbose = 0 )
objective_fn |
Function taking list of value parameters, returns scalar |
params |
List of value objects (initial parameter values) |
max_iter |
Maximum iterations, default 1000 |
tol |
Convergence tolerance on gradient norm, default 1e-6 |
maximize |
If TRUE, maximize; if FALSE (default), minimize |
line_search_fn |
Line search function (default: backtracking) |
verbose |
Print progress every N iterations (0 for silent) |
BFGS maintains an approximation B to the inverse Hessian, updated as: B_k+1 = (I - ρsy') B_k (I - ρys') + ρss' where s = x_k+1 - x_k, y = g_k+1 - g_k, ρ = 1/(y'*s)
The search direction is d = -B*g, followed by line search.
A list containing:
params |
List of value objects at optimum |
value |
Objective function value at optimum |
gradient |
Gradient at optimum |
inv_hessian |
Approximate inverse Hessian |
iterations |
Number of iterations performed |
converged |
TRUE if gradient norm < tol |
## Not run: # Minimize Rosenbrock function rosenbrock <- function(p) { x <- p[[1]]; y <- p[[2]] (1 - x)^2 + 100 * (y - x^2)^2 } result <- bfgs(rosenbrock, list(val(-1), val(-1))) sapply(result$params, get_data) # Should be close to c(1, 1) ## End(Not run)## Not run: # Minimize Rosenbrock function rosenbrock <- function(p) { x <- p[[1]]; y <- p[[2]] (1 - x)^2 + 100 * (y - x^2)^2 } result <- bfgs(rosenbrock, list(val(-1), val(-1))) sapply(result$params, get_data) # Should be close to c(1, 1) ## End(Not run)
Functions for bootstrap-based inference on fitted models. Bootstrap methods provide non-parametric standard errors and confidence intervals without relying on asymptotic normality.
Performs bootstrap resampling to estimate the sampling distribution of parameter estimates.
bootstrap_fit(loglik_maker, data, params, n_boot = 500, progress = TRUE, ...)bootstrap_fit(loglik_maker, data, params, n_boot = 500, progress = TRUE, ...)
loglik_maker |
A function that takes data and returns a log-likelihood
function suitable for |
data |
The data (vector, matrix, or data frame). |
params |
Initial parameter values for |
n_boot |
Number of bootstrap replications (default 500). |
progress |
Logical; show progress messages (default TRUE). |
... |
Additional arguments passed to |
The loglik_maker function should accept data and return a log-likelihood
function that can be passed to fit(). This design allows bootstrap
to work with any model specification.
A bootstrap_result object containing:
Matrix of bootstrap estimates (n_boot x n_params)
Original parameter estimates
Bootstrap standard errors
Estimated bias
## Not run: set.seed(42) x <- rnorm(50, mean = 5, sd = 2) # Define a function that creates the loglik function from data make_loglik <- function(data) { function(mu, log_sigma) { loglik_normal(mu, exp(log_sigma), data) } } # Bootstrap boot <- bootstrap_fit( make_loglik, data = x, params = c(mu = 0, log_sigma = 0), n_boot = 200 ) # Results boot$se # Bootstrap SEs confint(boot) # Bootstrap CIs # Compare with Wald SEs result <- fit(make_loglik(x), params = c(mu = 0, log_sigma = 0)) se(result) # Wald SEs ## End(Not run)## Not run: set.seed(42) x <- rnorm(50, mean = 5, sd = 2) # Define a function that creates the loglik function from data make_loglik <- function(data) { function(mu, log_sigma) { loglik_normal(mu, exp(log_sigma), data) } } # Bootstrap boot <- bootstrap_fit( make_loglik, data = x, params = c(mu = 0, log_sigma = 0), n_boot = 200 ) # Results boot$se # Bootstrap SEs confint(boot) # Bootstrap CIs # Compare with Wald SEs result <- fit(make_loglik(x), params = c(mu = 0, log_sigma = 0)) se(result) # Wald SEs ## End(Not run)
Maps unconstrained values to a bounded interval (lower, upper). Use this for parameters with both lower and upper bounds.
bounded(x, lower, upper)bounded(x, lower, upper)
x |
Unconstrained value (scalar, vector, or value object) |
lower |
Lower bound of the interval |
upper |
Upper bound of the interval |
The transformation is:
bounded(x, a, b) = a + (b - a) * sigmoid(x)
This maps (-∞, ∞) to (a, b) smoothly.
Value in (lower, upper)
## Not run: # Parameter in (0, 10) raw <- val(0) param <- bounded(raw, 0, 10) # sigmoid(0) * 10 = 5 get_data(param) # Correlation parameter in (-1, 1) raw_rho <- val(1) rho <- bounded(raw_rho, -1, 1) ## End(Not run)## Not run: # Parameter in (0, 10) raw <- val(0) param <- bounded(raw, 0, 10) # sigmoid(0) * 10 = 5 get_data(param) # Correlation parameter in (-1, 1) raw_rho <- val(1) rho <- bounded(raw_rho, -1, 1) ## End(Not run)
Examines convergence properties of a fitted model.
check_convergence(object, tol = 1e-04)check_convergence(object, tol = 1e-04)
object |
A |
tol |
Tolerance for gradient norm (default 1e-4). |
At a maximum, the gradient should be (near) zero. Large gradient norms may indicate premature termination or numerical issues.
A convergence_check object containing:
From the optimizer
Norm of gradient at solution
Is gradient norm below tolerance?
Number of iterations used
Final log-likelihood value
Character vector of any issues detected
## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) check <- check_convergence(result) check ## End(Not run)## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) check <- check_convergence(result) check ## End(Not run)
Examines the Hessian matrix for numerical issues that may indicate problems with the optimization or model specification.
check_hessian(object)check_hessian(object)
object |
A |
For a proper maximum of the log-likelihood, the Hessian should be negative definite (equivalently, -H should be positive definite).
Common issues:
Not negative definite: May indicate a saddle point rather than maximum
High condition number: Indicates near-singularity, poorly identified parameters
Singular matrix: Parameters are not identifiable from the data
A hessian_check object containing:
Logical: is -H positive definite? (required for MLE)
Eigenvalues of -H (observed information)
Ratio of largest to smallest eigenvalue
Numerical rank of the matrix
Logical: is the matrix numerically singular?
Character vector of any issues detected
## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) check <- check_hessian(result) check # Access specific properties check$is_negative_definite check$condition_number ## End(Not run)## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) check <- check_hessian(result) check # Access specific properties check$is_negative_definite check$condition_number ## End(Not run)
Creates a comparison table of AIC, BIC, and other statistics for multiple fitted models. Useful for model selection.
compare(..., names = NULL)compare(..., names = NULL)
... |
One or more |
names |
Optional character vector of model names. If not provided, names are extracted from the call or generated automatically. |
Models are ranked by AIC. Delta AIC is calculated relative to the best model (lowest AIC). AIC weights represent the relative likelihood that each model is the best, assuming one of the models is true.
A data.frame with columns for each model's statistics, including log-likelihood, AIC, BIC, delta AIC, and AIC weights.
anova.femtofit for likelihood ratio tests
## Not run: # Fit competing models m1 <- fit(function(mu) loglik_normal(mu, 1, x), c(mu = 0)) m2 <- fit(function(mu, sigma) loglik_normal(mu, sigma, x), c(mu = 0, sigma = 1)) # Compare models compare(m1, m2, names = c("Fixed sigma", "Free sigma")) ## End(Not run)## Not run: # Fit competing models m1 <- fit(function(mu) loglik_normal(mu, 1, x), c(mu = 0)) m2 <- fit(function(mu, sigma) loglik_normal(mu, sigma, x), c(mu = 0, sigma = 1)) # Compare models compare(m1, m2, names = c("Fixed sigma", "Free sigma")) ## End(Not run)
Compute confidence intervals from MLE results
confint_mle(mle_result, level = 0.95, param_names = NULL)confint_mle(mle_result, level = 0.95, param_names = NULL)
mle_result |
Result from find_mle |
level |
Confidence level, default 0.95 |
param_names |
Optional names for parameters |
A matrix with columns: estimate, se, lower, upper
Computes confidence intervals based on the profile likelihood. These are more accurate than Wald intervals when the likelihood is non-quadratic.
confint_profile(object, parm = NULL, level = 0.95, ...)confint_profile(object, parm = NULL, level = 0.95, ...)
object |
A |
parm |
Parameter name(s) or indices. If NULL, computes for all. |
level |
Confidence level (default 0.95). |
... |
Additional arguments passed to |
The profile confidence interval at level 1-α is:
This is the set of parameter values that would not be rejected by a likelihood ratio test at level α.
A matrix with columns for lower and upper bounds.
## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) # Profile-based CIs (more accurate for non-quadratic likelihoods) confint_profile(result) # Compare with Wald CIs confint(result) ## End(Not run)## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) # Profile-based CIs (more accurate for non-quadratic likelihoods) confint_profile(result) # Compare with Wald CIs confint(result) ## End(Not run)
Computes confidence intervals using bootstrap methods.
## S3 method for class 'bootstrap_result' confint( object, parm = NULL, level = 0.95, type = c("percentile", "basic", "normal"), ... )## S3 method for class 'bootstrap_result' confint( object, parm = NULL, level = 0.95, type = c("percentile", "basic", "normal"), ... )
object |
A |
parm |
Parameter names or indices (default: all). |
level |
Confidence level (default 0.95). |
type |
Type of interval: "percentile" (default), "basic", or "normal". |
... |
Additional arguments (ignored). |
Available methods:
Uses quantiles of bootstrap distribution directly
Uses 2*theta_hat - quantiles (pivot method)
Uses bootstrap SE with normal quantiles
Matrix of confidence intervals.
Element-wise cosine.
## S3 method for class 'value' cos(x)## S3 method for class 'value' cos(x)
x |
A value object |
A new value object representing cos(x)
Functions for checking model fit quality, convergence, and potential numerical issues.
Runs all diagnostic checks on a fitted model.
diagnostics(object, ...) ## S3 method for class 'femtofit' diagnostics(object, ...)diagnostics(object, ...) ## S3 method for class 'femtofit' diagnostics(object, ...)
object |
A |
... |
Additional arguments passed to individual check functions. |
A model_diagnostics object containing results from all checks.
diagnostics(femtofit): Diagnostics for femtofit objects
## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) diagnostics(result) ## End(Not run)## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) diagnostics(result) ## End(Not run)
Element-wise digamma: d/dx log(gamma(x)).
## S3 method for class 'value' digamma(x)## S3 method for class 'value' digamma(x)
x |
A value object |
A new value object representing digamma(x)
These functions compute log-likelihoods that can be differentiated using femtograd's automatic differentiation. Each function returns a value object suitable for gradient-based optimization.
Computes x / y with protection against division by zero.
div_safe(x, y, eps = .Machine$double.eps)div_safe(x, y, eps = .Machine$double.eps)
x |
Numerator (value object or numeric) |
y |
Denominator (value object or numeric) |
eps |
Minimum absolute value for denominator (default: .Machine$double.eps) |
x / sign(y) * max(abs(y), eps)
Extract degrees of freedom from hypothesis test
dof(x, ...) ## S3 method for class 'hypothesis_test' dof(x, ...)dof(x, ...) ## S3 method for class 'hypothesis_test' dof(x, ...)
x |
A hypothesis test object |
... |
Additional arguments (ignored) |
Degrees of freedom
dof(hypothesis_test): Degrees of freedom for hypothesis tests
dual R6 class for forward-mode automatic differentiation
dual R6 class for forward-mode automatic differentiation
Represents a dual number (primal, tangent) for computing directional derivatives via forward-mode AD. When combined with reverse-mode (value), enables Hessian computation via forward-over-reverse.
Forward-mode AD propagates derivatives alongside values during the forward pass. For a function f(x), setting tangent(x) = 1 gives f'(x) in the tangent of the output.
For Hessian computation (forward-over-reverse):
primal is a value object (for reverse-mode gradient)
tangent is also a value object (tracks d/dx of the gradient)
After backward() on primal, tangent of grad gives Hessian entries
primalThe function value (can be numeric or value object)
tangentThe directional derivative (can be numeric or value object)
new()
Create a new dual number
dual$new(primal, tangent = 0)
primalThe primal value (numeric or value object)
tangentThe tangent (derivative direction), default 0
clone()
The objects of this class are cloneable with this method.
dual$clone(deep = FALSE)
deepWhether to make a deep clone.
Create a dual number
dual_num(primal, tangent = 0)dual_num(primal, tangent = 0)
primal |
The primal value |
tangent |
The tangent (directional derivative), default 0 |
A dual object
Computes exp(x) with optional clamping to prevent Inf.
exp_safe(x, max_val = 709)exp_safe(x, max_val = 709)
x |
A value object or numeric |
max_val |
Maximum value for x to prevent overflow (default: 709) |
In double precision, exp(710) overflows to Inf. This function clamps the input to prevent overflow while maintaining correct gradients.
exp(min(x, max_val))
Element-wise exponential.
## S3 method for class 'value' exp(x)## S3 method for class 'value' exp(x)
x |
A value object |
A new value object representing exp(x)
Creates a fitted model object from MLE results. This is primarily
an internal constructor; users typically obtain femtofit objects
from the fit() function.
femtofit( coefficients, vcov, loglik, hessian = NULL, gradient = NULL, converged = TRUE, iterations = NA_integer_, nobs = NULL, call = NULL, predict_fn = NULL ) ## S3 method for class 'femtofit' coef(object, ...) ## S3 method for class 'femtofit' vcov(object, ...) ## S3 method for class 'femtofit' confint(object, parm, level = 0.95, ...) ## S3 method for class 'femtofit' logLik(object, ...) ## S3 method for class 'femtofit' nobs(object, ...) ## S3 method for class 'femtofit' print(x, ...) ## S3 method for class 'femtofit' summary(object, ...) ## S3 method for class 'summary.femtofit' print(x, ...)femtofit( coefficients, vcov, loglik, hessian = NULL, gradient = NULL, converged = TRUE, iterations = NA_integer_, nobs = NULL, call = NULL, predict_fn = NULL ) ## S3 method for class 'femtofit' coef(object, ...) ## S3 method for class 'femtofit' vcov(object, ...) ## S3 method for class 'femtofit' confint(object, parm, level = 0.95, ...) ## S3 method for class 'femtofit' logLik(object, ...) ## S3 method for class 'femtofit' nobs(object, ...) ## S3 method for class 'femtofit' print(x, ...) ## S3 method for class 'femtofit' summary(object, ...) ## S3 method for class 'summary.femtofit' print(x, ...)
coefficients |
Named numeric vector of parameter estimates. |
vcov |
Variance-covariance matrix (named). |
loglik |
Log-likelihood value at the MLE. |
hessian |
Hessian matrix at the MLE. |
gradient |
Gradient vector at the MLE (should be near zero). |
converged |
Logical indicating convergence. |
iterations |
Number of iterations performed. |
nobs |
Number of observations (optional). |
call |
The original function call (optional). |
predict_fn |
Optional prediction function (optional). |
object |
A |
... |
Additional arguments (ignored). |
parm |
Parameter names or indices (default: all parameters). |
level |
Confidence level (default: 0.95). |
x |
A |
A femtofit object.
coef(femtofit): Extract coefficient estimates
vcov(femtofit): Extract variance-covariance matrix
confint(femtofit): Compute confidence intervals
logLik(femtofit): Extract log-likelihood (enables AIC/BIC)
nobs(femtofit): Number of observations
print(femtofit): Print fitted model
summary(femtofit): Summary of fitted model
print(summary.femtofit): Print summary
Convenience function that finds the MLE and computes standard errors from the observed Fisher information.
find_mle(loglik_fn, params, method = "newton", ...)find_mle(loglik_fn, params, method = "newton", ...)
loglik_fn |
Log-likelihood function taking list of value parameters |
params |
List of value objects (initial parameter values) |
method |
Optimization method: "newton" or "gradient" |
... |
Additional arguments passed to optimizer |
A list containing:
estimate |
MLE as numeric vector |
se |
Standard errors |
vcov |
Variance-covariance matrix |
loglik |
Log-likelihood at MLE |
hessian |
Hessian at MLE |
converged |
Convergence indicator |
## Not run: x <- rnorm(100, mean = 5, sd = 2) loglik <- function(p) loglik_normal(p[[1]], p[[2]], x) result <- find_mle(loglik, list(val(0), val(1))) result$estimate # MLEs result$se # Standard errors ## End(Not run)## Not run: x <- rnorm(100, mean = 5, sd = 2) loglik <- function(p) loglik_normal(p[[1]], p[[2]], x) result <- find_mle(loglik, list(val(0), val(1))) result$estimate # MLEs result$se # Standard errors ## End(Not run)
For a log-likelihood function, the observed Fisher information is the negative Hessian evaluated at the MLE. This is used for computing standard errors of MLEs.
fisher_information(loglik_fn, params)fisher_information(loglik_fn, params)
loglik_fn |
Log-likelihood function taking a list of value parameters |
params |
List of value objects at MLE |
The observed Fisher information I(θ̂) = -H(θ̂) where H is the Hessian of the log-likelihood. Standard errors are sqrt(diag(I⁻¹)).
The observed Fisher information matrix (negative Hessian)
Similar to Newton-Raphson but uses expected Fisher information (negative expected Hessian) instead of observed Hessian. More stable for some problems.
fisher_scoring(loglik_fn, params, max_iter = 100, tol = 1e-08, verbose = 0)fisher_scoring(loglik_fn, params, max_iter = 100, tol = 1e-08, verbose = 0)
loglik_fn |
Log-likelihood function |
params |
List of value objects (initial parameter values) |
max_iter |
Maximum iterations, default 100 |
tol |
Convergence tolerance, default 1e-8 |
verbose |
Print progress every N iterations (0 for silent) |
For regular exponential families, Fisher scoring is equivalent to Newton-Raphson since observed = expected information.
Fisher scoring: theta_{n+1} = theta_n + solve(I(theta_n)) %*% S(theta_n)
where I = -E(H) (Fisher information) and S = gradient (score).
This implementation uses the observed Hessian as an approximation
to the expected Hessian, making it identical to Newton-Raphson.
For a true Fisher scoring implementation, one would need to compute
E(H) analytically or via Monte Carlo.
Same structure as newton_raphson
Finds the maximum likelihood estimates of parameters and returns a fitted model object with standard errors, confidence intervals, and other inference quantities computed automatically via autodiff.
fit( loglik, params, method = c("bfgs", "lbfgs", "newton", "gradient"), nobs = NULL, predict_fn = NULL, ... )fit( loglik, params, method = c("bfgs", "lbfgs", "newton", "gradient"), nobs = NULL, predict_fn = NULL, ... )
loglik |
A log-likelihood function. Can be specified in two ways:
The function should return a scalar value object. |
params |
Named numeric vector of initial parameter values,
e.g., |
method |
Optimization method: "bfgs" (default), "lbfgs", "newton", or "gradient". |
nobs |
Optional number of observations used in fitting. Setting this
enables |
predict_fn |
Optional prediction function. Should take arguments
|
... |
Additional arguments passed to the optimizer. |
A femtofit object containing:
Coefficient estimates (accessible via coef())
Variance-covariance matrix (accessible via vcov())
Log-likelihood value (accessible via logLik())
Standard errors, Hessian, convergence info
## Not run: # Generate data set.seed(42) x <- rnorm(100, mean = 5, sd = 2) # Fit using named arguments (recommended) result <- fit( function(mu, sigma) loglik_normal(mu, sigma, x), params = c(mu = 0, sigma = 1) ) # Or using single parameter argument result <- fit( function(p) loglik_normal(p$mu, p$sigma, x), params = c(mu = 0, sigma = 1) ) # Standard R generics work coef(result) # Parameter estimates vcov(result) # Variance-covariance matrix confint(result) # 95% confidence intervals logLik(result) # Log-likelihood (works with AIC/BIC) AIC(result) # Akaike information criterion summary(result) # Full summary with p-values ## End(Not run)## Not run: # Generate data set.seed(42) x <- rnorm(100, mean = 5, sd = 2) # Fit using named arguments (recommended) result <- fit( function(mu, sigma) loglik_normal(mu, sigma, x), params = c(mu = 0, sigma = 1) ) # Or using single parameter argument result <- fit( function(p) loglik_normal(p$mu, p$sigma, x), params = c(mu = 0, sigma = 1) ) # Standard R generics work coef(result) # Parameter estimates vcov(result) # Variance-covariance matrix confint(result) # 95% confidence intervals logLik(result) # Log-likelihood (works with AIC/BIC) AIC(result) # Akaike information criterion summary(result) # Full summary with p-values ## End(Not run)
This module provides the fit() function for maximum likelihood estimation
and the femtofit class for representing fitted models. The interface
follows standard R conventions, implementing base R generics like
coef(), vcov(), confint(), logLik(), etc.
Generic function to extract the numeric data from a differentiable object. For value objects, returns the underlying matrix. For plain numeric inputs, converts to matrix representation.
Assignment function to update the data field of a value object without breaking the computational graph reference.
get_data(x, ...) ## S3 method for class 'value' get_data(x, drop = TRUE, ...) ## Default S3 method: get_data(x, ...) get_data(x) <- value ## S3 replacement method for class 'value' get_data(x) <- value ## Default S3 replacement method: get_data(x) <- valueget_data(x, ...) ## S3 method for class 'value' get_data(x, drop = TRUE, ...) ## Default S3 method: get_data(x, ...) get_data(x) <- value ## S3 replacement method for class 'value' get_data(x) <- value ## Default S3 replacement method: get_data(x) <- value
x |
A value object |
... |
additional arguments to pass |
drop |
If TRUE (default) and result is 1x1, return scalar. Set to FALSE to always return a matrix. |
value |
The new numeric value to assign (converted to matrix) |
The data as scalar (if 1x1 and drop=TRUE) or matrix
The modified value object (invisibly)
x <- val(5) get_data(x) # Returns 5 (scalar) get_data(x, drop = FALSE) # Returns 1x1 matrix ## Not run: x <- val(5) get_data(x) <- 10 get_data(x) # Returns 10 ## End(Not run)x <- val(5) get_data(x) # Returns 5 (scalar) get_data(x, drop = FALSE) # Returns 1x1 matrix ## Not run: x <- val(5) get_data(x) <- 10 get_data(x) # Returns 10 ## End(Not run)
x with respect to e in backward(e),
e.g., dx/de. (applies the chain rule)Gradient of x with respect to e in backward(e),
e.g., dx/de. (applies the chain rule)
grad(x, ...)grad(x, ...)
x |
A differential object |
... |
pass additional arguments |
The gradient matrix (same dimensions as data)
Default gradient is zero matrix
## Default S3 method: grad(x, ...)## Default S3 method: grad(x, ...)
x |
A non-value object |
... |
pass additional arguments |
Zero matrix of appropriate dimensions
value object x with respect to e in
backward(e), e.g., dx/de. (applies the chain rule)Gradient of a value object x with respect to e in
backward(e), e.g., dx/de. (applies the chain rule)
## S3 method for class 'value' grad(x, drop = TRUE, ...)## S3 method for class 'value' grad(x, drop = TRUE, ...)
x |
A value object |
drop |
If TRUE (default) and result is 1x1, return scalar. Set to FALSE to always return a matrix. |
... |
pass additional arguments |
The gradient as scalar (if 1x1 and drop=TRUE) or matrix
Convenience function to compute the gradient of a loss function at the current parameter values.
gradient(loss_fn, params)gradient(loss_fn, params)
loss_fn |
A function taking a list of value parameters |
params |
A list of value objects |
A numeric vector of gradients (one per parameter)
Finds the optimum of a function using gradient-based optimization. Supports gradient clipping and adaptive step sizes.
gradient_ascent( objective_fn, params, lr = 0.01, max_iter = 1000, tol = 1e-06, maximize = TRUE, grad_clip = NULL, verbose = 0 )gradient_ascent( objective_fn, params, lr = 0.01, max_iter = 1000, tol = 1e-06, maximize = TRUE, grad_clip = NULL, verbose = 0 )
objective_fn |
Function taking list of value parameters, returns scalar |
params |
List of value objects (initial parameter values) |
lr |
Learning rate (step size), default 0.01 |
max_iter |
Maximum iterations, default 1000 |
tol |
Convergence tolerance on gradient norm, default 1e-6 |
maximize |
If TRUE (default), maximize; if FALSE, minimize |
grad_clip |
Maximum gradient norm (NULL for no clipping) |
verbose |
Print progress every N iterations (0 for silent) |
A list containing:
params |
List of value objects at optimum |
value |
Objective function value at optimum |
gradient |
Gradient at optimum |
iterations |
Number of iterations performed |
converged |
TRUE if gradient norm < tol |
## Not run: # Find MLE for exponential distribution x <- rexp(100, rate = 2) loglik <- function(p) loglik_exponential(p[[1]], x) result <- gradient_ascent(loglik, list(val(1))) get_data(result$params[[1]]) # Should be close to 1/mean(x) ## End(Not run)## Not run: # Find MLE for exponential distribution x <- rexp(100, rate = 2) loglik <- function(p) loglik_exponential(p[[1]], x) result <- gradient_ascent(loglik, list(val(1))) get_data(result$params[[1]]) # Should be close to 1/mean(x) ## End(Not run)
Convenience wrapper for gradient_ascent with maximize=FALSE.
gradient_descent( objective_fn, params, lr = 0.01, max_iter = 1000, tol = 1e-06, grad_clip = NULL, verbose = 0 )gradient_descent( objective_fn, params, lr = 0.01, max_iter = 1000, tol = 1e-06, grad_clip = NULL, verbose = 0 )
objective_fn |
Function taking list of value parameters, returns scalar |
params |
List of value objects (initial parameter values) |
lr |
Learning rate (step size), default 0.01 |
max_iter |
Maximum iterations, default 1000 |
tol |
Convergence tolerance on gradient norm, default 1e-6 |
grad_clip |
Maximum gradient norm (NULL for no clipping) |
verbose |
Print progress every N iterations (0 for silent) |
Computes the Hessian matrix (matrix of second partial derivatives) of a scalar function with respect to a vector of parameters. Uses forward-mode AD on top of reverse-mode AD for efficient, accurate computation.
hessian(loss_fn, params, value_creator = val)hessian(loss_fn, params, value_creator = val)
loss_fn |
A function that takes a list of parameters and returns a scalar loss/objective. The function must use femtograd operations. |
params |
A list of value objects (the parameters) |
value_creator |
Function to create value objects (default: val). This allows customization for different value types. |
The Hessian is computed using forward-over-reverse mode AD:
For each parameter i, create dual numbers where:
Primal = value object holding parameter value
Tangent = value object (1 for param i, 0 for others)
Run the loss function with these dual-value inputs
The tangent of the loss is df/dθᵢ (as a value expression)
Call backward() on the tangent loss
Each tangent parameter's gradient gives d²f/dθⱼdθᵢ
This is the "forward-over-reverse" pattern: forward-mode (dual numbers) computes df/dθᵢ symbolically, then reverse-mode differentiates that to get d²f/dθⱼdθᵢ.
A numeric matrix of dimension (n x n) where n = length(params), containing the Hessian d²f/dθᵢdθⱼ
## Not run: # Hessian of f(x,y) = x^2 + x*y + y^2 loss_fn <- function(p) { x <- p[[1]] y <- p[[2]] x^2 + x*y + y^2 } params <- list(val(1), val(2)) H <- hessian(loss_fn, params) # H should be [[2, 1], [1, 2]] ## End(Not run)## Not run: # Hessian of f(x,y) = x^2 + x*y + y^2 loss_fn <- function(p) { x <- p[[1]] y <- p[[2]] x^2 + x*y + y^2 } params <- list(val(1), val(2)) H <- hessian(loss_fn, params) # H should be [[2, 1], [1, 2]] ## End(Not run)
Provides hypothesis testing functions that produce results compatible
with the hypothesize package interface. Results have stat, p.value,
and dof components accessible via standard methods.
Converts a bounded value back to unconstrained scale.
inv_bounded(x, lower, upper)inv_bounded(x, lower, upper)
x |
Value in (lower, upper) |
lower |
Lower bound |
upper |
Upper bound |
Unconstrained value
Converts a positive value back to unconstrained scale.
inv_positive(x)inv_positive(x)
x |
Positive value |
Unconstrained value (log of x)
Converts a probability to unconstrained scale (logit).
inv_probability(x)inv_probability(x)
x |
Value in (0, 1) |
Unconstrained value (logit of x)
These functions convert fitted unconstrained parameters back to their natural scale.
Check if object is a dual number
is_dual(x)is_dual(x)
x |
Object to check |
TRUE if x is a dual object
Check if test is significant at given level
is_significant_at(x, alpha = 0.05, ...) ## S3 method for class 'hypothesis_test' is_significant_at(x, alpha = 0.05, ...)is_significant_at(x, alpha = 0.05, ...) ## S3 method for class 'hypothesis_test' is_significant_at(x, alpha = 0.05, ...)
x |
A hypothesis test object |
alpha |
Significance level (default 0.05) |
... |
Additional arguments (ignored) |
Logical indicating significance
is_significant_at(hypothesis_test): Significance check for hypothesis tests
Determines if the input object is an instance of the value R6 class.
is_value(x)is_value(x)
x |
The object to be checked |
TRUE if the object is of class value, FALSE otherwise
Memory-efficient variant of BFGS that stores only the last m correction pairs instead of the full inverse Hessian approximation. Suitable for large-scale optimization.
lbfgs( objective_fn, params, m = 10, max_iter = 1000, tol = 1e-06, maximize = FALSE, verbose = 0 )lbfgs( objective_fn, params, m = 10, max_iter = 1000, tol = 1e-06, maximize = FALSE, verbose = 0 )
objective_fn |
Function taking list of value parameters, returns scalar |
params |
List of value objects (initial parameter values) |
m |
Number of correction pairs to store, default 10 |
max_iter |
Maximum iterations, default 1000 |
tol |
Convergence tolerance on gradient norm, default 1e-6 |
maximize |
If TRUE, maximize; if FALSE (default), minimize |
verbose |
Print progress every N iterations (0 for silent) |
L-BFGS uses the two-loop recursion algorithm to compute the search direction without explicitly forming the inverse Hessian. Memory usage is O(m*n) instead of O(n^2) for full BFGS.
A list containing:
params |
List of value objects at optimum |
value |
Objective function value at optimum |
gradient |
Gradient at optimum |
iterations |
Number of iterations performed |
converged |
TRUE if gradient norm < tol |
Element-wise log(gamma(x)).
## S3 method for class 'value' lgamma(x)## S3 method for class 'value' lgamma(x)
x |
A value object |
A new value object representing lgamma(x)
Finds a step size that satisfies the Armijo condition for sufficient decrease.
line_search( objective_fn, param_values, direction, grad, maximize = FALSE, alpha = 1, c1 = 1e-04, rho = 0.5, max_iter = 20 )line_search( objective_fn, param_values, direction, grad, maximize = FALSE, alpha = 1, c1 = 1e-04, rho = 0.5, max_iter = 20 )
objective_fn |
Function to minimize/maximize |
param_values |
Current parameter values (numeric vector) |
direction |
Search direction (numeric vector) |
grad |
Current gradient (numeric vector) |
maximize |
If TRUE, maximize; if FALSE, minimize |
alpha |
Initial step size, default 1 |
c1 |
Armijo parameter (sufficient decrease), default 1e-4 |
rho |
Backtracking factor, default 0.5 |
max_iter |
Maximum backtracking iterations, default 20 |
Armijo condition (for minimization): f(x + αd) <= f(x) + c1α*g'*d where d is the search direction and g is the gradient.
Step size satisfying Armijo condition
Computes log(x) with protection against log(0) = -Inf. Optionally clamps input to a minimum value.
log_safe(x, eps = .Machine$double.eps)log_safe(x, eps = .Machine$double.eps)
x |
A value object or numeric |
eps |
Minimum value to clamp x to (default: .Machine$double.eps) |
This is useful when computing log-likelihoods where probabilities
might numerically become zero. Where x > eps, the gradient is the true
1/x. Where the clamp triggered (x <= eps), the gradient is zeroed out
rather than returning the explosive 1/eps value — propagating a huge
gradient from a region where the primal is known to be wrong would
actively mislead the optimizer.
log(max(x, eps))
Computes log(sigmoid(x)) = -log(1 + exp(-x)) stably.
log_sigmoid(x)log_sigmoid(x)
x |
A value object or numeric |
Direct computation of log(sigmoid(x)) fails for large negative x (sigmoid underflows to 0). This uses:
For x >= 0: -log(1 + exp(-x)) = -softplus(-x)
For x < 0: x - log(1 + exp(x)) = x - softplus(x)
log(sigmoid(x))
Element-wise natural logarithm.
## S3 method for class 'value' log(x, ...)## S3 method for class 'value' log(x, ...)
x |
A value object |
... |
Additional arguments (ignored). |
A new value object representing log(x)
This is a thin alias for log1p(), kept only for backward compatibility.
log1p() already has log1p.value and log1p.dual methods registered by
femtograd, so it is numerically stable and autodiff-aware on its own. Use
log1p() directly.
log1p_safe(x)log1p_safe(x)
x |
A value object, dual, or numeric. |
log1p(x).
Element-wise log(1+x), numerically stable for small x.
## S3 method for class 'value' log1p(x)## S3 method for class 'value' log1p(x)
x |
A value object |
A new value object representing log(1+x)
Element-wise logit: log(p/(1-p)).
logit(x)logit(x)
x |
A value object representing probabilities |
A new value object representing logit(x)
Special case of binomial with size=1.
L(p|x) = sum(x*log(p) + (1-x)*log(1-p))
loglik_bernoulli(p, x)loglik_bernoulli(p, x)
p |
Success probability (value object), must be in (0,1) |
x |
Binary vector (0 or 1) |
A value object representing the log-likelihood
Computes the log-likelihood for i.i.d. beta observations.
L(a,b|x) = n*(lgamma(a+b) - lgamma(a) - lgamma(b)) + (a-1)*sum(log(x)) + (b-1)*sum(log(1-x))
loglik_beta(alpha, beta, x)loglik_beta(alpha, beta, x)
alpha |
Shape parameter α (value object), must be positive |
beta |
Shape parameter β (value object), must be positive |
x |
Numeric vector of observations in (0,1) |
A value object representing the log-likelihood
Computes the log-likelihood for binomial observations.
L(p|x,n) = sum(x*log(p) + (n-x)*log(1-p) + log(C(n,x)))
loglik_binomial(p, x, size)loglik_binomial(p, x, size)
p |
Success probability (value object), must be in (0,1) |
x |
Integer vector of successes |
size |
Integer vector of trial counts (or single value if constant) |
A value object representing the log-likelihood
## Not run: x <- rbinom(50, size = 10, prob = 0.3) p <- val(0.5) ll <- loglik_binomial(p, x, size = 10) backward(ll) # MLE is sum(x) / sum(size) ## End(Not run)## Not run: x <- rbinom(50, size = 10, prob = 0.3) p <- val(0.5) ll <- loglik_binomial(p, x, size = 10) backward(ll) # MLE is sum(x) / sum(size) ## End(Not run)
Computes the log-likelihood for i.i.d. exponential observations. L(λ|x) = nlog(λ) - λΣxᵢ
loglik_exponential(rate, x)loglik_exponential(rate, x)
rate |
Rate parameter λ (value object), must be positive |
x |
Numeric vector of observations (must be non-negative) |
A value object representing the log-likelihood
## Not run: x <- rexp(50, rate = 2) rate <- val(1) ll <- loglik_exponential(rate, x) backward(ll) grad(rate) # should be n/rate - sum(x) ## End(Not run)## Not run: x <- rexp(50, rate = 2) rate <- val(1) ll <- loglik_exponential(rate, x) backward(ll) grad(rate) # should be n/rate - sum(x) ## End(Not run)
Computes the log-likelihood for i.i.d. gamma observations. L(α,β|x) = nαlog(β) - n*log(Γ(α)) + (α-1)Σlog(xᵢ) - βΣxᵢ
loglik_gamma(shape, rate, x)loglik_gamma(shape, rate, x)
shape |
Shape parameter α (value object), must be positive |
rate |
Rate parameter β (value object), must be positive |
x |
Numeric vector of observations (must be positive) |
The gamma distribution is parameterized with shape alpha and rate beta,
where E(X) = alpha/beta and Var(X) = alpha/beta^2.
A value object representing the log-likelihood
Computes the log-likelihood for binary logistic regression.
L(beta|X,y) = sum(y*log(p) + (1-y)*log(1-p))
where p = sigmoid(X %*% beta)
loglik_logistic(beta, X, y)loglik_logistic(beta, X, y)
beta |
Coefficient vector (list of value objects) |
X |
Design matrix (n x p numeric matrix) |
y |
Binary response vector (0 or 1) |
Uses the numerically stable form: log(p) = -log(1 + exp(-η)) and log(1-p) = -log(1 + exp(η)) where η = Xβ
A value object representing the log-likelihood
Computes the log-likelihood for negative binomial (failures until r successes).
L(r,p|x) = sum(lchoose(x+r-1, x) + r*log(p) + x*log(1-p))
loglik_negbinom(r, p, x)loglik_negbinom(r, p, x)
r |
Number of successes parameter (value object or fixed positive) |
p |
Success probability (value object), must be in (0,1) |
x |
Integer vector of failure counts |
A value object representing the log-likelihood
Computes the log-likelihood for i.i.d. normal observations. L(μ,σ²|x) = -n/2 log(2π) - n/2 log(σ²) - Σ(xᵢ-μ)²/(2σ²)
loglik_normal(mu, sigma, x)loglik_normal(mu, sigma, x)
mu |
Mean parameter (value object) |
sigma |
Standard deviation parameter (value object), must be positive |
x |
Numeric vector of observations |
A value object representing the log-likelihood
## Not run: x <- rnorm(100, mean = 5, sd = 2) mu <- val(0) sigma <- val(1) ll <- loglik_normal(mu, sigma, x) backward(ll) grad(mu) # score for mu ## End(Not run)## Not run: x <- rnorm(100, mean = 5, sd = 2) mu <- val(0) sigma <- val(1) ll <- loglik_normal(mu, sigma, x) backward(ll) grad(mu) # score for mu ## End(Not run)
Computes the log-likelihood for i.i.d. Pareto observations. L(α, xₘ | x) = nlog(α) + nα*log(xₘ) - (α+1)*Σlog(xᵢ)
loglik_pareto(alpha, x_min, x)loglik_pareto(alpha, x_min, x)
alpha |
Shape parameter α (value object), must be positive |
x_min |
Minimum/scale parameter xₘ (fixed positive number). All observations must be >= x_min. |
x |
Numeric vector of observations (must be >= x_min) |
The Pareto distribution is used to model heavy-tailed phenomena like income distributions, city sizes, etc. Here x_min is typically known (e.g., min(x)) and alpha is estimated.
A value object representing the log-likelihood
## Not run: # Generate Pareto data alpha_true <- 2 x_min <- 1 u <- runif(100) x <- x_min * (1 - u)^(-1/alpha_true) # Fit (alpha only, x_min = min(x) is fixed) result <- fit( function(log_alpha) { alpha <- exp(log_alpha) loglik_pareto(alpha, x_min = min(x), x) }, params = c(log_alpha = 0) ) ## End(Not run)## Not run: # Generate Pareto data alpha_true <- 2 x_min <- 1 u <- runif(100) x <- x_min * (1 - u)^(-1/alpha_true) # Fit (alpha only, x_min = min(x) is fixed) result <- fit( function(log_alpha) { alpha <- exp(log_alpha) loglik_pareto(alpha, x_min = min(x), x) }, params = c(log_alpha = 0) ) ## End(Not run)
Computes the log-likelihood for i.i.d. Poisson observations. L(λ|x) = Σxᵢlog(λ) - nλ - Σlog(xᵢ!)
loglik_poisson(lambda, x)loglik_poisson(lambda, x)
lambda |
Rate parameter λ (value object), must be positive |
x |
Integer vector of observations (counts) |
The term Σlog(xᵢ!) is constant w.r.t. λ and is included for completeness.
A value object representing the log-likelihood
## Not run: x <- rpois(100, lambda = 3) lambda <- val(1) ll <- loglik_poisson(lambda, x) backward(ll) # MLE is mean(x) ## End(Not run)## Not run: x <- rpois(100, lambda = 3) lambda <- val(1) ll <- loglik_poisson(lambda, x) backward(ll) # MLE is mean(x) ## End(Not run)
Computes the log-likelihood for i.i.d. Weibull observations. L(k, λ | x) = nlog(k) - nk*log(λ) + (k-1)*Σlog(xᵢ) - Σ(xᵢ/λ)^k
loglik_weibull(shape, scale, x)loglik_weibull(shape, scale, x)
shape |
Shape parameter k (value object), must be positive |
scale |
Scale parameter λ (value object), must be positive |
x |
Numeric vector of observations (must be positive) |
The Weibull distribution is commonly used in survival analysis and reliability engineering. It generalizes the exponential distribution (k=1 gives exponential with rate 1/λ).
A value object representing the log-likelihood
## Not run: x <- rweibull(100, shape = 2, scale = 3) # Using log-parameterization for positivity result <- fit( function(log_shape, log_scale) { shape <- exp(log_shape) scale <- exp(log_scale) loglik_weibull(shape, scale, x) }, params = c(log_shape = 0, log_scale = 0) ) ## End(Not run)## Not run: x <- rweibull(100, shape = 2, scale = 3) # Using log-parameterization for positivity result <- fit( function(log_shape, log_scale) { shape <- exp(log_shape) scale <- exp(log_scale) loglik_weibull(shape, scale, x) }, params = c(log_shape = 0, log_scale = 0) ) ## End(Not run)
Computes log(sum(exp(x))) in a numerically stable way by factoring out the maximum value: log(sum(exp(x))) = m + log(sum(exp(x - m))) where m = max(x).
logsumexp(..., na.rm = FALSE)logsumexp(..., na.rm = FALSE)
... |
value objects and/or numeric vectors to include in the sum |
na.rm |
Logical, whether to remove NA values |
The naive computation of log(sum(exp(x))) overflows when x contains large values (exp(710) = Inf in double precision) and underflows when x contains very negative values. This implementation is stable for any finite input.
A value object representing log(sum(exp(x)))
## Not run: x <- val(c(1000, 1001, 1002)) # Would overflow with naive exp() logsumexp(x) # Returns ~1002.41 correctly ## End(Not run)## Not run: x <- val(c(1000, 1001, 1002)) # Would overflow with naive exp() logsumexp(x) # Returns ~1002.41 correctly ## End(Not run)
Maps unconstrained values to (lower, ∞). Equivalent to lower + positive(x).
lower_bounded(x, lower)lower_bounded(x, lower)
x |
Unconstrained value |
lower |
Lower bound |
The transformation is: lower_bounded(x, a) = a + exp(x)
Value > lower
## Not run: # Parameter > 2 raw <- val(0) param <- lower_bounded(raw, 2) # 2 + exp(0) = 3 ## End(Not run)## Not run: # Parameter > 2 raw <- val(0) param <- lower_bounded(raw, 2) # 2 + exp(0) = 3 ## End(Not run)
Computes the likelihood ratio test (LRT) for comparing nested models. Works with femtofit objects or raw log-likelihood values.
lrt(null_model, alt_model, df = NULL)lrt(null_model, alt_model, df = NULL)
null_model |
Either a |
alt_model |
Either a |
df |
Degrees of freedom. If NULL and both arguments are femtofit objects, computed as the difference in number of parameters. |
The likelihood ratio test statistic is:
where is the log-likelihood under the null (simpler) model
and is the log-likelihood under the alternative (complex) model.
Under the null hypothesis and regularity conditions,
asymptotically follows a chi-squared distribution with degrees of freedom
equal to the difference in number of free parameters.
A likelihood_ratio_test object (also inherits from hypothesis_test)
containing:
The LRT statistic: -2 * (loglik_null - loglik_alt)
P-value from chi-squared distribution
Degrees of freedom
Log-likelihood of null model
Log-likelihood of alternative model
## Not run: # Generate data set.seed(42) x <- rnorm(100, mean = 5, sd = 2) # Fit full model (estimate both mu and sigma) full <- fit( function(mu, log_sigma) { sigma <- exp(log_sigma) loglik_normal(mu, sigma, x) }, params = c(mu = 0, log_sigma = 0) ) # Fit null model (mu fixed at 0) null <- fit( function(log_sigma) { sigma <- exp(log_sigma) loglik_normal(0, sigma, x) # mu = 0 }, params = c(log_sigma = 0) ) # Likelihood ratio test test <- lrt(null, full) test pval(test) is_significant_at(test, 0.05) # Can also use raw log-likelihoods lrt(null_loglik = -150, alt_loglik = -140, df = 1) ## End(Not run)## Not run: # Generate data set.seed(42) x <- rnorm(100, mean = 5, sd = 2) # Fit full model (estimate both mu and sigma) full <- fit( function(mu, log_sigma) { sigma <- exp(log_sigma) loglik_normal(mu, sigma, x) }, params = c(mu = 0, log_sigma = 0) ) # Fit null model (mu fixed at 0) null <- fit( function(log_sigma) { sigma <- exp(log_sigma) loglik_normal(0, sigma, x) # mu = 0 }, params = c(log_sigma = 0) ) # Likelihood ratio test test <- lrt(null, full) test pval(test) is_significant_at(test, 0.05) # Can also use raw log-likelihoods lrt(null_loglik = -150, alt_loglik = -140, df = 1) ## End(Not run)
Computes the arithmetic mean of all elements across value objects.
## S3 method for class 'value' mean(x, ...)## S3 method for class 'value' mean(x, ...)
x |
A value object or list of value objects |
... |
Additional arguments (for compatibility with base::mean) |
A new value object (1x1 matrix) representing the mean
Finds the optimum using Newton-Raphson method with exact Hessian. Uses second-order information for faster convergence near optimum.
newton_raphson( objective_fn, params, max_iter = 100, tol = 1e-08, maximize = TRUE, step_scale = 1, verbose = 0 )newton_raphson( objective_fn, params, max_iter = 100, tol = 1e-08, maximize = TRUE, step_scale = 1, verbose = 0 )
objective_fn |
Function taking list of value parameters, returns scalar |
params |
List of value objects (initial parameter values) |
max_iter |
Maximum iterations, default 100 |
tol |
Convergence tolerance on step size, default 1e-8 |
maximize |
If TRUE (default), maximize; if FALSE, minimize |
step_scale |
Scale factor for Newton step (< 1 for damping), default 1 |
verbose |
Print progress every N iterations (0 for silent) |
Newton-Raphson update: θ_n+1 = θ_n - H⁻¹ g For maximization, uses: θ_n+1 = θ_n - H⁻¹ g (H is negative definite) For minimization, uses: θ_n+1 = θ_n - H⁻¹ g (H is positive definite)
The Hessian is computed via forward-over-reverse AD at each iteration. This is exact but can be slow for many parameters.
A list containing:
params |
List of value objects at optimum |
value |
Objective function value at optimum |
gradient |
Gradient at optimum |
hessian |
Hessian at optimum |
iterations |
Number of iterations performed |
converged |
TRUE if step size < tol |
## Not run: # Find MLE for normal distribution x <- rnorm(100, mean = 5, sd = 2) loglik <- function(p) loglik_normal(p[[1]], p[[2]], x) result <- newton_raphson(loglik, list(val(0), val(1))) sapply(result$params, get_data) # Should be close to c(mean(x), sd(x)) ## End(Not run)## Not run: # Find MLE for normal distribution x <- rnorm(100, mean = 5, sd = 2) loglik <- function(p) loglik_normal(p[[1]], p[[2]], x) result <- newton_raphson(loglik, list(val(0), val(1))) sapply(result$params, get_data) # Should be close to c(mean(x), sd(x)) ## End(Not run)
Returns the observed Fisher information matrix, which is the negative Hessian of the log-likelihood at the MLE.
observed_info(object, ...) ## S3 method for class 'femtofit' observed_info(object, ...)observed_info(object, ...) ## S3 method for class 'femtofit' observed_info(object, ...)
object |
A fitted model object. |
... |
Additional arguments. |
The observed information matrix.
observed_info(femtofit): Observed information for femtofit
These functions find the optimum (maximum or minimum) of differentiable objective functions using femtograd's automatic differentiation.
Plot profile likelihood
## S3 method for class 'profile_likelihood' plot(x, level = 0.95, ...)## S3 method for class 'profile_likelihood' plot(x, level = 0.95, ...)
x |
A |
level |
Confidence level for adding threshold line (default 0.95). |
... |
Additional arguments passed to plot. |
Maps unconstrained values to positive values via exp(x). Use this for parameters like standard deviations, rates, or variances.
positive(x)positive(x)
x |
Unconstrained value (scalar, vector, or value object) |
The transformation is: positive(x) = exp(x)
For optimization, work with the unconstrained parameter and transform:
result <- fit(
function(mu, log_sigma) {
sigma <- positive(log_sigma) # exp(log_sigma)
loglik_normal(mu, sigma, data)
},
params = c(mu = 0, log_sigma = 0) # log(1) = 0
)
# To recover sigma: exp(coef(result)["log_sigma"])
Positive value (always > 0)
## Not run: # Parameter that must be positive log_sigma <- val(-1) sigma <- positive(log_sigma) # exp(-1) ≈ 0.368 get_data(sigma) # Works in optimization fit( function(mu, log_sigma) loglik_normal(mu, positive(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) ## End(Not run)## Not run: # Parameter that must be positive log_sigma <- val(-1) sigma <- positive(log_sigma) # exp(-1) ≈ 0.368 get_data(sigma) # Works in optimization fit( function(mu, log_sigma) loglik_normal(mu, positive(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) ## End(Not run)
Generate predictions from a femtofit object using the prediction
function supplied at fit time.
## S3 method for class 'femtofit' predict(object, newdata, ...)## S3 method for class 'femtofit' predict(object, newdata, ...)
object |
A |
newdata |
New data for prediction. The format depends on the
prediction function provided to |
... |
Additional arguments passed to the prediction function. |
For predict() to work, a predict_fn must have been provided
when calling fit(). The prediction function should have signature
function(params, newdata, ...) where params is a named list
of parameter values.
Predictions, as returned by the prediction function.
## Not run: # Linear regression example x <- 1:10 y <- 2 + 3*x + rnorm(10, sd = 0.5) # Fit with prediction function result <- fit( function(a, b, log_sigma) { sigma <- exp(log_sigma) predicted <- a + b * x loglik_normal(predicted, sigma, y) }, params = c(a = 0, b = 1, log_sigma = 0), predict_fn = function(params, newdata) { params$a + params$b * newdata } ) # Predict for new x values predict(result, newdata = c(11, 12, 13)) ## End(Not run)## Not run: # Linear regression example x <- 1:10 y <- 2 + 3*x + rnorm(10, sd = 0.5) # Fit with prediction function result <- fit( function(a, b, log_sigma) { sigma <- exp(log_sigma) predicted <- a + b * x loglik_normal(predicted, sigma, y) }, params = c(a = 0, b = 1, log_sigma = 0), predict_fn = function(params, newdata) { params$a + params$b * newdata } ) # Predict for new x values predict(result, newdata = c(11, 12, 13)) ## End(Not run)
Extract primal from dual or return value unchanged
primal(x)primal(x)
x |
A dual or non-dual object |
The primal value
Print anova table for femtofit
## S3 method for class 'anova.femtofit' print(x, digits = 4, ...)## S3 method for class 'anova.femtofit' print(x, digits = 4, ...)
x |
An |
digits |
Number of significant digits for printing. |
... |
Additional arguments (ignored). |
Print method for bootstrap results
## S3 method for class 'bootstrap_result' print(x, ...)## S3 method for class 'bootstrap_result' print(x, ...)
x |
A |
... |
Additional arguments (ignored). |
Print methods for diagnostic objects
## S3 method for class 'hessian_check' print(x, ...) ## S3 method for class 'convergence_check' print(x, ...) ## S3 method for class 'model_diagnostics' print(x, ...)## S3 method for class 'hessian_check' print(x, ...) ## S3 method for class 'convergence_check' print(x, ...) ## S3 method for class 'model_diagnostics' print(x, ...)
x |
A diagnostic object. |
... |
Additional arguments (ignored). |
Print method for likelihood ratio test
## S3 method for class 'likelihood_ratio_test' print(x, ...)## S3 method for class 'likelihood_ratio_test' print(x, ...)
x |
A likelihood_ratio_test object |
... |
Additional arguments (ignored) |
Print model comparison
## S3 method for class 'model_comparison' print(x, digits = 2, ...)## S3 method for class 'model_comparison' print(x, digits = 2, ...)
x |
A |
digits |
Number of digits for rounding. |
... |
Additional arguments (ignored). |
Print method for profile likelihood
## S3 method for class 'profile_likelihood' print(x, ...)## S3 method for class 'profile_likelihood' print(x, ...)
x |
A |
... |
Additional arguments (ignored). |
Print value object and its computational graph
## S3 method for class 'value' print(x, depth = Inf, indent = " ", ...)## S3 method for class 'value' print(x, depth = Inf, indent = " ", ...)
x |
A value object |
depth |
Integer indicates depth (dfs) to recurse down the graph |
indent |
A character string specifying the indentation for each level in the computational graph (default: " ") |
... |
Additional arguments (ignored). |
Print method for Wald test
## S3 method for class 'wald_test' print(x, ...)## S3 method for class 'wald_test' print(x, ...)
x |
A wald_test object |
... |
Additional arguments (ignored) |
Maps unconstrained values to (0, 1) via the sigmoid function. Use this for probability parameters.
probability(x)probability(x)
x |
Unconstrained value (scalar, vector, or value object) |
The transformation is: probability(x) = 1 / (1 + exp(-x)) = sigmoid(x)
For optimization:
result <- fit(
function(logit_p) {
p <- probability(logit_p)
loglik_bernoulli(p, data)
},
params = c(logit_p = 0) # sigmoid(0) = 0.5
)
# To recover p: sigmoid(coef(result)["logit_p"])
Value in (0, 1)
## Not run: # Parameter that must be in (0, 1) logit_p <- val(2) p <- probability(logit_p) # sigmoid(2) ≈ 0.88 get_data(p) ## End(Not run)## Not run: # Parameter that must be in (0, 1) logit_p <- val(2) p <- probability(logit_p) # sigmoid(2) ≈ 0.88 get_data(p) ## End(Not run)
Functions for computing profile likelihoods and profile-based confidence intervals. Profile likelihood provides more accurate inference than Wald-based methods when the likelihood is non-quadratic.
Computes the profile log-likelihood for a single parameter by maximizing over all other parameters at each fixed value.
profile_loglik(object, parm, values = NULL, n_points = 20, range_mult = 3, ...)profile_loglik(object, parm, values = NULL, n_points = 20, range_mult = 3, ...)
object |
A |
parm |
The parameter name or index to profile. |
values |
Optional vector of values at which to evaluate the profile. If NULL, a grid is computed based on the MLE and standard error. |
n_points |
Number of points in the grid (default 20). Ignored if
|
range_mult |
Multiplier for the range around MLE (default 3). The grid spans MLE ± range_mult * SE. |
... |
Additional arguments passed to the optimizer. |
The profile log-likelihood for parameter θᵢ is defined as:
where θ₋ᵢ denotes all parameters except θᵢ. For each grid value of θᵢ, this re-optimizes over the remaining parameters, warm-started from the MLE.
If the femtofit object was produced by an older version of fit() that
did not stash the wrapped objective, a quadratic approximation based on
the stored Hessian is returned instead and is_quadratic_approx = TRUE
is set on the result.
A profile_likelihood object containing:
Name of the profiled parameter
Grid of parameter values
Profile log-likelihood at each value
MLE value of the parameter
Maximum log-likelihood
## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) # Profile likelihood for mu prof <- profile_loglik(result, "mu") plot(prof) ## End(Not run)## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) # Profile likelihood for mu prof <- profile_loglik(result, "mu") plot(prof) ## End(Not run)
Extract p-value from hypothesis test
pval(x, ...) ## S3 method for class 'hypothesis_test' pval(x, ...)pval(x, ...) ## S3 method for class 'hypothesis_test' pval(x, ...)
x |
A hypothesis test object |
... |
Additional arguments (ignored) |
The p-value
pval(hypothesis_test): p-value for hypothesis tests
Element-wise ReLU: max(0, x)
relu(x)relu(x)
x |
A value object |
A new value object representing max(0, x)
Extracts standard errors of coefficient estimates.
se(object, ...) ## S3 method for class 'femtofit' se(object, ...)se(object, ...) ## S3 method for class 'femtofit' se(object, ...)
object |
A |
... |
Additional arguments (ignored). |
Named numeric vector of standard errors.
se(femtofit): Standard errors for femtofit
A quick check of whether standard errors from the variance-covariance matrix are likely to be reliable.
se_reliable(object)se_reliable(object)
object |
A |
Standard errors may be unreliable if:
The Hessian is not negative definite
The model did not converge
Any SE is NA or negative
Logical indicating whether SEs are likely reliable.
Element-wise sigmoid: 1/(1+exp(-x))
sigmoid(x)sigmoid(x)
x |
A value object |
A new value object representing sigmoid(x)
Computes sigmoid(x) = 1/(1+exp(-x)) with overflow protection.
sigmoid_stable(x)sigmoid_stable(x)
x |
A value object or numeric |
For large positive x, exp(-x) underflows to 0, giving sigmoid = 1 (correct). For large negative x, we use sigmoid(x) = exp(x)/(1+exp(x)) to avoid overflow.
Sigmoid values in (0, 1)
Element-wise sine.
## S3 method for class 'value' sin(x)## S3 method for class 'value' sin(x)
x |
A value object |
A new value object representing sin(x)
Computes softmax(x)_i = exp(x_i) / sum(exp(x)) in a numerically stable way.
softmax(x)softmax(x)
x |
A value object or numeric vector |
Uses the identity softmax(x) = softmax(x - max(x)) to prevent overflow.
A value object (or numeric) with softmax probabilities
Element-wise softplus: log(1 + exp(x)).
softplus(x)softplus(x)
x |
A value object |
A new value object representing softplus(x)
Element-wise square root.
## S3 method for class 'value' sqrt(x)## S3 method for class 'value' sqrt(x)
x |
A value object |
A new value object representing sqrt(x)
These functions provide numerically stable implementations of common operations that can overflow or underflow with naive implementations.
Extracts standard errors from the Hessian of a log-likelihood.
SE(theta_i) = sqrt(diag(solve(I(theta)))) where I = -H.
std_errors(hess, is_loglik = TRUE)std_errors(hess, is_loglik = TRUE)
hess |
The Hessian matrix (from |
is_loglik |
If TRUE, treats hess as Hessian of log-likelihood and uses -H⁻¹. If FALSE, uses H⁻¹ directly. |
A numeric vector of standard errors
Sum for dual numbers
## S3 method for class 'dual' sum(..., na.rm = FALSE)## S3 method for class 'dual' sum(..., na.rm = FALSE)
... |
Dual numbers or numeric values to sum. |
na.rm |
Logical; should missing values be removed? |
Sums all elements of value objects, returning a 1x1 value.
## S3 method for class 'value' sum(..., na.rm = FALSE)## S3 method for class 'value' sum(..., na.rm = FALSE)
... |
value objects and/or numeric values to sum |
na.rm |
Logical, whether to remove NA values |
A new value object (1x1 matrix) representing the sum
Summary for bootstrap results
## S3 method for class 'bootstrap_result' summary(object, level = 0.95, ...)## S3 method for class 'bootstrap_result' summary(object, level = 0.95, ...)
object |
A |
level |
Confidence level (default 0.95). |
... |
Additional arguments (ignored). |
Extract tangent from dual or return 0
tangent(x)tangent(x)
x |
A dual or non-dual object |
The tangent value
Element-wise tanh.
## S3 method for class 'value' tanh(x)## S3 method for class 'value' tanh(x)
x |
A value object |
A new value object representing tanh(x)
Extract test statistic from hypothesis test
test_stat(x, ...) ## S3 method for class 'hypothesis_test' test_stat(x, ...)test_stat(x, ...) ## S3 method for class 'hypothesis_test' test_stat(x, ...)
x |
A hypothesis test object |
... |
Additional arguments (ignored) |
The test statistic
test_stat(hypothesis_test): Test statistic for hypothesis tests
These functions provide clean, differentiable transformations for constrained parameters. They allow optimization in unconstrained space while ensuring parameters stay in valid ranges.
Element-wise trigamma: second derivative of lgamma.
## S3 method for class 'value' trigamma(x)## S3 method for class 'value' trigamma(x)
x |
A value object |
A new value object representing trigamma(x)
Maps unconstrained values to (-∞, upper). Equivalent to upper - positive(x).
upper_bounded(x, upper)upper_bounded(x, upper)
x |
Unconstrained value |
upper |
Upper bound |
The transformation is: upper_bounded(x, b) = b - exp(x)
Value < upper
## Not run: # Parameter < 10 raw <- val(0) param <- upper_bounded(raw, 10) # 10 - exp(0) = 9 ## End(Not run)## Not run: # Parameter < 10 raw <- val(0) param <- upper_bounded(raw, 10) # 10 - exp(0) = 9 ## End(Not run)
value object constructorCreates a new value node in the computational graph with automatic differentiation. Input is converted to matrix representation internally.
val(data, label = "")val(data, label = "")
data |
Numeric value (scalar, vector, or matrix) |
label |
Optional character label for debugging |
All values are stored as matrices internally:
val(5) creates a 1x1 matrix
val(c(1,2,3)) creates a 3x1 column vector
val(matrix(...)) preserves matrix dimensions
A value object
value R6 class
value R6 class
Represents a node in the computational graph with automatic differentiation. Each value object holds data as a matrix, gradient as a matrix of the same dimensions, a backward function, previous nodes, and an optional label.
All data in femtograd uses matrix representation:
Scalars are 1x1 matrices
Column vectors are n x 1 matrices
Row vectors are 1 x n matrices
Matrices are m x n matrices
This ensures consistent behavior for get_data(), grad(), and hessian().
dataNumeric matrix containing the value
gradGradient matrix (same dimensions as data), initially zeros
backward_fnA function that performs the backward pass (gradient computation)
prevA list of previous nodes in the computational graph
labelOptional character label for debugging (default: "")
new()
Initializes a new value object with the given data, list of children, and optional label.
value$new(data, children = list(), label = "")
dataNumeric value (scalar, vector, or matrix) - will be converted to matrix
childrenList of previous nodes in the computational graph (default: empty list)
labelOptional character label for debugging
clone()
The objects of this class are cloneable with this method.
value$clone(deep = FALSE)
deepWhether to make a deep clone.
Compute variance-covariance matrix from Hessian
vcov_matrix(hess, is_loglik = TRUE)vcov_matrix(hess, is_loglik = TRUE)
hess |
The Hessian matrix |
is_loglik |
If TRUE, returns -H⁻¹ (for log-likelihood) |
The variance-covariance matrix
Performs Wald tests for individual parameters or linear combinations of parameters in a fitted model.
wald_test(object, ...) ## S3 method for class 'femtofit' wald_test(object, parm = NULL, null_value = 0, ...) ## Default S3 method: wald_test(object, se, null_value = 0, ...)wald_test(object, ...) ## S3 method for class 'femtofit' wald_test(object, parm = NULL, null_value = 0, ...) ## Default S3 method: wald_test(object, se, null_value = 0, ...)
object |
A |
... |
Additional arguments. |
parm |
Parameter name(s) or indices to test. If NULL, tests all parameters. |
null_value |
The null hypothesis value(s). Default is 0. |
se |
Standard error (for default method) |
The Wald test statistic for a single parameter is:
which follows a chi-squared distribution with 1 degree of freedom under H0.
For single parameter tests, a wald_test object (also inherits from
hypothesis_test) containing:
The Wald chi-squared statistic
P-value from chi-squared(1) distribution
Degrees of freedom (1 for single parameter)
The z-score
Parameter estimate
Standard error
The null hypothesis value
For multiple parameters, returns a list of wald_test objects.
wald_test(femtofit): Wald test for femtofit objects
wald_test(default): Wald test from raw estimate and SE
## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) # Test if mu = 0 test <- wald_test(result, "mu") pval(test) # Test if mu = 5 wald_test(result, "mu", null_value = 5) # Test all parameters wald_test(result) ## End(Not run)## Not run: set.seed(42) x <- rnorm(100, mean = 5, sd = 2) result <- fit( function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), x), params = c(mu = 0, log_sigma = 0) ) # Test if mu = 0 test <- wald_test(result, "mu") pval(test) # Test if mu = 5 wald_test(result, "mu", null_value = 5) # Test all parameters wald_test(result) ## End(Not run)
Resets the gradient of a value object (and all its ancestors in the computational graph) to zero. This is useful when reusing parameters across multiple optimization steps.
zero_grad(e) ## S3 method for class 'value' zero_grad(e) ## S3 method for class 'list' zero_grad(e) ## Default S3 method: zero_grad(e)zero_grad(e) ## S3 method for class 'value' zero_grad(e) ## S3 method for class 'list' zero_grad(e) ## Default S3 method: zero_grad(e)
e |
A value object or list of value objects |
After calling backward(), gradients accumulate in the computational graph.
Before computing new gradients, you should reset them with zero_grad().
Invisibly returns the input (for chaining)
x <- val(3) y <- x^2 backward(y) grad(x) # 6 zero_grad(y) grad(x) # 0x <- val(3) y <- x^2 backward(y) grad(x) # 6 zero_grad(y) grad(x) # 0