--- title: "Exponential Parallel Systems: Closed-Form MLE via Inclusion-Exclusion" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Exponential Parallel Systems: Closed-Form MLE via Inclusion-Exclusion} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, echo = TRUE, comment = "#>" ) # Set to TRUE to regenerate long-running simulation results. # With run_long = FALSE, only small demos run (< 30 seconds total). run_long <- FALSE library(kofn) library(flexhaz) old_opts <- options(digits = 4) ``` # The Exponential Parallel System A **parallel system** of $m$ independent components functions as long as at least one component survives. The system lifetime is $$ T = \max(T_1, \ldots, T_m), \qquad T_j \sim \mathrm{Exp}(\lambda_j). $$ The system density is the derivative of the system CDF $F_{\mathrm{sys}}(t) = \prod_{j=1}^{m} F_j(t) = \prod_{j=1}^{m}(1 - e^{-\lambda_j t})$: $$ f_{\mathrm{sys}}(t) = \sum_{j=1}^{m} f_j(t) \prod_{i \neq j} F_i(t) = \sum_{j=1}^{m} \lambda_j e^{-\lambda_j t} \prod_{i \neq j} \bigl(1 - e^{-\lambda_i t}\bigr). $$ Each summand $w_j(t) = f_j(t)\prod_{i \neq j} F_i(t)$ is the contribution from the event "component $j$ is the last to fail." ```{r model-setup} # Create an exponential parallel system model with 3 components model <- kofn(k = 3, m = 3, component = dfr_exponential()) print(model) ``` The `kofn()` constructor with `k = m` creates a parallel system. The model object carries the system structure, column name conventions, and dispatches to methods for `loglik()`, `score()`, `hess_loglik()`, `fit()`, and `rdata()`. # Inclusion-Exclusion Expansion The product $\prod_{i}(1 - e^{-\lambda_i t})$ cannot be differentiated or integrated term-by-term as written. The **inclusion-exclusion expansion** rewrites it as a finite signed sum of exponentials: $$ \prod_{i=1}^{m} \bigl(1 - e^{-\lambda_i t}\bigr) = \sum_{S \subseteq \{1,\ldots,m\}} (-1)^{|S|}\, e^{-(\sum_{i \in S} \lambda_i)\, t}. $$ This has $2^m$ terms --- practical for moderate $m$ (up to $\sim$15 components). Because every term is a simple exponential $e^{-rt}$, all integrals needed for censored-data likelihoods evaluate in closed form. ```{r ie-demo} lam <- c(0.5, 0.3, 0.2) ie <- ie_expand(lam) # Display the expansion terms data.frame( subset_size = sapply(seq_along(ie$sign), function(k) { sum(as.integer(intToBits(k - 1L))[1:3]) }), sign = ie$sign, rate_sum = ie$rate_sum ) ``` The first row ($S = \emptyset$) has sign $+1$ and rate sum $0$, contributing the constant term $1$. The three singleton subsets contribute $-e^{-\lambda_j t}$, the three pairs contribute $+e^{-(\lambda_i + \lambda_j)t}$, and the full set contributes $-e^{-(\lambda_1 + \lambda_2 + \lambda_3)t}$. We can verify the expansion numerically: ```{r ie-verify} t_val <- 2.0 # Direct product direct <- prod(1 - exp(-lam * t_val)) # Via IE expansion ie_val <- sum(ie$sign * exp(-ie$rate_sum * t_val)) c(direct = direct, ie_expansion = ie_val, difference = direct - ie_val) ``` The key functions built on this expansion are: - `w_j_exact(t, par, j)` --- evaluates $w_j(t)$ at a single time point - `w_j_integral(a, b, par, j)` --- closed-form $\int_a^b w_j(t)\,dt$ - `F_sys_exp(t, par)` / `S_sys_exp(t, par)` --- system CDF and survival ```{r ie-functions} # The three w_j contributions at t = 2 sapply(1:3, function(j) w_j_exact(t_val, lam, j)) # Their sum equals the system density sum(sapply(1:3, function(j) w_j_exact(t_val, lam, j))) # System CDF and survival at t = 2 c(F_sys = F_sys_exp(t_val, lam), S_sys = S_sys_exp(t_val, lam)) ``` # Likelihood Under Different Observation Types The IE expansion gives closed-form log-likelihood contributions for all four observation types. Let $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)$ and $C_i \subseteq \{1, \ldots, m\}$ be the candidate set for observation $i$. **Exact** observation at $t$: $$ \ell_i = \log \sum_{j \in C_i} w_j(t;\, \boldsymbol{\lambda}). $$ **Right-censored** at $t$ (system still functioning): $$ \ell_i = \log S_{\mathrm{sys}}(t;\, \boldsymbol{\lambda}). $$ **Left-censored** at $t$ (system already failed): $$ \ell_i = \log \frac{\sum_{j \in C_i} \int_0^{t} w_j(s)\,ds} {F_{\mathrm{sys}}(t;\, \boldsymbol{\lambda})}. $$ **Interval-censored** on $(a, b]$: $$ \ell_i = \log \frac{\sum_{j \in C_i} \int_a^{b} w_j(s)\,ds} {F_{\mathrm{sys}}(b) - F_{\mathrm{sys}}(a)}. $$ All integrals $\int_a^b w_j(s)\,ds$ are computed analytically by `w_j_integral()`. The `loglik(model)` closure handles all four types automatically: ```{r loglik-demo} set.seed(42) # Generate data under Scheme 0 (system lifetime only, no censoring) gen <- rdata(model) df <- gen(theta = lam, n = 50) head(df, 5) # Evaluate the log-likelihood at the true parameters ll_fn <- loglik(model) ll_fn(df, lam) # Compare with a wrong parameter vector ll_fn(df, c(1, 1, 1)) ``` # Maximum Likelihood Estimation ## Data generation We generate $n = 100$ system lifetimes from a 3-component parallel system with rates $\boldsymbol{\lambda} = (0.5, 0.3, 0.2)$ under Scheme 0 (system lifetime only --- no component-level information). ```{r generate-data} set.seed(123) theta_true <- c(0.5, 0.3, 0.2) n <- 50 gen <- rdata(model) df <- gen(theta = theta_true, n = n) cat("Observations:", nrow(df), "\n") cat("Observation types:\n") table(df$omega) ``` Under the default observation scheme (exact), we observe only when the system fails, not which component failed last. This is the Scheme 0 (black-box) setting. ## Fitting The `fit(model)` closure performs multi-start L-BFGS-B optimization with Nelder-Mead fallback. Standard errors come from the observed Fisher information (inverse of the negative Hessian at the MLE). ```{r fit-model} fit_fn <- fit(model) result <- fit_fn(df, n_starts = 1L) # MLE estimates coef(result) # Standard errors sqrt(diag(vcov(result))) # 95% confidence intervals confint(result) # Log-likelihood, AIC, BIC c(loglik = logLik(result), AIC = AIC(result), BIC = BIC(result)) ``` ```{r fit-summary} summary(result) ``` Every method used above, `coef`, `vcov`, `confint`, `logLik`, `AIC`, `BIC`, `summary`, comes from `algebraic.mle` via its `mle` result class, not from `kofn` itself. `kofn` supplies the likelihood; the inference operations are inherited. See `vignette("ecosystem")` for the full walk through the rlang MLE stack. ## Comparison with true values ```{r compare-truth} est <- coef(result) se <- sqrt(diag(vcov(result))) ci <- confint(result) comparison <- data.frame( component = 1:3, true = theta_true, estimate = est, se = se, ci_lower = ci[, 1], ci_upper = ci[, 2], covered = theta_true >= ci[, 1] & theta_true <= ci[, 2] ) comparison ``` # Permutation Symmetry Under Scheme 0, all components are candidates for every observation. The log-likelihood is then **invariant under permutation** of the parameter vector: swapping $\lambda_1 \leftrightarrow \lambda_2$ yields the same log-likelihood value. This means there are $m!$ equivalent global maxima. ```{r permutation-demo} ll_fn <- loglik(model) # Original parameter order ll_original <- ll_fn(df, theta_true) # Permuted parameters: swap components 1 and 3 theta_permuted <- theta_true[c(3, 1, 2)] ll_permuted <- ll_fn(df, theta_permuted) # Reversed order theta_reversed <- rev(theta_true) ll_reversed <- ll_fn(df, theta_reversed) data.frame( ordering = c("original (0.5, 0.3, 0.2)", "permuted (0.2, 0.5, 0.3)", "reversed (0.2, 0.3, 0.5)"), loglik = c(ll_original, ll_permuted, ll_reversed), difference = c(0, ll_permuted - ll_original, ll_reversed - ll_original) ) ``` The log-likelihood values are identical (up to machine precision). This symmetry has two important consequences: 1. **Initialization matters.** The optimizer may converge to any of the $m!$ equivalent modes. The `default_init_exp()` function uses a **spread initialization** --- rates spaced as `lam0 * seq(0.5, 1.5, length.out = m)` --- to break the symmetry and consistently find the ascending-order mode. 2. **Sorted matching for Monte Carlo.** When evaluating estimator performance across replications, we sort both the estimates and the true parameters in ascending order before computing bias, RMSE, and coverage. This resolves the label-switching problem. # Monte Carlo Validation We assess the MLE's finite-sample performance via Monte Carlo simulation. For each replicate, we generate data from the true model, fit the MLE, and record the estimates, standard errors, and confidence interval coverage. ```{r mc-precomputed, include=FALSE, eval=!run_long} # When run_long = FALSE, we run a small demo instead of loading precomputed # results. The full simulation (R = 100, n = 300) takes several minutes. ``` ```{r mc-full, eval=run_long, echo=run_long, cache=TRUE} set.seed(2026) R <- 100 # Monte Carlo replications n_mc <- 300 # Sample size per replicate alpha <- 0.05 # Significance level theta_true_mc <- c(0.5, 0.3, 0.2) theta_sorted <- sort(theta_true_mc) m_mc <- length(theta_true_mc) gen_mc <- rdata(model) fit_mc <- fit(model) # Storage estimates <- matrix(NA, nrow = R, ncol = m_mc) se_hat <- matrix(NA, nrow = R, ncol = m_mc) ci_lower <- matrix(NA, nrow = R, ncol = m_mc) ci_upper <- matrix(NA, nrow = R, ncol = m_mc) converged <- logical(R) for (r in seq_len(R)) { df_r <- gen_mc(theta = theta_true_mc, n = n_mc) res_r <- tryCatch(fit_mc(df_r, n_starts = 10L), error = function(e) NULL) if (!is.null(res_r) && !any(is.na(coef(res_r)))) { # Sort estimates ascending for identifiability ord <- order(coef(res_r)) estimates[r, ] <- coef(res_r)[ord] se_r <- sqrt(diag(vcov(res_r))) se_hat[r, ] <- se_r[ord] ci_r <- confint(res_r, level = 1 - alpha) ci_lower[r, ] <- ci_r[ord, 1] ci_upper[r, ] <- ci_r[ord, 2] converged[r] <- TRUE } } # Save for reproducibility saveRDS( list(estimates = estimates, se_hat = se_hat, ci_lower = ci_lower, ci_upper = ci_upper, converged = converged, R = R, n_mc = n_mc, theta_sorted = theta_sorted, alpha = alpha), file.path(tempdir(), "precomputed_exp_parallel.rds") ) ``` ```{r mc-small, eval=!run_long} # Small demo: R = 5 replications for illustration (keeps build < 30 sec). # Set run_long = TRUE above for the full R = 100 simulation. set.seed(2026) R <- 3 n_mc <- 50 alpha <- 0.05 theta_true_mc <- c(0.5, 0.3, 0.2) theta_sorted <- sort(theta_true_mc) m_mc <- length(theta_true_mc) gen_mc <- rdata(model) fit_mc <- fit(model) estimates <- matrix(NA, nrow = R, ncol = m_mc) se_hat <- matrix(NA, nrow = R, ncol = m_mc) ci_lower <- matrix(NA, nrow = R, ncol = m_mc) ci_upper <- matrix(NA, nrow = R, ncol = m_mc) converged <- logical(R) for (r in seq_len(R)) { df_r <- gen_mc(theta = theta_true_mc, n = n_mc) res_r <- tryCatch(fit_mc(df_r, n_starts = 1L), error = function(e) NULL) if (!is.null(res_r) && !any(is.na(coef(res_r)))) { ord <- order(coef(res_r)) estimates[r, ] <- coef(res_r)[ord] se_r <- sqrt(diag(vcov(res_r))) se_hat[r, ] <- se_r[ord] ci_r <- confint(res_r, level = 1 - alpha) ci_lower[r, ] <- ci_r[ord, 1] ci_upper[r, ] <- ci_r[ord, 2] converged[r] <- TRUE } } ``` ## Results After sorting each replicate's estimates in ascending order (to match the sorted true values $\lambda_{(1)} = 0.2 < \lambda_{(2)} = 0.3 < \lambda_{(3)} = 0.5$), we compute bias, RMSE, and coverage: ```{r mc-results} ok <- converged & complete.cases(estimates) cat("Converged replications:", sum(ok), "of", R, "\n\n") est_ok <- estimates[ok, , drop = FALSE] se_ok <- se_hat[ok, , drop = FALSE] ci_lo <- ci_lower[ok, , drop = FALSE] ci_hi <- ci_upper[ok, , drop = FALSE] bias <- colMeans(est_ok) - theta_sorted rmse <- sqrt(colMeans((est_ok - matrix(theta_sorted, nrow = sum(ok), ncol = m_mc, byrow = TRUE))^2)) coverage <- colMeans( ci_lo <= matrix(theta_sorted, nrow = sum(ok), ncol = m_mc, byrow = TRUE) & ci_hi >= matrix(theta_sorted, nrow = sum(ok), ncol = m_mc, byrow = TRUE) ) mean_se <- colMeans(se_ok) mc_table <- data.frame( component = paste0("lambda_(", 1:m_mc, ")"), true = theta_sorted, mean_est = colMeans(est_ok), bias = bias, rmse = rmse, mean_se = mean_se, coverage_95 = coverage ) mc_table ``` Key observations: - **Bias** should be near zero for a correctly specified MLE. - **RMSE** reflects finite-sample variability; the smallest rate ($\lambda_{(1)} = 0.2$) is typically estimated with larger relative error because it contributes least to the system density (its component is least likely to be the last to fail). - **Coverage** near 0.95 validates the Hessian-based confidence intervals. Note that the SE ratio artifact --- Hessian-based SEs computed at the unsorted MLE, then reordered --- can cause mild under- or over-coverage. # Observation Functors The `observe_*` family of functions controls how system lifetimes are recorded. Each functor maps a true system lifetime to an observed record (time, observation type, and optional interval bounds). ## Scheme 0: System lifetime with right-censoring `observe_right_censor(tau)` is the default: systems failing before $\tau$ are observed exactly, and those surviving past $\tau$ are right-censored. ```{r observe-scheme0} # Different censoring times obs_light <- observe_right_censor(tau = 20) # Light censoring obs_heavy <- observe_right_censor(tau = 5) # Heavy censoring obs_none <- observe_right_censor() # No censoring (tau = Inf) # Example: system fails at t = 8 obs_light(8) # exact obs_heavy(8) # right-censored at tau = 5 obs_none(8) # exact ``` ## Effect of censoring on estimation Heavier right-censoring discards information about long-lived systems, increasing estimator variance --- especially for the smallest rate (the most reliable component, which is most likely to be the last survivor). ```{r censoring-effect} set.seed(99) tau_values <- c(5, Inf) cat(sprintf("%-8s %6s %10s %10s %10s\n", "tau", "n_right", "est_lam1", "est_lam2", "est_lam3")) cat(strrep("-", 56), "\n") for (tau in tau_values) { obs_fn <- if (is.finite(tau)) observe_right_censor(tau) else NULL df_tau <- gen(theta = theta_true, n = 50, observe = obs_fn) n_right <- sum(df_tau$omega == "right") res_tau <- tryCatch( fit_fn(df_tau, n_starts = 1L), error = function(e) NULL ) if (!is.null(res_tau)) { est <- sort(coef(res_tau)) cat(sprintf("%-8s %6d %10.4f %10.4f %10.4f\n", if (is.finite(tau)) as.character(tau) else "Inf", n_right, est[1], est[2], est[3])) } } ``` ## Scheme 1: Periodic inspection Under `observe_periodic(delta, tau)`, the system is inspected at times $\delta, 2\delta, 3\delta, \ldots$ The failure time is known only to lie within the inspection interval containing it (interval-censored). ```{r observe-scheme1} obs_periodic <- observe_periodic(delta = 2, tau = 30) # System fails at t = 7.3 -> interval-censored to (6, 8] obs_periodic(7.3) # System survives past tau = 30 -> right-censored obs_periodic(35) ``` ```{r scheme1-demo} set.seed(42) df_s1 <- gen(theta = theta_true, n = 50, observe = observe_periodic(delta = 2, tau = 50)) table(df_s1$omega) # Fit under interval censoring res_s1 <- fit_fn(df_s1, n_starts = 1L) sort(coef(res_s1)) ``` Interval censoring retains more information than pure right-censoring at the same study duration, because every failure is localized to a finite window rather than lost entirely. ```{r cleanup, include = FALSE} options(old_opts) ```