--- title: "Weibull Parallel Systems: EM Algorithm for Shape-Scale Estimation" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Weibull Parallel Systems: EM Algorithm for Shape-Scale Estimation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4 ) # Heavy simulations are disabled by default for CRAN. # Set run_long <- TRUE to reproduce the full comparison tables. run_long <- FALSE ``` The exponential distribution is the workhorse of reliability theory, but its constant hazard rate is a strong assumption. Real components exhibit *increasing* hazard (wear-out), *decreasing* hazard (early-life failures), or bathtub-shaped hazard curves. The Weibull distribution captures increasing and decreasing hazard through a single shape parameter, making it the natural next step beyond exponential models. This vignette shows why the exponential machinery for parallel system estimation does not extend to Weibull components, and how the EM algorithm in `kofn` solves the resulting estimation problem. ```{r load-package} library(kofn) library(flexhaz) ``` ## Why Weibull? The Weibull distribution with shape $\alpha$ and scale $\beta$ has hazard function $h(t) = (\alpha / \beta)(t / \beta)^{\alpha - 1}$. The shape controls the failure mechanism: - $\alpha < 1$: decreasing failure rate (DFR) -- early-life failures, burn-in - $\alpha = 1$: constant failure rate -- exponential (memoryless) - $\alpha > 1$: increasing failure rate (IFR) -- wear-out, fatigue ```{r weibull-hazard-plot, fig.cap = "Weibull hazard functions for different shape parameters."} t_grid <- seq(0.01, 5, length.out = 200) alphas <- c(0.5, 1.0, 1.5, 2.5) beta <- 2.0 plot(NULL, xlim = c(0, 5), ylim = c(0, 3), xlab = "Time", ylab = "Hazard rate h(t)", main = "Weibull hazard: shape controls failure mechanism") cols <- c("steelblue", "grey40", "darkorange", "firebrick") for (i in seq_along(alphas)) { a <- alphas[i] h <- (a / beta) * (t_grid / beta)^(a - 1) lines(t_grid, h, col = cols[i], lwd = 2) } legend("topright", legend = paste0("alpha = ", alphas), col = cols, lwd = 2, bty = "n") ``` A parallel system with two Weibull components -- one DFR (infant mortality) and one IFR (wear-out) -- models a system where one component is fragile early but the other degrades over time. Neither a pure exponential nor a single Weibull can capture this heterogeneity. ## What breaks: from exponential to Weibull For exponential parallel systems, the `kofn` package uses an *inclusion-exclusion (IE) expansion* that converts the product $\prod_{i \neq j}(1 - e^{-\lambda_i t})$ into a finite sum of exponentials. Every integral needed for the likelihood, score, and EM E-step has a closed form. For Weibull components, the analogous product is $$ \prod_{j=1}^{m}\bigl(1 - e^{-(t/\beta_j)^{\alpha_j}}\bigr). $$ Expanding this product yields terms like $\exp\bigl[-\sum_{j \in S}(t/\beta_j)^{\alpha_j}\bigr]$. When the $\alpha_j$ differ, the exponent is a sum of *different power functions* of $t$ -- not a linear function. The resulting integrals have no closed form, so the IE approach that works so well for exponentials is inapplicable. Beyond the IE failure, the exponential's memoryless property also breaks. For exponentials, $E[T_j \mid T_j < t]$ has a simple closed form. For Weibull, the truncated mean involves the incomplete gamma function, and the M-step requires solving a transcendental equation for the shape. ### Direct MLE struggles The direct MLE (`method = "mle"`) optimizes the Weibull parallel system log-likelihood numerically via L-BFGS-B with Nelder-Mead fallback. This works, but the likelihood surface has problematic features: - **Shape-scale confounding**: different $(\alpha, \beta)$ combinations produce similar system CDFs, creating ridges in the likelihood surface. - **Boundary violations**: the optimizer can push shape estimates toward zero or infinity, especially for fast-failing components whose lifetimes are deeply left-censored under Scheme 0. - **Shape blow-up**: at $\alpha = 1$ (the exponential boundary), the direct MLE's shape RMSE can be 24 times worse than the EM's. ```{r direct-mle-demo} # Generate data from a Weibull parallel system set.seed(42) model_mle <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "mle") gen <- rdata(model_mle) true_par <- c(1.5, 2.0, # component 1: shape=1.5, scale=2.0 2.0, 3.0) # component 2: shape=2.0, scale=3.0 df <- gen(true_par, n = 50) # Fit with direct MLE fit_mle_fn <- fit(model_mle) result_mle <- fit_mle_fn(df) cat("Direct MLE estimates:\n") cat(sprintf(" Component 1: shape = %.3f, scale = %.3f\n", result_mle$shapes[1], result_mle$scales[1])) cat(sprintf(" Component 2: shape = %.3f, scale = %.3f\n", result_mle$shapes[2], result_mle$scales[2])) cat(sprintf(" Converged: %s\n", result_mle$converged)) ``` ## The EM algorithm The EM algorithm sidesteps the difficult joint optimization by introducing a latent variable: **which component failed last**. In a parallel system ($k = m$), the system lifetime is $T = \max_j T_j$, so exactly one component has $T_j = T$ (the critical component) and all others have $T_j < T$ (left-censored). The identity $J = \arg\max_j T_j$ is unobserved under Scheme 0. ### E-step: classification and truncated moments Given current parameter estimates $\theta^{(s)}$, the E-step computes: 1. **Classification probabilities**: $p_j(t) = P(J = j \mid T = t)$, the posterior probability that component $j$ failed last. This is proportional to $f_j(t) \prod_{i \neq j} F_i(t)$ -- the component weight in the system density. 2. **Truncated power moment**: $E[T_j^{\alpha_j} \mid T_j < t]$ for left-censored components. Using the substitution $U = (T/\beta)^{\alpha} \sim \text{Exp}(1)$, this reduces to an incomplete gamma function: $$ E[T^k \mid T < t] = \beta^k \frac{\gamma(k/\alpha + 1,\; (t/\beta)^{\alpha})} {1 - e^{-(t/\beta)^{\alpha}}}. $$ 3. **Truncated log-moment**: $E[\log T_j \mid T_j < t]$, computed via numerical differentiation of the incomplete gamma function. The `kofn` package implements these as `trunc_pow_moment()` and `trunc_log_moment_vec()`: ```{r truncated-moments-demo} # E[T^alpha | T < t] for T ~ Weibull(alpha=1.5, beta=2.0) alpha <- 1.5 beta <- 2.0 t_trunc <- 3.0 # Scalar interface pow_moment <- trunc_pow_moment(k = alpha, t = t_trunc, alpha = alpha, beta = beta) cat(sprintf("E[T^%.1f | T < %.1f] = %.4f\n", alpha, t_trunc, pow_moment)) cat(sprintf(" (compare: t^alpha = %.4f)\n", t_trunc^alpha)) # Vectorized interface (used in EM inner loop) v_max <- (t_trunc / beta)^alpha pow_vec <- trunc_pow_moment_vec(k = alpha, v_max_vec = v_max, alpha = alpha, beta = beta) cat(sprintf(" Vectorized result: %.4f (same)\n", pow_vec)) # Truncated log-moment log_moment <- trunc_log_moment_vec(v_max_vec = v_max, alpha = alpha, beta = beta) cat(sprintf("E[log T | T < %.1f] = %.4f\n", t_trunc, log_moment)) cat(sprintf(" (compare: log t = %.4f)\n", log(t_trunc))) ``` ### M-step: profile optimization Given the E-step expectations, the M-step maximizes the expected complete-data log-likelihood. The key insight is that this decomposes into $m$ independent Weibull MLEs -- one per component. For each component $j$: 1. **Profile over shape**: Fix $\alpha_j$ and solve for the scale in closed form: $\beta_j^{\alpha_j} = B_j(\alpha_j) / n$, where $B_j(\alpha) = \sum_i [p_{ij} \cdot t_i^{\alpha} + (1 - p_{ij}) \cdot E[T_j^{\alpha} \mid T_j < t_i]]$. 2. **Optimize the profile**: Maximize $Q_j(\alpha_j)$ over $\alpha_j$ in one dimension using `optimize()`. This avoids the joint $2m$-dimensional optimization of the direct MLE, replacing it with $m$ one-dimensional optimizations at each EM iteration. ## EM vs direct MLE comparison Let us compare the two estimation methods on a concrete example. ```{r em-vs-mle} set.seed(123) # Create models model_em <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "em") model_mle <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "mle") # True parameters: shapes (1.5, 2.0), scales (2.0, 3.0) true_par <- c(1.5, 2.0, 2.0, 3.0) # Generate data gen <- rdata(model_em) df <- gen(true_par, n = 50) # Fit with EM fit_em_fn <- fit(model_em) result_em <- fit_em_fn(df) # Fit with direct MLE fit_mle_fn <- fit(model_mle) result_mle <- fit_mle_fn(df) # Compare cat("True parameters:\n") cat(sprintf(" Component 1: shape = %.2f, scale = %.2f\n", 1.5, 2.0)) cat(sprintf(" Component 2: shape = %.2f, scale = %.2f\n", 2.0, 3.0)) cat("\nEM estimates:\n") cat(sprintf(" Component 1: shape = %.3f, scale = %.3f\n", result_em$shapes[1], result_em$scales[1])) cat(sprintf(" Component 2: shape = %.3f, scale = %.3f\n", result_em$shapes[2], result_em$scales[2])) cat(sprintf(" Converged: %s, Iterations: %d\n", result_em$converged, result_em$iterations)) cat("\nDirect MLE estimates:\n") cat(sprintf(" Component 1: shape = %.3f, scale = %.3f\n", result_mle$shapes[1], result_mle$scales[1])) cat(sprintf(" Component 2: shape = %.3f, scale = %.3f\n", result_mle$shapes[2], result_mle$scales[2])) cat(sprintf(" Converged: %s\n", result_mle$converged)) ``` On a single dataset, both methods may produce similar estimates. The EM's advantage becomes apparent across many replications, especially when shape parameters are near 1.0 or components have very different scales. ## Shape effect on estimation difficulty ```{r shape-effect-sim, eval = run_long} # Full simulation: vary shape, compare EM vs MLE (takes several minutes). # Set run_long <- TRUE in setup chunk to reproduce. set.seed(2026) alphas <- c(0.5, 1.0, 1.5, 2.0, 3.0) R <- 200; n <- 300 for (a in alphas) { true_par <- c(a, 2.0, a, 3.0) mdl_em <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "em") mdl_mle <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "mle") gen <- rdata(mdl_em) f_em <- fit(mdl_em); f_mle <- fit(mdl_mle) em_sh <- mle_sh <- matrix(NA, R, 2) for (r in seq_len(R)) { d <- gen(true_par, n = n) re <- tryCatch(f_em(d), error = function(e) NULL) rm <- tryCatch(f_mle(d), error = function(e) NULL) if (!is.null(re) && re$converged) em_sh[r, ] <- sort(re$shapes) if (!is.null(rm) && rm$converged) mle_sh[r, ] <- sort(rm$shapes) } ok_em <- complete.cases(em_sh); ok_mle <- complete.cases(mle_sh) cat(sprintf("alpha=%.1f: EM RMSE=%.3f, MLE RMSE=%.3f\n", a, sqrt(mean((em_sh[ok_em,] - a)^2)), sqrt(mean((mle_sh[ok_mle,] - a)^2)))) } ``` The table below summarizes the shape effect results from 200 Monte Carlo replications per configuration ($n = 300$, two-component parallel system with scales $(\beta_1, \beta_2) = (2, 3)$, common shape $\alpha$). | $\alpha$ | EM Conv. (%) | EM Shape RMSE | EM Scale RMSE | MLE Conv. (%) | MLE Shape RMSE | MLE Scale RMSE | |:--------:|:----------:|:----------:|:-----------:|:----------:|:-----------:|:----------:| | 0.5 | 98 | 0.21 | 2.06 | 100 | 0.64 | 2.21 | | 1.0 | 100 | 0.32 | 0.82 | 100 | 7.49 | 0.91 | | 1.5 | 100 | 0.50 | 0.53 | 100 | 2.05 | 0.55 | | 2.0 | 100 | 0.74 | 0.40 | 100 | 0.91 | 0.40 | | 3.0 | 100 | 0.97 | 0.24 | 100 | 1.26 | 0.26 | Two systematic patterns emerge: 1. **Shape RMSE increases with $\alpha$** (0.21 at $\alpha = 0.5$ to 0.97 at $\alpha = 3.0$), while **scale RMSE decreases** (2.06 to 0.24). Higher shapes concentrate the Weibull distribution, making scales easier to pin down but creating a steeper likelihood surface where the shape is harder to identify. 2. **The EM dominates in shape recovery.** At $\alpha = 1.0$ (the exponential boundary), the EM achieves shape RMSE = 0.32 while the direct MLE explodes to 7.49 -- a 24-fold difference. The MLE's shape estimates for the fast component are driven to extreme values by the flat likelihood ridge along the shape-scale confounding direction. Both methods achieve nearly identical scale RMSE, confirming they find the same likelihood optima when they converge -- the EM simply reaches them more reliably. ## Mixed shapes: DFR + IFR systems Real systems contain components with heterogeneous failure mechanisms. A motor with early-life electrical failures (DFR, $\alpha < 1$) and mechanical wear-out (IFR, $\alpha > 1$) is a natural example. The EM handles this shape diversity better than direct MLE. ```{r mixed-shapes-demo} set.seed(456) # Mixed DFR + IFR: shape (0.8, 1.5), scale (2.0, 3.0) model_em <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "em") model_mle <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "mle") true_par <- c(0.8, 2.0, 1.5, 3.0) gen <- rdata(model_em) df <- gen(true_par, n = 50) fit_em_fn <- fit(model_em) fit_mle_fn <- fit(model_mle) result_em <- fit_em_fn(df) result_mle <- fit_mle_fn(df) cat("Mixed DFR + IFR system: alpha = (0.8, 1.5), beta = (2.0, 3.0)\n\n") cat("EM estimates:\n") cat(sprintf(" Component 1: shape = %.3f (true 0.8), scale = %.3f (true 2.0)\n", result_em$shapes[1], result_em$scales[1])) cat(sprintf(" Component 2: shape = %.3f (true 1.5), scale = %.3f (true 3.0)\n", result_em$shapes[2], result_em$scales[2])) cat("\nDirect MLE estimates:\n") cat(sprintf(" Component 1: shape = %.3f (true 0.8), scale = %.3f (true 2.0)\n", result_mle$shapes[1], result_mle$scales[1])) cat(sprintf(" Component 2: shape = %.3f (true 1.5), scale = %.3f (true 3.0)\n", result_mle$shapes[2], result_mle$scales[2])) ``` ```{r mixed-shapes-sim, eval = run_long} # Full mixed-shape simulation (set run_long <- TRUE to reproduce) set.seed(2026) scenarios <- list( mild = list(par = c(0.8, 2.0, 1.5, 3.0), label = "Mild (0.8, 1.5)"), strong = list(par = c(0.5, 2.0, 3.0, 3.0), label = "Strong (0.5, 3.0)"), three = list(par = c(0.7, 1.5, 1.5, 2.0, 2.5, 3.0), label = "Three-comp (0.7, 1.5, 2.5)") ) R <- 200; n <- 300 for (sc in scenarios) { m <- length(sc$par) / 2 mdl <- kofn(k = m, m = m, component = dfr_weibull(), method = "em") gen <- rdata(mdl); f <- fit(mdl) true_sh <- sc$par[seq(1, length(sc$par), by = 2)] errs <- numeric(0) for (r in seq_len(R)) { res <- tryCatch(f(gen(sc$par, n = n)), error = function(e) NULL) if (!is.null(res) && res$converged) errs <- c(errs, (sort(res$shapes) - sort(true_sh))^2) } cat(sprintf("%s: EM shape RMSE = %.2f\n", sc$label, sqrt(mean(errs)))) } ``` Across replications, three scenarios illustrate the pattern: - **Mild mix** ($\alpha = (0.8, 1.5)$): Both methods converge reliably. Mean shape RMSE is 0.34 (EM) -- the modest shape separation makes this a tractable problem. - **Strong mix** ($\alpha = (0.5, 3.0)$): One DFR component with unbounded hazard at $t = 0$ and one strongly IFR component that fails in a narrow window. This creates a bimodal posterior over which component failed last. Mean shape RMSE is 0.39 (EM). - **Three-component** ($\alpha = (0.7, 1.5, 2.5)$): The fast component ($\alpha = 0.7$, $\beta = 1.5$) is deeply left-censored, producing shape RMSE of 3.60 under EM. The EM still outperforms direct MLE: mean shape RMSE 1.62 vs. 5.32, with the MLE's fast-component shape RMSE reaching 14.7 from boundary violations. ## Base R stats integration The `kofn` package returns `fisher_mle` objects (from `likelihood.model`), which provide standard R methods for inference. This means `coef()`, `vcov()`, `logLik()`, `AIC()`, `confint()`, and `summary()` all work out of the box. ```{r stats-integration} set.seed(789) model_em <- kofn(k = 2, m = 2, component = dfr_weibull(), method = "em") gen <- rdata(model_em) true_par <- c(1.5, 2.0, 2.0, 3.0) df <- gen(true_par, n = 50) fit_em_fn <- fit(model_em) result <- fit_em_fn(df) # Coefficient extraction cat("coef():\n") print(coef(result)) # Variance-covariance matrix (from observed Fisher information) cat("\nvcov():\n") print(round(vcov(result), 5)) # Log-likelihood with degrees of freedom cat("\nlogLik():\n") print(logLik(result)) # Model selection criteria cat("\nAIC:", AIC(result), "\n") cat("BIC:", BIC(result), "\n") # Confidence intervals (Wald-type, based on Hessian) cat("\nconfint() (95% Wald intervals):\n") print(confint(result)) # Full summary cat("\nsummary():\n") print(summary(result)) ``` The confidence intervals are Wald-type, based on the observed Fisher information (negative inverse Hessian of the log-likelihood). Under Scheme 0, the shape parameters typically have wider intervals than the scales, reflecting the information asymmetry: the scale of the last-failing component is well-determined by the observed system lifetime, but the shape is harder to identify from the system CDF alone. ## Summary The Weibull extension to parallel system estimation is not a simple generalization of the exponential case. Three structural properties break: 1. **No IE closed form**: the inclusion-exclusion expansion produces sums of different power functions, not simple exponentials. 2. **No memoryless property**: truncated moments require incomplete gamma functions instead of elementary formulas. 3. **Shape-scale confounding**: the likelihood surface has ridges that trap gradient-based optimizers. The EM algorithm addresses these difficulties by: - Decomposing the $2m$-dimensional optimization into $m$ independent one-dimensional profile optimizations. - Guaranteeing monotone likelihood increase at every iteration. - Using numerically stable truncated Weibull moments via the $U = (T/\beta)^{\alpha} \sim \text{Exp}(1)$ substitution. The result is substantially better shape recovery than direct MLE, particularly at $\alpha = 1.0$ (24x improvement) and for multi-component systems with mixed shapes (3.3x improvement). Both methods achieve similar scale RMSE -- the EM's advantage is structural, preventing the boundary violations and parameter blow-up that plague gradient-based optimization on the Weibull likelihood surface. For applications requiring even better shape estimation, the package also supports Scheme 1 (periodic inspection), which provides component-level interval censoring data. See `vignette("scheme1")` for details.