--- title: "Observation Scheme Composability" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Observation Scheme Composability} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, echo = TRUE, comment = "#>" ) library(kofn) library(flexhaz) set.seed(42) old_opts <- options(digits = 4) ``` # The Observe Functor API Every `observe_*` function returns a closure `function(t_true) -> list(t, omega, t_upper)` that maps a true system lifetime to an observed record. The `omega` field classifies the observation: `"exact"`, `"right"`, `"left"`, or `"interval"`. ```{r functor-demo} # Exact: the trivial case -- no information loss observe_exact()(3.7) # Right-censoring: systems surviving past tau are censored obs_rc <- observe_right_censor(tau = 5) obs_rc(3.7) # fails before tau -> exact obs_rc(8.2) # survives past tau -> right-censored at 5 # Left-censoring: systems failing before tau are censored obs_lc <- observe_left_censor(tau = 2) obs_lc(1.5) # fails before tau -> left-censored at 2 obs_lc(3.7) # fails after tau -> exact # Interval-censoring: failures in [a, b) are binned obs_ic <- observe_interval_censor(a = 2, b = 6) obs_ic(4.0) # inside window -> interval [2, 6) obs_ic(1.0) # outside window -> exact # Periodic inspection: regular grid with right-censoring at tau obs_per <- observe_periodic(delta = 3, tau = 15) obs_per(7.3) # falls in [6, 9) -> interval obs_per(20) # past tau -> right-censored at 15 ``` # Composing Schemes with rdata The `rdata(model)` closure accepts any observe functor via its `observe` argument. This decouples the data-generating process from the observation protocol. ```{r rdata-compose} model <- kofn(k = 2, m = 2, component = dfr_exponential()) theta <- c(1.0, 0.5) gen <- rdata(model) # Exact observation (default) df_exact <- gen(theta, n = 6) head(df_exact) # Right-censoring at tau = 2 df_right <- gen(theta, n = 6, observe = observe_right_censor(tau = 2)) head(df_right) # Periodic inspection every delta = 1 time unit df_per <- gen(theta, n = 6, observe = observe_periodic(delta = 1, tau = 10)) head(df_per) ``` The same model and fitting infrastructure handles all observation types transparently -- the log-likelihood dispatches on the `omega` column. # Effect on Estimation We compare MLE quality across observation schemes for a 2-component exponential parallel system with rates $\lambda = (1.0, 0.5)$. We run R = 3 replicates of n = 60 observations under each scheme and report the RMSE of the sorted rate estimates. This is a small demo for vignette build speed; for tighter estimates set `R <- 100, n <- 200` and rerun. ```{r scheme-comparison} set.seed(2026) R <- 3; n <- 60 theta <- c(1.0, 0.5) theta_sorted <- sort(theta) model <- kofn(k = 2, m = 2, component = dfr_exponential()) gen <- rdata(model) fit_fn <- fit(model) schemes <- list( exact = NULL, right_tau3 = observe_right_censor(tau = 3), right_tau1 = observe_right_censor(tau = 1), periodic_d1 = observe_periodic(delta = 1, tau = 20), left_tau1 = observe_left_censor(tau = 1) ) results <- lapply(names(schemes), function(nm) { ests <- matrix(NA, nrow = R, ncol = 2) for (r in seq_len(R)) { df <- gen(theta, n, observe = schemes[[nm]]) res <- tryCatch(fit_fn(df, n_starts = 1L), error = function(e) NULL) if (!is.null(res) && !any(is.na(coef(res)))) ests[r, ] <- sort(coef(res)) } ok <- complete.cases(ests) errs <- sweep(ests[ok, , drop = FALSE], 2, theta_sorted) data.frame( scheme = nm, rmse_1 = round(sqrt(mean(errs[, 1]^2)), 3), rmse_2 = round(sqrt(mean(errs[, 2]^2)), 3), converged = sum(ok), stringsAsFactors = FALSE ) }) scheme_rmse <- do.call(rbind, results) scheme_rmse ``` Key patterns: - **Right-censoring** degrades as `tau` decreases -- more observations are lost to censoring, reducing the effective sample size. - **Periodic inspection** trades exact times for interval bounds. The interval-censored likelihood surface is flatter, which can cause occasional optimizer outliers that inflate the RMSE. - **Left-censoring** at `tau = 1` has little effect here because most parallel system lifetimes exceed 1 (mean $\approx 2.3$). Left-censoring would hurt more if `tau` were closer to the median system lifetime. Note that the parallel system's permutation symmetry dominates these differences -- the optimizer's ability to separate two rates depends more on the likelihood geometry than on the observation scheme. # Mixed Observation Environments `observe_mixture()` randomly selects a scheme for each observation. This models heterogeneous monitoring -- e.g., 70% of units are continuously monitored, 30% are only inspected periodically. ```{r mixture-demo} obs_mix <- observe_mixture( observe_exact(), observe_periodic(delta = 2, tau = 20), weights = c(0.7, 0.3) ) set.seed(99) model <- kofn(k = 2, m = 2, component = dfr_exponential()) gen <- rdata(model) df_mix <- gen(c(1.0, 0.5), n = 60, observe = obs_mix) table(df_mix$omega) ``` The mixture preserves the functor interface -- `fit(model)` handles the resulting data identically, dispatching on the per-row `omega` values. ```{r mixture-fit} fit_fn <- fit(model) res_mix <- fit_fn(df_mix, n_starts = 1L) sort(coef(res_mix)) ``` ```{r cleanup, include = FALSE} options(old_opts) ```