- class surpyval.univariate.nonparametric.nonparametric.NonParametric
Bases:
NonParametricDistributionResult of
.fit()method for every non-parametric surpyval distribution. This means that each of the methods in this class can be called with a model created from theNelsonAalen,KaplanMeier,FlemingHarrington, orTurnbullestimators.- Hf(x: ArrayLike, interp: str = 'step') ndarray[tuple[Any, ...], dtype[_ScalarT]]
Cumulative hazard rate with the non-parametric estimates from the data. This is calculated using the relationship between the hazard function and the density:
\[H(x) = -\ln (R(x))\]- Parameters
x (array like or scalar) – The values of the random variables at which the function will be calculated.
- Returns
Hf – The value(s) of the density function at x
- Return type
scalar or numpy array
Examples
>>> from surpyval import NelsonAalen >>> x = np.array([1, 2, 3, 4, 5]) >>> model = NelsonAalen.fit(x) >>> model.Hf(2) array([0.45]) >>> model.Hf([1., 1.5, 2., 2.5]) array([0.2 , 0.2 , 0.45, 0.45])
- band(x: ArrayLike | None = None, method: str = 'hall-wellner', bound_type: str = 'exp', alpha_ci: float = 0.05, n_sims: int = 10000, random_state: int | None = 1) ndarray[tuple[Any, ...], dtype[_ScalarT]]
Simultaneous confidence band of the survival function.
The pointwise bounds from
cb()cover the true value of the survival function at each single time with probability 1 - alpha_ci, but the probability that the whole true curve lies between them is lower, since the curve has many opportunities to escape. A confidence band is widened so that, with probability 1 - alpha_ci, the entire survival function lies within the band over the observed range. Use the band, not the pointwise bounds, to assess whether a hypothesised curve (e.g. a fitted parametric distribution) is consistent with the data as a whole.Two classical bands are available:
“hall-wellner”: width proportional to \((1 + n\sigma^2(t))/\sqrt{n}\); tends to be relatively wider in the middle of the curve.
“nair” (equal precision): width proportional to the pointwise standard error, i.e. the band is the pointwise interval scaled by a larger critical value, so its width follows the pointwise bounds everywhere.
Critical values are computed by Monte Carlo simulation of the limiting Brownian bridge process, with a fixed default seed so results are reproducible.
The band is only defined between the first and last observed events (where the variance estimate is positive and finite); NaN is returned outside that range. The asymptotic theory for these bands is for right censored data; for Turnbull models with interval censoring prefer
bootstrap_cb().- Parameters
x (array like or scalar, optional) – The values at which the band will be evaluated. Defaults to the observed values.
method (('hall-wellner', 'nair'), str, optional) – The type of band. Defaults to ‘hall-wellner’.
bound_type (('exp', 'normal'), str, optional) – As for
cb(): ‘exp’ applies the band on the log(-log) scale, keeping it within [0, 1]. Defaults to ‘exp’.alpha_ci (scalar, optional) – The level of significance of the band. Defaults to 0.05.
n_sims (int, optional) – Number of simulated paths for the critical value.
random_state (int or numpy.random.Generator, optional) – Seed for the critical value simulation. Defaults to a fixed seed for reproducibility.
- Returns
band – Array of shape (len(x), 2) with the
[lower, upper]band values for the survival function at each x.- Return type
numpy array
References
Hall, W. J. and Wellner, J. A. (1980), “Confidence bands for a survival curve from censored data”, Biometrika 67, 133-143.
Nair, V. N. (1984), “Confidence bands for survival functions with censored data: a comparative study”, Technometrics 26, 265-275.
Klein, J. P. and Moeschberger, M. L. (2003), “Survival Analysis”, 2nd ed., Section 4.4.
- bootstrap_cb(x: ArrayLike, bound: str = 'two-sided', alpha_ci: float = 0.05, B: int = 200, random_state: int | None = None) ndarray[tuple[Any, ...], dtype[_ScalarT]]
Confidence bounds of the survival function computed with a non-parametric bootstrap: the data are resampled with replacement, the model is refitted with the same estimator, and the percentile interval across the refits is taken at each x.
This is the recommended way to compute bounds for the Turnbull estimator. The Greenwood-style bounds from
cb()treat the expected (fractional) at risk and death counts from the Turnbull EM as if they were observed counts, which ignores the uncertainty in the EM allocation itself; the bootstrap does not.Note that the Turnbull NPMLE only identifies the probability mass within each Turnbull interval, not how it is distributed inside one. Point estimates and bounds evaluated strictly inside an interval therefore reflect the step convention rather than an estimate of the underlying continuous survival function, and are best evaluated at the interval bounds.
- Parameters
x (array like or scalar) – The values at which the confidence bounds will be calculated.
bound (('two-sided', 'upper', 'lower'), str, optional) – Compute either the two-sided, upper or lower confidence bound(s). Defaults to two-sided.
alpha_ci (scalar, optional) – The level of significance at which the bound will be computed. Defaults to 0.05.
B (int, optional) – The number of bootstrap resamples. Defaults to 200. Larger values give smoother bounds at a linear cost in runtime; note that refitting the Turnbull estimator is relatively expensive.
random_state (int or numpy.random.Generator, optional) – Seed or generator for reproducible resampling.
- Returns
cb – For two-sided bounds an array of shape (len(x), 2) with
[lower, upper]columns; otherwise an array of the requested bound at each x.- Return type
numpy array
- cb(x: ArrayLike, on: str = 'sf', bound: str = 'two-sided', interp: str = 'step', alpha_ci: float = 0.05, bound_type: str = 'exp', dist: str = 'z') ndarray[tuple[Any, ...], dtype[_ScalarT]]
Confidence bounds of the
onfunction at thealpha_cilevel of significance. Can be the upper, lower, or two-sided confidence by changing value ofbound. Can change the bound type to be regular (normal) or exponential using either the ‘t’ or ‘z’ statistic.The variance used is the one appropriate to the estimator with which the model was fitted: Greenwood’s formula for Kaplan-Meier, Aalen’s (Poisson) variance for Nelson-Aalen, and the tie-corrected variance for Fleming-Harrington.
- Parameters
x (array like or scalar) – The values of the random variables at which the confidence bounds will be calculated
on (('sf', 'ff', 'Hf'), optional) – The function on which the confidence bound will be calculated.
bound (('two-sided', 'upper', 'lower'), str, optional) – Compute either the two-sided, upper or lower confidence bound(s). Defaults to two-sided.
interp (('step', 'linear', 'cubic'), optional) – How to interpolate the values between observations. Survival statistics traditionally uses step functions, but can use interpolated values if desired. Defaults to step.
alpha_ci (scalar, optional) – The level of significance at which the bound will be computed.
bound_type (('exp', 'normal'), str, optional) – The method with which the bounds will be calculated. Using ‘normal’ (i.e. the plain Greenwood-style interval) will allow for the bounds to exceed 1 or be less than 0 and tends to undercover in small samples. Defaults to ‘exp’ (the log(-log) transformed interval) as this ensures the bounds are within 0 and 1 and has better coverage.
dist (('t', 'z'), str, optional) – The statistic to use in calculating the bounds (student-t or normal). Defaults to the normal (z). The ‘t’ option is a conservative heuristic which uses the at risk count minus one at each point as the degrees of freedom; it is not supported by asymptotic theory, widens (overcovers) as the at risk count falls, and is undefined when only one item remains at risk.
- Returns
cb – The value(s) of the upper, lower, or both confidence bound(s) of the selected function at x. For two-sided bounds the result has one row per value of x with the columns being the
[lower, upper]bounds of theonfunction.- Return type
scalar or numpy array
Notes
If the last observation is an event (i.e. there is no right censoring) the Kaplan-Meier variance is undefined at, and after, that point. The bounds there are filled with the last finite upper bound and 0 for the lower bound, rather than NaN, so that bounds can be drawn up to the last observation.
Examples
>>> from surpyval import NelsonAalen >>> x = np.array([1, 2, 3, 4, 5]) >>> model = NelsonAalen.fit(x) >>> model.cb([1., 1.5, 2., 2.5], bound='lower') array([0.35485348, 0.35485348, 0.2345113 , 0.2345113 ]) >>> model.cb([1., 1.5, 2., 2.5]) array([[0.24175891, 0.97222045], [0.24175891, 0.97222045], [0.16288538, 0.89441253], [0.16288538, 0.89441253]])
References
http://reliawiki.org/index.php/Non-Parametric_Life_Data_Analysis
- df(x: ArrayLike, interp: str = 'step') ndarray[tuple[Any, ...], dtype[_ScalarT]]
Density function with the non-parametric estimates from the data. This is calculated using the relationship between the hazard function and the density:
\[f(x) = h(x)e^{-H(x)}\]- Parameters
x (array like or scalar) – The values of the random variables at which the survival function will be calculated
- Returns
df – The value(s) of the density function at x
- Return type
scalar or numpy array
Examples
>>> from surpyval import NelsonAalen >>> x = np.array([1, 2, 3, 4, 5]) >>> model = NelsonAalen.fit(x) >>> model.df([1.5, 2.5, 3.5]) array([0.20468269, 0.15940704, 0.15229351])
- ff(x: ArrayLike, interp: str = 'step') ndarray[tuple[Any, ...], dtype[_ScalarT]]
CDF (failure or unreliability) function with the non-parametric estimates from the data
- Parameters
x (array like or scalar) – The values of the random variables at which the survival function will be calculated.
- Returns
ff – The value(s) of the failure function at each x
- Return type
scalar or numpy array
Examples
>>> from surpyval import NelsonAalen >>> x = np.array([1, 2, 3, 4, 5]) >>> model = NelsonAalen.fit(x) >>> model.ff(2) array([0.36237185]) >>> model.ff([1., 1.5, 2., 2.5]) array([0.18126925, 0.18126925, 0.36237185, 0.36237185])
- hf(x: ArrayLike, interp: str = 'step') ndarray[tuple[Any, ...], dtype[_ScalarT]]
Instantaneous hazard function with the non-parametric estimates from the data. This is calculated using simply the difference between consecutive H(x).
- Parameters
x (array like or scalar) – The values of the random variables at which the survival function will be calculated
- Returns
hf – The value(s) of the failure function at each x
- Return type
scalar or numpy array
Examples
>>> from surpyval import NelsonAalen >>> x = np.array([1, 2, 3, 4, 5]) >>> model = NelsonAalen.fit(x) >>> model.hf([1.5, 2.5, 3.5]) array([0.25 , 0.25 , 0.33333333])
- mean(tau: float | None = None) float
The (restricted) mean survival time: the area under the estimated survival function from 0 to tau.
If the survival function reaches zero this is the mean of the estimated distribution. With right censoring the survival function does not reach zero and the unrestricted mean is undefined; the restricted mean up to tau (defaulting to the largest observed value) is reported instead, which is the standard restricted mean survival time (RMST).
- Parameters
tau (scalar, optional) – The horizon up to which the survival function is integrated. Defaults to the largest observed value. If tau is beyond the last observation the survival function is extended at its final value.
- Returns
mean – The restricted mean survival time.
- Return type
float
Examples
>>> from surpyval import KaplanMeier >>> x = np.array([1, 2, 3, 4, 5]) >>> model = KaplanMeier.fit(x) >>> model.mean() 3.0
- mean_cb(tau: float | None = None, alpha_ci: float = 0.05) ndarray[tuple[Any, ...], dtype[_ScalarT]]
Two-sided confidence interval of the (restricted) mean survival time, using the normal approximation with the standard variance estimate:
\[\widehat{Var}(\hat{\mu}) = \sum_{i: x_i \leq \tau} A_i^2 v_i\]where \(A_i\) is the area under the survival function from \(x_i\) to \(\tau\) and \(v_i\) is the variance increment of the cumulative hazard at \(x_i\) (e.g. the Greenwood increment for the Kaplan-Meier estimator).
- Parameters
tau (scalar, optional) – The horizon up to which the survival function is integrated. Defaults to the largest observed value.
alpha_ci (scalar, optional) – The level of significance at which the interval will be computed. Defaults to 0.05.
- Returns
cb – The
[lower, upper]interval of the (restricted) mean.- Return type
numpy array
- property median: float
The median survival time; the smallest observed value at which the estimated CDF reaches, or exceeds, 0.5. NaN if the estimate never reaches 0.5 (e.g. due to right censoring).
- plot(ax: Axes | None = None, **kwargs) Any
Creates a plot of the survival function.
Two-sided confidence bounds are drawn as a shaded band in the same colour as the survival curve, and right censored observations are marked with ticks on the curve. Any keyword arguments not listed below (e.g.
colororlabel) are passed to the matplotlib plotting call for the survival curve.- Parameters
ax (matplotlib axis, optional) – The axis on which the plot will be drawn. Defaults to the current axis.
plot_bounds (bool, optional) – Whether to draw the confidence bounds. Defaults to True.
show_censors (bool, optional) – Whether to mark right censored observations on the curve. Defaults to True.
interp (('step', 'linear', 'cubic'), optional) – How to draw the curve between observations.
bound (optional) – Passed to the confidence bound calculation; see
cb().alpha_ci (optional) – Passed to the confidence bound calculation; see
cb().bound_type (optional) – Passed to the confidence bound calculation; see
cb().dist (optional) – Passed to the confidence bound calculation; see
cb().
- Returns
ax
- Return type
matplotlib axis
- qf(p: ArrayLike) ndarray[tuple[Any, ...], dtype[_ScalarT]]
Quantile function of the non-parametric estimate. Returns the smallest observed value at which the estimated CDF reaches, or exceeds, the probability p.
- Parameters
p (array like or scalar) – The probabilities at which the quantile will be computed. Values must be in (0, 1].
- Returns
q – The value(s) of the quantile at each p. NaN where the estimated CDF never reaches p (e.g. due to right censoring).
- Return type
numpy array
Examples
>>> from surpyval import KaplanMeier >>> x = np.array([1, 2, 3, 4, 5]) >>> model = KaplanMeier.fit(x) >>> model.qf(0.5) array([3.]) >>> model.qf([0.1, 0.5, 0.9]) array([1., 3., 5.])
- quantile_cb(p: ArrayLike, alpha_ci: float = 0.05, bound_type: str = 'exp', dist: str = 'z') ndarray[tuple[Any, ...], dtype[_ScalarT]]
Two-sided confidence interval of the quantile at each probability p using the Brookmeyer-Crowley method: the interval is the set of times at which the pointwise confidence interval of the survival function contains 1 - p.
- Parameters
p (array like or scalar) – The probabilities at which the quantile interval will be computed. Values must be in (0, 1].
alpha_ci (scalar, optional) – The level of significance at which the interval will be computed. Defaults to 0.05.
bound_type (('exp', 'normal'), str, optional) – The method for the underlying survival function bounds.
dist (('t', 'z'), str, optional) – The statistic used in the underlying survival function bounds.
- Returns
cb – Array of shape (len(p), 2) with the
[lower, upper]interval of the quantile for each p. The upper limit is NaN where the relevant bound of the survival function never crosses 1 - p (i.e. the interval is open to the right).- Return type
numpy array
- random(size: int) ndarray[tuple[Any, ...], dtype[_ScalarT]]
Draws random samples from the fitted distribution. Each observed value x is drawn with the probability mass the estimated survival function assigns to it. If the estimate does not reach zero (e.g. due to right censoring) the remaining mass is distributed over the observed values, i.e. sampling is conditional on an event occurring at one of the observed values.
- sf(x: ArrayLike, interp: str = 'step') ndarray[tuple[Any, ...], dtype[_ScalarT]]
Surival (or Reliability) function with the non-parametric estimates from the data.
- Parameters
x (array like or scalar) – The values of the random variables at which t he survival function will be calculated.
- Returns
sf – The value(s) of the survival function at each x
- Return type
scalar or numpy array
Examples
>>> from surpyval import NelsonAalen >>> x = np.array([1, 2, 3, 4, 5]) >>> model = NelsonAalen.fit(x) >>> model.sf(2) array([0.63762815]) >>> model.sf([1., 1.5, 2., 2.5]) array([0.81873075, 0.81873075, 0.63762815, 0.63762815])
- smoothed_hf(x: ArrayLike, bandwidth: float | None = None) ndarray[tuple[Any, ...], dtype[_ScalarT]]
Kernel smoothed estimate of the hazard rate, using an Epanechnikov kernel over the increments of the cumulative hazard estimate:
\[\hat{h}(t) = \frac{1}{b} \sum_{i} K\left ( \frac{t - x_i}{b} \right ) \Delta \hat{H}(x_i)\]Contributions are renormalised near the boundaries of the observed range so the estimate is not biased downward where the kernel window extends past the data.
This is a better estimate of the hazard rate than
hf(), which simply differences the cumulative hazard between the requested points.- Parameters
x (array like or scalar) – The values at which the hazard rate will be estimated.
bandwidth (scalar, optional) – The kernel bandwidth in the units of x. Defaults to a rough rule of thumb (one eighth of the observed range); for serious use choose by inspection or cross-validation.
- Returns
hf – The estimated hazard rate at each x. NaN outside the observed range.
- Return type
numpy array
References
Klein, J. P. and Moeschberger, M. L. (2003), “Survival Analysis”, 2nd ed., Section 6.2.