Non-Parametric¶

class surpyval.nonparametric.nonparametric.NonParametric¶

Bases: object

Result of .fit() method for every non-parametric surpyval distribution. This means that each of the methods in this class can be called with a model created from the NelsonAalen, KaplanMeier, FlemingHarrington, or Turnbull estimators.

Hf(x, interp='step')¶

Cumulative hazard function with the non-parametric estimates from the data. This is calculated using the relationship between the cumulative hazard function and the survival function:

\[H(x) = -\ln(R(x))\]

Parameters:	x (array like or scalar) – The values of the random variables at which the cumulative hazard function will be calculated
Returns:	Hf – The value(s) of the cumulative hazard function at x
Return type:	scalar or numpy array

Examples

>>> import numpy as np
>>> from surpyval import NelsonAalen
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = NelsonAalen.fit(x)
>>> model.Hf(2)
array([0.45])
>>> model.Hf([1., 1.5, 2., 2.5])
array([0.2 , 0.2 , 0.45, 0.45])
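
The relationship above can be checked by hand for the five-point data set. The sketch below is not surpyval code: it rebuilds the Nelson-Aalen cumulative hazard with plain NumPy, assuming no censoring and no ties, and uses `np.searchsorted` as a stand-in for `interp='step'`. The helper name `step_lookup` is hypothetical.

```python
import numpy as np

# Five exact failures, one at each of x = 1..5 (no censoring, no ties).
x_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
r = np.arange(len(x_obs), 0, -1)   # number at risk just before each failure: 5, 4, 3, 2, 1
d = np.ones_like(r)                # one failure at each time

H_obs = np.cumsum(d / r)           # Nelson-Aalen cumulative hazard at each observation
R_obs = np.exp(-H_obs)             # survival estimate, so that H = -ln(R)


def step_lookup(x_new, x_obs, values, before=0.0):
    """Hypothetical helper: value of a right-continuous step function at x_new."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    # Index of the last observation that is <= each requested point.
    idx = np.searchsorted(x_obs, x_new, side="right") - 1
    out = np.full(x_new.shape, before, dtype=float)
    mask = idx >= 0
    out[mask] = values[idx[mask]]
    return out


H = step_lookup([1.0, 1.5, 2.0, 2.5], x_obs, H_obs)
R = step_lookup([1.0, 1.5, 2.0, 2.5], x_obs, R_obs, before=1.0)
print(H)            # cumulative hazard at the requested points
print(-np.log(R))   # the same values recovered from the survival function
```

With a step function, a point between two observations (such as 1.5) takes the value of the most recent observation at or before it.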

cb(x, on='sf', bound='two-sided', interp='step', alpha_ci=0.05, bound_type='exp', dist='z')¶

Confidence bounds of the function selected by on, at the alpha_ci level of significance. The bound can be upper, lower, or two-sided depending on the value of bound. The bound type can be regular or exponential, and either the ‘t’ or ‘z’ statistic can be used.

Parameters:

x (array like or scalar) – The values of the random variables at which the confidence bounds will be calculated.
on (('sf', 'ff', 'Hf'), optional) – The function on which the confidence bound will be calculated.
bound (('two-sided', 'upper', 'lower'), str, optional) – Compute either the two-sided, upper or lower confidence bound(s). Defaults to two-sided.
interp (('step', 'linear', 'cubic'), optional) – How to interpolate the values between observations. Survival statistics traditionally uses step functions, but can use interpolated values if desired. Defaults to step.
alpha_ci (scalar, optional) – The level of significance at which the bound will be computed.
bound_type (('exp', 'regular'), str, optional) – The method with which the bounds will be calculated. Using regular will allow for the bounds to exceed 1 or be less than 0. Defaults to exp as this ensures the bounds are within 0 and 1.
dist (('t', 'z'), str, optional) – The statistic to use in calculating the bounds (student-t or normal). Defaults to the normal (z).

Returns:	cb – The value(s) of the upper, lower, or both confidence bound(s) of the selected function at x
Return type:	scalar or numpy array

Examples

>>> import numpy as np
>>> from surpyval import NelsonAalen
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = NelsonAalen.fit(x)
>>> model.cb([1., 1.5, 2., 2.5], bound='lower', dist='t')
array([0.11434813, 0.11434813, 0.04794404, 0.04794404])
>>> model.cb([1., 1.5, 2., 2.5])
array([[0.97789387, 0.16706394],
       [0.97789387, 0.16706394],
       [0.91235117, 0.10996882],
       [0.91235117, 0.10996882]])

References

http://reliawiki.org/index.php/Non-Parametric_Life_Data_Analysis
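
As an illustration of the 'exp' bound type, the sketch below computes a log-log transformed interval from Greenwood's variance sum using the normal (z) statistic. It is an assumption that this is how surpyval computes the bound, although the formula does reproduce the two-sided values in the example above. The function name `exp_bounds` is hypothetical, and only points where r > d are used because the variance term is undefined when r equals d.

```python
from statistics import NormalDist

import numpy as np


def exp_bounds(R, r, d, alpha_ci=0.05):
    """Hypothetical helper: two-sided log-log ('exp') bounds on R.

    R : survival estimate at each observed time
    r : number at risk at each observed time
    d : number of failures at each observed time
    """
    R = np.asarray(R, dtype=float)
    r = np.asarray(r, dtype=float)
    d = np.asarray(d, dtype=float)
    # Greenwood's cumulative variance term (requires r > d at every point).
    greenwood = np.cumsum(d / (r * (r - d)))
    z = NormalDist().inv_cdf(1 - alpha_ci / 2)
    theta = z * np.sqrt(greenwood) / np.log(R)   # negative, because ln(R) < 0
    # Raising R to a positive power keeps both bounds between 0 and 1.
    upper = R ** np.exp(theta)
    lower = R ** np.exp(-theta)
    return np.column_stack([upper, lower])


# First two points of the data set x = [1, 2, 3, 4, 5] with Nelson-Aalen R.
r = np.array([5, 4])
d = np.array([1, 1])
R = np.exp(-np.cumsum(d / r))
bounds = exp_bounds(R, r, d)
print(bounds)
```

Because the bound is built as a power of R, it cannot leave the interval between 0 and 1, which is the property the bound_type description attributes to 'exp'.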

df(x, interp='step')¶

Density function with the non-parametric estimates from the data. This is calculated using the relationship between the hazard function and the density:

\[f(x) = h(x)e^{-H(x)}\]

Parameters:	x (array like or scalar) – The values of the random variables at which the density function will be calculated
Returns:	df – The value(s) of the density function at x
Return type:	scalar or numpy array

Examples

>>> import numpy as np
>>> from surpyval import NelsonAalen
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = NelsonAalen.fit(x)
>>> model.df(2)
array([0.28693267])
>>> model.df([1., 1.5, 2., 2.5])
array([0.16374615, 0.        , 0.15940704, 0.        ])
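
The array output above can be reproduced from the formula with plain NumPy. This sketch assumes the hazard h is taken as the increment in H between consecutive evaluation points, starting from H = 0 before the first point; it is an illustration rather than the library's implementation.

```python
import numpy as np

# Cumulative hazard at x = 1, 1.5, 2, 2.5 for the Nelson-Aalen fit of [1, 2, 3, 4, 5].
H = np.array([0.2, 0.2, 0.45, 0.45])

# Hazard as the jump in H at each evaluation point (assumed to start from H = 0).
h = np.diff(H, prepend=0.0)

# Density from f(x) = h(x) * exp(-H(x)).
f = h * np.exp(-H)
print(f)
```

The density is zero at 1.5 and 2.5 because H does not change between observations when a step function is used.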

ff(x, interp='step')¶

CDF (failure or unreliability) function with the non-parametric estimates from the data

Parameters:	x (array like or scalar) – The values of the random variables at which the failure function will be calculated
Returns:	ff – The value(s) of the failure function at each x
Return type:	scalar or numpy array

Examples

>>> import numpy as np
>>> from surpyval import NelsonAalen
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = NelsonAalen.fit(x)
>>> model.ff(2)
array([0.36237185])
>>> model.ff([1., 1.5, 2., 2.5])
array([0.18126925, 0.18126925, 0.36237185, 0.36237185])

hf(x, interp='step')¶

Instantaneous hazard function with the non-parametric estimates from the data. This is calculated as the difference between consecutive values of H(x).

Parameters:	x (array like or scalar) – The values of the random variables at which the hazard function will be calculated
Returns:	hf – The value(s) of the hazard function at each x
Return type:	scalar or numpy array

Examples

>>> import numpy as np
>>> from surpyval import NelsonAalen
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = NelsonAalen.fit(x)
>>> model.hf([1., 1.5, 2., 2.5])
array([0.2 , 0.  , 0.25, 0.  ])

plot(**kwargs)¶: Creates a plot of the survival function.

sf(x, interp='step')¶

Survival (or Reliability) function with the non-parametric estimates from the data

Parameters:	x (array like or scalar) – The values of the random variables at which the survival function will be calculated
Returns:	sf – The value(s) of the survival function at each x
Return type:	scalar or numpy array

Examples

>>> import numpy as np
>>> from surpyval import NelsonAalen
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = NelsonAalen.fit(x)
>>> model.sf(2)
array([0.63762815])
>>> model.sf([1., 1.5, 2., 2.5])
array([0.81873075, 0.81873075, 0.63762815, 0.63762815])

class surpyval.nonparametric.kaplan_meier.KaplanMeier_¶

Bases: surpyval.nonparametric.nonparametric_fitter.NonParametricFitter

Kaplan-Meier estimator class. Calculates the Non-Parametric estimate of the survival function using:

\[R(x) = \prod_{i:x_{i} \leq x}^{} \left ( 1 - \frac{d_{i} }{r_{i}} \right )\]

Examples

>>> import numpy as np
>>> from surpyval import KaplanMeier
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = KaplanMeier.fit(x)
>>> model.R
array([0.8, 0.6, 0.4, 0.2, 0. ])
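
The product-limit formula can be traced by hand for the same five observations. The sketch below uses plain NumPy rather than surpyval and assumes every observation is an exact failure time (no censoring or truncation).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# d_i: number of failures at each distinct time.
times, d = np.unique(x, return_counts=True)

# r_i: number still at risk just before each distinct time.
r = len(x) - np.concatenate(([0], np.cumsum(d)[:-1]))

# Kaplan-Meier: running product of (1 - d_i / r_i).
R = np.cumprod(1 - d / r)
print(r)   # risk set at each time
print(R)   # survival estimate at each time
```

The last value is exactly zero because the final failure removes the only item still at risk.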

fit(x=None, c=None, n=None, t=None, xl=None, xr=None, tl=None, tr=None, turnbull_estimator='Fleming-Harrington')¶

The central feature of SurPyval’s capability. This function aims to mimic the simplicity of the scipy API: a single fit() call takes as many or as few parameters as needed.

Parameters:

x (array like, optional) – Array of observations of the random variables. If x is `None`, xl and xr must be provided.
c (array like, optional) – Array of censoring flags. -1 is left censored, 0 is observed, 1 is right censored, and 2 is interval censored. If not provided, all values are assumed to be observed.
n (array like, optional) – Array of counts for each x. Use this when the data is provided as counts. If `None`, each observation is assumed to have a count of 1.
t (2D-array like, optional) – 2D array like of the left and right values at which the respective observation was truncated. If not provided, no truncation is assumed.
tl (array like or scalar, optional) – Values of left truncation for observations. If a scalar, each observation is assumed to be left truncated at that value. If an array, it is the respective ‘late entry’ of each observation.
tr (array like or scalar, optional) – Values of right truncation for observations. If a scalar, each observation is assumed to be right truncated at that value. If an array, it is the respective right truncation value for each observation.
xl (array like, optional) – Array like of the left values for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the `xr` input.
xr (array like, optional) – Array like of the right values for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the `xl` input.
turnbull_estimator (('Nelson-Aalen', 'Kaplan-Meier', or 'Fleming-Harrington'), str, optional) – If using the Turnbull heuristic, you can elect to use either the KM, NA, or FH estimator with the Turnbull estimates of r and d. Defaults to FH.

Returns:	model – A non-parametric model with the estimates from the data and methods for each function of the distribution.
Return type:	NonParametric

Examples

>>> from surpyval import NelsonAalen, Weibull, Turnbull
>>> import numpy as np
>>> x = Weibull.random(100, 10, 4)
>>> model = NelsonAalen.fit(x)
>>> print(model)
Non-Parametric SurPyval Model
=============================
Model            : Nelson-Aalen
>>> Turnbull.fit(x, turnbull_estimator='Kaplan-Meier')
Non-Parametric SurPyval Model
=============================
Model            : Turnbull
Estimator        : Kaplan-Meier
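
To make the roles of x, c, and n concrete, the sketch below reduces right-censored count data to the risk set r and death set d that the estimators use. It is an illustration under stated assumptions, not surpyval's internal code: the function name `risk_and_death_sets` is hypothetical, and only observed (0) and right censored (1) flags are handled, with no truncation.

```python
import numpy as np


def risk_and_death_sets(x, c=None, n=None):
    """Hypothetical helper: distinct times, number at risk, and failures at each time."""
    x = np.asarray(x, dtype=float)
    # Defaults mirror the documented behaviour: all observed, each with a count of 1.
    c = np.zeros_like(x, dtype=int) if c is None else np.asarray(c, dtype=int)
    n = np.ones_like(x, dtype=int) if n is None else np.asarray(n, dtype=int)
    if not np.all(np.isin(c, (0, 1))):
        raise ValueError("this sketch only handles c = 0 (observed) and c = 1 (right censored)")

    times = np.unique(x)
    # Failures at each time: counts of observed (c == 0) items only.
    d = np.array([n[(x == t) & (c == 0)].sum() for t in times])
    # Items leaving the risk set at each time, whether failed or censored.
    leaving = np.array([n[x == t].sum() for t in times])
    # At risk just before each time: total minus everything that left earlier.
    r = n.sum() - np.concatenate(([0], np.cumsum(leaving)[:-1]))
    return times, r, d


times, r, d = risk_and_death_sets(
    x=[1, 2, 3, 4, 5],
    c=[0, 0, 1, 0, 1],   # items at 3 and 5 are right censored
    n=[1, 2, 1, 1, 1],   # two items failed at time 2
)
R_km = np.cumprod(1 - d / r)   # Kaplan-Meier estimate from r and d
print(r, d, R_km)
```

Censored items reduce the risk set for later times but never count as failures, so the estimate stays flat at a time where only censoring occurs.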

class surpyval.nonparametric.nelson_aalen.NelsonAalen_¶

Bases: surpyval.nonparametric.nonparametric_fitter.NonParametricFitter

Nelson-Aalen estimator class. Returns a NonParametric object from method fit(). Calculates the Non-Parametric estimate of the survival function using:

\[R(x) = e^{-\sum_{i:x_{i} \leq x}^{} \frac{d_{i} }{r_{i}}}\]

Examples

>>> import numpy as np
>>> from surpyval import NelsonAalen
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = NelsonAalen.fit(x)
>>> model.R
array([0.81873075, 0.63762815, 0.45688054, 0.27711205, 0.10194383])
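
The values above follow directly from the formula. The sketch below recomputes them with plain NumPy, assuming five exact failures with no censoring, and compares them with the Kaplan-Meier product on the same risk and death sets.

```python
import numpy as np

# Risk set and death set for x = [1, 2, 3, 4, 5] with no censoring.
r = np.array([5, 4, 3, 2, 1])
d = np.array([1, 1, 1, 1, 1])

# Nelson-Aalen: cumulative hazard is the running sum of d_i / r_i.
H = np.cumsum(d / r)
R_na = np.exp(-H)

# Kaplan-Meier on the same r and d, for comparison.
R_km = np.cumprod(1 - d / r)

print(R_na)
print(R_km)
```

Since exp(-a) is at least 1 - a, the Nelson-Aalen survival estimate is never below the Kaplan-Meier estimate, and it stays above zero at the last failure.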

fit(x=None, c=None, n=None, t=None, xl=None, xr=None, tl=None, tr=None, turnbull_estimator='Fleming-Harrington')¶

The central feature of SurPyval’s capability. This function aims to mimic the simplicity of the scipy API: a single fit() call takes as many or as few parameters as needed.

Parameters:

x (array like, optional) – Array of observations of the random variables. If x is `None`, xl and xr must be provided.
c (array like, optional) – Array of censoring flags. -1 is left censored, 0 is observed, 1 is right censored, and 2 is interval censored. If not provided, all values are assumed to be observed.
n (array like, optional) – Array of counts for each x. Use this when the data is provided as counts. If `None`, each observation is assumed to have a count of 1.
t (2D-array like, optional) – 2D array like of the left and right values at which the respective observation was truncated. If not provided, no truncation is assumed.
tl (array like or scalar, optional) – Values of left truncation for observations. If a scalar, each observation is assumed to be left truncated at that value. If an array, it is the respective ‘late entry’ of each observation.
tr (array like or scalar, optional) – Values of right truncation for observations. If a scalar, each observation is assumed to be right truncated at that value. If an array, it is the respective right truncation value for each observation.
xl (array like, optional) – Array like of the left values for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the `xr` input.
xr (array like, optional) – Array like of the right values for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the `xl` input.
turnbull_estimator (('Nelson-Aalen', 'Kaplan-Meier', or 'Fleming-Harrington'), str, optional) – If using the Turnbull heuristic, you can elect to use either the KM, NA, or FH estimator with the Turnbull estimates of r and d. Defaults to FH.

Returns:	model – A non-parametric model with the estimates from the data and methods for each function of the distribution.
Return type:	NonParametric

Examples

>>> from surpyval import NelsonAalen, Weibull, Turnbull
>>> import numpy as np
>>> x = Weibull.random(100, 10, 4)
>>> model = NelsonAalen.fit(x)
>>> print(model)
Non-Parametric SurPyval Model
=============================
Model            : Nelson-Aalen
>>> Turnbull.fit(x, turnbull_estimator='Kaplan-Meier')
Non-Parametric SurPyval Model
=============================
Model            : Turnbull
Estimator        : Kaplan-Meier

class surpyval.nonparametric.fleming_harrington.FlemingHarrington_¶

Bases: surpyval.nonparametric.nonparametric_fitter.NonParametricFitter

Fleming-Harrington estimation of survival distribution. Returns a NonParametric object from method fit(). Calculates the Non-Parametric estimate of the survival function using:

\[R(x) = e^{-\sum_{i:x_{i} \leq x} \sum_{j=0}^{d_{i}-1} \frac{1}{r_{i} - j}}\]

See the NonParametric section for details of how H is computed.

Examples

>>> import numpy as np
>>> from surpyval import FlemingHarrington
>>> x = np.array([1, 2, 3, 4, 5])
>>> model = FlemingHarrington.fit(x)
>>> model.R
array([0.81873075, 0.63762815, 0.45688054, 0.27711205, 0.10194383])
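
The example above matches the Nelson-Aalen output because each time has a single failure. The sketch below, written in plain NumPy with hypothetical helper names, shows where the two estimators differ: when several failures are tied at one time, Fleming-Harrington treats them as occurring one after another, so the risk set shrinks within the tie.

```python
import numpy as np


def fleming_harrington_R(r, d):
    """Hypothetical helper: survival estimate with ties broken one failure at a time."""
    r = np.asarray(r, dtype=int)
    d = np.asarray(d, dtype=int)
    # For d_i tied failures the hazard increment is 1/r_i + 1/(r_i - 1) + ...
    increments = [
        sum(1.0 / (r_i - j) for j in range(d_i)) for r_i, d_i in zip(r, d)
    ]
    return np.exp(-np.cumsum(increments))


def nelson_aalen_R(r, d):
    """Hypothetical helper: survival estimate with hazard increment d_i / r_i."""
    r = np.asarray(r, dtype=float)
    d = np.asarray(d, dtype=float)
    return np.exp(-np.cumsum(d / r))


# Without ties the two estimators agree.
R_no_ties = fleming_harrington_R([5, 4, 3, 2, 1], [1, 1, 1, 1, 1])

# With ties (x = [1, 1, 2, 3, 3, 3]) Fleming-Harrington gives a larger hazard.
r_tied, d_tied = [6, 4, 3], [2, 1, 3]
R_fh = fleming_harrington_R(r_tied, d_tied)
R_na = nelson_aalen_R(r_tied, d_tied)
print(R_fh)
print(R_na)
```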

fit(x=None, c=None, n=None, t=None, xl=None, xr=None, tl=None, tr=None, turnbull_estimator='Fleming-Harrington')¶

The central feature of SurPyval’s capability. This function aims to mimic the simplicity of the scipy API: a single fit() call takes as many or as few parameters as needed.

Parameters:

x (array like, optional) – Array of observations of the random variables. If x is `None`, xl and xr must be provided.
c (array like, optional) – Array of censoring flags. -1 is left censored, 0 is observed, 1 is right censored, and 2 is interval censored. If not provided, all values are assumed to be observed.
n (array like, optional) – Array of counts for each x. Use this when the data is provided as counts. If `None`, each observation is assumed to have a count of 1.
t (2D-array like, optional) – 2D array like of the left and right values at which the respective observation was truncated. If not provided, no truncation is assumed.
tl (array like or scalar, optional) – Values of left truncation for observations. If a scalar, each observation is assumed to be left truncated at that value. If an array, it is the respective ‘late entry’ of each observation.
tr (array like or scalar, optional) – Values of right truncation for observations. If a scalar, each observation is assumed to be right truncated at that value. If an array, it is the respective right truncation value for each observation.
xl (array like, optional) – Array like of the left values for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the `xr` input.
xr (array like, optional) – Array like of the right values for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the `xl` input.
turnbull_estimator (('Nelson-Aalen', 'Kaplan-Meier', or 'Fleming-Harrington'), str, optional) – If using the Turnbull heuristic, you can elect to use either the KM, NA, or FH estimator with the Turnbull estimates of r and d. Defaults to FH.

Returns:	model – A non-parametric model with the estimates from the data and methods for each function of the distribution.
Return type:	NonParametric

Examples

>>> from surpyval import NelsonAalen, Weibull, Turnbull
>>> import numpy as np
>>> x = Weibull.random(100, 10, 4)
>>> model = NelsonAalen.fit(x)
>>> print(model)
Non-Parametric SurPyval Model
=============================
Model            : Nelson-Aalen
>>> Turnbull.fit(x, turnbull_estimator='Kaplan-Meier')
Non-Parametric SurPyval Model
=============================
Model            : Turnbull
Estimator        : Kaplan-Meier

class surpyval.nonparametric.turnbull.Turnbull_¶

Bases: surpyval.nonparametric.nonparametric_fitter.NonParametricFitter

Turnbull estimator class. Returns a NonParametric object from method fit(). Calculates the Non-Parametric estimate of the survival function using the Turnbull NPMLE.

Examples

>>> import numpy as np
>>> from surpyval import Turnbull
>>> x = np.array([[1, 5], [2, 3], [3, 6], [1, 8], [9, 10]])
>>> model = Turnbull.fit(x)
>>> model.R
array([1.        , 0.59999999, 0.20000002, 0.2       , 0.2       ,
       0.2       , 0.        , 0.        ])

fit(x=None, c=None, n=None, t=None, xl=None, xr=None, tl=None, tr=None, turnbull_estimator='Fleming-Harrington')¶

The central feature of SurPyval’s capability. This function aims to mimic the simplicity of the scipy API: a single fit() call takes as many or as few parameters as needed.

Parameters:

x (array like, optional) – Array of observations of the random variables. If x is `None`, xl and xr must be provided.
c (array like, optional) – Array of censoring flags. -1 is left censored, 0 is observed, 1 is right censored, and 2 is interval censored. If not provided, all values are assumed to be observed.
n (array like, optional) – Array of counts for each x. Use this when the data is provided as counts. If `None`, each observation is assumed to have a count of 1.
t (2D-array like, optional) – 2D array like of the left and right values at which the respective observation was truncated. If not provided, no truncation is assumed.
tl (array like or scalar, optional) – Values of left truncation for observations. If a scalar, each observation is assumed to be left truncated at that value. If an array, it is the respective ‘late entry’ of each observation.
tr (array like or scalar, optional) – Values of right truncation for observations. If a scalar, each observation is assumed to be right truncated at that value. If an array, it is the respective right truncation value for each observation.
xl (array like, optional) – Array like of the left values for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the `xr` input.
xr (array like, optional) – Array like of the right values for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the `xl` input.
turnbull_estimator (('Nelson-Aalen', 'Kaplan-Meier', or 'Fleming-Harrington'), str, optional) – If using the Turnbull heuristic, you can elect to use either the KM, NA, or FH estimator with the Turnbull estimates of r and d. Defaults to FH.

Returns:	model – A non-parametric model with the estimates from the data and methods for each function of the distribution.
Return type:	NonParametric

Examples

>>> from surpyval import NelsonAalen, Weibull, Turnbull
>>> import numpy as np
>>> x = Weibull.random(100, 10, 4)
>>> model = NelsonAalen.fit(x)
>>> print(model)
Non-Parametric SurPyval Model
=============================
Model            : Nelson-Aalen
>>> Turnbull.fit(x, turnbull_estimator='Kaplan-Meier')
Non-Parametric SurPyval Model
=============================
Model            : Turnbull
Estimator        : Kaplan-Meier

surpyval.nonparametric.success_run.success_run(n, confidence=0.95, alpha=None)¶: Function that can be used to estimate the confidence given that all n samples survive a test.
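
For context, the classical success-run relationship links the sample size n, the confidence C, and the demonstrated reliability R through R = (1 - C)^(1/n) when all n samples survive. The sketch below implements that textbook formula directly; it is an assumption that success_run returns this same quantity, and the function name `success_run_reliability` is hypothetical.

```python
def success_run_reliability(n, confidence=0.95):
    """Hypothetical helper: lower reliability bound after n tests with zero failures."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # If the true reliability were R, the chance of n straight successes is R**n.
    # Setting R**n = 1 - confidence and solving for R gives the bound.
    return (1.0 - confidence) ** (1.0 / n)


# How many zero-failure tests are needed to show 95% reliability at 95% confidence?
for n in (58, 59):
    print(n, success_run_reliability(n, confidence=0.95))
```

The bound first exceeds 0.95 at n = 59, which is why 59 units is the usual sample size quoted for a 95/95 demonstration test.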

surpyval.nonparametric.plotting_positions.plotting_positions(x, c=None, n=None, t=None, heuristic='Blom', turnbull_estimator='Fleming-Harrington')¶

This function takes in data in the xcnt format and outputs an approximation of the CDF. This function can be used to produce estimates of F using the Nelson-Aalen, Kaplan-Meier, Fleming-Harrington, and the Turnbull estimates. Additionally, it can be used to create ‘plotting heuristics.’

Plotting heuristics are the values that are used to plot on probability paper and can be used to estimate the parameters of a distribution. The use of probability plots is one of the traditional ways to estimate the parameters of a distribution.

Right censored data can be used with the regular plotting positions. If there is right censored data, this method adjusts the ranks of the values using the mean order number.

Parameters:

x (array like, optional) – Array of observations of the random variables. If x is None, xl and xr must be provided.
c (array like, optional) – Array of censoring flags. -1 is left censored, 0 is observed, 1 is right censored, and 2 is interval censored. If not provided, all values are assumed to be observed.
n (array like, optional) – Array of counts for each x. Use this when the data is provided as counts. If None, each observation is assumed to have a count of 1.
t (2D-array like, optional) – 2D array like of the left and right values at which the respective observation was truncated. If not provided, no truncation is assumed.
heuristic (("Blom", "Median", "ECDF", "ECDF_Adj", "Modal", "Midpoint", "Mean", "Weibull", "Benard", "Beard", "Hazen", "Gringorten", "None", "Larsen", "Tukey", "DPW", "Filliben"), str, optional) – Method to use to compute the heuristic of F. See details of each heuristic in the probability plotting section.
turnbull_estimator (('Nelson-Aalen', 'Kaplan-Meier', or 'Fleming-Harrington'), str, optional) – If using the Turnbull heuristic, you can elect to use the NA, KM, or FH method to compute R with the Turnbull estimates of the risk and death sets. Defaults to FH.

Returns:

x (numpy array) – x values for the plotting points
r (numpy array) – risk set at each x
d (numpy array) – death set at each x
F (numpy array) – estimate of F to use in plotting positions.

Examples

>>> from surpyval.nonparametric import plotting_positions
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
>>> x, r, d, F = plotting_positions(x, heuristic="Filliben")
>>> F
array([0.08299596, 0.20113568, 0.32068141, 0.44022714, 0.55977286,
       0.67931859, 0.79886432, 0.91700404])
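
The values above agree with Filliben's order-statistic median approximation, which the sketch below implements in plain NumPy for complete (uncensored) data. It is an illustration of the textbook formula rather than the library's code, and the function name `filliben_positions` is hypothetical.

```python
import numpy as np


def filliben_positions(n):
    """Hypothetical helper: Filliben plotting positions for n uncensored observations."""
    i = np.arange(1, n + 1)
    # Interior ranks use (i - 0.3175) / (n + 0.365).
    F = (i - 0.3175) / (n + 0.365)
    # The two end points use the exact median of the extreme order statistics.
    F[-1] = 0.5 ** (1.0 / n)
    F[0] = 1.0 - F[-1]
    return F


F = filliben_positions(8)
print(F)
```

The positions are symmetric about 0.5, so F[i] + F[n - 1 - i] equals 1 for every i.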