Parametric

Distribution Classes

Parametric Class

class surpyval.parametric.parametric.Parametric(dist, method, data, offset, lfp, zi)

Bases: object

The result of the .fit() or .from_params() method of every parametric SurPyval distribution.

Instances of this class are useful when a user needs the other functions of a distribution, for example for plotting, optimisation, Monte Carlo analysis and numerical integration.

Hf(x)

The cumulative hazard function for a distribution using the parameters found in the .params attribute.

Parameters:	x (array like or scalar) – The values of the random variables at which the cumulative hazard function will be calculated
Returns:	Hf – The scalar value of the cumulative hazard function of the distribution if a scalar was passed. If an array like object was passed then a numpy array is returned with the value of the cumulative hazard function at each corresponding value in the input array.
Return type:	scalar or numpy array

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> model.Hf(2)
0.008000000000000002
>>> model.Hf([1, 2, 3, 4, 5])
array([0.001, 0.008, 0.027, 0.064, 0.125])
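As a cross-check of the example above, the cumulative hazard is related to the survival function by Hf(x) = -ln sf(x). For a two-parameter Weibull with scale alpha and shape beta this gives Hf(x) = (x/alpha)**beta. The sketch below assumes that `Weibull.from_params([10, 3])` means alpha = 10 and beta = 3, which is consistent with the outputs shown. The helper names are hypothetical and this is not surpyval's implementation.

```python
import math

# Hypothetical helpers for a two-parameter Weibull (alpha = scale, beta = shape).
def weibull_sf(x, alpha, beta):
    """Survival function: exp(-(x / alpha) ** beta)."""
    return math.exp(-((x / alpha) ** beta))

def weibull_Hf(x, alpha, beta):
    """Cumulative hazard: (x / alpha) ** beta."""
    return (x / alpha) ** beta

for x in [1, 2, 3, 4, 5]:
    closed_form = weibull_Hf(x, 10, 3)
    # Hf(x) = -ln(sf(x)) holds for any distribution, not just the Weibull.
    from_sf = -math.log(weibull_sf(x, 10, 3))
    print(x, round(closed_form, 6), round(from_sf, 6))
```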

aic()

The Akaike Information Criterion (AIC) for the model, if it was fitted with the fit() method. Not available if the model was created with the from_params() method.

Parameters:	None –
Returns:	aic – The AIC of the model
Return type:	float

Examples

>>> from surpyval import Weibull
>>> import numpy as np
>>> np.random.seed(1)
>>> x = Weibull.random(100, 10, 3)
>>> model = Weibull.fit(x)
>>> model.aic()
529.0537128477147
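The value above is consistent with the standard definition AIC = 2k + 2·neg_ll, where k is the number of fitted parameters (2 for a Weibull) and neg_ll is the value returned by neg_ll() for the same seeded data (262.52685642385734 in the neg_ll() example). A minimal sketch, assuming that definition:

```python
def aic(neg_ll, k):
    """Akaike Information Criterion from a negative log-likelihood and k parameters."""
    return 2 * k + 2 * neg_ll

# neg_ll value taken from the neg_ll() example for the same seeded Weibull fit.
print(aic(262.52685642385734, k=2))
```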

aic_c()

The corrected Akaike Information Criterion (AICc) for the model, if it was fitted with the fit() method. Not available if the model was created with the from_params() method.

Parameters:	None –
Returns:	aic_c – The corrected AIC of the model
Return type:	float

Examples

>>> from surpyval import Weibull
>>> import numpy as np
>>> np.random.seed(1)
>>> x = Weibull.random(100, 10, 3)
>>> model = Weibull.fit(x)
>>> model.aic_c()
529.1774241879209
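AICc adds a small-sample correction to AIC: AICc = AIC + 2k(k + 1)/(n - k - 1), where n is the number of observations. With n = 100 and k = 2 the correction is 12/97 ≈ 0.1237, which accounts for the difference between the aic() and aic_c() examples. A sketch, assuming the standard formula:

```python
def aic_c(aic, k, n):
    """Corrected AIC: AIC plus the small-sample penalty 2k(k + 1) / (n - k - 1)."""
    return aic + 2 * k * (k + 1) / (n - k - 1)

# AIC value taken from the aic() example (n = 100 observations, k = 2 parameters).
print(aic_c(529.0537128477147, k=2, n=100))
```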

bic()

The Bayesian Information Criterion (BIC) for the model, if it was fitted with the fit() method. Not available if the model was created with the from_params() method.

Parameters:	None –
Returns:	bic – The BIC of the model
Return type:	float

Examples

>>> from surpyval import Weibull
>>> import numpy as np
>>> np.random.seed(1)
>>> x = Weibull.random(100, 10, 3)
>>> model = Weibull.fit(x)
>>> model.bic()
534.2640532196908

References

Bayesian Information Criterion for Censored Survival Models.
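BIC replaces the AIC penalty 2k with k·ln(n). Assuming the plain (uncensored) definition BIC = k·ln(n) + 2·neg_ll, the value in the example follows from the neg_ll() example with k = 2 and n = 100. This sketch does not reproduce the censored-data variant named in the reference above.

```python
import math

def bic(neg_ll, k, n):
    """Bayesian Information Criterion from a negative log-likelihood."""
    return k * math.log(n) + 2 * neg_ll

# neg_ll value taken from the neg_ll() example (n = 100 observations, k = 2 parameters).
print(bic(262.52685642385734, k=2, n=100))
```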

cb(t, on='R', alpha_ci=0.05, bound='two-sided')

Confidence bounds of the function selected by on, at the alpha_ci level of significance. The upper, lower, or two-sided bounds can be returned by changing the value of bound.

Parameters:	t (array like or scalar) – The values of the random variables at which the confidence bounds will be calculated.
	on (('sf', 'ff', 'Hf'), str, optional) – The function on which the confidence bound will be calculated. Defaults to 'R'.
	alpha_ci (scalar, optional) – The level of significance at which the bound will be computed. Defaults to 0.05.
	bound (('two-sided', 'upper', 'lower'), str, optional) – Compute either the two-sided, upper or lower confidence bound(s). Defaults to 'two-sided'.
Returns:	cb – The value(s) of the upper, lower, or both confidence bound(s) of the selected function at t
Return type:	scalar or numpy array

cs(x, X)

The conditional survival of the model: the probability of surviving a further x, given survival up to X.

Parameters:	x (array like or scalar) – The additional value(s) beyond X for which conditional survival is to be calculated.
	X (array like or scalar) – The value(s) up to which the item is known to have survived.
Returns:	cs – The conditional survival probability, sf(X + x) / sf(X).
Return type:	scalar or numpy array

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> model.cs(11, 10)
0.00025840046151723767
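The example value matches sf(X + x) / sf(X) for a Weibull with alpha = 10 and beta = 3: surviving to 21 given survival to 10. A stdlib sketch, with hypothetical helper names:

```python
import math

def weibull_sf(x, alpha, beta):
    """Weibull survival function: exp(-(x / alpha) ** beta)."""
    return math.exp(-((x / alpha) ** beta))

def conditional_survival(x, X, alpha, beta):
    """P(T > X + x | T > X) = sf(X + x) / sf(X)."""
    return weibull_sf(X + x, alpha, beta) / weibull_sf(X, alpha, beta)

print(conditional_survival(11, 10, 10, 3))
```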

df(x)

The density function for a distribution using the parameters found in the .params attribute.

Parameters:	x (array like or scalar) – The values of the random variables at which the density function will be calculated
Returns:	df – The scalar value of the density function of the distribution if a scalar was passed. If an array like object was passed then a numpy array is returned with the value of the density function at each corresponding value in the input array.
Return type:	scalar or numpy array

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> model.df(2)
0.01190438297804473
>>> model.df([1, 2, 3, 4, 5])
array([0.002997  , 0.01190438, 0.02628075, 0.04502424, 0.06618727])
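The density is the derivative of the CDF, df(x) = d ff(x) / dx. The sketch below compares the Weibull closed form with a central finite difference of the CDF; the helper names are hypothetical.

```python
import math

def weibull_ff(x, alpha, beta):
    """CDF: 1 - exp(-(x / alpha) ** beta), computed with expm1 for accuracy."""
    return -math.expm1(-((x / alpha) ** beta))

def weibull_df(x, alpha, beta):
    """Density: (beta / alpha) * (x / alpha) ** (beta - 1) * exp(-(x / alpha) ** beta)."""
    z = x / alpha
    return (beta / alpha) * z ** (beta - 1) * math.exp(-(z ** beta))

def numeric_df(x, alpha, beta, h=1e-5):
    # Central difference approximation of d ff / dx.
    return (weibull_ff(x + h, alpha, beta) - weibull_ff(x - h, alpha, beta)) / (2 * h)

print(weibull_df(2, 10, 3), numeric_df(2, 10, 3))
```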

entropy()

The entropy of the distribution using the parameters found in the .params attribute.

Parameters:	None –
Returns:	entropy – The entropy of the distribution
Return type:	float

Examples

>>> from surpyval import Normal
>>> model = Normal.from_params([10, 3])
>>> model.entropy()
2.588783247593625

ff(x)

The cumulative distribution function, or failure function, for a distribution using the parameters found in the .params attribute.

Parameters:	x (array like or scalar) – The values of the random variables at which the failure function (CDF) will be calculated
Returns:	ff – The scalar value of the CDF of the distribution if a scalar was passed. If an array like object was passed then a numpy array is returned with the value of the CDF at each corresponding value in the input array.
Return type:	scalar or numpy array

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> model.ff(2)
0.007968085162939342
>>> model.ff([1, 2, 3, 4, 5])
array([0.0009995 , 0.00796809, 0.02663876, 0.061995  , 0.1175031 ])
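The failure function is the complement of the survival function, ff(x) = 1 - sf(x) = 1 - exp(-Hf(x)). A sketch for the Weibull example, with hypothetical helper names:

```python
import math

def weibull_Hf(x, alpha, beta):
    """Cumulative hazard of a two-parameter Weibull."""
    return (x / alpha) ** beta

def weibull_ff(x, alpha, beta):
    # ff = 1 - sf and sf = exp(-Hf); expm1 keeps precision when Hf is small.
    return -math.expm1(-weibull_Hf(x, alpha, beta))

print([round(weibull_ff(x, 10, 3), 8) for x in [1, 2, 3, 4, 5]])
```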

get_plot_data(heuristic='Turnbull', alpha_ci=0.05)

Gathers the data needed to produce a probability plot of the model.

Parameters:	heuristic ({'Blom', 'Median', 'ECDF', 'Modal', 'Midpoint', 'Mean', 'Weibull', 'Benard', 'Beard', 'Hazen', 'Gringorten', 'None', 'Tukey', 'DPW', 'Fleming-Harrington', 'Kaplan-Meier', 'Nelson-Aalen', 'Filliben', 'Larsen', 'Turnbull'}, optional) – The method by which the plotting points on the probability plot will be calculated. Defaults to 'Turnbull'.
	alpha_ci (float, optional) – The level of significance at which the confidence bounds, where available, will be calculated. Defaults to 0.05 (i.e. 95% confidence).
Returns:	data – A dictionary containing the data needed to make a plot.
Return type:	dict

Examples

>>> from surpyval import Weibull
>>> x = Weibull.random(100, 10, 3)
>>> model = Weibull.fit(x)
>>> data = model.get_plot_data()
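Several of the heuristic options are classical plotting-position formulas for complete (uncensored) data, giving the estimated CDF value of the i-th smallest of n observations. The sketch below covers four of them under that assumption; it does not cover the censoring-aware options such as 'Kaplan-Meier', 'Nelson-Aalen' or 'Turnbull', and it is not surpyval's implementation. The names are hypothetical.

```python
# Plotting positions F_i for the i-th smallest of n complete observations,
# written in the general form (i - a) / (n + 1 - 2a).
PLOTTING_POSITION_A = {
    "Weibull": 0.0,   # i / (n + 1)
    "Benard": 0.3,    # (i - 0.3) / (n + 0.4)
    "Blom": 0.375,    # (i - 0.375) / (n + 0.25)
    "Hazen": 0.5,     # (i - 0.5) / n
}

def plotting_positions(x, heuristic="Benard"):
    """Return (sorted value, estimated CDF) pairs for complete data."""
    a = PLOTTING_POSITION_A[heuristic]
    n = len(x)
    return [(xi, (i - a) / (n + 1 - 2 * a)) for i, xi in enumerate(sorted(x), start=1)]

# Illustrative values only.
print(plotting_positions([12.1, 8.4, 10.3, 6.9], "Benard"))
```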

hf(x)

The instantaneous hazard function for a distribution using the parameters found in the .params attribute.

Parameters:	x (array like or scalar) – The values of the random variables at which the instantaneous hazard function will be calculated
Returns:	hf – The scalar value of the instantaneous hazard function of the distribution if a scalar was passed. If an array like object was passed then a numpy array is returned with the value of the instantaneous hazard function at each corresponding value in the input array.
Return type:	scalar or numpy array

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> model.hf(2)
0.012000000000000002
>>> model.hf([1, 2, 3, 4, 5])
array([0.003, 0.012, 0.027, 0.048, 0.075])
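The instantaneous hazard is the density divided by the survival function, hf(x) = df(x) / sf(x). For the Weibull this reduces to (beta/alpha)·(x/alpha)**(beta - 1). A sketch comparing both routes, with hypothetical helper names:

```python
import math

def weibull_sf(x, alpha, beta):
    return math.exp(-((x / alpha) ** beta))

def weibull_df(x, alpha, beta):
    z = x / alpha
    return (beta / alpha) * z ** (beta - 1) * math.exp(-(z ** beta))

def weibull_hf(x, alpha, beta):
    # Closed form of df / sf for the Weibull.
    return (beta / alpha) * (x / alpha) ** (beta - 1)

for x in [1, 2, 3, 4, 5]:
    print(x, weibull_hf(x, 10, 3), weibull_df(x, 10, 3) / weibull_sf(x, 10, 3))
```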

mean()

The mean of the distribution using the parameters found in the .params attribute.

Parameters:	None –
Returns:	mean – Returns the mean of the distribution.
Return type:	float

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> model.mean()
8.929795115692489
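For the Weibull the mean has the closed form alpha·Γ(1 + 1/beta). With alpha = 10 and beta = 3 this is 10·Γ(4/3), which reproduces the value above. A stdlib sketch:

```python
import math

def weibull_mean(alpha, beta):
    """Mean of a two-parameter Weibull: alpha * Gamma(1 + 1 / beta)."""
    return alpha * math.gamma(1 + 1 / beta)

print(weibull_mean(10, 3))
```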

moment(n)

The n-th raw moment, E[X^n], of the distribution using the parameters found in the .params attribute.

Parameters:	n (integer) – The order of the moment to be computed
Returns:	moment – The n-th moment of the distribution
Return type:	float

Examples

>>> from surpyval import Normal
>>> model = Normal.from_params([10, 3])
>>> model.moment(1)
10.0
>>> model.moment(5)
202150.0
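The example outputs are raw moments E[X^n] of a Normal with mu = 10 and sigma = 3. For a Normal these satisfy the recurrence E[X^n] = mu·E[X^(n-1)] + (n - 1)·sigma^2·E[X^(n-2)], starting from E[X^0] = 1 and E[X^1] = mu, which gives 202150 for n = 5. A sketch of the recurrence; it is not necessarily how surpyval computes moments.

```python
def normal_raw_moment(n, mu, sigma):
    """E[X^n] for X ~ Normal(mu, sigma), via the recurrence
    E[X^k] = mu * E[X^(k-1)] + (k - 1) * sigma**2 * E[X^(k-2)]."""
    prev, curr = 1.0, float(mu)  # E[X^0], E[X^1]
    if n == 0:
        return prev
    for k in range(2, n + 1):
        prev, curr = curr, mu * curr + (k - 1) * sigma ** 2 * prev
    return curr

print(normal_raw_moment(1, 10, 3), normal_raw_moment(5, 10, 3))
```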

neg_ll()

The negative log-likelihood for the model, if it was fitted with the fit() method. Not available if the model was created with the from_params() method.

Parameters:	None –
Returns:	neg_ll – The negative log-likelihood of the model
Return type:	float

Examples

>>> from surpyval import Weibull
>>> import numpy as np
>>> np.random.seed(1)
>>> x = Weibull.random(100, 10, 3)
>>> model = Weibull.fit(x)
>>> model.neg_ll()
262.52685642385734
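For complete (uncensored, untruncated) data, the negative log-likelihood is -Σ ln df(x_i). The sketch below assumes that simple case; surpyval's fits also account for censoring, truncation and counts, which this does not. The names are hypothetical.

```python
import math

def weibull_log_df(x, alpha, beta):
    """Log of the Weibull density, computed directly to avoid underflow."""
    z = x / alpha
    return math.log(beta / alpha) + (beta - 1) * math.log(z) - z ** beta

def weibull_neg_ll(data, alpha, beta):
    """Negative log-likelihood of complete (uncensored) Weibull data."""
    return -sum(weibull_log_df(x, alpha, beta) for x in data)

# With alpha = beta = 1 the Weibull is Exp(1), so -ln df(x) = x and the total is sum(data).
print(weibull_neg_ll([0.5, 1.0, 2.0], 1.0, 1.0))
```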

plot(heuristic='Turnbull', plot_bounds=True, alpha_ci=0.05)

Creates a probability plot of the model.

Parameters:	heuristic ({'Blom', 'Median', 'ECDF', 'Modal', 'Midpoint', 'Mean', 'Weibull', 'Benard', 'Beard', 'Hazen', 'Gringorten', 'None', 'Tukey', 'DPW', 'Fleming-Harrington', 'Kaplan-Meier', 'Nelson-Aalen', 'Filliben', 'Larsen', 'Turnbull'}, optional) – The method by which the plotting points on the probability plot will be calculated. Defaults to 'Turnbull'.
	plot_bounds (Boolean, optional) – Whether the confidence bounds should be calculated. Defaults to True.
	alpha_ci (float, optional) – The level of significance at which the confidence bounds, where available, will be calculated. Defaults to 0.05 (i.e. 95% confidence).
Returns:	plot – A list of matplotlib plot objects
Return type:	list

Examples

>>> from surpyval import Weibull
>>> x = Weibull.random(100, 10, 3)
>>> model = Weibull.fit(x)
>>> model.plot()

qf(p)

The quantile function for a distribution using the parameters found in the .params attribute.

Parameters:	p (array like or scalar) – The values, which must be between 0 and 1, at which the quantile will be calculated
Returns:	qf – The scalar value of the quantile of the distribution if a scalar was passed. If an array like object was passed then a numpy array is returned with the value of the quantile at each corresponding value in the input array.
Return type:	scalar or numpy array

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> model.qf(0.2)
6.06542793124108
>>> model.qf([.1, .2, .3, .4, .5])
array([4.72308719, 6.06542793, 7.09181722, 7.99387877, 8.84997045])
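The quantile function inverts the CDF. For the Weibull, solving p = 1 - exp(-(x/alpha)**beta) for x gives qf(p) = alpha·(-ln(1 - p))**(1/beta). A sketch with a round-trip check through the CDF; the helper names are hypothetical.

```python
import math

def weibull_qf(p, alpha, beta):
    """Inverse of the Weibull CDF; p must lie in [0, 1) (p = 1 maps to infinity)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return alpha * (-math.log1p(-p)) ** (1.0 / beta)

def weibull_ff(x, alpha, beta):
    return -math.expm1(-((x / alpha) ** beta))

q = weibull_qf(0.2, 10, 3)
print(q, weibull_ff(q, 10, 3))
```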

random(size)

Draws random samples from the distribution using the parameters found in the .params attribute.

Parameters:	size (int) – The number of random samples to be drawn from the distribution.
Returns:	random – Returns a numpy array of size `size` with random values drawn from the distribution.
Return type:	numpy array

Examples

>>> import numpy as np
>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> np.random.seed(1)
>>> model.random(1)
array([8.14127103])
>>> model.random(10)
array([10.84103403,  0.48542084,  7.11387062,  5.41420125,  4.59286657,
        5.90703589,  7.5124326 ,  7.96575225,  9.18134126,  8.16000438])
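One standard way to draw samples is inverse transform sampling: pass uniform draws on [0, 1) through the quantile function. The sketch below uses that technique with Python's random module; it is not claimed to be how surpyval samples, and it will not reproduce the numbers above. The names are hypothetical.

```python
import math
import random

def weibull_random(size, alpha, beta, rng=None):
    """Inverse transform sampling: x = qf(U) with U ~ Uniform[0, 1)."""
    rng = rng or random.Random()
    return [alpha * (-math.log1p(-rng.random())) ** (1.0 / beta) for _ in range(size)]

samples = weibull_random(10_000, 10, 3, rng=random.Random(1))
# Compare the sample mean with the distribution mean, 10 * Gamma(4/3).
print(sum(samples) / len(samples), 10 * math.gamma(4 / 3))
```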

sf(x)

Survival (or reliability) function for a distribution using the parameters found in the .params attribute.

Parameters:	x (array like or scalar) – The values of the random variables at which the survival function will be calculated
Returns:	sf – The scalar value of the survival function of the distribution if a scalar was passed. If an array like object was passed then a numpy array is returned with the value of the survival function at each corresponding value in the input array.
Return type:	scalar or numpy array

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 3])
>>> model.sf(2)
0.9920319148370607
>>> model.sf([1, 2, 3, 4, 5])
array([0.9990005 , 0.99203191, 0.97336124, 0.938005  , 0.8824969 ])

Parametric Fitter

class surpyval.parametric.parametric_fitter.ParametricFitter

Bases: object

fit(x=None, c=None, n=None, t=None, how='MLE', offset=False, zi=False, lfp=False, tl=None, tr=None, xl=None, xr=None, fixed=None, heuristic='Turnbull', init=[], rr='y', on_d_is_0=False, turnbull_estimator='Fleming-Harrington')

The central feature of SurPyval’s capability. This function aims to mimic the simplicity of the SciPy API: a single fit() call with as many or as few parameters as are needed.

Parameters:	x (array like, optional) – Array of observations of the random variables. If x is `None`, xl and xr must be provided.
	c (array like, optional) – Array of censoring flags: -1 is left censored, 0 is observed, 1 is right censored, and 2 is interval censored. If not provided, all values are assumed to be observed.
	n (array like, optional) – Array of counts for each x. Use this if the data are provided as counts. If `None`, each observation is assumed to have a count of 1.
	t (2D-array like, optional) – 2D array like of the left and right values at which the respective observation was truncated. If not provided, no truncation is assumed.
	how ({'MLE', 'MPP', 'MOM', 'MSE', 'MPS'}, optional) – Method used to estimate the parameters. Defaults to 'MLE'. The options are:
		MLE : Maximum Likelihood Estimation
		MPP : Method of Probability Plotting
		MOM : Method of Moments
		MSE : Mean Square Error
		MPS : Maximum Product Spacing
	offset (boolean, optional) – If `True`, finds the shifted distribution. If not provided, a non-shifted distribution is assumed. Only works with distributions that are supported on the half-real line.
	tl (array like or scalar, optional) – Values of left truncation for the observations. If a scalar, each observation is assumed to be left truncated at that value. If an array, it gives the respective ‘late entry’ of each observation.
	tr (array like or scalar, optional) – Values of right truncation for the observations. If a scalar, each observation is assumed to be right truncated at that value. If an array, it gives the respective right truncation value of each observation.
	xl (array like, optional) – Array like of the left values for 2-dimensional input of x. This is useful for data that are all interval censored. Must be used with the `xr` input.
	xr (array like, optional) – Array like of the right values for 2-dimensional input of x. This is useful for data that are all interval censored. Must be used with the `xl` input.
	fixed (dict, optional) – Dictionary of parameters and the values at which to fix them. Parameters are fixed by name.
	heuristic ({'Blom', 'Median', 'ECDF', 'Modal', 'Midpoint', 'Mean', 'Weibull', 'Benard', 'Beard', 'Hazen', 'Gringorten', 'None', 'Tukey', 'DPW', 'Fleming-Harrington', 'Kaplan-Meier', 'Nelson-Aalen', 'Filliben', 'Larsen', 'Turnbull'}, optional) – Plotting method to use if using the method of probability plotting, MPP. Defaults to 'Turnbull'.
	init (array like, optional) – Initial guess of the parameters. Useful if the fitting method is failing.
	rr (('y', 'x'), optional) – The dimension on which to minimise the spacing between the line and the observations. If ‘y’, the mean square error of the vertical distance between the line and each point is minimised. If ‘x’, the mean square error of the horizontal distance between the line and each point is minimised. Defaults to 'y'.
	on_d_is_0 (boolean, optional) – For the case when using MPP and the highest value is right censored, you can choose whether to include this value in the regression. If `False`, all values with 0 deaths are excluded from the regression. If `True`, all values are included in the regression, regardless of whether there is a death. Defaults to `False`.
	turnbull_estimator (('Nelson-Aalen', 'Kaplan-Meier', 'Fleming-Harrington'), str, optional) – If using the Turnbull heuristic, you can elect to use either the KM, NA, or FH estimator with the Turnbull estimates of r and d. Defaults to 'Fleming-Harrington'.
Returns:	model – A parametric model with the fitted parameters and methods for all functions of the distribution using the fitted parameters.
Return type:	Parametric

Examples

>>> from surpyval import Weibull
>>> import numpy as np
>>> x = Weibull.random(100, 10, 4)
>>> model = Weibull.fit(x)
>>> print(model)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 10.551521182640098
      beta: 3.792549834495306
>>> Weibull.fit(x, how='MPS', fixed={'alpha' : 10})
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MPS
Parameters          :
     alpha: 10.0
      beta: 3.4314657446866836
>>> Weibull.fit(xl=x-1, xr=x+1, how='MPP')
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MPP
Parameters          :
     alpha: 9.943092756713078
      beta: 8.613016934518258
>>> c = np.zeros_like(x)
>>> c[x > 13] = 1
>>> x[x > 13] = 13
>>> c = c[x > 6]
>>> x = x[x > 6]
>>> Weibull.fit(x=x, c=c, tl=6)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 10.363725328793413
      beta: 4.9886821457305865
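The default how='MLE' maximises the likelihood. For complete (uncensored, untruncated) Weibull data the likelihood equations reduce to a single equation in the shape beta, after which alpha has a closed form. The sketch below solves that equation by bisection. It is a textbook special case for illustration, not surpyval's fitting routine, which also handles censoring, truncation, offsets and the other how options; the function name is hypothetical.

```python
import math

def weibull_mle(data, lo=1e-3, hi=100.0, tol=1e-10):
    """MLE of (alpha, beta) for complete, positive Weibull data.

    beta solves  sum(x^b ln x) / sum(x^b) - 1/b - mean(ln x) = 0,
    whose left-hand side increases with b, so bisection applies;
    then alpha = mean(x^beta) ** (1 / beta).
    """
    m = max(data)
    y = [x / m for x in data]  # rescale to (0, 1] to avoid overflow in x ** b
    log_y = [math.log(v) for v in y]
    mean_log_y = sum(log_y) / len(y)

    def score(b):
        w = [v ** b for v in y]
        return sum(wi * li for wi, li in zip(w, log_y)) / sum(w) - 1.0 / b - mean_log_y

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid  # root lies below mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    alpha = m * (sum(v ** beta for v in y) / len(y)) ** (1.0 / beta)
    return alpha, beta

# Usage: evenly spaced quantiles of a Weibull(alpha=10, beta=3) as stand-in data.
n = 2000
data = [10 * (-math.log(1 - (i - 0.5) / n)) ** (1 / 3) for i in range(1, n + 1)]
print(weibull_mle(data))
```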

fit_from_df(df, x=None, c=None, n=None, xl=None, xr=None, tl=None, tr=None, **fit_options)

Fits the distribution to data held in a DataFrame. Each argument names the column of df that supplies the corresponding input of fit(), and any remaining options are passed on to fit().

Parameters:	df (DataFrame) – DataFrame of the data to be used to create the SurPyval model.
	x (string, optional) – Name of the column in df containing the variable data. If not provided, both xl and xr must be provided.
	c (string, optional) – Name of the column in df containing the censoring flags of x. If not provided, all values of x are assumed to be observed.
	n (string, optional) – Name of the column in df containing the counts of x. If not provided, each x is assumed to be one observation.
	tl (string or scalar, optional) – If a string, the name of the column in df containing the left truncation data. If a scalar, each x is assumed to be left truncated at that value. If not provided, x is assumed not to be left truncated.
	tr (string or scalar, optional) – If a string, the name of the column in df containing the right truncation data. If a scalar, each x is assumed to be right truncated at that value. If not provided, x is assumed not to be right truncated.
	xl (string, optional) – Name of the column in df containing the left bound for interval censored data. If the left bound is -Inf, the observation is assumed to be left censored. If xl[i] == xr[i], the observation is assumed to be observed. Cannot be provided with x; must be provided with xr.
	xr (string, optional) – Name of the column in df containing the right bound for interval censored data. If the right bound is Inf, the observation is assumed to be right censored. If xl[i] == xr[i], the observation is assumed to be observed. Cannot be provided with x; must be provided with xl.
	fit_options (keyword arguments, optional) – Fit options that will be passed to the `fit` method; see that method for the available options.
Returns:	model – A parametric model with the fitted parameters and methods for all functions of the distribution using the fitted parameters.
Return type:	Parametric

Examples

>>> import surpyval as surv
>>> df = surv.datasets.BoforsSteel.df
>>> model = surv.Weibull.fit_from_df(df, x='x', n='n', offset=True)
>>> print(model)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Offset (gamma)      : 39.76562962867477
Parameters          :
     alpha: 7.141925216146524
      beta: 2.6204524040137844

from_params(params, gamma=None, p=None, f0=None)

Creates a SurPyval Parametric model from the provided parameters.

Parameters:	params (array like) – Array of the parameters of the distribution.
	gamma (scalar, optional) – Offset value for the distribution. If not provided, a regular (unshifted, not offset) distribution is created.
Returns:	model – A parametric model with the given parameters and methods for all functions of the distribution using those parameters.
Return type:	Parametric

Examples

>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 4])
>>> print(model)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : given parameters
Parameters          :
     alpha: 10
      beta: 4
>>> model = Weibull.from_params([10, 4], gamma=2)
>>> print(model)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : given parameters
Offset (gamma)      : 2
Parameters          :
     alpha: 10
      beta: 4
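With an offset, the distribution is shifted to start at gamma, so each function of the unshifted model is evaluated at x - gamma. A sketch of the survival function of an offset Weibull, assuming this conventional three-parameter form; the helper is hypothetical.

```python
import math

def offset_weibull_sf(x, alpha, beta, gamma=0.0):
    """Survival function of a Weibull shifted to start at gamma."""
    t = x - gamma
    if t <= 0:
        return 1.0  # no failures can occur before the offset
    return math.exp(-((t / alpha) ** beta))

# alpha = 10, beta = 4, gamma = 2, as in the offset from_params example above:
# the shifted model at x = 12 matches the unshifted model at x = 10.
print(offset_weibull_sf(12, 10, 4, gamma=2), offset_weibull_sf(10, 10, 4))
```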

Parametric Mixture Model

class surpyval.parametric.mixture_model.MixtureModel(x, dist=<surpyval.parametric.weibull.Weibull_ object>, **kwargs)

Bases: object

Generalised from the algorithm described at https://www.sciencedirect.com/science/article/pii/S0307904X12002545