- class surpyval.univariate.parametric.distributions.beta4.Beta4_(name)
Bases: ParametricFitter
The four-parameter (generalised) Beta distribution.
The standard Beta distribution is supported on [0, 1]. The four-parameter Beta generalises it to an arbitrary finite interval [a, b] by introducing a location parameter a (the lower bound) and a scale that stretches the unit interval out to an upper bound b. If Y is a standard Beta random variable then X = a + (b - a) Y is four-parameter Beta distributed.
Because the support [a, b] is itself estimated, this is the distribution to reach for when data are bounded on both sides but the bounds are not 0 and 1. That is the case where Beta(..., offset=True) would (deliberately) refuse, since a single offset only shifts the support and cannot also stretch the unit interval to an arbitrary upper bound.
- Hf(x, alpha, beta, a, b)
Cumulative hazard rate for the four-parameter Beta distribution.
\[H(x) = -\ln\left(R(x)\right)\]
- Parameters
x (numpy array or scalar) – The values at which the function will be calculated
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
Hf – The value(s) of the cumulative hazard rate at x.
- Return type
scalar or numpy array
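The relationship between the cumulative hazard and the survival function can be checked by hand. The following is a minimal sketch using only the Python standard library; `cumulative_hazard` is a hypothetical helper and not part of SurPyval, and the survival values are those listed in the sf example for alpha=3, beta=4, a=2, b=3.

```python
import math


def cumulative_hazard(reliability):
    """Return H = -ln(R) for a survival probability R in (0, 1]."""
    return -math.log(reliability)


# Survival values of the four-parameter Beta (3, 4, 2, 3) at x = 2.1, ..., 2.5
survival = [0.98415, 0.90112, 0.74431, 0.54432, 0.34375]
H = [cumulative_hazard(r) for r in survival]
```

Because R(x) falls to zero at the upper bound b, H(x) grows without limit as x approaches b.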
- cs(x, X, alpha, beta, a, b)
Conditional survival (or reliability) function for the four-parameter Beta distribution:
\[R(x, X) = \frac{R(x + X)}{R(X)}\]
- Parameters
x (numpy array or scalar) – The values at which the function will be calculated
X (numpy array or scalar) – The value(s) at which each value(s) in x was known to have survived
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
cs – The value(s) of the conditional survival function at x.
- Return type
scalar or numpy array
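To make the ratio concrete, the sketch below works through the simplest special case, alpha = beta = 1, where the four-parameter Beta reduces to the uniform distribution on [a, b] and the survival function has a closed form. The helper names are hypothetical and this is not SurPyval's implementation.

```python
def uniform_sf(x, a, b):
    """Survival function for alpha = beta = 1, i.e. uniform on [a, b]."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def conditional_survival(x, X, a, b):
    """R(x, X) = R(x + X) / R(X); requires X < b so that R(X) > 0."""
    return uniform_sf(x + X, a, b) / uniform_sf(X, a, b)


# Known to have survived to X = 2.4 on [2, 3]; chance of lasting a further 0.1
cs = conditional_survival(0.1, 2.4, 2.0, 3.0)
```

Here R(2.5) = 0.5 and R(2.4) = 0.6, so the conditional survival is 5/6.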
- df(x, alpha, beta, a, b)
Density function for the four-parameter Beta distribution:
\[f(x) = \frac{\left(x - a\right)^{\alpha - 1} \left(b - x\right)^{\beta - 1}}{B\left(\alpha, \beta\right) \left(b - a\right)^{\alpha + \beta - 1}}\]
- Parameters
x (numpy array or scalar) – The values at which the function will be calculated
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
df – The value(s) of the density function at x.
- Return type
scalar or numpy array
Examples
>>> import numpy as np
>>> from surpyval import Beta4
>>> x = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
>>> Beta4.df(x, 3, 4, 2, 3)
array([0.4374, 1.2288, 1.8522, 2.0736, 1.875 ])
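The density formula can be evaluated directly with the standard library, which is a convenient way to sanity-check values. This is a sketch under the assumption that both shape parameters are at least 1 (so the density is finite at the bounds); `beta_function` and `beta4_pdf` are hypothetical names, not SurPyval functions.

```python
import math


def beta_function(alpha, beta):
    """Euler Beta function B(alpha, beta), computed through log-gamma."""
    return math.exp(
        math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    )


def beta4_pdf(x, alpha, beta, a, b):
    """Density of the four-parameter Beta; zero outside [a, b]."""
    if x < a or x > b:
        return 0.0
    numerator = (x - a) ** (alpha - 1) * (b - x) ** (beta - 1)
    return numerator / (beta_function(alpha, beta) * (b - a) ** (alpha + beta - 1))


density = [beta4_pdf(x, 3, 4, 2, 3) for x in (2.1, 2.2, 2.3, 2.4, 2.5)]
```

With alpha=3 and beta=4, B(alpha, beta) = 1/60, so the density on [2, 3] is 60 z^2 (1 - z)^3 with z = x - 2.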
- entropy(alpha, beta, a, b)
Differential entropy of the four-parameter Beta distribution.
Equal to the standard Beta entropy plus \(\ln(b - a)\) for the change of scale.
- Parameters
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
entropy – The entropy(ies) of the Beta distribution
- Return type
scalar or numpy array
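The \(\ln(b - a)\) term can be confirmed numerically. The sketch below approximates the entropy integral with a midpoint rule for the standard Beta and for the same shapes stretched onto [2, 5]; the function names and the grid size are illustrative choices, not anything SurPyval defines.

```python
import math


def std_beta_pdf(z, alpha, beta):
    """Standard Beta density at z in (0, 1), evaluated in log space."""
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(
        (alpha - 1) * math.log(z) + (beta - 1) * math.log(1 - z) - log_b
    )


def entropy_midpoint(pdf, lo, hi, n=20000):
    """Approximate -integral of f ln f over [lo, hi] with the midpoint rule."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        f = pdf(lo + (i + 0.5) * step)
        if f > 0.0:
            total -= f * math.log(f) * step
    return total


alpha, beta, a, b = 3, 4, 2.0, 5.0
h_standard = entropy_midpoint(lambda z: std_beta_pdf(z, alpha, beta), 0.0, 1.0)
h_beta4 = entropy_midpoint(
    lambda x: std_beta_pdf((x - a) / (b - a), alpha, beta) / (b - a), a, b
)
shift = h_beta4 - h_standard  # expected to be close to ln(b - a) = ln(3)
```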
- ff(x, alpha, beta, a, b)
Failure (CDF or unreliability) function for the four-parameter Beta distribution:
\[F(x) = I_{z}\left(\alpha, \beta\right), \quad z = \frac{x - a}{b - a}\]
- Parameters
x (numpy array or scalar) – The values at which the function will be calculated
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
ff – The value(s) of the failure function at x.
- Return type
scalar or numpy array
Examples
>>> import numpy as np
>>> from surpyval import Beta4
>>> x = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
>>> Beta4.ff(x, 3, 4, 2, 3)
array([0.01585, 0.09888, 0.25569, 0.45568, 0.65625])
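For positive integer shape parameters the regularised incomplete Beta function reduces to a binomial tail sum, which gives a dependency-free way to check the CDF. The sketch below relies on that identity and therefore only covers integer alpha and beta; the function names are hypothetical.

```python
from math import comb


def reg_inc_beta_int(z, alpha, beta):
    """I_z(alpha, beta) for positive integer alpha and beta.

    Uses I_z(alpha, beta) = sum_{j=alpha}^{n} C(n, j) z^j (1 - z)^(n - j)
    with n = alpha + beta - 1.
    """
    n = alpha + beta - 1
    return sum(
        comb(n, j) * z**j * (1 - z) ** (n - j) for j in range(alpha, n + 1)
    )


def beta4_cdf(x, alpha, beta, a, b):
    """F(x) = I_z(alpha, beta) with z = (x - a) / (b - a) clipped to [0, 1]."""
    z = min(max((x - a) / (b - a), 0.0), 1.0)
    return reg_inc_beta_int(z, alpha, beta)


cdf = [beta4_cdf(x, 3, 4, 2, 3) for x in (2.1, 2.2, 2.3, 2.4, 2.5)]
```

For non-integer shapes a general incomplete Beta routine would be needed instead of the binomial sum.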
- fit(x: ArrayLike | None = None, c: ArrayLike | None = None, n: ArrayLike | None = None, t: ArrayLike | None = None, how: str = 'MLE', offset: bool = False, zi: bool = False, lfp: bool = False, tl: ArrayLike | numbers.Number | None = None, tr: ArrayLike | numbers.Number | None = None, xl: ArrayLike | None = None, xr: ArrayLike | None = None, fixed: dict[str, float] | None = None, heuristic: str = 'Nelson-Aalen', init: ArrayLike = [], rr: str = 'y', on_d_is_0: bool = False, turnbull_estimator: str = 'Fleming-Harrington') Parametric
The central feature of SurPyval’s capability. This function aims to mimic the simplicity of the scipy API: a single fit() call, with as many or as few parameters as needed.
- Parameters
x (array like, optional) – Array of observations of the random variable. If x is None, xl and xr must be provided.
c (array like, optional) – Array of censoring flags. -1 is left censored, 0 is observed, 1 is right censored, and 2 is interval censored. If not provided, all values are assumed to be observed.
n (array like, optional) – Array of counts for each x. Use this when the data is provided as counts. If None, each observation is assumed to have a count of 1.
t (2D-array like, optional) – 2D array of the left and right values at which the respective observation was truncated. If not provided, no truncation is assumed.
how ({'MLE', 'MPP', 'MOM', 'MSE', 'MPS'}, optional) –
Method to estimate parameters, these are:
MLE, Maximum Likelihood Estimation
MPP, Method of Probability Plotting
MOM, Method of Moments
MSE, Mean Square Error
MPS, Maximum Product Spacing
offset (boolean, optional) – If True, finds the shifted distribution. If not provided, assumes the distribution is not shifted. Only works with distributions that are supported on the half-real line.
tl (array like or scalar, optional) – Values of left truncation for observations. If a scalar, each observation is assumed to be left truncated at that value. If an array, it is the respective ‘late entry’ of each observation.
tr (array like or scalar, optional) – Values of right truncation for observations. If a scalar, each observation is assumed to be right truncated at that value. If an array, it is the respective right truncation value for each observation.
xl (array like, optional) – Array like of the left array for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the xr input.
xr (array like, optional) – Array like of the right array for 2-dimensional input of x. This is useful for data that is all interval censored. Must be used with the xl input.
fixed (dict, optional) – Dictionary of parameters and their values to fix. Fixes parameter by name.
heuristic ({"Blom", "Median", "ECDF", "Modal", "Midpoint", "Mean", "Weibull", "Benard", "Beard", "Hazen", "Gringorten", "None", "Tukey", "DPW", "Fleming-Harrington", "Kaplan-Meier", "Nelson-Aalen", "Filliben", "Larsen", "Turnbull"}, str, optional.) – Plotting method to use, if using the probability plotting, MPP, method.
init (array like, optional) – initial guess of parameters. Instead of finding an initial guess for the optimization you can provide one. Can be useful to see if optimization is failing due to poor initial guess.
rr ({'y', 'x'}, str, optional) – The dimension on which to minimise the spacing between the line and the observation. If ‘y’ the mean square error between the line and vertical distance to each point is minimised. If ‘x’ the mean square error between the line and horizontal distance to each point is minimised.
on_d_is_0 (boolean, optional) – For the case when using MPP and the highest value is right censored, you can choose whether to include this value in the regression analysis. If False, all values where there are 0 deaths are excluded from the regression. If True, all values are included in the regression regardless of whether there is a death.
turnbull_estimator ({'Nelson-Aalen', 'Kaplan-Meier', 'Fleming-Harrington'}, str, optional) – If using the Turnbull heuristic, you can elect to use the KM, NA, or FH estimator with the Turnbull estimates of r and d. Defaults to FH.
- Returns
A parametric model with the fitted parameters and methods for all functions of the distribution using the fitted parameters.
- Return type
Parametric
Examples
>>> from surpyval import Weibull
>>> import numpy as np
>>> x = Weibull.random(100, 10, 4)
>>> model = Weibull.fit(x)
>>> print(model)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 10.551521182640098
      beta: 3.792549834495306
>>> Weibull.fit(x, how='MPS', fixed={'alpha' : 10})
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MPS
Parameters          :
     alpha: 10.0
      beta: 3.4314657446866836
>>> Weibull.fit(xl=x-1, xr=x+1, how='MPP')
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MPP
Parameters          :
     alpha: 9.943092756713078
      beta: 8.613016934518258
>>> c = np.zeros_like(x)
>>> c[x > 13] = 1
>>> x[x > 13] = 13
>>> c = c[x > 6]
>>> x = x[x > 6]
>>> Weibull.fit(x=x, c=c, tl=6)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 10.363725328793413
      beta: 4.9886821457305865
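As a rough illustration of what a method-of-moments estimate looks like for this distribution when both bounds are already known, the sketch below matches the mean and variance of a Beta on a given [a, b]. It is a textbook moment match written for this page, not SurPyval's MOM implementation, and `beta_shapes_from_moments` is a hypothetical helper.

```python
def beta_shapes_from_moments(mean, variance, a, b):
    """Match the mean and variance of a Beta on a known interval [a, b]."""
    width = b - a
    m = (mean - a) / width       # mean rescaled to the unit interval
    v = variance / width**2      # variance rescaled to the unit interval
    if not 0.0 < m < 1.0 or not 0.0 < v < m * (1.0 - m):
        raise ValueError("moments are not attainable by a Beta on [a, b]")
    common = m * (1.0 - m) / v - 1.0  # equals alpha + beta
    return m * common, (1.0 - m) * common


# Exact moments for alpha=3, beta=4 on [2, 3]: mean 2 + 3/7, variance 12/392
alpha_hat, beta_hat = beta_shapes_from_moments(2 + 3 / 7, 12 / 392, 2.0, 3.0)
```

Feeding in the exact moments recovers the original shapes, alpha = 3 and beta = 4. When the bounds are unknown, two moments are not enough and the bounds have to be estimated as well.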
- fit_from_df(df: DataFrame, x: str | None = None, c: str | None = None, n: str | None = None, xl: str | None = None, xr: str | None = None, tl: str | float | None = None, tr: str | float | None = None, **fit_options) Parametric
The central feature of SurPyval’s capability. This function aims to mimic the simplicity of the scipy API: a single fit() call, with as many or as few parameters as needed.
- Parameters
df (DataFrame) – DataFrame of data to be used to create surpyval model
x (string, optional) – column name for the column in df containing the variable data. If not provided must provide both xl and xr.
c (string, optional) – column name for the column in df containing the censor flag of x. If not provided assumes all values of x are observed.
n (string, optional) – column name for the column in df containing the counts of x. If not provided assumes each x is one observation.
tl (string or scalar, optional) – If string, column name for the column in df containing the left truncation data. If scalar assumes each x is left truncated by that value. If not provided assumes x is not left truncated.
tr (string or scalar, optional) – If string, column name for the column in df containing the right truncation data. If scalar assumes each x is right truncated by that value. If not provided assumes x is not right truncated.
xl (string, optional) – column name for the column in df containing the left interval for interval censored data. If left interval is -Inf, assumes left censored. If xl[i] == xr[i] assumes observed. Cannot be provided with x, must be provided with xr.
xr (string, optional) – column name for the column in df containing the right interval for interval censored data. If right interval is Inf, assumes right censored. If xl[i] == xr[i] assumes observed. Cannot be provided with x, must be provided with xl.
fit_options (dict, optional) – dictionary of fit options that will be passed to the fit method, see that method for options.
- Returns
A parametric model with the fitted parameters and methods for all functions of the distribution using the fitted parameters.
- Return type
Parametric
Examples
>>> import surpyval as surv
>>> from surpyval.datasets import load_bofors_steel
>>> df = load_bofors_steel()
>>> model = surv.Weibull.fit_from_df(df, x='x', n='n', offset=True)
>>> print(model)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Offset (gamma)      : 39.76562962867477
Parameters          :
     alpha: 7.141925216146524
      beta: 2.6204524040137844
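The xl and xr conventions described above can be summarised as a mapping onto the censoring flags used by fit (-1 left, 0 observed, 1 right, 2 interval). The sketch below is an illustration of that mapping only; `censor_flags` is a hypothetical helper and the library may handle this differently internally.

```python
import math


def censor_flags(xl, xr):
    """Map interval bounds to flags: -1 left, 0 observed, 1 right, 2 interval."""
    flags = []
    for lo, hi in zip(xl, xr):
        if lo == hi:
            flags.append(0)    # both bounds equal: an exact observation
        elif lo == -math.inf:
            flags.append(-1)   # only an upper bound is known
        elif hi == math.inf:
            flags.append(1)    # only a lower bound is known
        else:
            flags.append(2)    # the value lies somewhere between lo and hi
    return flags


flags = censor_flags([2.3, -math.inf, 2.6, 2.1], [2.3, 2.2, math.inf, 2.4])
```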
- fit_from_surpyval_data(surv_data: SurpyvalData, how: str = 'MLE', offset: bool = False, zi: bool = False, lfp: bool = False, fixed: dict[str, float] | None = None, heuristic: str = 'Nelson-Aalen', init: ArrayLike = [], rr: str = 'y', on_d_is_0: bool = False, turnbull_estimator: str = 'Fleming-Harrington') Parametric
The central feature of SurPyval’s capability. This function aims to mimic the simplicity of the scipy API: a single fit() call, with as many or as few parameters as needed.
- Parameters
surv_data (SurpyvalData) – Survival data in the SurpyvalData class.
For other input options see the fit method.
- Returns
A parametric model with the fitted parameters and methods for all functions of the distribution using the fitted parameters.
- Return type
Parametric
- from_params(params, gamma=None, p=None, f0=None)
Creating a SurPyval Parametric class with provided parameters.
- Parameters
params (array like) – array of the parameters of the distribution.
gamma (scalar, optional) – offset value for the distribution. If not provided will create a regular, unshifted (not offset) distribution.
p (scalar, optional) – The proportion of the population that will eventually die or fail; the remaining 1 - p never does. If used it must be a value between 0 and 1. If None will assume 1, i.e. the whole population will eventually die or fail.
f0 (scalar, optional) – The proportion of the population that will die or fail at time 0. If used it must be a value between 0 and 1. If None will assume 0, i.e. no proportion of the population will die or fail at time 0.
- Returns
A parametric model with the parameters provided.
- Return type
Parametric
Examples
>>> from surpyval import Weibull
>>> model = Weibull.from_params([10, 4])
>>> print(model)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : given parameters
Parameters          :
     alpha: 10
      beta: 4
>>> model = Weibull.from_params([10, 4], gamma=2)
>>> print(model)
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : given parameters
Offset (gamma)      : 2
Parameters          :
     alpha: 10
      beta: 4
- hf(x, alpha, beta, a, b)
Instantaneous hazard rate for the four-parameter Beta distribution.
\[h(x) = \frac{f(x)}{R(x)}\]
- Parameters
x (numpy array or scalar) – The values at which the function will be calculated
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
hf – The value(s) of the instantaneous hazard rate at x.
- Return type
scalar or numpy array
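The hazard rate follows directly from the density and the survival function. The sketch below reuses the values listed in the df and sf examples for alpha=3, beta=4, a=2, b=3; `hazard_rate` is a hypothetical helper, not a SurPyval function.

```python
def hazard_rate(density, reliability):
    """h = f / R; undefined where the reliability has dropped to zero."""
    if reliability <= 0.0:
        raise ValueError("hazard rate is undefined where R(x) = 0")
    return density / reliability


# f(x) and R(x) at x = 2.1, 2.2, 2.3, 2.4, 2.5
density = [0.4374, 1.2288, 1.8522, 2.0736, 1.875]
survival = [0.98415, 0.90112, 0.74431, 0.54432, 0.34375]
hazard = [hazard_rate(f, r) for f, r in zip(density, survival)]
```

At x = 2.5 this gives 1.875 / 0.34375 = 60/11. Because the support ends at b, the hazard keeps increasing as x approaches the upper bound.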
- mean(alpha, beta, a, b)
Mean of the four-parameter Beta distribution.
\[E = a + \left(b - a\right)\frac{\alpha}{\alpha + \beta}\]
- Parameters
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
mean – The mean(s) of the Beta distribution
- Return type
scalar or numpy array
Examples
>>> from surpyval import Beta4
>>> Beta4.mean(3, 4, 2, 3)
2.4285714285714284
- moment(m, alpha, beta, a, b)
m-th (non-central) moment of the four-parameter Beta distribution.
Computed from the standard Beta moments via the binomial expansion of \(\left(a + (b - a) U\right)^m\).
- Parameters
m (integer) – The ordinal of the moment to calculate
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
moment – The moment(s) of the Beta distribution
- Return type
scalar or numpy array
Examples
>>> from surpyval import Beta4
>>> Beta4.moment(1, 3, 4, 2, 3)
2.4285714285714284
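The binomial expansion mentioned above can be written out in a few lines. This sketch uses the standard result that the k-th raw moment of a standard Beta variable U is the product of (alpha + r) / (alpha + beta + r) for r = 0, ..., k - 1; the function names are hypothetical and this is not necessarily how SurPyval computes the moment.

```python
from math import comb


def std_beta_raw_moment(k, alpha, beta):
    """E[U^k] for U ~ Beta(alpha, beta)."""
    result = 1.0
    for r in range(k):
        result *= (alpha + r) / (alpha + beta + r)
    return result


def beta4_raw_moment(m, alpha, beta, a, b):
    """E[(a + (b - a) U)^m], expanded with the binomial theorem."""
    width = b - a
    return sum(
        comb(m, k) * a ** (m - k) * width**k * std_beta_raw_moment(k, alpha, beta)
        for k in range(m + 1)
    )


first = beta4_raw_moment(1, 3, 4, 2, 3)
second = beta4_raw_moment(2, 3, 4, 2, 3)
variance = second - first**2
```

The first moment reproduces the mean, 17/7, and the second moment yields a variance of 3/98.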
- qf(p, alpha, beta, a, b)
Quantile function for the four-parameter Beta distribution:
\[q(p) = a + \left(b - a\right) I^{-1}_{p}\left(\alpha, \beta\right)\]
- Parameters
p (numpy array or scalar) – The probabilities at which the quantile will be calculated
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
q – The quantiles for the Beta distribution at each value p.
- Return type
scalar or numpy array
Examples
>>> import numpy as np
>>> from surpyval import Beta4
>>> p = np.array([.1, .2, .3, .4, .5])
>>> Beta4.qf(p, 3, 4, 2, 3)
array([2.20090888, 2.26864915, 2.32332388, 2.37307973, 2.42140719])
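The quantile is the inverse of the CDF on the unit interval, mapped onto [a, b]. The sketch below inverts the CDF by bisection, which is a simple stand-in for a proper inverse incomplete Beta routine; it uses the binomial-tail identity for the CDF and so only covers positive integer shapes. All names are hypothetical.

```python
from math import comb


def std_beta_cdf_int(z, alpha, beta):
    """I_z(alpha, beta) for positive integer shapes (binomial tail sum)."""
    n = alpha + beta - 1
    return sum(
        comb(n, j) * z**j * (1 - z) ** (n - j) for j in range(alpha, n + 1)
    )


def beta4_quantile(p, alpha, beta, a, b, tol=1e-12):
    """Solve I_z(alpha, beta) = p for z by bisection, then map z onto [a, b]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if std_beta_cdf_int(mid, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return a + (b - a) * 0.5 * (lo + hi)


median = beta4_quantile(0.5, 3, 4, 2, 3)
```

Bisection works here because the CDF is monotonically increasing in z; it is slower than a dedicated inverse but needs no external libraries.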
- random(size, *params)
Draws random samples from the distribution in shape size, using the inverse transform method with the distribution’s quantile function.
- Parameters
size (integer or tuple of positive integers) – Shape or size of the random draw
params (numpy array or scalar) – The parameters of the distribution
- Returns
random – Random values drawn from the distribution in shape size
- Return type
scalar or numpy array
Examples
>>> import numpy as np
>>> from surpyval import Weibull
>>> np.random.seed(1)
>>> Weibull.random(5, 3, 4)
array([2.57122697, 3.18730986, 0.31024877, 2.32381059, 1.89352939])
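For the four-parameter Beta specifically, samples can also be produced by stretching standard Beta draws, X = a + (b - a) Y. The sketch below does this with the standard library's `random.betavariate` rather than the inverse transform through the quantile function described above, so it illustrates the location-scale relationship and is not SurPyval's sampling code. `beta4_sample` is a hypothetical helper.

```python
import random


def beta4_sample(n, alpha, beta, a, b, seed=None):
    """Draw n values as a + (b - a) * Y with Y ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    return [a + (b - a) * rng.betavariate(alpha, beta) for _ in range(n)]


draws = beta4_sample(20000, 3, 4, 2.0, 3.0, seed=1)
sample_mean = sum(draws) / len(draws)  # theoretical mean is 2 + 3/7
```

Every draw lies inside [a, b], and the sample mean settles near the theoretical mean as the sample grows.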
- sf(x, alpha, beta, a, b)
Survival (or reliability) function for the four-parameter Beta distribution:
\[R(x) = 1 - I_{z}\left(\alpha, \beta\right), \quad z = \frac{x - a}{b - a}\]
- Parameters
x (numpy array or scalar) – The values at which the function will be calculated
alpha (numpy array or scalar) – The first shape parameter for the Beta distribution
beta (numpy array or scalar) – The second shape parameter for the Beta distribution
a (numpy array or scalar) – The lower bound of the support
b (numpy array or scalar) – The upper bound of the support
- Returns
sf – The value(s) of the reliability function at x.
- Return type
scalar or numpy array
Examples
>>> import numpy as np
>>> from surpyval import Beta4
>>> x = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
>>> Beta4.sf(x, 3, 4, 2, 3)
array([0.98415, 0.90112, 0.74431, 0.54432, 0.34375])