Parametric Regression Models

SurPyval provides three families of parametric regression models (Proportional Hazards, Accelerated Failure Time, and Proportional Odds), plus the Accelerated Life family for physics-motivated stress relationships. Each family is available as:

Pre-built instances — ready to use with any number of covariates
Factory functions — compose any surpyval distribution on the fly
Low-level fitter classes — for custom phi functions or life models

All parametric regression models use \(\phi(Z) = e^{\beta'Z}\) (log-linear covariates) except Accelerated Life models, which use domain-specific stress functions.

Proportional Hazards (PH)

\[h(x \mid Z) = h_0(x) \cdot \phi(Z)\]
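The relationship above can be checked numerically without fitting anything. The following is a minimal sketch in plain Python, assuming a Weibull baseline with illustrative (not fitted) parameters; the helper names `weibull_hazard`, `weibull_sf`, `phi`, `ph_hazard` and `ph_sf` are hypothetical and are not part of SurPyval.

```python
import math

def weibull_hazard(x, alpha, beta_shape):
    """Baseline Weibull hazard h_0(x)."""
    return (beta_shape / alpha) * (x / alpha) ** (beta_shape - 1)

def weibull_sf(x, alpha, beta_shape):
    """Baseline Weibull survival function S_0(x)."""
    return math.exp(-((x / alpha) ** beta_shape))

def phi(Z, coefs):
    """Log-linear covariate term exp(beta'Z)."""
    return math.exp(sum(b * z for b, z in zip(coefs, Z)))

def ph_hazard(x, Z, alpha, beta_shape, coefs):
    """h(x | Z) = h_0(x) * phi(Z)."""
    return weibull_hazard(x, alpha, beta_shape) * phi(Z, coefs)

def ph_sf(x, Z, alpha, beta_shape, coefs):
    """Integrating the hazard gives H(x | Z) = H_0(x) * phi(Z),
    so S(x | Z) = S_0(x) ** phi(Z)."""
    return weibull_sf(x, alpha, beta_shape) ** phi(Z, coefs)

# Illustrative values only (assumptions, not fitted results).
alpha, beta_shape = 10.0, 2.0
coefs = [0.5, -0.2]
Z_exposed = [1.0, 0.0]
Z_reference = [0.0, 0.0]

# The hazard ratio between two covariate vectors does not depend on x.
ratio_early = (ph_hazard(3.0, Z_exposed, alpha, beta_shape, coefs)
               / ph_hazard(3.0, Z_reference, alpha, beta_shape, coefs))
ratio_late = (ph_hazard(7.0, Z_exposed, alpha, beta_shape, coefs)
              / ph_hazard(7.0, Z_reference, alpha, beta_shape, coefs))
print(ratio_early, ratio_late)
```

Both ratios equal exp(0.5) up to floating-point error, which is the "proportional" part of the model: covariates scale the hazard by a constant factor at every time.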

Pre-built instances: ExponentialPH, NormalPH, WeibullPH, GumbelPH, LogisticPH, LogNormalPH, GammaPH.

Factory:

from surpyval import LogNormal
from surpyval import PH
model = PH(LogNormal).fit(x, Z=Z, c=c)

class surpyval.univariate.regression.proportional_hazards.proportional_hazards_fitter.ProportionalHazardsFitter(name, dist, phi, phi_name, phi_bounds, phi_param_map, phi_init=None)

Bases: DataFrameRegressionMixin

static create(distribution)

Create a Proportional Hazards fitter for the given distribution using exp(beta'Z) as the hazard multiplier.

Parameters: distribution (ParametricFitter) – A surpyval parametric distribution (e.g. Weibull, Exponential).
Returns: A configured fitter with a .fit(x, Z, ...) method.
Return type: ProportionalHazardsFitter

Fit the proportional hazards model to the data.

Parameters

x (array_like) – The observed event times.
Z (array_like) – The covariates to fit the model to.
c (array_like, optional) – The censoring indicators.
n (array_like, optional) – The number of observations at each time.
t (array_like, optional) – The truncation values (left and right) for each observation.
init (array_like, optional) – The initial values for the parameters.
fixed (dict, optional) – A dictionary of parameters to fix to a specific value.

Returns

The fitted model.

Return type

ParametricRegressionModel

Examples

>>> from surpyval import WeibullPH
>>> from surpyval.datasets import load_tires_data
>>> from autograd import numpy as anp
>>> import numpy as np
>>>
>>> data = load_tires_data()
>>>
>>> x = data['Survival'].values
>>> c = data['Censoring'].values
>>> Z = data[[
...     'Wedge gauge', 'Interbelt gauge', 'Peel force',
...     'Wedge gauge×peel force'
... ]].values
>>> model = WeibullPH.fit(x=x, Z=Z, c=c)
>>> model
Parametric Regression SurPyval Model
====================================
Kind                : Proportional Hazard
Distribution        : Weibull
Regression Model    : Log Linear [e^(beta'Z)]
Fitted by           : MLE
Distribution        :
    alpha: 0.24255054642143947
    beta: 16.057791674515805
Regression Model    :
    beta_0: -9.165062641226692
    beta_1: -7.998599877425742
    beta_2: -27.503283340963034
    beta_3: 18.38550143851751
>>> model = WeibullPH.fit(x=x, Z=Z, c=c, fixed={"beta": 15})
>>> model
Parametric Regression SurPyval Model
====================================
Kind                : Proportional Hazard
Distribution        : Weibull
Regression Model    : Log Linear [e^(beta'Z)]
Fitted by           : MLE
Distribution        :
    alpha: 0.23772915681951018
    beta: 15.0
Regression Model    :
    beta_0: -8.628333861229965
    beta_1: -7.617541980158942
    beta_2: -25.952407717383302
    beta_3: 17.270173771235655

Fit the regression model using a pandas DataFrame as the input.

The names of the covariates are retained on the fitted model so that a DataFrame can later be passed to the prediction methods (sf, ff, df, hf, Hf, random) and the correct columns will be selected automatically.

Parameters

df (pandas.DataFrame) – The dataframe containing the data.
x_col (str) – The column name of the observed times.
Z_cols (str or list of str, optional) – The column name(s) of the covariates. Mutually exclusive with formula.
c_col (str, optional) – The column name of the censoring indicator.
n_col (str, optional) – The column name of the number of observations at each time.
tl_col (str, optional) – The column name of the left truncation values.
tr_col (str, optional) – The column name of the right truncation values.
formula (str, optional) – A formulaic formula describing the covariates, e.g. "age + sex". Mutually exclusive with Z_cols.
init (array_like, optional) – The initial values for the parameters.
fixed (dict, optional) – A dictionary of parameters to fix to a specific value.

Returns

The fitted model, with feature_names (and formula) set.

Return type

ParametricRegressionModel

Examples

>>> from surpyval import WeibullPH
>>> model = WeibullPH.fit_from_df(
...     df, x_col="time", Z_cols=["age", "weight"], c_col="censored"
... )
>>> model.sf([10, 20], df[["age", "weight"]])

Accelerated Failure Time (AFT)

\[H(x \mid Z) = H_0\!\left(e^{\beta'Z} \cdot x\right)\]

Pre-built instances: ExponentialAFT, NormalAFT, WeibullAFT, GumbelAFT, LogisticAFT, LogNormalAFT, GammaAFT.

Factory:

from surpyval import Gamma
from surpyval import AFT
model = AFT(Gamma).fit(x, Z=Z, c=c)

class surpyval.univariate.regression.accelerated_failure_time.aft_fitter.AFTFitter(distribution)

Bases: DataFrameRegressionMixin

Accelerated Failure Time fitter using exp(beta'Z) as the acceleration factor.

The cumulative hazard is:

    H(x | Z) = H_0(exp(beta'Z) * x)

A positive beta coefficient means higher covariate values accelerate failure (shorter life), consistent with the PH sign convention.
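The sign convention can be illustrated with a small numerical sketch. It assumes a Weibull baseline with made-up parameters, and the helper names (`weibull_cum_hazard`, `aft_cum_hazard`, `aft_sf`) are hypothetical, not SurPyval functions.

```python
import math

def weibull_cum_hazard(x, alpha, beta_shape):
    """Baseline Weibull cumulative hazard H_0(x) = (x / alpha) ** beta."""
    return (x / alpha) ** beta_shape

def aft_cum_hazard(x, Z, alpha, beta_shape, coefs):
    """H(x | Z) = H_0(exp(beta'Z) * x): time is rescaled before H_0 is applied."""
    accel = math.exp(sum(b * z for b, z in zip(coefs, Z)))
    return weibull_cum_hazard(accel * x, alpha, beta_shape)

def aft_sf(x, Z, alpha, beta_shape, coefs):
    """S(x | Z) = exp(-H(x | Z))."""
    return math.exp(-aft_cum_hazard(x, Z, alpha, beta_shape, coefs))

# Illustrative values only (assumptions, not fitted results).
alpha, beta_shape = 10.0, 2.0
coefs = [0.7]

sf_reference = aft_sf(5.0, [0.0], alpha, beta_shape, coefs)
sf_stressed = aft_sf(5.0, [1.0], alpha, beta_shape, coefs)
print(sf_reference, sf_stressed)

# For a Weibull baseline, accelerating time by a factor `a` is the same as
# dividing the scale parameter by `a`.
rescaled = weibull_cum_hazard(5.0, alpha / math.exp(0.7), beta_shape)
```

With a positive coefficient the stressed unit has the lower survival probability at the same time, matching the "shorter life" reading above.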

Fit the regression model using a pandas DataFrame as the input.

The names of the covariates are retained on the fitted model so that a DataFrame can later be passed to the prediction methods (sf, ff, df, hf, Hf, random) and the correct columns will be selected automatically.

Parameters

df (pandas.DataFrame) – The dataframe containing the data.
x_col (str) – The column name of the observed times.
Z_cols (str or list of str, optional) – The column name(s) of the covariates. Mutually exclusive with formula.
c_col (str, optional) – The column name of the censoring indicator.
n_col (str, optional) – The column name of the number of observations at each time.
tl_col (str, optional) – The column name of the left truncation values.
tr_col (str, optional) – The column name of the right truncation values.
formula (str, optional) – A formulaic formula describing the covariates, e.g. "age + sex". Mutually exclusive with Z_cols.
init (array_like, optional) – The initial values for the parameters.
fixed (dict, optional) – A dictionary of parameters to fix to a specific value.

Returns

The fitted model, with feature_names (and formula) set.

Return type

ParametricRegressionModel

Examples

>>> from surpyval import WeibullAFT
>>> model = WeibullAFT.fit_from_df(
...     df, x_col="time", Z_cols=["age", "weight"], c_col="censored"
... )
>>> model.sf([10, 20], df[["age", "weight"]])

Proportional Odds (PO)

\[\frac{S(x \mid Z)}{F(x \mid Z)} = \frac{S_0(x)}{F_0(x)} \cdot e^{\beta'Z}\]

Pre-built instances: ExponentialPO, NormalPO, WeibullPO, GumbelPO, LogisticPO, LogNormalPO, GammaPO.

Factory:

from surpyval import Logistic
from surpyval import PO
model = PO(Logistic).fit(x, Z=Z, c=c)

class surpyval.univariate.regression.proportional_odds.proportional_odds_fitter.ProportionalOddsFitter(distribution)

Bases: DataFrameRegressionMixin

Proportional Odds model fitter using exp(beta'Z) as the odds multiplier.

The survival odds satisfy:

    O(x | Z) = O_0(x) * exp(beta'Z)

where O(x) = S(x) / F(x). This gives:

    sf(x | Z) = exp(beta'Z) * S_0(x) / (F_0(x) + exp(beta'Z) * S_0(x))
    ff(x | Z) = F_0(x) / (F_0(x) + exp(beta'Z) * S_0(x))
    hf(x | Z) = h_0(x) / (F_0(x) + exp(beta'Z) * S_0(x))

A positive beta coefficient means higher covariate values increase the survival odds (protective effect — longer life). To match the PH sign convention (positive beta = shorter life), negate your covariates or betas.
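The sf and ff expressions can be verified directly from the odds relationship. This is a minimal sketch assuming a logistic baseline with illustrative parameters; `logistic_sf`, `po_sf` and `po_ff` are hypothetical helpers, not SurPyval functions.

```python
import math

def logistic_sf(x, mu, sigma):
    """Baseline logistic survival function S_0(x)."""
    return 1.0 / (1.0 + math.exp((x - mu) / sigma))

def po_sf(S0, phi):
    """sf(x | Z) = phi * S_0 / (F_0 + phi * S_0), with phi = exp(beta'Z)."""
    F0 = 1.0 - S0
    return phi * S0 / (F0 + phi * S0)

def po_ff(S0, phi):
    """ff(x | Z) = F_0 / (F_0 + phi * S_0)."""
    F0 = 1.0 - S0
    return F0 / (F0 + phi * S0)

# Illustrative values only (assumptions, not fitted results).
S0 = logistic_sf(48.0, mu=50.0, sigma=5.0)
phi = math.exp(0.8)  # positive coefficient times a covariate value of 1

S = po_sf(S0, phi)
F = po_ff(S0, phi)

baseline_odds = S0 / (1.0 - S0)
covariate_odds = S / F
print(S0, S, covariate_odds / baseline_odds)
```

The two functions sum to one, the odds ratio equals exp(beta'Z), and a positive coefficient raises survival above the baseline, which is the opposite direction to the PH convention.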

Fit the regression model using a pandas DataFrame as the input.

The names of the covariates are retained on the fitted model so that a DataFrame can later be passed to the prediction methods (sf, ff, df, hf, Hf, random) and the correct columns will be selected automatically.

Parameters

df (pandas.DataFrame) – The dataframe containing the data.
x_col (str) – The column name of the observed times.
Z_cols (str or list of str, optional) – The column name(s) of the covariates. Mutually exclusive with formula.
c_col (str, optional) – The column name of the censoring indicator.
n_col (str, optional) – The column name of the number of observations at each time.
tl_col (str, optional) – The column name of the left truncation values.
tr_col (str, optional) – The column name of the right truncation values.
formula (str, optional) – A formulaic formula describing the covariates, e.g. "age + sex". Mutually exclusive with Z_cols.
init (array_like, optional) – The initial values for the parameters.
fixed (dict, optional) – A dictionary of parameters to fix to a specific value.

Returns

The fitted model, with feature_names (and formula) set.

Return type

ParametricRegressionModel

Examples

>>> from surpyval import WeibullPO
>>> model = WeibullPO.fit_from_df(
...     df, x_col="time", Z_cols=["age", "weight"], c_col="censored"
... )
>>> model.sf([10, 20], df[["age", "weight"]])

Accelerated Life (AL)

AL models substitute the life parameter of a distribution with a physics-motivated stress function. Designed for discrete, controlled stress levels (e.g. temperature, voltage).

Factory:

from surpyval import Weibull
from surpyval import AcceleratedLife, Power, Eyring
model = AcceleratedLife(Weibull, Power).fit(x, Z=stress, c=c)

Available life models: Power, InversePower, Eyring, InverseEyring, ExponentialLifeModel, InverseExponential, Linear, DualExponential, DualPower, PowerExponential.

Custom life models can be created by subclassing LifeModel:

from surpyval import AcceleratedLife, LifeModel, Weibull
import autograd.numpy as anp

class MyStressModel(LifeModel):
    def __init__(self):
        super().__init__(
            name="MyStressModel",
            phi_param_map={"a": 0, "b": 1},
            phi_bounds=((None, None), (None, None)),
        )

    def phi(self, Z, *params):
        a, b = params
        return anp.exp(a + b * Z)

    def phi_init(self, life, Z):
        b, a = anp.polyfit(Z.flatten(), anp.log(life), 1)
        return [float(a), float(b)]

model = AcceleratedLife(Weibull, MyStressModel()).fit(x, Z=stress, c=c)
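To make the parameter substitution concrete, the sketch below reproduces the idea in plain Python: the Weibull life parameter is replaced by the stress function exp(a + b * Z) used in MyStressModel, and starting values for a and b come from a straight-line fit of log life against stress, in the spirit of phi_init. The helper names and the numbers are hypothetical, and this is an assumption about how the pieces relate, not SurPyval's internal implementation.

```python
import math

def phi(stress, a, b):
    """Stress function giving the characteristic life at a stress level."""
    return math.exp(a + b * stress)

def al_weibull_sf(x, stress, a, b, beta_shape):
    """Weibull survival with the life parameter replaced by phi(stress)."""
    alpha = phi(stress, a, b)
    return math.exp(-((x / alpha) ** beta_shape))

def loglinear_init(stresses, lives):
    """Least-squares line through (stress, log(life)); returns [a, b]."""
    n = len(stresses)
    logs = [math.log(life) for life in lives]
    z_mean = sum(stresses) / n
    y_mean = sum(logs) / n
    sxx = sum((z - z_mean) ** 2 for z in stresses)
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(stresses, logs))
    b = sxy / sxx
    a = y_mean - b * z_mean
    return [a, b]

# Illustrative values only (assumptions, not fitted results).
a_true, b_true = 8.0, -0.02
stresses = [100.0, 150.0, 200.0]
lives = [phi(z, a_true, b_true) for z in stresses]

a_hat, b_hat = loglinear_init(stresses, lives)
print(a_hat, b_hat)

# At x equal to the characteristic life, Weibull survival is exp(-1)
# regardless of the stress level.
sf_at_life = al_weibull_sf(phi(150.0, a_true, b_true), 150.0, a_true, b_true, 2.0)
```

Because the lives here are generated without noise, the straight-line fit recovers a and b exactly; with real data it only provides a starting point for the optimiser.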

class surpyval.univariate.regression.accelerated_life.parameter_substitution.ParameterSubstitutionFitter(kind, name, distribution, life_model, life_parameter, baseline=None, param_transform=None, inverse_param_transform=None)

Bases: DataFrameRegressionMixin

Fit the regression model using a pandas DataFrame as the input.

The names of the covariates are retained on the fitted model so that a DataFrame can later be passed to the prediction methods (sf, ff, df, hf, Hf, random) and the correct columns will be selected automatically.

Parameters

df (pandas.DataFrame) – The dataframe containing the data.
x_col (str) – The column name of the observed times.
Z_cols (str or list of str, optional) – The column name(s) of the covariates. Mutually exclusive with formula.
c_col (str, optional) – The column name of the censoring indicator.
n_col (str, optional) – The column name of the number of observations at each time.
tl_col (str, optional) – The column name of the left truncation values.
tr_col (str, optional) – The column name of the right truncation values.
formula (str, optional) – A formulaic formula describing the covariates, e.g. "age + sex". Mutually exclusive with Z_cols.
init (array_like, optional) – The initial values for the parameters.
fixed (dict, optional) – A dictionary of parameters to fix to a specific value.

Returns

The fitted model, with feature_names (and formula) set.

Return type

ParametricRegressionModel

Examples

>>> from surpyval import AcceleratedLife, Power, Weibull
>>> model = AcceleratedLife(Weibull, Power).fit_from_df(
...     df, x_col="time", Z_cols=["stress"], c_col="censored"
... )
>>> model.sf([10, 20], df[["stress"]])

class surpyval.univariate.regression.accelerated_life.lifemodel.LifeModel(name: str, phi_param_map: dict[str, int], phi_bounds: tuple[tuple[int | None, int | None]])

Bases: ABC