Data Wrangling Examples

Lets just say we have a list of right censored data and a list of failures. How can we wrangle these into data for the fit() method to accept?

import surpyval as surv

# Failure data
f = [2, 3, 4, 5, 6, 7, 8, 8, 9]
# 'suspended' or right censored data
s = [1, 2, 10]

# convert to xcnt format!
x, c, n, t = surv.fs_to_xcnt(f, s)
print(x, c, n)

model = surv.Weibull.fit(x, c, n)
print(model)

[ 1.  2.  2.  3.  4.  5.  6.  7.  8.  9. 10.] [1 0 1 0 0 0 0 0 0 0 1] [1 1 1 1 1 1 1 1 2 1 1]
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 7.200723091118343
      beta: 2.474773879032559

You can even bring in your left censored data as well:

# Failure data
f = [2, 3, 4, 5, 6, 7, 8, 8, 9]
# 'suspended' or right censored data
s = [1, 2, 10]
# left censored data
l = [7, 8, 9]

# convert to xcnt format!
x, c, n, t = surv.fsl_to_xcnt(f, s, l)
print(x, c, n)

model = surv.Weibull.fit(x, c, n)
print(model)

[ 1  2  2  3  4  5  6  7  7  8  8  9  9 10] [ 1  0  1  0  0  0  0 -1  0 -1  0 -1  0  1] [1 1 1 1 1 1 1 1 1 1 2 1 1 1]
Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 6.814750943875047
      beta: 2.4708983791966714

Another common type of data that is provided is in a simple text list with “+” indicating that the observation was censored at that point. Using some simple python list comprehensions can help.

# Example provided data
data = "1, 2, 3+, 5, 6, 8, 10, 3+, 5+"

f = [float(x) for x in data.split(',') if "+" not in x]
s = [float(x[0:-1]) for x in data.split(',') if "+" in x]

data = surv.fs_to_xcnt(f, s)

model = surv.Weibull.fit(*data)
model

Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 6.737537380917251
      beta: 1.9245506420564855

Again, this can be extended to left censored data as well:

data = "1, 2, 3+, 5, 6, 8, 10, 3+, 5+, 15-, 16-, 17-"
split_data = data.split(',')

f = [float(x) for x in split_data if ("+" not in x) & ("-" not in x)]
s = [float(x[0:-1]) for x in split_data if "+" in x]
l = [float(x[0:-1]) for x in split_data if "-" in x]

# Create the x, c, n data
data = surv.fsl_to_xcnt(f, s, l)

model = surv.Weibull.fit(*data)

Surpyval also offers the ability to use a pandas DataFrame as an input. All you need to do is tell it which columns to look at for x, c, n, and t. Columns for c, n, and t are optional. Further, if you have interval censored data you can use the ‘xl’ and ‘xr’ column names instead. If you have mixed interval and observed or censored data, just make sure the value in the ‘xl’ column is the value of the observation or left or right censoring.

import pandas as pd

xr = [2, 4, 6, 8, 10]
xl = [1, 2, 3, 4, 5]
df = pd.DataFrame({'xl' : xl, 'xr' : xr})

model = surv.Weibull.fit_from_df(df, xl='xl', xr='xr')
print(model)

Parametric SurPyval Model
=========================
Distribution        : Weibull
Fitted by           : MLE
Parameters          :
     alpha: 4.694329418712061
      beta: 2.410693002292504