Competing Risks Analysis

Competing risks analysis addresses situations where a subject is at risk of more than one type of event, but only the first event to occur is observed because it removes the subject from further observation. The possible failure causes “compete” to be the first to occur.

Classic examples:

A patient may die from cancer, heart disease, or another cause; the first to occur ends their observation period.
A mechanical component may fail by fatigue, corrosion, or overload; which failure mode occurs first determines both the failure time and its cause.
A customer may churn, upgrade, or downgrade; the event that happens first changes the analysis for the remaining outcomes.

Competing risks require special treatment because treating the competing events as ordinary independent censoring gives a quantity that cannot be interpreted as a real-world probability. A deeper subtlety is the identifiability problem: from competing-risks data alone, the marginal (net, latent) distribution of each cause, that is, the distribution that would be seen if the other causes were removed, cannot be identified without an untestable assumption about the dependence between causes. This is why the observable, well-defined target is the cumulative incidence function rather than the marginal survival function of a single cause.

Relationship to Univariate Analysis

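Standard survival methods (Kaplan-Meier, parametric MLE) applied to a single cause, with events from the other causes treated as censored, target the cause-specific hazard of that cause. The complement of the resulting Kaplan-Meier curve cannot be interpreted as the probability of experiencing the event in the real world: censoring assumes that the subject could still fail from the cause of interest later, whereas a subject removed by a competing event never can. As a result, the naive one-minus-Kaplan-Meier curve overestimates the incidence of the cause whenever competing events occur.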

The observable quantity of interest is the Cumulative Incidence Function (CIF), also called the subdistribution function. For cause \(k\):

\[F_k(t) = P(T \leq t,\; K = k)\]

where \(T\) is the event time and \(K\) is the event type. With \(m\) possible causes, the CIFs sum to the overall failure probability:

\[\sum_{k=1}^{m} F_k(t) = F(t) = 1 - S(t)\]

Non-Parametric CIF Estimation

The empirical CIF for cause \(k\) is built from the estimated cause-specific hazards and the overall survival estimate (the Aalen-Johansen estimator):

\[\hat{F}_k(t) = \sum_{x_i \leq t} \hat{h}_k(x_i)\, \hat{S}(x_i^-)\]

where \(\hat{h}_k(x_i) = d_{k,i} / r_i\) is the estimated cause-specific hazard at event time \(x_i\), \(d_{k,i}\) is the number of cause-\(k\) events at \(x_i\), \(r_i\) is the number of subjects still at risk just before \(x_i\) (not yet failed from any cause and not yet censored), and \(\hat{S}(x_i^-)\) is the overall survival estimate just before \(x_i\).

SurPyval estimates the non-parametric CIF for each cause with the CompetingRisks class, which uses this cause-specific-hazard construction directly (the Nelson-Aalen or Kaplan-Meier estimate of the overall survival supplies \(\hat{S}(x_i^-)\)).
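
The construction above can be sketched in plain NumPy. This is an illustration of the formula only, not SurPyval's implementation: the function names `cumulative_incidence` and `naive_one_minus_km` are hypothetical, and the convention that cause code `0` marks a right-censored observation is an assumption made for this example. The second function computes the naive one-minus-Kaplan-Meier value for comparison.

```python
import numpy as np

def cumulative_incidence(times, causes, cause):
    """Cause-specific-hazard CIF estimate; causes == 0 means right-censored (assumed coding)."""
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    event_times = np.unique(times[causes != 0])   # distinct event times, any cause
    surv_before = 1.0                             # overall survival just before x_i
    cif = 0.0
    values = []
    for x in event_times:
        at_risk = np.sum(times >= x)                        # r_i
        d_all = np.sum((times == x) & (causes != 0))        # events of any cause at x_i
        d_k = np.sum((times == x) & (causes == cause))      # d_{k,i}
        cif += (d_k / at_risk) * surv_before                # h_k(x_i) * S(x_i^-)
        surv_before *= 1.0 - d_all / at_risk                # Kaplan-Meier update of S
        values.append(cif)
    return event_times, np.array(values)

def naive_one_minus_km(times, causes, cause):
    """1 - KM at the last event time, with competing events treated as censored."""
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes)
    surv = 1.0
    for x in np.unique(times[causes == cause]):
        at_risk = np.sum(times >= x)
        d_k = np.sum((times == x) & (causes == cause))
        surv *= 1.0 - d_k / at_risk
    return 1.0 - surv

# Toy data: six subjects, two causes, one censored observation at t = 4.
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 1, 0, 2, 1]

grid, cif_1 = cumulative_incidence(times, causes, cause=1)
_, cif_2 = cumulative_incidence(times, causes, cause=2)
naive_1 = naive_one_minus_km(times, causes, cause=1)
```

On this toy data the final CIF values are 7/12 for cause 1 and 5/12 for cause 2, which sum to the overall failure probability of 1, while the naive value for cause 1 is 1.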

Parametric Competing Risks

A parametric competing risks model specifies a separate parametric distribution for each of the \(m\) causes. Assuming independent latent failure times, the overall survival function is the product of the cause-specific survival functions:

\[S(t) = \prod_{k=1}^{m} S_k(t)\]

The overall density is:

\[f(t) = \sum_{k=1}^{m} f_k(t) \prod_{j \neq k} S_j(t)\]

SurPyval provides the ParametricCompetingRisks class for this model. Under the independent-latent-times assumption the joint likelihood separates, so each cause’s distribution is fitted independently by MLE with the other causes’ events treated as right-censored.
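
As a rough illustration of this separation, the sketch below simulates two independent exponential latent failure times and fits each cause on its own, treating the other cause's events as right-censored. It does not use the ParametricCompetingRisks class; the rates, sample size, and the helper `parametric_cif` are hypothetical choices for the example. For the exponential distribution the censored-data MLE has the closed form "events divided by total time at risk", which keeps the example free of an optimiser. Each term of the overall density above, integrated up to \(t\), gives the model-based CIF of that cause.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = np.array([0.5, 1.5])   # hypothetical exponential rates, one per cause
n = 20000

# Independent latent failure times; only the earliest one is observed.
latent = rng.exponential(1.0 / true_rates, size=(n, 2))
t = latent.min(axis=1)
cause = latent.argmin(axis=1)

# Exponential MLE with right-censoring: (events of cause k) / (total time at risk).
# Events from the other cause contribute exposure time but no event.
rate_hat = np.array([np.sum(cause == k) / t.sum() for k in range(2)])

def parametric_cif(k, grid, rates):
    """F_k(t) = integral of f_k(u) * prod_{j != k} S_j(u) du, by the trapezoid rule."""
    dens = rates[k] * np.exp(-rates[k] * grid)                    # f_k(u)
    others = np.delete(rates, k)
    surv_others = np.exp(-np.outer(grid, others)).prod(axis=1)    # prod_{j != k} S_j(u)
    integrand = dens * surv_others
    areas = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)
    return np.concatenate([[0.0], np.cumsum(areas)])

grid = np.linspace(0.0, 3.0, 3001)
cif_0 = parametric_cif(0, grid, true_rates)

# Closed form for exponential causes: F_k(t) = (rate_k / total) * (1 - exp(-total * t)).
total = true_rates.sum()
cif_0_closed = true_rates[0] / total * (1.0 - np.exp(-total * grid))
```

With independent exponential causes the numerical integral agrees with the closed form, and the long-run share of failures from cause \(k\) is its rate divided by the sum of the rates.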

Regression: Fine-Gray and Cause-Specific PH

Two main regression approaches are used in competing risks:

Cause-specific proportional hazards — fits a separate Cox or parametric PH model for each cause, with all other cause events treated as censored:

\[h_k(t \mid Z) = h_{k,0}(t)\, e^{Z \beta_k}\]

This estimates the effect of covariates on the hazard of each cause independently.

Fine-Gray sub-distribution hazards — models the effect of covariates directly on the CIF via a proportional hazards model on the sub-distribution hazard:

\[h_k^*(t \mid Z) = h_{k,0}^*(t)\, e^{Z \gamma_k}\]

This is the natural choice when the scientific question is about the probability of a cause occurring in the presence of competing risks (e.g. clinical risk scores).

SurPyval provides the FineGray and CompetingRisksProportionalHazards classes.
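
The difference between the two targets can be seen in a small numerical sketch. It assumes two independent exponential causes with made-up rates and does not call either SurPyval class; `cif_exponential` and `subdistribution_hazard` are hypothetical helpers. Both groups have the same cause-specific hazard for cause 1, but the "treated" group has a higher hazard for the competing cause.

```python
import numpy as np

def cif_exponential(rate_k, rate_other, t):
    """CIF of cause k when both causes are independent exponentials."""
    total = rate_k + rate_other
    return rate_k / total * (1.0 - np.exp(-total * t))

def subdistribution_hazard(rate_k, rate_other, t):
    """Subdistribution hazard: derivative of the CIF divided by (1 - CIF)."""
    total = rate_k + rate_other
    sub_density = rate_k * np.exp(-total * t)
    return sub_density / (1.0 - cif_exponential(rate_k, rate_other, t))

t = 5.0
h1 = 0.1                      # cause-1 hazard, identical in both groups (assumed value)
h2_control, h2_treated = 0.1, 0.3   # competing-cause hazards (assumed values)

cif_control = cif_exponential(h1, h2_control, t)
cif_treated = cif_exponential(h1, h2_treated, t)

sdh_control = subdistribution_hazard(h1, h2_control, t)
sdh_treated = subdistribution_hazard(h1, h2_treated, t)
```

Here the cause-specific hazard ratio for cause 1 is exactly 1, yet the treated group has a lower cumulative incidence of cause 1 and a lower subdistribution hazard, because more of its subjects are removed by the competing cause first. A cause-specific model would report no covariate effect on cause 1, while a subdistribution model would report a reduced incidence. The subdistribution hazards in this toy setup are not exactly proportional over time, so it illustrates the direction of the effect rather than a well-specified Fine-Gray model.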

Comparing incidence across groups: Gray’s test

The log-rank test compares survival curves; its competing-risks analogue is Gray’s test (1988), which compares the cumulative incidence functions of a chosen cause across groups. The distinction matters. A cause-specific log-rank compares the cause-specific hazards — the instantaneous rate of the cause among those still at risk — whereas Gray’s test compares the CIFs themselves, i.e. the actual incidence of the cause in a population that is also being depleted by the competing causes.

Gray’s test achieves this by modifying the risk set. Instead of removing subjects who fail from a competing cause (as a cause-specific analysis would), it keeps them in the subdistribution risk set with an inverse-probability-of-censoring weight

\[w_j(t) = \frac{\hat{G}(t)}{\hat{G}(x_j)},\]

where \(\hat{G}\) is the Kaplan-Meier estimate of the censoring distribution, \(x_j\) is the time at which subject \(j\) failed from a competing cause, and \(t > x_j\). Subjects who have already failed from a competing cause therefore continue to count, with a weight that decays as censoring accumulates, which is precisely what makes the comparison one of incidence rather than of instantaneous rate. Under the null hypothesis of equal CIFs, the resulting statistic is asymptotically \(\chi^2\) distributed with \(g - 1\) degrees of freedom for \(g\) groups. Reach for it when the question is “how many fail of this cause”, and for the cause-specific log-rank when the question is “how fast”.
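
The weight can be sketched as follows. This is a simplified illustration, not the implementation of Gray's test in SurPyval or any other package: `censoring_km` and `subdistribution_weight` are hypothetical names, the status coding (0 for censored) is assumed, and conventions for ties and left limits are glossed over.

```python
import numpy as np

def censoring_km(times, status):
    """Kaplan-Meier estimate of the censoring survival G(t); status == 0 means censored."""
    times = np.asarray(times, dtype=float)
    censored = np.asarray(status) == 0
    cens_times = np.unique(times[censored])
    values = []
    g = 1.0
    for c in cens_times:
        at_risk = np.sum(times >= c)
        g *= 1.0 - np.sum((times == c) & censored) / at_risk
        values.append(g)
    values = np.array(values)

    def G(t):
        # Right-continuous step function: value after the last censoring time <= t.
        idx = np.searchsorted(cens_times, t, side="right")
        return 1.0 if idx == 0 else float(values[idx - 1])

    return G

def subdistribution_weight(G, t, x_j):
    """Weight at time t for a subject who failed from a competing cause at x_j."""
    if t <= x_j:
        return 1.0            # still genuinely at risk: full weight
    return G(t) / G(x_j)      # kept in the risk set with a decaying weight

# Toy data: status 0 = censored, 1 = cause of interest, 2 = competing cause.
times = [1, 2, 3, 4, 5, 6]
status = [2, 0, 1, 0, 2, 1]

G = censoring_km(times, status)
w_at_3 = subdistribution_weight(G, 3, x_j=1)   # subject who failed from cause 2 at t = 1
w_at_5 = subdistribution_weight(G, 5, x_j=1)
```

In this toy data the subject who failed from the competing cause at time 1 stays in the subdistribution risk set with weight 0.8 at time 3 and 8/15 at time 5, dropping each time a censoring event is passed.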

For worked examples of estimating cumulative incidence functions, fitting the Fine-Gray and cause-specific proportional hazards models, and comparing groups with Gray’s test, see the Competing Risks SurPyval Modelling page.