Non-parametric survival/risk estimation: Kaplan-Meier and Aalen-Johansen

Kaplan-Meier estimator of the cumulative risk/survival

cd("docs/src/fig/")
using Random, LSurvival, Distributions, LinearAlgebra, Plots, DataFrames

# generate some data under a discrete hazards model
id, int, out, data = LSurvival.dgm(MersenneTwister(1212), 100, 20)

data[:, 1] = round.(data[:, 1], digits = 3)
d, X = data[:, 4], data[:, 1:3]
wt = rand(MersenneTwister(1212), length(d)) # random weights just to demonstrate usage

tab = DataFrame("id"=>id, "in"=>int, "t"=>out, "d"=>d, "wts"=>wt)
# Kaplan-Meier curve
mfit = kaplan_meier(@formula(Surv(in, t, d)~1), tab, wts = tab.wts, id = ID.(id))
#mfit = kaplan_meier(int, out, d, wts = wt, id = ID.(id)) # equivalent specification without relying on Tables.jl interface

Output: (note that events do not occur in integer counts - this is due to the use of weights).

Kaplan-Meier Survival
────────────────────────────────────────
    time   survival   # events   at risk
────────────────────────────────────────
1    1.0  0.886358    5.73763   50.4885
2    2.0  0.684543   10.3561    45.4832
3    3.0  0.622066    3.30175   36.1763
4    4.0  0.534651    4.26077   30.3209
5    5.0  0.469165    2.652     21.6518
6    6.0  0.396423    3.57466   23.0554
7    7.0  0.298113    5.09531   20.5463
8    8.0  0.287853    0.388946  11.3009
9    9.0  0.216842    3.24547   13.156
10  10.0  0.195417    0.965415   9.77109
11  11.0  0.167813    1.43976   10.1925
12  12.0  0.140082    0.998202   6.04051
13  13.0  0.126733    0.58481    6.13692
14  16.0  0.10436     0.867046   4.91143
15  20.0  0.0986266   0.270987   4.93256
────────────────────────────────────────
Number of events:  43.7388
Number of unique event times:       15

Plotting the survival curve

plot(mfit)
savefig("km.svg")

Kaplan-Meier

Checking whether marginal distribution of the outcome comports with a parametric Weibull or Exponential model

  • a straight line implies Weibull
  • a flat line implies Exponential (a special case of Weibull)
lognlogplot(mfit)
savefig("lognlog.svg")

Is-it-Weibull

Competing event analysis: Aalen-Johansen estimator of cumulative risk

using Random, LSurvival, Distributions, LinearAlgebra, DataFrames

# simulate some data
z, x, t, d, event, wt = LSurvival.dgm_comprisk(MersenneTwister(122), 1000)
X = hcat(x,z)
enter = t .* rand(MersenneTwister(1232), length(d))*0.02 # create some fake entry times

df = DataFrame("enter"=> enter, "exit" => t, "e" => event, "wts" => wt)
res_aj = aalen_johansen(@formula(Surv(enter, exit, e)~1), df, wts = df.wts);
#res_aj = aalen_johansen(enter, t, event; wts = wt);# equivalent specification without relying on Tables.jl interface
res_aj

Output:

Kaplan-Meier Survival, Aalen-Johansen risk
──────────────────────────────────────────────────────────────────────────────────────
      time  survival  # events (j=1)  # events (j=2)  at risk   risk (j=1)  risk (j=2)
──────────────────────────────────────────────────────────────────────────────────────
1   0.0022  0.999034          0.159           0.0     164.588  0.000966047  0.0
2   0.0038  0.996697          0.5814          0.0     248.553  0.00330293   0.0
3   0.0054  0.991383          0.0             1.8427  345.628  0.00330293   0.00531385
4   0.0111  0.990613          0.0             0.4837  622.239  0.00330293   0.00608451
5   0.0174  0.989756          0.781           0.0     902.762  0.00415993   0.00608451
6   0.0254  0.987997          0.0             1.7696  996.153  0.00415993   0.00784274
7   0.0288  0.987633          0.3667          0.0     994.383  0.00452428   0.00784274
8   0.0298  0.987457          0.0             0.1767  994.017  0.00452428   0.00801831
9   0.0595  0.985665          1.8042          0.0     993.84   0.00631689   0.00801831
10  0.061   0.98494           0.7293          0.0     992.036  0.0070415    0.00801831
──────────────────────────────────────────────────────────────────────────────────────
...
──────────────────────────────────────────────────────────────────────────────────────
       time  survival  # events (j=1)  # events (j=2)  at risk  risk (j=1)  risk (j=2)
──────────────────────────────────────────────────────────────────────────────────────
97   0.7976  0.896879          0.0             1.9207  904.596   0.0443939   0.0587274
98   0.798   0.895434          1.4537          0.0     902.676   0.0458382   0.0587274
99   0.8072  0.894221          1.2212          0.0     901.222   0.0470516   0.0587274
100  0.815   0.894108          0.1138          0.0     900.001   0.0471647   0.0587274
101  0.8174  0.893825          0.2843          0.0     899.887   0.0474471   0.0587274
102  0.8309  0.892683          0.0             1.1494  899.603   0.0474471   0.0598695
103  0.8386  0.890721          0.0             1.9746  898.453   0.0474471   0.0618314
104  0.8572  0.890309          0.4151          0.0     896.479   0.0478596   0.0618314
105  0.9051  0.889313          1.0021          0.0     896.064   0.0488552   0.0618314
106  0.9189  0.888984          0.3315          0.0     895.062   0.0491846   0.0618314
──────────────────────────────────────────────────────────────────────────────────────
Number of events (j=1.0):  46.8371
Number of events (j=2.0):  58.4336
Number of unique event times:      106

Plotting marginal cause-specific risks

plot(res_aj)
savefig("aj.svg")

Aalen-Johansen