LSurvival

Survival analysis functions in Julia for time-to-event outcomes that can include:

  • Loss-to-follow-up/right censoring
  • Late entry/left truncation (not available in Survival.jl)
  • "Person-period" data structures (not available in Survival.jl)
  • Observation weights (not available in Survival.jl)
  • Competing risks (not available in Survival.jl)

Capabilities include estimators for

  • Kaplan-Meier non-parametric conditional risk functions
  • Aalen-Johansen non-parametric cause-specific unconditional risk functions
  • Cox proportional hazards model (Efron's or Breslow's methods for ties)
  • Parametric survival models

Convenience functions enable:

  • Non-parametric bootstrapping, cluster-bootstrapping, jackknife
  • Estimating baseline hazards from a Cox Model
  • Estimating cause-specific risk from an exhaustive set of Cox models for competing risk outcomes
  • Simple simulation of competing and non-competing survival events
  • Martingale, score, Schoenfeld, and dfbeta residuals
  • Cluster robust variance estimation (Cox models)

Plans to include:

  • Parametric survival models: more distributions
  • Stratification in Cox models
  • Parametric survival model diagnostics

The package has been tuned to follow the "survival" package from R in terms of specific estimators/results.

Report issues here

Installation

using Pkg; Pkg.add(url = "https://github.com/alexpkeil1/LSurvival.jl")

Quick examples

Single event type: Cox model and Kaplan-Meier curve


tab = ( in = int, out = out, d=d, x=X[:,1], z1=X[:,2], z2=X[:,3]) 

coxph(@formula(Surv(in, out, d)~x+z1+z2), tab, ties = "efron", wts = wt)

Output:

Maximum partial likelihood estimates (alpha=0.05):
─────────────────────────────────────────────────────────
      ln(HR)    StdErr        LCI      UCI     Z  P(>|Z|)
─────────────────────────────────────────────────────────
x   1.60222   0.530118   0.563208  2.64123  3.02   0.0025
z1  0.305929  0.43838   -0.55328   1.16514  0.70   0.4853
z2  1.98011   0.325314   1.34251   2.61771  6.09   <1e-08
─────────────────────────────────────────────────────────
Partial log-likelihood (null): -150.162
Partial log-likelihood (fitted): -131.512
LRT p-value (X^2=37.3, df=3): 3.9773e-08
Newton-Raphson iterations: 5
# can also be done if there is no late entry
coxph(@formula(Surv(out, d)~x+z1+z2), tab, ties = "efron", wts = wt)
# can also be done if there is no late entry and no right censoring (i.e. all times are failure times)
coxph(@formula(Surv(out)~x+z1+z2), tab, ties = "efron", wts = wt)

# Kaplan-Meier estimator of the cumulative risk/survival
kaplan_meier(int, outt, d)

Competing event analysis: Aalen-Johansen and Cox-model-based estimators of the cumulative risk/survival

# Aalen-Johansen estimator: marginal cause-specific risks
res_aj = aalen_johansen(enter, t, event; wts = wt);
res_aj

# Cox-model estimator: cause-specific risks at given levels of covariates
fit1 = fit(PHModel, X, enter, t, (event .== 1), ties = "efron",  wts = wt)
n2idx = findall(event .!= 1)
fit2 = fit(PHModel, X[n2idx,:], enter[n2idx], t[n2idx], (event[n2idx] .== 2), ties = "efron",  wts = wt[n2idx])

# risk at average levels of `x` and `z`
res_cph = risk_from_coxphmodels([fit1,fit2], coef_vectors=[coef(fit1), coef(fit2)], pred_profile=mean(X, dims=1))
# compare to Aalen-Johansen fit
res_aj


# this approach operates on left censored outcomes (which operate in the background in model fitting)
LSurvivalResp(enter, t, d, origintime=0)
LSurvivalCompResp(enter, t, event) # automatically infers origin


# can use the ID type to refer to units with multiple observations
id, int, outt, data = dgm(MersenneTwister(), 1000, 10; regimefun = int_0)
LSurvivalResp(int, outt, data[:,4], ID.(id))

Index of functions

Function help

LSurvival.AbstractNPSurvType

Abstract type for non-parametric survival models, including Kaplan-Meier, Aalen Johansen, and Cox-model based estimates of survival using an Aalen-Johansen-like estimator

source
LSurvival.IDType

Type for identifying individuals in survival outcomes.

Used for the id argument in

  • Outcome types: LSurvivalResp, LSurvivalCompResp
  • Model types: PHModel, KMRisk, AJRisk

Accepts any Number or String. There is no significance to having this particular struct, but it enables easier use of multiple dispatch.

[ID(i) for i in 1:10]
source
LSurvival.LSurvivalCompRespType

Outcome type for competing risk survival outcomes subject to left truncation and right censoring (not generally needed for users)

Parameters

  • enter Time at observation start

  • exit Time at observation end

  • y event occurrence in observation

  • wts observation weights

  • eventtimes unique event times

  • origin origin on the time scale

  • id person level identifier (must be wrapped in ID() function)

  • eventtypes vector of unique event types

  • eventmatrix matrix of indicators on the observation level

    Signatures:

 struct LSurvivalCompResp{
 E<:AbstractVector,
 X<:AbstractVector,
 Y<:AbstractVector,
 W<:AbstractVector,
 T<:Real,
 I<:AbstractLSurvivalID,
 V<:AbstractVector,
 M<:AbstractMatrix,
 } <: AbstractLSurvivalResp
 enter::E
 exit::X
 y::Y
 wts::W
 eventtimes::X
 origin::T
 id::Vector{I}
 eventtypes::V
 eventmatrix::M
 end
 LSurvivalCompResp(
 enter::E,
 exit::X,
 y::Y,
 wts::W,
 id::Vector{I}
 )
 LSurvivalCompResp(
 enter::E,
 exit::X,
 y::Y,
 id::Vector{I}
 )
 LSurvivalCompResp(
 enter::E,
 exit::X,
 y::Y,
 wts::W,
 )
 LSurvivalCompResp(
 enter::E,
 exit::X,
 y::Y,
 )
 LSurvivalCompResp(
  exit::X,
  y::Y,
  ) where {X<:Vector,Y<:Union{Vector{<:Real},BitVector}}
source
LSurvival.LSurvivalRespType

Outcome type for survival outcome subject to left truncation and right censoring.

Will not generally be needed by users

Parameters

  • enter: Time at observation start
  • exit: Time at observation end
  • y: event occurrence in observation
  • wts: observation weights
  • eventtimes: unique event times
  • origin: origin on the time scale
  • id: person level identifier (must be wrapped in ID() function)
 struct LSurvivalResp{
 E<:AbstractVector,
 X<:AbstractVector,
 Y<:AbstractVector,
 W<:AbstractVector,
 T<:Real,
 I<:AbstractLSurvivalID,
 } <: AbstractLSurvivalResp
 enter::E
 exit::X
 y::Y
 wts::W
 eventtimes::E
 origin::T
 id::Vector{I}
 end
 LSurvivalResp(
    enter::E,
    exit::X,
    y::Y,
    wts::W,
    id::Vector{I},
  ) where {
    E<:Vector,
    X<:Vector,
    Y<:Union{Vector{<:Real},BitVector},
    W<:Vector,
    I<:AbstractLSurvivalID,
}
 LSurvivalResp(
 enter::E,
 exit::X,
 y::Y,
 id::Vector{I},
 ) 
 LSurvivalResp(
  y::Vector{Y},
  wts::W,
  id::Vector{I},
  ) where {Y<:AbstractSurvTime,W<:Vector,I<:AbstractLSurvivalID}
 LSurvivalResp(
  enter::E,
  exit::X,
  y::Y,
  ) where {E<:Vector,X<:Vector,Y<:Union{Vector{<:Real},BitVector}}
 LSurvivalResp(exit::X, y::Y) where {X<:Vector,Y<:Vector}

Examples

  # no late entry
  LSurvivalResp([.5, .6], [1,0])
source
LSurvival.PHModelType

PHModel: Mutable object type for proportional hazards regression (not generally needed for users)

Parameters

  • R Survival response

  • P # parameters

  • ties String: "efron" or "breslow"

  • fit Bool: logical for whether the model has been fitted

  • bh AbstractMatrix: baseline hazard estimates

    Signatures

 mutable struct PHModel{G<:LSurvivalResp,L<:AbstractLSurvivalParms} <: AbstractPH
 R::G        # Survival response
 P::L        # parameters
 ties::String #"efron" or"breslow"
 fit::Bool
 bh::AbstractMatrix
 end

 PHModel(
 R::G,
 P::L,
 ties::String,
 fit::Bool,
 ) where {G<:LSurvivalResp,L<:AbstractLSurvivalParms}
 PHModel(R::G, P::L, ties::String) where {G<:LSurvivalResp,L<:AbstractLSurvivalParms}
 PHModel(R::G, P::L) where {G<:LSurvivalResp,L<:AbstractLSurvivalParms}

Methods: fit, coef, confint, std_err, show

Example

 using LSurvival
 using Random
 import LSurvival: _stepcox!, dgm_comprisk

 z,x,t,d, event,wt = dgm_comprisk(MersenneTwister(1212), 100);
 enter = zeros(length(t));
 X = hcat(x,z);
 R = LSurvivalResp(enter, t, Int.(d), wt)
 P = PHParms(X)
 mf = PHModel(R,P)
  LSurvival._fit!(mf)
source
LSurvival.PHSurvType

Mutable type for proportional hazards models (not generally needed by users)

PHSsurv: Object type for proportional hazards regression

surv::Vector{Float64} risk::Matrix{Float64} basehaz::Vector{Float64} event::Vector{Float64}

  • fitlist: vector of PHSurv objects (Cox model fits)
  • eventtypes: vector of unique event types
  • times: unique event times
  • surv: Overall survival at each time
  • risk: Cause-specific risk at each time (1 for each outcome type)
  • basehaz: baseline hazard for a specific event type
  • event: value of event type that occurred at each time

Methods: fit, show

mutable struct PHSurv{G<:Array{T} where {T<:PHModel}} <: AbstractNPSurv
fitlist::G        
eventtypes::AbstractVector
times::AbstractVector
surv::Vector{Float64}
risk::Matrix{Float64}
basehaz::Vector{Float64}
event::Vector{Float64}
end

PHSurv(fitlist::Array{T}, eventtypes) where {T<:PHModel}
PHSurv(fitlist::Array{T}) where {T<:PHModel}
source
LSurvival.StrataType

Type for identifying individuals in survival outcomes. Used for the strata argument in PHModel (not yet implemented)

Accepts any Number or String. There is no significance to having this particular struct, but it enables easier use of multiple dispatch.

[Strata(i) for i in 1:10]
source
LSurvival._update_PHParms!Method

Update the partial likelihood, gradient and Hessian values from a Cox model fit (used during fitting, not generally useful for users).

Uses Breslow's or Efron's partial likelihood.

Updates over all observations

Signature

_update_PHParms!(
 m::M,
 # big indexes
 ne::I,
 caseidxs::Vector{Vector{T}},
 risksetidxs::Vector{Vector{T}},
 ) where {M<:AbstractPH,I<:Int,T<:Int}

updatePHParms!(m, risksetidxs, caseidxs, ne, den)

source
LSurvival.aalen_johansenMethod

Aalen-Johansen estimator for cumulative cause-specific risk (in the presence of competing events)

Signatures

 StatsBase.fit!(m::T; kwargs...) where {T<:AbstractNPSurv}

 aalen_johansen(enter::AbstractVector, exit::AbstractVector, y::AbstractVector,
   ; kwargs...)

Keyword arguments

  • wts::Vector{<:Real} = similar(enter, 0); vector of case weights (or zero length vector) for each observation
  • id::Vector{<:AbstractLSurvivalID} = [ID(i) for i in eachindex(y)]; Vector of AbstractSurvID objects denoting observations that form a single unit (used in bootstrap and jackknife methods)
  • atol = 0.00000001; absolute tolerance for defining tied event times
  • keepy = true; keep the outcome vector after fitting (may save memory with large datasets)
  • eps = 0.00000001; deprecated (replaced by atol)
using LSurvival
using Random
z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
enter = zeros(length(t));
   # event variable is coded 0[referent],1,2
m = fit(AJSurv, enter, t, event)
mw = fit(AJSurv, enter, t, event, wts=wt)

or, equivalently:

aalen_johansen(enter, t, event, wts=wt)
source
LSurvival.bootstrapMethod

Bootstrapping coefficients of a proportional hazards model

Signatures

# single bootstrap draw, keeping the entire object
bootstrap(rng::MersenneTwister, m::PHModel)
bootstrap(m::PHModel)
# muliple bootstrap draws, keeping only coefficient estimates
bootstrap(rng::MersenneTwister, m::PHModel, iter::Int; kwargs...)
bootstrap(m::PHModel, iter::Int; kwargs...)

Returns:

  • If using bootstrap(m): a single bootstrap draw
  • If using bootstrap(m, 10) (e.g.): 10 bootstrap draws of the cumulative cause-specific risks at the end of follow up
using LSurvival, Random

id, int, outt, data =
LSurvival.dgm(MersenneTwister(1212), 500, 5; afun = LSurvival.int_0)

d, X = data[:, 4], data[:, 1:3]
weights = rand(length(d))

# survival outcome:
R = LSurvivalResp(int, outt, d, ID.(id))    # specification with ID only
P = PHParms(X)

Mod = PHModel(R, P)
LSurvival._fit!(Mod, start=Mod.P._B, keepx=true, keepy=true)


# careful propogation of bootstrap sampling
idx, R2 = bootstrap(R)
P2 = bootstrap(idx, P)
Modb = PHModel(R2, P2)
LSurvival._fit!(Mod, start=Mod.P._B, keepx=true, keepy=true)

# convenience function for bootstrapping a model
Modc = bootstrap(Mod)
LSurvival._fit!(Modc, start=Modc.P._B);
Modc
Modc.P.X == nothing
Modc.R == nothing

Bootstrap Cox model coefficients

LSurvival._fit!(mb, keepx=true, keepy=true, start=[0.0, 0.0])
using LSurvival, Random
res = z, x, outt, d, event, wts = LSurvival.dgm_comprisk(MersenneTwister(123123), 200)
int = zeros(length(d)) # no late entry
X = hcat(z, x)

mainfit = fit(PHModel, X, int, outt, d .* (event .== 1), keepx=true, keepy=true)

function stddev_finite(x)
 n = length(x)
 mnx = sum(x)/n
 ret = sum((x .- mnx) .^ 2)
 ret /= n-1
 sqrt(ret)
end

# bootstrap standard error versus asymptotic
mb = bootstrap(MersenneTwister(123123), mainfit, 200)
## bootstrap standard error
[stddev_finite(mb[:,i]) for i in 1:2]
## asymptotic standard error
stderror(mainfit)
source
LSurvival.bootstrapMethod

Bootstrap methods for Aalen-Johansen cumulative risk estimator

Signatures

 # single bootstrap draw, keeping the entire object
 bootstrap(rng::MersenneTwister, m::AJSurv)
 bootstrap(m::AJSurv)

 # muliple bootstrap draws, keeping only coefficient estimates
 bootstrap(rng::MersenneTwister, m::AJSurv, iter::Int; kwargs...)
 bootstrap(m::AJSurv, iter::Int; kwargs...)

Returns:

  • If using bootstrap(m): a single bootstrap draw
  • If using bootstrap(m, 10) (e.g.): 10 bootstrap draws of the cumulative cause-specific risks at the end of follow up
using LSurvival
using Random

z, x, t, d, event, wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 100)
id = 1:length(x)
enter = zeros(length(t))

aj1 = aalen_johansen(enter, t, event, id=ID.(id), wts=wt)
aj2 = bootstrap(aj1, keepy=false);
ajboot = bootstrap(aj1, 10, keepy=false);
aj1


aj1.R
aj2.R
source
LSurvival.bootstrapMethod

Bootstrap methods for Kaplan-Meier survival curve estimator

Signatures

 # single bootstrap draw, keeping the entire object
 bootstrap(rng::MersenneTwister, m::KMSurv)
 bootstrap(m::KMSurv)

 # muliple bootstrap draws, keeping only coefficient estimates
 bootstrap(rng::MersenneTwister, m::KMSurv, iter::Int; kwargs...)
 bootstrap(m::KMSurv, iter::Int; kwargs...)

Returns:

  • If using bootstrap(m): a single bootstrap draw
  • If using bootstrap(m, 10) (e.g.): 10 bootstrap draws of the survival probability at the end of follow up
using LSurvival
using Random

id, int, outt, data =
LSurvival.dgm(MersenneTwister(1212), 20, 5; afun = LSurvival.int_0)

d, X = data[:, 4], data[:, 1:3]
wts = rand(length(d))

km1 = kaplan_meier(int, outt, d, id=ID.(id), wts=wts)
km2 = bootstrap(km1, keepy=false)
km3 = bootstrap(km1, 10, keepy=false)
km1

km1.R
km2.R
source
LSurvival.bootstrapMethod

Bootstrapping coefficients of a proportional hazards model

Signatures

# single bootstrap draw, keeping the entire object
bootstrap(rng::MersenneTwister, m::PHModel)
bootstrap(m::PHModel)
# muliple bootstrap draws, keeping only coefficient estimates
bootstrap(rng::MersenneTwister, m::PHModel, iter::Int; kwargs...)
bootstrap(m::PHModel, iter::Int; kwargs...)

Returns:

  • If using bootstrap(m): a single bootstrap draw
  • If using bootstrap(m, 10) (e.g.): 10 bootstrap draws of the cumulative cause-specific risks at the end of follow up
using LSurvival, Random

id, int, outt, data =
LSurvival.dgm(MersenneTwister(1212), 500, 5; afun = LSurvival.int_0)

d, X = data[:, 4], data[:, 1:3]
weights = rand(length(d))

# survival outcome:
R = LSurvivalResp(int, outt, d, ID.(id))    # specification with ID only
P = PHParms(X)

Mod = PHModel(R, P)
LSurvival._fit!(Mod, start=Mod.P._B, keepx=true, keepy=true)


# careful propogation of bootstrap sampling
idx, R2 = bootstrap(R)
P2 = bootstrap(idx, P)
Modb = PHModel(R2, P2)
LSurvival._fit!(Mod, start=Mod.P._B, keepx=true, keepy=true)

# convenience function for bootstrapping a model
Modc = bootstrap(Mod)
LSurvival._fit!(Modc, start=Modc.P._B);
Modc
Modc.P.X == nothing
Modc.R == nothing

Bootstrap Cox model coefficients

LSurvival._fit!(mb, keepx=true, keepy=true, start=[0.0, 0.0])
using LSurvival, Random
res = z, x, outt, d, event, wts = LSurvival.dgm_comprisk(MersenneTwister(123123), 200)
int = zeros(length(d)) # no late entry
X = hcat(z, x)

mainfit = fit(PHModel, X, int, outt, d .* (event .== 1), keepx=true, keepy=true)

function stddev_finite(x)
 n = length(x)
 mnx = sum(x)/n
 ret = sum((x .- mnx) .^ 2)
 ret /= n-1
 sqrt(ret)
end

# bootstrap standard error versus asymptotic
mb = bootstrap(MersenneTwister(123123), mainfit, 200)
## bootstrap standard error
[stddev_finite(mb[:,i]) for i in 1:2]
## asymptotic standard error
stderror(mainfit)
source
LSurvival.bootstrapMethod

Bootstrap sampling of a proportional hazards predictor object

using LSurvival, Random

id, int, outt, data =
LSurvival.dgm(MersenneTwister(1212), 20, 5; afun = LSurvival.int_0)

d, X = data[:, 4], data[:, 1:3]
weights = rand(length(d))

# survival outcome:
R = LSurvivalResp(int, outt, d, ID.(id))    # specification with ID only
P = PHParms(X)
idx, R2 = bootstrap(R)
P2 = bootstrap(idx, P)

Mod = PHModel(R2, P2)
LSurvival._fit!(Mod, start=Mod.P._B)
source
LSurvival.bootstrapMethod

Bootstrapping sampling of a competing risk survival response

Signatures

bootstrap(rng::MersenneTwister, R::T) where {T<:LSurvivalCompResp}
bootstrap(R::T) where {T<:LSurvivalCompResp}
z,x,t,d,event,weights =
LSurvival.dgm_comprisk(MersenneTwister(1212), 300)
enter = zeros(length(event))

# survival outcome:
R = LSurvivalCompResp(enter, t, event, weights, ID.(collect(1:length(t))))    # specification with ID only
bootstrap(R) # note that entire observations/clusters identified by id are kept
source
LSurvival.bootstrapMethod

Bootstrapping sampling of a survival response

id, int, outt, data =
LSurvival.dgm(MersenneTwister(1212), 20, 5; afun = LSurvival.int_0)

d, X = data[:, 4], data[:, 1:3]
weights = rand(length(d))

# survival outcome:
R = LSurvivalResp(int, outt, d, ID.(id))    # specification with ID only
source
LSurvival.calcpMethod

Two-tailed p-value for a (null) standard normal distribution depends on SpecialFunctions https://en.wikipedia.org/wiki/Normal_distribution

    calcp(z) = 1.0 - SpecialFunctions.erf(abs(z) / sqrt(2))

    calcp(1.96)

source
LSurvival.cdfchisqMethod

quantile function for a chi-squared distribution depends on SpecialFunctions https://en.wikipedia.org/wiki/Chi-squared_distribution

Source code, example:

    cdfchisq(df, x) = SpecialFunctions.gamma_inc(df / 2, x / 2, 0)[1]

    cdfchisq(3, 3.45)
source
LSurvival.cdfnormMethod

quantile function for a standard normal distribution depends on SpecialFunctions https://en.wikipedia.org/wiki/Normal_distribution

Source code, example:

    cdfnorm(z) = 0.5 * (1 + SpecialFunctions.erf(z / sqrt(2)))
    
    cdfnorm(1.96)
source
LSurvival.coxphMethod

Fit method for AbstractPH objects (Cox models)

Keyword arguments (used here, and passed on to internal structs)

  • ties "breslow" or "efron" (default)

  • wts observation weights

  • ties "breslow" or "efron" (default)

  • offset not currently used at all

  • fitargs arguments passed to other structs, which include

    • id cluster or individual level ID (defaults to a unique value for each row of data) see note below on ID
    • contrasts StatsModel style contrasts (dicts) that can be used for variable transformations/indicator variable creation (e.g. https://juliastats.org/StatsModels.jl/stable/contrasts/)
  • Arguments passed onto fitting routine:

    • eps (default: Float64 = 1e-9) tolerance for declaring convergence. Model is determined to be converged when relative change in log-partial likelihood is < eps .
    • getbasehaz (default: true): estimate baseline hazard
    • start (default: nothing) nothing, or vector of floats corresponding to initial values for parameters. Note that this defaults to a vector of zeros when set to nothing, and setting to other values invalidates some of the test statistics reported by default with coxph.
    • keepx (default: true) logical. Keep design matrix in AbstractPH object output (set to false for slight computational gain).
    • keepy (default: true)logical. Keep outcome in AbstractPH object output (set to false for slight computational gain).
    • bootstrap_sample (default: false) Fit the model to a bootstrap sample of the data (not generally used by end-users, but provides some convenience toward bootstrap variance estimation).
    • bootstrap_rng (default: Random.MersenneTwister()) Random number seed used when drawing a bootstrap sample of the data (not generally used by end-users, but provides some convenience toward bootstrap variance estimation).

    Signatures

  fit(::Type{M},
  X::AbstractMatrix,#{<:FP},
  enter::AbstractVector{<:Real},
  exit::AbstractVector{<:Real},
  y::Union{AbstractVector{<:Real},BitVector}
  ;
  ties ="breslow",
  wts::AbstractVector{<:Real}      = similar(y, 0),
  offset::AbstractVector{<:Real}   = similar(y, 0),
  fitargs...) where {M<:AbstractPH}
 coxph(f::FormulaTerm, data; kwargs...)
  coxph(X, enter, exit, y, args...; kwargs...)
   using LSurvival, Random
   z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
   enter = zeros(length(t));
   X = hcat(x,rand(length(x)));
    m = fit(PHModel, X, enter, t, d, ties="efron")
   m2 = fit(PHModel, X, enter, t, d, ties="breslow")
   coeftable(m)

Note on use of id keyword

id is not needed in person-period structure data for standard estimates or confidence intervals

  using Random, LSurvival
     id, int, outt, dat =
         LSurvival.dgm(MersenneTwister(123123), 100, 100; afun = LSurvival.int_0)
     data = (
             int = int,
             outt = outt,
             d = dat[:,4] .== 1,
             x = dat[:,1],
             z = dat[:,2]
     )

     f = @formula(Surv(int, outt,d)~x+z)
     coxph(f, data)

BUT, you must specify id to get appropriate robust variance and some other statistics.

Here is an example where the same data are presented in two different ways, which should yield identical statistics when used in Cox model.

 dat1 = (
    time = [1,1,6,6,8,9],
    status = [1,0,1,1,0,1],
    x = [1,1,1,0,0,0]
  )
  ft = coxph(@formula(Surv(time,status)~x),dat1)
  bic(ft)
  nobs(ft)
  dof_residual(ft)
  # lrtest is another one

  stderror(ft)                     # model based
  stderror(ft, type="robust")   # robust standard error, based on dfbeta residuals
  ft

  # now using "clustered" data with multiple observations per individual
 dat1clust= (
     id = [1,2,3,3,4,4,5,5,6,6],
     enter = [0,0,0,1,0,1,0,1,0,1],
     exit = [1,1,1,6,1,6,1,8,1,9],
     status = [1,0,0,1,0,1,0,0,0,1],
     x = [1,1,1,1,0,0,0,0,0,0]
 )
 
 # use the `id` parameter with the ID struct
 ft2 = coxph(@formula(Surv(enter, exit, status) ~ x),dat1clust, id=ID.(dat1clust.id))
 bic(ft2)                       # CORRECT        
 nobs(ft2)                      # CORRECT
 dof_residual(ft2)              # CORRECT
  
 stderror(ft2)                  # model based (CORRECT)
 stderror(ft2, type="robust")   # robust standard error, based on `id` level dfbeta residuals (CORRECT)
 # once robust SE is calculated, coefficient table uses the robust SE for confidence intervals and test statistics
 ft2   # CORRECT (compare to `ft` object)

NOTE THE FOLLOWING IS INCORRECT because the id keyword is omitted

 ft2w = coxph(@formula(Surv(enter, exit, status) ~ x),dat1clust)
 bic(ft2w)                          # INCORRECT 
 nobs(ft2w)                         # INCORRECT
 dof_residual(ft2w)                 # INCORRECT

 stderror(ft2w)                     # model based (CORRECT)
 stderror(ft2w, type="robust")      # robust variance (INCORRECT)
 
 ft2w # the coefficient table now shows incorrect confidence intervals and test statistics
  
source
LSurvival.ddloglik!Method

Hessian contribution for an observation in a parametric survival model

    d = m.d
    i = 1
    enter = m.R.enter[i]
    exit = m.R.exit[i]
    y = m.R.y[i]
    wts = m.R.wts[i]
    x = m.P.X[i,:]
    θ=[1,0,.4]
source
LSurvival.dgmMethod

Generating discrete survival data without competing risks

Signatures

dgm(rng::MersenneTwister, n::Int, maxT:Int; afun = int_0, yfun = yprob, lfun = lprob)

dgm(n::Int, maxT::Int; kwargs...)

Usage: dgm(rng, n, maxT;afun=int0, yfun=yprob, lfun=lprob) dgm(n, maxT;afun=int0, yfun=yprob, lfun=lprob)

Where afun, yfun, and lfun are all functions that take arguments v,l,a and output time-specific values of a, y, and l respectively Example:


expit(mu) =  inv(1.0+exp(-mu))

function aprob(v,l,a)
expit(-1.0 + 3*v + 2*l)
end
  
function lprob(v,l,a)
expit(-3 + 2*v + 0*l + 0*a)
end
  
function yprob(v,l,a)
expit(-3 + 2*v + 0*l + 2*a)
end
  # 10 individuals followed for up to 5 times
LSurvival.dgm(10, 5;afun=aprob, yfun=yprob, lfun=lprob)
source
LSurvival.dgm_compriskMethod

Generating continuous survival data with competing risks

Signatures

dgm_comprisk(rng::MersenneTwister, n::Int)

dgm_comprisk(n::Int)
    - rng = random number generator    
    - n = sample size

Example:

using LSurvival
# 100 individuals with two competing events
z,x,t,d,event,weights = LSurvival.dgm_comprisk(100)
    
source
LSurvival.dgm_phmodelMethod

Proportional hazards model from a Weibull distribution with scale parameter λ

dgm_phmodel(rng::MersenneTwister, n::Int; 
    λ=1.25,
    β=[0.0, 0.0]
    )

keyword parameters:

  • λ: Weibull scale parameter
  • β: vector of regression coefficients
rng = MersenneTwister()
X, t, d, _ = dgm_phmodel(2000; λ=1.25,β=[1.0, -0.5])
coxph(@formula(Surv(t0,t,d)~x+z), (t=t,t0=t.*0,d=d,x=X[:,1],z=X[:,2]))
source
LSurvival.dloglik!Method

Gradient contribution for an observation in a parametric survival model

    d = m.d
    i = 1
    enter = m.R.enter[i]
    exit = m.R.exit[i]
    y = m.R.y[i]
    wts = m.R.wts[i]
    x = m.P.X[i,:]
    θ=[1,0,.4]
source
LSurvival.dlpdf_gengammaMethod

α=0.1 ρ =-1.2 κ=1.9 t = 2.0

exp((log(t) - α)exp(-ρ) - ρ) - exp(κ - ρ) (α - log(t))exp(κ - ρ) + (log(t) - α)exp((log(t) - α)exp(-ρ) - ρ) - 1 (log(t) - α)exp(κ - ρ) - exp(κ)SpecialFunctions.digamma(exp(κ))

source
LSurvival.jackknifeMethod

Obtain jackknife (leave-one-out) estimates from a Aalen-Johansen risk curve (risk at end of follow-up) by refitting the model n times

Signatures

jackknife(m::M;kwargs...) where {M<:AJSurv}
source
LSurvival.jackknifeMethod

Obtain jackknife (leave-one-out) estimates from a Kaplan-Meier survival curve (survival at end of follow-up) by refitting the model n times

Signatures

jackknife(m::M;kwargs...) where {M<:KMSurv}
using LSurvival, Random, StatsBase

dat1 = (time = [1, 1, 6, 6, 8, 9], status = [1, 0, 1, 1, 0, 1], x = [1, 1, 1, 0, 0, 0])

dat1clust = (
  id = [1, 2, 3, 3, 4, 4, 5, 5, 6, 6],
  enter = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1],
  exit = [1, 1, 1, 6, 1, 6, 1, 8, 1, 9],
  status = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1],
  x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
)

m = kaplan_meier(dat1.time, dat1.status)
a = aalen_johansen(dat1.time, dat1.status)
mc = kaplan_meier(dat1clust.enter, dat1clust.exit, dat1clust.status, id=ID.(dat1clust.id))
ac = aalen_johansen(dat1clust.enter, dat1clust.exit, dat1clust.status, id=ID.(dat1clust.id))
jk = jackknife(m);
jkc = jackknife(mc);
jka = jackknife(a);
bs = bootstrap(mc, 100);
std(bs[:,1])
stderror(m, type="jackknife")
stderror(mc, type="jackknife")
@assert jk == jkc
source
LSurvival.jackknifeMethod

Obtain jackknife (leave-one-out) estimates from a Cox model by refitting the model n times

using LSurvival, Random, StatsBase
id, int, outt, data =
LSurvival.dgm(MersenneTwister(112), 100, 10; afun = LSurvival.int_0)
data[:, 1] = round.(data[:, 1], digits = 3)
d, X = data[:, 4], data[:, 1:3]
wt = rand(length(d))
wt ./= (sum(wt) / length(wt))
m = coxph(X,int, outt,d, wts=wt, id=ID.(id))

jk = jackknife(m);
bs = bootstrap(MersenneTwister(12321), m, 1000);
N = nobs(m)
#comparing estimate with jackknife estimate with bootstrap mean
hcat(coef(m), mean(jk, dims=1)[1,:], mean(bs, dims=1)[1,:])
semb = stderror(m)
sebs = std(bs, dims=1)
sero = stderror(m, type="robust")
sejk = stderror(m, type="jackknife")
sejk_manual = std(jk, dims=1, corrected=false) .* sqrt(N-1)

sqrt.(diag(LSurvival.jackknife_vcov(m)))

hcat(semb, sebs[1,:], sejk, sejk_manual[1,:], sero)

dat1 = (time = [1, 1, 6, 6, 8, 9], status = [1, 0, 1, 1, 0, 1], x = [1, 1, 1, 0, 0, 0])
dat1clust = (
  id = [1, 2, 3, 3, 4, 4, 5, 5, 6, 6],
  enter = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1],
  exit = [1, 1, 1, 6, 1, 6, 1, 8, 1, 9],
  status = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1],
  x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
)

m = coxph(@formula(Surv(time, status)~x),dat1)
mc = coxph(@formula(Surv(enter, exit, status)~x),dat1clust, id=ID.(dat1clust.id))
jk = jackknife(m);
jkc = jackknife(mc);
bs = bootstrap(mc, 100);
std(bs[:,1])
stderror(m, type="jackknife")
stderror(mc, type="jackknife")
@assert jk == jkc
source
LSurvival.kaplan_meierMethod

Kaplan-Meier estimator for cumulative conditional risk

Signatures

StatsBase.fit!(m::T; kwargs...) where {T<:AbstractNPSurv}

kaplan_meier(enter::AbstractVector, exit::AbstractVector, y::AbstractVector,
   ; kwargs...)

Keyword arguments

  • wts::Vector{<:Real} = similar(enter, 0); vector of case weights (or zero length vector) for each observation
  • id::Vector{<:AbstractLSurvivalID} = [ID(i) for i in eachindex(y)]; Vector of AbstractSurvID objects denoting observations that form a single unit (used in bootstrap and jackknife methods)
  • atol = 0.00000001; absolute tolerance for defining tied event times
  • censval = 0; value of the outcome to be considered a censored event
  • keepy = true; keep the outcome vector after fitting (may save memory with large datasets)
  • eps = 0.00000001; deprecated (replaced by atol)
using LSurvival
using Random
z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
enter = zeros(length(t));
m = fit(KMSurv, enter, t, d)
mw = fit(KMSurv, enter, t, d, wts=wt)

or, equivalently:

kaplan_meier(enter, t, d, wts=wt)
source
LSurvival.lgh!Method

dat1 = (time = [1, 1, 6, 6, 8, 9], status = [1, 0, 1, 1, 0, 1], x = [1, 1, 1, 0, 0, 0]) enter = zeros(length(dat1.time)) t = dat1.time d = dat1.status X = hcat(ones(length(dat1.x)), dat1.x) wt = ones(length(t))

dist = Exponential() P = PSParms(X[:,1:1], extraparms=length(dist)-1) P = PSParms(X, extraparms=length(dist)-1) P.B P.grad R = LSurvivalResp(dat1.time, dat1.status) # specification with ID only m = PSModel(R,P,dist)

λ=1 θ = rand(2) lgh!(m, θ) θ .+= inv(-m.P.hess) * m.P.grad * λ

lgh!(m, θ .+ [-.00, 0, 0]) lgh!(m, θ .+ [-.01, 0.05, 0.05])

m.P._LL

source
LSurvival.lgh_breslow!Method

Update the partial likelihood, gradient and Hessian values from a Cox model fit (used during fitting, not generally useful for users).

Uses Breslow's partial likelihood.

Updates over all observations

Signature

lgh_breslow!(m::M, j, caseidx, risksetidx) where {M<:AbstractPH}
source
LSurvival.lgh_efron!Method

Update the partial likelihood, gradient and Hessian values from a Cox model fit (used during fitting, not generally useful for users).

Uses Efron's partial likelihood.

Updates over all observations

Signature

lgh_efron!(m::M, j, caseidx, risksetidx) where {M<:AbstractPH}
source
LSurvival.loglikMethod

Log likelihood contribution for an observation in a parametric survival model

    d = m.d
    i = 1
    enter = m.R.enter[i]
    exit = m.R.exit[i]
    y = m.R.y[i]
    wts = m.R.wts[i]
    x = m.P.X[i,:]
    θ=[1,0,.4]
    
    m.P._B
source
LSurvival.lpdfMethod

Log-likelihood calculation for Exponential regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Exponential()
lpdf_gradient(d, θ, t, x)
source
LSurvival.lpdfMethod

Log probability distribution function: Exponential distribution

    β = [-2, 1.2]
    x = [2,.1]
    ρ = -0.5
    t = 3.0
    α = dot(β,x)
    d = Exponential()
    lpdf(d, t)
source
LSurvival.lpdfMethod

log probability distribution function, generalized gamma distribution

Location scale representation (Klein Moeschberger ch 12)

source
LSurvival.lpdfMethod

log probability distribution function, Gamma distribution

Location scale representation (Klein Moeschberger ch 12)

source
LSurvival.lpdfMethod

Log likelihood calculation for Lognormal regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Lognormal()
lpdf(d, vcat(θ,ρ), t, x)
source
LSurvival.lpdfMethod

log probability distribution function: Weibull distribution

Location scale representation (Klein Moeschberger ch 12)

source
LSurvival.lpdfMethod

Log-likelihood calculation for weibull regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Weibull()
lpdf(d, vcat(θ,ρ), t, x)
source
LSurvival.lpdfMethod

Log probability distribution function: Weibull distribution

location scale representation (Klein Moeschberger ch 12)

α=0.1   # location
ρ=-1.2  # log(scale)
time=2
lpdf(Weibull(α, ρ), time)
source
LSurvival.lpdf_gradientMethod

Gradient calculation for Exponential regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Exponential()
lpdf_gradient(d, θ, t, x)
source
LSurvival.lpdf_gradientMethod

Gradient calculation for Lognormal regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Lognormal()
lpdf_gradient(d, vcat(θ,ρ), t, x)
source
LSurvival.lpdf_gradientMethod

Gradient calculation for weibull regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
#α = dot(β,x)
d = Weibull()
lpdf_gradient(d, vcat(θ,ρ), t, x)
source
LSurvival.lpdf_hessianMethod

Hessian calculation for Exponential regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Exponential()
lpdf_hessian(d, θ, t, x)
source
LSurvival.lpdf_hessianMethod

Hessian calculation for Weibull distribution: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Exponential(α)
lpdf_hessian(d, t)
source
LSurvival.lpdf_hessianMethod

Hessian calculation for Log-normal regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Lognormal()
lpdf_hessian(d, vcat(θ,ρ), t, x)
source
LSurvival.lpdf_hessianMethod

Hessian calculation for Log-normal distribution: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Lognormal(α, ρ)
lpdf_hessian(d, t)
source
LSurvival.lpdf_hessianMethod

Hessian calculation for weibull regression: PDF

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Weibull()
lpdf_hessian(d, vcat(θ,ρ), t, x)
source
LSurvival.lsurvMethod

Log-likelihood calculation for Exponential regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Exponential()
lsurv(d, θ, t, x)
source
LSurvival.lsurvMethod

Log survival function: Exponential distribution

    β = [-2, 1.2]
    x = [2,.1]
    ρ = -0.5
    t = 3.0
    α = dot(β,x)
    d = Exponential()
    lsurv(d, t)
source
LSurvival.lsurvMethod

log probability distribution function, generalized gamma distribution

Location scale representation (Klein Moeschberger ch 12)

source
LSurvival.lsurvMethod

log probability distribution function, Gamma distribution

Location scale representation (Klein Moeschberger ch 12)

source
LSurvival.lsurvMethod

Log likelihood calculation for Log-normal regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Lognormal()
lsurv(d, vcat(θ,ρ), t, x)
source
LSurvival.lsurvMethod

log probability distribution function: Weibull distribution

Location scale representation (Klein Moeschberger ch 12)

source
LSurvival.lsurvMethod

Log-likelihood calculation for weibull regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Weibull()
lsurv(d, vcat(θ,ρ), t, x)
source
LSurvival.lsurvMethod

Log survival distribution function: Weibull distribution

location, log(scale) representation (Klein Moeschberger ch 12)

α=0.1   # location
ρ=-1.2  # log(scale)
time=2
lsurv(Weibull(α, ρ), time)
source
LSurvival.lsurv_gradientMethod

Gradient calculation for Exponential regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Exponential()
lsurv_gradient(d, θ, t, x)
source
LSurvival.lsurv_gradientMethod

Gradient calculation for Log-normal regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Lognormal()
lsurv_gradient(d, vcat(θ,ρ), t, x)
source
LSurvival.lsurv_gradientMethod

Gradient calculation for weibull regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Weibull()
lsurv_gradient(d, vcat(θ,ρ), t, x)
source
LSurvival.lsurv_hessianMethod

Hessian calculation for Exponential regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Exponential()
lsurv_hessian(d, θ, t, x)
source
LSurvival.lsurv_hessianMethod

Hessian calculation for Exponential distribution: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Exponential(α)
lsurv_hessian(d, t)
source
LSurvival.lsurv_hessianMethod

Hessian calculation for Log-normal regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Lognormal()
lsurv_hessian(d, vcat(θ,ρ), t, x)
source
LSurvival.lsurv_hessianMethod

Hessian calculation for Log-normal distribution: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Lognormal(α, ρ)
lsurv_hessian(d, t)
source
LSurvival.lsurv_hessianMethod

Hessian calculation for weibull regression: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Weibull()
lsurv_hessian(d, vcat(θ,ρ), t, x)
source
LSurvival.lsurv_hessianMethod

Hessian calculation for Weibull distribution: Survival

β = [-2, 1.2]
x = [2,.1]
ρ = -0.5
t = 3.0
α = dot(β,x)
d = Weibull(α, ρ)
lsurv_hessian(d, t)
source
LSurvival.qstdnormMethod

quantile function for a standard normal distribution depends on SpecialFunctions https://en.wikipedia.org/wiki/Normal_distribution

Source code, example:
    qstdnorm(p) = sqrt(2) * SpecialFunctions.erfinv(2.0 * p - 1.0)
    
    qstdnorm(.975)
source
LSurvival.qweibullMethod

Quantile function for the Weibull distribution

\[F(t) = egin{cases} 1 - e^{{-t/ρ}^{α}}&, t ≥ 0\ 0&, t < 0 nd{cases} Q(p) = ρ * (log(1/(1-p))^{1/α})\]

lightweight function used for simulation

Note that there is no checking that parameters α,ρ are positively bound, and p ∈ (0,1), and errors will be given if this is not the case

Signature:

qweibull(p::Real,α::Real,ρ::Real)

Source code, example:

qweibull(p, α, ρ) = ρ * ((-log1p(-p))^(1 / α))

# cross reference the approach in the Distributions package
quantile(Distributions.Weibull(.75, 1.1), .3)
LSurvival.qweibull(0.3, .75, 1.1)
source
LSurvival.randweibullMethod

Random draw from Weibull distribution

lightweight function used for simulation

Note that there is no checking that parameters α,ρ are positively bound, and errors will be given if this is not the case

Signatures:

randweibull(rng::MersenneTwister,α::Real,ρ::Real)
randweibull(α::Real,ρ::Real)

Source code, example:

randweibull(rng, α, ρ) = qweibull(rand(rng), α, ρ)
randweibull(α, ρ) = randweibull(MersenneTwister(), α, ρ)

# cross reference the approach in the Distributions package
rand(Distributions.Weibull(.75, 1.1))
randweibull(.75, 1.1)
source
LSurvival.risk_from_coxphmodelsMethod

Survival curve estimation using multiple cox models

Signatures

  risk_from_coxphmodels(fitlist::Vector{T}, args...; kwargs...) where {T<:PHModel}

  fit(::Type{M}, fitlist::Vector{T}, ; fitargs...) where {M<:PHSurv,T<:PHModel}

Optional keywords

  • coef_vectors = nothing(default) or vector of coefficient vectors from the cox models [will default to the coefficients from fitlist models]
  • pred_profile = nothing(default) or vector of specific predictor values of the same length as the coef_vectors[1]
 using LSurvival
 using Random
 # event variable is coded 0[referent],1,2
 z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
 enter = zeros(length(t));

 ft1 = coxph(hcat(x,z), enter, t, (event .== 1))
 nidx = findall(event .!= 1)
 ft2 = coxph(hcat(x,z)[nidx,:], enter[nidx], t[nidx], (event[nidx] .== 2))

 # risk at referent levels of `x` and `z`
 risk_from_coxphmodels([ft1,ft2])

 # risk at average levels of `x` and `z`
 mnx = sum(x)/length(x)
 mnz = sum(z)/length(z)
 # equivalent
 fit(PHSurv, [ft1,ft2], pred_profile=[mnx,mnz])
 risk_from_coxphmodels([ft1,ft2], pred_profile=[mnx,mnz])
source
LSurvival.survivaldataMethod

Loading example survival analysis datasets

using LSurvival, Plots # note Plots does not install by default
heartdata, heartmeta = survivaldata("heart")
ft = coxph(@formula(Surv(start, stop, event)~surgery), heartdata);
# plot baseline cumulative hazard
basehazplot(ft)
# plot Schoenfeld residuals
coxdx(ft)  
source
RecipesBase.apply_recipeMethod

Recipe for aalen-johansen risk curve

    using Plots, LSurvival
    res = z, x, outt, d, event, weights = LSurvival.dgm_comprisk(MersenneTwister(123123), 100)
    int = zeros(length(d)) # no late entry
    
        c = fit(AJSurv, int, outt, event)
        #risk2 = aalen_johansen(int, outt, event)
        plot(c)
source
RecipesBase.apply_recipeMethod

Plotting a kaplan meier curve

    using Plots, LSurvival
dat4 = (
    id = [1, 1, 2, 2, 2, 3, 4, 5, 5, 6],
    enter = [1, 2, 5, 4, 6, 7, 3, 6, 8, 0],
    exit = [2, 5, 6, 7, 8, 9, 6, 8, 14, 9],
    status = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    x = [0.1, 0.1, 1.5, 1.5, 1.5, 0, 0, 0, 0, 3],
    z = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
    w = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
)
R = LSurvivalResp(dat4.enter, dat4.exit, dat4.status)
    k = kaplan_meier(dat4.enter, dat4.exit, dat4.status)
    plot(k)
source
RecipesBase.apply_recipeMethod
function name(::Type{T}) where {T}
    #https://stackoverflow.com/questions/70043313/get-simple-name-of-type-in-julia
    isempty(T.parameters) ? T : T.name.wrapper
end

using Plots, LSurvival
dat2 = (
    enter = [1, 2, 5, 2, 1, 7, 3, 4, 8, 8],
    exit = [2, 3, 6, 7, 8, 9, 9, 9, 14, 17],
    status = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    x = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0],
)
fte = survreg(@formula(Surv(enter, exit, status)~x), dat2)

# density function
aftdist(fte, label="X=0")
aftdist!(fte, covlevels=[1.0, 2.0], color="red", label="X=1")

# Survival function
aftdist(fte, type="surv", label="X=0")
aftdist!(fte, type="surv", covlevels=[1.0, 2.0], color="red", label="X=1")

# hazard function
aftdist(fte, type="haz", label="X=0")
aftdist!(fte, type="haz", covlevels=[1.0, 2.0], color="red", label="X=1")

# Cumulative incidence/risk function
aftdist(fte, type="risk", label="X=0")
aftdist!(fte, type="risk", covlevels=[1.0, 2.0], color="red", label="X=1")
source
RecipesBase.apply_recipeMethod

Plotting baseline hazard for a Cox model

using Plots, LSurvival
dat2 = (
    enter = [1, 2, 5, 2, 1, 7, 3, 4, 8, 8],
    exit = [2, 3, 6, 7, 8, 9, 9, 9, 14, 17],
    status = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    x = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0],
)
fte = coxph(@formula(Surv(enter, exit, status)~x), dat2, maxiter=0)
ftb = coxph(@formula(Surv(enter, exit, status)~x), dat2, ties="breslow", maxiter=0)

plot(fte, label="Efron")
plot!(ftb, label="Breslow")
source
RecipesBase.apply_recipeMethod

using Plots, LSurvival
dat2 = (
    enter = [1, 2, 5, 2, 1, 7, 3, 4, 8, 8],
    exit = [2, 3, 6, 7, 8, 9, 9, 9, 14, 17],
    status = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    x = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0],
)
fte = coxph(@formula(Surv(enter, exit, status)~x), dat2)

coxdx(fte)
source
RecipesBase.apply_recipeMethod
using Plots, LSurvival
dat2 = (
    enter = [1, 2, 5, 2, 1, 7, 3, 4, 8, 8],
    exit = [2, 3, 6, 7, 8, 9, 9, 9, 14, 17],
    status = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    x = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0],
)
fte = coxph(@formula(Surv(enter, exit, status)~x), dat2)

coxinfluence(fte, type="jackknife", par=1)
coxinfluence!(fte, type="dfbeta", color=:red, par=1)
source
RecipesBase.apply_recipeMethod

Plotting baseline hazard for a Cox model

using Plots, LSurvival
dat4 = (
    id = [1, 1, 2, 2, 2, 3, 4, 5, 5, 6],
    enter = [1, 2, 5, 4, 6, 7, 3, 6, 8, 0],
    exit = [2, 5, 6, 7, 8, 9, 6, 8, 14, 9],
    status = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    x = [0.1, 0.1, 1.5, 1.5, 1.5, 0, 0, 0, 0, 3],
    z = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
    w = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
)

k = kaplan_meier(dat4.enter, dat4.exit, dat4.status)

lognlogplot(k)
source
RecipesBase.apply_recipeMethod

Plotting LSurvivalResp objects (outcomes in cox models, kaplan meier curves, parametric survival models)

Recipe for plotting time-to-event outcomes

using Plots, LSurvival

dat4 = (
    id = [1, 1, 2, 2, 2, 3, 4, 5, 5, 6],
    enter = [1, 2, 5, 4, 6, 7, 3, 6, 8, 0],
    exit = [2, 5, 6, 7, 8, 9, 6, 8, 14, 9],
    status = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    x = [0.1, 0.1, 1.5, 1.5, 1.5, 0, 0, 0, 0, 3],
    z = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
    w = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
)
R = LSurvivalResp(dat4.enter, dat4.exit, dat4.status)
plot([[R.enter[i], R.exit[i]] for i in eachindex(R.enter)], [[i, i] for i in values(R.id)])
source
RecipesBase.apply_recipeMethod

Recipe for cox-model based risk curves

    using Plots, LSurvival, Random, StatsBase
    res = z, x, outt, d, event, wts = LSurvival.dgm_comprisk(MersenneTwister(123123), 100)
    X = hcat(z, x)
    int = zeros(length(d)) # no late entry
    ft1 = fit(PHModel, X, int, outt, d .* (event .== 1), wts=wts)
    ft2 = fit(PHModel, X, int, outt, d .* (event .== 2), wts=wts)
    c = risk_from_coxphmodels([ft1, ft2], pred_profile = mean(X, dims=1))
    
    plot(c)
source
StatsAPI.confintMethod

Greenwood's formula for variance and confidence intervals of a Aalen-Johansen risk function

Signatures:

StatsBase.stderror(m::AJSurv)

StatsBase.confint(m:AJSurv; level=0.95, method="normal")

Keyword arguments

  • method
    • "normal" normality-based confidence intervals
    • "lognlog" log(-log(S(t))) based confidence intervals
res = z, x, outt, d, event, wts = LSurvival.dgm_comprisk(MersenneTwister(123123), 100)
int = zeros(length(d)) # no late entry
m = fit(AJSurv, int, outt, event)
stderror(m)
confint(m, level=0.95)
source
StatsAPI.confintMethod

Greenwood's formula for variance and confidence intervals of a Kaplan-Meier survival curve

Signatures:

StatsBase.stderror(m::KMSurv)

StatsBase.confint(m:KMSurv; level=0.95, method="normal")

Keyword arguments

method:

  • "normal" normality-based confidence intervals
  • "lognlog" log(-log(S(t))) based confidence intervals
using LSurvival
using Random
z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
enter = zeros(length(t));
m = fit(KMSurv, enter, t, d)
mw = fit(KMSurv, enter, t, d, wts=wt)
stderror(m)
confint(m, method="normal")
confint(m, method="lognlog") # log-log transformation
source
StatsAPI.confintMethod
using LSurvival
 dat1= (
   time = [1,1,6,6,8,9],
   status = [1,0,1,1,0,1],
   x = [1,1,1,0,0,0]
 )

 ft = coxph(@formula(Surv(time, status) ~ x),dat1, keepx=true)
 # model-based variance
 confint(ft)

 # robust variance
 confint(ft, type="robust")

for cluster confidence intervals

 dat1clust= (
   id = [1,2,3,3,4,4,5,5,6,6],
   enter = [0,0,0,1,0,1,0,1,0,1],
   exit = [1,1,1,6,1,6,1,8,1,9],
   status = [1,0,0,1,0,1,0,0,0,1],
   x = [1,1,1,1,0,0,0,0,0,0]
 )

 ft2 = coxph(@formula(Surv(enter, exit, status) ~ x),dat1clust, id=ID.(dat1clust.id), keepx=true)
 # model-based variance
 confint(ft2)

 # robust variance
 confint(ft2, type="robust")
source
StatsAPI.fit!Method

Kaplan-Meier estimator for cumulative conditional risk

Signatures

StatsBase.fit!(m::T; kwargs...) where {T<:AbstractNPSurv}

kaplan_meier(enter::AbstractVector, exit::AbstractVector, y::AbstractVector,
   ; kwargs...)

Keyword arguments

  • wts::Vector{<:Real} = similar(enter, 0); vector of case weights (or zero length vector) for each observation
  • id::Vector{<:AbstractLSurvivalID} = [ID(i) for i in eachindex(y)]; Vector of AbstractSurvID objects denoting observations that form a single unit (used in bootstrap and jackknife methods)
  • atol = 0.00000001; absolute tolerance for defining tied event times
  • censval = 0; value of the outcome to be considered a censored event
  • keepy = true; keep the outcome vector after fitting (may save memory with large datasets)
  • eps = 0.00000001; deprecated (replaced by atol)
using LSurvival
using Random
z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
enter = zeros(length(t));
m = fit(KMSurv, enter, t, d)
mw = fit(KMSurv, enter, t, d, wts=wt)

or, equivalently:

kaplan_meier(enter, t, d, wts=wt)

Aalen-Johansen estimator for cumulative cause-specific risk (in the presence of competing events)

Signatures

 StatsBase.fit!(m::T; kwargs...) where {T<:AbstractNPSurv}

 aalen_johansen(enter::AbstractVector, exit::AbstractVector, y::AbstractVector,
   ; kwargs...)

Keyword arguments

  • wts::Vector{<:Real} = similar(enter, 0); vector of case weights (or zero length vector) for each observation
  • id::Vector{<:AbstractLSurvivalID} = [ID(i) for i in eachindex(y)]; Vector of AbstractSurvID objects denoting observations that form a single unit (used in bootstrap and jackknife methods)
  • atol = 0.00000001; absolute tolerance for defining tied event times
  • keepy = true; keep the outcome vector after fitting (may save memory with large datasets)
  • eps = 0.00000001; deprecated (replaced by atol)
using LSurvival
using Random
z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
enter = zeros(length(t));
   # event variable is coded 0[referent],1,2
m = fit(AJSurv, enter, t, event)
mw = fit(AJSurv, enter, t, event, wts=wt)

or, equivalently:

aalen_johansen(enter, t, event, wts=wt)

Survival curve estimation using multiple cox models

Signatures

  risk_from_coxphmodels(fitlist::Vector{T}, args...; kwargs...) where {T<:PHModel}

  fit(::Type{M}, fitlist::Vector{T}, ; fitargs...) where {M<:PHSurv,T<:PHModel}

Optional keywords

  • coef_vectors = nothing(default) or vector of coefficient vectors from the cox models [will default to the coefficients from fitlist models]
  • pred_profile = nothing(default) or vector of specific predictor values of the same length as the coef_vectors[1]
 using LSurvival
 using Random
 # event variable is coded 0[referent],1,2
 z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
 enter = zeros(length(t));

 ft1 = coxph(hcat(x,z), enter, t, (event .== 1))
 nidx = findall(event .!= 1)
 ft2 = coxph(hcat(x,z)[nidx,:], enter[nidx], t[nidx], (event[nidx] .== 2))

 # risk at referent levels of `x` and `z`
 risk_from_coxphmodels([ft1,ft2])

 # risk at average levels of `x` and `z`
 mnx = sum(x)/length(x)
 mnz = sum(z)/length(z)
 # equivalent
 fit(PHSurv, [ft1,ft2], pred_profile=[mnx,mnz])
 risk_from_coxphmodels([ft1,ft2], pred_profile=[mnx,mnz])
source
StatsAPI.fitMethod

Survival curve estimation using multiple cox models

Signatures

  risk_from_coxphmodels(fitlist::Vector{T}, args...; kwargs...) where {T<:PHModel}

  fit(::Type{M}, fitlist::Vector{T}, ; fitargs...) where {M<:PHSurv,T<:PHModel}

Optional keywords

  • coef_vectors = nothing(default) or vector of coefficient vectors from the cox models [will default to the coefficients from fitlist models]
  • pred_profile = nothing(default) or vector of specific predictor values of the same length as the coef_vectors[1]
 using LSurvival
 using Random
 # event variable is coded 0[referent],1,2
 z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
 enter = zeros(length(t));

 ft1 = coxph(hcat(x,z), enter, t, (event .== 1))
 nidx = findall(event .!= 1)
 ft2 = coxph(hcat(x,z)[nidx,:], enter[nidx], t[nidx], (event[nidx] .== 2))

 # risk at referent levels of `x` and `z`
 risk_from_coxphmodels([ft1,ft2])

 # risk at average levels of `x` and `z`
 mnx = sum(x)/length(x)
 mnz = sum(z)/length(z)
 # equivalent
 fit(PHSurv, [ft1,ft2], pred_profile=[mnx,mnz])
 risk_from_coxphmodels([ft1,ft2], pred_profile=[mnx,mnz])
source
StatsAPI.fitMethod

Fit method for AbstractPH objects (Cox models)

Keyword arguments (used here, and passed on to internal structs)

  • ties "breslow" or "efron" (default)

  • wts observation weights

  • ties "breslow" or "efron" (default)

  • offset not currently used at all

  • fitargs arguments passed to other structs, which include

    • id cluster or individual level ID (defaults to a unique value for each row of data) see note below on ID
    • contrasts StatsModel style contrasts (dicts) that can be used for variable transformations/indicator variable creation (e.g. https://juliastats.org/StatsModels.jl/stable/contrasts/)
  • Arguments passed onto fitting routine:

    • eps (default: Float64 = 1e-9) tolerance for declaring convergence. Model is determined to be converged when relative change in log-partial likelihood is < eps .
    • getbasehaz (default: true): estimate baseline hazard
    • start (default: nothing) nothing, or vector of floats corresponding to initial values for parameters. Note that this defaults to a vector of zeros when set to nothing, and setting to other values invalidates some of the test statistics reported by default with coxph.
    • keepx (default: true) logical. Keep design matrix in AbstractPH object output (set to false for slight computational gain).
    • keepy (default: true)logical. Keep outcome in AbstractPH object output (set to false for slight computational gain).
    • bootstrap_sample (default: false) Fit the model to a bootstrap sample of the data (not generally used by end-users, but provides some convenience toward bootstrap variance estimation).
    • bootstrap_rng (default: Random.MersenneTwister()) Random number seed used when drawing a bootstrap sample of the data (not generally used by end-users, but provides some convenience toward bootstrap variance estimation).

    Signatures

  fit(::Type{M},
  X::AbstractMatrix,#{<:FP},
  enter::AbstractVector{<:Real},
  exit::AbstractVector{<:Real},
  y::Union{AbstractVector{<:Real},BitVector}
  ;
  ties ="breslow",
  wts::AbstractVector{<:Real}      = similar(y, 0),
  offset::AbstractVector{<:Real}   = similar(y, 0),
  fitargs...) where {M<:AbstractPH}
 coxph(f::FormulaTerm, data; kwargs...)
  coxph(X, enter, exit, y, args...; kwargs...)
   using LSurvival, Random
   z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
   enter = zeros(length(t));
   X = hcat(x,rand(length(x)));
    m = fit(PHModel, X, enter, t, d, ties="efron")
   m2 = fit(PHModel, X, enter, t, d, ties="breslow")
   coeftable(m)

Note on use of id keyword

id is not needed in person-period structure data for standard estimates or confidence intervals

  using Random, LSurvival
     id, int, outt, dat =
         LSurvival.dgm(MersenneTwister(123123), 100, 100; afun = LSurvival.int_0)
     data = (
             int = int,
             outt = outt,
             d = dat[:,4] .== 1,
             x = dat[:,1],
             z = dat[:,2]
     )

     f = @formula(Surv(int, outt,d)~x+z)
     coxph(f, data)

BUT, you must specify id to get appropriate robust variance and some other statistics.

Here is an example where the same data are presented in two different ways, which should yield identical statistics when used in Cox model.

 dat1 = (
    time = [1,1,6,6,8,9],
    status = [1,0,1,1,0,1],
    x = [1,1,1,0,0,0]
  )
  ft = coxph(@formula(Surv(time,status)~x),dat1)
  bic(ft)
  nobs(ft)
  dof_residual(ft)
  # lrtest is another one

  stderror(ft)                     # model based
  stderror(ft, type="robust")   # robust standard error, based on dfbeta residuals
  ft

  # now using "clustered" data with multiple observations per individual
 dat1clust= (
     id = [1,2,3,3,4,4,5,5,6,6],
     enter = [0,0,0,1,0,1,0,1,0,1],
     exit = [1,1,1,6,1,6,1,8,1,9],
     status = [1,0,0,1,0,1,0,0,0,1],
     x = [1,1,1,1,0,0,0,0,0,0]
 )
 
 # use the `id` parameter with the ID struct
 ft2 = coxph(@formula(Surv(enter, exit, status) ~ x),dat1clust, id=ID.(dat1clust.id))
 bic(ft2)                       # CORRECT        
 nobs(ft2)                      # CORRECT
 dof_residual(ft2)              # CORRECT
  
 stderror(ft2)                  # model based (CORRECT)
 stderror(ft2, type="robust")   # robust standard error, based on `id` level dfbeta residuals (CORRECT)
 # once robust SE is calculated, coefficient table uses the robust SE for confidence intervals and test statistics
 ft2   # CORRECT (compare to `ft` object)

NOTE THE FOLLOWING IS INCORRECT because the id keyword is omitted

 ft2w = coxph(@formula(Surv(enter, exit, status) ~ x),dat1clust)
 bic(ft2w)                          # INCORRECT 
 nobs(ft2w)                         # INCORRECT
 dof_residual(ft2w)                 # INCORRECT

 stderror(ft2w)                     # model based (CORRECT)
 stderror(ft2w, type="robust")      # robust variance (INCORRECT)
 
 ft2w # the coefficient table now shows incorrect confidence intervals and test statistics
  
source
StatsAPI.fitMethod

Aalen-Johansen estimator for cumulative cause-specific risk (in the presence of competing events)

Signatures

 StatsBase.fit!(m::T; kwargs...) where {T<:AbstractNPSurv}

 aalen_johansen(enter::AbstractVector, exit::AbstractVector, y::AbstractVector,
   ; kwargs...)

Keyword arguments

  • wts::Vector{<:Real} = similar(enter, 0); vector of case weights (or zero length vector) for each observation
  • id::Vector{<:AbstractLSurvivalID} = [ID(i) for i in eachindex(y)]; Vector of AbstractSurvID objects denoting observations that form a single unit (used in bootstrap and jackknife methods)
  • atol = 0.00000001; absolute tolerance for defining tied event times
  • keepy = true; keep the outcome vector after fitting (may save memory with large datasets)
  • eps = 0.00000001; deprecated (replaced by atol)
using LSurvival
using Random
z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
enter = zeros(length(t));
   # event variable is coded 0[referent],1,2
m = fit(AJSurv, enter, t, event)
mw = fit(AJSurv, enter, t, event, wts=wt)

or, equivalently:

aalen_johansen(enter, t, event, wts=wt)
source
StatsAPI.fitMethod

Kaplan-Meier estimator for cumulative conditional risk

Signatures

StatsBase.fit!(m::T; kwargs...) where {T<:AbstractNPSurv}

kaplan_meier(enter::AbstractVector, exit::AbstractVector, y::AbstractVector,
   ; kwargs...)

Keyword arguments

  • wts::Vector{<:Real} = similar(enter, 0); vector of case weights (or zero length vector) for each observation
  • id::Vector{<:AbstractLSurvivalID} = [ID(i) for i in eachindex(y)]; Vector of AbstractSurvID objects denoting observations that form a single unit (used in bootstrap and jackknife methods)
  • atol = 0.00000001; absolute tolerance for defining tied event times
  • censval = 0; value of the outcome to be considered a censored event
  • keepy = true; keep the outcome vector after fitting (may save memory with large datasets)
  • eps = 0.00000001; deprecated (replaced by atol)
using LSurvival
using Random
z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
enter = zeros(length(t));
m = fit(KMSurv, enter, t, d)
mw = fit(KMSurv, enter, t, d, wts=wt)

or, equivalently:

kaplan_meier(enter, t, d, wts=wt)
source
StatsAPI.loglikelihoodMethod

Maximum log partial likelihood for a fitted AbstractPH model Efron or Breslow (depending on the ties` parameter)

source
StatsAPI.nullloglikelihoodMethod

Null log-partial likelihood for a fitted AbstractPH model Efron or Breslow (depending on the ties` parameter)

Note: this is just the log partial likelihood at the initial values of the model, which default to 0. If initial values are non-null, then this function no longer validly returns the null log-partial likelihood.

source
StatsAPI.nullloglikelihoodMethod

Null log-partial likelihood for a fitted PSModel model

Note: this is just the log partial likelihood at the initial values of the model, which default to 0. If initial values are non-null, then this function no longer validly returns the null log-partial likelihood.

source
StatsAPI.residualsMethod

####################################################################

Cox proportional hazards model residuals:

Signature

  residuals(m::M; type = "martingale") where {M<:PHModel}

where type is one of

  • martingale

  • schoenfeld

  • score

  • dfbeta

  • jackknife

  • dfbetas (scaled dfbeta)

  • scaled_schoenfeld or schoenfelds (scaled Schoenfeld)

    Residuals from the residuals function are designed to exactly emulate those from the survival package in R. Currently, they are validated for single observation data (e.g. one data row per individual).

    ####################################################################

    Martingale residuals: Observed versus expected

  # example from https://cran.r-project.org/web/packages/survival/vignettes/validate.pdf
  # by Terry Therneau

  dat1 = (
    time = [1,1,6,6,8,9],
    status = [1,0,1,1,0,1],
    x = [1,1,1,0,0,0]
  )

  # Nelson-Aalen type estimator for Breslow partial likelihood
  ft = coxph(@formula(Surv(time,status)~x),dat1, keepx=true, keepy=true, ties="breslow")
  residuals(ft, type="martingale")
  dat1 = (
    time = [1,1,6,6,8,9],
    status = [1,0,1,1,0,1],
    x = [1,1,1,0,0,0]
  )

  # Fleming-Harrington type estimator for Efron partial likelihood
  ft = coxph(@formula(Surv(time,status)~x),dat1, keepx=true, keepy=true, ties="efron")
  residuals(ft, type="martingale")

####################################################################

Score residuals: Per observation contribution to score function

  using LSurvival
  dat1 = (
    time = [1,1,6,6,8,9],
    status = [1,0,1,1,0,1],
    x = [1,1,1,0,0,0]
  )
  ft = coxph(@formula(Surv(time,status)~x),dat1, keepx=true, keepy=true, ties="breslow")
  S = residuals(ft, type="score")[:]
  ft = coxph(@formula(Surv(time,status)~x),dat1, keepx=true, keepy=true, ties="efron", maxiter=0)
  S = residuals(ft, type="score")[:]

####################################################################

Schoenfeld residuals: Per time contribution to score function

  using LSurvival
  dat1 = (
    time = [1,1,6,6,8,9],
    status = [1,0,1,1,0,1],
    x = [1,1,1,0,0,0]
  )
  ft = coxph(@formula(Surv(time,status)~x),dat1, keepx=true, keepy=true, ties="breslow", maxiter=0)


  X = ft.P.X
  M = residuals(ft, type="martingale")
  S = residuals(ft, type="schoenfeld")[:]
  Ss = residuals(ft, type="scaled_schoenfeld")[:]

####################################################################

dfbeta residuals: influence of individual observations on each parameter

  using LSurvival
  dat1 = (
    time = [1,1,6,6,8,9],
    status = [1,0,1,1,0,1],
    x = [1,1,1,0,0,0]
  )

  ft = coxph(@formula(Surv(time,status)~x),dat1, ties="breslow")
  residuals(ft, type="dfbeta")

  # can also calculate from score residuals and Hessian matrix
  L = residuals(ft, type="score") # n X p
  H = ft.P._hess   # p X p
  dfbeta = L*inv(H)
  robVar = dfbeta'dfbeta
  sqrt(robVar)

using the id keyword argument

see help for LSurvival.vcov for what happens when id keyword is not used

  dat1clust= (
    id = [1,2,3,3,4,4,5,5,6,6],
    enter = [0,0,0,1,0,1,0,1,0,1],
    exit = [1,1,1,6,1,6,1,8,1,9],
    status = [1,0,0,1,0,1,0,0,0,1],
    x = [1,1,1,1,0,0,0,0,0,0]
  )

  ft2 = coxph(@formula(Surv(enter, exit, status) ~ x),dat1clust, id=ID.(dat1clust.id), ties="breslow")

  # note these are still on the observation level (not the id level)! 
  residuals(ft2, type="dfbeta")

  # getting id level dfbeta residuals
  dfbeta = residuals(ft2, type="dfbeta")
  id = values(ft2.R.id)
  D = reduce(vcat, [sum(dfbeta[findall(id .== i),:], dims=1) for i in unique(id)])
  D'D
  vcov(ft, type="robust")
  vcov(ft2, type="robust")

####################################################################

jackknife residuals: influence of individual observations on each parameter using leave-one-out estimates

note there are other definitions of jackknife residuals See Chapter 7.1 of "Extending the Cox Model" by Therneau and Grambsch for an example of the type of jackknife residuals used here

Jackknife residuals $r_i$ for $i \in 1:n$ are given as the difference between the maximum partial likelihood estimate and the jackknife estimates for each observation

\[r_i = \hat\beta - \hat\beta_{(-i)}\]

where $\beta_{(-i)}$ is the maximum partial likelihood estimate of the log-hazard ratio vector obtained from a dataset in which observations belonging to individual i are removed

  using LSurvival
  dat1 = (
    time = [1,1,6,6,8,9],
    status = [1,0,1,1,0,1],
    x = [1,1,1,0,0,0]
  )

  ft = coxph(@formula(Surv(time,status)~x),dat1, ties="breslow")
  #jackknife(ft)
  residuals(ft, type="jackknife")

source
StatsAPI.stderrorMethod

Greenwood's formula for variance and confidence intervals of a Aalen-Johansen risk function

Signatures:

StatsBase.stderror(m::AJSurv)

StatsBase.confint(m:AJSurv; level=0.95, method="normal")

Keyword arguments

  • method
    • "normal" normality-based confidence intervals
    • "lognlog" log(-log(S(t))) based confidence intervals
res = z, x, outt, d, event, wts = LSurvival.dgm_comprisk(MersenneTwister(123123), 100)
int = zeros(length(d)) # no late entry
m = fit(AJSurv, int, outt, event)
stderror(m)
confint(m, level=0.95)
source
StatsAPI.stderrorMethod

Greenwood's formula for variance and confidence intervals of a Kaplan-Meier survival curve

Signatures:

StatsBase.stderror(m::KMSurv)

StatsBase.confint(m:KMSurv; level=0.95, method="normal")

Keyword arguments

method:

  • "normal" normality-based confidence intervals
  • "lognlog" log(-log(S(t))) based confidence intervals
using LSurvival
using Random
z,x,t,d, event,wt = LSurvival.dgm_comprisk(MersenneTwister(1212), 1000);
enter = zeros(length(t));
m = fit(KMSurv, enter, t, d)
mw = fit(KMSurv, enter, t, d, wts=wt)
stderror(m)
confint(m, method="normal")
confint(m, method="lognlog") # log-log transformation
source
StatsAPI.vcovMethod

Covariance matrix for Cox proportional hazards models

Keyword arguments

  • type nothing or "robust": determines whether model based or robust (dfbeta based) variance is returned.

    See ?residuals for info on dfbeta residuals

using LSurvival
dat1 = (
  time = [1,1,6,6,8,9],
  status = [1,0,1,1,0,1],
  x = [1,1,1,0,0,0]
)
ft = coxph(@formula(Surv(time,status)~x),dat1, id=ID.(collect(1:6)))

vcov(ft)                   # model based
vcov(ft, type="robust")    # robust variance, based on dfbeta residuals
# once robust SE is calculated, coefficient table uses the robust SE for confidence intervals and test statistics
ft

cluster robust standard errors using the id keyword argument

dat1clust= (
  id = [1,2,3,3,4,4,5,5,6,6],
  enter = [0,0,0,1,0,1,0,1,0,1],
  exit = [1,1,1,6,1,6,1,8,1,9],
  status = [1,0,0,1,0,1,0,0,0,1],
  x = [1,1,1,1,0,0,0,0,0,0]
)

ft2 = coxph(@formula(Surv(enter, exit, status) ~ x),dat1clust, id=ID.(dat1clust.id))

vcov(ft2)                     # model based
vcov(ft2, type="robust")       # robust variance, based on dfbeta residuals
stderror(ft2, type="robust")   # robust variance, based on dfbeta residuals
confint(ft2, type="robust")    # robust variance, based on dfbeta residuals
nobs(ft2)                     # id argument yields correct value of number of independent observations
# once robust SE is calculated, coefficient table uses the robust SE for confidence intervals and test statistics
ft2 

NOTE THE FOLLOWING IS INCORRECT because the id keyword is omitted

ft2w = coxph(@formula(Surv(enter, exit, status) ~ x),dat1clust)

vcov(ft2w)                   # model based (CORRECT)
vcov(ft2w, type="robust")    # robust variance (INCORRECT)
nobs(ft2w)

ft2w
source

Implementation details and further help