2. Educational Background
• Hasselt Universiteit, Belgium, MSc in Applied Statistics, 2005-2006.
• Hasselt Universiteit, Belgium, MSc in Biostatistics, 2006-2007.
• Hasselt Universiteit, Belgium, PhD in Statistical Bioinformatics, 2007-2011.
• Medical Epidemiology and Biostatistics Dept., Karolinska Institutet, Sweden, Postdoctoral, 2011-2014.
3. Course Outline
• Introduction
o Overview of survival data analysis
o Types of censoring
• Kaplan-Meier Survival Model
o Kaplan-Meier curve
o Comparison of survival curves
o Logrank test & Wilcoxon (Gehan) test
o Application in R.
Setia PramanaSurvival Data Analysis 3
4. Course Outline
• Cox Proportional Hazards:
o Parameter Estimation
o Partial likelihood
o Model diagnostics
o Hazard Ratio
o Application in R.
5. Course Outline
• Parametric Survival Functions:
o Weibull distribution
o Exponential distribution
• Competing risks
• Frailty Model
6. Course Workload
• 40% theory, 60% practice
• Group project (5 students)
• Presentation every week
• Software: mainly R; others are allowed
• R code will be provided
• Slides are available at:
http://www.slideshare.net/hafidztio/
8. Survival Analysis
• Statistical procedures that focus on time-to-event
data. Outcome: “time until an event occurs”
• Events:
o time to death
o time to onset (or relapse) of a disease
o length of stay in a hospital
o duration of a strike
o money paid by health insurance
o viral load measurements
o time to finish our study
9. Survival Studies
• Clinical trials
• Prospective cohort studies
• Retrospective cohort studies
• Typically, survival data are not fully
observed, but rather are censored.
10. Goals
• To estimate and interpret survivor and
hazard functions
• To compare survivor and hazard functions
• To assess the relationship of explanatory
variables to survival time
12. Example
• Survival times of cancer patients
• Patients with advanced cancer of the
stomach, bronchus, colon, ovary, or breast
were treated (in addition to standard
treatment) with ascorbate.
• Research questions:
o What is the prognosis for a patient with a specific
type of cancer?
o Do survival times differ with the organ affected?
16. The survival time response
• Usually continuous
• May be incompletely determined for some subjects
o i.e., for some subjects we may know only that their survival
time was at least equal to some time t, whereas for
other subjects we know their exact time of
event.
• Incompletely observed responses are censored
• Is always ≥ 0
19. Censoring
• We have some information about a
subject’s event time, but we don’t know the
exact event time.
• The censoring mechanism must be
independent of the survival mechanism.
• Three reasons:
o Study ends (no event)
o Lost to follow-up
o Withdrawal from the study
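A censored observation is commonly stored as a pair: the follow-up time and a flag saying whether the event was actually observed. The plain-Python sketch below illustrates that encoding for the three censoring reasons above. The class name, field names, time unit, and data values are hypothetical and are not taken from the course material.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    time: float   # follow-up time in months (hypothetical unit)
    event: bool   # True = event observed, False = right-censored


def summarize(observations):
    """Count how many subjects had the event and how many were censored."""
    n_events = sum(1 for o in observations if o.event)
    return {
        "n": len(observations),
        "events": n_events,
        "censored": len(observations) - n_events,
    }


# Hypothetical follow-up data, one row per subject
cohort = [
    Observation(5.0, True),    # event observed at month 5
    Observation(12.0, False),  # study ended at month 12 without an event
    Observation(7.5, False),   # lost to follow-up at month 7.5
    Observation(3.0, False),   # withdrew at month 3
]

print(summarize(cohort))
```

For a censored subject the recorded time is only a lower bound on the true survival time, which is why the flag has to travel with the time.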
21. Censoring
• Right censoring: the survival time is
incomplete at the right side.
• Left censoring: true survival time <=
observed survival time
• Most studies involve right censoring
29. Terminology & Notation
• Survival function S(t):
• Non-increasing as t increases
• At time t = 0, S(0) = 1
• S(∞) = 0
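The properties of the survival function can be checked on an estimated curve. The sketch below is a minimal plain-Python version of the Kaplan-Meier product-limit estimator named in the course outline; the course itself works in R. The function name and the toy data are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, S(time)) steps.

    times  : follow-up time of each subject
    events : 1 if the event was observed, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = [(0.0, 1.0)]  # S(0) = 1 by definition
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends exactly at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical data: five subjects, two of them censored
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

The estimate starts at 1 and only steps downward. It reaches 0 only when the longest follow-up time ends in an event, so an estimated curve may stay above 0 even though the theoretical S(t) tends to 0.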
68. Review
• Hazard Function
o The risk of failure in a short time interval after
time t, given that the subject (e.g., a customer) has
survived to time t
o denoted as: h(t)
• Survival Function
o The probability that a person/patient will
have a survival time >= t
o denoted as: S(t)
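For reference, the two functions reviewed above are usually written as follows for a continuous survival time T. The symbols T and Δt are notation assumed here rather than taken from the slide. For continuous T, P(T ≥ t) and P(T > t) coincide.

```latex
h(t) = \lim_{\Delta t \to 0}
       \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
S(t) = P(T > t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```

The second identity shows that either function determines the other: a higher hazard over [0, t] means a lower survival probability at t.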
71. Survival Application
• Telco – customer lifetime
• Insurance – time to lapsing on policy
• Mortgages – time to mortgage redemption
• Mail Order Catalogue – time to next purchase
• Retail – time till food customer starts purchasing
non-food
• Manufacturing – lifetime of a machine component
• Public Sector – time intervals to critical events
73.
• The hazard rate is defined for non-repairable
populations as the (instantaneous) rate of failure for
the survivors to time t during the next instant of time.
74. Regression for Survival Data
• The relation with factors can be studied using
group-specific Kaplan-Meier estimates, together
with Logrank and/or Wilcoxon tests
• Investigating the relation with covariates requires a
regression-type model
• Relating the outcome to several factors and/or
covariates simultaneously requires a model analogous to multiple
regression, ANOVA, or ANCOVA
• The most frequently used model is the Cox
(proportional hazards) model
78. Characteristics of Cox Regression, continued
• Cox models the effect of covariates on the hazard rate
but leaves the baseline hazard rate unspecified.
• Does NOT assume knowledge of absolute risk.
• Estimates relative rather than absolute risk.
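One way to see why the baseline hazard can stay unspecified is the partial likelihood listed in the course outline: at each event time, the baseline hazard cancels between the subject who fails and everyone still at risk. The plain-Python sketch below evaluates the partial log-likelihood for a single covariate. It assumes there are no tied event times, and the function and argument names are hypothetical.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied event times.

    beta   : regression coefficient being evaluated
    times  : follow-up time of each subject
    events : 1 if the event was observed, 0 if right-censored
    x      : covariate value of each subject
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        total += beta * x[i] - math.log(denom)
    return total


# Hypothetical data: the subject with x = 1 fails first
print(cox_partial_loglik(0.5, [1, 2, 3], [1, 1, 1], [1, 0, 0]))
```

Only the ordering of event times and the covariate values enter the calculation. No absolute hazard appears anywhere, which matches the slide's point that the model estimates relative rather than absolute risk.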
80. PH Assumption
• The PH assumption requires that the hazard ratio (HR) is constant
over time
• If the hazards of the groups are not
proportional, then a Cox PH model is not
appropriate.
• In that case, use an extended Cox model
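The PH assumption can be illustrated with hazards that are known exactly. The plain-Python sketch below compares hazard ratios over a grid of time points. The three hazard functions and their values are hypothetical, and this is an illustration of the definition, not a diagnostic procedure for real data.

```python
def hazard_ratio_over_time(h1, h0, time_points):
    """Ratio h1(t) / h0(t) evaluated at each time point."""
    return [h1(t) / h0(t) for t in time_points]


def is_roughly_constant(values, rel_tol=1e-9):
    """True if all values agree up to a small relative tolerance."""
    lo, hi = min(values), max(values)
    return (hi - lo) <= rel_tol * max(abs(hi), abs(lo))


def control(t):
    return 0.10       # constant hazard


def treated(t):
    return 0.05       # half the control hazard at every time


def late_effect(t):
    return 0.02 * t   # hazard that grows with time


grid = [1, 2, 5, 10]
hr_ph = hazard_ratio_over_time(treated, control, grid)
hr_non_ph = hazard_ratio_over_time(late_effect, control, grid)

print(is_roughly_constant(hr_ph))      # proportional hazards hold
print(is_roughly_constant(hr_non_ph))  # hazard ratio changes with time
```

In the second comparison the ratio moves from below 1 to above 1, so no single hazard ratio can summarize the group difference.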
90. Likelihood ratio tests
• Likelihood ratio tests (LRTs) are used to compare
two nested models.
• The form: LRT = -2 ln(L_s / L_g) = -2 [ln L_s - ln L_g],
• based on the ratio of two likelihood functions; the simpler model (s)
has fewer parameters than the general (g) model.
• Under the simpler model, LRT ~ chi-squared (approximately), with DF = the difference
in the number of parameters between the two models.
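A minimal plain-Python sketch of the test follows, assuming the usual statistic of minus twice the log of the likelihood ratio. It covers only the case where the two models differ by one parameter, because the chi-squared tail probability with 1 degree of freedom can be written with the standard library's `math.erfc`. The log-likelihood values are hypothetical.

```python
import math


def lrt_statistic(loglik_simple, loglik_general):
    """-2 * ln(L_s / L_g), computed from the two maximized log-likelihoods."""
    return -2.0 * (loglik_simple - loglik_general)


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical maximized log-likelihoods of two nested models
stat = lrt_statistic(loglik_simple=-105.2, loglik_general=-102.7)
p_value = chi2_sf_df1(stat)
print(round(stat, 3), round(p_value, 4))
```

For more than one degree of freedom a general chi-squared tail function is needed, which this sketch deliberately leaves out.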
93.
• Cox regression does not require that you choose some particular
probability model to represent survival times, and is
therefore more robust than parametric methods
discussed last week.
• Semi-parametric
(recall: Kaplan-Meier is non-parametric; exponential and
Weibull are parametric)
• Can accommodate both discrete and continuous
measures of event times
• Easy to incorporate time-dependent covariates:
covariates that may change in value over the course of
the observation period
95. Assumptions of Cox Regression
• Proportional hazards assumption: the hazard for any
individual is a fixed proportion of the hazard for any
other individual
• Multiplicative risk
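Both assumptions can be read off the standard form of the Cox model, shown here as a reference sketch. The symbols h_0 (baseline hazard), β_i (coefficients), k (number of covariates), and X* versus X (covariate values of two individuals) are notation assumed here, not taken from the slide.

```latex
h(t \mid X) = h_0(t)\,\exp\!\Big(\sum_{i=1}^{k} \beta_i X_i\Big),
\qquad
\mathrm{HR} = \frac{h(t \mid X^{*})}{h(t \mid X)}
            = \exp\!\Big(\sum_{i=1}^{k} \beta_i \,(X_i^{*} - X_i)\Big)
```

The baseline hazard cancels in the ratio, so the HR does not depend on t (proportional hazards), and each covariate multiplies the hazard by a factor exp(β_i) per unit increase (multiplicative risk).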
104. Cox regression vs. logistic regression
Distinction between rate and proportion:
• Incidence (hazard) rate: number of new cases of
disease per population at-risk per unit time (or
mortality rate, if outcome is death)
• Cumulative incidence: proportion of new cases
that develop in a given time period
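The difference between a rate and a proportion shows up in the denominators. The plain-Python sketch below uses a hypothetical cohort; the function names and numbers are made up for illustration.

```python
def incidence_rate(n_events, person_time):
    """New cases per unit of person-time at risk (e.g. per person-year)."""
    return n_events / person_time


def cumulative_incidence(n_events, n_at_risk_at_start):
    """Proportion of the initial at-risk population that develops the event."""
    return n_events / n_at_risk_at_start


# Hypothetical cohort: 100 people followed for a total of 450 person-years,
# with 10 new cases during follow-up
rate = incidence_rate(10, 450.0)       # cases per person-year
risk = cumulative_incidence(10, 100)   # proportion, no time unit

print(rate, risk)
```

The rate carries a time unit and uses each person's actual follow-up time, while the cumulative incidence is a unitless proportion tied to a fixed period.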
105. Cox regression vs. logistic regression
Distinction between hazard/rate ratio and odds
ratio/risk ratio:
• Hazard/rate ratio: ratio of incidence rates
• Odds/risk ratio: ratio of odds or of proportions
By taking time into account, you use
more information than just a binary yes/no outcome,
and gain power/precision.
Logistic regression aims to estimate the odds ratio; Cox
regression aims to estimate the hazard ratio.
106. HR Ex. Data Model 1
115. Adjusted Survival Curves
• No model: Kaplan-Meier method (previous
chapter)
• Cox model: adjusted survival curves
o Adjust for explanatory variables used as
predictors
o Like KM curves, plotted as step functions
117. Adjusted Survival Curves
• Converting Hazard Functions to Survival Functions
Xi must be specified before
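Under the Cox model, the usual conversion is S(t | X) = S0(t) raised to the power exp(sum of beta_i * X_i), where S0(t) is the baseline survival function. The plain-Python sketch below applies that formula to a baseline curve. The function name, the coefficient, and the baseline values are hypothetical, and the formula is the standard textbook one rather than one recovered from the slide.

```python
import math


def adjusted_survival(baseline_survival, betas, x):
    """Adjusted survival S(t | X) = S0(t) ** exp(sum(beta_i * x_i)).

    baseline_survival : S0(t) evaluated at a sequence of time points
    betas             : estimated Cox coefficients
    x                 : covariate values chosen for the adjusted curve
    """
    risk_score = math.exp(sum(b * xi for b, xi in zip(betas, x)))
    return [s0 ** risk_score for s0 in baseline_survival]


# Hypothetical baseline curve and a single binary covariate with HR = 2
baseline = [1.0, 0.9, 0.5]
print(adjusted_survival(baseline, [math.log(2)], [1]))
```

Covariate values have to be chosen before the curve can be computed, and a hazard ratio above 1 pulls the whole curve below the baseline.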
119. Case: Telco
Survival Analysis:
• To understand length of time before an event
occurs
• To predict time till next event
• To analyze duration of time in a particular state
• “Event” can be:
o Customer churn (the tendency of the subscribers to
switch providers)
o Take-up new product
o Default on credit
o Make next purchase
120. Case: Telco
• Compute the survival curve for your customer base
– Understand “natural patterns” in customer survival
– Identify key points where survival rates fall
• Compare survival curves between
– Demographic groups
– Customer segments
– Sales channels
– Product plans, etc.
• Identifies key factors influencing “time till churn”
• Enables you to predict monthly numbers of churners
– but does not identify which customers will churn
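Predicting monthly churn counts from a survival curve can be sketched as follows: the expected number of churners in month m is the cohort size times the drop in survival during that month. The plain-Python example assumes a cohort of new customers who all start at month 0; the function name and the survival values are hypothetical.

```python
def expected_churners(n_customers, survival_curve):
    """Expected churners per month for a cohort starting together at month 0.

    survival_curve[m] is S(m), the probability of still being a customer
    after m months, with survival_curve[0] == 1.0.
    """
    return [
        n_customers * (survival_curve[m - 1] - survival_curve[m])
        for m in range(1, len(survival_curve))
    ]


# Hypothetical survival curve for months 0 to 3
monthly = expected_churners(1000, [1.0, 0.95, 0.90, 0.88])
print([round(v, 1) for v in monthly])
```

The result is a count per month for the cohort as a whole. It says nothing about which individual customers leave, in line with the last bullet of the slide.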
131. Observed Versus Expected Plots
• One-at-a-time: uses KM curves to
obtain observed plots
• Adjusting for other variables: uses
stratified Cox PH model to obtain
observed plots.
• One-at-a-time:
o stratify data by categories of
predictor
o obtain KM curves for each
category
161. Survival Analysis so far
• The methods most often employed to analyze time-to-event
data are
o Kaplan-Meier + Log-Rank/Wilcoxon test.
• Produces an empirical estimate of the time-to-event
distribution and compares it between groups
o Cox (proportional hazards) regression.
• Measures the effect of multiple predictors without
modeling the underlying distribution
• Assumes proportional hazards between levels of
predictors
• Neither of these methods produces an estimate of the
functional form of the underlying distribution
162. Parametric Survival Analysis
• The survival time follows a distribution.
• Explicitly models the functional form of the event times using
various statistical distributions
• Exact distribution is unknown if parameters are unknown
• Data are used to estimate parameters
• Examples of parametric models:
o Linear regression
o Logistic regression
o Poisson regression
163. Parametric Survival Analysis
• Most commonly used
o Exponential
o Weibull
o Gompertz
o Log-Logistic
o Log-Normal
o Gamma
• Generally involve two parameters:
scale (λ) and shape (p)
• Shape generally assumed constant across individuals
• Scale related to determinants via regression
o Can quantify the effect of predictors, particularly treatment
165. Parametric vs Cox PH
• Parametric Survival Model
+ Completely specified h(t) and S(t)
+ More consistent with theoretical S(t)
+ Time-quantile prediction possible
– Assumption on underlying distribution
• Cox PH Model
– Distribution of survival time unknown
– Less consistent with theoretical S(t) (typically step
function)
+ Does not rely on distributional assumptions
+ Baseline hazard not necessary for estimation of
hazard ratio
167. Parametric Survival Analysis
• Conceptually the same as the linear case, but the Normal distribution is
replaced by an appropriate distribution
• It is implemented in a regression framework,
estimated by maximizing the likelihood of the data:
o For patients observed to have the event at time t:
• Likelihood contribution: f(t) (density
function)
o For patients censored at time t:
• Likelihood contribution: P(T > t) = S(t)
(survival function)
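The two likelihood contributions can be sketched for the simplest parametric case, the exponential model with constant hazard. Events contribute log f(t) and censored subjects contribute log S(t). The plain-Python code below is an illustration under that exponential assumption, with no covariates; function names and data are hypothetical.

```python
import math


def exponential_loglik(rate, times, events):
    """Log-likelihood of right-censored data under a constant hazard `rate`."""
    total = 0.0
    for t, d in zip(times, events):
        if d:
            # Event observed: log f(t) = log(rate) - rate * t
            total += math.log(rate) - rate * t
        else:
            # Censored: log S(t) = -rate * t
            total += -rate * t
    return total


def exponential_mle(times, events):
    """Closed-form maximizer: number of events / total follow-up time."""
    return sum(events) / sum(times)


# Hypothetical data: three events and two censored subjects
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
rate_hat = exponential_mle(times, events)
print(rate_hat, exponential_loglik(rate_hat, times, events))
```

Censored subjects still add their follow-up time to the denominator of the estimate, which is how partial information enters the fit.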
168. Functions Characterizing Parametric Distributions
• The survival time T is assumed to follow a distribution
with density function f(t)
• Cumulative Incidence: F(t) = P[T ≤ t]
• Survival Distribution: S(t) = P[T > t]
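These functions are linked by standard identities, stated here for a continuous T. The hazard function h(t) is included for completeness, using the same notation as the earlier review slide.

```latex
F(t) = 1 - S(t), \qquad
f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}, \qquad
h(t) = \frac{f(t)}{S(t)}
```

Specifying any one of f, F, S, or h therefore fixes the other three, which is why a parametric model can be stated through whichever function is most convenient.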
169. Commonly Used Distributions and Parameters
• λ is reparameterized in terms of predictor variables
and regression parameters.
• p: typically, for parametric models, the shape
parameter p is held fixed
172. Weibull Distribution
• p is the shape parameter
o p > 1: Hazard increases over time
o p = 1: Hazard is constant (Exponential Distribution)
o p < 1: Hazard decreases over time
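The three cases can be checked numerically. The plain-Python sketch below assumes one common parameterization of the Weibull model, h(t) = λ p t^(p-1) and S(t) = exp(-λ t^p); the slide's own formula did not survive extraction, so this choice is an assumption, as are the function names and parameter values.

```python
import math


def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lam * p * t**(p - 1), for t > 0."""
    return lam * p * t ** (p - 1)


def weibull_survival(t, lam, p):
    """Weibull survival S(t) = exp(-lam * t**p), for t >= 0."""
    return math.exp(-lam * t ** p)


# Hypothetical scale value; compare the hazard at two time points per shape
lam = 0.2
for p in (0.5, 1.0, 2.0):
    early = weibull_hazard(1.0, lam, p)
    late = weibull_hazard(5.0, lam, p)
    print(p, round(early, 4), round(late, 4))
```

With p = 1 the hazard equals λ at every time point, which is the exponential model as a special case of the Weibull.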