2. Objectives
● Understand the length of time between a breast cancer diagnosis and specific events, such as death or cancer recurrence
● Understand which factors influence these lengths of time
3. About Survival Analysis
● Predicting the time until an event of interest occurs
● Applications in medicine, manufacturing, sociology, sports, and many other fields
● Right-Censored Data – an observation for which the event has not yet occurred by the end of follow-up, so its event time is only known to exceed the observed time
● Survival Function – S(t), the probability that the event of interest has not yet occurred by a given time t
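To show how censored observations still carry information, here is a minimal sketch that assumes an exponential (constant-hazard) model, which is an illustrative choice and not the model used in these slides. The records are hypothetical `(time, event)` pairs, and `exponential_rate_mle` and `exponential_survival` are hypothetical helper names. Censored patients add follow-up time but no event, and the fitted survival function is S(t) = exp(-rate * t).

```python
import math

# Hypothetical right-censored records: (follow-up time in months, event flag).
# event = 1: the event (e.g. death) was observed at that time.
# event = 0: right-censored; the patient was still event-free when last observed.
records = [(5.0, 1), (8.0, 0), (12.0, 1), (20.0, 0)]


def exponential_rate_mle(records):
    """Maximum-likelihood event rate for an exponential model with right censoring.

    Number of observed events divided by total follow-up time: censored
    records contribute follow-up time but no event.
    """
    n_events = sum(event for _, event in records)
    total_time = sum(time for time, _ in records)
    return n_events / total_time


def exponential_survival(t, rate):
    """Survival function S(t) = P(T >= t) = exp(-rate * t) for an exponential event time."""
    return math.exp(-rate * t)


rate = exponential_rate_mle(records)  # 2 events / 45 months of follow-up
survival_values = [exponential_survival(t, rate) for t in (0, 6, 12, 24)]
for t, s in zip((0, 6, 12, 24), survival_values):
    print(f"S({t}) = {s:.3f}")
```

If the censored records were dropped, the estimated rate would be biased upward, because the event-free follow-up time of those patients would be ignored.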
4. Kaplan-Meier Estimator
● Non-parametric estimator of the survival function
● Time on the x-axis
● Estimated percentage surviving on the y-axis
● Tick marks represent right-censored observations
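The estimator can be sketched in plain Python. At each distinct event time t it multiplies the running estimate by (1 - deaths / number at risk). Patients censored at t are still counted as at risk at t. This is a minimal illustration, not the code behind the plots in this deck. The function name `kaplan_meier` and the sample data are hypothetical. The censored times it returns correspond to the tick marks on a Kaplan-Meier plot.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function S(t).

    durations: follow-up time for each patient
    events:    1 if the event (e.g. death) was observed, 0 if right-censored
    Returns (curve, censored_times): curve lists (t, S(t)) at each distinct
    event time; censored_times are where a plot would draw tick marks.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all records sharing time t; censored records at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # everyone observed at t leaves the risk set afterwards
    censored_times = sorted(t for t, e in data if e == 0)
    return curve, censored_times


curve, ticks = kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
print("censored at:", ticks)
# t=2: S(t)=0.833
# t=3: S(t)=0.667
# t=5: S(t)=0.444
# t=8: S(t)=0.000
# censored at: [3, 6]
```

The estimate only changes at observed event times, which is why the plotted curve is a step function.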
5. Cox Proportional Hazards Regression
● Models the relationship between a patient's survival time and several explanatory variables
● Each explanatory variable is given a coefficient β; its hazard ratio, HR = exp(β), compares the hazard for a one-unit increase in that variable
○ HR = 1 : No effect
○ HR < 1 : Reduced hazard (of death)
○ HR > 1 : Increased hazard (of death)
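To connect the coefficients to the hazard-ratio rules above, here is a minimal sketch that fits a Cox model with a single covariate. It uses Newton-Raphson on the partial log-likelihood, with tied event times handled by the Breslow approximation. It is illustrative only: real analyses, including presumably the one in this deck, use a statistics package and several covariates. The data and the helper names (`cox_score_info`, `cox_fit_1d`, `interpret_hr`) are hypothetical.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood (one covariate, Breslow ties)."""
    score = info = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients contribute only through the risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]  # patients still at risk at t_i
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return score, info


def cox_fit_1d(times, events, x, iters=25, tol=1e-10):
    """Newton-Raphson estimate of the coefficient beta."""
    beta = 0.0
    for _ in range(iters):
        score, info = cox_score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def interpret_hr(hr, tol=1e-9):
    """Map a hazard ratio to the HR = 1 / HR < 1 / HR > 1 rules."""
    if abs(hr - 1.0) < tol:
        return "no effect"
    return "reduced hazard" if hr < 1.0 else "increased hazard"


times = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical follow-up times
events = [1, 1, 1, 1, 1, 1, 1, 1]  # all deaths observed
x = [1, 1, 0, 1, 0, 1, 0, 0]       # hypothetical binary covariate
beta = cox_fit_1d(times, events, x)
hr = math.exp(beta)
print(f"beta = {beta:.3f}, HR = {hr:.2f} -> {interpret_hr(hr)}")
```

In this made-up data, patients with x = 1 tend to die earlier. The fitted β is therefore positive and the hazard ratio exceeds 1.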
6. German Breast Cancer Data
● Retrieved from UMass Amherst’s Statistics website
● Data collected from clinical trials performed by the German Breast Cancer Study
Group
● Total of 686 observations (patients) collected between July 1984 and December 1989
● 16 variables, including censoring indicators and follow-up times for both death and cancer recurrence
15. Probability density function: $f(t)$
Survival function: $S(t) = P(T \ge t)$
Hazard function: $h(t) = f(t) / S(t)$
A way to compare two hazard functions:
Hazard ratio: $HR(t) = h_1(t) / h_0(t)$
Proportional hazards assumption: the hazard ratio does not vary with time, i.e.
$HR(t) = HR$
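The definitions can be checked numerically. This sketch assumes Weibull-distributed event times, an illustrative choice that does not come from the data. It computes the hazard as f(t) / S(t) and compares hazard ratios over time. When two Weibull groups share a shape parameter, their ratio is constant, so the proportional hazards assumption holds. With different shapes the ratio changes with t and the assumption fails. The function names and parameter values are hypothetical.

```python
import math


def weibull_pdf(t, shape, scale):
    """Density f(t) of a Weibull(shape, scale) event time, t > 0."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-z ** shape)


def weibull_survival(t, shape, scale):
    """Survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-(t / scale) ** shape)


def hazard(t, shape, scale):
    """Hazard function h(t) = f(t) / S(t)."""
    return weibull_pdf(t, shape, scale) / weibull_survival(t, shape, scale)


def hazard_ratio(t, numerator, denominator):
    """Ratio of two hazards at time t; each group is a (shape, scale) tuple."""
    return hazard(t, *numerator) / hazard(t, *denominator)


times = [0.5, 1.0, 2.0, 4.0]
# Same shape, different scales: ratio is (4.0 / 2.0) ** 1.5 at every t, so hazards are proportional.
ph_ratios = [hazard_ratio(t, (1.5, 2.0), (1.5, 4.0)) for t in times]
# Different shapes: ratio changes with t, so the proportional hazards assumption fails.
non_ph_ratios = [hazard_ratio(t, (0.8, 2.0), (1.5, 4.0)) for t in times]
print("same shape:     ", [round(r, 3) for r in ph_ratios])
print("different shape:", [round(r, 3) for r in non_ph_ratios])
```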