Kaplan Meier - RhoInc/sas-sgplot GitHub Wiki
- The Goal
- Dummy Data
- Curve Only
- Make Nice Axes
- Add Censor Symbols
- Make Nice Legend
- Add Censor Legend
- Add Number at Risk
- Style Modifications
The Kaplan Meier plot will be developed in several steps. A succinct version of the code is available here.
On this page we walk through the process of creating a Kaplan Meier graph, complete with censoring symbols and the number of subjects at risk at each time point. The end goal looks like this:
The dummy data used to produce this plot was created with the following data step. This data step is only included for the sake of completeness. Do not read, study, or obsess over this data step!
data derive.adtte;
length paramcd $8 param $100;
paramcd = "OS";
param = "Overall Survival";
do trtpn = 1 to 4;
trtp = 'Treatment ' || put(trtpn,1.);
do subjid = 1 to 40;
aval = 2 + ranuni(1)*24*(trtpn/4);
cnsr = (ranuni(1)>0.2);
output;
end;
end;
run;
In this step we learn how to produce a "curve only" plot.
In order to plot Kaplan Meier, we must first calculate Kaplan Meier statistics. Fortunately, there's a PROC for that.
proc sort data=derive.adtte out=km00;
by trtpn trtp;
where paramcd = "OS";
run;
ods graphics on;
ods listing close;
proc lifetest data=km00 plots=(survival);
ods output survivalplot=km10;
time aval*cnsr(1);
strata trtpn;
run;
Note: we turned on ods graphics
and choose the ods output
data object survivalplot
over the more traditional data object productlimitestimates
. This is motivated by the desire to add number at risk later in the process.
If we are going to produce a figure for a commercial study, it is probably going to need to be saved as an RTF file. The typical options
, ods graphics
, and ods rtf
statements that surround the sgplot
code are as follows:
options
nonumber
nodate
orientation=portrait
;
ods graphics /
noborder
height=6in
width=6in
outputfmt=png
;
ods results off;
ods listing close;
ods rtf file="&PgmDir\curve_only.rtf";
<<sgplot portion>>
ods rtf close;
ods listing;
ods results on;
The <<sgplot portion>>
contains a step
statement along with the group=
option.
proc sgplot data=km10;
*--- draw the survival curves ---;
step y=survival x=time /
group=stratumnum
;
run;
In this step, we make the axes look nice.
First we create a label for the y-axis based on the variable param
.
proc sql noprint;
select distinct strip(param) || " Probability"
into :ylabel
from km00
;
quit;
%let ylabel = &ylabel;
%put &=ylabel;
The result in the log shows us:
YLABEL=Overall Survival Probability
Next we create a custom set of tick marks for the x-axis.
%let values = 0 2 4 6 8 12 16 20 24;
%put &=values;
The result in the log shows us:
VALUES=0 2 4 6 8 12 16 20 24
We also want to format the value 0
as B/L
.
proc format;
value timefmt
0 = "B/L"
other = [best.]
;
run;
Finally, we incorporate the above format and macro variables into the plot.
proc sgplot data=km10;
*--- draw the survival curves ---;
step y=survival x=time /
group=stratumnum
;
*--- cosmetics ---;
yaxis
label="&ylabel"
;
xaxis
values=(&values)
valueshint
valuesformat=timefmt.
label="Weeks"
;
run;
The eagle-eyed observer will have noted the valueshint
option in the xaxis
statement. This option tells SGPLOT that the values
list should not be used to determine the axis range, that it is only meant to communicate where tick marks should appear.
Next we add censor symbols to our plot.
This is accomplished through the addition of a scatter
statement.
proc sgplot data=km10;
*--- draw the survival curves ---;
step y=survival x=time /
group=stratumnum
;
*--- draw censor indicators ---;
scatter y=censored x=time /
group=stratumnum
markerattrs=(symbol=circle)
;
*--- cosmetics ---;
yaxis …
xaxis …
run;
Next we clean up the legend.
First we identify all trtpn
and trtp
combinations, applying the resulting format to the stratumnum
variable.
proc sql;
create table strat as
select distinct "strat" as fmtname, trtpn as start, trtp as label
from km00
;
quit;
proc format cntlin=strat;
run;
proc sql noprint;
alter table km10
modify stratumnum format=strat.
;
quit;
Next we add a keylegend
statement.
proc sgplot data=km10;
*--- draw the survival curves ---;
step y=survival x=time /
group=stratumnum
name="step"
;
*--- draw censor indicators ---;
scatter …
*--- cosmetics ---;
yaxis …
xaxis …
keylegend "step" /
title="Treatment Group"
noborder
linelength=15pct
outerpad=(bottom=10pt)
;
run;
Next we add a secondary legend for the censor symbols.
This is accomplished by adding a second scatter
statement (prior to the first) as well as a second keylegend
statement.
proc sgplot data=km10;
*--- draw the survival curves ---;
step …
*--- draw censor indicators for censoring symbol legend purposes ---;
scatter y=censored x=time /
markerattrs=(symbol=circle)
name="censor"
;
*--- draw censor indicators ---;
scatter …
*--- cosmetics ---;
yaxis …
xaxis …
keylegend "step" / …
keylegend "censor" /
location=inside
position=bottomleft
;
run;
The next-to-last step is to add the number at risk at each time point.
We begin by modifying our original proc lifetest
code. Adding the atrisk=
option has the effect of adding the number at risk results to the survivalplot
data object. That's pretty convenient if you ask me.
ods graphics on;
ods listing close;
proc lifetest data=km00 plots=(survival(atrisk=&values));
ods output survivalplot=km10;
time aval*cnsr(1);
strata trtpn;
run;
We use the variable atrisk
to create a new variable xatn
, complete with a format to prevent missing values from showing up as a dot.
proc format;
value xatn
. = " "
other = [best.]
;
run;
data km20;
set km10;
if n(tatrisk) then
xatn = atrisk;
format xatn xatn.;
run;
Next we add the xaxistable
statement. For some reason SAS has us use the class=
option to create groups in the x-axis table (we use the group=
option in the step
and scatter
statements).
proc sgplot data=km20;
*--- draw the survival curves ---;
step …
*--- draw censor indicators for censoring symbol legend purposes ---;
scatter …
*--- draw censor indicators for grouping purposes ---;
scatter …
*--- add the sample sizes ---;
xaxistable xatn /
class=stratumnum
colorgroup=stratumnum
;
*--- cosmetics ---;
yaxis …
xaxis …
keylegend "step" / …
keylegend "censor" / …
run;
As a final step, we perform some style modifications to change colors and fonts.
The simplest way to change colors for grouped data is with the %modstyle
macro. Here we use the style rtf
to create a new style named kmstyle0
.
%modstyle
(name=kmstyle0
,parent=rtf
,type=CLM
,colors=black black black black
);
Changing fonts looks a little more awkward. Here we use the kmstyle0
from above to create a new style named kmstyle
.
proc template;
define style styles.kmstyle;
parent=styles.kmstyle0;
class GraphFonts /
"GraphDataFont" = ("Courier New, <MTserif>, <serif>", 7pt)
"GraphValueFont" = ("Courier New, <MTserif>, <serif>", 9pt)
"GraphLabelFont" = ("Courier New, <MTserif>, <serif>",10pt)
;
end;
run;
Finally, we apply this new style to the ods rtf
statement.
ods rtf
style=styles.kmstyle
file="&PgmDir\style_mods.rtf"
;
proc sgplot data=km20;
step …
scatter …
scatter …
xaxistable …
yaxis …
xaxis …
keylegend …
keylegend …
run;
ods rtf close;
And that's all there is to it.
Next page: Paneled Box Plot