#' ---
#' title: "Simulation-based design of group-sequential trials with a survival endpoint with rpact"
#' author: "Marcel Wolbers, Gernot Wassmer, and Friedrich Pahlke"
#' date: "Last change: `r format(Sys.time(), '%d %B, %Y')`"
#' number: 5
#' header-includes:
#' - \usepackage{fancyhdr}
#' - \pagestyle{fancy}
#' - \setlength{\headheight}{23pt}
#' - \fancyfoot[C]{www.rpact.com}
#' - \fancyfoot[L]{\thepage}
#' output:
#' rmarkdown::html_document:
#' highlight: pygments
#' number_sections: yes
#' self_contained: yes
#' toc: yes
#' toc_depth: 3
#' toc_float: yes
#' css: style.css
#' includes:
#' before_body: header.html
#' after_body: footer2.html
#' ---
#'
#'
#'
#'
#' rpact provides the function `getSimulationSurvival` for simulation of group-sequential trials with a time-to-event endpoint. For a given scenario, `getSimulationSurvival` simulates many hypothetical group-sequential trials and calculates the test results. Based on this Monte Carlo simulation, estimates of key quantities such as overall study power, stopping probabilities at each interim analysis, timing of analyses etc. can be obtained.
#'
#' `getSimulationSurvival` complements the analytical calculations from function `getSampleSizeSurvival` in multiple ways:
#'
#' - Simulations can be used to assess the accuracy of the analytical formulas.
#' - Simulations allow to answer questions such as the following:
#' + How variable is the timing of interim analysis (even if all assumptions are correct)?
#' + How could a dataset of a trial that is stopped early for efficacy at an interim analysis look like?
#' - Simulation is also possible for scenarios that are analytically intractable such as scenarios with delayed treatment effects.
#'
#' The syntax of function `getSimulationSurvival` is very similar to function `getSampleSizeSurvival`. Hence, this document only provides some examples and expects that the reader is familiar with the R markdown document which describes standard designs of a trial with a survival endpoint.
#'
#' `getSimulationSurvival` also supports the usage of adaptive sample size recalculation but this is not covered here. For more details, please also consult the help `?getSimulationSurvival`.
#'
#'
#' # Load rpact package
## ---- include=TRUE, echo=TRUE--------------------------------------------
# Load rpact
library(rpact)
packageVersion("rpact") # version should be version 2.0.1 or later
#' For this R markdown file, some **additional R packages** are loaded which will be used to examine simulated datasets.
## ---- include=TRUE, echo=TRUE--------------------------------------------
# Load ggplot2 for plotting
library(ggplot2) # general plotting
library(survminer) # plotting of KM curves with ggplot2
library(survival) # survival analysis routines
#'
#'
#' # Standard analytical calculation
#'
#' For comparison with the simulation-based analysis, a standard example is first calculated under the following assumptions:
#'
#' - Group-sequential design with one interim analysis after 66% of information using an O'Brien&Fleming type $\alpha$-spending function, one-sided type I error 2.5%, power 80%:
#'
## ---- include=TRUE, echo=TRUE--------------------------------------------
design <- getDesignGroupSequential(informationRates = c(0.66,1),
typeOfDesign = "asOF",
sided=1, alpha=0.025, beta=0.2)
#'
#' - Exponential PFS with a median PFS of 60 months in control (`lambda2=log(2)/60`) and a target hazard ratio of 0.74 (`hazardRatio=0.74`).
#' - Annual drop-out of 2.5% in both arms (`dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12`).
#' - Recruitment is 42 patients/month from month 6 onwards after linear ramp up. (`accrualTime = c(0,1,2,3,4,5,6), accrualIntensity=c(6,12,18,24,30,36,42)`)
#' - Randomization ratio 1:1 (`allocation1=1` and `allocation2=1`; this is how subjects are randomized in treatment groups 1 and 2 in a subsequent way). 1:1 allocation is the default and is thus not explicitly set in the function call below.
#' - A fixed total sample size of 1200 (`maxNumberOfSubjects = 1200`).
#'
#' As described in the R markdown doument which describes standard designs of a trial with a survival endpoint, sample size calculations for this design can be performed as per the code below:
#'
## ---- include=TRUE, echo=TRUE--------------------------------------------
sampleSizeResult <- getSampleSizeSurvival(design,
lambda2=log(2)/60,hazardRatio=0.74,
dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12,
accrualTime = c(0,1,2,3,4,5,6),
accrualIntensity=c(6,12,18,24,30,36,42),
maxNumberOfSubjects = 1200)
# Standard rpact display of results
sampleSizeResult
#'
#' By design, the power of the trial is 80%. The interim analysis is after
#' `r ceiling(sampleSizeResult$eventsPerStage[1,1])` events which is expected to occur after
#' `r formatC(sampleSizeResult$analysisTime[1,1],digits=2,format="f")` months,
#' and the final analysis is after
#' `r ceiling(sampleSizeResult$eventsPerStage[2,1])` events which is expected to occur after
#' `r formatC(sampleSizeResult$analysisTime[2,1],digits=2,format="f")` months.
#' These numbers will now be compared to simulations.
#'
#' # Simulation under proportional hazards
#'
#' The call `getSimulationSurvival` uses the same arguments as `getSampleSizeSurvival` with the following changes:
#'
#' - The maximum number of patients (`maxNumberOfSubjects = 1200`) is always provided to allow the simulation.
#' - The number of events at each analysis is specified as per the analytical calculation above (`plannedEvents=c(232,351)`).
#' - For one-sided tests, the direction of the alternative is specified. Here, the alternative is towards hazard ratios <1 which is specified as `directionUpper = FALSE`.
#' - The number of simulated trials is specified (`maxNumberOfIterations = 10000` in the example below).
#' - By default, raw datasets from simulation runs are not extracted. However, in this example, it is specifies that one raw dataset that led to stopping after each stage, respectively, will be stored: `maxNumberOfRawDatasetsPerStage=1`.
#' - For reproducibility, it is useful to set the random seed which is set to `seed=2` in the example.
#'
## ---- include=TRUE, echo=TRUE--------------------------------------------
simulationResult <- getSimulationSurvival(design,
lambda2=log(2)/60,hazardRatio=0.74,
dropoutRate1 = 0.025, dropoutRate2 = 0.025,
dropoutTime = 12,
accrualTime = c(0,1,2,3,4,5,6),
accrualIntensity=c(6,12,18,24,30,36,42),
plannedEvents=c(232,351),
directionUpper = FALSE,
maxNumberOfSubjects = 1200,
maxNumberOfIterations = 10000,
maxNumberOfRawDatasetsPerStage = 1,
seed=2)
# print the simulation result
simulationResult
#'
#' According to the output, the simulated overall power is
#' `r formatC(100*simulationResult$overallReject,digits=1,format="f")`%
#' and the probability to cross the efficacy boundary at the interim analysis is
#' `r formatC(100*simulationResult$rejectPerStage[1,1],digits=1,format="f")`%.
#' These are both within 1% of the analytical power.
#'
#' The mean simulated analysis times are after `r formatC(simulationResult$analysisTime[1,1],digits=2,format="f")` months for the interim analysis and after `r formatC(simulationResult$analysisTime[2,1],digits=2,format="f")` for the final analysis. Both timings differ by <0.1 months from the analytical calculation
#' (Difference analysis times = `r formatC(simulationResult$analysisTime[1,1] - sampleSizeResult$analysisTime[1,1], digits = 2, format = "f")`,
#' `r formatC(simulationResult$analysisTime[2,1] - sampleSizeResult$analysisTime[2,1], digits = 2, format = "f")`).
#'
#' The summary of the simulations above also provide median [range] and mean+/-sd of the trial results across the simulated datasets. Summary results for each trial can be obtained from the simulation object using function `getData`. Similarly, raw data from individual trials that were stopped at each stage can be obtained using function `getRawData` (if `maxNumberOfRawDatasetsPerStage` was set > 0). The format of these datasets is described in the help `?getSimulationSurvival` and illustrated below.
#'
#' ## Accessing trial summaries per stage for each simulation
#'
## ---- include=TRUE, echo=TRUE--------------------------------------------
# get aggregate datasets from all simulation runs
aggregateSimulationData <- getData(simulationResult)
# show the first 6 records of the dataset
head(aggregateSimulationData)
# The aggregated dataset contains one record for each of the 10'000 simulated
# interim datasets (stageNumber=1), and one additional record for the final
# analysis (stageNumber=2) for all simulated trials which were not stopped at
# the interim analysis
table(aggregateSimulationData$stageNumber)
# One possible analysis which uses the aggregate dataset:
# display the distribution of the timing of the analyses (by stage)
# across simulation runs
p <- ggplot(aggregateSimulationData, aes(factor(stageNumber),analysisTime))
p + geom_boxplot() + labs(x = "Analysis stage",
y = "Timing of analysis (months \nfrom first patient randomized)")
#'
#' ## Accessing trial summaries per stage for each simulation
#'
## ---- include=TRUE, echo=TRUE--------------------------------------------
# get all stored raw datasets
rawSimulationDataData <- getRawData(simulationResult)
# chose one dataset which corresponds to a simulated (hypothetical) trial
# which was stopped at the efficacy interim analysis
# (only 1 such dataset exists here, because maxNumberOfRawDatasetsPerStage
# was set to 1 above)
stage1rawData <- subset(rawSimulationDataData,stopStage==1)
# Perform a Cox regression analysis of the simulated trial
# (treatmentGroup==1 is the intervention arm)
# Note that the p-value from the Cox regression is two-sided whereas
# the group-sequential is one-sided.
summary(coxph(Surv(timeUnderObservation,event)~I(treatmentGroup==1),data=stage1rawData))
# Show Kaplan-Meier curves of the simulated trial
ggsurvplot(survfit(Surv(timeUnderObservation, event) ~ treatmentGroup,
data = stage1rawData), risk.table = TRUE, break.time.by = 12,
censor.size = 1, risk.table.fontsize = 4, risk.table.pos = "in",
title = paste0("Kaplan-Meier curve of one simulated trial\n which ",
"was stopped at the interim analysis"),
xlab = "Time (months)", data = stage1rawData)
#'
#' # Simulation under non-proportional hazards
#'
#' For the sake of illustration, assume that the treatment effect in the example above is delayed by 6 months, i.e., that the active treatment does not affect the hazard in the first 6 months but still reduces it by 0.74-fold from month 6 onwards. The code below determines the power of the design in this situation via simulation.
#'
#' The code to specify a delayed treatment effect is similar to the simulation under proportional hazards except that now the survival function in each arm is specified via a piecewise constant exponential distribution:
#' `piecewiseSurvivalTime=c(0,6), lambda2=c(log(2)/60,log(2)/60), lambda1=c(log(2)/60,0.74*log(2)/60)`
#' (as always for rpact, "2" refers to the control arm which still has a constant hazard rate, i.e. an exponential distribution).
#'
## ---- include=TRUE, echo=TRUE--------------------------------------------
# Simulation assuming a delayed treatment effect
simulationResultNPH <- getSimulationSurvival(design,
piecewiseSurvivalTime=c(0,6),
lambda2=c(log(2)/60,log(2)/60),
lambda1=c(log(2)/60,0.74*log(2)/60),
dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12,
accrualTime = c(0,1,2,3,4,5,6),
accrualIntensity=c(6,12,18,24,30,36,42),
plannedEvents=c(232,351),
directionUpper = FALSE,
maxNumberOfSubjects = 1200,
maxNumberOfIterations = 10000,
maxNumberOfRawDatasetsPerStage = 1,
seed=3)
simulationResultNPH
#'
#' As can be seen from the output above, power drops substantially to
#' `r formatC(100*simulationResultNPH$overallReject,digits=1,format="f")`%.
#'
#' If one wants the trial to maintain a power of 80% despite the delayed treatment effect, the maximal number of events would need to be increased and some experimentation shows that one would need 498 events in this case as demonstrated in the simulation below. Note further that in this non-proportional hazards scenario, power does not only depend on the number of events but also on other assumptions, in particular on assumptions regarding the speed of recruitment.
#'
#'
## ---- include=TRUE, echo=TRUE--------------------------------------------
# Simulation assuming a delayed treatment effect
plannedEventsMax <- 498
simulationResultNPH2 <- getSimulationSurvival(design,
piecewiseSurvivalTime=c(0,6),
lambda2=c(log(2)/60,log(2)/60),
lambda1=c(log(2)/60,0.74*log(2)/60),
dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12,
accrualTime = c(0,1,2,3,4,5,6),
accrualIntensity=c(6,12,18,24,30,36,42),
plannedEvents=c(ceiling(0.66*plannedEventsMax),plannedEventsMax),
directionUpper = FALSE,
maxNumberOfSubjects = 1200,
maxNumberOfIterations = 10000,
maxNumberOfRawDatasetsPerStage = 1,
seed=4)
simulationResultNPH2
#'
#'
#' ***
#'
#' System: rpact `r packageVersion("rpact")`, `r R.version.string`, platform: `r R.version$platform`
#'
## ---- include=TRUE, echo=FALSE, results='asis'---------------------------
print(citation("rpact"), bibtex = FALSE)