#' ---
#' title: "Simulation-based design of group-sequential trials with a survival endpoint with rpact"
#' author: "Marcel Wolbers, Gernot Wassmer, and Friedrich Pahlke"
#' date: "Last change: `r format(Sys.time(), '%d %B, %Y')`"
#' number: 5
#' header-includes:
#' - \usepackage{fancyhdr}
#' - \pagestyle{fancy}
#' - \setlength{\headheight}{23pt}
#' - \fancyfoot[C]{www.rpact.com}
#' - \fancyfoot[L]{\thepage}
#' output:
#' rmarkdown::html_document:
#' highlight: pygments
#' number_sections: yes
#' self_contained: yes
#' toc: yes
#' toc_depth: 3
#' toc_float: yes
#' css: style.css
#' includes:
#' before_body: header.html
#' after_body: footer2.html
#' ---
#'
#'
#'
#' # Summary {-}
#'
#' This [R Markdown](https://rmarkdown.rstudio.com) document describes how to
#' simulate design characterics for survival design under complex settings (incl.
#' non-proportional hazards) in [rpact](https://cran.r-project.org/package=rpact).
#'
#' # Introduction
#'
#' rpact provides the function `getSimulationSurvival()` for simulation of
#' group-sequential trials with a time-to-event endpoint. For a given scenario,
#' `getSimulationSurvival()` simulates many hypothetical group-sequential trials
#' and calculates the test results. Based on this Monte Carlo simulation, estimates
#' of key quantities such as overall study power, stopping probabilities at each
#' interim analysis, timing of analyses etc. can be obtained.
#'
#' `getSimulationSurvival()` complements the analytical calculations from the function
#' `getSampleSizeSurvival()` in multiple ways:
#'
#' - Simulations can be used to assess the accuracy of the analytical formulas.
#' - Simulations allow to answer questions such as the following:
#' + How variable is the timing of interim analysis (even if all assumptions
#' are correct)?
#' + How could a dataset of a trial that is stopped early for efficacy at an
#' interim analysis look like?
#' - Simulation is also possible for scenarios that are analytically intractable
#' such as scenarios with delayed treatment effects.
#'
#' The syntax of function `getSimulationSurvival()` is very similar to the function
#' `getSampleSizeSurvival()`. Hence, this document only provides some examples and
#' expects that the reader is familiar with the R Markdown document [Designing
#' group-sequential trials with two groups and a survival endpoint with
#' rpact](https://vignettes.rpact.org/html/rpact_survival_examples.html) which
#' describes standard designs of a trial with a survival endpoint.
#'
#' `getSimulationSurvival()` also supports the usage of adaptive sample size
#' recalculation but this is not covered here. For more details, please also
#' consult the help `?getSimulationSurvival`.
#'
#'
#' **First, load the rpact package**
#'
## ---- include = TRUE, echo = TRUE---------------------------------------------
library(rpact)
packageVersion("rpact") # version should be version 2.0.5 or later
#'
#'
#' For this R Markdown document, some **additional R packages** are loaded which
#' will be used to examine simulated datasets.
## ---- include = TRUE, echo = TRUE---------------------------------------------
# Load ggplot2 for plotting
library(ggplot2) # general plotting
library(survminer) # plotting of KM curves with ggplot2
library(survival) # survival analysis routines
#'
#'
#' # Standard analytical calculation
#'
#' For comparison with the simulation-based analysis, a standard example is first
#' calculated under the following assumptions:
#'
#' - Group-sequential design with one interim analysis after 66% of information
#' using an O'Brien & Fleming type $\alpha$-spending function, one-sided Type I
#' error 2.5%, power 80%:
#'
## ---- include = TRUE, echo = TRUE---------------------------------------------
design <- getDesignGroupSequential(informationRates = c(0.66,1),
typeOfDesign = "asOF", sided = 1, alpha = 0.025, beta = 0.2)
#'
#' - Exponential PFS with a median PFS of 60 months in control
#' (`lambda2 = log(2)/60`) and a target hazard ratio of 0.74 (`hazardRatio = 0.74`).
#' - Annual drop-out of 2.5% in both arms
#' (`dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12`).
#' - Recruitment is 42 patients/month from month 6 onwards after linear ramp up.
#' (`accrualTime = c(0,1,2,3,4,5,6), accrualIntensity = c(6,12,18,24,30,36,42)`)
#' - Randomization ratio 1:1 (`allocation1 = 1` and `allocation2 = 1`; this is how
#' subjects are randomized in treatment groups 1 and 2 in a subsequent way). 1:1
#' allocation is the default and is thus not explicitly set in the function call
#' below.
#' - A fixed total sample size of 1200 (`maxNumberOfSubjects = 1200`).
#'
#' As described in the R Markdown document which describes standard designs of a
#' trial with a survival endpoint, sample size calculations for this design can be
#' performed as per the code below:
#'
## ---- include = TRUE, echo = TRUE---------------------------------------------
sampleSizeResult <- getSampleSizeSurvival(design,
lambda2 = log(2)/60,hazardRatio = 0.74,
dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12,
accrualTime = c(0,1,2,3,4,5,6),
accrualIntensity = c(6,12,18,24,30,36,42),
maxNumberOfSubjects = 1200)
# Standard rpact display of results
sampleSizeResult
# Summary of results
summary(sampleSizeResult)
#'
#' By design, the power of the trial is 80%. The interim analysis is after
#' `r ceiling(sampleSizeResult$eventsPerStage[1,1])` events which is expected to occur after
#' `r formatC(sampleSizeResult$analysisTime[1,1],digits = 2,format = "f")` months,
#' and the final analysis is after
#' `r ceiling(sampleSizeResult$eventsPerStage[2,1])` events which is expected to occur after
#' `r formatC(sampleSizeResult$analysisTime[2,1],digits = 2,format = "f")` months.
#' These numbers will now be compared to simulations.
#'
#' # Simulation under proportional hazards
#'
#' The call `getSimulationSurvival()` uses the same arguments as
#' `getSampleSizeSurvival()` with the following changes:
#'
#' - The maximum number of patients (`maxNumberOfSubjects = 1200`) is always provided to allow the simulation.
#' - The number of events at each analysis is specified as per the analytical
#' calculation above (`plannedEvents = c(232,351)`).
#' - For one-sided tests, the direction of the alternative is specified. Here, the
#' alternative is towards hazard ratios <1 which is specified as
#' `directionUpper = FALSE`.
#' - The number of simulated trials is specified (`maxNumberOfIterations = 10000`
#' in the example below).
#' - By default, raw datasets from simulation runs are not extracted. However, in
#' this example, it is specifies that one raw dataset that led to stopping after
#' each stage, respectively, will be stored: `maxNumberOfRawDatasetsPerStage = 1`.
#' - For reproducibility, it is useful to set the random seed which is set to
#' `seed = 2` in the example.
#' - For simulation of trials with unequal randomization, integer arguments
#' `allocation1` and `allocation2` must be provided to the function
#' `getSimulationSurvival` (instead of argument `allocationRatioPlanned` in the
#' function `getSampleSizeSurvival`). `allocation1` and `allocation2` specify the
#' number of consecutively enrolled subjects in the intervention and control
#' groups, respectively, before another subject from the opposite group is
#' recruited. For example, `allocation1 = 2, allocation1 = 1` refers to 2
#' (intervention):1 (control) randomization and
#' `allocation1 = 1, allocation1 = 2` to 1 (intervention):2 (control)
#' randomization.
#'
## ---- include = TRUE, echo = TRUE---------------------------------------------
simulationResult <- getSimulationSurvival(design,
lambda2 = log(2)/60,hazardRatio = 0.74,
dropoutRate1 = 0.025, dropoutRate2 = 0.025,
dropoutTime = 12,
accrualTime = c(0,1,2,3,4,5,6),
accrualIntensity = c(6,12,18,24,30,36,42),
plannedEvents = c(232,351),
directionUpper = FALSE,
maxNumberOfSubjects = 1200,
maxNumberOfIterations = 10000,
maxNumberOfRawDatasetsPerStage = 1,
seed = 12345)
# print the simulation result
simulationResult
# Summary of simulation results
summary(simulationResult)
#'
#' According to the output, the simulated overall power is
#' `r formatC(100*simulationResult$overallReject,digits = 1,format = "f")`%
#' and the probability to cross the efficacy boundary at the interim analysis is
#' `r formatC(100*simulationResult$rejectPerStage[1,1],digits = 1,format = "f")`%.
#' These are both within 1% of the analytical power.
#'
#' The mean simulated analysis times are after
#' `r formatC(simulationResult$analysisTime[1,1],digits = 2,format = "f")` months for
#' the interim analysis and after `r formatC(simulationResult$analysisTime[2,1],digits = 2,format = "f")`
#' for the final analysis. Both timings differ by <0.1 months from the analytical calculation
#' (Difference analysis times = `r formatC(simulationResult$analysisTime[1,1] - sampleSizeResult$analysisTime[1,1], digits = 2, format = "f")`,
#' `r formatC(simulationResult$analysisTime[2,1] - sampleSizeResult$analysisTime[2,1], digits = 2, format = "f")`).
#'
#' The summary of the simulations above also provide median [range] and mean+/-sd
#' of the trial results across the simulated datasets. Summary results for each
#' trial can be obtained from the simulation object using the function `getData()`.
#' Similarly, raw data from individual trials that were stopped at each stage can
#' be obtained using function `getRawData()` (if `maxNumberOfRawDatasetsPerStage` was
#' set > 0). The format of these datasets is described in the help
#' `?getSimulationSurvival` and illustrated below.
#'
#' ## Accessing trial summaries per stage for each simulation
#'
## ---- include = TRUE, echo = TRUE---------------------------------------------
# get aggregate datasets from all simulation runs
aggregateSimulationData <- getData(simulationResult)
# show the first 6 records of the dataset
head(aggregateSimulationData)
# The aggregated dataset contains one record for each of the 10'000 simulated
# interim datasets (stageNumber = 1), and one additional record for the final
# analysis (stageNumber = 2) for all simulated trials which were not stopped at
# the interim analysis
table(aggregateSimulationData$stageNumber)
# One possible analysis which uses the aggregate dataset:
# display the distribution of the timing of the analyses (by stage)
# across simulation runs
p <- ggplot(aggregateSimulationData, aes(factor(stageNumber),analysisTime))
p + geom_boxplot() + labs(x = "Analysis stage",
y = "Timing of analysis (months \nfrom first patient randomized)")
#'
#' ## Accessing trial summaries per stage for each simulation
#'
## ---- include = TRUE, echo = TRUE---------------------------------------------
# get all stored raw datasets
rawSimulationDataData <- getRawData(simulationResult)
# chose one dataset which corresponds to a simulated (hypothetical) trial
# which was stopped at the efficacy interim analysis
# (only 1 such dataset exists here, because maxNumberOfRawDatasetsPerStage
# was set to 1 above)
stage1rawData <- subset(rawSimulationDataData,stopStage = 1)
# Perform a Cox regression analysis of the simulated trial
# (treatmentGroup == 1 is the intervention arm)
# Note that the $p$-value from the Cox regression is two-sided whereas
# the group-sequential is one-sided.
summary(coxph(Surv(timeUnderObservation,event)~I(treatmentGroup == 1),data = stage1rawData))
# Show Kaplan-Meier curves of the simulated trial
ggsurvplot(survfit(Surv(timeUnderObservation, event) ~ treatmentGroup,
data = stage1rawData), risk.table = TRUE, break.time.by = 12,
censor.size = 1, risk.table.fontsize = 4, risk.table.pos = "in",
title = paste0("Kaplan-Meier curve of one simulated trial\n which ",
"was stopped at the interim analysis"),
xlab = "Time (months)", data = stage1rawData)
#'
#' # Simulation under non-proportional hazards
#'
#' For the sake of illustration, assume that the treatment effect in the example
#' above is delayed by 6 months, i.e., that the active treatment does not affect
#' the hazard in the first 6 months but still reduces it by 0.74-fold from month 6
#' onwards. The code below determines the power of the design in this situation via
#' simulation.
#'
#' The code to specify a delayed treatment effect is similar to the simulation under
#' proportional hazards except that now the survival function in each arm is
#' specified via a piecewise constant exponential distribution:
#' `piecewiseSurvivalTime = c(0,6), lambda2 = c(log(2)/60,log(2)/60), lambda1 = c(log(2)/60,0.74*log(2)/60)`
#' (as always for rpact, "2" refers to the control arm which still has a constant
#' hazard rate, i.e., an exponential distribution).
#'
## ---- include = TRUE, echo = TRUE---------------------------------------------
# Simulation assuming a delayed treatment effect
simulationResultNPH <- getSimulationSurvival(design,
piecewiseSurvivalTime = c(0,6),
lambda2 = c(log(2)/60,log(2)/60),
lambda1 = c(log(2)/60,0.74*log(2)/60),
dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12,
accrualTime = c(0,1,2,3,4,5,6),
accrualIntensity = c(6,12,18,24,30,36,42),
plannedEvents = c(232,351),
directionUpper = FALSE,
maxNumberOfSubjects = 1200,
maxNumberOfIterations = 10000,
maxNumberOfRawDatasetsPerStage = 1,
seed = 23456)
simulationResultNPH
#'
#' As can be seen from the output above, power drops substantially to
#' `r formatC(100*simulationResultNPH$overallReject,digits = 1,format = "f")`%.
#'
#' If one wants the trial to maintain a power of 80% despite the delayed treatment
#' effect, the maximal number of events would need to be increased and some
#' experimentation shows that one would need 498 events in this case as
#' demonstrated in the simulation below. Note further that in this non-proportional
#' hazards scenario, power does not only depend on the number of events but also on
#' other assumptions, in particular on assumptions regarding the speed of
#' recruitment.
#'
#'
## ---- include = TRUE, echo = TRUE---------------------------------------------
# Simulation assuming a delayed treatment effect
plannedEventsMax <- 498
simulationResultNPH2 <- getSimulationSurvival(design,
piecewiseSurvivalTime = c(0,6),
lambda2 = c(log(2)/60,log(2)/60),
lambda1 = c(log(2)/60,0.74*log(2)/60),
dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12,
accrualTime = c(0,1,2,3,4,5,6),
accrualIntensity = c(6,12,18,24,30,36,42),
plannedEvents = c(ceiling(0.66*plannedEventsMax),plannedEventsMax),
directionUpper = FALSE,
maxNumberOfSubjects = 1200,
maxNumberOfIterations = 10000,
maxNumberOfRawDatasetsPerStage = 1,
seed = 34567)
simulationResultNPH2
#'
#' ***
#'
#' System: rpact `r packageVersion("rpact")`, `r R.version.string`, platform: `r R.version$platform`
#'
## ---- include = TRUE, echo = FALSE, results = 'asis'--------------------------
printCitation()