This function generates nonlinear bins for probability of survival data from user-supplied thresholds and divisors, following Napoli et al. (2017), Schroeder et al. (2019), and Kassar et al. (2016). It also calculates statistics for each bin: mean, standard deviation, total alive, total dead, count, and percentage.
Usage
nonlinear_bins(
  data,
  Ps_col,
  outcome_col,
  divisor1 = 5,
  divisor2 = 5,
  threshold_1 = 0.9,
  threshold_2 = 0.99
)
Arguments
- data
  A data.frame or tibble containing the probability of survival data for a set of patients.
- Ps_col
  The column in data containing the probability of survival values for a set of patients.
- outcome_col
  The name of the column containing the outcome data. It should be binary, with values indicating patient survival. A value of 1 should represent "alive" (survived), while 0 should represent "dead" (did not survive). Ensure the column contains only these two possible values.
- divisor1
  A parameter to control the width of the probability of survival range bins. Affects the creation of step sizes for the beginning of each bin range. Defaults to 5.
- divisor2
  A parameter to control the width of the probability of survival range bins. Affects the creation of step sizes for the beginning of each bin range. Defaults to 5.
- threshold_1
  A parameter to decide where data indices will begin to create step sizes. Defaults to 0.9.
- threshold_2
  A parameter to decide where data indices will end to create step sizes. Defaults to 0.99.
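The interaction between the thresholds and divisors can be illustrated with a minimal base R sketch. The helper sketch_breaks below is hypothetical and is not the function's actual source. It assumes that step sizes are counted in sorted observations, that divisor1 splits the observations up to threshold_1, and that divisor2 splits the observations between threshold_1 and threshold_2.

```r
# Hypothetical helper: an illustration under stated assumptions,
# NOT the implementation of nonlinear_bins()
sketch_breaks <- function(ps, divisor1 = 5, divisor2 = 5,
                          threshold_1 = 0.9, threshold_2 = 0.99) {
  ps <- sort(ps)

  # Position of the first sorted value above each threshold
  first_above_1 <- which(ps > threshold_1)[1]
  first_above_2 <- which(ps > threshold_2)[1]
  stopifnot(!is.na(first_above_1), !is.na(first_above_2))

  # Step sizes, measured in number of observations (assumption)
  step1 <- max(1, round(first_above_1 / divisor1))
  n_between <- sum(ps > threshold_1 & ps <= threshold_2)
  step2 <- max(1, round(n_between / divisor2))

  # Positions of the break points, then the values at those positions
  idx <- unique(c(
    seq(1, first_above_1, by = step1),
    seq(first_above_1, first_above_2, by = step2),
    length(ps)
  ))
  unique(ps[idx])
}

set.seed(123)
ps <- plogis(rnorm(10000, mean = 2, sd = 1.5))
breaks <- sketch_breaks(ps)
breaks
```

Under these assumptions, larger divisors produce smaller steps and therefore more, narrower bins, while the thresholds decide where the switch between the two step sizes happens.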
Value
A list with intervals and bin_stats objects:
- intervals: A vector of start and end-points for the probability of survival bin ranges.
- bin_stats: A tibble with columns bin_number, bin_start, bin_end, mean, sd, Pred_Survivors_b, Pred_Deaths_b, AntiS_b, AntiM_b, alive, dead, count, and percent.
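To make the shape of bin_stats concrete, here is a minimal base R sketch of a per-bin summary. The helper sketch_bin_stats is hypothetical and is not the function's implementation. It covers only the columns whose meaning is clear from their names, and it assumes right-closed bins with the minimum value included in the first bin.

```r
# Hypothetical helper: illustrates a per-bin summary,
# NOT the implementation of nonlinear_bins()
sketch_bin_stats <- function(ps, outcome, breaks) {
  stopifnot(all(outcome %in% c(0, 1)))
  n_bins <- length(breaks) - 1

  # Assign each observation to a bin (assumption: right-closed intervals)
  bin <- cut(ps, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  bin <- factor(bin, levels = seq_len(n_bins))

  data.frame(
    bin_number = seq_len(n_bins),
    bin_start  = head(breaks, -1),
    bin_end    = tail(breaks, -1),
    mean       = as.numeric(tapply(ps, bin, mean)),
    sd         = as.numeric(tapply(ps, bin, sd)),
    alive      = as.numeric(tapply(outcome, bin, sum)),      # outcome == 1
    dead       = as.numeric(tapply(1 - outcome, bin, sum)),  # outcome == 0
    count      = as.numeric(tapply(ps, bin, length))
  )
}

set.seed(123)
ps <- plogis(rnorm(1000, mean = 2, sd = 1.5))
outcome <- rbinom(1000, size = 1, prob = ps)

# Quartile break points stand in for the nonlinear intervals here
breaks <- unique(quantile(ps, probs = seq(0, 1, by = 0.25), names = FALSE))
stats <- sketch_bin_stats(ps, outcome, breaks)
stats
```

In this sketch, alive and dead always add up to count within each bin, which is a useful sanity check on any outcome column coded as 0 and 1.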
Examples
# Generate example data with high negative skewness
set.seed(123)
# Parameters
n_patients <- 10000 # Total number of patients
# Skewed towards higher values
Ps <- plogis(rnorm(n_patients, mean = 2, sd = 1.5))
# Simulate survival outcomes based on Ps
survival_outcomes <- rbinom(n_patients,
  size = 1,
  prob = Ps
)
# Create data frame
data <- data.frame(Ps = Ps, survival = survival_outcomes) |>
  dplyr::mutate(death = dplyr::if_else(survival == 1, 0, 1))
# Apply the nonlinear_bins function
results <- nonlinear_bins(data = data,
                          Ps_col = Ps,
                          outcome_col = survival,
                          divisor1 = 5,
                          divisor2 = 5,
                          threshold_1 = 0.9,
                          threshold_2 = 0.99)
# View results
results$intervals
#> [1] 0.02257717 0.54234698 0.70154257 0.79581165 0.85714527 0.90005763
#> [7] 0.92518915 0.94603830 0.96266743 0.97623957 0.99957866
results$bin_stats
#> # A tibble: 10 × 13
#>    bin_number bin_start bin_end  mean      sd Pred_Survivors_b Pred_Deaths_b
#>         <int>     <dbl>   <dbl> <dbl>   <dbl>            <dbl>         <dbl>
#>  1          1    0.0226   0.542 0.378 0.122               419.         692.
#>  2          2    0.542    0.702 0.628 0.0458              698.         413.
#>  3          3    0.702    0.796 0.752 0.0271              836.         275.
#>  4          4    0.796    0.857 0.829 0.0173              921.         190.
#>  5          5    0.857    0.900 0.879 0.0126              976.         134.
#>  6          6    0.900    0.925 0.913 0.00723             735.          70.0
#>  7          7    0.925    0.946 0.936 0.00596             753.          51.8
#>  8          8    0.946    0.963 0.954 0.00473             768.          36.7
#>  9          9    0.963    0.976 0.970 0.00406             781.          24.4
#> 10         10    0.976    1.00  0.987 0.00621            1210.          16.2
#> # ℹ 6 more variables: AntiS_b <dbl>, AntiM_b <dbl>, alive <dbl>, dead <dbl>,
#> #   count <dbl>, percent <dbl>