This function generates nonlinear bins for probability of survival (Ps) data from user-supplied thresholds and divisors, following the approach described in Napoli et al. (2017), Schroeder et al. (2018), and Kassar et al. (2016). It also calculates statistics for each bin, including the mean, standard deviation, total alive, total dead, count, and percentage.

Usage

nonlinear_bins(
  data,
  Ps_col,
  outcome_col,
  group_vars = NULL,
  divisor1 = 5,
  divisor2 = 5,
  threshold_1 = 0.9,
  threshold_2 = 0.99
)

Arguments

data

A data.frame or tibble containing the probability of survival data for a set of patients.

Ps_col

The name of the column containing the survival probabilities (Ps). Should be numeric on a scale from 0 to 1.

outcome_col

The name of the column containing the outcome data. It should be binary, with values indicating patient survival. A value of 1 should represent "alive" (survived), while 0 should represent "dead" (did not survive). Ensure the column contains only these two possible values.

group_vars

Optional grouping variables for bin statistics calculations. These should be specified as quoted column names.

divisor1

Controls the width of the probability of survival bins below threshold_1 by setting the step size between the starting points of successive bins. Defaults to 5.

divisor2

Controls the width of the probability of survival bins between threshold_1 and threshold_2 by setting the step size between the starting points of successive bins. Defaults to 5.

threshold_1

The probability of survival value at which the data indices used to create step sizes begin. Defaults to 0.9.

threshold_2

The probability of survival value at which the data indices used to create step sizes end. Defaults to 0.99.
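
The interplay of the two divisors and two thresholds is easier to see in code. The base R sketch below is an illustration only, not the package's implementation. It assumes that the Ps values are sorted, that step sizes are counted in records rather than in probability units, that the records up to threshold_1 are stepped through in roughly divisor1 equal-count steps, and that the records between threshold_1 and threshold_2 are stepped through in roughly divisor2 equal-count steps. The helper name sketch_intervals() is hypothetical.

```r
# Hypothetical helper: equal-count stepping below and between two thresholds.
# This reflects an assumption about the approach, not the package source.
sketch_intervals <- function(ps, divisor1 = 5, divisor2 = 5,
                             threshold_1 = 0.9, threshold_2 = 0.99) {
  ps <- sort(ps[!is.na(ps)])

  # Positions in the sorted vector of the first value above each threshold
  first_above_1 <- which(ps > threshold_1)[1]
  first_above_2 <- which(ps > threshold_2)[1]
  stopifnot(!is.na(first_above_1), !is.na(first_above_2))

  # Number of records that fall between the two thresholds
  n_between <- sum(ps > threshold_1 & ps <= threshold_2)

  # Step sizes in numbers of records (at least 1 so seq() can advance)
  step1 <- max(1, round(first_above_1 / divisor1))
  step2 <- max(1, round(n_between / divisor2))

  idx <- unique(c(
    seq(1, first_above_1, by = step1),              # low-Ps region
    seq(first_above_1, first_above_2, by = step2),  # between the thresholds
    length(ps)                                      # close the final bin
  ))

  # Translate record positions back into Ps values
  unique(ps[idx])
}

set.seed(123)
ps <- plogis(rnorm(10000, mean = 2, sd = 1.5))
breaks <- sketch_intervals(ps)
breaks
diff(breaks)  # bin widths in probability units
```

Because the steps are counted in records, regions where Ps values are densely packed get narrow bins and sparse regions get wide ones, which is what makes the bins nonlinear.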

Value

A list with two elements:

  • intervals: A vector defining bin boundaries for probability of survival.

  • bin_stats: A tibble containing:

    • bin_number: Bin index.

    • bin_start, bin_end: Bin range.

    • mean, sd: Mean and standard deviation of Ps_col within the bin.

    • Pred_Survivors_b, Pred_Deaths_b: Predicted counts of survivors and decedents, respectively.

    • AntiS_b, AntiM_b: Anticipated proportions of survivors and decedents, respectively.

    • alive, dead: Count of observed survivors and non-survivors.

    • count: Total records in the bin.

    • percent: Percentage of records within each bin.
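
To make the columns above concrete, the following base R sketch recomputes several of them for a hand-picked set of boundaries. It is an approximation under stated assumptions rather than the package's code: bins are treated as closed on the left, the last bin also includes its upper boundary, the predicted counts are taken to be the sums of Ps and 1 - Ps within each bin, and percent is expressed as a proportion. The helper name sketch_bin_stats() and its lower-case column names are hypothetical.

```r
# Hypothetical helper: per-bin summaries for a given set of boundaries.
sketch_bin_stats <- function(ps, outcome, breaks) {
  n_bins <- length(breaks) - 1

  # Assumption: [start, end) bins, with the last bin closed on both sides
  bin <- cut(ps, breaks = breaks, include.lowest = TRUE, right = FALSE,
             labels = FALSE)

  rows <- lapply(seq_len(n_bins), function(b) {
    in_bin <- which(bin == b)
    data.frame(
      bin_number = b,
      bin_start = breaks[b],
      bin_end = breaks[b + 1],
      mean = mean(ps[in_bin]),
      sd = sd(ps[in_bin]),
      pred_survivors = sum(ps[in_bin]),   # assumption: sum of Ps
      pred_deaths = sum(1 - ps[in_bin]),  # assumption: sum of 1 - Ps
      alive = sum(outcome[in_bin] == 1),
      dead = sum(outcome[in_bin] == 0),
      count = length(in_bin)
    )
  })

  out <- do.call(rbind, rows)
  out$percent <- out$count / sum(out$count)  # assumption: a proportion
  out
}

set.seed(123)
ps <- plogis(rnorm(1000, mean = 2, sd = 1.5))
outcome <- rbinom(1000, size = 1, prob = ps)
breaks <- c(min(ps), 0.5, 0.9, max(ps))

stats <- sketch_bin_stats(ps, outcome, breaks)
stats
```

Under these assumptions, pred_survivors plus pred_deaths equals count in every bin, and alive plus dead does as well, which gives a quick consistency check on any bin table.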

Details

Like other statistical computing functions, nonlinear_bins() is happiest without missing data. For best results, pass complete probability of survival and outcome data to the function, especially with smaller datasets. If Ps_col contains NA values, nonlinear_bins() still handles them and throws a warning about the missing probability of survival values.
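
One way to follow this advice is to screen the inputs before calling the function. The sketch below uses base R only. The data frame and its column names (trauma, Ps, survival) are made up for illustration.

```r
# Made-up data with one missing Ps value and one missing outcome
trauma <- data.frame(
  Ps = c(0.98, 0.45, NA, 0.91, 0.77),
  survival = c(1, 0, 1, NA, 1)
)

# Count the missing values in each column
colSums(is.na(trauma[, c("Ps", "survival")]))

# Keep only rows where both columns are present
trauma_complete <- trauma[complete.cases(trauma[, c("Ps", "survival")]), ]

# Confirm the inputs match what Ps_col and outcome_col expect
stopifnot(
  all(trauma_complete$Ps >= 0 & trauma_complete$Ps <= 1),
  all(trauma_complete$survival %in% c(0, 1))
)

nrow(trauma_complete)  # rows 1, 2 and 5 remain
```

Dropping incomplete rows is only one option. Whether it is appropriate depends on why the values are missing, so it is worth reviewing the counts before filtering.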

References

Kassar, O. M., Eklund, E. A., Barnhardt, W. F., Napoli, N. J., Barnes, L. E., & Young, J. S. (2016). Trauma survival margin analysis: A dissection of trauma center performance through initial lactate. The American Surgeon, 82(7), 649–653. doi:10.1177/000313481608200733

Napoli, N. J., Barnhardt, W., Kotoriy, M. E., Young, J. S., & Barnes, L. E. (2017). Relative mortality analysis: A new tool to evaluate clinical performance in trauma centers. IISE Transactions on Healthcare Systems Engineering, 7(3), 181–191. doi:10.1080/24725579.2017.1325948

Schroeder, P. H., Napoli, N. J., Barnhardt, W. F., Barnes, L. E., & Young, J. S. (2018). Relative mortality analysis of the “golden hour”: A comprehensive acuity stratification approach to address disagreement in current literature. Prehospital Emergency Care, 23(2), 254–262. doi:10.1080/10903127.2018.1489021

Author

Nicolas Foss, Ed.D., MS; original implementation in MATLAB by Nicholas J. Napoli, Ph.D., MS

Examples

# Generate example data with high negative skewness
set.seed(123)

# Parameters
n_patients <- 10000  # Total number of patients

# Skewed towards higher values
Ps <- plogis(rnorm(n_patients, mean = 2, sd = 1.5))

# Simulate survival outcomes based on Ps
survival_outcomes <- rbinom(n_patients,
                            size = 1,
                            prob = Ps
                            )

# Create data frame
data <- data.frame(Ps = Ps, survival = survival_outcomes) |>
  dplyr::mutate(death = dplyr::if_else(survival == 1, 0, 1))

# Apply the nonlinear_bins function
results <- nonlinear_bins(data = data,
                          Ps_col = Ps,
                          outcome_col = survival,
                          divisor1 = 5,
                          divisor2 = 5,
                          threshold_1 = 0.9,
                          threshold_2 = 0.99)

# View results
results$intervals
#>  [1] 0.02257717 0.54234698 0.70154257 0.79581165 0.85714527 0.90005763
#>  [7] 0.92518915 0.94603830 0.96266743 0.97623957 0.99957866
results$bin_stats
#> # A tibble: 10 × 13
#>    bin_number bin_start bin_end  mean      sd Pred_Survivors_b Pred_Deaths_b
#>         <int>     <dbl>   <dbl> <dbl>   <dbl>            <dbl>         <dbl>
#>  1          1    0.0226   0.542 0.378 0.122               420.         692. 
#>  2          2    0.542    0.702 0.628 0.0458              698.         413. 
#>  3          3    0.702    0.796 0.753 0.0270              836.         275. 
#>  4          4    0.796    0.857 0.829 0.0173              921.         190. 
#>  5          5    0.857    0.900 0.879 0.0126              976.         134. 
#>  6          6    0.900    0.925 0.913 0.00723             735.          70.0
#>  7          7    0.925    0.946 0.936 0.00596             753.          51.8
#>  8          8    0.946    0.963 0.954 0.00473             768.          36.7
#>  9          9    0.963    0.976 0.970 0.00405             781.          24.4
#> 10         10    0.976    1.00  0.987 0.00621            1209.          16.2
#> # ℹ 6 more variables: AntiS_b <dbl>, AntiM_b <dbl>, alive <int>, dead <int>,
#> #   count <int>, percent <dbl>

# Example with grouping by a categorical variable

# Add random group variable
data$group <- sample(c("A", "B"), size = n_patients, replace = TRUE)

# Run the function using a single grouping variable
results_grouped <- nonlinear_bins(data,
                                  Ps_col = Ps,
                                  outcome_col = survival,
                                  group_vars = "group"
                                  )

# View grouped results
results_grouped$bin_stats
#> # A tibble: 20 × 14
#>    group bin_number bin_start bin_end  mean      sd Pred_Survivors_b
#>    <chr>      <int>     <dbl>   <dbl> <dbl>   <dbl>            <dbl>
#>  1 A              1    0.0226   0.542 0.385 0.120               213.
#>  2 A              2    0.542    0.702 0.629 0.0459              361.
#>  3 A              3    0.702    0.796 0.753 0.0266              419.
#>  4 A              4    0.796    0.857 0.829 0.0178              460.
#>  5 A              5    0.857    0.900 0.880 0.0124              499.
#>  6 A              6    0.900    0.925 0.913 0.00724             367.
#>  7 A              7    0.925    0.946 0.936 0.00600             366.
#>  8 A              8    0.946    0.963 0.954 0.00473             392.
#>  9 A              9    0.963    0.976 0.970 0.00413             390.
#> 10 A             10    0.976    1.00  0.987 0.00619             569.
#> 11 B              1    0.0226   0.542 0.370 0.125               207.
#> 12 B              2    0.542    0.702 0.628 0.0458              337.
#> 13 B              3    0.702    0.796 0.752 0.0275              417.
#> 14 B              4    0.796    0.857 0.828 0.0168              461.
#> 15 B              5    0.857    0.900 0.879 0.0129              477.
#> 16 B              6    0.900    0.925 0.913 0.00723             368.
#> 17 B              7    0.925    0.946 0.935 0.00592             387.
#> 18 B              8    0.946    0.963 0.955 0.00472             376.
#> 19 B              9    0.963    0.976 0.970 0.00398             391.
#> 20 B             10    0.976    1.00  0.987 0.00622             640.
#> # ℹ 7 more variables: Pred_Deaths_b <dbl>, AntiS_b <dbl>, AntiM_b <dbl>,
#> #   alive <int>, dead <int>, count <int>, percent <dbl>