Skip to contents

[Experimental]

is_it_normal() calculates descriptive statistics and conducts univariate normality testing on one or more numeric variables in a dataset using a selected statistical test. Optional plots are included for one variable at a time, only. Results are returned as a named list containing summaries and, optionally, normality tests and/or diagnostic plots.

Usage

is_it_normal(
  df,
  ...,
  group_vars = NULL,
  seed = 10232015,
  normality_test = NULL,
  include_plots = FALSE,
  plot_theme = traumar::theme_cleaner
)

Arguments

df

A data.frame or tibble containing the variables to assess.

...

One or more unquoted column names from df to be analyzed.

group_vars

Optional. A character vector of column names in df to group results by (e.g., c("year", "hospital_level")). If NULL, no grouping is applied. Grouped summaries and normality tests are computed within each unique combination of values across these variables.

seed

A numeric value passed to set.seed() to ensure reproducibility. Default is 10232015.

normality_test

A character string specifying the statistical test to use. Must be one of: "shapiro-wilk" or "shapiro" or "sw", "kolmogorov-smirnov" or "ks", "anderson-darling" or "ad", "lilliefors" or "lilli", "cramer-von-mises" or "cvm", "pearson" or "p", or "shapiro-francia" or "sf". If NULL, no normality test is performed, which is the default.

include_plots

Logical. If TRUE, plots are generated for a single variable. Plotting is disabled if multiple variables are passed.

plot_theme

A ggplot2::theme function to apply to all plots. Default is traumar::theme_cleaner.

Value

A named list with the following elements:

descriptive_statistics

A tibble of summary statistics for each variable.

normality_test

A tibble of test statistics and p-values (if normality_test == TRUE).

plots

A patchwork object containing four plots (if include_plots = TRUE and one variable supplied).

Details

  • If the data do not meet the test requirements for a chosen test of normality, is_it_normal() will not run the tests.

  • Normality tests may yield differing results. Each test has distinct assumptions and sensitivity. Users should verify assumptions and consult test-specific guidance to ensure appropriate use.

  • The function will abort with helpful CLI messages if input types or structures are incorrect.

  • If plotting is enabled, and nrow(df) > 10000, a warning is issued as plotting may become computationally expensive.

Note

Supported normality tests are below. Please check the specifications of these tests in the corresponding documentation.

Please note that if grouped plotting is enabled, each group will generate its own set of plots. This may flood your IDE or console. Plan your use of this functionality with care to avoid lags or unwanted outputs.

Author

Nicolas Foss, Ed.D., MS