
Exploratory Data Analysis, Normality Testing, and Visualization
Source: R/is_it_normal.R
is_it_normal() calculates descriptive statistics and conducts univariate
normality testing on one or more numeric variables in a dataset using a
selected statistical test. Diagnostic plots are available for only one
variable at a time. Results are returned as a named list containing
descriptive statistics and, optionally, normality test results and/or
diagnostic plots.
Usage
is_it_normal(
  df,
  ...,
  group_vars = NULL,
  seed = 10232015,
  normality_test = NULL,
  include_plots = FALSE,
  plot_theme = traumar::theme_cleaner
)
Arguments
- df
  A data.frame or tibble containing the variables to assess.
- ...
  One or more unquoted column names from df to be analyzed.
- group_vars
  Optional. A character vector of column names in df to group results by (e.g., c("year", "hospital_level")). If NULL, no grouping is applied. Grouped summaries and normality tests are computed within each unique combination of values across these variables.
- seed
  A numeric value passed to set.seed() to ensure reproducibility. Default is 10232015.
- normality_test
  A character string specifying the statistical test to use. Must be one of:
  - "shapiro-wilk", "shapiro", or "sw"
  - "kolmogorov-smirnov" or "ks"
  - "anderson-darling" or "ad"
  - "lilliefors" or "lilli"
  - "cramer-von-mises" or "cvm"
  - "pearson" or "p"
  - "shapiro-francia" or "sf"
  If NULL (the default), no normality test is performed.
- include_plots
  Logical. If TRUE, plots are generated for a single variable. Plotting is disabled if multiple variables are passed.
- plot_theme
  A ggplot2::theme function to apply to all plots. Default is traumar::theme_cleaner.
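The normality_test argument accepts several aliases for each test. The minimal base R sketch below shows one way such aliases could be normalized to a single canonical name before dispatching to a test. The helper resolve_normality_test() and its canonical labels are hypothetical illustrations, not part of traumar's API, and it matches aliases exactly (case-sensitively) as an assumption.

```r
# Hypothetical helper (not part of traumar): maps each documented alias of
# `normality_test` to one canonical test name.
resolve_normality_test <- function(normality_test) {
  if (is.null(normality_test)) {
    return(NULL)  # NULL means "no normality test", as documented
  }
  if (!is.character(normality_test) || length(normality_test) != 1) {
    stop("`normality_test` must be a single character string.", call. = FALSE)
  }
  # Canonical name = list element name; values = accepted aliases
  aliases <- list(
    "shapiro-wilk"       = c("shapiro-wilk", "shapiro", "sw"),
    "kolmogorov-smirnov" = c("kolmogorov-smirnov", "ks"),
    "anderson-darling"   = c("anderson-darling", "ad"),
    "lilliefors"         = c("lilliefors", "lilli"),
    "cramer-von-mises"   = c("cramer-von-mises", "cvm"),
    "pearson"            = c("pearson", "p"),
    "shapiro-francia"    = c("shapiro-francia", "sf")
  )
  # Exact, case-sensitive matching against every alias group
  hit <- vapply(aliases, function(a) normality_test %in% a, logical(1))
  if (!any(hit)) {
    stop("Unsupported `normality_test`: \"", normality_test, "\"", call. = FALSE)
  }
  names(aliases)[hit]
}

resolve_normality_test("sw")     # "shapiro-wilk"
resolve_normality_test("lilli")  # "lilliefors"
```

Resolving to a canonical name first lets downstream code branch on seven test names instead of fifteen aliases.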
Value
A named list with the following elements:
- descriptive_statistics
  A tibble of summary statistics for each variable.
- normality_test
  A tibble of test statistics and p-values (if normality_test is not NULL).
- plots
  A patchwork object containing four plots (if include_plots = TRUE and exactly one variable is supplied).
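A short usage sketch, based only on the signature and return value documented above, using R's built-in mtcars dataset. The requireNamespace() guard is an addition so the snippet is safe to run where traumar is not installed; the exact printed output is not reproduced here.

```r
# Runs only if the traumar package is installed.
if (requireNamespace("traumar", quietly = TRUE)) {
  # Descriptive statistics plus a Shapiro-Wilk test for one variable
  res <- traumar::is_it_normal(mtcars, mpg, normality_test = "shapiro")
  print(res$descriptive_statistics)
  print(res$normality_test)

  # Same analysis, computed separately within each cylinder group
  res_by_cyl <- traumar::is_it_normal(
    mtcars, mpg,
    group_vars = "cyl",
    normality_test = "sw"
  )
  print(res_by_cyl$normality_test)
}
```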
Details
If the data do not meet the requirements of the chosen normality test,
is_it_normal() will not run that test.
Different normality tests may give different results for the same data, because each test has its own assumptions and sensitivity. Verify the assumptions and consult the documentation for each test to ensure appropriate use.
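As an illustration of skipping tests whose requirements are not met, combined with per-group testing in the spirit of group_vars, the base R sketch below runs Shapiro-Wilk within each group and skips groups that stats::shapiro.test() would reject: fewer than 3 or more than 5000 non-missing values, or all values identical. The helper sw_by_group() is hypothetical and is not how traumar implements its checks; only the limits themselves come from stats::shapiro.test().

```r
# Hypothetical sketch: Shapiro-Wilk within each group, skipping groups
# that stats::shapiro.test() cannot handle.
sw_by_group <- function(x, group) {
  pieces <- split(x, group)
  rows <- lapply(names(pieces), function(g) {
    v <- pieces[[g]]
    v <- v[!is.na(v)]  # count only non-missing values
    n <- length(v)
    # shapiro.test() requires 3-5000 values that are not all identical
    ok <- n >= 3 && n <= 5000 && length(unique(v)) > 1
    if (ok) {
      test <- stats::shapiro.test(v)
      data.frame(group = g, n = n, W = unname(test$statistic),
                 p_value = test$p.value, tested = TRUE)
    } else {
      data.frame(group = g, n = n, W = NA_real_,
                 p_value = NA_real_, tested = FALSE)
    }
  })
  do.call(rbind, rows)
}

# Miles per gallon within each carburetor count in mtcars
sw_by_group(mtcars$mpg, mtcars$carb)
```

In mtcars, the carb = 6 and carb = 8 groups each contain a single car, so they are returned with tested = FALSE and missing statistics instead of raising an error.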
The function will abort with helpful CLI messages if input types or structures are incorrect.
If plotting is enabled and nrow(df) > 10000, a warning is issued because plotting may become computationally expensive.
Note
The supported normality tests and their underlying functions are listed below. Consult each function's documentation for its specifications.
- Shapiro-Wilk (stats::shapiro.test())
- Kolmogorov-Smirnov (stats::ks.test())
- Anderson-Darling (nortest::ad.test())
- Lilliefors (nortest::lillie.test())
- Cramer-von Mises (nortest::cvm.test())
- Pearson (nortest::pearson.test())
- Shapiro-Francia (nortest::sf.test())
If grouped plotting is enabled, each group generates its own set of plots, which can flood your IDE or console. Use this feature with care to avoid lag or unwanted output.