mverse is an extension to multiverse package (Sarma et al. 2021) which allows users create explorable multiverse analysis (Steegen et al. 2016) in R. This extension provides user friendly abstraction and a set of examples for researchers, educators, and students in statistics.
You can install the released version of mverse from CRAN with:
install.packages("mverse")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("mverseanalysis/mverse", build_vignettes = TRUE)
The following demonstration performs a multiverse analysis using hurricane
dataset (Jung et al. 2014) included in the library. We first create 6 universes as described in Figure 1. A filter branch with 2 options and a mutate branch with 3 options results in 6 universes in total. We then fit a Poisson regression model across the multiverse and inspect a coefficient estimate. See vignette("hurricane")
for a detailed analysis as well as the terminologies used.
#> Warning: Using alpha for a discrete variable is not advised.
#> Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
First, we start by loading the library and defining a mverse
object with the dataset of interest.
We use the *_branch()
methods to define branches. filter_branch()
defines filtering operations using dplyr::filter()
with different options for the filter.
outliers <- filter_branch(
! Name %in% c("Katrina"),
! Name %in% c("Katrina", "Audrey")
)
mutate_branch()
multiplexes dplyr::mutate()
to add a new column in the dataset.
strength <- mutate_branch(
NDAM, HighestWindSpeed, Minpressure_Updated_2014)
In order to fit a Poisson regression, we need to specify the model using R’s formula syntax and the underlying distribution using family
. In mverse
, we provide the specifications using formula_branch()
and family_branch()
. In this demonstration, we only define a single option for both formula and family but it is possible to provide multiple options for them as well.
model <- formula_branch(alldeaths ~ strength * MasFem)
distribution <- family_branch(poisson)
After defining the branches, we can add the branch objects to the mverse
object using add_*_branch()
methods.
mv <- mv %>%
add_filter_branch(outliers) %>%
add_mutate_branch(strength) %>%
add_formula_branch(model) %>%
add_family_branch(distribution)
glm_mverse()
multiplexes stats::glm()
function and fits a GLM in each universe according to the specifications provided by add_fomula_branch()
and add_family_branch()
.
mv <- mv %>% glm_mverse()
After completing the analysis, we can extract the results using summary()
. The method returns a table with branching options, estimates, 95% confidence intervals for all regression terms across the multiverse.
res <- summary(mv)
res
#> # A tibble: 24 × 16
#> universe outliers_br…¹ stren…² model…³ distr…⁴ term estimate std.e…⁵ stati…⁶
#> <fct> <fct> <fct> <fct> <fct> <chr> <dbl> <dbl> <dbl>
#> 1 1 outliers_1 streng… model_1 distri… (Int… 2.13e+0 8.04e-2 26.5
#> 2 1 outliers_1 streng… model_1 distri… stre… 3.02e-5 2.63e-6 11.5
#> 3 1 outliers_1 streng… model_1 distri… MasF… 6.23e-2 1.01e-2 6.19
#> 4 1 outliers_1 streng… model_1 distri… stre… 7.96e-7 3.20e-7 2.49
#> 5 2 outliers_1 streng… model_1 distri… (Int… -8.59e-2 2.65e-1 -0.324
#> 6 2 outliers_1 streng… model_1 distri… stre… 2.35e-2 2.17e-3 10.8
#> 7 2 outliers_1 streng… model_1 distri… MasF… 5.31e-2 3.20e-2 1.66
#> 8 2 outliers_1 streng… model_1 distri… stre… 3.32e-4 2.60e-4 1.28
#> 9 3 outliers_1 streng… model_1 distri… (Int… 4.74e+1 3.17e+0 15.0
#> 10 3 outliers_1 streng… model_1 distri… stre… -4.69e-2 3.34e-3 -14.0
#> # … with 14 more rows, 7 more variables: p.value <dbl>, conf.low <dbl>,
#> # conf.high <dbl>, outliers_branch_code <fct>, strength_branch_code <fct>,
#> # model_branch_code <fct>, distribution_branch_code <fct>, and abbreviated
#> # variable names ¹outliers_branch, ²strength_branch, ³model_branch,
#> # ⁴distribution_branch, ⁵std.error, ⁶statistic
The resulting data is a tibble
object and we can use regular tidyverse
grammar to manipulate the data. In the code below, we specifically focus on the estimated coefficient for MasFem
and its confidence intervals.
library(dplyr)
res %>%
filter(term == "MasFem") %>%
select(outliers_branch, strength_branch, term, estimate, conf.low, conf.high)
#> # A tibble: 6 × 6
#> outliers_branch strength_branch term estimate conf.low conf.high
#> <fct> <fct> <chr> <dbl> <dbl> <dbl>
#> 1 outliers_1 strength_1 MasFem 0.0623 0.0427 0.0822
#> 2 outliers_1 strength_2 MasFem 0.0531 -0.00942 0.116
#> 3 outliers_1 strength_3 MasFem -0.845 -1.59 -0.103
#> 4 outliers_2 strength_1 MasFem 0.0623 0.0427 0.0822
#> 5 outliers_2 strength_2 MasFem 0.0956 0.0301 0.161
#> 6 outliers_2 strength_3 MasFem -1.02 -1.81 -0.247
We can also inspect the result graphically using spec_curve()
. The method builds a specification curve (Simonsohn, Simmons, and Nelson 2020) for a term in the regression model specified by var
. The method also allows multiple ways of sorting the estimates. See ?spec_curve
for details.
spec_summary(mv, var = "MasFem") %>%
spec_curve(spec_matrix_spacing = 4) +
labs(colour = "Significant at 0.05")