BEAR

Benchmarks of Empirical Accuracy in Research

Problem

Quantitative metascience drives important debates about research standards. Most of its key contributions come from analyses of individual datasets, often painstakingly constructed by researchers. Thanks to their work we now have a better understanding of replicability, publication bias, p-hacking, reporting practices, pre-registration, significance rates, and more. But each of these datasets was constructed in its own specific way. How generalisable are the canonical findings built on them?

Solution

BEAR is an open-source database for metascience exploration with over 20 datasets across many scientific fields and millions of data points. BEAR does not generate any new source data; it repackages publicly available datasets into a common structure to make research on replication, exchangeability, meta-analysis, and other questions about empirical research easier and more generalisable.

To download the latest version of the data, go to GitHub Releases, or follow the download instructions for the command line.

12.4 million data points

7,567 meta-analyses

Over 20 datasets

Datasets

BEAR combines curated registries of studies, random collections of articles in different disciplines, sets of meta-analyses, and direct replication projects. These datasets are purposefully heterogeneous and vary in quality. The database covers many disciplines and research areas: economics, political science, biomedicine, clinical trials, ecology, evolution, education, neuroscience, and more. Head to the Datasets section to read about individual datasets. See Documentation for shared derivation rules, including the treatment of p-values and confidence intervals.

What to do with these data

At a minimum, BEAR contains z-values with clearly labelled identifiers for studies and meta-analyses. Where available, it also includes effect sizes, standard errors, sample sizes, methods (e.g. parallel RCT), measures (e.g. standardised mean difference, odds ratio), and additional covariates (e.g. year, pre-registration status, phase of clinical trial, journal, discipline).
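As an illustration of how these fields relate, here is a minimal Python sketch that computes a z-value as an effect size divided by its standard error, and then a two-sided p-value under a standard normal reference. The field names (`study_id`, `meta_id`, `estimate`, `std_error`) and the example rows are hypothetical, not BEAR's actual column names or data. The derivation rules BEAR itself applies are described in Documentation and may differ from this.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_value(estimate: float, std_error: float) -> float:
    """Return the z-value: effect estimate divided by its standard error."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return estimate / std_error


def two_sided_p(z: float) -> float:
    """Return the two-sided p-value for z under a standard normal reference."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


# Hypothetical rows; the keys are illustrative, not BEAR's schema.
rows = [
    {"study_id": "s1", "meta_id": "m1", "estimate": 0.30, "std_error": 0.10},
    {"study_id": "s2", "meta_id": "m1", "estimate": -0.05, "std_error": 0.20},
]

for row in rows:
    row["z"] = z_value(row["estimate"], row["std_error"])
    row["p"] = two_sided_p(row["z"])
```

Keeping the study and meta-analysis identifiers next to each z-value makes it possible to group results by meta-analysis later on.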

An optional model of signal-to-noise ratios (based on van Zwet et al.) is also included in the repository to quantify basic metascientific parameters such as selection, assurance, and replicability of findings. Users are encouraged to fit their own models.
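To give a feel for what a signal-to-noise ratio means here, the sketch below assumes the textbook setup in which the SNR is the true effect divided by the standard error, so that an observed z-value is the SNR plus standard normal noise. Under that assumption, the power of a two-sided test follows directly from the SNR. This is not the model shipped in the repository; the function name `power_two_sided` is hypothetical and used only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_two_sided(snr: float, alpha: float = 0.05) -> float:
    """Probability that |z| exceeds the critical value when z ~ Normal(snr, 1).

    Assumption: snr is the true effect divided by the standard error.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    upper_tail = STD_NORMAL.cdf(snr - crit)       # P(z > crit)
    lower_tail = STD_NORMAL.cdf(-snr - crit)      # P(z < -crit)
    return upper_tail + lower_tail


# Power for a few illustrative SNR values.
for snr in (0.0, 1.0, 2.8):
    print(f"SNR = {snr:.1f}: power = {power_two_sided(snr):.3f}")
```

With an SNR of zero the power equals the significance level, and an SNR of roughly 2.8 corresponds to the conventional 80% power at the 5% level.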

Credits

All credit is due to the researchers who put together the metascientific datasets and to the people maintaining the large-scale repositories from which the data are compiled here. Please see individual dataset pages for references and licensing information.
