BEAR
Benchmarks of Empirical Accuracy in Research
Problem
Quantitative metascience drives important debates about research standards. Most of its key contributions come from analyses of individual datasets, often painstakingly constructed by researchers. Thanks to their work, we now have a better understanding of replicability, publication bias, p-hacking, reporting practices, pre-registration, significance rates, and more. But each of these datasets was constructed in its own specific way. How generalisable are the canonical findings built on them?
Solution
BEAR is an open-source database for metascience exploration with over 20 datasets across many scientific fields and millions of data points. BEAR does not generate any new source data; it repackages publicly available datasets into a common structure to make research on replication, exchangeability, meta-analysis, and other questions about empirical research easier and more generalisable.
To download the latest version of the data, go to GitHub Releases, or follow the download instructions for the command line.
Datasets
BEAR combines curated registries of studies, random collections of articles in different disciplines, sets of meta-analyses, and direct replication projects. These datasets are purposefully heterogeneous and vary in quality. The database covers many disciplines: economics, political science, biomedicine, clinical trials, ecology, evolution, education, neuroscience, and more. Head to the Datasets section to read about individual datasets. See Documentation for shared derivation rules, including the treatment of p-values and confidence intervals.
What to do with these data
At a minimum, BEAR contains z-values with clearly labelled identifiers for studies and meta-analyses. Where available, it also includes effect sizes, standard errors, sample sizes, methods (e.g. parallel RCT), measures (e.g. standardised mean difference, odds ratio), and additional covariates (e.g. year, pre-registration status, phase of clinical trial, journal, discipline).
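As a rough illustration of how a z-value relates to the other quantities listed above, the sketch below uses standard textbook conversions from an estimate and standard error, a two-sided p-value, or a symmetric confidence interval. These are generic formulas under a normal approximation, not necessarily the derivation rules BEAR applies (see Documentation for those), and the function names are hypothetical. For ratio measures such as odds ratios, the formulas would typically be applied on the log scale.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_from_estimate(estimate: float, std_error: float) -> float:
    """z-value as the effect estimate divided by its standard error."""
    return estimate / std_error


def abs_z_from_p(p_two_sided: float) -> float:
    """Absolute z-value implied by a two-sided p-value.

    The sign of the effect cannot be recovered from the p-value alone.
    """
    return STD_NORMAL.inv_cdf(1 - p_two_sided / 2)


def z_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """z-value from a symmetric (Wald-type) confidence interval.

    Assumes the point estimate sits at the midpoint of the interval.
    """
    critical = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    estimate = (lower + upper) / 2
    std_error = (upper - lower) / (2 * critical)
    return estimate / std_error


if __name__ == "__main__":
    print(z_from_estimate(0.5, 0.25))
    print(abs_z_from_p(0.05))
    print(z_from_ci(0.1, 0.9))
```

When several of these inputs are reported for the same result, they will not always agree exactly because of rounding in the source, so a real pipeline needs an explicit rule for which one takes precedence.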
The repository also includes an optional model of signal-to-noise ratios (based on van Zwet et al.) for quantifying basic metascientific parameters such as selection, assurance, and replicability of findings. Users are encouraged to fit their own models.
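To make the signal-to-noise idea concrete, here is a minimal sketch. It assumes the formulation commonly used in this line of work, in which an observed z-value is the sum of a study's signal-to-noise ratio (true effect divided by standard error) and standard normal noise. Under that assumption, the chance that a study reaches significance depends only on its signal-to-noise ratio. The function names are hypothetical, and this is not the model code shipped in the repository.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power(snr: float, alpha: float = 0.05) -> float:
    """Probability that |z| exceeds the two-sided critical value
    when z is distributed Normal(snr, 1)."""
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(critical - snr)
    lower_tail = STD_NORMAL.cdf(-critical - snr)
    return upper_tail + lower_tail


def power_correct_sign(snr: float, alpha: float = 0.05) -> float:
    """Probability that z is significant AND has the same sign as snr."""
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 1 - STD_NORMAL.cdf(critical - abs(snr))


if __name__ == "__main__":
    for snr in (0.0, 1.0, 2.0, 2.8):
        print(snr, round(power(snr), 3), round(power_correct_sign(snr), 3))
```

A fitted model would replace the fixed `snr` values here with an estimated distribution of signal-to-noise ratios across studies, and then average quantities like these over that distribution.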

Links
- Get data for a minimal R example.
- Datasets for the current dataset index, basic results, and individual dataset descriptions.
- Documentation for shared derivation rules and processing notes.
- Reproduce for a more complete workflow of rebuilding the database from individual data sources and modelling.
- GitHub repository to access the code and datasets directly.
Credits
All credit is due to the researchers who put together the metascientific datasets and the people who maintain the large-scale repositories from which the data are compiled here. Please see individual dataset pages for references and licensing information.