Jager and Leek 2014 (PubMed/Medline)

Note

To download only this data file: JagerLeek.rds (331 KB)

To download all BEAR datasets, click here.

Reference: Jager and Leek (2014).

Research question: estimating the science-wise false discovery rate.

Data collection: the authors used a custom program to extract p-values from scraped PubMed abstracts of papers published in five major medical journals between 2000 and 2010. In total, 15,653 p-values are available from 5,322 articles.

Data availability: the data file pvalueData.rda is available at https://github.com/jtleek/swfdr. The programs in that repository are licensed under the GNU GPL, but as far as we are aware no license is stated for the dataset itself.

Data processing: processing was minimal. We derived unsigned z-values from the p-values, assuming they were two-sided.
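The conversion described above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: it assumes the standard relationship |z| = -Phi^{-1}(p / 2) for a two-sided p-value, and the function name p_to_abs_z is hypothetical.

```python
from statistics import NormalDist


def p_to_abs_z(p: float) -> float:
    """Return the unsigned z-value for a two-sided p-value.

    Uses the lower tail, |z| = -Phi^{-1}(p / 2), which is numerically
    more stable for very small p than Phi^{-1}(1 - p / 2).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return abs(NormalDist().inv_cdf(p / 2.0))


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(f"p = {p}: |z| = {p_to_abs_z(p):.3f}")
```

Exact zeros are rejected here because they have no finite z-value; they need separate handling as truncated observations.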

Notably, a large proportion of the p-values are truncated, almost always at 0.0001, 0.001, 0.01, or 0.05. As in all datasets, we retained information on truncation. We also treated p-values recorded as exactly 0 as truncated (z-operator >). We created a crude flag for RCTs by searching the paper titles for “randomized”, “randomised” and “controlled”.
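A rough sketch of how truncation and the RCT flag might be encoded is given below. It rests on several assumptions that are not stated above: the function names are hypothetical, a title is flagged if it contains any one of the three search terms (case-insensitive), and the bound used for p-values recorded as exactly 0 is left unset because it is not given here.

```python
from statistics import NormalDist

# Search terms for the crude RCT flag, taken from the description above.
RCT_TERMS = ("randomized", "randomised", "controlled")

# A bound on p reverses direction when expressed as a bound on |z|.
Z_OPERATOR = {"<": ">", ">": "<", "=": "="}


def z_with_operator(p: float, p_operator: str = "="):
    """Map a reported p-value and its operator to (|z|, z-operator).

    For example, p < 0.001 becomes |z| > 3.29. A p-value recorded as
    exactly 0 is treated as truncated; its bound is returned as None
    because the source does not state which bound is used.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in the interval [0, 1]")
    if p == 0.0:
        return None, ">"
    z = abs(NormalDist().inv_cdf(p / 2.0))
    return z, Z_OPERATOR[p_operator]


def crude_rct_flag(title: str) -> bool:
    """Flag a paper as a likely RCT from its title (assumed: any term matches)."""
    lowered = title.lower()
    return any(term in lowered for term in RCT_TERMS)


if __name__ == "__main__":
    print(z_with_operator(0.001, "<"))
    print(z_with_operator(0.0))
    print(crude_rct_flag("A Randomised Trial of Drug X"))
```

If the original flag instead required several terms to appear together, the `any` in crude_rct_flag would need to change accordingly.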

Model of z-values

The fitted mixture model is shown over the empirical distribution of absolute z-values. The solid line is a mixture of half-normals, with selection. The dashed line shows the distribution without selection. Where values are reported as inequalities (e.g. studies reporting p < 0.05), the histogram resamples values from the appropriate set.

Jager and Leek mixture model plot
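For readers unfamiliar with the model family, the density of a mixture of half-normals without selection (the kind of curve drawn as the dashed line) can be sketched as below. This is an illustration only: it assumes zero-centred components, the weights and scales are made-up values rather than fitted estimates, and the selection mechanism is not modelled because it is not described here.

```python
import math


def half_normal_mixture_pdf(z: float, weights, sigmas) -> float:
    """Density of a mixture of half-normal components at z >= 0.

    Each component is the distribution of |X| with X ~ N(0, sigma^2),
    whose density is sqrt(2 / pi) / sigma * exp(-z^2 / (2 sigma^2)).
    """
    if z < 0.0:
        return 0.0
    total = 0.0
    for w, s in zip(weights, sigmas):
        total += w * math.sqrt(2.0 / math.pi) / s * math.exp(-(z * z) / (2.0 * s * s))
    return total


if __name__ == "__main__":
    # Hypothetical mixture: weights sum to 1, scales are illustrative only.
    weights = [0.6, 0.4]
    sigmas = [1.0, 3.0]
    for z in (0.0, 1.96, 4.0):
        print(f"f({z}) = {half_normal_mixture_pdf(z, weights, sigmas):.4f}")
```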

References

Jager, Leah R, and Jeffrey T Leek. 2014. “An Estimate of the Science-Wise False Discovery Rate and Application to the Top Medical Literature.” Biostatistics 15 (1): 1–12.