The Invisible Helpers

How Algorithms Are Guarding the Quality of COVID-19 Science

Automated Screening · Research Transparency · Scientific Reproducibility

The COVID-19 Preprint Explosion

In the frantic early months of the COVID-19 pandemic, as the novel coronavirus swept across continents, another phenomenon was spreading through the scientific community—an unprecedented avalanche of research. Scientists worldwide raced to understand the virus, publishing their findings at breathtaking speed on preprint servers like medRxiv and bioRxiv. These platforms allowed researchers to share findings immediately, bypassing the typically slow peer-review process that can take months or even years.

[Figure: Preprint Volume Comparison]
The Challenge

With roughly one quarter of all COVID-19 papers appearing as preprints, the traditional system of peer review simply couldn't keep pace. This torrent of research posed a significant challenge: how could the scientific community ensure the quality and reliability of these urgently published studies?

Enter an innovative solution: automated screening tools that could rapidly evaluate thousands of preprints and help authors improve their work before formal publication.

Meet the Robot Referees

As thousands of COVID-19 preprints flooded scientific servers, the Automated Screening Working Group assembled a digital toolkit to help manage the deluge. These automated tools aren't designed to replace human peer review but to complement it by flagging common reporting problems that can affect research quality and reproducibility.

Think of these tools as specialized proofreaders for scientific research, each with a specific expertise:

SciScore

Checks whether studies include critical methodological details such as blinding, randomization, and sample-size calculations, and whether researchers report the sex of animal or human subjects [1].

ODDPub

Scans papers for mentions of open data and open code, essential elements that allow other scientists to verify and build upon published findings [1].

Barzooka

Detects a surprisingly common problem in scientific visualization: the use of bar graphs to display continuous data, which can hide important patterns in the underlying data [1].

JetFighter

Identifies another visualization issue: the use of rainbow color maps that are difficult for colorblind readers to interpret and can create visual artifacts [1].
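
How might software spot a rainbow palette automatically? Below is a minimal Python sketch of one possible heuristic, assuming access to the figure as an image file: it measures how much of the figure's palette falls close to the classic "jet" rainbow colormap. This is an illustrative assumption about the general approach, not JetFighter's actual algorithm, and the file name in the usage comment is hypothetical.

```python
import numpy as np
from matplotlib import cm
from PIL import Image

def jet_fraction(image_path, tolerance=30.0, n_ref=64, max_pixels=20000):
    """Estimate the fraction of a figure's pixels lying near the classic
    'jet' rainbow colormap. A toy heuristic, not JetFighter's method."""
    # Reference colors sampled along the jet colormap (RGB, 0-255).
    ref = cm.jet(np.linspace(0, 1, n_ref))[:, :3] * 255
    pixels = np.asarray(Image.open(image_path).convert("RGB"),
                        dtype=float).reshape(-1, 3)
    # Subsample large images so the distance matrix stays small.
    if len(pixels) > max_pixels:
        idx = np.random.default_rng(0).choice(len(pixels), max_pixels,
                                              replace=False)
        pixels = pixels[idx]
    # Distance from each pixel to its nearest jet reference color.
    dists = np.linalg.norm(pixels[:, None, :] - ref[None, :, :], axis=2)
    return float((dists.min(axis=1) < tolerance).mean())

# Hypothetical usage: a figure dominated by the rainbow scores near 1.0.
# if jet_fraction("figure1.png") > 0.4:
#     print("Possible rainbow colormap; consider a perceptually uniform one.")
```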

These tools screen every new COVID-19 preprint posted to medRxiv and bioRxiv, with results shared daily through web annotations and social media via @SciScoreReports [1]. Authors receive specific, actionable feedback on how to improve their manuscripts, often before their work reaches traditional peer reviewers.

What the Algorithms Found: A Reality Check for COVID-19 Science

When researchers analyzed the results from screening 6,570 COVID-19 preprints, they uncovered both encouraging signs and concerning gaps in research transparency [1].

[Figure: Transparency Indicators in COVID-19 Preprints]
Data Interpretation

The numbers reveal significant room for improvement. Only about one in seven preprints made their underlying data or analysis code available to other researchers [1]. This creates a substantial reproducibility problem, as other scientists cannot verify or build upon these findings without the original materials.

Similarly concerning, only about one-third of preprints acknowledged any study limitations [1]. Recognizing limitations is crucial for proper interpretation of results, especially in fast-moving fields like COVID-19 research, where initial findings often require refinement.

Methodological Gaps

The analysis also uncovered specific methodological reporting gaps. Even among studies that included ethical approval statements, suggesting human or animal research, only a small minority reported key rigor practices:

Randomization mentioned: 12.6%
Blinding discussed: 5.4%
Sample size calculations: 2.4%

These methodological details are essential for evaluating the reliability of experimental findings.

A Deeper Look: The Preprint Screening Experiment

To understand how automated screening works in practice, let's examine the large-scale analysis conducted on COVID-19 preprints. This wasn't a traditional laboratory experiment but rather a massive evaluation of scientific reporting practices across thousands of studies.

Methodology: How the Screening Worked
Data Collection

The process began with collecting metadata from the major preprint servers (arXiv, bioRxiv, and medRxiv) via their application programming interfaces (APIs) [4].
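
As a rough illustration of this step, here is a minimal Python sketch that pages through bioRxiv's public "details" endpoint. The URL format and response fields are assumptions based on the public bioRxiv API, the other servers expose different interfaces, and this is a sketch of the general approach rather than the study's actual harvesting code.

```python
import requests

def fetch_biorxiv_metadata(start_date, end_date, max_records=500):
    """Collect preprint metadata (DOI, title, abstract, date) for a
    date range. Endpoint format and field names are assumptions based
    on the public bioRxiv API."""
    records, cursor = [], 0
    while len(records) < max_records:
        url = (f"https://api.biorxiv.org/details/biorxiv/"
               f"{start_date}/{end_date}/{cursor}")
        batch = requests.get(url, timeout=30).json().get("collection", [])
        if not batch:
            break  # no further pages
        records.extend(batch)
        cursor += len(batch)  # the API pages in blocks of up to 100
    return records[:max_records]

preprints = fetch_biorxiv_metadata("2020-03-01", "2020-03-31")
print(f"Fetched {len(preprints)} preprint records")
```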

Keyword Identification

Researchers identified COVID-19-related preprints by searching for keywords in titles and abstracts, including terms like "COVID-19," "SARS-CoV-2," and "coronavirus" [4].
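
The filtering step itself is simple to sketch. The snippet below flags a metadata record as COVID-19-related when any search term appears in its title or abstract; the term list is illustrative and may differ from the exact list the researchers used.

```python
# Illustrative COVID-19 keyword filter; the study's exact term list may differ.
COVID_TERMS = ("covid-19", "sars-cov-2", "coronavirus", "2019-ncov")

def is_covid_related(record):
    """Return True if any search term appears (case-insensitively)
    in the record's title or abstract."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return any(term in text for term in COVID_TERMS)

# Hypothetical record shaped like the metadata fetched above.
sample = {"title": "Estimating the transmissibility of SARS-CoV-2",
          "abstract": "We model early outbreak dynamics..."}
print(is_covid_related(sample))  # True
```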

Algorithm Application

The core analysis used the ODDPub (Open Data Detection in Publications) algorithm [4], which scans paper texts for markers indicating whether authors have shared their data or code.
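
ODDPub's real rule set is far richer and has been validated against manually screened papers, but a greatly simplified sketch conveys the basic idea: search the full text for phrases and repository names that typically signal shared data or code. The patterns below are illustrative assumptions, not ODDPub's actual rules.

```python
import re

# Simplified stand-in for ODDPub-style detection; patterns are illustrative.
DATA_PATTERNS = [
    r"data (?:are|is) (?:publicly |freely )?available",
    r"deposited (?:in|at|to)",
    r"\b(?:zenodo|dryad|figshare|osf\.io)\b",
    r"accession (?:number|code)",
]
CODE_PATTERNS = [
    r"code (?:is|are) (?:publicly )?available",
    r"github\.com/\S+",
    r"analysis scripts? (?:are|is) available",
]

def detect_open_science(full_text):
    """Flag likely open-data and open-code statements in a paper's text."""
    text = full_text.lower()
    return {
        "open_data": any(re.search(p, text) for p in DATA_PATTERNS),
        "open_code": any(re.search(p, text) for p in CODE_PATTERNS),
    }

print(detect_open_science(
    "Sequencing data were deposited in Zenodo. Analysis code is "
    "available at github.com/example/covid-analysis."
))  # {'open_data': True, 'open_code': True}
```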

Results and Analysis

The findings revealed striking gaps in research transparency. A later study that expanded the analysis through June 2021 found that an overwhelming majority of COVID-19 preprints lacked open science markers [4].

[Figure: Absence of Open Science Markers in COVID-19 Preprints]

These numbers become even more telling when compared with pre-pandemic baselines: an analysis of 2019 preprints found that 93% of arXiv, 63% of bioRxiv, and 75% of medRxiv preprints showed no indicators of open data or code [4].

The implications are significant. When research lacks open data or code, other scientists cannot easily verify results or build upon the findings—a critical limitation during a public health emergency when time is of the essence.

The Scientist's Toolkit: Essential COVID-19 Research Resources

While automated screening tools represent one technological advancement in science, the tangible tools of COVID-19 research remain essential. The pandemic has spurred the development of an extensive array of research reagents that enable scientists to study the virus and develop countermeasures.

| Reagent Type | Examples | Primary Research Applications |
| --- | --- | --- |
| Virus Strains | SARS-CoV-2 BetaCoV/Australia/VIC01/2020 [6] | Establishing infection models, vaccine testing |
| International Standards | 1st WHO International Standard for SARS-CoV-2 RNA [6] | Calibrating assays, ensuring accurate measurements |
| Antibodies | Anti-SARS-CoV-2 immunoglobulin [6] | Neutralization assays, therapeutic development |
| PCR Components | SARS-CoV-2 RT-PCR primers and probes [3] | Diagnostic test development, viral detection |
| Cell Lines | VeroE6/TMPRSS2 [6] | Viral propagation, infection mechanism studies |
| Viral Proteins | Spike protein, nucleocapsid protein [8] | Vaccine development, serological test creation |
| Organoid Models | Intestinal organoids, kidney organoids [7] | Modeling infection in human-like tissues |

International Standards

For instance, international standards for SARS-CoV-2 RNA allow laboratories worldwide to calibrate their diagnostic tests for accurate, comparable results [6].

Organoid Models

Meanwhile, organoid models (miniature, simplified versions of organs grown from stem cells) enable scientists to study how the virus infects human tissues without relying solely on animal models [7].

The widespread availability of these carefully validated research materials has been crucial for the accelerated pace of COVID-19 science, allowing researchers across the globe to work with standardized, reliable tools.

The Future of Automated Science Screening

The automated screening of COVID-19 preprints represents more than just a pandemic stopgap—it offers a glimpse into the future of scientific publishing. As the volume of scientific literature continues to grow exponentially across all fields, tools that can provide rapid, automated feedback may become increasingly integrated into the research publication process.

1. Early-Career Support: These technologies show particular promise for helping early-career scientists improve their reporting practices.
2. Global Standards: They can support researchers from institutions with less established scientific traditions.
3. Immediate Feedback: By providing immediate, specific feedback on common reporting issues, automated tools have the potential to raise standards across the global scientific community.

Human-Machine Collaboration

The experience with COVID-19 preprints has demonstrated that while automated screening isn't perfect (the tools can make mistakes and struggle with complex problems), it can successfully flag common issues and direct authors toward solutions [1]. This approach complements human peer review by handling straightforward checks algorithmically, freeing human experts to focus on more nuanced aspects of scientific quality.

As we emerge from the COVID-19 pandemic, the scientific community continues to grapple with how to maintain both speed and quality in research. The experiment with automated screening of preprints has provided valuable insights into one possible path forward—where technology and human expertise work in tandem to guard the integrity of science while accelerating its progress.

References