- February 19, 2026
- By Tom Ventsias
The past decade has seen explosive growth in the collection and use of metagenomic data—genetic material gathered directly from environments such as soil, water and the human gut. This boom is helping scientists track diseases, study antibiotic resistance and discover new enzymes. But the flood of data is also creating major challenges in how the information is organized, analyzed and assessed for trustworthiness.
Researchers at the University of Maryland are working to tackle those challenges with support from two National Institutes of Health grants totaling $5.1 million. The team is building open-source software to better piece together complex genetic data while also conducting the first comprehensive review of how the quality of data in public biological databases impacts the accuracy of computational analyses.
“The overall goal is to create new analytic pipelines that can more effectively mine data to improve human health and combat disease,” said Mihai Pop, professor of computer science and principal investigator of both awards.
Modern microbiome research increasingly uses artificial intelligence to identify which microbes are present in a sample and what they are doing. While these tools have improved rapidly, Pop cautioned that accuracy remains critical.
“AI is not foolproof, and correctly profiling genetic material is imperative so clinicians can make better diagnoses and offer more personalized treatments,” he said.
Computer science Professor Mihai Pop (left) is principal investigator for $5.1 million in federal awards that focus on metagenomic data. (Photo by John T. Consoli)
One of the grants—a $2.7 million award from the National Institute of Allergy and Infectious Diseases—will support development of new software that helps scientists reconstruct microbial genomes from mixed genetic samples.
Rebuilding whole genomes gives researchers much deeper biological insight. However, today’s tools often struggle when data comes from multiple samples or different sequencing technologies. Current approaches can also require heavy computing power and can’t always be scaled up efficiently.
Pop’s team—including a new postdoctoral researcher and a recently hired bioinformatics engineer—will develop methods that combine results from multiple analyses in a faster and more flexible way. The work will be supported by high-performance computing resources at the University of Maryland Institute for Advanced Computer Studies (UMIACS).
The second grant, $2.4 million from the National Library of Medicine, focuses on the growing reliance on large public biological databases used by scientists and clinicians worldwide.
These databases contain vast amounts of information about DNA, proteins and molecular structures. But as submissions have increased, partly due to new measurement technologies and the emergence of AI, the risk of errors or corrupted entries has also grown.
“If bad data get into widely used databases, it can ripple through many studies and even affect clinical decisions,” Pop said.
Working with UMIACS colleagues Tudor Dumitras, associate professor of electrical and computer engineering, and Brantley Hall, assistant professor of cell biology and molecular genetics, the team will examine which types of analysis are most vulnerable to bad data and how errors spread through complex bioinformatics pipelines. They will also explore ways to protect databases from malicious or adversarial data manipulation.
As part of the project, the researchers will create a suite of user-friendly tools that scientists can apply to their own workflows. These tools will help researchers:
- Test how accurate their bioinformatics software is;
- Better understand the structure of biological databases;
- Measure how sensitive their results are to questionable data;
- Track how errors move through complex analysis pipelines.
Together, the tools will function as a forensic and debugging toolkit for the bioinformatics community.
“We hope that our research program will lead to biomedical analytics systems able to draw correct conclusions from imperfect data, helping to advance human and environmental health,” Pop said.