A report that brought together researchers from across the United States highlights the need to address gaps in data recording to improve biological diversity monitoring across the globe.
Justin Berg, a University of Guam EPSCoR graduate research assistant, collaborated with other researchers to produce the paper, “Poor data stewardship will hinder global genetic diversity surveillance.” PNAS published the brief report in July this year.
For the study, the researchers looked at publicly available data in the International Nucleotide Sequence Database Collaboration (INSDC). The study notes that most scientific journals require authors to archive their genetic data in a permanent database, and the INSDC is the leading repository of raw genomic data.
With the available data in the INSDC and other open-access repositories, the study notes that researchers can now “genotype thousands of loci or sequence whole genomes from virtually any species.”
Berg said that during the research process they found that metadata in these data sets was often incomplete or missing, or that it indicated different geographical locations. According to Berg, as of October 2020, the INSDC's Sequence Read Archive contained 16,700 unique wild and domesticated eukaryotic species and 327,577 individual organisms. He said only 14 percent of the genomic data had the spatiotemporal metadata needed for genetic diversity monitoring.
Berg said, “That essentially means when people place their genetic sequences in a database, from an international level all the way to the United States NCBI (National Center for Biotechnology Information), they may be missing data sets and missing metadata that are concurrently in past or current studies. Right now, through this project, we show that 86 percent of these projects were missing some form of metadata, including the year that it was collected or the location where it was collected.”
According to the report, the researchers looked at aquatic and terrestrial domesticated species recorded in the INSDC through the NCBI because biodiversity studies mostly focus on these targets.
The report notes that, in principle, these data can “provide time-stamped records for genetic diversity monitoring, to support the goals of the United Nations Convention on Biological Diversity (CBD).” In addition, the data can be used to shed light on “the evolutionary and ecological processes that shape biodiversity across the globe.”
As an instrument for sustainable development, the CBD focuses on the conservation of biological diversity, the sustainable use of its components and the fair and equitable sharing of the benefits arising out of the utilization of genetic resources.
“This study can help with genetic diversity monitoring through the United Nations Convention on Biological Diversity. It can do this by including increased metadata in the future. So, if someone from another part of the world wants to go in, anyone can access this genetic data,” Berg said.
Berg and the other researchers said they join others in calling for ambitious goals to safeguard genetic diversity and the knowledge structures that will support this goal. “Common to proposed genetic diversity monitoring agendas is a shared vision whereby agile pipelines would intake raw genomic data and produce outputs that directly inform conservation policies and decisions,” the researchers said.
The researchers emphasized that without appropriately archived genomic data that include spatiotemporal metadata, crucial information will be unavailable to such pipelines, and researchers will be unable to monitor genetic diversity or reconstruct past baselines.
Berg said they are planning to release a more comprehensive report on their findings.
The paper can be accessed through PNAS, a peer-reviewed scientific journal.