Scientists working to sequence all manner of bacteria, Archaea, plants, and animals and to make these genomes publicly available hope to use the data to inform health, industrial, and environmental issues. Large-scale sequencing consortia have been churning out data at an impressive rate, yet significant gaps remain in the genomic tree of life. And while these groups have largely been working independently of one another, together they might address more far-reaching questions, such as how life has evolved, how it currently functions, and how it might look down the line.
“We are still in the developmental stage, where every consortium focuses on a specific domain and is building up their own data and making sure it’s in good enough shape,” said Igor Grigoriev, head of the fungal genomics program at the US Department of Energy (DOE) Joint Genome Institute (JGI) in Walnut Creek, California, and part of the 1,000 Fungal Genomes project. “Some dialog between the consortia is happening but grand-scale data integration remains to happen.”
Although there is still relatively little crosstalk among consortia, some of their data are being collected in central repositories. Aside from the National Center for Biotechnology Information’s genome database, there is the JGI-funded Genomes Online Database (GOLD), which functions as a hub for completed and ongoing genome sequencing initiatives and metagenome projects. GOLD is mainly focused on microbial genomes, but includes some eukaryotic genomes. Data from many of these projects are integrated into JGI’s databases and can also be uploaded into newly developed KnowledgeBase tools funded by the DOE.
Bioinformatics tools will need to evolve to keep pace as genomic analyses become more complicated—covering complex inter-domain relationships, such as the symbiotic interplay between certain plants, fungi, and endobacteria. But even within a single consortium’s database, as the number of genomic sequences increases from tens to many hundreds, scaling the storage and analytical tools has been a challenge.
“Many computational scientists and bioinformaticians are working alongside biologists to analyze and organize the sequencing data. This is a major challenge but I have a lot of optimism because there is plenty of innovation and energy in this field,” said Klaus-Peter Koepfli, one of the principal investigators of the Genome 10K project and a visiting scientist at the Smithsonian Conservation Biology Institute in Washington, D.C. “There are many obstacles to reconstructing the phylogeny of all living things, but it’s a great goal.”