Almost 15 years after scientists first sequenced the human genome, making sense of the enormous amount of data that encodes human life remains a formidable challenge. But it is also precisely the sort of problem that machine learning excels at. On Monday, Google released a tool called DeepVariant that uses the latest AI techniques to build a more accurate picture of a person’s genome from sequencing data. DeepVariant helps turn high-throughput sequencing readouts into a picture of a full genome. It automatically identifies small insertion and deletion mutations and single-base-pair mutations in sequencing data.
A number of tools exist for interpreting these readouts, including GATK, VarDict, and FreeBayes. However, these software programs typically use simpler statistical and machine-learning approaches to identifying mutations by attempting to rule out read errors. “One of the challenges is in difficult parts of the genome, where each of the [tools] has strengths and weaknesses,” says Brad Chapman, a research scientist at Harvard’s School of Public Health who helped develop DeepVariant. “These difficult regions are increasingly important for clinical sequencing, and it’s important to have multiple methods.”