An international group of researchers has gained further insight into the genetic variants responsible for human diseases by analyzing primate DNA data with a novel AI algorithm.
First, the scientists sequenced over 800 individual samples from 233 species of non-human primates from all 16 families, from lemurs to gorillas. To interpret the data, they developed a new algorithm: PrimateAI-3D.
PrimateAI-3D is based on a deep learning architecture similar to those behind language models such as ChatGPT, but designed to model genomic sequences rather than language. The team in effect used natural selection to train its parameters, presenting it with mutations found in our primate relatives that can be ruled out as causes of disease. In this way, the algorithm learned to recognize benign genetic variants and, by elimination, mutations likely to cause disease.
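The "by elimination" idea can be illustrated with a deliberately simple sketch. This is not PrimateAI-3D itself, which is a deep neural network; the function name `label_variants` and all variant names are hypothetical, invented only for this example. It assumes a variant already seen in other primates is treated as likely benign, and anything else is left as a candidate for closer study.

```python
# Toy illustration only: NOT the PrimateAI-3D model.
# All variant names below are made up for the example.

def label_variants(human_variants, primate_tolerated):
    """Label a variant 'likely benign' if it is also seen in other primates,
    otherwise 'candidate' (it still needs further scoring)."""
    return {
        v: "likely benign" if v in primate_tolerated else "candidate"
        for v in human_variants
    }

# Hypothetical variants observed in non-human primates
primate_tolerated = {"GENE1:A10V", "GENE2:R55K"}

# Hypothetical variants found in a human sample
human_variants = ["GENE1:A10V", "GENE3:G7D"]

labels = label_variants(human_variants, primate_tolerated)
print(labels)
```

A real model would go further and assign graded scores to variants never seen in any primate. The sketch only shows why a large catalogue of primate variants is useful as a starting point.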
The scientists then used PrimateAI-3D to identify potentially harmful mutations in humans, using health records and gene variant data from over 400,000 people who had donated samples to the UK Biobank project. They found that the algorithm showed “impressive improvements” in predicting people’s genetic risk for common diseases.
The method’s claimed ability to identify pathogenic mutations more accurately than existing techniques is also linked to the fact that it can overcome the bias toward white European ancestry in existing genetic data.
“Even though we are 8 billion people, our genetic diversity still resembles the original population of 10,000 common ancestors from which we all descend,” said Kyle Farh, study co-author and vice president of artificial intelligence at Illumina, a company that collaborated on the study.
“There just isn’t enough information about the human species. A few years ago it became clear that the data contained in the sequencing of the human genome is insufficient to truly understand the human genome,” he added.
Combining human and non-human primate data is key to closing this gap, especially since living primates share more than 90% of our DNA. Illumina research has shown that a genetic variant tolerated by natural selection in another primate is 99% unlikely to cause disease in humans.
The results of the study can be used to support health research, for example to help scientists prioritize variants most likely to pose a risk to humans. They can also help maintain the populations of other primates.
“I think we’re just getting started,” noted Farh. “You can learn an incredible amount here. And the idea that you can learn more about our own species from other species, I think is deeply romantic.”
The full study is published in the journal Science.