AIDO.RAGPLM and AIDO.RAGFold: Accelerating Protein Structure Prediction with Retrieval-Augmented Models
Type | research |
---|---|
Area | AI, Digital Twin |
Published(YearMonth) | 2412 |
Source | https://www.biorxiv.org/content/10.1101/2024.12.02.626519v1 |
Tag | newsletter |
Checkbox | |
Date(of entry) |
Building on the success of AlphaFold2, the authors introduce AIDO.RAGPLM and AIDO.RAGFold, two retrieval-augmented models for protein structure prediction within the AI-driven Digital Organism framework. AIDO.RAGPLM combines a pre-trained protein language model with retrieved multiple sequence alignments (MSAs), so it can exploit co-evolutionary information while remaining usable when few homologous sequences are available. It outperforms single-sequence language models on perplexity, contact prediction, and fitness prediction. AIDO.RAGFold, built on AIDO.RAGPLM, matches AlphaFold2-level accuracy when sufficient MSA data is available and runs up to eight times faster. When MSA data is limited, it significantly surpasses AlphaFold2, with substantial TM-score improvements. The authors also developed an MSA retriever that speeds up MSA search by a factor of 45 to 90 and was used to expand the training dataset by 32%. Together, these models offer efficient, scalable, and accurate protein structure prediction. The models and code are open source and available to the research community.
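
For readers unfamiliar with the TM-score mentioned above, the following is a minimal sketch of the standard formula, TM = (1/L) Σ 1 / (1 + (d_i / d0)^2) with d0 = 1.24 (L − 15)^(1/3) − 1.8, where d_i is the distance between the i-th pair of aligned Cα atoms and L is the length of the reference structure. It is not taken from the paper or its code. The function names are hypothetical, the 0.5 Å floor on d0 for very short chains is an assumption, and the sketch assumes the two structures are already superimposed with a one-to-one residue correspondence. The real TM-score maximizes over superpositions, so this value is only a lower bound on it.

```python
import math


def d0_for_length(n):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score.

    The 0.5 floor for short chains is an assumption of this sketch.
    """
    if n <= 15:
        return 0.5
    return max(1.24 * (n - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed_superposition(pred, ref):
    """TM-score of `pred` against `ref` WITHOUT optimizing the superposition.

    Both arguments are equal-length sequences of (x, y, z) C-alpha
    coordinates that are assumed to be already superimposed and aligned
    residue by residue.
    """
    if len(pred) != len(ref) or len(ref) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    n = len(ref)
    d0 = d0_for_length(n)
    # Each residue pair contributes a value in (0, 1]; 1 means a perfect match.
    total = sum(
        1.0 / (1.0 + (math.dist(p, r) / d0) ** 2) for p, r in zip(pred, ref)
    )
    return total / n


if __name__ == "__main__":
    # Toy reference: 100 residues on a straight line, 3.8 angstroms apart.
    ref = [(3.8 * i, 0.0, 0.0) for i in range(100)]
    print(tm_score_fixed_superposition(ref, ref))
```

Scores lie in (0, 1], and a value above roughly 0.5 is commonly read as two structures sharing the same fold. Because d0 grows with chain length, the score is less sensitive to protein size than a raw RMSD.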