An interdisciplinary research team from Leipzig University and the Saxon AI center ScaDS.AI has developed a new approach that combines methods of artificial intelligence (AI) and biophysical modeling. It can be used to develop new active substances such as antibodies and vaccines, for example for pandemic prevention. The research project, carried out in collaboration with Vanderbilt University in Nashville, USA, is the result of intensive preliminary work on computer-aided drug development. It aims to combine the strengths of both disciplines, AI and biophysics, in order to overcome the challenges of protein design more effectively.
In the current research landscape in the field of computer-aided protein design, there is a veritable gold-rush atmosphere in which many new methods are published without experimental validation. This often leads to incorrect assessments of the performance of AI models. "We urgently need standards for the description and availability of such models," says Clara Schoeder, research group leader at the Institute of Drug Discovery. "Our research work makes an important contribution to this goal." The current study results show that AI methods are particularly good at suggesting sequences that do not interfere with the folding of proteins. However, they have difficulties in precisely assessing the effects of individual amino acid changes on folding. "Our findings make it clear that no AI model or biophysical method is ideally suited for all design problems," explains Humboldt Professor Dr. Jens Meiler, one of the project’s lead scientists. "In future, we will have to carefully consider which model is used for which purpose. Our work is a first step towards greater comparability between the different methods," adds Meiler, Director of the Institute for Drug Development.
The biophysical software suite Rosetta, which has been used in protein research for many years, serves as a framework for the integration of various AI methods. Rosetta is supported by over 100 laboratories worldwide and enables researchers to efficiently combine different approaches, such as large language models (e.g. ESM-2) and the ProteinMPNN model, with biophysical methods. This combination allows researchers to compare and analyze how the different design approaches behave. "With this development, we can quickly and easily combine AI models with classical methods and use them side by side," explains Jens Meiler. "This simplifies our work considerably and allows us to make optimal use of the entire infrastructure that has been developed in Rosetta over the last 20 years."
However, the research project is not yet complete. The working groups of Meiler and Schoeder will continue to refine the algorithms developed and evaluate them experimentally, particularly with regard to vaccine design for pandemic prevention. "We are investigating which methods reliably suggest amino acid changes that can result in vaccine candidates," says research group leader Clara Schoeder. Despite the progress made through the use of AI, the so-called "scoring" problem remains a challenge. This refers to the difficulty of predicting the effect of a single amino acid exchange. Working in collaboration with the Center for Scalable Data Analysis and Artificial Intelligence ScaDS.AI, the research team is optimistic that the combination of AI and biophysical methods will increase efficiency in protein design.
Original publication in Science Advances:
Self-supervised machine learning methods for protein design improve sampling, but not the identification of high-fitness variants, DOI 10.1126/sciadv.adr7338
ScaDS.AI is a research center for data science, artificial intelligence and big data with locations in Leipzig and Dresden. As one of five new AI centers in Germany, ScaDS.AI has been funded since 2019 as part of the federal government’s AI strategy and is supported by the Federal Ministry of Education and Research (BMBF) and the Free State of Saxony.


