A new tool for protein sequence generation and design
Researchers have developed a technique that uses a protein language model to generate protein sequences with properties comparable to those of natural sequences. The method outperforms traditional models and holds promise for protein design.

Designing new proteins with specific structures and functions is a central goal of bioengineering, but the vast size of protein sequence space makes the search for new proteins difficult. A new study by the group of Anne-Florence Bitbol at EPFL's School of Life Sciences suggests that a deep-learning neural network, MSA Transformer, could offer a promising solution. Developed in 2021, MSA Transformer works in a way similar to the natural language processing behind the now-famous ChatGPT.

The team, composed of Damiano Sgarbossa, Umberto Lupo, and Anne-Florence Bitbol, proposed and tested an "iterative method" that relies on the model's ability to predict missing or masked parts of a sequence from the surrounding context. They found that, with this approach, MSA Transformer can generate new protein sequences from given protein "families" (groups of proteins with similar sequences), with properties similar to those of natural sequences.
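The core idea of iterative masked generation can be illustrated with a short sketch. The snippet below is not the authors' actual method or the real MSA Transformer model: the `predict_masked` function is a toy stand-in that fills a masked position with the most common residue at that column of the alignment, whereas the real model uses a deep neural network trained on millions of sequences. The function names and parameters are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "?"  # placeholder for a masked position

def predict_masked(msa, pos):
    # Toy stand-in for the language model's masked-token prediction:
    # vote with the residues found at the same column across the
    # aligned family members (the "surrounding context").
    column = [s[pos] for s in msa if s[pos] != MASK]
    if not column:
        return random.choice(AMINO_ACIDS)
    return max(set(column), key=column.count)

def iterative_generation(msa, seed, n_rounds=10, mask_fraction=0.2, rng=None):
    """Repeatedly mask a random subset of positions in the sequence
    and refill them from the alignment context, mimicking (very
    loosely) the iterative masking procedure described in the text."""
    rng = rng or random.Random(0)
    seq = list(seed)
    k = max(1, int(mask_fraction * len(seq)))
    for _ in range(n_rounds):
        positions = rng.sample(range(len(seq)), k)
        for p in positions:
            seq[p] = MASK          # mask a subset of positions
        for p in positions:
            seq[p] = predict_masked(msa, p)  # refill from context
    return "".join(seq)

# Tiny mock "protein family" (an aligned set of similar sequences)
family = ["ACDEFG", "ACDEYG", "ACDEFG", "SCDEFG"]
new_seq = iterative_generation(family, family[0])
```

In the real pipeline, `predict_masked` would be a forward pass through MSA Transformer conditioned on the whole multiple sequence alignment, and the procedure would be repeated until the generated sequence stabilizes.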



