Deep learning delivers high success rates for protein sequences designed from scratch

Professor Liu Haiyan and Associate Professor Chen Quan of the Faculty of Life Sciences and Medicine at the University of Science and Technology of China, in collaboration with Professor Li Houqiang's team at the School of Information Science and Technology, have developed ABACUS-R, a deep learning algorithm that designs amino acid sequences from scratch for a given backbone structure. In experimental tests, ABACUS-R exceeded the original statistical energy model, ABACUS, in both design success rate and design accuracy. The results were published in Nature Computational Science on July 21, Beijing time.


Superimposed view of a target structure from a natural protein (sky blue) and the crystal structure of the corresponding de novo designed protein (green). Figure provided by the research group.

Liu Haiyan and Chen Quan's team has long worked on data-driven protein design methods. It previously established and experimentally verified two models: ABACUS, which uses a statistical energy function to design amino acid sequences for a given backbone structure, and SCUBA, which uses a neural network energy function to design backbone structures from scratch. However, ABACUS, which is built on traditional statistical energy techniques, still falls short in success rate and computational efficiency.

Recent studies have shown that deep learning can surpass energy function methods for amino acid sequence design on computational metrics such as the recovery rate of native amino acid residue types. However, in work formally published so far, the experimentally verified success rates of these deep learning methods fall far short of those of energy function methods.
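The recovery rate mentioned above is commonly computed as the fraction of positions at which a designed sequence reproduces the native residue type. A minimal Python sketch follows; the function name and the two example sequences are hypothetical and are not taken from the study.

```python
def recovery_rate(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical 10-residue example: the sequences differ at two positions.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
print(f"{recovery_rate(designed, native):.2f}")
```

A high recovery rate is only a computational indicator. As the article notes, it does not by itself show that a designed sequence will express and fold in the laboratory.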

Liu Haiyan explained that sequence design with ABACUS-R consists of two parts.

The first part is an encoder-decoder network pre-trained on multiple tasks. It maps the structural and chemical environment of an individual amino acid residue to a latent-space representation, which is then decoded into several real features, including the amino acid type of the central residue. The second part applies the encoder-decoder network iteratively to each residue of the target backbone until a full sequence with maximal self-consistency is obtained.
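The iterative second part can be illustrated with a toy Python sketch. This is not the ABACUS-R implementation: `toy_predict` is a hypothetical rule-based stand-in for the trained network, the single buried/exposed flag per residue stands in for the real structural context, and the sequential update order and stopping rule are assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVLIFMW")


def toy_predict(buried, sequence, i):
    """Hypothetical stand-in for the trained encoder-decoder network.

    Predicts the residue type at position i from a per-residue backbone
    feature (buried or exposed) and the current types of its sequence
    neighbours. The real network uses far richer structural context.
    """
    if buried[i]:
        return "V"
    neighbours = [sequence[j] for j in (i - 1, i + 1) if 0 <= j < len(sequence)]
    return "K" if any(aa in HYDROPHOBIC for aa in neighbours) else "S"


def design_sequence(buried, predict=toy_predict, max_sweeps=50, seed=0):
    """Update residues in place until a full sweep changes nothing."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in buried]  # random starting sequence
    for sweep in range(1, max_sweeps + 1):
        changed = 0
        for i in range(len(sequence)):
            new_aa = predict(buried, sequence, i)
            if new_aa != sequence[i]:
                sequence[i] = new_aa
                changed += 1
        if changed == 0:  # every residue now agrees with its own prediction
            return "".join(sequence), sweep, True
    return "".join(sequence), max_sweeps, False  # sweep budget exhausted


# Hypothetical 8-residue backbone: True marks a buried position.
buried = [False, True, True, False, False, True, False, False]
designed, sweeps, converged = design_sequence(buried)
print(designed, sweeps, converged)
```

The loop stops when each residue's predicted type, given the current types of its neighbours, equals the type already assigned to it, which is one simple reading of self-consistency. The `max_sweeps` cap is there because a general predictor is not guaranteed to reach such a fixed point.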

Following computational validation, the team experimentally characterized 57 sequences that ABACUS-R redesigned for 3 natural backbone structures. Of these, 49 (86%) were solubly expressed and folded into stable monomers. Five high-resolution crystal structures solved experimentally are highly consistent with the target structures. In addition, like previously reported de novo designed proteins, the ABACUS-R designs exhibit very high thermal stability, with unfolding temperatures mostly above 100 °C.

Overall, the higher success rate and structural accuracy of ABACUS-R sequence design, compared with the ABACUS model, further enhance the practicality of data-driven de novo protein design. ABACUS-R also provides a pre-trained representation of local protein structure information that can be used for tasks other than sequence design.

Reviewers noted that the study’s “most novel contribution lies in the adequate experimental characterization of the design, including crystal structure, and the high success rate of soluble expression.” (Source: China Science Daily, Wang Min)
