Omics big data “understands” the circle of friends of functional genes in seconds

Corn experimental field. Photo courtesy of interviewee

Classical genetics has cloned and characterized a number of important functional genes. However, more than two decades after functional genomics was introduced, the cloned functional genes in rice and maize still account for less than 10% of all their genes, and new functions of already-cloned genes continue to be discovered.

Rapidly cloning functional genes, analyzing the molecular mechanisms behind variation in important traits, and decoding the genetic variation of important crops on a global scale all remain major challenges.

On December 30, 2022 (Beijing time), Nature Genetics published online a research paper by Li Lin’s group at the National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, together with the groups of Professors Yang Fang and Yan Jianbing at Hubei Hongshan Laboratory. The study constructed the first-generation multi-omics integrated network map of maize, comprising 2 million network relationships across multiple genetic levels (genome, transcriptome, translatome and protein interactome). Using machine learning, the team successfully predicted a number of important functional genes and identified molecular pathways that regulate important traits such as maize flowering time.

On the same day, Professor Tian Feng of China Agricultural University and colleagues published a commentary in the Journal of Botany, stating that the construction of the maize multi-omics integrated network is a major advance in maize functional genomics. It not only provides new tools for cloning new genes for important maize traits, analyzing molecular regulatory pathways and studying maize genome evolution, but also provides important genetic resources and molecular modules for maize genome design breeding, laying an important foundation for intelligent maize breeding.

Five years to clone one gene: functional gene analysis progresses slowly

In 2008, Zhang Qifa, academician of the Chinese Academy of Sciences and professor at Huazhong Agricultural University, published the Rice Functional Genome Initiative (Rice 2020) in the international journal Molecular Plant, which aimed to determine the function of every rice gene by 2020.

At that time, functional genomics research on many crops was flourishing, and more and more researchers were devoting themselves to this vast and arduous work.

Li Lin in the field experiment. Photo courtesy of interviewee

Li Lin’s main work during his doctoral studies was to clone and analyze a major functional gene controlling maize kernel oil content. He carried out thorough molecular and genetic experiments, as well as evaluations of molecular breeding applications.

However, in 2010, when he confidently submitted the gene he had spent five years fine-mapping and cloning to an academic journal, the reviewers responded that the gene had already been cloned and studied by others.

“I was very frustrated and thought it would be too inefficient to spend 5 years cloning a gene.” As a result, Li Lin began to think about whether he could quickly and globally analyze gene function.

At the same time, the comprehensive elucidation of crop functional genomes was not advancing as quickly as expected.

In 2013, when Li Lin was a postdoctoral fellow in the United States, he had the idea of systematically analyzing the function of each gene through biological big data methods.

“At that time, I was using eQTL mapping and co-expression networks to analyze the global gene regulatory network of maize.” At the end of 2013, Li Lin published in PLoS Genetics the earliest paper to use population RNA-seq for eQTL analysis of regulatory relationships in maize.

He wanted to go further and, from a multi-omics perspective, construct gene regulatory networks at multiple scales so as to comprehensively solve the mystery of biological inheritance. But when he discussed the idea with his co-supervisor, it was rejected. “He may have felt that the workload was too great and that it was an impossible task,” Li Lin recalled.

However, the idea took root in Li Lin’s mind. In 2016, Li Lin returned to China and joined the faculty of Huazhong Agricultural University. Yan Jianbing asked him to organize a discussion of major future research topics. By then, Yang Fang’s team had developed a high-throughput yeast two-hybrid system and begun to analyze the network structure of crop proteomes.

“I put my idea forward for discussion, and Yang Fang and I hit it off right away. Thanks to the rapid growth of China’s scientific research strength and the platform support of Huazhong Agricultural University, the conditions were ripe, and the three teams jointly drove this grand project forward.” Li Lin said that this marked the formal start of building a multidimensional network map of maize at the genome, transcriptome, translatome and proteome levels.

At that time, the functions of fewer than 10% of rice genes had been resolved, and Rice 2020 still had a long way to go.

Forging the sword of biological network big data

The biological seed industry is the foundation and core of agriculture, and biological breeding is its key technology. Yan Jianbing told China Science News that biological breeding has passed through the 1.0, 2.0 and 3.0 eras and is now striding towards the 4.0 era of intelligent breeding driven by biotechnology plus information technology (BT+IT).

“No stage of biological breeding can do without the cloning of functional genes that control genetic variation and the analysis of their molecular mechanisms.” Yan Jianbing said that classical genetics and molecular biology approaches map, clone and run molecular interaction experiments on a single important locus for an important trait, in order to identify the upstream regulators, interacting partners and downstream targets of the target gene, then build a regulatory network for that functional gene, and finally resolve one molecular mechanism by which a gene controls variation in an important trait.

However, the analysis of functional genes in crops, represented by rice and maize, has progressed slowly. The difficulty of rapidly cloning functional genes and resolving the molecular mechanisms of important trait variation is a major constraint on the move towards the intelligent breeding 4.0 era.

Biological research has entered the era of big data. “Based on biological big data, it has become possible to build the upstream, downstream and interacting-partner networks of all genes at a global level. This gives us an unprecedented opportunity to resolve as many gene functions as possible on a global scale, and then comprehensively solve the mystery of biological genetic variation,” Yan Jianbing said.

“No matter what kind of gene is studied, it is ultimately necessary to establish a molecular network model of that gene. So why not analyze the upstream and downstream genes and interacting partners of all genes at once, so that the functions of as many genes as possible can be understood globally?” Li Lin explained that living organisms have tens of thousands of genes, and determining the function of these genes really means determining the regulatory relationships among them.

Genes inside an organism are much like people in human society. A person’s function or role in society can be determined through family ties, circle of friends and work relationships. Similarly, to understand the function of a gene is to understand its relationships with other genes at different genetic levels. Following the logic that “birds of a feather flock together”, the function of any gene can even be inferred from the genes it associates with.

Based on this, they identified gene-gene relationships within each level at which genes function and genetic information is transmitted (genome, transcriptome, translatome, protein interactome, etc.).
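The idea of inferring a gene's function from the genes it is connected to is often called guilt by association. The following is a minimal Python sketch of that idea only, not the method used in the paper: an unannotated gene takes the most common function among its annotated network neighbours. The gene names, edges and function labels are hypothetical toy data.

```python
from collections import Counter

def predict_function(gene, edges, known_functions):
    """Guess a gene's function by majority vote among its annotated neighbours.

    edges: iterable of (gene_a, gene_b) pairs, treated as undirected.
    known_functions: dict mapping annotated genes to a function label.
    Returns None when the gene has no annotated neighbour.
    """
    neighbours = set()
    for a, b in edges:
        if a == gene:
            neighbours.add(b)
        elif b == gene:
            neighbours.add(a)
    votes = Counter(known_functions[n] for n in neighbours if n in known_functions)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Hypothetical toy network: geneX is unannotated, its neighbours are annotated.
edges = [("geneX", "geneA"), ("geneX", "geneB"), ("geneC", "geneX"), ("geneD", "geneE")]
known_functions = {
    "geneA": "flowering",
    "geneB": "flowering",
    "geneC": "kernel development",
    "geneE": "kernel development",
}

prediction = predict_function("geneX", edges, known_functions)
print(prediction)
```

A real analysis would weight edges by confidence and by the omics layer they come from; the simple vote here only illustrates the "circle of friends" reasoning.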

Team members pollinate experimental corn. Photo courtesy of interviewee

In this study, multi-omics profiling was performed on samples from different tissues and developmental stages of the reference inbred line B73 across the whole growth period, yielding mRNA-Seq data for 31 tissues or developmental stages, circRNA-Seq and sRNA-Seq data, and Ribo-Seq data for 21 tissues or developmental stages.

Yang Fang said that they used a high-throughput yeast two-hybrid system to build a maize protein interaction network, obtaining more than 360,000 protein-protein interaction pairs and 56,243 high-confidence interactions. The existing genome-level ChIA-PET network was integrated with the transcriptome-level co-expression network, the translatome-level co-translation network and the protein interaction network generated in this study to construct the first-generation multi-omics integrated network map of maize, comprising 2 million interactions.
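To make the notion of an integrated network concrete, here is a minimal Python sketch, not the pipeline used in the study, that merges edge lists from several omics layers into one map and records which layers support each gene pair. The layer names follow the four levels mentioned in the article; the gene identifiers and edges are hypothetical, and treating every edge as undirected is a simplifying assumption.

```python
from collections import defaultdict

def integrate_layers(layers):
    """Merge per-layer edge lists into one map: gene pair -> set of supporting layers.

    layers: dict mapping a layer name to a list of (gene_a, gene_b) edges.
    Edges are treated as undirected, so (a, b) and (b, a) are the same pair.
    """
    integrated = defaultdict(set)
    for layer_name, edges in layers.items():
        for a, b in edges:
            pair = tuple(sorted((a, b)))
            integrated[pair].add(layer_name)
    return dict(integrated)

# Hypothetical toy edges for each of the four layers.
layers = {
    "genome": [("g1", "g2")],
    "transcriptome": [("g2", "g1"), ("g2", "g3")],
    "translatome": [("g2", "g3")],
    "interactome": [("g1", "g2"), ("g3", "g4")],
}

network = integrate_layers(layers)

# Gene pairs supported by at least two layers.
multi_layer = {pair: names for pair, names in network.items() if len(names) >= 2}
for pair, names in sorted(multi_layer.items()):
    print(pair, sorted(names))
```

Keeping the set of supporting layers per edge, rather than a single merged edge list, makes it possible to ask later which relationships persist from one genetic level to the next.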

“This is the first time that a big-data network map spanning genome, transcriptome, translatome and proteome has been constructed so comprehensively in a single species. It is like forging the sword of biological network big data, providing a basis for comprehensive and systematic analysis of the mechanisms of genetic variation in maize,” Li Lin said.

Revolutionizing the paradigm of classical genetics research

Using the maize multidimensional network map, the study explored the functional divergence of duplicate genes in the network at the genome-wide level, revealing that the two ancient subgenomes of maize show progressive functional divergence from the transcriptome to the protein interactome.

They also reconstructed the molecular networks of cloned maize genes for plant architecture and of functional genes related to kernel development. Tian Feng et al. pointed out that, to date, 63 genes regulating kernel development have been cloned in maize, 62 of which are present in the integrated map released by the team. The team successfully predicted and confirmed that a PPR protein of unknown function can affect maize kernel shrinkage. These results demonstrate that the integrated network map has strong power to predict gene function.

In addition, they focused on flowering time, an important agronomic trait of maize. To ensure the accuracy of big-data prediction, they worked closely with the team of Professor Chen Hong of the College of Science at Huazhong Agricultural University to develop cutting-edge artificial intelligence algorithms for rapid trait analysis. This provides new means for systematically analyzing gene function and the genetic mechanisms of trait variation, and to a certain extent changes the paradigm of classical genetics research.

According to the reviewers, the study describes a vast experimental effort to identify complex molecular relationships between macromolecules encoded by the maize genome. In particular, the paper's application of machine learning to mine network big data is an innovative way to interpret the molecular network map of functional genes.

By mining the first-generation maize integrated multi-omics network, they predicted 2,651 candidate flowering-time genes and divided them into 8 sub-network pathways according to whether they controlled the same trait.

To verify the predictions, since 2020 they have carried out molecular and field experiments in Hainan, Hubei, Shandong and Hebei, confirmed 20 predicted genes as related to flowering-time traits, and given a preliminary account of their molecular mechanisms.

In addition to previously reported pathways, they identified a new molecular network pathway in maize, deepening the understanding of maize flowering time and providing a theoretical basis and genetic resources for intelligent design breeding of flowering time.

Tian Feng et al. pointed out that multi-omics big data, including the genome, phenotype, transcriptome, proteome and metabolome, are the basis for using artificial intelligence technologies such as machine learning to accurately mine key genes and molecular modules for intelligent genome design breeding.

The reviewers considered that the study's functional validation supports AI-based prediction of gene function from network big data, and that the comprehensive molecular network map, built for different maize tissues at different growth stages, is an important resource for the entire maize research community.

Li Lin told China Science News that this paper has successfully resolved the network relationships within each omics layer of maize, and that the next step is to understand the regulatory relationships between layers. In addition, this research paradigm can also be applied to functional genomics studies of other crops such as rice and wheat. (Source: Li Chen, China Science News)
