Progress on the storage and computational efficiency challenges of large-scale functional data analysis

In the era of big data, the rapid development of the Internet, cloud storage and related technologies means that the datasets encountered in practical analysis keep growing. Although large-scale functional data carry a wealth of information, analyzing them requires more computing resources and longer computing times. This greatly raises computational costs and undermines the timeliness and feasibility of the analysis. How to overcome the storage and computational-efficiency bottlenecks of large-scale functional data is therefore an important problem for functional data analysis in the big-data era.

Recently, Dr. Liu Hua, a young faculty member at the School of Economics and Finance of Xi’an Jiaotong University, Dr. You Jinhong, a professor at the School of Statistics and Management of Shanghai University of Finance and Economics, and Dr. Jiguo Cao, a professor at Simon Fraser University in Canada, carried out an in-depth study of these problems. Applying the idea of subsampling to functional data analysis for the first time, they developed Functional L-Optimality Subsampling (FLoS), an optimal subsampling method tailored to functional generalized regression models. By estimating the model from a carefully selected subsample rather than from the full dataset, FLoS reduces computation time and avoids problems such as insufficient memory. The authors demonstrate the accuracy and effectiveness of the method through theoretical analysis and a series of numerical simulations.

The researchers applied FLoS to organ transplant data. The dataset records information on hundreds of thousands of kidney transplant recipients at the time of transplantation and at each postoperative follow-up visit, making it a very large dataset containing functional data. The goal was to use each recipient’s postoperative glomerular filtration rate curve to determine whether the transplant was successful and to estimate the recipient’s approximate life expectancy after surgery. The comparison showed that the estimates obtained from the optimal subsample selected by FLoS were almost identical to those obtained from the full sample, further confirming the accuracy and effectiveness of the method.

The research results were published in JMLR.

The findings were recently published in the Journal of Machine Learning Research (JMLR), a leading international journal in machine learning and artificial intelligence, under the title “Optimal Sampling Method FLoS under Large-scale Functional Generalized Regression Model”. Liu Hua is the first author, and the School of Economics and Finance of Xi’an Jiaotong University is the first-listed affiliation. Published by MIT Press and backed by MIT’s Computer Science and Artificial Intelligence Laboratory, JMLR is internationally recognized as one of the top journals in computing. (Source: Yan Tao, China Science News)

