Fudan MOSS team: research results will be open source!

The ChatGPT model developed by OpenAI has ignited a new wave of AI enthusiasm worldwide, drawing intense attention both in China and abroad. Not long ago, a team led by Qiu Xipeng at Fudan University's School of Computer Science and Technology released MOSS, a ChatGPT-like model, and the news quickly trended on Weibo.

MOSS took off immediately. On February 20, the day of its release, it drew a flood of closed-beta applications, interview requests, and investment and cooperation inquiries. The public was enthusiastic about a technology that had previously been confined to the NLP (natural language processing) research community. Qiu Xipeng's team was surprised, but soon settled back into their routine.

The most exciting day came on the 28th day of the twelfth lunar month, just before the Spring Festival. Sun Tianxiang, the project's lead developer and a doctoral student at the School of Computer Science and Technology, typed in a question in Chinese during testing, and MOSS answered it correctly in English, "like someone who can't speak Chinese but understands it." At that point, the MOSS prototype was still rudimentary, and Chinese text made up less than 0.1% of its training data.


"It's amazing. We never taught it machine translation." The potential MOSS showed left Qiu Xipeng so excited that night that he couldn't sleep. He compares MOSS to a "smart kid" that, even if not yet good at writing poetry, solving problems, or many other specific tasks, has shown the potential to become a framework for artificial general intelligence (AGI). Qiu Xipeng even let his 6-year-old daughter chat with MOSS, and found that she could happily talk with it for quite a long time.

Behind the attention MOSS has drawn lies a decade of accumulated research. An expert in artificial intelligence, Qiu Xipeng has worked on machine learning since his doctoral studies and moved into natural language processing research after joining the university faculty. He and his team have produced a series of innovative results on the foundational models and algorithms of natural language processing. His book "Neural Network and Deep Learning" is affectionately known to readers as the "Dandelion Book" and appears on many "must-read AI books" lists. Last year, he also led the team to win the first prize of the "Qian Weichang Chinese Information Processing Science and Technology Award" from the Chinese Information Processing Society of China.

These days, Qiu Xipeng and his MOSS team, eight young Fudan students, continue to work intensively on the closed beta and on model iteration. Optimization of the new model is expected to be completed by the end of March, after which it will be gradually opened to the public.

Group photo of MOSS team (sixth from left is Qiu Xipeng)

Some closed-beta testers said that although MOSS is an order of magnitude smaller than ChatGPT in parameter count, its coverage of factual questions is not comprehensive, and it often "spouts nonsense with a straight face," it does have "the flavor of ChatGPT," and "the basic functionality is there."

Qiu Xipeng is optimistic that, in the near future, large language models such as MOSS will become as commonplace as search engines, benefiting every aspect of people's lives.

Open-sourcing (releasing both the code and the model parameters) has long been a point of academic principle for Qiu Xipeng and his colleagues at the Fudan Natural Language Processing Lab. "This time, too, we will open our research results to the public and to society," he said.

【In-depth dialogue with MOSS team】

“We want to show that ChatGPT-like models can be made with limited resources”

Q1: Can you briefly introduce MOSS? What is the difference between this “large-scale conversational language model” and the chatbots we use every day, such as Siri, Xiaodu, and Xiaoai?

Qiu Xipeng: Let me use an analogy: the relationship between the two is like that between a smartphone and a feature phone. Earlier chat systems are narrow AI, designed only to chat, just as a traditional feature phone can only make calls. Today's large language models, like ChatGPT and MOSS, can do many things, and chat is just one feature, much as a smartphone can make calls but can do far more than that.

Models like ChatGPT and MOSS have a general-purpose ability to help humans accomplish a wide variety of tasks, all through the medium of conversation. They can handle most tasks in natural language processing, including machine translation, information extraction, and error correction. Once they learn to use external tools, they can also interact with the outside world and create things. None of this is possible with existing chatbots. It is fair to say that conversational large language models show us a new path toward "artificial general intelligence."

Q2: The team released the MOSS model on February 20. Had it only just been completed? How long did development take?

Qiu Xipeng: Actually, we had developed the first-generation model before the Spring Festival. It showed a lot of potential, quite unlike previous chat systems, with a good grasp of human intent and many emergent capabilities, such as performing machine translation without being trained for it. After that, we spent more than a month polishing its engineering deployment, improving efficiency and refining the interface.

To the public, the emergence of models such as ChatGPT and MOSS may seem very sudden, but for those who have been following this field, every step can be traced. For example, Google's R&D in this area is no weaker than OpenAI's, but OpenAI pushed the approach to its limits and proposed a highly innovative form of interaction, "dialogue," which gives large language models the ability to interact directly with humans and makes them appear remarkably intelligent.

The development of MOSS was likewise not achieved overnight; it rests on our team's earlier groundwork and long-accumulated research experience. Since 2021, we have been building Chinese generative pre-trained models, which are open source and available for download, averaging tens of thousands of downloads per month. Later, we proposed the concept of "language model as a service," arguing that base language models would become the foundation of language services. In 2022, realizing that large language models would become the foundation of the future, I began training them. We then spent another half a year researching how to make large language models understand human instructions and hold conversations.


MOSS dialogue presentation


MOSS dialogue presentation

Q3: How did MOSS achieve an "end-to-end" path to a large language model, and what difficulties had to be overcome?

Qiu Xipeng: "End-to-end" is an academic term. It means starting from scratch with data collection and processing, through modeling, to finally producing a large model that can converse with humans, walking the entire technical path from start to finish. Because OpenAI has not disclosed the technical route or details behind ChatGPT, we had to rely on limited public information and explore on our own.

This process was very difficult and involved a great deal of empirical and intuitive design. The key was getting two steps right. The first is the base model: a strong base is not simply a matter of having enough parameters; the model must also be given broad knowledge, learning ability, and logical reasoning ability. The second is triggering its dialogue ability through instructions, so that it can understand human intent and interact with people.
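The second step described above, instruction tuning, is commonly done by rendering supervised (instruction, response) pairs into training strings with role markers. The following is a minimal illustrative sketch under that common recipe, not the MOSS team's actual pipeline; the role tags and function names are hypothetical.

```python
# Sketch of instruction-tuning data preparation (hypothetical tags, not
# the MOSS implementation): each (instruction, response) pair becomes one
# supervised training sample; the loss is typically computed only on the
# tokens that follow the assistant tag.

def format_example(instruction: str, response: str,
                   user_tag: str = "<|Human|>", bot_tag: str = "<|MOSS|>") -> str:
    """Render one dialogue turn as a single training string."""
    return f"{user_tag}: {instruction}\n{bot_tag}: {response}"

def build_corpus(pairs):
    """Turn a list of (instruction, response) pairs into training samples."""
    return [format_example(q, a) for q, a in pairs]

samples = build_corpus([
    ("Translate 'hello' into French.", "Bonjour."),
    ("What is 2 + 2?", "4."),
])
print(samples[0])
```

In practice, a base model fine-tuned on many such samples learns to continue any `<|Human|>: ...` prefix with an appropriate `<|MOSS|>: ...` response, which is what gives it a conversational interface.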

So far, we have mastered the technical route, but greater difficulties may lie ahead. We have collected a large number of instructions from human interaction, and to instill values and a range of capabilities in the model, we will need to hire professionals to help us design these instructions and further strengthen every aspect of MOSS.

Q4: What is the composition of your team?

Qiu Xipeng: Our Natural Language Processing Laboratory is part of the Shanghai Key Laboratory of Intelligent Information Processing, with nearly 100 faculty members and students. It has actively adopted the organized research model promoted by the university and the school, focusing on cutting-edge international research in natural language processing and artificial intelligence. More than 30 students work on research related to large language model foundations.

"Trending was unexpected; the name MOSS is a tribute to 'The Wandering Earth 2'"

Q1: The MOSS model has attracted a great deal of attention since its release. Did you expect this?

Qiu Xipeng: Trending on Weibo was completely beyond our expectations. I think such a high level of attention came because people were excited that a domestic team had built a ChatGPT-like model. The excitement stemmed from earlier voices claiming that the gap between our technical level and that of other countries was very large and would take a long time to close. Our work showed that it need not take that long.

Q2: Many companies at home and abroad are developing ChatGPT-like models and investing a lot. As a university academic research team, what is your original intention for developing the MOSS model?

Qiu Xipeng: We developed MOSS to explore and verify ChatGPT's technical route at the scale of tens of billions of parameters, and to show that we are not behind other countries in technical implementation. I also wanted to show that this technology need not be monopolized by large companies: an academic research lab like ours can build a ChatGPT-like model with relatively limited resources.

Of course, our goal for MOSS goes beyond replicating ChatGPT's capabilities. MOSS is positioned as frontier exploration in natural language processing and even in artificial general intelligence. The industry may care more about its commercial performance; we care more about its next-generation development, that is, how to achieve artificial general intelligence. From an academic standpoint, only by seeing further and farther ahead than OpenAI can we ultimately surpass it. We firmly believe academia has much to contribute here.

Q3: How did the name MOSS come about?

Qiu Xipeng: In academic circles, it is fairly common to name AI models after film and television characters. For example, the Transformer and Megatron models are named after "Transformers," and the BERT and ERNIE models take their characters from "Sesame Street." After we developed this conversational large language model, we wanted to name it after a domestic film character that could represent Chinese characteristics.

During development, the movie "The Wandering Earth 2" was released. Our team members were all fans of the film, and MOSS, the intelligent quantum computer in it, left a deep impression on us, so we named the model MOSS as a tribute to the movie. In recent days, fans of "The Wandering Earth 2" have emailed us, urging us to keep at it and really pull it off.

Q4: How is the internal test progressing and what problems have been found? How many users can the server support to be online at the same time? What is the reason why MOSS cannot be used on the night of February 20?

Sun Tianxiang: Registration for the closed beta was very enthusiastic; we received a large number of applications on the first day. We also found many problems, and feedback was polarized. For the current MOSS, there are plenty of good cases to point to, but plenty of bad cases as well (especially in Chinese). In other words, "a high ceiling but a low floor." We hope to raise the floor in the next version.

On the evening of February 20, after MOSS trended on Weibo, instantaneous traffic to our server reached tens of millions of requests. As an academic research lab, our server resources are relatively limited, which caused network congestion. It was as if we had cooked a meal and invited everyone into the house to eat, but so many people came that most were stuck in the courtyard and never got inside. Later, we randomly selected nearly 1,000 registered users to receive beta invitation codes, and the server ran without strain. At present, the MOSS servers can support on the order of tens of thousands of concurrent users.

"Compared with factual knowledge, the model's logical reasoning ability deserves more attention"

Q1: What are the main differences between MOSS and ChatGPT?

Qiu Xipeng: The biggest difference is parameter scale. ChatGPT has as many as 175 billion parameters, while MOSS is an order of magnitude smaller, about one tenth that size. We chose the tens-of-billions scale because academia mainly does exploratory work, and this scale was within our financial and material means. We believed that at this parameter level, models can still exhibit a degree of emergent intelligence and can be given conversational ability. The experimental results confirmed our hypothesis: MOSS can chat and interact with humans quite smoothly.

Another big difference is the ability to iterate. The more users and interaction data a model has, the faster it can be improved. As the front-runner in this new round of the AI race, ChatGPT is far ahead in collecting user-interaction data, and on top of that data OpenAI can keep the model-and-data flywheel spinning. That is why ChatGPT's writing is much better now than when it first launched.

These gaps in parameter count and interaction data in turn produce a gap in factual knowledge between MOSS and ChatGPT, which shows up as MOSS making more mistakes on fact-based questions. Broadly speaking, a model's capabilities fall into two types. One is factual, such as "where is Shanghai" or "what is Shanghai's population." The other is logical, such as understanding human intent and following human instructions. Factual knowledge follows the 80/20 rule: 80% of it is long-tail knowledge, and what the model does not yet know, it simply does not know, but that does not mean it cannot learn it. It is relatively easy to make a model more knowledgeable by expanding its knowledge base, its training corpus, and its parameter count. So when evaluating models like ChatGPT and MOSS, I think we should pay more attention to their ability to understand, learn, and reason than to their stock of facts. Judged on logical ability, MOSS actually performs quite well.

Q2: What are MOSS's technical innovations?

Qiu Xipeng: MOSS's distinguishing feature is its smaller scale, which makes it relatively easy to adapt into personalized models. Domestically, most enterprises have a strong need for AI services, but a model as large as ChatGPT cannot simply be transplanted into an enterprise. A tens-of-billions-parameter model like MOSS is a good fit: it can be deployed privately inside the enterprise and, after some fine-tuning on the enterprise's data, turned into productivity.

So we believe that a MOSS-scale model can be given more specialized capabilities, such as access to external knowledge bases and the ability to use search or domain-specific tools. It is the same with humans: many of our abilities are limited, but tools amplify them; when memory falls short, we consult dictionaries and search engines. Likewise, if MOSS is not that knowledgeable on its own, we can make it better at using a variety of tools to empower different industries. That will probably be the main difference between us and ChatGPT going forward.
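The tool-use idea described above can be sketched as a simple controller loop: the model emits a structured tool call, an external tool executes it, and the result is returned to the dialogue. This is a minimal illustrative sketch with hypothetical names, not the MOSS implementation.

```python
# Sketch of tool-augmented generation (hypothetical tool protocol): if the
# model's output looks like Search("..."), the controller runs the query
# against an external knowledge source and substitutes the result.

import re

def search_tool(query: str) -> str:
    """Stand-in for an external search engine or knowledge base."""
    knowledge = {"population of Shanghai": "about 24.9 million (2022)"}
    return knowledge.get(query, "no result")

TOOLS = {"Search": search_tool}

def run_turn(model_output: str) -> str:
    """Execute the output if it is a recognized tool call; else pass it through."""
    match = re.fullmatch(r'(\w+)\("(.+)"\)', model_output.strip())
    if match and match.group(1) in TOOLS:
        return TOOLS[match.group(1)](match.group(2))
    return model_output

print(run_turn('Search("population of Shanghai")'))
```

In a real system, the tool result would be appended to the model's context so the model can compose a final answer, letting a smaller model offload long-tail factual knowledge to external resources.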

Q3: The MOSS interface is currently English-only, and its Chinese is noticeably weaker than its English. Why is that?

Sun Tianxiang: Our primary goal in developing MOSS was to verify the technical roadmap. The open-source community currently offers many high-quality public English datasets but far fewer high-quality Chinese ones. Chinese web pages contain more noise, such as advertisements, which makes the corpus harder to clean. To verify the roadmap first, we launched this version with an English interface.

Qiu Xipeng: MOSS's Chinese ability does fall short, and we have begun constructing high-quality Chinese data ourselves. It will take some time, but we believe MOSS will eventually have strong Chinese understanding and generation abilities. Our goal is to create a large Chinese language model with Chinese characteristics.

Q4: Artificial intelligence is a “double-edged sword”. How do you train MOSS on ethical, values-based directives?

Qiu Xipeng: This is something we want to strengthen further. Once you regard the model as an agent, what matters is not only the correctness or accuracy of its answers but also other things, such as ensuring it is at least not harmful to humans. Human ethics and values are highly diverse, however, so going forward it cannot only be technologists building large language models; people engaged in legal and ethical research must also take part. Here we can draw on Fudan's strengths as a comprehensive, interdisciplinary university.

"I am optimistic: artificial general intelligence should not be far from crossing from science fiction into reality"

Q1: What do you think is the necessity and value of the construction of the Chinese version of ChatGPT model?

Qiu Xipeng: First of all, in the big picture, models like ChatGPT are not open to users in the Chinese mainland. If China wants to be at the frontier of large language models, and eventually artificial general intelligence, it must build its own language-model foundation. Second, foreign developers are unlikely to center their models on Chinese; their focus remains English. So if we want a large-language-model foundation for domestic information processing, especially Chinese information processing, we must build one with very strong Chinese ability.

Q2: What aspects will be included in MOSS optimization? What are the team’s near-term goals and ultimate expectations for the MOSS model?

Qiu Xipeng: Going forward, MOSS optimization will focus on three aspects. First, we will prepare higher-quality Chinese data. Second, we will open an interface for MOSS to converse with humans and collect more dialogue data. Third, we will further increase investment to expand the parameter scale; if MOSS can grow to 50 billion or 100 billion parameters, its capabilities will improve greatly.

In the short term, we hope MOSS becomes a relatively advanced conversational language model in China. We intend to remain a research institution not driven by profit, sharing our results with academia free of charge and opening them to industry under the premise of legal compliance, so that companies can build customized or domain-specific applications. If the next steps go well, we will open-source around the end of March.

In the long run, we look forward to using MOSS as a foundation toward artificial general intelligence, turning it into something real, like its science-fiction namesake. I am optimistic: the leap from science fiction to reality should not be far off, perhaps 5 to 10 years. By then, we will take artificial general intelligence as much for granted as we take today's technology.

(Original title: Research results will be open source! An in-depth interview with the Fudan MOSS team)
