OpenAI Reveals Reinforcement Learning Secrets, Intense Competition Among Chinese AI Companies

OpenAI Releases Research on Reasoning Models
On February 12, OpenAI revealed details of the reinforcement learning behind its o-series models, a disclosure that comes amid pressure from Chinese AI companies. The research paper, titled "Competitive Programming with Large Reasoning Models," covers three reasoning models (o1, o1-ioi, and o3) and reports their performance at the IOI (International Olympiad in Informatics) and on Codeforces, a widely used online competitive programming platform. According to the paper, o3 scored 395.64 points on the IOI 2024 problems under strict competition rules, enough for a gold medal, and its Codeforces performance was comparable to that of top human competitors.
Breakthroughs by Chinese AI Companies
The paper also cites two Chinese models, DeepSeek-R1 and Kimi k1.5, noting that independent research on them showed that learning Chain-of-Thought (CoT) reasoning significantly improves model performance on mathematical problem-solving and programming challenges. R1 and k1.5 are reasoning models that DeepSeek and Kimi released on the same day, January 20. Their release marks a significant breakthrough for Chinese AI companies in the global competition.
Performance Improvement through Reinforcement Learning
The paper compares how large language models trained with reinforcement learning (RL) perform on complex coding and reasoning tasks. It finds that reinforcement learning training significantly improves performance, bringing the models closer to world-class human competitors. OpenAI expects such models to open up new uses for AI in science, coding, and mathematics.
Future Outlook
Competition between OpenAI and Chinese AI companies over reasoning models and reinforcement learning is accelerating the development of AI technology. As these techniques advance, AI is likely to see wider use in scientific research, programming competitions, and mathematical problem-solving. The competition has brought innovation to the industry and given users worldwide more options to choose from.
Conclusion
OpenAI's disclosure of the reinforcement learning methods behind its o-series shows how strong its models have become in competitive programming, and it also highlights the rapid rise of Chinese AI companies in this field. Reinforcement learning combined with Chain-of-Thought reasoning has significantly improved model performance, opening the way to new applications of AI in science, coding, and mathematics.