DeepSeek tops the Apple App Store download chart, with technology and cost advantages as the key.

DeepSeek Successfully Reaches the Top
On January 27, DeepSeek’s app topped the free app download chart in the U.S. Apple App Store, surpassing ChatGPT. Meanwhile, DeepSeek also ranked first in the free app rankings in the Chinese Apple App Store. This achievement has attracted widespread attention.

Dual Advantages in Technology and Cost
Zheng Weimin, an academician of the Chinese Academy of Engineering and a professor in the Department of Computer Science at Tsinghua University, along with many AI experts, spoke with Sina Tech about the keys to DeepSeek's success. The industry's current enthusiasm for DeepSeek centers on three aspects.

Technological Breakthroughs
First, in terms of technology, DeepSeek's underlying DeepSeek-V3 model and the newly released DeepSeek-R1 model have reached capabilities comparable to OpenAI's GPT-4o and o1 models. Both models have performed excellently and are highly regarded in the industry.

Cost Advantage
Secondly, the two models developed by DeepSeek are far cheaper, costing only about one-tenth as much as OpenAI's GPT-4o and o1 models. This cost advantage gives DeepSeek a favorable position in market competition.

Open Source Strategy
Thirdly, DeepSeek has open-sourced the technology of these two models, allowing more AI teams to develop more AI-native applications based on the most advanced yet cost-effective models. This open-source strategy not only promotes the dissemination and application of technology but also enhances DeepSeek’s brand influence.

The Secret Behind Reduced Model Costs
Zheng Weimin pointed out that DeepSeek's self-developed MLA (Multi-head Latent Attention) architecture and DeepSeekMoE architecture played a key role in reducing its models' training costs. MLA compresses the KV Cache by modifying the attention operator, so that more KV Cache fits in the same memory capacity. Combined with modifications to the FFN layer in DeepSeek-V3, this enables a very large, very sparse MoE (Mixture of Experts) layer, which is the key reason for DeepSeek's low training costs.
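The core idea of compressing the KV Cache through a latent projection can be illustrated with a minimal numpy sketch. This is not DeepSeek's actual implementation; all dimension sizes and projection names (`W_down`, `W_up_k`, `W_up_v`) are illustrative assumptions. The point is that caching a small latent vector per token, and expanding it into keys and values only when needed, shrinks the cache dramatically.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_tokens = 64, 8, 100   # illustrative sizes

# Learned projections (random here for illustration): compress each
# hidden state to a small latent, then expand the latent back into
# keys and values on demand.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

hidden = rng.standard_normal((n_tokens, d_model))

# A standard KV cache stores full K and V: 2 * n_tokens * d_model floats.
# A latent-style cache stores only:            n_tokens * d_latent floats.
latent_cache = hidden @ W_down              # (n_tokens, d_latent)
K = latent_cache @ W_up_k                   # reconstructed when attending
V = latent_cache @ W_up_v

full_size = 2 * n_tokens * d_model
mla_size = n_tokens * d_latent
print(f"cache shrinks by {full_size / mla_size:.0f}x")  # 16x here
```

With these toy sizes the cache is 16 times smaller; the real trade-off is that keys and values must be re-expanded from the latent during attention.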

KV Cache Optimization Technology
KV Cache is a common optimization technique that stores the key-value pairs of tokens generated while an AI model runs. By trading storage for computation, it avoids recomputing attention from the first token at every decoding step, improving the efficiency of computational resources.
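The mechanism described above can be sketched in a few lines of numpy. This is a minimal single-head illustration, not any production implementation: each decoding step appends one new key/value pair to the cache and attends over everything stored, instead of recomputing keys and values for all previous tokens.

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

class KVCache:
    """Stores key/value vectors of already-processed tokens so each
    new decoding step only computes K and V for the newest token."""
    def __init__(self, d):
        self.keys = np.empty((0, d))
        self.values = np.empty((0, d))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(q, k, v, cache):
    # Without the cache, K and V for every earlier token would be
    # recomputed here; with it, one pair is appended and the rest reused.
    cache.append(k, v)
    return attention(q, cache.keys, cache.values)
```

Each call to `decode_step` does work proportional to the sequence length for the attention itself, but the K/V projections are computed exactly once per token.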

Solving MoE Model Performance Issues
Moreover, DeepSeek has solved the performance problems of "very large but very sparse MoE models." Using MoE (mixture-of-experts) models to enhance the capabilities of large AI models is a widely recognized, effective approach, but increasing the number of experts can make results less accurate. DeepSeek's achievement lies in being the first company to successfully train an MoE model this large and this sparse.

Efficient Expert Model Activation Technology
To keep its large-scale MoE expert models running in balance, DeepSeek uses an advanced, auxiliary-loss-free expert load-balancing technique. It ensures that only a small number of expert-network parameters are activated for each token, and that the different expert networks are activated evenly, preventing congestion. Furthermore, DeepSeek exploits the sparse-activation design of the expert networks, limiting the number of tokens sent to each GPU cluster node to keep inter-GPU communication overhead stable and low.
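The routing idea behind auxiliary-loss-free balancing can be sketched as follows. This is a toy sketch under stated assumptions, not DeepSeek's actual router: expert count, update rate, and all variable names are illustrative. Each token picks its top-k experts by gating score, and a per-expert bias, nudged by the sign of each expert's load error rather than by an auxiliary loss term, steers future tokens away from overloaded experts.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d = 8, 2, 16   # illustrative sizes

W_gate = rng.standard_normal((d, n_experts)) / np.sqrt(d)
bias = np.zeros(n_experts)       # per-expert routing bias, no aux loss

def route(x, bias):
    """Pick the top_k experts per token. The bias shifts which experts
    are selected but would not enter the mixing weights themselves."""
    scores = x @ W_gate                          # (n_tokens, n_experts)
    return np.argsort(scores + bias, axis=1)[:, -top_k:]

# Simulate routing batches and adjust the bias toward the mean load:
# overloaded experts get their bias lowered, underloaded ones raised.
for step in range(50):
    tokens = rng.standard_normal((256, d))
    chosen = route(tokens, bias)
    load = np.bincount(chosen.ravel(), minlength=n_experts)
    bias -= 0.01 * np.sign(load - load.mean())
```

The sign-based update means no gradient flows through the balancing mechanism, so expert quality is not traded off against balance the way an auxiliary loss would trade it.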

Conclusion
DeepSeek's app, thanks to its technological and cost advantages, has successfully reached the top of the Apple App Store download rankings. The breakthroughs in performance and cost of the underlying DeepSeek-V3 and DeepSeek-R1 models, along with the implementation of the open-source strategy, have made DeepSeek a significant player in the AI field. In the future, DeepSeek is expected to continue leading the development of AI applications, bringing more innovation and convenience to users.