Startups
January 28, 2025

How DeepSeek's AI Model Competes with OpenAI's Innovations

Chinese AI startup DeepSeek has built an AI model that rivals OpenAI's most advanced systems on natural language processing and reasoning tasks. That performance positions the young company as a serious player in a rapidly evolving field long dominated by established giants like OpenAI.

Liang Wenfeng, the founder of a Chinese quant hedge fund, ventured into AI research by acquiring 10,000 Nvidia chips and assembling a team of highly ambitious, young professionals. Two years later, DeepSeek made a significant entrance into the field.

On January 20, DeepSeek, a relatively obscure AI lab from China, unveiled an open-source model that quickly gained attention in Silicon Valley. According to a paper from the company, DeepSeek-R1 surpasses leading models like OpenAI's in key math and reasoning benchmarks. DeepSeek is proving to be a formidable competitor to Western AI giants in terms of capability, cost, and openness.

The success of DeepSeek highlights an unexpected consequence of the tech rivalry between the US and China. US export controls have drastically limited Chinese tech companies’ ability to follow the Western approach of scaling AI through purchasing more chips and training for extended periods. As a result, many Chinese firms have shifted focus to downstream applications rather than developing their own models. However, DeepSeek has shown that there is another route to success by reimagining the foundational structures of AI models and using resources more efficiently.

“Unlike many other Chinese AI companies that rely on advanced hardware, DeepSeek has concentrated on optimizing resources through software,” says Marina Zhang, an associate professor at the University of Technology Sydney who specializes in Chinese innovations. “DeepSeek’s embrace of open-source methods and collaborative innovation allows it to overcome resource constraints and accelerate cutting-edge technological development, differentiating it from more isolated competitors.”

So, who is behind DeepSeek, and why has the company suddenly released an industry-leading model for free? WIRED spoke with experts on China's AI scene and reviewed published interviews with founder Liang Wenfeng to piece together the story behind DeepSeek's rapid rise; the company itself did not respond to multiple inquiries from WIRED.

DeepSeek is an unconventional player within China's AI industry. It began as Fire-Flyer, a deep-learning research division of High-Flyer, one of China's most successful quantitative hedge funds. Founded in 2015, the fund quickly became a leader in the country, raising over 100 billion RMB (about $15 billion) in its early years, though that figure has since fallen to around $8 billion. Despite the dip, High-Flyer remains a key player in China's hedge fund industry.

For years, High-Flyer amassed GPUs and built supercomputers to analyze financial data. In 2023, Liang, who holds a master's degree in computer science, decided to direct the fund's resources into DeepSeek, a new company focused on developing cutting-edge AI models with the goal of achieving artificial general intelligence. The move was akin to a quant firm like Jane Street reinventing itself as an AI research startup.

Liang’s vision was bold, but it worked. “DeepSeek represents a new generation of Chinese tech companies prioritizing long-term technological growth over immediate commercialization,” says Zhang.

Liang explained in an interview with 36Kr that his decision was driven by scientific curiosity, not profit. “If you ask me to find a commercial reason for founding DeepSeek, I wouldn’t be able to,” he said. “Basic science research doesn’t offer a high return on investment. Early investors in OpenAI weren’t concerned with the financial returns; they simply wanted to contribute to the advancement of the field.”

Today, DeepSeek is one of the few top AI companies in China that does not rely on funding from tech giants like Baidu, Alibaba, or ByteDance.

Liang’s approach to building DeepSeek’s research team was unconventional as well. Rather than hiring experienced engineers, he sought out PhD students from top Chinese universities such as Peking University and Tsinghua University who were eager to prove their worth. Many had published in prestigious journals and won accolades at international conferences, but lacked industry experience, according to QBitAI.

“Our core technical roles are mostly filled by recent graduates,” Liang shared with 36Kr in 2023. This hiring strategy fostered a collaborative culture where the team had the freedom to use abundant computing resources to explore unconventional research projects. This approach is in stark contrast to other major Chinese internet firms, where competition for resources is common. (A recent incident saw ByteDance accusing a former intern of sabotaging colleagues to gain access to more computing resources.)

Liang believes that younger researchers are better suited for high-investment, low-return research because they can fully dedicate themselves to a mission without selfish motives. His pitch to prospective hires emphasized that DeepSeek’s mission was to tackle the most challenging questions in the world.

Experts believe that the fact these young researchers are predominantly educated in China contributes to their drive. Zhang notes, “This younger generation feels a sense of patriotism, particularly as they navigate US restrictions on critical technologies. Their determination reflects both personal ambition and a desire to strengthen China’s position as a global leader in innovation.”

The company's innovation emerged in part from the challenges posed by the US government's export controls on cutting-edge chips. In October 2022, the US began imposing restrictions on the export of advanced chips, such as Nvidia’s H100, which created a challenge for DeepSeek. Although the firm had stockpiled 10,000 A100 chips, more were needed to compete with firms like OpenAI and Meta. “Our issue has never been funding, but the export restrictions on advanced chips,” Liang told 36Kr in another interview in 2024.

To overcome this, DeepSeek focused on more efficient methods for training its models. “They optimized their model architecture using various engineering techniques, such as custom chip communication schemes, reducing field sizes to conserve memory, and creatively applying a mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “While these ideas aren’t new, combining them successfully to create an advanced model is a remarkable achievement.”
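One of the techniques Chang describes, reducing field sizes, amounts to storing numbers in fewer bits. The sketch below is an illustrative example in PyTorch, not DeepSeek's code: casting a layer's parameters from 32-bit to 16-bit floats halves the memory they occupy, which is the kind of saving that matters when hardware is constrained.

```python
# Illustrative sketch (not DeepSeek's code): smaller numeric fields mean less memory.
import torch
import torch.nn as nn

layer_fp32 = nn.Linear(4096, 4096)                      # default float32 parameters
layer_bf16 = nn.Linear(4096, 4096).to(torch.bfloat16)   # same shapes, 16-bit fields

def param_bytes(module: nn.Module) -> int:
    # Total bytes used by the module's parameters.
    return sum(p.numel() * p.element_size() for p in module.parameters())

print(param_bytes(layer_fp32))  # ~67 MB in float32
print(param_bytes(layer_bf16))  # ~34 MB in bfloat16: half the memory for the same layer
```

Real training pipelines mix precisions carefully to preserve numerical stability, but the memory arithmetic above is the basic trade-off being exploited.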

DeepSeek has also advanced technologies like Multi-head Latent Attention (MLA) and Mixture-of-Experts, which help make their models more cost-effective by requiring fewer computing resources. In fact, their latest model was so efficient that it used just one-tenth of the computing power needed to train Meta’s Llama 3.1 model, according to Epoch AI.
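To see why a Mixture-of-Experts design saves compute, consider a minimal top-k routing sketch, again an illustrative example in PyTorch rather than DeepSeek's implementation: each token is routed to only a few small expert networks, so most of the model's parameters sit idle for any given token.

```python
# Minimal Mixture-of-Experts sketch (illustrative, not DeepSeek's architecture):
# a router picks k of n_experts feed-forward networks per token, so compute per
# token scales with k rather than with the total number of experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each token against each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                     # x: (tokens, d_model)
        scores = self.router(x)                               # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # keep only k experts per token
        weights = F.softmax(topk_scores, dim=-1)              # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([16, 64])
```

Production MoE models use far more experts and add load-balancing mechanisms, but the core idea of sparse per-token activation, which is what makes the approach cheaper to train and run, is the same.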

The company’s decision to share its innovations with the public has earned it significant goodwill in the global AI research community. Many Chinese AI companies have turned to open-source models as a way to compete with Western firms by attracting more users and contributors, which ultimately helps improve the models. “DeepSeek has demonstrated that cutting-edge models can be developed with relatively fewer resources, and the current model-building norms leave room for optimization,” Chang says. “We are likely to see more efforts in this direction.”

This development may pose a challenge to the current US export controls designed to create bottlenecks in computing resources. “Existing estimates of China’s AI capabilities could be upended,” Chang concludes.


Source: Wired
