How Chinese AI startup DeepSeek built a model to rival OpenAI



Today, DeepSeek is one of the few leading AI companies in China that does not rely on funding from tech giants like Baidu, Alibaba, or ByteDance.

A group of young geniuses eager to prove themselves

According to Liang, when he formed the DeepSeek research team, he wasn’t looking for experienced engineers to build a consumer-facing product. Instead, he focused on doctoral students from China’s top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many had published in prestigious journals and won awards at international academic conferences, but lacked industry experience, according to Chinese technology publication QBitAI.

“Our core technical positions are mostly filled by people who graduated this year or within the last year or two,” Liang told 36Kr in 2023. The recruitment strategy helped create a collaborative company culture in which employees were free to use ample computing resources to pursue unconventional research projects. It’s a very different way of working from that of established Chinese internet companies, where teams often compete for resources. (A recent example: ByteDance accused a former intern, the winner of a prestigious academic award no less, of sabotaging his colleagues’ work in order to hoard more computing resources for his team.)

Liang said that students could be a better fit for research that demands heavy investment and offers little profit. “Most people, when they are young, can devote themselves entirely to a task without utilitarian considerations,” he explained. His pitch to potential employees was that DeepSeek was created to “solve the world’s toughest questions.”

The fact that these young researchers were educated almost entirely in China adds to their motivation, experts say. “This young generation also embodies a sense of patriotism, especially as they navigate American restrictions and chokepoints in critical hardware and software technologies,” Zhang explains. “Their determination to overcome these barriers reflects not only their personal ambition, but also a broader commitment to strengthening China’s position as a global leader in innovation.”

Innovation was born out of crisis

In October 2022, the US government began putting in place export controls that severely restricted Chinese AI companies’ access to cutting-edge chips like Nvidia’s H100. The move presented a problem for DeepSeek. The company had started with a stockpile of 10,000 of Nvidia’s A100 chips, but it needed more to compete with firms like OpenAI and Meta. “The problem we face has never been funding, but the export controls on advanced chips,” Liang told 36Kr in a second interview in 2024.

DeepSeek had to come up with more efficient ways to train its models. “They optimized their model architecture using a battery of engineering tricks: custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mixture-of-experts approach,” says Wendy Zhang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches are not new ideas, but successfully combining them to produce a cutting-edge model is a remarkable achievement.”
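To make the memory-saving idea concrete, here is a minimal sketch, not DeepSeek’s actual code, of how storing the same weights in a lower-precision number format shrinks their footprint. The matrix size, the use of PyTorch, and the choice of 16-bit floats (rather than whatever formats DeepSeek actually used) are illustrative assumptions.

```python
import torch

# A toy weight matrix stored in full 32-bit precision.
weights_fp32 = torch.randn(4096, 4096, dtype=torch.float32)

# The same values stored in 16-bit precision take half the memory;
# formats with even fewer bits (such as FP8) shrink it further.
weights_fp16 = weights_fp32.to(torch.float16)

bytes_fp32 = weights_fp32.element_size() * weights_fp32.nelement()
bytes_fp16 = weights_fp16.element_size() * weights_fp16.nelement()

print(f"fp32: {bytes_fp32 / 1e6:.0f} MB, fp16: {bytes_fp16 / 1e6:.0f} MB")
# fp32: 67 MB, fp16: 34 MB
```

The trade-off is precision for memory and bandwidth, which is why such tricks are usually paired with careful engineering to keep training stable.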

DeepSeek has also made significant progress with multi-head latent attention (MLA) and mixture of experts, two technical designs that make its models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek’s latest model is so efficient that it required one-tenth the computing power of Meta’s comparable Llama 3.1 model to train, according to the research institution Epoch AI.
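For readers curious what “mixture of experts” means in practice, the toy layer below is a simplified sketch of the general idea, not DeepSeek’s architecture: a small router sends each token to only its top two of eight expert networks, so most of the layer’s parameters sit idle for any given token, which is what keeps the compute bill down. The layer sizes, expert count, and routing rule are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    """Simplified mixture-of-experts layer: each token is processed by
    only its top-k experts, so most experts do no work for that token."""

    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)  # scores every expert for every token
        self.top_k = top_k

    def forward(self, x):                    # x: (num_tokens, dim)
        scores = self.router(x)              # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)    # normalize weights of the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx  # tokens that routed this slot to expert idx
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out


layer = ToyMoELayer()
tokens = torch.randn(10, 64)                 # 10 tokens, 64-dimensional embeddings
print(layer(tokens).shape)                   # torch.Size([10, 64])
```

Scaled up, the same routing idea lets a model hold a very large total parameter count while activating only a small fraction of it per token.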

DeepSeek’s willingness to share these innovations with the public has earned it a great deal of goodwill within the global AI research community. For many Chinese AI companies, developing open source models is the only way to catch up with their Western counterparts, because it attracts more users and contributors, which in turn helps the models improve. “They have now demonstrated that cutting-edge models can be built using less money, albeit still a lot, and that the current norms of model building leave plenty of room for improvement,” Zhang says. “We will surely see more attempts in this direction in the future.”

This news could create problems for existing US export controls, which focus on creating bottlenecks in computing resources. “Current estimates of how much AI computing power China has, and what it can achieve with it, could be upended,” Zhang says.


