DeepSeek's updated R1 AI model may be getting the largest share of the AI community's attention this week. But the Chinese AI lab also released a smaller, distilled version of the new R1, DeepSeek-R1-0528-Qwen3-8B, which DeepSeek claims outperforms comparably sized models on certain benchmarks.
The smaller R1, built on the Qwen3-8B model Alibaba launched in May, performs better than Google's Gemini 2.5 Flash on AIME 2025, a set of challenging math questions.
DeepSeek-R1-0528-Qwen3-8B also nearly matches Microsoft's recently released Phi 4 reasoning plus model on another math skills test, HMMT.
So-called distilled models such as DeepSeek-R1-0528-Qwen3-8B are generally less capable than their full-sized counterparts. On the plus side, they are far less computationally demanding. According to the cloud platform NodeShift, Qwen3-8B requires a GPU with 40GB-80GB of RAM to run (an Nvidia H100, for example). The full-sized new R1 needs around a dozen 80GB GPUs.
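The gap between those hardware requirements follows from simple arithmetic on parameter counts. A minimal sketch, where the bytes-per-parameter values and the note on overhead are illustrative assumptions rather than NodeShift's actual methodology:

```python
# Back-of-the-envelope GPU memory estimate for hosting a model of a
# given parameter count. The 8-billion-parameter figure comes from the
# model name; precision choices below are common defaults, assumed here.

def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in gigabytes."""
    return n_params * bytes_per_param / 1e9

# Qwen3-8B: roughly 8 billion parameters.
fp16_gb = weight_memory_gb(8e9, 2)  # 16-bit weights
fp32_gb = weight_memory_gb(8e9, 4)  # 32-bit weights

print(f"fp16 weights: ~{fp16_gb:.0f} GB")  # ~16 GB
print(f"fp32 weights: ~{fp32_gb:.0f} GB")  # ~32 GB

# KV cache and activation memory during inference push the real
# requirement well above the raw weight size, which is consistent
# with a 40GB-80GB GPU being recommended for an 8B model.
```

The same arithmetic applied to a model with hundreds of billions of parameters lands in the multi-hundred-gigabyte range, hence the dozen-GPU figure for the full-sized R1.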
DeepSeek trained DeepSeek-R1-0528-Qwen3-8B by taking text generated by the updated R1 and using it to fine-tune Qwen3-8B. On a dedicated web page for the model on the AI dev platform Hugging Face, DeepSeek describes DeepSeek-R1-0528-Qwen3-8B as being for "both academic research on reasoning models and industrial development focused on small-scale models."
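The distillation recipe described above, collecting a teacher model's outputs as fine-tuning data for a smaller student, can be sketched at the data-preparation level. Everything here (function names, the JSONL record fields, the stand-in teacher) is a hypothetical illustration, not DeepSeek's actual pipeline:

```python
import json

def teacher_generate(prompt: str) -> str:
    """Stand-in for sampling a reasoning trace from the large teacher
    model (the updated R1); a real pipeline would call the model here."""
    return f"<think>working through: {prompt}</think> final answer"

def build_distillation_set(prompts: list[str]) -> list[str]:
    """Turn teacher outputs into JSONL records, the typical input
    format for supervised fine-tuning of the student (Qwen3-8B)."""
    records = []
    for prompt in prompts:
        completion = teacher_generate(prompt)
        records.append(json.dumps({"prompt": prompt,
                                   "completion": completion}))
    return records

dataset = build_distillation_set([
    "What is 2 + 2?",
    "Prove that sqrt(2) is irrational.",
])
for line in dataset:
    print(line)
```

The student is then fine-tuned on these prompt/completion pairs so it imitates the teacher's reasoning style at a fraction of the parameter count.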
DeepSeek-R1-0528-Qwen3-8B is available under a permissive MIT license, meaning it can be used commercially without restriction. Several hosts, including LM Studio, already offer the model through an API.
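Local hosts of this kind commonly expose an OpenAI-compatible HTTP API. A minimal sketch of building the request body one would POST to such a server's `/v1/chat/completions` endpoint; the base URL, model identifier, and temperature are assumptions to check against your host's documentation:

```python
import json

BASE_URL = "http://localhost:1234/v1"   # assumed local server address
MODEL = "deepseek-r1-0528-qwen3-8b"     # assumed identifier on the host

def chat_payload(user_message: str) -> str:
    """JSON body for an OpenAI-style chat completion request."""
    return json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.6,
    })

payload = chat_payload("What is the 10th Fibonacci number?")
print(payload)
# POST this to f"{BASE_URL}/chat/completions"
# with the header Content-Type: application/json
```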