DeepSeek may have used Google's Gemini to train its latest model




Last week, Chinese lab DeepSeek released an updated version of its R1 AI model that performs well on a number of math and coding benchmarks. The company did not reveal the source of the data it used to train the model, but some AI researchers speculate that at least a portion came from Google's Gemini family of AI.

Sam Paech, a Melbourne-based developer who creates "emotional intelligence" evaluations for AI, published in a post on X what he says is evidence that DeepSeek's latest model was trained on outputs from Gemini.

That is not a smoking gun. But another developer, the pseudonymous creator of a "free speech eval" for AI called SpeechMap, noted that the DeepSeek model's traces (the "thoughts" the model generates as it works toward a conclusion) "read like Gemini traces."

DeepSeek has been accused of training on data from rival AI models before. In December, developers observed that DeepSeek's V3 model often identified itself as ChatGPT, OpenAI's AI-powered chatbot platform, suggesting that it may have been trained on ChatGPT chat logs.

Earlier this year, OpenAI told the Financial Times that it had found evidence linking DeepSeek to the use of distillation, a technique for training AI models by extracting data from larger, more capable ones. According to Bloomberg, Microsoft, a close OpenAI collaborator and investor, detected that large amounts of data were being exfiltrated through OpenAI developer accounts in late 2024, accounts that OpenAI believes are affiliated with DeepSeek.

Distillation is not an uncommon practice, but OpenAI's terms of service prohibit customers from using the company's model outputs to build competing AI.

To be clear, many models misidentify themselves and converge on the same words and phrases. That is because the open web, where AI companies source the bulk of their training data, is becoming littered with AI slop. Content farms are using AI to create clickbait, and bots are flooding Reddit and X.

This "contamination," if you will, has made it quite difficult to thoroughly filter AI outputs from training datasets.

Still, AI experts such as Nathan Lambert, a researcher at the nonprofit AI research institute AI2, do not think it is out of the question that DeepSeek trained on data from Google's Gemini.

"If I was DeepSeek, I would definitely create a ton of synthetic data from the best API model out there," Lambert wrote in a post on X. "[DeepSeek is] short on GPUs and flush with cash. It's literally effectively more compute for them."

Partly in an effort to prevent distillation, AI companies have been ramping up security measures.

In April, OpenAI began requiring organizations to complete an ID verification process in order to access certain advanced models. The process requires a government-issued ID from one of the countries supported by OpenAI's API; China is not on the list.

Elsewhere, Google recently began "summarizing" the traces generated by models available through its AI Studio developer platform, a step that makes it more difficult to train performant rival models on Gemini traces. Anthropic said in May that it would start summarizing its own model's traces, citing a need to protect its "competitive advantages."

We have reached out to Google for comment and will update this piece if we hear back.




