Researchers found that just 250 malicious documents could make MScs vulnerable to backdoors

Photo of author

By [email protected]


AI companies are working at breakneck speeds to develop the best and most powerful tools, but this rapid development has not always been accompanied by a clear understanding of AI’s limitations or weaknesses. Today, Anthropic A a report On how attackers impact the development of a large language model.

The study focused on a type of attack called poisoning, where an LLM is pre-trained on malicious content intended to make it learn dangerous or unwanted behaviors. The main finding from this study is that the bad actor does not need to control a percentage of the pre-training material in order to poison the LLM. Instead, the researchers found that a small, fairly constant number of malicious documents can poison an LLM, regardless of the size of the form or its training materials. The study was able to successfully access LLMs based on the use of only 250 malicious documents in the pre-training dataset, a much lower number than expected for models ranging from 600 million to 13 billion parameters.

“We are sharing these findings to show that data poisoning attacks may be more practical than thought, and to encourage further research into data poisoning and potential defenses against it,” the company said. Anthropic collaborated with the UK’s AI Security Institute and the Alan Turing Institute on the research.



https://s.yimg.com/ny/api/res/1.2/xMV8HxTYXI7MSoVuws1hvw–/YXBwaWQ9aGlnaGxhbmRlcjt3PTEyMDA7aD04MDA-/https://d29szjachogqwa.cloudfront.net/images/2025-10/4a767e68-8961-4c8a-bd19-4607d9e47a5a

Source link

Leave a Comment