AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they do not want to be scraped, according to internet infrastructure provider Cloudflare.
On Monday, Cloudflare published research saying it had observed the AI startup ignoring blocks and hiding its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to circumvent the website’s preferences,” Cloudflare’s researchers wrote.
AI products like those offered by Perplexity rely on gobbling up large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the internet, often without permission, to make their products work. More recently, websites have tried to fight back using the web standard robots.txt file, which tells search engines and AI companies which pages can be indexed and which should not, efforts that have seen mixed results so far.
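To illustrate how a robots.txt file expresses those preferences, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It is not taken from Cloudflare's research: the crawler names `ExampleAIBot` and `SomeSearchBot`, the `example.com` URLs, and the rules themselves are hypothetical. Note that robots.txt is advisory; it only works when the crawler chooses to check it and obey the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler everywhere,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before fetching each URL.
ai_allowed = parser.can_fetch("ExampleAIBot", "https://example.com/articles/1")
other_allowed = parser.can_fetch("SomeSearchBot", "https://example.com/articles/1")
other_private = parser.can_fetch("SomeSearchBot", "https://example.com/private/page")

print(ai_allowed, other_allowed, other_private)
```

Nothing in this mechanism stops a crawler that skips the check, which is why site owners often pair robots.txt with server-side blocking.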
Perplexity appears to circumvent these blocks by changing its bots’ “user agent,” a signal that identifies a website visitor by device and version type, and by changing its autonomous system networks, or ASNs, essentially numbers that identify large networks on the internet, according to Cloudflare.
“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” Cloudflare’s post read.
Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “isn’t even ours.”
Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites, even after they had added rules to their robots.txt files and rules specifically blocking Perplexity’s known bots. Cloudflare said it then ran tests to verify the complaints and confirmed that Perplexity was circumventing these blocks.
“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” according to Cloudflare.
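To see why swapping the user agent defeats a simple block, consider this minimal Python sketch of a name-based filter. It is an illustration under stated assumptions, not Cloudflare's detection logic: the token `exampleaibot`, the declared crawler string, and the `is_blocked` helper are hypothetical, and the Chrome string is only an example of the general format browsers send.

```python
# Hypothetical block list a site owner might maintain (lowercase tokens).
BLOCKED_TOKENS = ("exampleaibot",)

# A crawler that identifies itself honestly (hypothetical string).
DECLARED_CRAWLER_UA = "ExampleAIBot/1.0 (+https://example.com/bot)"

# An example of the kind of string Chrome on macOS sends.
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def is_blocked(user_agent: str, blocked_tokens: tuple[str, ...] = BLOCKED_TOKENS) -> bool:
    """Return True if the User-Agent header contains any blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in blocked_tokens)


declared_blocked = is_blocked(DECLARED_CRAWLER_UA)
disguised_blocked = is_blocked(GENERIC_CHROME_UA)

print(declared_blocked, disguised_blocked)
```

The filter stops the declared crawler but lets the browser-like string through, because the header is self-reported. That is why Cloudflare describes relying on machine learning and network signals rather than the user agent alone.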
The company also said it has de-listed Perplexity’s bots from its verified list and added new techniques to block them.
Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace that allows website owners and publishers to charge AI scrapers that visit their sites. Cloudflare CEO Matthew Prince sounded the alarm at the time, saying AI is breaking the business model of the internet, particularly that of publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.
This is not the first time Perplexity has been accused of scraping without authorization.
Last year, news outlets such as Wired alleged that Perplexity was plagiarizing their content. Weeks later, Perplexity CEO Aravind Srinivas was unable to answer immediately when asked to provide the company’s definition of plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.