Cloudflare claims Perplexity is scraping websites it's not supposed to, again


By [email protected]


A new report from Cloudflare claims that the web crawlers run by Perplexity are stealthily scraping websites. Specifically, the report says the company's bots appear to be "stealth crawling" by hiding their identity to get around robots.txt files and firewall blocks.

Robots.txt is a simple text file that lets websites tell web crawlers whether or not they are allowed to scrape the site's content. Perplexity's declared crawlers are "PerplexityBot" and "Perplexity-User." In Cloudflare's tests, Perplexity was still able to surface the content of a brand-new website even when those bots were explicitly banned by its robots.txt. The behavior extended to websites with web application firewall (WAF) rules that specifically blocked Perplexity's crawlers.
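To illustrate what compliance looks like, here is a minimal sketch of how a well-behaved crawler consults robots.txt before fetching a page, using Python's standard-library `urllib.robotparser`. The robots.txt contents and the `example.com` URL are hypothetical; the "PerplexityBot" and "Perplexity-User" user agents are Perplexity's declared crawler names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt banning Perplexity's declared crawlers
# while allowing everyone else.
robots_txt = """User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# A compliant crawler checks before every fetch and honors the answer.
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))   # True
```

The catch, as Cloudflare's report notes, is that this is purely an honor system: nothing forces a crawler to perform the check at all.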

A simplified diagram provided by Cloudflare illustrating the different ways Perplexity's crawlers access website content.

Cloudflare

Cloudflare believes Perplexity gets around these blocks by using a "generic browser intended to impersonate Google Chrome on macOS" when its declared crawlers are blocked by robots.txt. In Cloudflare's tests, the company's undeclared crawler could also rotate through IP addresses outside Perplexity's official range to get past firewall blocks. Cloudflare says Perplexity appears to do the same with autonomous system numbers (ASNs), the identifiers for groups of IP addresses managed by the same operator, writing that it has observed the crawler changing ASNs "across tens of thousands of domains and millions of daily requests."
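The user-agent impersonation described above is effective against WAF rules that block crawlers by name. A minimal sketch of why, with hypothetical user-agent strings (the spoofed string below is illustrative of a generic Chrome-on-macOS browser, not a quote from Cloudflare's report):

```python
# Hypothetical block list of declared crawler names, as a simple
# WAF-style user-agent rule might use.
BLOCKED_AGENTS = ("PerplexityBot", "Perplexity-User")

def is_blocked(user_agent: str) -> bool:
    """Block a request if its user agent names a banned crawler."""
    return any(name in user_agent for name in BLOCKED_AGENTS)

# A declared crawler identifies itself and gets caught.
declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
# An undeclared crawler posing as an ordinary Chrome browser slips through.
spoofed = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
           "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36")

print(is_blocked(declared))  # True
print(is_blocked(spoofed))   # False
```

Because the spoofed string is indistinguishable from a real browser's, blocking it requires the kind of behavioral fingerprinting Cloudflare describes rather than a simple name match.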

Engadget has reached out to Perplexity for comment on Cloudflare's report. We'll update this article if we hear back.

Fresh data from websites is vital to companies training AI models, especially as Perplexity positions its service as an alternative to traditional search engines. And Perplexity has been caught skirting the rules to stay current before. Multiple websites reported in 2024 that Perplexity was still accessing their content despite being banned in robots.txt, something the company blamed on a third-party crawler it was using at the time. Perplexity later partnered with several publishers to share revenue earned from ads served alongside their content, seemingly as a way to make good after its earlier behavior.

Keeping companies from bulldozing through the web's content will likely remain a game of whack-a-mole. In the meantime, Cloudflare has removed Perplexity's bots from its list of verified bots and implemented a way to identify and block its stealth crawling from reaching its customers' content.


