Reddit is blocking the Internet Archive's Wayback Machine from indexing most of its site, after discovering that artificial intelligence companies were scraping its data from the digital time capsule.
The move comes as Reddit tightens its grip on user data. The company does not mind AI companies training their models on Reddit posts, but they must pay first. Reddit previously said it would not restrict “good faith actors” like the Internet Archive, but it now believes the archive is helping some artificial intelligence companies avoid licensing fees. Reddit's sudden reversal highlights how data licensing has become a major source of revenue in the age of artificial intelligence.
The Internet Archive is a nonprofit organization devoted to building a vast digital library of websites and other online content. To date, it has archived billions of web pages, along with millions of books, videos and software programs. Its signature tool, the Wayback Machine, allows users to save snapshots of web pages and revisit them later to see exactly how they looked on a specific date.
Reddit says it has evidence that some artificial intelligence companies are using the Wayback Machine to get around its policies and scrape user content without permission.
A Reddit spokesperson told Gizmodo in an emailed statement: “Until they're able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content), we're limiting some of their access to Reddit data to protect redditors.”
Reddit said the Wayback Machine will no longer be able to crawl post pages, comments or profiles. Instead, it will only be able to index Reddit's home page. The restrictions begin rolling out today, and Reddit says it notified the Internet Archive ahead of time.
The Internet Archive did not immediately respond to a request for comment from Gizmodo.
Reddit has tightened control of its data in recent years. Although the company is open to licensing its data, it has cracked down on companies that have not paid. It has already struck deals worth millions of dollars with Google and OpenAI. Under the Google deal, Reddit partnered with Google to provide both search indexing and artificial intelligence training data, then began blocking other search engines from crawling its site, which kept recent Reddit posts out of their search results.
In June, Reddit filed a lawsuit against the artificial intelligence company Anthropic, accusing it of unauthorized scraping.