It turns out you can train AI models without copyrighted material


Artificial intelligence companies claim their tools could not exist without training on copyrighted material. It turns out they could; it is just really difficult. To prove it, AI researchers trained a new model that is less powerful but more ethical. That is because the LLM’s dataset uses only public domain and openly licensed material.

The paper (via The Washington Post) was a collaboration between 14 different institutions. The authors come from universities such as the Massachusetts Institute of Technology, Carnegie Mellon and the University of Toronto. Nonprofit organizations such as the Vector Institute and the Allen Institute for AI also contributed.

The group built an 8TB dataset. Among the data is a collection of 130,000 books in the Library of Congress. After assembling the material, they trained a seven-billion-parameter large language model (LLM) on that data. The result? It performed about as well as Meta’s similarly sized Llama 2-7B from 2023. The team did not publish benchmarks comparing its results with today’s top models.

Performance comparable to a two-year-old model was not the only downside. The process of putting it all together was also a grind. Much of the data could not be read by machines, so humans had to sift through it. “We use automated tools, but all of our stuff was manually annotated at the end of the day and checked by people,” co-author Stella Biderman told The Washington Post. “And that’s just really hard.” Working out the legal details also made the process difficult. The team had to determine which license applied to each website it scraped.

So, what do you do with a less powerful LLM that is much harder to train? If nothing else, it can serve as a counterpoint.

In 2024, OpenAI told a British parliamentary committee that such a model essentially could not exist. The company claimed that it would be “impossible to train today’s leading AI models without using copyrighted materials.” Last year, an Anthropic expert witness added, “LLMs would likely not exist if AI firms were required to license the works in their training datasets.”

Of course, this study will not change the trajectory of AI companies. After all, more work to create less powerful tools does not serve their interests. But at least it undercuts one of the industry’s common arguments. Do not be surprised if you hear about this study again in legal cases and regulatory arguments.


