Microsoft explores a method of credit for contributing to artificial intelligence training data

Photo of author

By [email protected]


Microsoft launches a research project to estimate the impact of specific training examples on the text and pictures and other types of media created by artificial intelligence models.

this Each list of work It dates back to the recently recycled December on LinkedIn.

According to the inclusion, which is looking for a trainee in the research, the project will try to prove that models can be trained in a way that the impact of specific data can be estimated – for example, on their outputs “efficiently and agreeing to that.”

“The current structure of the nerve network is not transparent in terms of providing sources for its generations, and there are (…) good reasons to change this,” says the list. ((One of them,) incentives, confession, and may push people who contribute to some valuable data for unexpected types of models that we will want in the future, assuming that the future will maintain us mainly. “

The text, which works with AI, symbol, photos, videos and song generators is located in a center A number of IP lawsuits Against artificial intelligence companies. Often, these companies train their models on huge quantities of data from public websites, some of which protect copyright. Many companies argue with that The doctrine of just use It protects their draining and training practices. But designs – from artists to programmers to authors – do not agree largely.

Microsoft itself faces at least two legal challenges of copyright.

New York Times A lawsuit against the technology giant Its collaborators, Openai, in December, accused the two companies of violating Times’s copyright by publishing trained models on millions of articles. Many program developers She also filed a lawsuit against Microsoft, claiming that the assistant Github Copilot Ai Coding was illegally trained using her protected business.

The new research effort of Microsoft, which the list describes as “the origin of training time”, ” It is said He has the participation of Jaron Lanier, A prominent technician and the multidisciplinary world In Microsoft Research. In April 2023 The opening in New YorkerLanier wrote about the concept of “data dignity”, which means that it means communicating “digital things” to “people who want to be known as”.

“The data design approach will follow the most unique contributors and influence when it provides a large valuable model,” Lanner wrote. “For example, if I asked a model about” a moving movie for my children in a world of cats in talking about oil on an adventure, some of the main painters are in oil, cat photographs, vocal actors, and the book-or their areas-may be calculated until they are paid. “

There, not for nothing, already many companies trying to do so. The developer of the artificial intelligence model, which recently raised $ 40 million in investment capital, claims to compensate for “compensation” of data owners according to the “total effect”. Adobe and Shutterstock also give regular payments to data groups, although the exact payments are inclined to be transparent.

A few large laboratories have created individual payment programs for shareholders outside the licensing agreements with publishers, platforms and data mediators. Instead, they have provided means of copyright “cancellation of subscription” in training. But some of these subscription cancels are exhausting, and only applies to future models-not pre-training.

Of course, the Microsoft project may reach slightly more than evidence of the concept. There is a precedent for that. Again in maybeOpenai said that the development of a similar technology that would allow creators to determine how to include their work in training data – or excluded from – training data. But about a year later, you did not see the tool after the light today, and it is often It was not seen as an internal priority.

Microsoft may also try. “Washing ethics“Here – or organizational decisions and/or trial are subjected to turmoil in artificial intelligence.

However, the company is investigating the noticeable training data in light of other situations that have been recently expressed by AI Labs. Many of the best laboratories, including Google and Openai, have published, Recommendation policy documents The Trump administration weakens the protection of copyright in terms of its connection with the development of artificial intelligence. Openai has Frankly called on the United States government To write down the fair use of typical training, which argues that he will free developers from exhausting restrictions.

Microsoft did not immediately respond to a request for comment.



https://techcrunch.com/wp-content/uploads/2025/01/GettyImages-2153485379.jpg?resize=1200,800

Source link

Leave a Comment