Court filings show Meta employees discussed using copyrighted content to train AI

For years, Meta employees internally discussed using copyrighted works obtained through legally questionable means to train the company’s AI models, according to court documents unsealed on Thursday.

The documents were submitted in the case Kadrey v. Meta, one of many AI copyright disputes slowly winding through the U.S. court system. The defendant, Meta, claims that training models on IP-protected works is “fair use.” The plaintiffs, who include authors Sarah Silverman and Ta-Nehisi Coates, disagree.

Previous materials submitted in the suit alleged that Meta CEO Mark Zuckerberg gave Meta’s AI team approval to train on copyrighted works and that Meta halted AI training data licensing talks with book publishers. But the new filings, most of which show portions of internal work chats between Meta employees, paint the clearest picture yet of how Meta may have come to use copyrighted data to train its models, including models in the company’s Llama family.

In one chat, Meta employees, including Melanie Kambadur, a senior manager on Meta’s Llama model research team, discussed training models on works they knew might be legally fraught.

“[M]y opinion would be (in the line of ‘ask forgiveness, not for permission’): we try to acquire the books and escalate it to execs so they make the call,” wrote Xavier Martinet, a Meta research engineer, in a chat dated February 2023, according to the filings. “[T]his is why they set up this gen ai org for [sic]: so we can be less risk averse.”

Martinet floated the idea of buying e-books at retail prices to build a training set rather than cutting licensing deals with individual book publishers. After another employee pointed out that using unauthorized, copyrighted materials might be grounds for a legal challenge, Martinet doubled down, arguing that “a gazillion” startups were probably already using pirated books for training.

“I mean, worst case: we found out it is finally ok, while a gazillion start up [sic] just pirated tons of books on bittorrent,” Martinet wrote, according to the filings. “[M]y 2 cents again: trying to have deals with publishers directly takes a long time …”

In the same chat, Kambadur, who noted that Meta was in talks with document-hosting platform Scribd “and others” for licenses, cautioned that while using “publicly available data” for model training would require approvals, Meta’s lawyers were being “less conservative” than they had been in the past with such approvals.

“Yeah we definitely need to get licenses or approvals on publicly available data still,” Kambadur said, according to the filings. “[D]ifference now is we have more money, more lawyers, more bizdev help, ability to fast track/escalate for speed, and lawyers are being a bit less conservative on approvals.”

Libgen conversations

In another work chat relayed in the filings, Kambadur discusses possibly using Libgen, a “links aggregator” that provides access to copyrighted works from publishers, as an alternative to data sources that Meta might license.

Libgen has been sued a number of times, ordered to shut down, and fined tens of millions of dollars for copyright infringement. One of Kambadur’s colleagues responded with a screenshot of a Google Search result for Libgen containing the snippet “No, Libgen is not legal.”

Some decision-makers within Meta appear to have been under the impression that failing to use Libgen for model training could seriously hurt Meta’s competitiveness in the AI race, according to the filings.

In an email addressed to Meta AI VP Joelle Pineau, Sony Theakanath, director of product management at Meta, called Libgen “essential to meet SOTA numbers across all categories,” referring to matching the best, state-of-the-art (SOTA) AI models across benchmark categories.

Theakanath also outlined “mitigations” in the email intended to help reduce Meta’s legal exposure, including removing data from Libgen “clearly marked as pirated/stolen” and simply not citing the usage publicly. “We would not disclose use of Libgen datasets used to train,” Theakanath wrote.

In practice, these mitigations entailed combing through Libgen files for words like “stolen” or “pirated,” according to the filings.

In a work chat, Kambadur mentioned that Meta’s AI team also tuned models to “avoid IP risky prompts,” that is, configured the models to refuse to answer requests like “reproduce the first three pages of ‘Harry Potter and the Sorcerer’s Stone’” or “tell me which e-books you were trained on.”

The filings contain other revelations, implying that Meta may have scraped Reddit data for some type of model training, possibly by mimicking the behavior of a third-party app called Pushshift. Notably, Reddit said in April 2023 that it planned to begin charging AI companies for access to data for model training.

In one chat dated March 2024, Chaya Nayak, director of product management at Meta’s generative AI org, said that Meta’s leadership was considering “overriding” past decisions on training data, including a decision not to use Quora content or licensed books and scientific articles, to ensure the company’s models had sufficient training data.

Nayak implied that Meta’s first-party training datasets (Facebook and Instagram posts, text transcribed from videos on Meta platforms, and certain Meta for Business messages) simply weren’t enough. “[W]e need more data,” she wrote.

The plaintiffs in Kadrey v. Meta have amended their complaint several times since the case was filed in the U.S. District Court for the Northern District of California, San Francisco Division, in 2023. The latest alleges that Meta, among other offenses, cross-referenced certain pirated books with copyrighted books available for license to determine whether it made sense to pursue a licensing agreement with a publisher.

In a sign of how high Meta considers the legal stakes to be, the company has added two Supreme Court litigators from the law firm Paul Weiss to its defense team on the case.

Meta did not immediately respond to a request for comment.
