Researchers have found that retraining just small portions of AI models can reduce costs and prevent forgetting




Companies often find that when they fine-tune a large language model (LLM) to make it fit-for-purpose and grounded in their own data, the model loses some of its capabilities. After fine-tuning, some models “forget” how to perform tasks they had already learned.

Research from the University of Illinois Urbana-Champaign suggests a new way to retrain models that avoids “catastrophic forgetting,” where a model loses some of its prior knowledge. The paper focuses on two vision-language models that generate responses from images: LLaVA and Qwen 2.5-VL.

The approach encourages organizations to retrain only narrow portions of the LLM rather than the entire model, avoiding a significant increase in compute costs. The team argues that catastrophic forgetting is not true memory loss, but a side effect of bias in the model’s output distribution.

“Training a new LMM can cost millions of dollars, weeks of time, and emit hundreds of tons of carbon dioxide, so finding ways to update existing models more efficiently and effectively is a pressing concern,” the team wrote in their paper. “Guided by this finding, we explore tuning recipes that maintain learning while reducing output shift.”

The researchers focused on the multilayer perceptron (MLP), the feed-forward component inside each layer of the model.
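
For context, the sketch below shows a simplified gated MLP block of the kind used in LLaMA- and Qwen-style transformer layers. The gate_proj, up_proj, and down_proj names are common open-source conventions, not the paper’s own code, but the distinction between the up/gate projections and the down projection matters for the tuning recipe described later.

```python
# Simplified gated MLP block (illustrative; names follow common
# LLaMA/Qwen-style checkpoints, not the paper's code).
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expand with the gate/up projections, apply gating, then project
        # back down into the residual stream.
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))
```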

Catastrophic forgetting

The researchers first wanted to investigate the presence and cause of catastrophic forgetting in the models.

To do this, they created a set of target tasks for the models to complete. The models were then fine-tuned on those tasks and evaluated to determine whether they showed significant forgetting. But as the process continued, the researchers found that the models were regaining some of their abilities.

“We also observed a surprising result, which is that model performance would decline significantly on the benchmarks after training on the counting task, and would mostly recover after training on PathVQA, another specialized task that is not well represented in the benchmarks,” they said. “Meanwhile, while conducting the forgetting experiments, we also tried tuning the self-attention projection layers (SA Proj) or the MLP layers separately, motivated by the finding that tuning only the LLM was generally better than tuning the full model. This led to another very surprising result: tuning only the self-attention projection layers led to very good learning of the target tasks with no drop in performance on the held-out tasks, even after training all five target tasks sequentially.”
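
In practice, tuning only the self-attention projection layers amounts to freezing every other parameter before training. Below is a minimal sketch of that selection, assuming Hugging Face-style parameter names (q_proj, k_proj, v_proj, o_proj); it is not the authors’ released code, and the names may differ between models.

```python
# Hedged sketch: freeze everything except the self-attention projections.
import torch

ATTN_PROJ_KEYS = ("q_proj", "k_proj", "v_proj", "o_proj")  # assumed names

def unfreeze_attention_projections(model: torch.nn.Module) -> list[str]:
    """Freeze the whole model except the self-attention projection matrices."""
    trainable = []
    for name, param in model.named_parameters():
        if any(key in name for key in ATTN_PROJ_KEYS):
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False
    return trainable

# Only the unfrozen parameters go to the optimizer, so gradient updates
# touch a narrow slice of the network:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-5
# )
```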

The researchers said they believe that “what looks like forgetting or interference after fine-tuning on a narrow target task is actually a bias in the output distribution due to a shift in the task distribution.”

Narrow retraining

This discovery turned out to be the key to the experiment. The researchers note that tuning the MLP increases the likelihood of the model outputting numeric tokens, and that this is strongly correlated with the drop in accuracy on held-out tasks. In other words, a model “forgetting” some of its knowledge is a temporary output bias, not a long-term loss.

“To avoid biasing the output distribution, we tune the up/gate MLP projections while keeping the down projection frozen, and find that this achieves similar learning to full MLP tuning with little forgetting,” the researchers said.

This allows for a more straightforward and repeatable way to fine-tune the model.
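
As a rough illustration of that recipe, the snippet below unfreezes only the MLP up/gate projections and leaves the down projection (along with everything else) frozen. The parameter names are assumptions based on common LLaMA/Qwen-style checkpoints, not the paper’s exact identifiers.

```python
# Hedged sketch: tune only the MLP up/gate projections; down_proj stays frozen.
import torch

def unfreeze_mlp_up_gate(model: torch.nn.Module) -> None:
    for name, param in model.named_parameters():
        keep_trainable = ("up_proj" in name) or ("gate_proj" in name)
        # down_proj and all other parameters remain frozen.
        param.requires_grad = keep_trainable
```

Per the researchers’ framing, freezing the down projection is what prevents the output distribution from being biased, while the up/gate projections still adapt to the new task.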

By focusing on a narrow slice of the model rather than retraining it wholesale, organizations can cut compute costs. It also gives them better control over skew in the model’s outputs.

The research, however, covers only two models, both of which handle vision and language. The researchers noted that, due to limited resources, they were unable to run the experiments on other models.

Still, they believe their findings can extend to other LMMs, particularly across different modalities.


