Researchers improve AI agent’s performance on unfamiliar tasks using ‘Dungeons and Dragons’





Organizations interested in deploying AI agents must first fine-tune them, especially in workflows that often feel rote. While some organizations want agents that perform only one kind of task in one workflow, agents sometimes need to be brought into new environments in the hope that they can adapt.

Researchers from the Beijing University of Posts and Telecommunications have unveiled a new method, AgentRefine, that teaches agents to self-correct, leading to more generalized and adaptive AI agents.

Existing tuning methods limit agents to the same tasks as their training dataset, or "held-in" tasks, so they do not perform well in "held-out," or new, environments, the researchers said. By only following the rules laid out in the training data, agents trained with these frameworks have trouble "learning" from their mistakes and cannot become general agents that can be brought into new workflows.

To overcome this limitation, AgentRefine aims to create more generalized agent-training datasets that enable a model to learn from errors and adapt to new workflows. In a new paper, the researchers said the goal of AgentRefine is to "develop generalized agent-tuning data and establish the correlation between agent generalization and self-refinement." If agents can correct themselves, they will not simply repeat errors they have learned or carry those errors into other environments in which they are deployed.

"We find that agent-tuning on self-refinement data boosts the agent to explore more viable actions while meeting bad situations, thereby resulting in better generalization to new agent environments," the researchers wrote.

AI agent training inspired by D&D

Taking their cue from the tabletop role-playing game Dungeons & Dragons, the researchers created personas, scripts for the agent to follow and challenges. And yes, there is a Dungeon Master (DM).

They divided data construction for AgentRefine into three areas: script generation, trajectory generation and verification.

In script generation, the model creates a script, or guide, containing information on the environment, the tasks and the actions personas can take. (The researchers tested AgentRefine with Llama-3-8B-Instruct, Llama-3-70B-Instruct, Mistral-7B-Instruct-v0.3, GPT-4o-mini and GPT-4o.)

The model then generates agent data that contains errors, acting as both the DM and the player during the trajectory stage. It assesses the actions it can take and then checks whether they contain errors. The final stage, verification, inspects the script and trajectory, opening the door for the agents it trains to self-correct.
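Conceptually, the three stages chain together into a filtering pipeline. The sketch below is a minimal illustration of that flow under stated assumptions, not the authors' code: the stage names follow the paper, but every function, data class and stubbed output here is a hypothetical stand-in for what would, in practice, be calls to an LLM such as GPT-4o.

```python
# Hypothetical sketch of an AgentRefine-style data-construction pipeline.
# Each stage is stubbed with canned outputs; in the real method an LLM
# generates the script, plays DM and player, and verifies the result.

from dataclasses import dataclass, field


@dataclass
class Script:
    environment: str
    task: str
    allowed_actions: list


@dataclass
class Turn:
    action: str
    observation: str
    is_error: bool


@dataclass
class Trajectory:
    turns: list = field(default_factory=list)


def generate_script(seed: str) -> Script:
    """Stage 1 (script generation): draft an environment, a task and the
    actions a persona may take. Stubbed here for illustration."""
    return Script(
        environment=f"text world derived from seed '{seed}'",
        task="find the hidden key and unlock the chest",
        allowed_actions=["look", "move", "take", "unlock"],
    )


def generate_trajectory(script: Script) -> Trajectory:
    """Stage 2 (trajectory generation): the model plays both DM and player,
    deliberately keeping turns that contain errors so the tuned agent can
    later learn to recover from them. Stubbed here."""
    traj = Trajectory()
    traj.turns.append(Turn("fly", "Error: 'fly' is not an allowed action.", True))
    traj.turns.append(Turn("look", "You see a rug covering something.", False))
    traj.turns.append(Turn("take key", "You pick up the key.", False))
    traj.turns.append(Turn("unlock chest", "The chest opens. Task complete.", False))
    return traj


def verify(script: Script, traj: Trajectory) -> bool:
    """Stage 3 (verification): keep only samples in which an erroneous turn
    is eventually followed by a corrected, allowed action."""
    for i, turn in enumerate(traj.turns):
        if turn.is_error:
            followups = traj.turns[i + 1:]
            corrected = any(
                t.action.split()[0] in script.allowed_actions for t in followups
            )
            if not corrected:
                return False  # error was never corrected; discard this sample
    return True


if __name__ == "__main__":
    script = generate_script("dungeon crawl")
    trajectory = generate_trajectory(script)
    print("keep sample:", verify(script, trajectory))
```

The point of keeping the erroneous turn followed by a valid one is that the resulting training data shows the agent what recovery looks like, rather than only showing flawless runs.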

Better and more diverse mission capabilities

The researchers found that agents trained with the AgentRefine method and dataset performed better on diverse tasks and adapted to new scenarios. These agents self-correct more readily, redirecting their actions and decision-making to avoid errors, and become more robust in the process.

In particular, AgentRefine improved the performance of all the models on held-out tasks.

Enterprises must make agents more adaptable to new tasks, so that they don't just repeat what they've learned and can become better decision-makers. Orchestrating agents not only "direct traffic" for multiple agents, but also determine whether agents have completed tasks based on user requests.

OpenAI's o3 offers "program synthesis," which could improve task adaptability. Other orchestration and training frameworks, like Microsoft's Magentic-One, set out processes for supervisor agents to learn when to hand tasks off to different agents.
