A new framework from Zhejiang University and Alibaba Group gives LLM agents a dynamic memory, making them more efficient and effective at complex tasks. The technique, called Memp, provides agents with a "procedural memory" that is continuously updated as they gain experience, much like how humans learn from practice.
Memp creates a lifelong learning framework where agents don't have to start from scratch on every new task. Instead, they become progressively better and more efficient as they encounter new situations in real-world environments, a key requirement for reliable enterprise automation.
The case for procedural memory in AI agents
LLM agents hold promise for automating complex, multi-step business processes. In practice, though, long-horizon tasks can be fragile. The researchers note that unpredictable events such as network glitches, user-interface changes, or shifting data schemas can derail an entire process. For current agents, this often means starting over every time, which can be slow and costly.
Meanwhile, many complex tasks, despite surface differences, share deep structural commonalities. Instead of relearning these patterns every time, an agent should be able to extract and reuse its experience from past successes and failures, the researchers argue. This requires a specialized "procedural memory," which in humans is the long-term memory responsible for skills like typing or riding a bike that become automatic with practice.

Current agent systems often lack this capability. Their procedural knowledge is typically hand-crafted by developers, stored in rigid prompt templates, or embedded in the model's parameters, which are expensive and slow to update. Even existing memory-augmented frameworks provide only coarse abstractions and do not adequately address how skills should be built, indexed, corrected, and eventually pruned over an agent's lifecycle.
Consequently, the researchers note in their paper, "there is no principled way to quantify how efficiently an agent evolves its procedural repertoire or to ensure that new experiences improve rather than erode performance."
How Memp works
Memp is a task-agnostic framework that treats procedural memory as a core component to be optimized. It consists of three key stages working in a continuous loop: building, retrieving, and updating memory.
Memories are built from an agent's past experiences, or "trajectories." The researchers explored storing these memories in two formats: verbatim, step-by-step actions, or distilling those actions into higher-level, script-like abstractions. For retrieval, the agent searches its memory for the most relevant past experience when given a new task. The team experimented with different methods, such as vector search, to match the new task's description against past queries, or extracting keywords to find the best fit.
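To make the build-and-retrieve stages concrete, here is a minimal sketch in Python. The ProceduralMemory class and its method names are illustrative assumptions, not the paper's actual API, and simple keyword overlap stands in for the vector search the team experimented with:

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    task: str              # description of the task the agent solved
    steps: list[str]       # verbatim step-by-step actions from the trajectory
    abstraction: str = ""  # optional distilled, high-level summary

class ProceduralMemory:
    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def build(self, task: str, steps: list[str], abstraction: str = "") -> None:
        """Store a finished trajectory, optionally with a distilled script."""
        self.entries.append(MemoryEntry(task, steps, abstraction))

    def retrieve(self, new_task: str, k: int = 1) -> list[MemoryEntry]:
        """Return the k stored experiences most similar to the new task."""
        query = set(new_task.lower().split())

        def similarity(entry: MemoryEntry) -> float:
            words = set(entry.task.lower().split())
            return len(query & words) / max(len(query | words), 1)

        return sorted(self.entries, key=similarity, reverse=True)[:k]

# A past trip-planning trajectory guides a structurally similar new task.
memory = ProceduralMemory()
memory.build(
    task="plan a two-city trip within a 1500 dollar budget",
    steps=["search flights", "compare hotels", "reserve within budget"],
    abstraction="plan multi-leg travel under a spending cap",
)
print(memory.retrieve("plan a three-city trip within a 2000 dollar budget"))
```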
The most critical component is the update mechanism. Memp introduces several strategies to ensure the agent's memory evolves. As the agent completes more tasks, its memory can be updated by simply adding the new experience, filtering for only successful outcomes, or, most effectively, reflecting on failures to correct and revise the original memory.
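These three strategies map naturally onto a small dispatch function. The sketch below reuses the illustrative ProceduralMemory class from above; the reflect() helper is a hypothetical stand-in for what would be an LLM call in practice:

```python
def update_memory(memory: ProceduralMemory, task: str, steps: list[str],
                  succeeded: bool, strategy: str = "reflect") -> None:
    if strategy == "append":
        # Simplest option: add every new experience, successful or not.
        memory.build(task, steps)
    elif strategy == "filter":
        # Keep only trajectories that actually worked.
        if succeeded:
            memory.build(task, steps)
    elif strategy == "reflect":
        # Most effective per the researchers: on failure, revise the stored
        # memory that led the agent astray rather than discarding it.
        if succeeded:
            memory.build(task, steps)
        else:
            prior = memory.retrieve(task, k=1)
            if prior:
                prior[0].steps = reflect(prior[0].steps, failed_steps=steps)

def reflect(old_steps: list[str], failed_steps: list[str]) -> list[str]:
    """Placeholder for an LLM-backed revision; here it just flags the pitfall."""
    last = failed_steps[-1] if failed_steps else "unknown step"
    return old_steps + [f"avoid repeating: {last}"]
```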

This focus on dynamic, evolving memory places Memp within a growing field of research aimed at making AI agents more reliable for long-running tasks. The work parallels other efforts, such as Mem0, which consolidates key information from long conversations into structured facts and knowledge graphs to ensure consistency. Similarly, A-MEM enables agents to autonomously create and link "memory notes" from their interactions, forming a complex knowledge structure over time.
However, co-author Runnan Fang highlighted a crucial distinction between Memp and other frameworks.
"Mem0 and A-MEM are excellent works... but they focus on remembering salient content within a single trajectory or conversation," Fang commented, in essence helping an agent remember "what" happened. "In contrast, Memp targets cross-trajectory procedural memory." It focuses on the "how-to" knowledge that can be generalized across similar tasks, preventing the agent from re-exploring from scratch each time.
"By distilling past successful trajectories into reusable procedural priors, Memp raises success rates and shortens steps," Fang added. "Crucially, we also introduce an update mechanism so that this procedural memory keeps improving. After all, practice makes perfect for agents as well."
Overcoming the "cold start" problem
While the concept of learning from past trajectories is powerful, it raises a practical question: how does an agent build its initial memory when there are no perfect examples to learn from? The researchers address this "cold start" problem with a pragmatic approach.
Fang explained that developers can first define a robust evaluation metric instead of requiring a perfect "gold-standard" trajectory. This metric, which can be rule-based or even another LLM, scores the quality of the agent's performance. "Once that metric is in place, we let state-of-the-art models explore within the agent workflow and retain the trajectories that achieve the highest scores," Fang said. This process quickly bootstraps an initial set of useful memories, allowing a new agent to get up to speed without extensive manual programming.
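This bootstrapping recipe is straightforward to sketch. In the hypothetical snippet below, run_agent() represents a rollout with a strong model and score() the rule-based or LLM-based metric Fang describes; both names and the threshold are assumptions for illustration:

```python
from typing import Callable

def bootstrap_memory(
    memory: ProceduralMemory,                   # from the earlier sketch
    tasks: list[str],
    run_agent: Callable[[str], list[str]],      # strong-model exploration
    score: Callable[[str, list[str]], float],   # rules or an LLM evaluator
    threshold: float = 0.8,
    attempts: int = 3,
) -> None:
    """Seed an empty memory by keeping only the highest-scoring explorations."""
    for task in tasks:
        best_steps: list[str] = []
        best_score = 0.0
        for _ in range(attempts):
            steps = run_agent(task)   # let a strong model explore the workflow
            s = score(task, steps)    # grade the resulting trajectory
            if s > best_score:
                best_steps, best_score = steps, s
        if best_steps and best_score >= threshold:
            memory.build(task, best_steps)   # retain only high-quality seeds
```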
Memp in action
To test the framework, the team implemented Memp on top of powerful LLMs, including GPT-4o, Claude 3.5 Sonnet, and Qwen2.5, evaluating them on complex tasks such as household chores in the ALFWorld benchmark and information-seeking in TravelPlanner. The results showed that building and retrieving procedural memory allowed an agent to distill and reuse its prior experience effectively.
During testing, agents equipped with Memp not only achieved higher success rates but also became far more efficient. They eliminated fruitless trial-and-error exploration, resulting in a substantial reduction in both the number of steps and the token consumption required to complete a task.

One of the most significant findings for enterprise applications is that procedural memory is transferable. In one experiment, procedural memory generated by the powerful GPT-4o was given to the much smaller Qwen2.5-14B model. The smaller model saw a significant boost in performance, improving its success rate and reducing the steps needed to complete tasks.
According to Fang, this works because smaller models often handle simple, single-step actions well but falter at long-horizon planning and reasoning. The procedural memory from the stronger model effectively fills this capability gap. This suggests that knowledge can be acquired using a frontier model and then deployed on smaller, more cost-effective models without losing the benefits of that experience.
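Because procedural memory is stored as data rather than model weights, the transfer is conceptually just a hand-off of the same memory store. A minimal sketch, again reusing the illustrative ProceduralMemory class (the Agent class here is a placeholder, not the paper's code):

```python
class Agent:
    """Toy agent that plans from retrieved memories instead of exploring."""

    def __init__(self, model_name: str, memory: ProceduralMemory) -> None:
        self.model_name = model_name
        self.memory = memory

    def plan(self, task: str) -> list[str]:
        hits = self.memory.retrieve(task, k=1)
        return hits[0].steps if hits else ["explore from scratch"]

# Memory written while running a strong model...
shared = ProceduralMemory()
shared.build(
    task="file a weekly expense report",
    steps=["collect receipts", "fill in the form", "submit for approval"],
)

# ...is read, unchanged, by a much smaller model at planning time.
student = Agent("qwen2.5-14b", memory=shared)
print(student.plan("file a monthly expense report"))
```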
Toward truly autonomous agents
By equipping agents with memory-update mechanisms, the Memp framework allows them to continuously build and refine their procedural knowledge while operating in a live environment. The researchers found that this gives the agent a continual, steadily improving command of its tasks.
However, the path to full autonomy requires overcoming another hurdle: many real-world tasks, such as producing a research report, lack a simple success signal. To improve continuously, an agent needs to know whether it did a good job. Fang says the future lies in using LLMs themselves as judges.
"Today we often combine powerful models with hand-crafted rules to compute completion scores," he said. "This works, but hand-written rules are brittle and hard to generalize."
An LLM-as-judge could provide the nuanced, supervisory feedback that a self-correcting agent needs on complex, subjective tasks. This would make the entire learning loop more scalable and robust, marking a critical step toward building the resilient, adaptable, and truly autonomous AI agents needed for sophisticated enterprise automation.
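A sketch of what that judging loop might look like, assuming a hypothetical call_llm() helper and prompt; nothing here is from the paper itself:

```python
from typing import Callable

JUDGE_PROMPT = """You are grading an AI agent's work.
Task: {task}
Steps taken: {steps}
Reply with a single score from 0.0 (failed) to 1.0 (perfect)."""

def llm_judge(task: str, steps: list[str],
              call_llm: Callable[[str], str]) -> float:
    """Score a trajectory with an LLM instead of hand-written rules."""
    reply = call_llm(JUDGE_PROMPT.format(task=task, steps=steps))
    try:
        return max(0.0, min(1.0, float(reply.strip())))
    except ValueError:
        return 0.0  # treat unparseable replies as failures

# The score can then drive the update stage from the earlier sketch:
# succeeded = llm_judge(task, steps, call_llm) > 0.7
# update_memory(memory, task, steps, succeeded, strategy="reflect")
```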