Inference-time scaling is one of the big topics for AI in 2025, and AI labs are attacking it from different angles. In its latest research paper, Google DeepMind introduces "Mind Evolution," a technique that improves the responses of large language models (LLMs) on planning and reasoning tasks.
Inference-time techniques attempt to improve LLM performance by allowing the model to "think" more while generating its answer. In practice, this means that instead of producing its answer in a single pass, the model is allowed to create several answers, review and correct them, and explore different ways to solve the problem.
Evolving LLM responses
Mind Evolution relies on two key components: search and genetic algorithms. Search algorithms are a common element in many inference-time scaling techniques; they allow LLMs to find the best reasoning path to the optimal solution. Genetic algorithms are inspired by natural selection: they create and evolve a population of candidate solutions to optimize a goal, often referred to as the "fitness function."

Mind Evolution begins by creating a population of candidate solutions expressed in natural language. The solutions are generated by an LLM that is given a description of the problem along with useful information and instructions. The LLM then evaluates each candidate and refines it if it does not meet the solution criteria.
The algorithm then selects parents for the next generation of solutions by sampling from the existing population, with higher-quality solutions more likely to be chosen. It generates new solutions through crossover (picking parent pairs and combining their elements to create a new solution) and mutation (making random changes to the newly generated solutions). It reuses the evaluation step to refine the new solutions.
The cycle of evaluation, selection, and recombination continues until the algorithm reaches the optimal solution or exhausts a predetermined number of iterations.
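The evaluate-select-recombine cycle described above can be sketched as a generic genetic algorithm. Below is a minimal, self-contained Python sketch in which a toy string-matching fitness function stands in for the LLM-based generation and evaluation; all names and the toy task are illustrative, not from the paper.

```python
import random

def evolve(fitness, population, crossover, mutate, generations=200):
    """Evaluate, select, recombine, and mutate until an optimal
    solution is found or the generation budget is exhausted."""
    best = max(population, key=fitness)
    for _ in range(generations):
        if fitness(best) == 1.0:          # optimal solution reached
            break
        # Fitness-proportional selection of parent pairs.
        weights = [fitness(s) + 1e-6 for s in population]
        parents = random.choices(population, weights=weights,
                                 k=2 * len(population))
        population = [mutate(crossover(a, b))
                      for a, b in zip(parents[::2], parents[1::2])]
        best = max(population + [best], key=fitness)
    return best

# Toy task: evolve the string "plan" (a stand-in for an LLM-scored plan).
TARGET = "plan"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(s):
    return sum(a == b for a, b in zip(s, TARGET)) / len(TARGET)

def crossover(a, b):
    cut = random.randrange(len(TARGET))   # splice two parents at a random point
    return a[:cut] + b[cut:]

def mutate(s, rate=0.1):
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

random.seed(0)
pop = ["".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
       for _ in range(20)]
best = evolve(fitness, pop, crossover, mutate)
print(best)
```

In Mind Evolution itself, generation, crossover, and mutation are all performed by prompting the LLM rather than by string operations; the skeleton of the loop is the same.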

One important component of Mind Evolution is the evaluation function. The evaluators used in many inference-time scaling techniques require formalizing the problem, translating it from natural language into a structured symbolic representation that a solver program can process. Formalizing a problem can demand significant domain expertise and a deep understanding of the problem to identify all the essential elements that must be represented symbolically and how they relate to each other, which limits applicability.
In Mind Evolution, the fitness function is designed to work with natural language planning tasks whose solutions are expressed in natural language. This lets the system avoid formalizing problems, as long as a programmatic solution evaluator is available. The evaluator also provides textual feedback in addition to a numeric score, allowing the LLM to understand specific problems and make targeted refinements.
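To illustrate the idea, a programmatic evaluator for a toy trip plan might return both a numeric score and textual feedback that the LLM can use to revise its answer. The plan format and constraints below are hypothetical, not the benchmark's actual schema.

```python
def evaluate_plan(plan, budget, required_days):
    """Score a toy trip plan and explain any violated constraints.
    `plan` is a list of (city, days, cost) tuples -- an assumed format."""
    score, feedback = 1.0, []
    total_cost = sum(cost for _, _, cost in plan)
    total_days = sum(days for _, days, _ in plan)
    if total_cost > budget:
        score -= 0.5
        feedback.append(f"Over budget: plan costs {total_cost}, "
                        f"limit is {budget}.")
    if total_days != required_days:
        score -= 0.5
        feedback.append(f"Itinerary covers {total_days} days, "
                        f"but {required_days} were requested.")
    return max(score, 0.0), " ".join(feedback) or "All constraints satisfied."

score, note = evaluate_plan([("Paris", 3, 900), ("Rome", 2, 400)],
                            budget=1000, required_days=5)
```

The textual part of the return value is what distinguishes this style of evaluator: instead of only learning that a candidate scored 0.5, the LLM is told which constraint failed and by how much.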
"We focus on evolving solutions in natural language spaces rather than formal spaces," the researchers write. "This eliminates the requirement to formalize tasks, which requires significant effort and specialized knowledge for each task instance."
Mind Evolution also uses an "island" approach to make sure it explores a diverse set of solutions. At each stage, the algorithm maintains separate groups of solutions that evolve independently. It then "migrates" the best solutions from one group to another so they can be combined into new solutions.
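A minimal sketch of the migration step, assuming candidates are stored as (solution, score) pairs and the islands pass their best candidates around a ring; both assumptions are illustrative conventions, not details specified in the article.

```python
def migrate(islands, k=1):
    """Copy each island's top-k candidates into the next island in the
    ring, replacing that island's worst candidates."""
    # Snapshot each island's best candidates before any island changes.
    tops = [sorted(isle, key=lambda s: s[1], reverse=True)[:k]
            for isle in islands]
    for i, isle in enumerate(islands):
        incoming = tops[(i - 1) % len(islands)]  # best of the previous island
        isle.sort(key=lambda s: s[1])            # worst candidates first
        isle[:k] = list(incoming)                # overwrite the worst
    return islands

# Candidates are (solution, score) pairs; scores stand in for LLM evaluations.
islands = [[("A1", 0.9), ("A2", 0.2)],
           [("B1", 0.5), ("B2", 0.1)]]
migrate(islands, k=1)
```

After migration, strong candidates from one island become parents in another, which is what lets separately evolved solution families recombine.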
Mind Evolution on planning tasks
The researchers tested Mind Evolution against baselines such as 1-pass, where the model generates only one answer; Best-of-N, where the model generates multiple answers and selects the best one; and Sequential Revisions+, a revision technique where 10 candidate solutions are proposed independently and then revised separately for 80 turns. Sequential Revisions+ is the closest baseline to Mind Evolution, although it lacks the genetic-algorithm component that combines the best parts of the discovered solutions. For reference, the researchers also included an additional 1-pass baseline that uses OpenAI o1-preview.

The researchers ran most of the tests on the fast and affordable Gemini 1.5 Flash. They also explored a two-stage approach, where the Gemini 1.5 Pro model is used when the Flash model cannot solve the problem. This two-stage approach is more cost-effective than using the Pro model on every problem instance.
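The two-stage fallback amounts to a simple escalation policy. Here is a sketch with stub functions standing in for the Gemini 1.5 Flash and Pro model calls; the function names and the toy "solver" logic are placeholders.

```python
def solve_two_stage(problem, solve_flash, solve_pro, is_solved):
    """Try the cheap model first; escalate to the stronger (pricier)
    model only when the cheap one fails the evaluator's check."""
    answer = solve_flash(problem)
    if is_solved(problem, answer):
        return answer, "flash"
    return solve_pro(problem), "pro"

# Stubs for illustration: "Flash" only handles short problems, "Pro"
# handles everything. Real calls would hit the respective model APIs.
flash = lambda p: p.upper() if len(p) < 5 else None
pro = lambda p: p.upper()
ok = lambda p, a: a is not None

answer, model = solve_two_stage("trip plan", flash, pro, ok)
```

Because most instances are resolved by the cheap model, the average cost per solved problem stays close to Flash pricing while the hard tail still gets Pro-level quality.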
The researchers tested Mind Evolution on several natural language planning benchmarks for tasks such as trip and meeting planning. Previous research shows that LLMs cannot perform well on these tasks without the help of formal solvers.
For example, Gemini 1.5 Flash and o1-preview achieve success rates of only 5.6% and 11.7%, respectively, on TravelPlanner, a benchmark that simulates organizing a trip plan based on user preferences and constraints expressed in natural language. Even with Best-of-N over 800 independently generated responses, Gemini 1.5 Flash reached only 55.6% success on TravelPlanner.

In all of their tests, Mind Evolution outperformed the baselines by a wide margin, especially as task difficulty increased.
For example, Mind Evolution reached a 95% success rate on TravelPlanner. And on the Trip Planning benchmark, which involves creating an itinerary of cities to visit and the number of days to spend in each, Mind Evolution reached 94.1% on test instances while other methods topped out at a 77% success rate. Interestingly, the gap between Mind Evolution and the other techniques widens as the number of cities grows, indicating its ability to handle more complex planning tasks. With the two-stage process, Mind Evolution achieved near-perfect success rates across all benchmarks.
Mind Evolution also proved to be a cost-effective approach to solving natural language planning problems, using a fraction of the tokens consumed by Sequential-Revision+, the only other technique that comes close to its performance.
"Overall, these results show a clear advantage of an evolutionary strategy that combines broad search, through stochastic exploration, with deep search that leverages an LLM to refine solutions," the researchers wrote.