MiniMax-M1 is a new open source model with a 1 million token context window



The Chinese AI startup MiniMax, perhaps best known in the West for its realistic Hailuo AI video model, has released its latest large language model, MiniMax-M1. In great news for enterprises and developers, it is fully open source under the Apache 2.0 license, meaning companies can take it, use it in commercial applications, and modify it as they wish without restriction or payment.

M1 is an open-weight model that sets new standards in long-context reasoning, agentic tool use, and efficient compute performance. It is available today on the AI code-sharing community Hugging Face and on Microsoft-owned GitHub. The release is the first of what the company has dubbed "MiniMaxWeek" on its social account on X, with more product announcements expected.

MiniMax-M1 distinguishes itself with a context window of 1 million input tokens and up to 80,000 tokens of output, positioning it as one of the most expansive models available for long-context reasoning tasks.

The "context window" refers to the maximum number of tokens a large language model (LLM) can process at once, including both input and output. Tokens are the basic units of text, which may include whole words, parts of words, punctuation marks, or code symbols. These tokens are converted into numerical vectors that the model uses to represent meaning through its parameters (weights and biases). They are, in essence, the LLM's native language.

For comparison, OpenAI's GPT-4o has a context window of only 128,000 tokens, enough to exchange about a novel's worth of information between user and model in a single back-and-forth interaction. At 1 million tokens, MiniMax-M1 could exchange a small collection of novels or a book series' worth of information. Google's Gemini 2.5 Pro also offers a 1 million token upper limit, with a 2 million token window reportedly in the works.
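To make the budget arithmetic concrete, here is a minimal sketch of how a context window bounds input plus output tokens together. The ~1.3 tokens-per-word ratio is a common rule of thumb for English text, not a property of MiniMax-M1's actual tokenizer:

```python
# The ~1.3 tokens-per-word ratio is a rough heuristic for English text,
# not MiniMax-M1's real tokenizer behavior.
TOKENS_PER_WORD = 1.3

def estimate_tokens(word_count: int) -> int:
    """Approximate token count from a word count."""
    return int(word_count * TOKENS_PER_WORD)

def fits_in_window(input_tokens: int, output_tokens: int,
                   window: int = 1_000_000) -> bool:
    """A context window bounds input and output tokens combined."""
    return input_tokens + output_tokens <= window

# A typical 90,000-word novel is roughly 117,000 tokens: close to
# GPT-4o's 128,000-token limit once you add a long reply, but far
# under M1's 1 million token window even with an 80,000-token output.
novel = estimate_tokens(90_000)
print(fits_in_window(novel, 80_000, window=128_000))  # cramped at 128k
print(fits_in_window(novel, 80_000))                  # roomy at 1M
```

Under this rough estimate, a full novel plus M1's maximum 80,000-token output would overflow a 128,000-token window but occupy less than a fifth of M1's.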

But M1 has another trick up its sleeve: it was trained using reinforcement learning with an innovative, resourceful, highly efficient technique. The model uses a hybrid Mixture-of-Experts (MoE) architecture with a lightning attention mechanism designed to reduce inference costs.

According to the technical report, MiniMax-M1 consumes only 25% of the floating-point operations (FLOPs) required by DeepSeek R1 at a generation length of 100,000 tokens.

Architecture and variants

The model comes in two variants, MiniMax-M1-40k and MiniMax-M1-80k, referring to their "thinking budgets," or output lengths.

The architecture is built on the company's earlier MiniMax-Text-01 foundation and comprises 456 billion parameters, with 45.9 billion activated per token.

A standout feature of the release is the model's training cost. MiniMax reports that M1 was trained using large-scale reinforcement learning (RL) at an efficiency rarely seen in the field, at a total cost of $534,700.

This efficiency is credited to a custom RL algorithm called CISPO, which clips importance-sampling weights rather than token updates, and to the hybrid attention design, which helps streamline scaling.
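Based on that one-line description, the distinction can be sketched as follows: CISPO bounds the importance-sampling ratio itself, so every token still contributes a (weight-capped) gradient, whereas PPO-style clipping can drop a token's update entirely when the ratio drifts out of range. The clipping bounds below are illustrative, not MiniMax's published values:

```python
import math

def is_ratio(logp_new: float, logp_old: float) -> float:
    """Importance-sampling ratio r = pi_new(token) / pi_old(token)."""
    return math.exp(logp_new - logp_old)

def cispo_weight(logp_new: float, logp_old: float,
                 r_max: float = 4.0) -> float:
    """CISPO-style clipping: cap the IS weight itself. The token's
    update is always kept; only its weight is bounded. r_max here
    is a hypothetical bound for illustration."""
    return min(is_ratio(logp_new, logp_old), r_max)

def ppo_keeps_update(logp_new: float, logp_old: float,
                     eps: float = 0.2) -> bool:
    """Contrast: PPO-style clipping effectively zeroes a token's
    update once the ratio leaves [1 - eps, 1 + eps]."""
    r = is_ratio(logp_new, logp_old)
    return 1.0 - eps <= r <= 1.0 + eps

# A token whose probability rose sharply under the new policy:
# PPO would discard its update, CISPO keeps it at a capped weight.
print(ppo_keeps_update(2.0, 0.0))   # False: ratio ~7.4 is out of range
print(cispo_weight(2.0, 0.0))       # 4.0: kept, but capped at r_max
```

The intuition from the report is that retaining (rather than discarding) updates from such high-ratio tokens is part of what makes the RL phase sample-efficient.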

That is an astonishingly "cheap" figure for a frontier LLM: DeepSeek trained its hit R1 reasoning model at a reported cost of $5-6 million, while the training cost of OpenAI's GPT-4, a model now more than two years old, is said to have exceeded $100 million. This cost comes both from the price of graphics processing units (GPUs), the massively parallel computing hardware made mainly by companies such as Nvidia, which can cost $20,000-$30,000 or more per unit, and from the energy required to run those chips continuously in large data centers.

Benchmark performance

MiniMax-M1 was evaluated across a series of established benchmarks that test advanced reasoning, software engineering, and tool-use capabilities.

On AIME 2024, a math competition benchmark, the M1-80k model scores 86.0%. It also delivers strong performance on coding and long-context tasks, achieving:

  • 65.0% on LiveCodeBench
  • 56.0% on SWE-bench Verified
  • 62.8% on TAU-bench
  • 73.4% on OpenAI MRCR (4-needle version)

These results place MiniMax-M1 ahead of other open-weight competitors such as DeepSeek-R1 and Qwen3-235B-A22B on several complex tasks.

While closed-weight models such as OpenAI's o3 and Gemini 2.5 Pro still top some benchmarks, MiniMax-M1 narrows the performance gap considerably while remaining freely accessible under the Apache 2.0 license.

For deployment, MiniMax recommends vLLM as the serving backend, citing its optimization for large-model workloads, memory efficiency, and batched request handling. The company also provides deployment options using the Transformers library.
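A vLLM-based deployment would typically look something like the following launch sketch. The model ID and flag values here are assumptions for illustration; consult MiniMax's model card on Hugging Face for the exact identifier and recommended serving configuration:

```shell
# Hypothetical vLLM launch sketch -- verify model ID and flags
# against MiniMax's published deployment guide before use.
vllm serve MiniMaxAI/MiniMax-M1-80k \
    --tensor-parallel-size 8 \
    --max-model-len 1000000
```

This exposes an OpenAI-compatible HTTP endpoint, which is what makes integration with existing tooling straightforward.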

MiniMax-M1 includes structured function-calling capabilities and is packaged with a chatbot API that offers online search, video and image generation, speech synthesis, and voice-cloning tools. These features aim to support broader agentic behavior in real-world applications.
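Structured function calling in this ecosystem generally follows the OpenAI-style tools schema. Below is a minimal sketch of the request payload an application might assemble; the model ID and the `web_search` tool definition are illustrative assumptions, not taken from MiniMax's documentation:

```python
import json

def build_tool_call_request(user_message: str) -> str:
    """Assemble an OpenAI-style chat request declaring one tool.
    The model ID and web_search tool here are hypothetical."""
    payload = {
        "model": "MiniMax-M1",  # illustrative model ID
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "web_search",
                "description": "Search the web for fresh information.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }],
    }
    return json.dumps(payload)

request_body = build_tool_call_request("What did MiniMax release this week?")
```

The model replies either with plain text or with a structured tool call naming the function and its arguments, which the calling application then executes.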

Implications for technical decision-makers and enterprise buyers

MiniMax-M1's open access, long-context capabilities, and compute efficiency address several recurring challenges for technical professionals responsible for managing AI systems at scale.

For engineering leads responsible for the full LLM lifecycle, such as optimizing model performance and deploying under tight timelines, MiniMax-M1 offers a lower operational cost profile while supporting advanced reasoning tasks. Its long context window could significantly reduce preprocessing effort for enterprise documents or log data spanning tens or hundreds of thousands of tokens.

For those managing AI orchestration pipelines, the ability to fine-tune and deploy MiniMax-M1 with established tools such as vLLM or Transformers supports easier integration into existing infrastructure. The hybrid-attention architecture may help simplify scaling strategies, and the model's competitive performance on multi-step reasoning and software-engineering benchmarks offers a high-capability base for internal copilots or agent-based systems.

From a data-platform perspective, teams responsible for maintaining efficient, scalable infrastructure can benefit from M1's support for structured function calling and its compatibility with automated pipelines. Its open-source nature lets teams tailor performance to their stack without vendor lock-in.

Security teams may also find value in evaluating M1 for secure, on-premises deployment of a highly capable model that does not transmit sensitive data to third-party endpoints.

Taken together, MiniMax-M1 presents a flexible option for organizations looking to experiment with or scale up advanced AI capabilities while managing costs, staying within operational limits, and avoiding proprietary constraints.

The release signals MiniMax's continued focus on practical, scalable AI models. By combining open access, advanced architecture, and compute efficiency, MiniMax-M1 may serve as a foundational model for developers building next-generation applications that demand both deep reasoning and long-input understanding.

We will be tracking MiniMax's other releases throughout the week. Stay tuned!



