Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers.
Google calls the feature “implicit caching” and says it can deliver 75% savings on “repetitive context” passed to models via the Gemini API. It supports Google's Gemini 2.5 Pro and 2.5 Flash models.
That is likely to be welcome news for developers, as the cost of using frontier models continues to grow.
Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. For example, a cache can store answers to questions users often ask of a model, eliminating the need for the model to re-create answers to the same request.
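The idea can be illustrated with a minimal sketch of a response cache. This is a toy illustration of the general technique, not Google's implementation; `toy_model` and `make_cached` are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List


def make_cached(model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model call so identical requests are answered from a cache."""
    cache: Dict[str, str] = {}

    def cached_model(prompt: str) -> str:
        if prompt not in cache:
            # Cache miss: pay for the expensive call once and store the result.
            cache[prompt] = model(prompt)
        # Cache hit (or freshly stored value): reuse the stored answer.
        return cache[prompt]

    return cached_model


calls: List[str] = []


def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for an expensive model call; it records each invocation.
    calls.append(prompt)
    return f"answer to: {prompt}"


ask = make_cached(toy_model)
first = ask("What is caching?")
second = ask("What is caching?")  # served from the cache, toy_model is not called again
```

Here the second identical request never reaches the underlying function, which is the saving a cache is meant to provide.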
Google previously offered model prompt caching, but only explicit prompt caching, meaning devs had to define their highest-frequency prompts. While cost savings were supposed to be guaranteed, explicit prompt caching typically involved a lot of manual work.
Some developers were unhappy with how Google's explicit caching implementation worked for Gemini 2.5 Pro, which they said could cause surprisingly large API bills. Complaints reached a fever pitch last week, prompting the Gemini team to apologize and pledge to make changes.
In contrast to explicit caching, implicit caching is automatic. Enabled by default for Gemini 2.5 models, it passes on cost savings if a Gemini API request to a model hits a cache.
“[W]hen you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it's eligible for a cache hit,” Google explained in a blog post. “We will dynamically pass cost savings back to you.”
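As a rough worked example of what the claimed 75% savings could mean for a single request, the sketch below discounts only the cached portion of a prompt. The billing rule is an assumption made for illustration; the function name `billed_token_equivalent` is hypothetical, and Google's actual accounting may differ.

```python
CACHE_DISCOUNT = 0.75  # Google's claimed savings on repetitive context


def billed_token_equivalent(
    prompt_tokens: int,
    cached_tokens: int,
    discount: float = CACHE_DISCOUNT,
) -> float:
    """Return the prompt size in full-price token equivalents.

    Assumption: only tokens served from the cache are discounted,
    and the rest of the prompt is billed at the full rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    return uncached_tokens + cached_tokens * (1 - discount)


# A 10,000-token prompt where 8,000 tokens hit the cache.
with_cache = billed_token_equivalent(10_000, 8_000)
without_cache = billed_token_equivalent(10_000, 0)
```

Under that assumption, the request above would be billed like a 4,000-token prompt instead of a 10,000-token one.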
The minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, according to Google's developer documentation. That is not a terribly large amount, meaning it should not take much to trigger these automatic savings. Tokens are the raw bits of data that models work with, with a thousand tokens equivalent to about 750 words.
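To put those thresholds in more familiar terms, here is a small sketch that converts them to approximate word counts using the rule of thumb above. The names `MIN_TOKENS`, `approx_words`, and `meets_minimum` are invented for this example, and the word counts are estimates only.

```python
# Minimum prompt token counts for implicit caching, as reported above.
MIN_TOKENS = {"2.5 Flash": 1024, "2.5 Pro": 2048}

# Rule of thumb from the article: 1,000 tokens is about 750 words.
WORDS_PER_TOKEN = 750 / 1000


def approx_words(tokens: int) -> int:
    """Estimate how many English words a token count corresponds to."""
    return round(tokens * WORDS_PER_TOKEN)


def meets_minimum(model: str, prompt_tokens: int) -> bool:
    """Check whether a prompt is long enough to qualify for implicit caching."""
    return prompt_tokens >= MIN_TOKENS[model]


flash_words = approx_words(MIN_TOKENS["2.5 Flash"])
pro_words = approx_words(MIN_TOKENS["2.5 Pro"])
```

By this estimate, the thresholds work out to roughly 768 words for 2.5 Flash and 1,536 words for 2.5 Pro.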
Given that Google's last claims of cost savings from caching ran afoul, there are some buyer-beware areas in this new feature. For one, Google recommends that developers keep repetitive context at the beginning of requests to increase the chances of implicit cache hits. Context that might change from request to request should be appended at the end, the company says.
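The effect of that ordering advice can be seen in a short sketch that measures how much of two requests is shared from the start. It compares characters rather than tokens, which is a simplification; the prompts and the `shared_prefix_len` helper are hypothetical, and no Gemini API call is made.

```python
import os


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two prompts, in characters."""
    return len(os.path.commonprefix([a, b]))


# Hypothetical repetitive context reused across requests.
STABLE_CONTEXT = "You are a support assistant. Follow the company policy below.\n" * 20
QUESTIONS = ["How do I reset my password?", "Where is my invoice?"]

# Recommended layout: repetitive context first, changing content last.
context_first = [STABLE_CONTEXT + "\n\n" + q for q in QUESTIONS]

# Layout to avoid: changing content first, repetitive context last.
question_first = [q + "\n\n" + STABLE_CONTEXT for q in QUESTIONS]

good_overlap = shared_prefix_len(*context_first)
bad_overlap = shared_prefix_len(*question_first)
```

With the repetitive context placed first, the two requests share their entire context as a prefix. With the question placed first, they share nothing from the start, so a prefix-based cache would have nothing to reuse.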
For another, Google has not offered any third-party verification that the new implicit caching system will deliver the promised automatic savings. So we will have to see what early adopters say.