IBM claims its place at the top of the open source AI leaderboard with the new Granite 3.1 series released today.
Granite 3.1 large language models (LLMs) offer enterprise users an extended context length of up to 128K tokens, new embedding models, integrated hallucination detection and improved performance. According to IBM, the new Granite 8B Instruct model outperforms open source rivals of similar size, including Meta Llama 3.1, Qwen 2.5 and Google Gemma 2. IBM benchmarked its models across a series of academic benchmarks included in the OpenLLM Leaderboard.
The new models are part of IBM’s accelerated release cadence for its open source Granite models; Granite 3.0 debuted just in October. At the time, IBM claimed to have a $2 billion book of business related to generative AI. With the Granite 3.1 update, IBM is focused on packing more capability into smaller models, on the theory that smaller models are easier and more cost-effective for enterprises to operate.
“We’ve also boosted all the numbers — the performance of almost everything has improved across the board,” David Cox, vice president of AI models at IBM Research, told VentureBeat. “We use Granite in a lot of different use cases, we use it internally at IBM for our products, we use it for consulting, we make it available to our customers and we release it as open source, so we have to be kind of good at everything.”
Why performance and smaller models matter for enterprise AI
There are any number of criteria an organization can use to evaluate LLM performance.
The approach IBM is taking is to run its models through a series of academic and real-world benchmarks. Cox stressed that IBM tested and trained its models to be optimized for enterprise use cases, where performance is not just an abstract measure of speed but a more nuanced measure of efficiency.
One aspect of efficiency that IBM aims to move forward with is helping users spend less time getting the desired results.
“You should be spending less time having to fiddle with prompts,” Cox said. “So, the stronger the model is in an area, the less time you have to spend on prompt engineering.”
Efficiency also relates to model size. The larger the model, the more computing and GPU resources it typically requires, which also means more cost.
“When people do a minimally viable prototype, they often jump to very large models, so you might go to a 70-billion-parameter model or a 405-billion-parameter model to build your prototype,” Cox said. “But the reality is that many of these projects are not economical, so the other thing we’re trying to do is increase the carrying capacity as much as possible in the smallest package possible.”
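Cox’s cost argument can be made concrete with a back-of-the-envelope memory estimate: a model’s weights alone require roughly parameters × bytes-per-parameter of GPU memory. The helper below is a hypothetical illustration, not an IBM tool, and it deliberately ignores KV cache, activations and framework overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2) -> float:
    """Rough GPU memory needed just to hold a model's weights.

    bytes_per_param: 2 for fp16/bf16, 1 for int8, 0.5 for 4-bit quantization.
    Ignores KV cache, activations, and framework overhead.
    """
    return num_params * bytes_per_param / 1024**3

# Compare an 8B model against the 70B and 405B models Cox mentions
for params in (8e9, 70e9, 405e9):
    print(f"{params / 1e9:.0f}B params -> ~{weight_memory_gb(params):.0f} GB in fp16")
```

By this rough measure an 8B model fits on a single commodity GPU, while a 405B model needs a multi-GPU server before it serves a single request.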
Context matters to enterprise agent AI
Aside from promising improved performance and efficiency, IBM has significantly extended Granite’s context length.
With the initial release of Granite 3.0, context length was limited to 4K tokens. In Granite 3.1, IBM has extended this to 128K tokens, allowing much longer documents to be processed. The extended context is a significant upgrade for enterprise AI users, whether for retrieval-augmented generation (RAG) or agentic AI.
Agentic AI systems and AI agents often need to process and reason over longer sequences of information, such as larger documents, log traces or extended conversations. The increased 128K-token context length lets these systems access more contextual information, enabling them to better understand and respond to complex queries or tasks.
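As a rough illustration of what the jump from 4K to 128K tokens buys, the sketch below budgets documents against a context window. The ~4-characters-per-token heuristic and the helper itself are assumptions for illustration, not part of IBM’s tooling.

```python
def fits_in_context(docs: list[str], context_tokens: int = 128_000,
                    chars_per_token: float = 4.0, reserve: int = 4_000) -> bool:
    """Rough check: do these documents fit in a model's context window?

    Uses the common ~4-characters-per-token heuristic; `reserve` leaves
    room for the system prompt and the model's answer.
    """
    est_tokens = sum(len(d) for d in docs) / chars_per_token
    return est_tokens <= context_tokens - reserve

# A ~50K-token report overflows a 4K window but fits easily in 128K
report = "x" * 200_000
print(fits_in_context([report], context_tokens=4_000))    # False
print(fits_in_context([report], context_tokens=128_000))  # True
```

In a RAG or agent pipeline, a check like this decides whether retrieved material must be chunked and summarized or can simply be passed to the model whole.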
IBM is also releasing a series of embedding models to help accelerate the process of converting data into vectors. The Granite-Embedding-30M-English model delivers performance of 0.16 seconds per query, which IBM claims is faster than rival options, including Snowflake’s Arctic.
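Embedding models like these slot into the retrieval step of a RAG pipeline: documents and queries become vectors, and the closest documents are found by cosine similarity. The sketch below uses toy hand-written vectors in place of real model output, purely to show the scoring step.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for vectors an embedding model would produce
corpus = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.3],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "how do I get a refund?"

best = max(corpus, key=lambda doc: cosine(corpus[doc], query_vec))
print(best)  # -> refund policy
```

In production, the per-query latency IBM cites is dominated by embedding the query itself; the similarity search is then handled by a vector database.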
How IBM improved Granite 3.1 to serve enterprise AI needs
So how did IBM improve performance for Granite 3.1? Cox explained that it wasn’t any one thing, but rather a series of process and technical innovations.
IBM has developed increasingly advanced multi-stage training pipelines, he said, which allowed the company to wring more performance out of the models. Data is also a critical part of any LLM’s training; rather than focusing solely on increasing the volume of training data, IBM focused heavily on improving the quality of the data used to train the Granite models.
“It’s not a quantitative game,” Cox said. “It’s not like we’re going to go out and get 10 times more data, and this will magically make the models better.”
Reducing hallucinations directly in the model
One common approach to reducing the risk of hallucinations and inaccurate outputs in LLMs is to use guardrails, which are usually deployed as external capabilities alongside the LLM.
With Granite 3.1, IBM is integrating hallucination protection directly into the model. The Granite Guardian 3.1 8B and 2B models now include the ability to detect hallucinations in function calling.
“The model can natively create its own guardrails, which can provide different opportunities for developers to capture things,” Cox said.
He explained that performing hallucination detection inside the model itself optimizes the overall process. Detection within the model means fewer inference calls, making the model more efficient and accurate.
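For contrast with Granite’s built-in detection, an external guardrail is typically a separate post-hoc check that costs an extra pass over every output. The sketch below is a deliberately crude, hypothetical example of such a check (it is not how Granite Guardian works): it flags numbers in a generated answer that never appear in the source text.

```python
import re

def flag_ungrounded_numbers(answer: str, source: str) -> list[str]:
    """External-guardrail-style check: flag numbers in the answer that
    never appear in the source text (a crude hallucination signal)."""
    source_numbers = set(re.findall(r"\d[\d,.]*", source))
    return [n for n in re.findall(r"\d[\d,.]*", answer)
            if n not in source_numbers]

source = "Granite 3.1 extends the context window to 128K tokens."
answer = "Granite 3.1 supports a 256K context window."
print(flag_ungrounded_numbers(answer, source))  # -> ['256']
```

Even a trivial check like this illustrates the trade-off Cox describes: every external guardrail adds latency and another component to operate, which is the overhead that in-model detection avoids.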
How organizations can use Granite 3.1 today, and what’s next
All new Granite models are now freely available as open source for enterprise users. The models are also available through IBM’s Watsonx Enterprise AI service and will be integrated into IBM’s commercial products.
The company plans to keep up a rapid pace of updates for the Granite models. Looking ahead, the plan for Granite 3.2 is to add multimodal functionality, which will debut in early 2025.
“You’ll see us over the next few releases, adding more of these kinds of different, distinct features, leading up to the things we’ll be announcing at the IBM Think conference next year,” Cox said.