Generative AI has become an essential part of the infrastructure in many industries, and healthcare is no exception. However, as organizations like GSK push the limits of generative AI, they face significant challenges, especially when it comes to reliability. Hallucinations, which occur when AI models generate incorrect or fabricated information, are a persistent problem in high-stakes applications such as drug discovery and healthcare. For GSK, addressing these challenges requires leveraging test-time compute scaling to improve its gen AI systems. Here's how they do it.
The hallucination problem in generative healthcare
Healthcare applications require a very high level of accuracy and reliability. Errors are not merely inconvenient; they can have life-changing consequences. This makes hallucinations in large language models (LLMs) a critical problem for companies like GSK, where gen AI is being applied to tasks such as scientific literature review, genomic analysis, and drug discovery.
To mitigate hallucinations, GSK uses advanced inference-time strategies, including self-reflection mechanisms, multi-model sampling, and iterative output evaluation. According to Kim Branson, senior vice president of AI and Machine Learning (ML) at GSK, these techniques help ensure that agents are “robust and reliable,” while enabling scientists to generate actionable insights more quickly.
Leveraging test-time compute scaling
Test-time compute scaling refers to the ability to increase computational resources during the inference phase of AI systems. This allows for more complex operations, such as iterative output refinement or multi-model aggregation, which are essential to reducing hallucinations and improving model performance.
Branson emphasized the transformative role of scaling in GSK’s AI efforts, noting, “We’re all about increasing iteration cycles at GSK – how we think faster.” By using strategies such as self-reflection and ensemble modeling, GSK can leverage these additional computing cycles to produce accurate and reliable results.
Branson also touched on the broader industry trend, saying: “You’re seeing this war happening with how much I can offer, the cost per token, the time per token. This allows people to offer these different algorithmic strategies that weren’t technically possible before. This will also drive this type of dissemination and adoption of agents.”
Strategies to reduce hallucinations
GSK has identified hallucinations as a critical challenge in gen AI for healthcare. The company uses two main strategies that require additional computational resources during inference. Applying more thorough processing steps ensures that each answer is checked for accuracy and consistency before being delivered in clinical or research settings, where reliability is critical.
Self-reflection and iterative review of outputs
One primary method is self-reflection, where LLMs critique or edit their own responses to improve quality. The model “thinks step by step,” analyzing its initial output, identifying weak points and revising answers as needed. GSK’s literature search tool exemplifies this: it collects data from internal repositories and an LLM’s memory, then re-evaluates its findings through self-criticism to uncover inconsistencies.
This iterative process leads to clearer and more detailed final answers. Branson stressed the value of self-criticism, saying: “If you can only do one thing, do it.” Refining its own logic before delivering results allows the system to produce insights that meet stringent healthcare standards.
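The article does not describe GSK's implementation, so the following is only a minimal sketch of a generic draft, critique, revise loop. The `Model` type, the prompt wording, the `max_rounds` parameter, and `toy_model` are all hypothetical names introduced for illustration; a real system would replace `toy_model` with a call to an actual LLM.

```python
from typing import Callable

# Assumption: a "model" is any function that maps a prompt string to a response string.
Model = Callable[[str], str]


def self_reflect(model: Model, question: str, max_rounds: int = 3) -> str:
    """Draft an answer, then alternate critique and revision until the critique passes."""
    answer = model(f"Question: {question}\nAnswer step by step.")
    for _ in range(max_rounds):
        critique = model(
            f"Question: {question}\nDraft answer: {answer}\n"
            "List any errors or unsupported claims. Reply 'OK' if there are none."
        )
        if critique.strip().upper() == "OK":
            break  # the model found nothing left to fix
        answer = model(
            f"Question: {question}\nDraft answer: {answer}\n"
            f"Critique: {critique}\nRewrite the answer to address the critique."
        )
    return answer


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM, used only to exercise the loop."""
    if "Rewrite the answer" in prompt:
        return "revised answer"
    if "List any errors" in prompt:
        accepted = "Draft answer: revised answer" in prompt
        return "OK" if accepted else "Claim lacks a source."
    return "first draft"


if __name__ == "__main__":
    print(self_reflect(toy_model, "Which gene is associated with the condition?"))
```

Each round costs two extra model calls (one critique, one revision), which is where the additional test-time compute goes; capping `max_rounds` keeps that cost bounded.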
Multi-model sampling
GSK’s second strategy relies on multiple LLMs or different configurations of a single model to cross-check outputs. In practice, the system may run the same query at different temperature settings to generate diverse answers, use fine-tuned versions of the same model that specialize in particular domains, or call entirely separate models trained on distinct data sets.
Comparing and contrasting these outputs helps confirm which conclusions are most consistent or convergent. “You can get this effect by having different orthogonal approaches to arriving at the same result,” Branson said. Although this approach requires more computational power, it reduces hallucinations and increases confidence in the final answer, an essential benefit in high-stakes healthcare environments.
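One simple way to compare outputs across models or settings is a majority vote over normalized answers. The sketch below is an illustration under that assumption, not a description of GSK's system: `Sampler`, `consensus_answer`, and the `min_agreement` threshold are hypothetical, and the lambdas stand in for real models or temperature settings.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# Assumption: each sampler wraps one model, fine-tune, or temperature setting.
Sampler = Callable[[str], str]


def consensus_answer(
    samplers: Sequence[Sampler], question: str, min_agreement: float = 0.5
) -> Tuple[Optional[str], float]:
    """Return the most common answer and its vote share, or None if agreement is too low."""
    if not samplers:
        raise ValueError("at least one sampler is required")
    # Normalize case and whitespace so trivially different strings count as one answer.
    answers = [sample(question).strip().lower() for sample in samplers]
    top, votes = Counter(answers).most_common(1)[0]
    agreement = votes / len(answers)
    return (top if agreement > min_agreement else None), agreement


if __name__ == "__main__":
    # Stand-ins for three independent models or configurations.
    samplers = [lambda q: "BRCA1", lambda q: "brca1 ", lambda q: "TP53"]
    print(consensus_answer(samplers, "Which gene is implicated?"))
```

Returning `None` when no answer clears the threshold lets the caller escalate to a human or to further checks instead of reporting a low-confidence result. Exact string matching only suits short, categorical answers; free-text outputs would need a semantic comparison step.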
The inference wars
GSK’s strategies rely on infrastructure that can handle significantly heavier computational loads. In what Branson calls “inference wars,” AI infrastructure companies such as Cerebras, Groq and SambaNova are competing to deliver hardware breakthroughs that improve token throughput, lower latency, and reduce cost per token.
Specialized chips and architectures enable complex inference procedures, including multi-model sampling and iterative self-reflection, at scale. For example, Cerebras technology processes thousands of tokens per second, allowing advanced techniques to work in real-world scenarios. “The results of these innovations directly impact how generative models can be effectively deployed in healthcare,” Branson noted.
When hardware keeps up with software requirements, solutions emerge to maintain accuracy and efficiency.
Challenges remain
Even with these advances, scaling up computing resources presents obstacles. Longer inference times can slow down workflow, especially if clinicians or researchers need quick results. High computing usage also leads to high costs, requiring careful management of resources. However, GSK considers these trade-offs necessary to obtain stronger reliability and richer functionality.
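To make the trade-off concrete, here is a back-of-the-envelope sketch. It assumes a hypothetical pipeline that draws `n_samples` answers and runs up to `max_rounds` critique-and-revise rounds on each, with every round costing two model calls; the function names and the example figures are illustrative assumptions, not numbers from GSK.

```python
def worst_case_calls(n_samples: int, max_rounds: int) -> int:
    """Upper bound on model calls: one draft plus a critique and a revision per round, per sample."""
    return n_samples * (1 + 2 * max_rounds)


def worst_case_latency_s(
    n_samples: int,
    max_rounds: int,
    seconds_per_call: float,
    parallel_samples: bool = True,
) -> float:
    """Upper bound on wall-clock time; rounds are sequential, samples may run in parallel."""
    calls_in_sequence = 1 + 2 * max_rounds
    if not parallel_samples:
        calls_in_sequence *= n_samples
    return calls_in_sequence * seconds_per_call


if __name__ == "__main__":
    # Illustrative figures only: 5 samples, 3 rounds, 2 seconds per call.
    print(worst_case_calls(5, 3))
    print(worst_case_latency_s(5, 3, 2.0))
    print(worst_case_latency_s(5, 3, 2.0, parallel_samples=False))
```

Under these assumptions, running samples in parallel keeps latency tied to the number of reflection rounds, while total cost still grows with both parameters.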
“As we enable more tools in the agent ecosystem, the system becomes more useful to people, and you end up with increased computing usage,” Branson noted. Balancing performance, costs and system capabilities allows GSK to maintain a pragmatic and forward-looking strategy.
What’s next?
GSK plans to continue improving its AI healthcare solutions, with test-time compute scaling as a top priority. The combination of self-reflection, multi-model sampling, and robust infrastructure helps ensure that generative models meet the stringent requirements of clinical environments.
This approach also serves as a roadmap for other organizations, showing how to reconcile accuracy, efficiency, and scalability. Maintaining leadership in computational innovations and cutting-edge inference techniques not only addresses current challenges, but also lays the foundation for breakthroughs in drug discovery, patient care, and beyond.