How Harness is “harnessing” agentic AI to help improve enterprise incident response through automated data collection and playbooks



Incident response, the process of responding to system disruptions and slowdowns, is an important aspect of IT operations. It is also an activity that traditionally involves a lot of time-consuming manual processes.

This is the challenge that Harness is targeting with a new incident response service. The technology enters early access today as a module on the company’s eponymous platform. Harness began its business in 2017 with an initial focus on continuous integration/continuous delivery (CI/CD) automation for DevOps. In the years that followed, the company expanded into a software delivery platform with multiple modules. In the fall of 2024, Harness moved into enterprise AI agents, initially to help support software development.

The company is now extending that same basic AI foundation to incident response. The new solution also leverages licensed capabilities originally developed by the developer workflow vendor Transposit. Tina Huang, co-founder of Transposit, along with several members of her team, joined Harness in September 2024.

The goal of Harness Incident Response is to shorten the mean time to resolution (MTTR) of an incident.
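
For readers unfamiliar with the metric, MTTR is commonly computed as the average time between an incident being detected and being resolved. The short Python sketch below illustrates that arithmetic only. The function name and the sample timestamps are hypothetical and are not taken from Harness.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average of (resolved - detected) over a list of resolved incidents."""
    durations = [resolved - detected for detected, resolved in incidents]
    if not durations:
        raise ValueError("need at least one resolved incident")
    return sum(durations, timedelta()) / len(durations)


# Hypothetical sample data: (detected_at, resolved_at) pairs.
incidents = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 9, 45)),     # 45 minutes
    (datetime(2025, 1, 8, 14, 0), datetime(2025, 1, 8, 15, 30)),   # 90 minutes
    (datetime(2025, 1, 9, 22, 15), datetime(2025, 1, 9, 22, 30)),  # 15 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 0:50:00
```

Anything that cuts the time spent gathering context or choosing a fix lowers each duration, and therefore the average.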

“When you think about what DevOps platforms have been so far, it’s been largely about helping you streamline these deployments,” Huang told VentureBeat. “I think a very natural place to go next is: ‘How can I control your deployments after they reach production?’”

How Harness enables autonomous incident response using agentic AI

At the heart of Harness’ Incident Response module is the company’s AI agent architecture, which was first introduced in September 2024.

Jyoti Bansal, CEO and co-founder of Harness, explained to VentureBeat that its AI agents are designed to provide autonomous assistance, going beyond simply alerting engineers to incidents. Traditional incident response technology relies on an approach known as playbooks. IT teams, often working with site reliability engineers (SREs), write playbooks that lay out step-by-step processes for recovering from different types of service outages.

Instead of relying solely on pre-defined playbooks, AI agents can suggest actions, identify potential root causes, and even create new playbooks on the fly.

“The agent workflow suggests actions that should be taken,” Bansal said.

Huang explained that AI agents perform multiple steps that are critical to helping organizations respond faster to incidents. Even before a playbook comes into play, there is a certain amount of triage that needs to happen, Bansal explained. General triage can, for example, identify which services are affected or identify upstream and downstream dependencies that will also be affected by the incident.
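
The dependency part of that triage can be pictured as a graph walk. The sketch below is not Harness code: the service map `CALLS` and the `triage` helper are hypothetical, and it assumes the relationships between services are already known. It uses a breadth-first search to list what a failing service relies on and what relies on it.

```python
from collections import deque

# Hypothetical service map: each key calls the services in its list.
CALLS = {
    "web": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "payments": ["ledger-db"],
    "search": ["inventory"],
}


def reachable(graph, start):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(graph):
    """Flip every edge so that callers can be looked up from the callee."""
    inverted = {}
    for src, targets in graph.items():
        for dst in targets:
            inverted.setdefault(dst, []).append(src)
    return inverted


def triage(service):
    """List what the service relies on and what relies on the service."""
    return {
        "dependencies": sorted(reachable(CALLS, service)),
        "dependents": sorted(reachable(invert(CALLS), service)),
    }


report = triage("payments")
print(report)  # {'dependencies': ['ledger-db'], 'dependents': ['checkout', 'web']}
```

In this toy map, an incident in `payments` puts `checkout` and `web` at risk, while `ledger-db` is a dependency worth checking as a possible cause.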

The Harness system has agents that are aware of and connected to multiple systems, and that can automatically collect information, including discussion from Slack channels. This information can then help other agents alert humans and provide automated remediation.

While the system is highly automated, Huang emphasized that humans are still in the loop. Instead of alerting a human to a problem and leaving that person to figure out whether there is a playbook (and if so, how to run it), the system recommends the remediation and the human only needs to approve it.
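
A human-in-the-loop step of this kind amounts to an approval gate between a recommendation and its execution. The following is a minimal sketch of that pattern, not a description of how Harness implements it. `Recommendation`, `ApprovalGate` and the playbook name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    incident_id: str
    playbook: str
    rationale: str


@dataclass
class ApprovalGate:
    # Playbooks that were approved and would therefore be run.
    executed: list = field(default_factory=list)

    def handle(self, rec, approve):
        """Record the playbook as executed only if the reviewer approves it."""
        if approve(rec):
            self.executed.append(rec.playbook)
            return "executed"
        return "rejected"


gate = ApprovalGate()
rec = Recommendation(
    incident_id="INC-1",
    playbook="rollback-last-deploy",
    rationale="error rate rose right after the latest deployment",
)

# `approve` stands in for a human decision, for example a button in a chat tool.
status = gate.handle(rec, approve=lambda r: True)
print(status, gate.executed)  # executed ['rollback-last-deploy']
```

The point of the design is that the agent does the investigative work up front, while the decision to act stays with a person.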

Incident response requires more than just technology

The Harness Incident Response module can be run on its own, meaning organizations don’t actually need to run any other Harness modules.

However, Bansal expects the combined offering — which can enable integration with many other workflows including DevOps or chaos engineering — to be beneficial. Chaos engineering is the process of introducing unexpected variables and events into an application to see how it responds. Harness has had a chaos engineering module as part of its platform since 2022.

As part of the incident response platform, organizations can conduct “fire drills” alongside the chaos engineering module to test different scenarios, Huang explained.

“Incidents happen infrequently, and are often the unfortunate result of something you never discovered,” Huang said. “We want to enable a very proactive approach to incident response.”

How organizations will benefit from AI-based incident response

One Harness customer that uses the Incident Response module is Tyler Technologies, which develops software for the public sector.

The company uses the Harness platform for continuous deployment, cloud cost management, and feature flag development. Jeff Green, CTO at Tyler Technologies, explained that adding incident response can help solve the main challenge the company faces.

“Our primary challenge is to integrate all the data, metrics, and operational processes, and then connect them into one unified approach to managing incidents and automating our response,” he told VentureBeat. “Our portfolio includes more than 100 products built on different technologies using a wide range of development tools and platforms.”

The incident response capability will complement the existing operations Tyler Technologies already runs with Harness. Examples include the ability to associate deployments with incidents, or to associate feature flags with incidents.
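
One simple way to associate deployments with an incident is to look for deployments that finished shortly before it began. The sketch below assumes that time-window heuristic, which is an illustration rather than Harness’s documented method. The function name, record fields and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def deployments_before(incident_start, deployments, window=timedelta(hours=1)):
    """Deployments that finished within `window` before the incident, newest first."""
    hits = [
        d for d in deployments
        if timedelta(0) <= incident_start - d["finished_at"] <= window
    ]
    return sorted(hits, key=lambda d: d["finished_at"], reverse=True)


# Hypothetical deployment log.
deployments = [
    {"service": "search", "finished_at": datetime(2025, 1, 6, 8, 10)},
    {"service": "checkout", "finished_at": datetime(2025, 1, 6, 9, 40)},
    {"service": "payments", "finished_at": datetime(2025, 1, 6, 9, 55)},
    {"service": "web", "finished_at": datetime(2025, 1, 6, 10, 20)},
]

suspects = deployments_before(datetime(2025, 1, 6, 10, 0), deployments)
print([d["service"] for d in suspects])  # ['payments', 'checkout']
```

The same filter could be applied to feature flag changes by swapping in a log of flag toggles.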

“We believe the AI capabilities being integrated into the product will save a lot of time by helping us analyze the root cause, identify ways to mitigate or resolve incidents, and prevent incidents,” Green said. “Much of this work today is done by humans pulling data from multiple sources, scanning logs and application performance monitoring (APM) data and looking for patterns, all tasks for which AI is best suited.”

The ROI of agentic AI for incident response

Another Harness client evaluating the Incident Response Module is Omar Al Watar, Senior DevOps Engineer at InStride.

Al Watar told VentureBeat that his company uses the Harness Continuous Delivery module. He noted that when it comes to incident response, his organization faces two main challenges: proactive monitoring and identifying the root cause. He said Harness’s new incident response tool is interesting for his company because it will help identify problems faster and automate remediation suggestions.

“In terms of ROI, the most significant impact will be reduced downtime, as it directly impacts service level agreement compliance and customer satisfaction,” Al Watar said. “In addition, by automating aspects of incident response, the 11-person DevOps team can focus more on strategic projects and innovation rather than continuous troubleshooting.”


