Meet the OpenAI Operator, an AI agent that navigates the web for you

By [email protected]




OpenAI has unveiled Operator, its first semi-autonomous AI agent, designed to “operate” a web browser on a user’s behalf, much as a person would. The agent uses the cursor to point and click, types on its own, browses the web and performs actions on various websites, such as booking restaurant reservations through OpenTable and assembling orders on Instacart and DoorDash, rather than being limited to the ChatGPT interface or OpenAI’s API.

“This product is the beginning of our move into agents,” CEO and co-founder Sam Altman said in a live-streamed demo on the company’s YouTube channel today at 1 p.m. ET.

OpenAI president and co-founder Greg Brockman wrote on X: “2025 is the year of agents.”

The preview, now available to US subscribers of OpenAI’s ChatGPT Pro plan ($200 per month), aims to demonstrate the capabilities of agentic AI while gathering the feedback needed to improve it.

However, Operator does not control your own web browser. Instead, users visit a new, separate website, operator.chatgpt.com, where they are presented with a prompt entry box similar to ChatGPT’s.

Typing a request into this box, such as “Find me tickets to tonight’s LA Lakers game,” causes Operator to open a separate virtual browser running in the cloud on OpenAI’s servers. The agent can then perform tasks like filling out forms, managing online reservations, booking tickets for sporting events and concerts, and navigating other common workflows. The user watches the cursor move on its own in the cloud browser in real time. If the agent encounters a problem, it stops and sends the user a text message, similar to a ChatGPT response.

Also, below the virtual browser, the user will see suggestions for actions Operator can take on their behalf.

However, the user can take control at any time, similar to semi-autonomous driving systems in modern cars. Operator also asks the user to enter their payment credentials when it reaches the purchase screen on another website. Finally, users can save specific workflows they want to reuse and run them again later.

Operator is powered by what OpenAI calls Computer-Using Agent (CUA) technology, a new version of GPT-4o that is specifically trained to use computers.

Bridging the gap between artificial intelligence and graphical user interfaces

Operator stands out from other automation tools by simulating human interaction with graphical user interfaces (GUIs).

Instead of relying on specialized APIs, the system leverages screenshots for visual input and uses virtual mouse and keyboard actions to complete tasks.

The underlying CUA model combines GPT-4o’s vision capabilities with reinforcement learning, enabling the agent to perceive, reason, and act on the screen.
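The perceive, reason, act cycle described above can be illustrated with a minimal Python sketch. Everything here is hypothetical: `FakeBrowser`, `Action`, `run_agent`, and `scripted_policy` are illustrative names, not OpenAI's API, and the "policy" is a hard-coded script standing in for a vision model. It only shows the general shape of a screenshot-in, mouse-and-keyboard-out loop.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done" (hypothetical action set)
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for a remote browser: records actions, returns fake frames."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real system would return pixels; a frame counter is enough here.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        self.log.append(action)


# A policy maps (task, current screenshot, past actions) to the next action.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_agent(task: str, browser: FakeBrowser, policy: Policy,
              max_steps: int = 10) -> List[Action]:
    history: List[Action] = []
    for _ in range(max_steps):
        frame = browser.screenshot()            # perceive
        action = policy(task, frame, history)   # reason
        if action.kind == "done":
            break
        browser.apply(action)                   # act
        history.append(action)
    return history


def scripted_policy(task: str, frame: bytes, history: List[Action]) -> Action:
    # Assumption: a fixed two-step script replaces the model's decisions.
    script = [Action("click", x=120, y=40), Action("type", text=task)]
    return script[len(history)] if len(history) < len(script) else Action("done")
```

Calling `run_agent("Lakers tickets", FakeBrowser(), scripted_policy)` returns the click followed by the typed query. The `max_steps` cap is a design choice in this sketch so a misbehaving policy cannot loop forever.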

This approach allows Operator to handle various tasks, including e-commerce browsing, travel planning, and even repetitive tasks such as creating playlists or managing shopping lists. Notable benchmark results demonstrate its effectiveness:

87% success rate on WebVoyager, which tests live website navigation

58.1% success rate on WebArena, which simulates real-world e-commerce and content management scenarios

But there is already tough competition: just yesterday, Chinese tech company ByteDance (TikTok’s parent company) launched its own AI agent to control web browsers and perform actions on the user’s behalf. Named UI-TARS, it is completely open source and posts impressive benchmark performance (although it doesn’t appear to have been evaluated directly on the same benchmarks). This means OpenAI’s Operator will need to be much better or more reliable to justify the relatively high cost ($200 per month) of accessing it through a ChatGPT Pro subscription.

Already being tested in enterprise web navigation use cases

OpenAI is partnering with several companies to ensure Operator meets real-world needs. Companies including Instacart, DoorDash and Etsy are already testing the technology for use cases ranging from grocery delivery to in-person shopping.

Priceline CEO Brett Keller has commented on its usefulness in travel planning, calling it “an important step in making travel more seamless and personalized.”

For public sector applications, the City of Stockton is exploring ways to use Operator to simplify civic engagement. Jamil Niazi, the city’s IT director, highlighted the AI’s potential to make it easier for residents to sign up for services.

However, there are limitations. The technical publication Every got an early preview, tested it over the past week, and found the following:

“One design feature of Operator is that it doesn’t use your browser. Instead, it uses a browser in one of OpenAI’s data centers that you can view and interact with remotely. The upside of this design decision is that you can use Operator anywhere, anytime, for example, on any mobile device.

“The downside is that many sites like Reddit already block AI agents from browsing, so Operator can’t access them. In this research preview mode, Operator is also blocked by OpenAI from accessing certain resource-intensive sites like Figma or competitor-owned sites like YouTube for performance or legal reasons.”

Safety measures

Because it can act on behalf of users, Operator has been developed with robust safety features:

User control: Operator requires confirmation for sensitive actions, such as making purchases or sending emails.

Watch mode: Ensures user oversight of important tasks, especially on sensitive sites such as email or financial platforms.

Misuse prevention: The system is trained to refuse harmful requests and includes safeguards against adversarial attacks, such as malicious prompts embedded in websites.
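The confirmation requirement for sensitive actions can be pictured as a simple gate in front of the action. The sketch below is an assumption-laden illustration, not how Operator is implemented: `SENSITIVE_KINDS`, `execute_with_confirmation`, and the callback signatures are hypothetical names, and the two sensitive categories are taken only from the examples in the list above.

```python
from typing import Callable

# Assumed categories, based on the article's examples (purchases, emails).
SENSITIVE_KINDS = {"purchase", "send_email"}


def execute_with_confirmation(kind: str,
                              perform: Callable[[], str],
                              confirm: Callable[[str], bool]) -> str:
    """Run `perform`, but ask the user first if the action kind is sensitive."""
    if kind in SENSITIVE_KINDS and not confirm(kind):
        # The user declined, so the action is never executed.
        return "cancelled"
    return perform()
```

In this sketch, non-sensitive actions such as scrolling never trigger the `confirm` callback, while a declined purchase returns `"cancelled"` without calling `perform` at all.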

OpenAI has also integrated features to protect user privacy, including options to clear browsing data and opt out of data sharing to improve the model.

Upcoming enterprise edition

OpenAI envisions a broader role for Operator in both individual and enterprise settings. Over time, the company plans to expand access to Plus, Team, and Enterprise users, and eventually integrate Operator into ChatGPT.

There are also plans to make the underlying CUA technology available via an application programming interface (API), enabling developers to create custom agents for computer use.

Despite its potential, Operator remains a work in progress. OpenAI has been transparent about its limitations, such as difficulties with complex interfaces or unfamiliar workflows. Early user feedback will play a pivotal role in improving the system’s accuracy, reliability and safety.

As OpenAI improves Operator through real-world use, it seeks to transform AI from a passive tool into an active participant in the digital ecosystem. Whether it’s simplifying everyday tasks or streamlining business workflows, OpenAI positions Operator as the next step in making AI accessible, practical, and secure.


