OpenAI’s Operator agent helped me move, but I had to help it, too
OpenAI gave me one week to test its new AI agent, Operator, a system that can independently do tasks for you on the internet.
Operator is the closest thing I’ve seen to the tech industry’s vision of AI agents — systems that can automate the boring parts of life, freeing us up to do the things we really love. However, judging from my experience with OpenAI’s agent, truly “autonomous” AI systems are still just out of reach.
To power Operator, OpenAI trained a new model that combines the visual understanding of GPT-4o with the reasoning capabilities of o1.
That model seems to work well for basic tasks; I watched Operator click buttons, navigate menus on websites, and fill out forms. The AI was occasionally successful at taking actions on its own, and it worked much faster than the web-based agents I’ve seen from Anthropic and Google.
But during my trial, I found myself assisting OpenAI’s agent more than I’d like. It felt like I was coaching Operator through each problem, whereas I wanted to push certain tasks off my plate altogether.
Too often during my test, I had to answer several questions, grant permissions, fill out personal information, and help the agent when it got stuck.
In car terms, Operator is like driving a car with cruise control – occasionally taking your foot off the pedals and letting the car drive itself – but it’s far from full-blown autopilot.
In fact, OpenAI says Operator’s frequent pauses are by design.
The AI powering Operator, much like the AI behind chatbots such as OpenAI’s ChatGPT, can’t reliably work on its own for long stretches, and it’s prone to the same kinds of hallucinations. Because of that, OpenAI doesn’t want to give the system too much decision-making power or sensitive user information. That may be the safe choice, but it also limits Operator’s practicality.
That said, OpenAI’s first agent is an impressive proof of concept — and interface — for an AI that can use the front end of any website. But to create truly independent AI systems, tech companies will need to build more reliable AI models that don’t require this much steering.
A little too ‘hands-on’
My Operator trial coincided with the week I was moving apartments, so I had OpenAI’s agent help with moving logistics.
I asked Operator to help me buy a new parking permit. OpenAI’s agent told me, “Sure,” then opened a window on my PC’s screen showing its own browser.
Operator then searched for a San Francisco parking permit in that browser, took me to the correct city website, and even landed on the right page.
Operator still lets you use the rest of your computer while it’s working, something that can’t be said for Google’s Project Mariner. That’s because OpenAI’s agent isn’t really running on your computer at all; it’s working in a browser hosted somewhere in the cloud.
For my parking permit, I had to grant Operator permission to start different processes a few too many times. It also stopped to ask me to fill out forms with personal information – such as my name, phone number, and email address. At times, Operator also got lost, forcing me to take control of the browser and get the agent back on track.
In another test, I asked Operator to make me a reservation at a Greek restaurant. To its credit, Operator found me a nice place in my area with reasonable prices. But I had to answer more than half a dozen questions throughout the flow.
If you have to intervene six or more times just to book a reservation through an AI agent, at what point is it easier to just do it yourself? That’s a question I asked myself a lot while testing Operator.
Agent-as-a-platform
In a few of my tests, I ran into websites that blocked Operator for whatever reason. For example, I tried booking an electrician using TaskRabbit, but OpenAI’s agent told me that it ran into an error, and asked if it could use an alternative service instead. Expedia, Reddit, and YouTube also blocked the AI agent from accessing their platforms.
However, other services are embracing Operator with open arms. Instacart, Uber, and eBay collaborated with OpenAI for the launch of Operator, allowing the agent to navigate their websites on behalf of humans.
These businesses are preparing for a future in which a subset of user interactions is handled by an AI agent.
“Customers are using Instacart through a variety of different entry points,” said Daniel Danker, chief product officer at Instacart, in an interview with TechCrunch. “We see Operator as, potentially, another one of those entry points.”
Letting OpenAI’s agent use Instacart’s website on behalf of a person seems like it would separate Instacart from its customers. However, Danker says Instacart wants to meet customers wherever they are.
“We really are bullish about our belief, similar to OpenAI, that agentic systems will have a major impact on how consumers interact with digital properties,” said eBay’s chief AI officer, Nitzan Mekel-Bobrov, in an interview with TechCrunch.
Even if AI agents rise in popularity, Mekel-Bobrov says he expects users will always come to eBay’s website, noting that “online destinations are not going anywhere.”
Trust issues
I had some issues trusting Operator after it hallucinated a few times and nearly cost me several hundred dollars.
For instance, I asked the agent to find me a parking garage near my new apartment. It ended up suggesting two garages that it said would take just a few minutes to walk to.
Besides being way out of my price range, the garages were actually really far from my apartment. One was a 20-minute walk away, and the other was a 30-minute walk. Turns out, Operator had put in the wrong address.
This is exactly why OpenAI doesn’t give its agent your credit card number, passwords, or access to your email. If OpenAI hadn’t let me intervene here, Operator would’ve wasted hundreds of dollars on a parking spot I didn’t need.
Hallucinations like this are a key roadblock to actually useful autonomous agents – ones that can take bothersome tasks off your plate. No one will trust agents if they’re prone to making basic mistakes, especially mistakes with real-world consequences.
With Operator, OpenAI seems to have built some impressive tools to let AI systems browse the web. But these tools won’t amount to much until the underpinning AI can reliably do what users ask it to do. Until then, humans will be stuck assisting agents — not the other way around. And that kind of defeats the point.