OpenAI’s “Operator” is an AI agent that can take control of a web browser to automate tasks such as booking travel accommodations, making restaurant reservations and shopping online. Operator is powered by a Computer-Using Agent (CUA) model that combines the vision capabilities of OpenAI’s GPT-4o model with reasoning abilities from their more advanced models. What this means is that an agent doesn’t need an API to access services: it can click buttons, navigate menus and presumably pass the “prove you are not a robot” puzzles just as people do.
(OpenAI reports that it is collaborating with companies such as DoorDash, eBay, Instacart, Priceline, StubHub, Uber and others to ensure that the agents respect the terms of service agreements.)
As it stands, Operator requires human supervision for certain categories of task including, as I am sure you would expect, financial transactions: consumers currently need to take control to enter payment information, for example. With the advent of CUA, we can now see the practical evolution of full-blown agentic commerce in strategic timeframes. Agents will go online to obtain services, look for an agentic API and then, if no such API is found, simply access the web pages as a human customer does.
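To make that "API first, web page second" behaviour concrete, here is a minimal sketch of the decision an agent might make before acting. It is an illustration only, not how Operator is implemented: the `Task` type, the `choose_channel` function, the `SUPERVISED_CATEGORIES` set and the example registry are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical: task categories where a human must take over.
# The article names financial transactions (e.g. entering payment details).
SUPERVISED_CATEGORIES = {"payment"}


@dataclass
class Task:
    service: str   # e.g. a merchant or booking site (illustrative)
    category: str  # e.g. "search", "booking", "payment" (illustrative)


def choose_channel(task: Task,
                   find_agent_api: Callable[[str], Optional[str]]) -> str:
    """Decide how a task would be carried out.

    find_agent_api is an assumed lookup that returns an endpoint for the
    service if it publishes an agent-facing API, or None otherwise.
    """
    # Supervised categories are handed back to the person.
    if task.category in SUPERVISED_CATEGORIES:
        return "human"
    # Prefer an agent-facing API when the service offers one.
    endpoint = find_agent_api(task.service)
    if endpoint is not None:
        return f"api:{endpoint}"
    # Otherwise fall back to driving the web page as a person would.
    return "browser"


if __name__ == "__main__":
    # Hypothetical registry of services that expose an agent-facing API.
    registry = {"example-travel": "https://api.example.com/agent"}
    for task in (Task("example-travel", "search"),
                 Task("example-shop", "search"),
                 Task("example-shop", "payment")):
        print(task.service, task.category, choose_channel(task, registry.get))
```

The ordering is the point of the sketch: the supervision check comes first, so a payment step goes back to the consumer whether or not an API exists, and the browser is the fallback rather than the default.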