AI bot evolution: Agents will be the next big thing in artificial intelligence

Companies are working on more complex agents that could run through multiple applications: create an itinerary and book tickets, accommodation, restaurants and taxis, for instance.

Summary

  • As we devolve agency to AI agents for various tasks, we must take care not to open a Pandora’s box. Tech companies want to keep humans in control of AI actions, but the same technology can also grant agents autonomy.

Bill Gates recently wrote a prescient blog post on how agents will be the next big thing in software (bit.ly/3tSMNkB). In his inimitable style, he explained: “To do any task on a computer, you must tell your device which app to use. You can use Microsoft Word and Google Docs to draft a business proposal, but they can’t help you send an email, share a selfie, analyze data, schedule a party, or buy movie tickets. In the next five years, this will change completely. You won’t have to use different apps for different tasks. You’ll simply tell your device, in everyday language, what you want to do. This type of software—something that responds to natural language and can accomplish many different tasks based on its knowledge of the user—is called an agent.”

He went on to predict that agents will upend the software industry, replacing apps to become the new platforms we use every day. Big Tech companies and startups have heeded his advice. The first glimpse of an agent-led world came with OpenAI’s GPT Store, which hosts more than three million GPTs; these proto-agents are a peek into how Agent Stores may one day replace App Stores. Microsoft, OpenAI and Google are scrambling to develop software that can do complex tasks by itself, with minimal guidance from you; hence the name ‘agents’: they have agency.

Aaron Holmes writes in The Information (bit.ly/3U0dpJu) about how Microsoft is building software that can create, send and track an invoice based on order history. Another agent can “detect a large product order a business customer hasn’t filled, draft an invoice, and ask the business whether it wants to send that invoice to the client who placed the order. From there, the agent could automatically track the customer’s response and payment and log it in the company’s system.” These agents are powered by OpenAI’s GPT-4 and are the next iteration of the Copilots that Microsoft has launched. OpenAI is also busy building agents that could work on different applications at the same time, moving data from a spreadsheet to a PowerPoint slide, for example. Companies are working on more complex agents that could run through multiple applications: create an itinerary and book tickets, accommodation, restaurants and taxis, for instance.
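The invoice workflow Holmes describes (detect an unfilled order, draft an invoice, ask the business, then send and track it) can be sketched as a simple agent loop. Everything below, from the class names to the in-memory data and the `approve` callback standing in for the human confirmation step, is an illustrative assumption, not Microsoft’s actual design:

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    customer: str
    amount: float
    invoiced: bool = False

def detect_unfilled_orders(orders):
    """Find orders that have no invoice yet."""
    return [o for o in orders if not o.invoiced]

def draft_invoice(order):
    """Draft an invoice from the order details."""
    return {"order_id": order.order_id,
            "customer": order.customer,
            "amount": order.amount,
            "status": "draft"}

def run_invoice_agent(orders, approve):
    """Agent loop: detect, draft, ask the business, then send.
    `approve` is the human-in-the-loop confirmation step."""
    sent = []
    for order in detect_unfilled_orders(orders):
        invoice = draft_invoice(order)
        if approve(invoice):          # the human stays in control
            invoice["status"] = "sent"
            order.invoiced = True
            sent.append(invoice)
    return sent

orders = [Order("A1", "Acme Co", 1200.0),
          Order("B2", "Beta LLC", 450.0, invoiced=True)]
sent = run_invoice_agent(orders, approve=lambda inv: True)
```

The key design point the article raises lives in that `approve` callback: with it, the software is a copilot; remove it, and the loop runs on autopilot.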

Planning a holiday is an onerous task; you need to work through a gigantic set of choices and apps, taking hours or even days of your time. An empowered agent, knowing your preferences from your history and data, could do this within minutes. Another startup that Holmes writes about is Adept, co-founded by ex-Googler Anmol Gulati. Adept’s AI was built using videos of people actually working on their PCs to create an Excel spreadsheet or a PowerPoint deck. Trained on these human activities, Adept is building an ‘AI Teammate’ which can do these tasks for you. Interestingly, the first deployment of agents will probably be by their creators: software developers themselves. Millions of them already use Microsoft’s GitHub Copilot, which helps them write code better and faster. Agents built into such tools could listen to a problem that a developer faces, suggest ways to address it, and then write, run and test the code.
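The write-run-test loop such a coding agent would need can be sketched in a few lines. The step where the model proposes code is mocked out here with a hand-written candidate; the function names and the whole harness are illustrative assumptions:

```python
import os
import subprocess
import sys
import tempfile

def write_run_test(candidate_source, test_snippet):
    """Write candidate code plus its test to a temp file, run it in a
    subprocess, and report whether the test passed."""
    src = candidate_source + "\n" + test_snippet + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(src)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=30)
        return result.returncode == 0   # non-zero means a failed assertion
    finally:
        os.unlink(path)

# Hand-written stand-in for the code an LLM agent would suggest:
candidate = "def slugify(s):\n    return s.strip().lower().replace(' ', '-')\n"
test = "assert slugify('  Hello World ') == 'hello-world'"
passed = write_run_test(candidate, test)
```

An agent would wrap this loop: if `passed` is false, it would feed the failure output back to the model and try again.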

Agents would also define the next class of devices for the post-smartphone era, such as the Rabbit R1 and the AI Pin, both of which were unveiled recently. These devices use GenAI models as their operating system (OS), natural spoken language as their user interface (UI) and, importantly, rudimentary agents instead of apps. So, for example, you can call an Uber, order food on DoorDash or play Spotify just by telling your Rabbit R1 to do so. A Large Action Model (LAM), which is built on LLMs, functions as Rabbit’s OS, making the device your personal voice assistant. The LAM OS uses its long-term memory of you to translate your requests into actionable steps and responses; it comprehends which apps and services you use daily. The LAM can learn to see and act in the world much as humans do. It is still early days, but the app-led devices of today will likely give way to these new agent-led devices.
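The translation step the LAM performs, from a spoken request to concrete app actions informed by long-term memory, can be caricatured as a lookup. A real LAM learns these mappings from demonstrations; the tiny intent table, regexes and memory dictionary below are deliberate simplifications of my own, not Rabbit’s design:

```python
import re

# Hypothetical intent table: recognised phrasings mapped to app actions.
INTENTS = [
    (re.compile(r"\bcall (?:me )?an? uber\b", re.I), ("uber", "request_ride")),
    (re.compile(r"\border (.+) on doordash\b", re.I), ("doordash", "place_order")),
    (re.compile(r"\bplay (.+) on spotify\b", re.I), ("spotify", "play")),
]

def plan_actions(utterance, memory):
    """Translate a spoken request into (app, action, args) steps,
    filling gaps from long-term memory of the user's accounts."""
    steps = []
    for pattern, (app, action) in INTENTS:
        m = pattern.search(utterance)
        if m:
            args = {"query": m.group(1)} if m.groups() else {}
            args["account"] = memory.get(app, "default")
            steps.append((app, action, args))
    return steps

memory = {"spotify": "alice@example.com"}   # stand-in for long-term memory
steps = plan_actions("Play Kind of Blue on Spotify", memory)
```

The interesting part is not the matching but the hand-off: once the steps are planned, the device, not the human, drives the apps.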

While all this is super-exciting and novel, it raises very thorny ethical concerns. So far, in this evolving dance between humans and AI, humans have held on to agency, the power to act. That is why Microsoft calls its software the Copilot: it is not an autopilot doing things on its own, nor the sole pilot, since the human must prevail. With agents, we devolve agency to AI; the software potentially becomes an autopilot and could perhaps act as the pilot itself. Thus far, we have managed to keep this particular Pandora’s box shut; with agents, we just might crack it open.
