A decisive AI breakthrough is about to transform the world
Summary
The technology driving ChatGPT is capable of so much more. What’s coming next will make talking bots look like mere distractions. The AI revolution is about to spread way beyond chatbots.
From new plastic-eating bacteria and new cancer cures to autonomous helper robots and self-driving cars, the generative-AI technology that gained prominence as the engine of ChatGPT is poised to change our lives in ways that make talking bots look like mere distractions.
While we tend to equate the current artificial-intelligence boom with computers that can write, talk, code and make pictures, most of those forms of expression are built on an underlying technology called a “transformer” that has far broader applications.
First announced in a 2017 paper from Google researchers, transformers are a kind of AI algorithm that lets computers understand the underlying structure of any heap of data—be it words, driving data, or the amino acids in a protein—so that it can generate its own similar output.
The transformer paved the way for OpenAI to launch ChatGPT two years ago, and a range of companies are now working on how to use the innovation in new ways, from Waymo and its robot taxis to a biology startup called EvolutionaryScale, whose AI systems are designing new protein molecules.
The applications of this breakthrough are so broad that in the seven years since the Google research was published, it has been cited in other scientific papers more than 140,000 times.
It’s hardly an exaggeration to say that this one collection of algorithms is the reason that Nvidia is now the most valuable company on earth; that data centers are popping up all over the U.S. and the world, driving up electricity consumption and rates; and that chief executives of AI companies are often—and perhaps mistakenly—asserting that human-level AI is just around the corner.
From text translation to universal learner
Humans have always acted on the conviction that the universe is full of underlying order—even if they debated whether the source of that order was divine. Modern AI is in a sense yet another validation of the idea that every scientist since Copernicus really was onto something.
Modern AI has long been good at recognizing patterns in information. But previous approaches put serious limits on what more it could do. With language, for example, most AI systems could only process words one at a time, evaluating them only in the order they appeared, which limited their ability to understand what those words meant.
The Google researchers who wrote that seminal 2017 paper were focused on the process of translating languages. They realized that an AI system that could digest all the words in a piece of writing, and put more weight on the meanings of some words than others—in other words, read in context—could make much better translations.
For example, in the sentence “I arrived at the bank after crossing the river,” a transformer-based AI that knows the sentence ends in “river” instead of “road” can translate “bank” as a stretch of land, not a place to put your money.
In other words, transformers work by figuring out how every single piece of information the system takes in relates to every other piece of information it’s been fed, says Tim Dettmers, an AI research scientist at the nonprofit Allen Institute for Artificial Intelligence.
That level of contextual understanding enables transformer-based AI systems to not only recognize patterns, but predict what could plausibly come next—and thus generate their own new information. And that ability can extend to data other than words.
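The weighting Dettmers describes is the “attention” operation at a transformer’s core. Here is a toy sketch in plain Python—not the production architecture, which uses learned weight matrices, many parallel attention heads, and hardware-optimized tensor math—with made-up two-dimensional vectors standing in for word meanings:

```python
import math

def softmax(scores):
    # turn raw scores into positive weights that sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # score the query against every key (dot product), convert the
    # scores to weights, then blend the value vectors by those weights
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# One word's vector (the query) is compared against every other word's
# vector (the keys); the output is a context-weighted mix of all the
# value vectors, which is how "bank" ends up colored by "river".
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
print(attention(query, keys, values))
```

Because every position attends to every other position at once, the model sees the whole sentence in parallel rather than one word at a time.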
“In a sense, the models are discovering the latent structure of the data,” says Alexander Rives, chief scientist of EvolutionaryScale, which he co-founded last year after working on AI for Meta Platforms, the parent company of Facebook.
EvolutionaryScale is training its AI on the published sequences of every protein the company’s researchers can get their hands on, along with everything known about them. Using that data, and with no assistance from human engineers, its AI is able to determine the relationship between a given sequence of molecular building blocks and how the resulting protein functions in the world.
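One reason the same machinery transfers from text to biology is that, to the model, a protein is just another token sequence. A minimal illustration of that idea—the names here are hypothetical, and real protein language models also add special tokens and handle rare residues:

```python
# The same "sequence of tokens" interface works for words and for
# amino acids; only the vocabulary changes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
aa_to_id = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def tokenize_protein(sequence):
    # map each residue letter to an integer token ID, exactly as
    # words are mapped to IDs for a text language model
    return [aa_to_id[aa] for aa in sequence]

print(tokenize_protein("MKT"))  # → [10, 8, 16]
```

Once a protein is encoded this way, the transformer that learned which words plausibly follow other words can, with different training data, learn which residues plausibly accompany other residues.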
Earlier research related to this topic, which was more focused on the structure of proteins rather than their function, is the reason that Google AI chief Demis Hassabis shared the 2024 Nobel Prize in chemistry. The system he and his team developed, called AlphaFold, is also based on transformers.
Already, EvolutionaryScale has created one proof-of-concept molecule. It’s a protein that functions like the one that makes jellyfish light up, but its AI-invented sequence is radically different from anything nature has yet invented.
The company’s eventual goal is to enable all sorts of companies—from pharmaceutical makers producing new drugs to synthetic chemistry companies working on new enzymes—to come up with substances that would be impossible to create without its technology. That could include bacteria equipped with novel enzymes that could digest plastic, or new drugs tailored to individuals’ particular cancers.
From chatbots to actual Transformers
Karol Hausman’s goal is to create a universal AI that can power any robot. “We want to build a model that can control any robot to do any task, including all the robots that exist today, and robots that haven’t even been developed yet,” he says.
Hausman’s San Francisco-based startup, Physical Intelligence, is less than a year old, and Hausman himself used to work at Google’s AI wing, DeepMind. His company starts with a variant of the same large language model you use when you access ChatGPT. The newest of these language models can also take in and work with images, a capability that is key to how Hausman’s robots operate.
In a recent demonstration, a Physical Intelligence-powered pair of robot arms does what is, believe it or not, one of the hardest tasks in all of robotics: folding laundry. Clothes can take on any shape, and require surprising flexibility and dexterity to handle, so roboticists can’t script the sequence of actions that will tell a robot exactly how to move its limbs to retrieve and fold laundry.
Physical Intelligence’s system can remove clothes from a dryer and neatly fold them using a system that learned how to do this task on its own, with no input from humans other than a mountain of data for it to digest. That demonstration, and others like it, was impressive enough that earlier this month the company raised $400 million from investors including Jeff Bezos and OpenAI.
In October, researchers at the Massachusetts Institute of Technology announced they’re pursuing a similar transformer-based strategy to create robot brains that can take in vast amounts of data from a variety of sources, and then operate flexibly in a wide range of environments. In one instance, they recorded several videos of a regular robotic arm putting dog food into a bowl, then used the videos to train a separate AI-powered robot to do the same.
Robot, you can drive my car
As in robotics, researchers and companies working on self-driving cars are figuring out how to use transformer-based “visual language models" that can take in and connect not just language but images too.
California-based Nuro and London-based Wayve, as well as Waymo, owned by Google’s parent company, are among the companies working with these models.
This is a departure from pre-transformer approaches to self-driving, which used a mix of human-written instructions and older types of AI to process sensor data to identify objects on the road. The new transformer-based models are essentially a shortcut to giving self-driving systems the kind of general knowledge about the world that was previously very difficult to grant them.
Waymo researchers in one recent paper, for example, showed how using Google’s own commercial AI, called Gemini, could give their self-driving system the ability to identify and yield to objects it hadn’t been trained on, such as a dog crossing the street.
A helper rather than a replacement
Powerful as they can be, these systems still have limits and unpredictability that mean they won’t be able to completely automate people’s jobs, says Dettmers.
The AI at the heart of EvolutionaryScale, for example, can suggest new molecules for humans to try in the lab, but humans still have to synthesize and test them. And transformer-based models are far from reliable enough to take over driving completely.
Another limitation is that they are only as smart as the data they are trained on. Large language models like those from OpenAI are starting to run into the constraints of the available volume of useful written words in the world—and that’s with an internet full of text. For robots or self-driving cars to learn this way, they need enormous amounts of data about what happens when they try to operate in the real world—one reason there’s currently a race between companies to acquire such data.
These limitations are apparent in Physical Intelligence’s robots. Their system has taught itself to fold laundry, but before it can come to your house and take over that task for you, it would have to relearn that process in a way that’s specific to your own home. That would require a huge amount of engineers’ time, as well as money to train the model.
“I want to make sure I set expectations,” says Hausman, the CEO. “As proud as we are of our accomplishment, we are still at the beginning.”
Write to Christopher Mims at christopher.mims@wsj.com