Google’s SIMA 2 agent uses Gemini to reason and act in virtual worlds
Google DeepMind on Thursday shared a research preview of SIMA 2, the next generation of its generalist AI agent. The new agent integrates the language and reasoning powers of Gemini, Google’s large language model, to move beyond simply following instructions to understanding and interacting with its environment.
As with many of DeepMind’s projects, including AlphaFold, SIMA was built by training on large volumes of data: the first version learned from hundreds of hours of video game footage to play multiple 3D games like a human, even ones it hadn’t been trained on. SIMA 1, unveiled in March 2024, could follow basic instructions across a wide range of virtual environments, but it completed complex tasks only 31% of the time, compared with 71% for humans.
“SIMA 2 is a step change and improvement in capabilities over SIMA 1,” Joe Marino, senior research scientist at DeepMind, said in a press briefing. “It’s a more general agent. It can complete complex tasks in previously unseen environments. And it’s a self-improving agent. So it can actually self-improve based on its own experience, which is a step towards more general-purpose robots and AGI systems more generally.”

SIMA 2 is powered by the Gemini 2.5 Flash-Lite model. AGI refers to artificial general intelligence, which DeepMind defines as a system capable of a wide range of intellectual tasks, with the ability to learn new skills and generalize knowledge across different areas.
Working with so-called “embodied agents” is crucial to generalized intelligence, DeepMind’s researchers say. Marino explained that an embodied agent interacts with a physical or virtual world via a body – observing inputs and taking actions much like a robot or human would – whereas a non-embodied agent might interact with your calendar, take notes, or execute code.
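As a rough illustration of that distinction (the interfaces below are hypothetical stand-ins, not DeepMind’s published API), the core loop of an embodied agent looks something like this:

```python
# A minimal sketch of the observe/act loop Marino describes. The environment
# and agent objects here are hypothetical, not DeepMind's actual SIMA
# interface: the agent sees the world only through observations (e.g. screen
# pixels) and affects it only through game-style controls.

def run_episode(env, agent, instruction, max_steps=1000):
    obs = env.reset()                         # e.g. an image of the game screen
    for _ in range(max_steps):
        action = agent.act(obs, instruction)  # keyboard/mouse-style action
        obs, done = env.step(action)          # the world responds
        if done:
            break
```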
Jane Wang, a research scientist at DeepMind with a background in neuroscience, told TechCrunch that SIMA 2 goes far beyond gameplay.
“We’re asking it to actually understand what’s happening, understand what the user is asking it to do, and then be able to respond in a common-sense way that’s actually quite difficult,” Wang said.
By integrating Gemini, SIMA 2 doubled its predecessor’s performance, uniting the model’s advanced language and reasoning abilities with the embodied skills developed through training.
Marino demoed SIMA 2 in No Man’s Sky, where the agent described its surroundings – a rocky planet surface – and determined its next steps by recognizing and interacting with a distress beacon. SIMA 2 also uses Gemini to reason internally. In another game, when asked to walk to the house that’s the color of a ripe tomato, the agent showed its thinking – ripe tomatoes are red, therefore I should go to the red house – then found and approached it.
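Conceptually, that is a reason-then-act step: the agent verbalizes an inference first, then conditions its action on it. A hypothetical sketch of the pattern (not SIMA 2’s real interface):

```python
# Illustrative reason-then-act step: the agent first verbalizes its reasoning
# via a language model, then conditions its next action on that thought.
# The `llm` and `agent` interfaces here are hypothetical, not DeepMind's API.

def reason_then_act(llm, agent, obs, instruction):
    thought = llm.generate(
        f"Instruction: {instruction}\n"
        "Reason step by step about what this means in the current scene."
    )  # e.g. "Ripe tomatoes are red, so I should head to the red house."
    return agent.act(obs, instruction, context=thought)
```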
Being Gemini-powered also means SIMA 2 can follow instructions given in emoji: “You instruct it 🪓🌲, and it’ll go chop down a tree,” Marino said.
Marino also demonstrated how SIMA 2 can navigate newly generated photorealistic worlds produced by Genie, DeepMind’s world model, correctly identifying and interacting with objects like benches, trees, and butterflies.

Gemini also enables self-improvement without much human data, Marino added. Where SIMA 1 was trained entirely on human gameplay, SIMA 2 uses that data only as a baseline to bootstrap a strong initial model. When the team drops the agent into a new environment, one Gemini model generates new tasks while a separate reward model scores the agent’s attempts. Using these self-generated experiences as training data, the agent learns from its own mistakes and gradually improves, essentially teaching itself new behaviors through trial and error, guided by AI-based feedback rather than human supervision.
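A minimal sketch of that loop, assuming both task generation and reward scoring are handled by separate model calls (all names below are hypothetical; DeepMind has not published this code):

```python
# Hypothetical sketch of the self-improvement loop as described: one Gemini
# model proposes tasks in a new environment, a separate reward model scores
# the agent's attempts, and the self-generated experience becomes training
# data. Names and interfaces are illustrative, not DeepMind's actual code.

def self_improve(agent, env, task_model, reward_model, rounds=100):
    experience = []
    for _ in range(rounds):
        task = task_model.propose_task(env.describe())  # Gemini invents a task
        trajectory = agent.attempt(env, task)           # the agent tries it
        score = reward_model.score(task, trajectory)    # another model grades it
        experience.append((task, trajectory, score))
    agent.train(experience)  # learn from its own scored attempts, no human labels
    return agent
```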
DeepMind sees SIMA 2 as a step toward unlocking more general-purpose robots.
“If we think of what a system needs to do to perform tasks in the real world, like a robot, I think there are two components of it,” Frederic Besse, senior staff research engineer at DeepMind, said during a press briefing. “First, there is a high-level understanding of the real world and what needs to be done, as well as some reasoning.”
If you ask a humanoid robot in your house to go check how many cans of beans you have in the cupboard, the system needs to understand all of the different concepts – what beans are, what a cupboard is – and navigate to that location. Besse says SIMA 2 touches more on that high-level behavior than it does on lower-level actions, which he refers to as controlling things like physical joints and wheels.
The team declined to share a specific timeline for implementing SIMA 2 in physical robotics systems. Besse told TechCrunch that DeepMind’s recently unveiled robotics foundation models – which can also reason about the physical world and create multi-step plans to complete a mission – were trained differently and separately from SIMA.
While there’s also no timeline for releasing more than a preview of SIMA 2, Wang told TechCrunch the goal is to show the world what DeepMind has been working on and see what kinds of collaborations and potential uses are possible.