Google DeepMind has unveiled Genie 3, a foundation world model designed to train general-purpose AI agents, which the lab describes as a significant step towards artificial general intelligence (AGI). From a simple text prompt, Genie 3 can generate interactive 3D environments at 720p resolution and 24 frames per second.
The model builds on Genie 2 and Veo 3, and its grasp of physics keeps its simulations consistent. It supports promptable world events, which let a text prompt change the generated world, and it remembers what it has previously generated, which helps the simulated physics behave the way a person would expect.
These capabilities make Genie 3 suitable for training agents on general-purpose tasks, a key element in achieving AGI. The model has limitations: the range of actions an agent can take is restricted, it has difficulty modeling complex interactions between multiple agents, and it supports only a few minutes of continuous interaction.
Despite these limitations, Genie 3 represents a step forward in enabling agents to plan, explore, and learn through trial and error, potentially ushering in a new era for embodied agents and for AI systems that discover strategies beyond human understanding.