SIMA, Genie and... Midjourney: A Quick Recap of the AI Landscape

Fion Cheng
From Google DeepMind’s extensive research on AI models in virtual environments, such as SIMA and Genie, to the character-consistency improvements in Midjourney, here’s a roundup of what’s caught our eye this month.

1. SIMA — Google DeepMind’s New Agent That Interacts with Virtual 3D Environments

Google DeepMind introduced SIMA, an AI agent designed to understand any virtual 3D environment and carry out instructions the way a human player would.

→ How it works
SIMA operates on text instructions and on-screen image observations, translating commands into mouse and keyboard actions without needing access to a game’s source code. Teaming up with eight game developers, Google deployed SIMA in popular games like No Man’s Sky and Goat Simulator to teach it skills across different interactive worlds.

→ What lies ahead? 
With around 600 skills covering navigation and object manipulation, SIMA impressively showcases its adaptability across a range of virtual scenarios. We’re eager to see more progress in the development of similar action-taking models, which shift from passively analysing data to acting on our behalf.

2. Genie — Image-to-Game model

Another research preview published by Google DeepMind this month introduces a new foundation world model that generates interactive, playable environments from text, photographs, and even sketches.

→ How it works

Genie takes a single image and, given a user’s action input, predicts the next frame of the environment. Users act in the generated environments on a frame-by-frame basis.

→ What lies ahead? 
While frame-by-frame image generation is still relatively slow, Genie’s framework lays the groundwork for future AI agents, and could serve game prototyping and the creation of game assets for developers.

3. Midjourney — New Consistent Character Feature

Midjourney recently rolled out a new feature: the consistent character tool, which allows creators to apply the same facial features and characteristics across multiple image generations with a simple tag.

Image Credit: CharacterOrdinary551 via Reddit

→ How it works
The new “--cref” tag (short for character reference) allows users to link to a reference image whose character should be matched, and multiple links can blend information from different images. A companion character-weight setting controls how closely the generation matches the reference on a scale of 0 to 100. With lower settings, facial features remain consistent while details such as hair and clothing vary.
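As a rough illustration of the syntax (the subject, image URL, and weight value below are placeholders, and `--cw` is Midjourney’s character-weight parameter), a prompt using the tag might look like:

```
/imagine prompt: a knight exploring a misty forest --cref https://example.com/my-character.png --cw 50
```

Lowering the `--cw` value tells the model to keep mainly the face from the reference, leaving the rest of the character’s look open to variation.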

→ What lies ahead? 
Whether it’s creating storyboards for animation, planning moodboards for events or product photoshoots, or simply visualising ideas, we’re excited to see how this new feature will streamline the sourcing of creative materials and speed up project development.

Each month, we research and share the latest developments in innovative tech. Stay tuned for more monthly recaps by signing up for our newsletter.
