Nvidia GameGAN

Nvidia’s GameGAN generates games like Pac-Man by watching videos

Two years ago, Nvidia researchers detailed an AI that could generate visuals and combine them with a game engine to create an interactive, navigable environment. As an extension of that work, scientists from the company, the Vector Institute, MIT and the University of Toronto published a paper this week describing GameGAN, a system capable of synthesizing a functional version of a game without an underlying engine.

Although game generation might not seem to be the most practical application of AI, algorithms like GameGAN may one day be used to produce simulators for training robotic systems. Before being deployed in the real world, the AI controlling a robot typically undergoes extensive testing in simulated environments, which are built from procedural models that synthesize scenes and behavior trees that specify the behaviors of simulated agents. Writing these models and trees takes both time and highly skilled domain experts, which translates into higher costs for companies looking to transfer models to real-world robots.

It should be noted that GameGAN is not the first system designed to tackle game generation. A recent paper co-written by researchers at Google Brain describes an algorithm that uses video prediction techniques to train game-playing AI inside learned models of Atari games. A Georgia Tech study proposes an algorithm that takes in game footage and probabilistically maps the relationships between in-game objects and how they change. A Facebook system can extract controllable characters from real-world videos of tennis players, fencing instructors and others. And systems like those proposed by researchers at the University of California, Santa Barbara and the Politecnico di Milano in Italy draw on knowledge of existing levels to create new levels in games like Doom and Super Mario Bros.

Nvidia GameGAN

Above: a Pac-Man-style game synthesized by Nvidia’s GameGAN.

Image credit: Nvidia

But GameGAN uniquely frames game creation as an image generation problem. Given image sequences from a game and the corresponding actions of the agents (i.e., players) in it, the system visually imitates the game using a trained AI model. Concretely, GameGAN ingests gameplay frames and keyboard actions during training and aims to predict the next frame by conditioning on the action (for example, a button pressed by a player). It learns from frame-action pairs directly, without access to the underlying logic or engine, and it takes advantage of a memory module that encourages the system to build a map of the game environment. A decoder learns to disentangle the static and dynamic components within images, which makes GameGAN’s behavior more interpretable and allows existing games to be modified on the fly by swapping out various assets.
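The training setup described above can be sketched as a data-preparation step: each frame is paired with the action taken on it, and the target is the frame that followed. This is only an illustration, not Nvidia's code. The `make_training_pairs` helper and the string stand-ins for frames are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = str   # stand-in for an image; a real system would use pixel arrays
Action = str  # e.g. a key press such as "left" or "right"


def make_training_pairs(
    frames: Sequence[Frame], actions: Sequence[Action]
) -> List[Tuple[Tuple[Frame, Action], Frame]]:
    """Pair (current frame, action) with the next frame as the target.

    An episode with N frames has N - 1 transitions, so it needs N - 1 actions.
    """
    if len(actions) != len(frames) - 1:
        raise ValueError("expected exactly one action per frame transition")
    return [
        ((frames[t], actions[t]), frames[t + 1])
        for t in range(len(actions))
    ]


# Hypothetical four-frame episode with three recorded key presses.
episode_frames = ["f0", "f1", "f2", "f3"]
episode_actions = ["right", "right", "up"]
pairs = make_training_pairs(episode_frames, episode_actions)
```

A model trained on such pairs only ever sees pixels and key presses, which is why no game logic or engine is needed.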

Achieving this required the researchers to overcome formidable design challenges, such as emulating physics engines and preserving long-term consistency. (Players generally expect a scene they move away from to look the same when they return.) They also had to make sure that GameGAN could model both deterministic (predictable) and stochastic (random) behaviors in the games it was trying to recreate.

A three-part model

The team’s solution was a three-part model consisting of a dynamics engine, the aforementioned memory module and a rendering engine. At a high level, GameGAN reacts to the actions of an AI agent playing the generated game by producing frames of the environment in real time, even for layouts it has never seen before.

The dynamics engine is responsible for learning which actions are not “allowed” in the context of a game (such as walking through a wall) and for modeling how objects respond to actions. The memory module establishes long-term consistency so that simulated scenes (such as buildings and streets) do not change unexpectedly over time, in part by “remembering” each generated scene. (The memory module also retrieves static elements such as backgrounds as needed.) The rendering engine, the last step in the pipeline, renders the simulated images from object and attribute maps, automatically accounting for depth by occluding objects.
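To make the division of labor concrete, here is a toy sketch of the three roles on a small grid. In GameGAN all three components are learned neural networks. The hand-written classes below (`DynamicsEngine`, `Memory`, `render`) are hypothetical stand-ins that only mimic each component's job.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]  # (row, col)
MOVES: Dict[str, Pos] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


class DynamicsEngine:
    """Updates the agent's position, ignoring moves that are not allowed."""

    def __init__(self, walls: Set[Pos], size: int) -> None:
        self.walls = walls
        self.size = size

    def step(self, pos: Pos, action: str) -> Pos:
        dr, dc = MOVES[action]
        nxt = (pos[0] + dr, pos[1] + dc)
        in_bounds = 0 <= nxt[0] < self.size and 0 <= nxt[1] < self.size
        # Walking into a wall or off the grid leaves the agent in place.
        return nxt if in_bounds and nxt not in self.walls else pos


class Memory:
    """Remembers static content per cell so revisited areas look the same."""

    def __init__(self) -> None:
        self.cells: Dict[Pos, str] = {}

    def recall(self, pos: Pos, default: str) -> str:
        # The first visit stores the content; later visits return it unchanged.
        return self.cells.setdefault(pos, default)


def render(size: int, walls: Set[Pos], memory: Memory, agent: Pos) -> List[str]:
    """Draw static cells first, then the agent on top so it occludes them."""
    rows = []
    for r in range(size):
        row = ""
        for c in range(size):
            static = memory.recall((r, c), "#" if (r, c) in walls else ".")
            row += "P" if (r, c) == agent else static
        rows.append(row)
    return rows


walls = {(0, 1)}
engine, memory = DynamicsEngine(walls, size=3), Memory()
agent = (0, 0)
agent = engine.step(agent, "right")  # blocked by the wall at (0, 1)
agent = engine.step(agent, "down")   # allowed
frame = render(3, walls, memory, agent)
```

The key difference from this sketch is that GameGAN is never told where the walls are. It has to infer such rules from frames and actions alone.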

GameGAN is trained with an adversarial approach, in which the system attempts to “fool” a set of discriminators (a single-image discriminator, an action-conditioned discriminator and a temporal discriminator) so that it learns to produce realistic and consistent games. GameGAN synthesizes images from random noise sampled from a distribution and feeds them, along with real examples from a training data set, to the discriminators, which try to distinguish between the two. GameGAN and the discriminators each improve until the discriminators can no longer tell real examples from synthesized ones with better than the 50% accuracy expected of chance.
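The tug-of-war described above can be illustrated with the textbook GAN objective. This is a sketch under the assumption of a plain binary cross-entropy loss, which is not necessarily the loss GameGAN uses. The function names and the example scores are made up for illustration.

```python
import math
from typing import Sequence


def discriminator_loss(
    real_scores: Sequence[float], fake_scores: Sequence[float]
) -> float:
    """Binary cross-entropy: real samples should score near 1, fakes near 0."""
    real_term = -sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = -sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(fake_scores: Sequence[float]) -> float:
    """Non-saturating generator loss: low when fakes are scored as real."""
    return -sum(math.log(s) for s in fake_scores) / len(fake_scores)


# Early in training: the discriminator separates real from fake with ease.
confident_loss = discriminator_loss([0.9, 0.8], [0.1, 0.2])

# At equilibrium: every sample scores 0.5, i.e. the discriminator is guessing.
fooled_loss = discriminator_loss([0.5, 0.5], [0.5, 0.5])  # equals 2 * ln(2)
```

When every score sits at 0.5 the discriminator loss settles at 2 ln 2 (about 1.386), the point at which it can do no better than a coin flip.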

Training is unsupervised, meaning that GameGAN infers the patterns in the data sets without reference to known, labeled or annotated outcomes. Interestingly, the discriminators’ work informs GameGAN’s: whenever a discriminator correctly identifies a synthesized sample, it tells GameGAN how to adjust its output so that it is more realistic in the future.


In experiments, the Nvidia team trained GameGAN over four days on 50,000 episodes (several million frames in total) of Pac-Man and of VizDoom, the Doom-based AI research platform. (Bandai Namco’s research division provided a copy of Pac-Man for training.) They used a modified version of Pac-Man with an environment half the normal size (a 7×7 grid as opposed to a 14×14 grid), as well as a variant called Pac-Man-Maze, which lacked ghosts and had walls generated randomly by an algorithm.

With the exception of the occasional failure, GameGAN did indeed deliver “temporally consistent” Pac-Man- and Doom-like experiences, with ghosts and pellets (in the case of the Pac-Man imitation) and fireballs and rooms (VizDoom).

Nvidia GameGAN

Above: a Doom-style game generated by GameGAN.

Image credit: Nvidia

Perhaps more exciting, thanks to its disentanglement step, the system allowed enemies in the simulated games to be moved around the map and backgrounds or foregrounds to be swapped with random images.

To measure the quality of the generated games more quantitatively, the researchers deployed reinforcement learning agents in the two games and tasked them with achieving high scores. For example, the Pac-Man agent had to “eat” pellets and capture a flag, and it was penalized each time it was consumed by a ghost or used up the maximum number of steps. Across 100 test environments, the agents solved the VizDoom-style game (making them the first trained within a GAN framework to do so, the team claims) and beat several baselines in Pac-Man.
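The scoring scheme described for the Pac-Man agent can be sketched as a per-step reward function. The article does not give the reward values, so the constants and the `step_reward` function below are hypothetical placeholders chosen only to show the shape of such a function.

```python
# Hypothetical reward values; the actual magnitudes are not given in the article.
PELLET_REWARD = 1.0
FLAG_REWARD = 5.0
GHOST_PENALTY = -5.0
TIMEOUT_PENALTY = -5.0


def step_reward(
    ate_pellet: bool = False,
    captured_flag: bool = False,
    caught_by_ghost: bool = False,
    out_of_steps: bool = False,
) -> float:
    """Sum the rewards and penalties triggered during one environment step."""
    total = 0.0
    if ate_pellet:
        total += PELLET_REWARD
    if captured_flag:
        total += FLAG_REWARD
    if caught_by_ghost:
        total += GHOST_PENALTY
    if out_of_steps:
        total += TIMEOUT_PENALTY
    return total


# An agent that eats a pellet and is then caught by a ghost nets a loss.
episode_return = step_reward(ate_pellet=True) + step_reward(caught_by_ghost=True)
```

Because the agent is trained entirely inside the generated game, its score on the real task serves as an indirect measure of how faithfully GameGAN reproduced the game's rules.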

Nvidia GameGAN

Above: exchanging backgrounds and sprites using GameGAN.

Image credit: Nvidia

The researchers believe GameGAN has obvious applicability to game design, where it could be used alongside tools such as Promethean AI’s art creation platform to quickly create new levels and environments. But they also envision similar future systems that can learn to imitate the rules of driving, for example, or the laws of physics simply by watching videos and observing agents act. In the shorter term, as mentioned earlier, GameGAN could be used to build simulators for training warehouse robots that grasp and move objects or delivery robots that must traverse sidewalks to deliver food and medicine.

Nvidia says it will make the games generated in its experiments available on its AI Playground platform later this year.
