In a technical paper quietly released earlier this year, IBM detailed what it calls the IBM Neural Computer, a custom-built, reconfigurable parallel processing system designed for researching and developing emerging AI and computational neuroscience algorithms. This week, the company published a preprint describing the first application demonstrated on the Neural Computer: a deep “neuroevolution” system that combines a hardware implementation of an Atari 2600, image preprocessing, and AI algorithms in an optimized pipeline. The coauthors report results competitive with state-of-the-art techniques, but perhaps more importantly, they claim the system achieves a record training throughput of 1.2 million frames per second.
The Neural Computer represents something of a shot across the bow in the AI compute arms race. According to an analysis OpenAI published recently, from 2012 to 2018 the amount of compute used in the largest AI training runs grew more than 300,000-fold, with a 3.5-month doubling time, far outpacing the pace of Moore’s law. Against that backdrop, upcoming supercomputers like Intel’s Aurora at the Department of Energy’s Argonne National Laboratory and the AMD-powered Frontier at Oak Ridge National Laboratory promise more than an exaflop (a quintillion floating-point operations per second) of compute performance.
Video games are a well-established platform for AI and machine learning research. They’ve gained traction not only because of their availability and the low cost of running them at scale, but because in certain domains, like reinforcement learning, where an AI learns optimal behaviors by interacting with its environment in pursuit of rewards, game scores serve as direct rewards. AI algorithms developed in games have proven adaptable to more practical uses, such as protein folding prediction. And if the results from IBM’s Neural Computer prove reproducible, the system could be used to accelerate the development of such AI algorithms.
The Neural Computer
IBM’s Neural Computer comprises 432 nodes (27 nodes on each of 16 modular cards) based on field-programmable gate arrays (FPGAs) from Xilinx, a longtime IBM strategic collaborator. (FPGAs are integrated circuits designed to be configured after manufacturing.) Each node includes a Xilinx Zynq system-on-chip (a dual-core ARM A9 processor paired with an FPGA on the same chip) along with 1GB of dedicated RAM. The nodes are arranged in a 3D mesh topology, interconnected vertically with electrical connections called through-silicon vias that pass completely through silicon wafers or dies.
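The mesh layout can be pictured with a short sketch. The paper as summarized here doesn’t state the mesh’s exact dimensions, so the 6 × 6 × 12 arrangement below (which also yields 432 nodes) is purely illustrative:

```python
def mesh_neighbors(node, dims=(6, 6, 12)):
    """Return the nearest neighbors of a node in a 3D mesh.

    `dims` is an assumed, illustrative arrangement of the 432 nodes
    (6 * 6 * 12 = 432); the real machine's dimensions aren't given here.
    """
    x, y, z = node
    candidates = [(x - 1, y, z), (x + 1, y, z),
                  (x, y - 1, z), (x, y + 1, z),
                  (x, y, z - 1), (x, y, z + 1)]
    # Keep only coordinates that fall inside the mesh (no wraparound).
    return [(a, b, c) for a, b, c in candidates
            if 0 <= a < dims[0] and 0 <= b < dims[1] and 0 <= c < dims[2]]
```

A corner node has three neighbors, while a node in the interior of the mesh has the full six.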
On the networking side, the FPGAs provide access to the physical communication links among the cards, enabling multiple distinct communication channels. A single card can theoretically support transfer speeds of up to 432GB per second, and the Neural Computer’s network interfaces can be tuned and progressively optimized to best suit a given application.
“The availability of FPGA resources on every node allows for application-specific processor offload, a capability that is not available on any parallel machine of this scale that we are aware of,” wrote the coauthors of a paper detailing the Neural Computer’s architecture. “[M]ost performance-critical steps [are] offloaded and optimized on the FPGA, with the ARM [processor] … providing auxiliary support.”
Playing Atari games with AI
The researchers used 26 of the 27 nodes on each card of the Neural Computer, carrying out experiments on 416 nodes in total. Two instances of their Atari game application ran on each of the 416 FPGAs, for a total of 832 instances running in parallel. Each instance extracted frames from a given Atari 2600 game, preprocessed them, ran them through machine learning models, and performed an action in the game.
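The scale of the experiment follows directly from the figures above:

```python
# Back-of-the-envelope arithmetic using the figures from the article.
nodes_per_card = 26                        # 26 of each card's 27 nodes used
cards = 16
nodes_used = nodes_per_card * cards        # 416 nodes in total
instances = nodes_used * 2                 # 2 app instances per FPGA
print(nodes_used, instances)               # -> 416 832
```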
For best performance, the team avoided emulating the Atari 2600, opting instead to use the FPGAs to implement the console’s functionality at higher frequencies. They leveraged the open source MiSTer project, which aims to recreate consoles and arcade machines using modern hardware, and increased the Atari 2600 processor’s clock to 150MHz from 3.58MHz. This produced approximately 2,514 frames per second, compared with the original 60 frames per second.
During the image preprocessing step, IBM’s application converted the color frames to grayscale, eliminated flicker, resized them to a lower resolution, and stacked them in groups of four. It then passed them to an AI model that reasoned about the game environment, and to a submodule that selected the action for upcoming frames by identifying the maximum reward predicted by the AI model.
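This matches the standard Atari preprocessing recipe from the deep reinforcement learning literature. The sketch below (using NumPy, with assumed frame sizes and a naive strided downsample standing in for whatever resizing IBM actually used) shows roughly what such a step does:

```python
import numpy as np

def preprocess(frames_rgb):
    """Crude stand-in for the Atari preprocessing described above.

    frames_rgb: list of 8 RGB frames of shape (210, 160, 3), uint8.
    Returns a stack of 4 processed frames. The resolutions and the
    downsampling method are assumptions, not taken from IBM's paper.
    """
    processed = []
    for a, b in zip(frames_rgb[::2], frames_rgb[1::2]):
        # Pixel-wise max over consecutive frames suppresses sprite flicker.
        merged = np.maximum(a, b)
        gray = merged.mean(axis=2)          # RGB -> grayscale
        small = gray[::2, ::2]              # naive 2x downsample (105 x 80)
        processed.append(small.astype(np.uint8))
    return np.stack(processed)              # shape (4, 105, 80)
```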
Yet another algorithm, a genetic algorithm, ran on an external computer connected to the Neural Computer over PCIe. It evaluated the performance of each instance and identified the group’s top performers, which it selected as the “parents” of the next generation of instances.
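The article doesn’t spell out the genetic algorithm’s details, but a minimal truncation-selection loop of the kind used in deep neuroevolution looks like this (all hyperparameters and the mutation scheme are illustrative, not from IBM’s paper):

```python
import random

def next_generation(population, scores, n_parents=5, mutate_std=0.02):
    """Toy sketch of an outer evolutionary loop: truncation selection
    plus Gaussian mutation. Each individual is a flat parameter list
    (in practice, the weights of a game-playing network)."""
    ranked = sorted(zip(scores, population), key=lambda p: p[0], reverse=True)
    parents = [ind for _, ind in ranked[:n_parents]]
    children = [parents[0]]  # elitism: carry the best individual unchanged
    while len(children) < len(population):
        parent = random.choice(parents)
        child = [w + random.gauss(0, mutate_std) for w in parent]
        children.append(child)
    return children
```

On the Neural Computer, the per-instance scores would come back from the 832 parallel game rollouts, with only this selection step running on the external host.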
Across five experiments, the IBM researchers ran 59 Atari 2600 games on the Neural Computer. The results imply the approach was less data-efficient than other reinforcement learning techniques: it required 6 billion game frames in total and failed at challenging exploration games like Montezuma’s Revenge and Pitfall. But it managed to outperform a popular baseline, a deep Q-network (an architecture pioneered by DeepMind), in 30 out of 59 games after 6 minutes of training (200 million training frames), versus the deep Q-network’s 10 days of training. With 6 billion training frames, it surpassed the deep Q-network in 36 games while taking two orders of magnitude less training time (2 hours and 30 minutes).