Nvidia unveils monstrous A100 AI chip with 54 billion transistors and 5 petaflops of performance

Nvidia unveiled its A100 artificial intelligence chip today, and CEO Jensen Huang called it the ultimate instrument for advancing AI. Huang said it can make heavy-duty computing tasks, which are vital in the fight against COVID-19, much more cost-effective and powerful than today’s more expensive systems.

The chip has a monstrous 54 billion transistors (the on-off switches that are the building blocks of all things electronic), and it can deliver 5 petaflops of performance, about 20 times more than the previous-generation Volta chip. Huang made the announcement during his keynote at the Nvidia GTC event, which was held digitally this year.

The launch was originally scheduled for March 24 but was delayed by the pandemic. Nvidia rescheduled the release for today because the chips and the DGX A100 systems that use them are now available and shipping.

The Nvidia A100 chip uses the same Ampere architecture (named after the French mathematician and physicist André-Marie Ampère) that could be used in consumer applications such as Nvidia’s GeForce graphics chips. In contrast to Advanced Micro Devices (AMD), Nvidia is focused on creating a single microarchitecture for its GPUs for both commercial AI and consumer graphics use. But Huang said that mixing and matching the different elements of the chip will determine whether it is used for AI or graphics.

The DGX A100 is the third generation of Nvidia’s DGX AI platform, and Huang said it essentially puts the capabilities of an entire data center into a single rack. That is hyperbole, but Paresh Kharya, director of product management for data center and cloud platforms, said in a press briefing that the 7-nanometer chip, codenamed Ampere, can replace many of the AI systems in use today.

“You get all the overhead of extra memory, CPU, and power from 56 servers … all in one,” said Huang. “The economic value proposition is really out of the ordinary, and that’s the thing that is really exciting.”

Above: Nvidia’s Jensen Huang holds the world’s largest graphics card.

Image credit: Nvidia

For example, to handle AI training tasks today, a customer needs 600 central processing unit (CPU) systems to process millions of queries for data center applications. That costs $11 million and requires 25 server racks and 630 kilowatts of power. With Ampere, Nvidia can do the same amount of processing for $1 million, with a single server rack and 28 kilowatts of power.
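The comparison above comes down to three ratios. The short Python sketch below works them out from the figures quoted in the briefing; the variable names are illustrative rather than Nvidia’s, and the numbers are taken as stated, not independently verified.

```python
# Figures quoted in the briefing (taken as stated, not verified).
cpu_setup = {"cost_usd": 11_000_000, "racks": 25, "power_kw": 630}
ampere_setup = {"cost_usd": 1_000_000, "racks": 1, "power_kw": 28}

# Ratio of the CPU-based setup to the Ampere-based setup, per metric.
ratios = {key: cpu_setup[key] / ampere_setup[key] for key in cpu_setup}

for key, value in ratios.items():
    print(f"{key}: {value:.1f}x")
```

By these numbers, the CPU-based setup costs 11 times as much, fills 25 times as many racks, and draws 22.5 times as much power.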

“That’s why you hear Jensen say, ‘The more you buy, the more you save,’” said Kharya.

Huang added, “This will replace a whole bunch of inference servers. The performance of training and inference is off the charts – 20 times off the charts.”

The first order

Above: DGX A100 servers used at the Argonne National Lab.

Image credit: Argonne

The first order for the chips goes to the U.S. Department of Energy’s (DOE) Argonne National Laboratory, which will use the cluster’s AI and computing power to better understand and fight COVID-19. DGX A100 systems use eight of the new Nvidia A100 Tensor Core GPUs, providing 320 gigabytes of memory for training the largest AI data sets, along with the latest high-speed Nvidia Mellanox HDR 200Gbps interconnects.

Multiple smaller workloads can be accelerated by partitioning the DGX A100 into as many as 56 instances per system, using the A100’s multi-instance GPU feature. Combining these capabilities allows organizations to optimize computing power and resources on demand to accelerate diverse workloads, including data analytics, training, and inference, on a single, fully integrated, software-defined platform.
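As a rough illustration of what that partitioning implies, the sketch below divides the quoted maximum of 56 instances across the eight GPUs in one system. It assumes the instances are spread evenly across the GPUs, and the instance labels are hypothetical, not Nvidia’s naming scheme.

```python
GPUS_PER_SYSTEM = 8        # GPUs in one DGX A100, from the article
INSTANCES_PER_SYSTEM = 56  # maximum instance count quoted in the article

# Assumption: instances are spread evenly across the GPUs.
instances_per_gpu, remainder = divmod(INSTANCES_PER_SYSTEM, GPUS_PER_SYSTEM)

# Hypothetical labels, used only to show the resulting layout.
layout = {
    f"gpu{g}": [f"gpu{g}-inst{i}" for i in range(instances_per_gpu)]
    for g in range(GPUS_PER_SYSTEM)
}

print(instances_per_gpu, remainder)
print(layout["gpu0"])
```

Under that assumption, each GPU hosts seven instances with none left over.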

Immediate adoption and support of the DGX A100

Above: DGX A100 system at the Argonne National Lab.

Image credit: Argonne

Nvidia said that a number of the world’s largest companies, service providers and government agencies had placed initial orders for the DGX A100, with the first systems being delivered to Argonne earlier this month.

Rick Stevens, associate laboratory director for Computing, Environment and Life Sciences at Argonne National Lab, said in a statement that the center’s supercomputers are being used to fight the coronavirus, with AI models and simulations running on the machines in hopes of finding treatments and a vaccine. The power of the DGX A100 systems will allow scientists to do a year’s worth of work in months or days.

The University of Florida will be the first higher education institution in the United States to receive DGX A100 systems, which it will deploy to infuse AI across its entire curriculum and foster an AI-enabled workforce.

The Center for Biomedical AI at the University Medical Center Hamburg-Eppendorf in Germany is among the first to adopt the DGX A100, which it will use to advance clinical decision support and process optimization.

Thousands of previous generation DGX systems are currently in use worldwide by a wide range of public and private organizations. Among these users are some of the world’s leading companies, including automakers, healthcare providers, retailers, financial institutions and logistics companies that are adopting AI in their industries.


Above: A DGX SuperPod

Image credit: Nvidia

Nvidia also revealed its next-generation DGX SuperPod, a cluster of 140 DGX A100 systems capable of reaching 700 petaflops of AI computing power. By combining 140 DGX A100 systems with Nvidia Mellanox HDR 200Gbps InfiniBand interconnects, the company built its own next-generation DGX SuperPod AI supercomputer for internal research in areas such as conversational AI, genomics, and autonomous driving.
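The 700-petaflop figure is consistent with the per-system number quoted elsewhere in this article. A quick check in Python, assuming performance simply adds up across systems with no scaling losses:

```python
DGX_SYSTEMS = 140          # systems in the SuperPod, per Nvidia
PETAFLOPS_PER_SYSTEM = 5   # AI performance per DGX A100, per Nvidia
GPUS_PER_SYSTEM = 8        # A100 GPUs per DGX A100, per Nvidia

# Assumption: performance sums linearly, with no interconnect overhead.
superpod_petaflops = DGX_SYSTEMS * PETAFLOPS_PER_SYSTEM
superpod_gpus = DGX_SYSTEMS * GPUS_PER_SYSTEM

print(superpod_petaflops, "petaflops across", superpod_gpus, "GPUs")
```

That works out to 700 petaflops across 1,120 GPUs.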

It only took three weeks to build this SuperPod, said Kharya, and the cluster is one of the fastest AI supercomputers in the world – reaching a level of performance that previously required thousands of servers.

To help customers build their own A100-powered data centers, Nvidia has released a new DGX SuperPod reference architecture. It gives customers a blueprint that follows the same design principles and best practices Nvidia used.

DGXpert program, DGX compatible software

Above: Nvidia A100 chip on a printed circuit board.

Image credit: Nvidia

Nvidia also launched the Nvidia DGXpert program, which brings DGX customers together with the company’s AI experts, and the Nvidia DGX-ready software program, which helps customers take advantage of certified, enterprise-grade software for AI workflows.

The company said each DGX A100 system has eight Nvidia A100 Tensor Core graphics processing units (GPUs), providing 5 petaflops of AI power, with 320 GB of total GPU memory and 12.4 TB per second of bandwidth.
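Dividing those system totals by the eight GPUs gives a sense of what each A100 contributes. This is a back-of-the-envelope sketch that assumes the totals split evenly across GPUs; Nvidia’s own per-GPU specifications are the authoritative source.

```python
GPUS = 8                      # A100 GPUs per DGX A100
TOTAL_PETAFLOPS = 5           # AI performance per system
TOTAL_MEMORY_GB = 320         # total GPU memory per system
TOTAL_BANDWIDTH_TB_S = 12.4   # total memory bandwidth per system

# Assumption: each total is shared equally among the eight GPUs.
per_gpu = {
    "petaflops": TOTAL_PETAFLOPS / GPUS,
    "memory_gb": TOTAL_MEMORY_GB / GPUS,
    "bandwidth_tb_s": TOTAL_BANDWIDTH_TB_S / GPUS,
}

for name, value in per_gpu.items():
    print(f"{name}: {value:.3f}")
```

On that basis, each GPU accounts for 0.625 petaflops, 40 GB of memory, and 1.55 TB per second of bandwidth.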

The systems also have six Nvidia NVSwitch interconnect fabrics with third-generation Nvidia NVLink technology for 4.8 terabytes per second of bidirectional bandwidth. And they have nine Nvidia Mellanox ConnectX-6 HDR 200Gb-per-second network interfaces, offering a total of 3.6 terabits per second of bidirectional bandwidth.
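The 3.6-terabit networking figure follows from the interface count, assuming each interface’s 200 gigabits per second is counted once in each direction. A minimal check:

```python
INTERFACES = 9            # ConnectX-6 network interfaces per system
GBPS_PER_INTERFACE = 200  # HDR rate per interface, in gigabits per second
DIRECTIONS = 2            # assumption: the total counts send and receive

total_gbps = INTERFACES * GBPS_PER_INTERFACE * DIRECTIONS
total_tbps = total_gbps / 1000  # 1 terabit = 1,000 gigabits

print(total_tbps, "Tb/s")
```

Nine interfaces at 200 gigabits per second in both directions come to 3,600 gigabits, or 3.6 terabits, per second.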

The chips are manufactured by TSMC on a 7-nanometer process. Nvidia DGX A100 systems start at $199,000 and are shipping now through Nvidia Partner Network resellers worldwide.

Huang said the DGX A100 uses the HGX motherboard, which weighs about 50 pounds and is “the most complex motherboard in the world.” (This is the board he pulled out of his oven in a teaser video.) It has 30,000 components and a kilometer of wire traces.

As for a consumer graphics chip, Nvidia would configure an Ampere-based chip in a very different way. The A100 uses high-bandwidth memory for data center applications, but that would not be used in consumer graphics. The cores would also be heavily biased toward graphics instead of the double-precision floating point calculations that data centers need, Huang said.

“We’re going to bias it differently, but every single workload runs on every GPU,” said Huang.
