Meta, the tech giant previously known as Facebook, revealed Monday that it’s built one of the world’s fastest supercomputers, a behemoth called the Research SuperCluster, or RSC. With 6,080 graphics processing units packaged into 760 Nvidia DGX A100 systems, it’s the fastest machine built for AI tasks, Chief Executive Mark Zuckerberg says.
That processing power is in the same league as the Perlmutter supercomputer, which uses more than 6,000 of the same Nvidia GPUs and currently ranks as the world’s fifth fastest supercomputer. And in a second phase, Meta plans to boost performance by a factor of 2.5 with an expansion to 16,000 GPUs this year.
Meta will use RSC for a host of research projects that require next-level performance, such as “multimodal” AI that draws conclusions from a combination of sound, imagery and actions rather than just one type of input data. That could help with the subtleties of one of Facebook’s big problems: spotting harmful content.
Meta, a leader in AI research, hopes the investment will pay off by using RSC to help build out the company’s latest priority: the virtual realm it calls the metaverse. RSC could be powerful enough, for example, to simultaneously translate speech for a large group of people who each speak a different language.
“The experiences we’re building for the metaverse require enormous compute power,” Zuckerberg said in a statement. “RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages and more.”
When it comes to one of the top uses of AI — training an AI system to recognize what’s in a photo — RSC is about 20 times faster than its earlier 2017-era Nvidia machine, Meta researchers Kevin Lee and Shubho Sengupta said in a blog post. For decoding human speech, it’s about three times faster.
The term artificial intelligence today typically refers to machine learning, in particular a variety called deep learning that processes data with networks loosely modeled on the human brain. The approach is revolutionary because AI models are trained through exposure to real-world data rather than programmed by hand. For example, an AI model can learn what cat faces look like by analyzing thousands of cat photos, whereas a traditional programmer would have to try to describe the full feline variety of fur, whiskers, eyes and ears.
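The learn-from-examples idea can be sketched in a few lines. This is a toy nearest-centroid classifier, not anything Meta uses; the feature names and numbers are invented purely to show how a model absorbs rules from labeled data instead of having them hand-coded.

```python
# Toy "learning from examples": fit a model to labeled feature vectors
# instead of hand-coding rules for what a cat looks like.
# Features here are hypothetical, e.g. (ear pointiness, whisker density).

def train_centroids(examples):
    """Average the feature vectors seen for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Pick the label whose average example is closest to the input."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))

training_data = [
    ([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),
    ([0.1, 0.2], "dog"), ([0.2, 0.1], "dog"),
]
model = train_centroids(training_data)
print(predict(model, [0.85, 0.75]))  # prints "cat"
```

A real deep-learning system replaces the averaged centroids with millions or billions of learned parameters, which is where supercomputer-scale training hardware comes in.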
RSC also could help with a particularly thorny AI challenge called self-supervised learning. Today’s AI models are typically trained on carefully annotated data: stop signs are labeled in the photos used to train autonomous vehicle AI, for example, and a transcript accompanies the audio used to train speech recognition AI. The harder task of self-supervised training uses raw, unlabeled data instead. So far, that’s an area in which humans still have an edge over computers.
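One common self-supervised trick makes the distinction concrete: for language, the "label" is simply the next word in the raw text, so no human annotation is needed. The snippet below is an illustrative sketch of that idea, not Meta's training pipeline; the sentence is an invented example.

```python
# Self-supervised labeling sketch: derive (context, target) training
# pairs from raw text itself, with no human annotation. Each prefix of
# the sentence is the input and the following word is the label.

def self_supervised_pairs(sentence):
    """Turn raw text into (context words, next word) training pairs."""
    words = sentence.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in self_supervised_pairs("the cat sat on the mat"):
    print(" ".join(context), "->", target)
```

A supervised dataset would instead need a person to attach each label by hand, which is exactly the annotation cost self-supervised learning avoids.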
Meta and other AI proponents have shown that training AI models with ever larger data sets produces better results. Training AI models takes vastly more computing horsepower than running those models, which is why iPhones can unlock when they recognize your face without requiring a connection to a data center packed with servers.
Supercomputer designers customize their machines by picking the right balance of memory, GPU performance, CPU performance, power consumption and internal data pathways. In today’s AI, the star of the show is often the GPU, a type of processor originally developed for accelerating graphics but now used for many other computing chores.
Nvidia’s cutting-edge A100 chips are geared specifically for AI and other heavy-duty data center tasks. Big companies like Google, as well as a host of startups, are working on dedicated AI processors, some of them the largest chips ever built. Meta prefers the relatively flexible A100 foundation because, combined with the company’s own PyTorch AI software, it offers what Meta believes is the most productive environment for developers.