Meta Platforms Inc, the parent of social media giant Facebook, is throwing its hat into the AI supercomputer race with a bold claim. The company said on Monday that its research team has built a new artificial intelligence supercomputer that it believes will be the fastest in the world when completed in mid-2022.
Called the AI Research SuperCluster, or RSC, the machine is a high-speed computer designed specifically to train machine learning systems, Meta says. The company sees the supercomputer as crucial to building better AI models that learn from trillions of data points. Meta is not the first to stake a claim in AI supercomputing: Microsoft and NVIDIA have announced AI supercomputers of their own.
Why Meta is building its own AI supercomputer
For big tech giants, AI is the next big frontier, and it is now critical that they not rely on rivals to run their AI operations. Meta is already embroiled in a number of controversies over its inability to stop the distribution of harmful content on Facebook and the other social media platforms it owns. With the AI supercomputer, Meta wants to ensure it doesn’t fall behind any further.
AI supercomputers such as RSC differ from regular supercomputers. Meta says RSC will be used to train a range of systems across its businesses: the social media conglomerate plans to use it for the content moderation algorithms that detect harmful speech on its platforms, and for the augmented reality features it plans to build as part of its metaverse ambitions.
“Our researchers will be able to train the largest models needed to develop advanced AI for computer vision, NLP, speech recognition, and more. We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together,” Meta engineers Kevin Lee and Shubho Sengupta wrote in a blog post.
Meta, of course, says that RSC will pave the way for AI-driven applications and products that will form part of its metaverse. To recall, Facebook rebranded itself as Meta in October 2021 to reflect its focus on the metaverse, a Silicon Valley buzzword for a virtual environment where CEO Mark Zuckerberg envisions people will work, play and socialise in the future.
However, to build a metaverse akin to the one described by Ernest Cline in his best-selling 2011 novel Ready Player One, Meta needs enormous computing power. The AI Research SuperCluster announced this week should be seen as a first step towards building the computing power needed to make the metaverse functional.
Meta says RSC will also help the company fully realise the benefits of “self-supervised learning and transformer-based models”. Facebook has been investing in AI research since 2013, including areas such as self-supervised learning, where algorithms learn from vast numbers of unlabelled examples, and transformers, which allow AI models to reason more effectively by focussing on relevant parts of the input.
The idea behind RSC is to gain an edge in AI that will help the company build effective tools for content moderation, augmented reality, the metaverse, and robotics. With RSC, Meta has secured the computing power it needs to build these tools, but we will have to wait before the resulting technologies and services come to fruition.
RSC: What’s inside Meta’s AI supercomputer
An AI supercomputer is built by combining multiple GPUs into compute nodes, which are then connected over a high-performance network for fast communication between nodes. Meta says it partnered with teams from NVIDIA, Pure Storage and Penguin Computing to build its own AI supercomputer.
The RSC comprises a total of 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 GPUs. The A100 GPUs are more powerful than the V100s used in Meta’s previous research cluster. These NVIDIA DGX systems communicate over an NVIDIA Quantum 1,600 Gb/s InfiniBand two-level Clos fabric with no oversubscription.
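The GPU total follows directly from the node count: each NVIDIA DGX A100 system houses eight A100 GPUs. A quick back-of-the-envelope check (the constants below come from the figures reported above):

```python
# Sanity check on RSC's phase-one GPU count:
# 760 DGX A100 systems, each with 8 A100 GPUs.
DGX_SYSTEMS = 760
GPUS_PER_DGX = 8

total_gpus = DGX_SYSTEMS * GPUS_PER_DGX
print(total_gpus)  # 6080, matching the stated figure
```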
The storage tier of RSC has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade. Meta plans to complete phase two of RSC before the end of 2022. At that point, the RSC will contain a total of 16,000 GPUs and will be capable of training AI systems “with more than a trillion parameters on data sets as large as an exabyte.”
The sheer number of GPUs alone offers only a rough indication of the performance of Meta’s AI supercomputer. For comparison, the AI supercomputer Microsoft built in collaboration with OpenAI uses 10,000 GPUs. AI supercomputers also differ from traditional high-performance computers in that machine learning workloads can tolerate lower numerical precision. When Meta’s AI supercomputer is complete, we will need its measured floating-point throughput (FLOPS) to grasp the scale at which it will operate in the real world.
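The precision trade-off is easy to see in code. A minimal illustration using only the Python standard library: IEEE 754 half precision (float16, the kind of reduced-precision format ML accelerators favour) cannot represent 0.1 exactly, while an ordinary Python float (float64) carries roughly 15–16 significant digits:

```python
import struct

def to_half(x: float) -> float:
    """Round-trip a value through IEEE 754 half precision (float16)."""
    return struct.unpack("e", struct.pack("e", x))[0]

print(to_half(0.1))  # 0.0999755859375 -- visible rounding error in float16
print(0.1)           # 0.1 -- float64's much smaller error is hidden
```

Traditional scientific supercomputers are benchmarked on double-precision (float64) arithmetic; ML clusters like RSC lean on lower-precision formats because model training tolerates this kind of rounding, trading accuracy for throughput.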