As we have seen in the past, a GPU shortage appears to be impending yet again. The release of ChatGPT has not only been disruptive in terms of AI possibilities, it has also kickstarted an arms race to create ever better models. This race is fueled primarily by NVIDIA GPUs.
This article was written by Anouk Dutree of UbiOps.
A GPU (Graphics Processing Unit) is a specialized electronic circuit designed to quickly perform calculations needed for rendering graphics and images. In artificial intelligence, GPUs are used to accelerate the training of machine learning models, which involves performing complex mathematical computations in parallel. By offloading these computations to the GPU, the AI algorithms can be trained much faster and more efficiently than using traditional CPUs alone, which significantly speeds up the development of AI applications.
ChatGPT was trained with the help of 10,000 NVIDIA GPUs, according to UBS analyst Timothy Arcuri. And that’s only for training, not even inference. Dylan Patel and Afzal Ahmad ran an interesting thought experiment: if software and hardware were not optimized, deploying the current ChatGPT into every search done by Google would require 512,820.51 A100 HGX servers, with a total of 4,102,568 A100 GPUs. Obviously it’s safe to assume that software and hardware will be optimized, but it does put into perspective just how big the demand for GPUs is due to the Generative AI boom we are experiencing.
There are many rumors going around that it will simply be impossible to keep up with the increased demand and that a shortage is coming; some even believe the shortage is already here. It is important to note, though, that this seems to be a shortage only in a specific category of GPUs: the high-end ones used for AI purposes. The 2021/2022 shortage was mainly a problem for consumer-grade GPUs.
So why are GPUs so important for AI? And is there anything you can do to shield yourself from a possible shortage if you need GPUs for your AI use cases?
Why are GPUs so important?
To understand why GPUs are so important for operationalizing Machine Learning models, it’s good to look at the type of applications GPUs are made for. Owens et al. listed the following characteristics of GPU applications in their paper “GPU computing”:
- Computation requirements are large
- High parallelism
- Throughput is more important than latency
In other words, GPUs are extremely good at executing tens of thousands of parallel threads to rapidly solve large problems having substantial inherent parallelism. Each individual computation might be a bit slower than it would be on a CPU, but because a GPU can handle so many in parallel, the overall problem is solved much faster.
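The throughput-over-latency trade-off can be illustrated even on a CPU: NumPy dispatches one bulk operation over a whole array at once (the same data-parallel pattern a GPU exploits at much larger scale with its thousands of threads), while a plain Python loop pays per-element overhead. A minimal sketch of the idea, assuming only that NumPy is installed:

```python
import time
import numpy as np

def scale_loop(xs, factor):
    # One element at a time: fine per-item latency, poor overall throughput.
    return [x * factor for x in xs]

def scale_vectorized(xs, factor):
    # One bulk operation over the whole array: the data-parallel style
    # that GPUs take to the extreme.
    return xs * factor

data = np.random.rand(1_000_000)

t0 = time.perf_counter()
scale_loop(data.tolist(), 2.0)
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
scale_vectorized(data, 2.0)
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```

On typical hardware the bulk version wins by one to two orders of magnitude, even though both compute exactly the same result.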
Since most machine learning algorithms are essentially chains of linear algebra operations, they are very suitable for parallelization. A typical neural network consists of long chains of interconnected layers (see figure below). While there is a massive amount of compute in a full network, it can be broken up into layers of smaller, sequentially dependent chunks of work. Because of this, neural networks can really leverage the power of GPUs to reach much higher speeds than are possible on a CPU.
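This “chain of linear algebra operations” view can be made concrete: a forward pass through a small fully connected network is just a sequence of matrix multiplications and element-wise activations, each of which parallelizes naturally. A minimal NumPy sketch (the layer sizes are arbitrary; frameworks like PyTorch run these same operations on the GPU):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Element-wise activation: trivially parallel over every entry.
    return np.maximum(x, 0.0)

# Three fully connected layers: 64 -> 128 -> 128 -> 10 (arbitrary sizes).
weights = [rng.standard_normal((64, 128)),
           rng.standard_normal((128, 128)),
           rng.standard_normal((128, 10))]

def forward(batch):
    # Each step is a matrix multiply followed by an activation. The layers
    # are sequentially dependent, but the work *within* each layer is
    # massively parallel, which is exactly what a GPU exploits.
    x = batch
    for w in weights:
        x = relu(x @ w)
    return x

batch = rng.standard_normal((32, 64))   # a batch of 32 input vectors
out = forward(batch)
print(out.shape)  # (32, 10)
```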
Picture taken from Choquette et al. (2021) “NVIDIA A100 Tensor Core GPU: Performance and Innovation”. This graphic outlines the Deep Learning network mapping to GPU.
Shi et al. actually performed a thorough benchmark of CPUs and GPUs for deep learning. Even though their paper is from 2017, their conclusion still holds: GPUs outperformed CPUs for deep learning workloads. You can see their results table for yourself in the figure below. The quickest time in each category is marked in bold.
Figure taken from Shi et al. (2017) “Benchmarking State-of-the-Art Deep Learning Software Tools”. In bold you can find the quickest time within each category (Desktop CPU, Server CPU, Single GPU). The GPU outperforms the CPU.
The difficulty of obtaining a GPU when you need it
With GPUs being so perfect for ML workloads, and more and more companies using AI, demand has risen tremendously. This increase in demand makes it quite difficult to obtain a GPU nowadays. When you want to run your workloads on a GPU you have roughly two options:
- Buy a physical GPU
- Use a cloud GPU (for instance at AWS or Google, or via specific Machine Learning (ML) platforms like UbiOps)
Physical GPUs are not only very expensive, they’re also often sold out. This is caused by so-called scalpers, who use bots to buy up all available GPUs the moment they are offered on a website. Because of this, you can often only buy GPUs from these scalpers, unless you’re faster than the bots of course. Scalpers resell the products at prices inflated by more than 200% compared to the original cost, making them ridiculously expensive.
With astronomical prices for physical GPUs, cloud GPUs become a more interesting option. You normally only pay for the time you actually use the GPU, so if your workloads don’t run too long, it’s often a lot cheaper than buying a GPU yourself. Cloud GPUs are highly scalable, they minimize costs, and they also free up local resources. You can just continue using your laptop while your model is running on a GPU in one of the cloud’s datacenters.
However, clouds have been having difficulty keeping up with the increased demand for GPUs. With demand rising much faster than supply, users end up waiting for a GPU to become available. Gigaom AI analyst Anand Joshi looked into this issue and noted that many users are experiencing longer wait times than a couple of years ago. If you have a Machine Learning (ML) model in production that needs to respond within seconds, it becomes a big problem if you cannot even get an available GPU within a minute. In that case you might need to buy dedicated GPUs from the cloud that are available 24/7, but those are also a lot pricier. With the increase in demand due to generative AI and LLMs, the prices of high-end GPUs have also been driven up considerably.
What are your options if you need a GPU?
Let’s say you need a GPU for your model, but you cannot get your hands on a physical one, and you are experiencing long wait times in cloud queues. What are your options?
Well, as I mentioned before, you could buy dedicated GPUs from any of the cloud providers (where you get a GPU on a per-month basis as opposed to a per-hour basis), but this might be costly for your organization. You could also try to poll for available GPUs in different clouds and pick the one that provides a GPU the fastest. But then you have to deal with multiple clouds and make your infrastructure cloud agnostic. It’s quite a hassle.
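The multi-cloud polling idea can be sketched generically. The provider names and check functions below are hypothetical placeholders (a real setup would call each cloud’s own availability API); only the polling logic itself is the point:

```python
import time

def poll_for_gpu(providers, timeout_s=300, interval_s=5):
    """Poll each provider's availability check until one reports a free GPU.

    `providers` maps a provider name to a zero-argument callable that
    returns True when a GPU is available there (hypothetical stand-ins
    for real cloud API calls). Returns the first provider with a GPU,
    or None if the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        for name, has_gpu in providers.items():
            if has_gpu():
                return name
        time.sleep(interval_s)
    return None  # no GPU became available within the timeout

# Example with stubbed checks; the provider names are made up.
providers = {
    "cloud-a": lambda: False,
    "cloud-b": lambda: True,
}
print(poll_for_gpu(providers, timeout_s=10, interval_s=1))  # cloud-b
```

Even this toy version hints at the hassle: on top of the polling loop you would still need per-cloud credentials, quota handling, and a deployment path that works on whichever cloud answers first.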
Another option is to go for a cloud environment that focuses on GPUs, like Escher Cloud (a European cloud backed by NVIDIA) or Bytesnet. This is a great option if you really need a lot of GPUs, though it might be overkill for your organization, especially if you also need regular CPUs or other MLOps-related features. However, if you focus on things like image or video processing, it might be a good fit!
The last option I want to highlight is going for a Machine Learning Operations (MLOps) platform with GPU support, like UbiOps, which offers additional MLOps features and lets you agree, through service level agreements (SLAs), on how quickly a GPU should be available. UbiOps runs in the cloud and offers scalable GPU support. It’s completely cloud agnostic, and they also have partnerships with Bytesnet and other European clouds, which is handy if you are an EU-based company working with sensitive data.
With the current demand and companies planning on incorporating more and more AI into their business, I think it’s safe to say that GPU prices will continue to rise, while the supply will get tighter and tighter. There are still ways to get your hands on a GPU though.
Whether you buy one of your own, use a cloud one, or use one through an MLOps platform, there are options out there! What option will fit best depends on your use case and how long you’re willing to wait for a GPU to become available.