At the AWS re:Invent conference last month, Amazon unveiled two new custom computing offerings: Graviton3, an Arm-based processor, and Trn1, a new AWS instance powered by its Trainium chip. The latter is pitched as a virtual server designed specifically for AI tasks such as image recognition, natural language processing and fraud detection.
With these new chips, AWS – known for spending big on data-centre silicon and renting it out as compute power to customers – is challenging the likes of NVIDIA. The primary reasons for Amazon building its own chips are cost and performance efficiency. Now, one startup wants to tip the scales on data-centre AI even further. London-based Plumerai wants to replace GPUs that cost $10,000 with a microcontroller that costs only $3. On the company blog, Dutch co-founder Roeland Nusselder explains how they have done it.
Deep learning made possible on a tiny Arm Cortex-M7 microcontroller
It is often wise not to reinvent the wheel, and Plumerai follows that principle in its quest to fit a deep learning model onto a tiny microcontroller. Rather than building a microcontroller from the ground up, it uses an off-the-shelf Arm Cortex-M7. Where it really stands out is in showing that deep learning can be made tiny without compromising reliability or detection accuracy.
The best way to understand Plumerai’s approach to the tiny deep learning model is to see it as the latest step in the transformation of classical computing. First, we had computers the size of a room; then they became smaller and portable; and now, we all carry phones with more computing power than the guidance computer that steered Apollo 11 to the Moon.
Plumerai demonstrates that its tiny-model technology works with two distinct AI examples: person detection and speech recognition. The person detection model fits on an Arm Cortex-M7 microcontroller, and the startup claims it is “as accurate as Google’s much larger EfficientDet-D4 model running on a 250 Watt NVIDIA GPU”.
The person detection model can be used by smart cameras to detect someone walking up to your door, or by a thermostat to stop heating when you go to bed. It can also work with home audio systems to let the music follow you from one room to another. And beyond the home, it has applications in stores, offices, industrial settings and smart cities.
Plumerai argues that such a platform needs a system that is “small, inexpensive, battery-powered, and keeps the images at the edge for better privacy and security”. It further states that microcontrollers are ideal for such an application even though they don’t offer powerful compute performance or a lot of memory.
How does Plumerai bring person detection to a tiny MCU?
The secret sauce of Plumerai’s success is a full system-level approach: the startup collects its own data, runs its own data pipeline, develops its own models and builds its own inference engine. This is what allows it to do real-time person detection on a tiny Arm Cortex-M7 microcontroller.
This person detection system, built in-house by Plumerai and squeezed into a tiny microcontroller, also does not seem to compromise on accuracy. However, Plumerai is careful about how that accuracy is tested. While the most common approach is to measure accuracy on a public dataset such as COCO, Plumerai says it “found this to be flawed”, citing the fact that many public datasets are not an accurate representation of real-world scenarios.
As a result, Plumerai says the right dataset to measure the accuracy of its person detection system is one that is “specifically collected for the target application”. It went ahead with its own test dataset to measure accuracy but does note that this “test dataset is completely independent and separate from the training dataset”.
After selecting the dataset, the startup chose Google’s state-of-the-art EfficientDet as the most appropriate model to compare against. For the uninitiated, EfficientDet is a large, highly accurate and scalable model family, enabling a range of performance points: it starts with D0, which has a model size of 15.6MB, and goes up to D7x, which requires 308MB.
The accuracy test shared by Plumerai shows that its model is just as accurate as the pre-trained EfficientDet-D4. With a real-world person detection mAP of 88.2, Plumerai’s model has a size of just 737KB, which makes it 112 times smaller than the EfficientDet-D4 model at 79.7MB. Another point in Plumerai’s favour is that its model does not need the $10,000 NVIDIA Tesla V100 GPU that EfficientDet-D4 runs on.
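For readers unfamiliar with the mAP figure quoted above: mean average precision is built by matching predicted boxes to ground-truth boxes using intersection-over-union (IoU). A minimal sketch of that core computation, assuming boxes are given as corner coordinates (this is an illustration of the standard metric, not Plumerai’s evaluation code):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping region (empty if boxes are disjoint)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # partial overlap -> 25/175
```

A detection counts as correct when its IoU with a ground-truth box exceeds a threshold (often 0.5); precision and recall across thresholds then give the mAP score.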
Ambient computing and full-stack optimisation
Every tech company talks about ambient computing, which is essentially the combination of hardware, software, sensors, machines and user experiences communicating and adapting as part of our daily life. To get there, IoT is one important element, and technology like Plumerai’s person detection model housed in a tiny microcontroller could be another.
The fact that Plumerai offers the solution as a full AI stack makes it more useful to end customers. The startup says its AI stack starts with careful data selection, backed by extensive data collection for its platform. It also uses its own data pipeline for intelligent curation, augmentation, oversampling strategies, model debugging, data unit tests, and more.
The next step is to design and train the model. Plumerai’s model supports a single class, as opposed to multi-class models like EfficientDet. It also uses Binarised Neural Networks technology, which not only shrinks the network but also speeds it up. The final element, the inference engine, was developed from scratch; Plumerai says it outperforms TensorFlow Lite for Microcontrollers.
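The key idea behind Binarised Neural Networks is that each weight is constrained to +1 or −1, so it fits in a single bit instead of 32, and multiply-accumulates can reduce to cheap bitwise operations. A rough NumPy sketch of weight binarisation (a conceptual illustration, not Plumerai’s actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # full-precision weights
x = rng.standard_normal(1024).astype(np.float32)  # input activations

w_bin = np.sign(w)   # constrain weights to +1 / -1
approx = x @ w_bin   # dot product using only binary weights

# Storage cost: 32 bits per float weight vs 1 bit per binary weight.
full_bytes = w.nbytes           # 1024 * 4 = 4096 bytes
bin_bytes = w_bin.size // 8     # 1024 / 8 = 128 bytes
print(full_bytes // bin_bytes)  # 32x reduction in weight storage
```

When activations are binarised as well, each dot product becomes an XNOR followed by a popcount, operations that a Cortex-M7 handles far more efficiently than floating-point maths.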
Technology is supposed to become more affordable, efficient and widely available over time, but data centres remain an area where scarcity often rules. With its approach, Plumerai could convince AI-driven companies to adopt its models instead of building their AI stacks on expensive GPUs. That success, however, will depend on how well the model performs in the real world.