AI Benchmark v4: Pushing Mobile NPUs to Their Limits
Twice as many tests, native hardware acceleration on many mobile platforms, new tasks targeting the acceleration of multiple models at once, the ability to load and run custom TFLite models, NPU / DSP throttling tests — and this is not even the full list of improvements coming with the 4th version of AI Benchmark. A detailed description of these and other changes introduced in this release is provided below.
Native Hardware Acceleration
One of the most awaited features in this release is native AI hardware acceleration on many mobile chipsets. This became possible with the introduction of TensorFlow Lite delegates, which act as middleware between the standard TFLite runtime and the vendors' custom deep learning libraries. Unlike the conventional SDK-based benchmarking approach, which requires each model to be converted and "optimized" separately for each vendor and thus makes any comparison unfair, this solution makes it possible to run an identical model on every single mobile device while still benefiting from highly optimized proprietary SDKs and avoiding the limitations of the standard Android NNAPI acceleration path.
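To give a rough sense of the mechanism, here is a minimal sketch using the TFLite Python API; on Android the same idea is exposed through the Java / C++ interpreter options. The delegate library name and model file below are placeholders, not the ones actually shipped with the benchmark.

```python
import numpy as np
import tensorflow as tf

# Load a vendor delegate library (placeholder name - each vendor ships its
# own binary); the loading mechanism itself is the standard TFLite one.
delegate = tf.lite.experimental.load_delegate("libvendor_npu_delegate.so")

# The same TFLite model file is used on every device - no per-vendor
# conversion or "optimization" step is required.
interpreter = tf.lite.Interpreter(
    model_path="mobilenet_v2_float.tflite",
    experimental_delegates=[delegate])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run one inference pass on random data; ops supported by the delegate are
# dispatched to the vendor's accelerator, the rest fall back to the CPU.
x = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(output_details[0]["index"])
```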
Performance Review of All Mobile SoCs with AI Capabilities
The performance of mobile AI accelerators has been evolving rapidly over the past two years, nearly doubling with each new generation of SoCs. The current, 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which, together with the increased capabilities of mobile deep learning frameworks, makes it possible to run complex and deep AI models on mobile devices. Below, we evaluate and compare the performance of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that provide hardware acceleration for AI inference. The full version of this material can be found in the AI Benchmark 2019 ICCV paper.
AI Benchmark for Windows, Linux and macOS: Let the AI Games Begin...
While Machine Learning is already a mature field, it has long lacked a professional, accurate and lightweight tool for measuring the AI performance of the various hardware used for training and inference with ML algorithms. Today we are taking a step toward standardizing the benchmarking of AI-related silicon, and present a new standard for the all-round performance evaluation of hardware platforms capable of running machine and deep learning models.
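For reference, the desktop version is distributed as a Python package built on top of TensorFlow. The snippet below is a minimal usage sketch assuming the pip package name ai-benchmark and its AIBenchmark entry point; please consult the official installation instructions if anything differs on your platform.

```python
# pip install tensorflow ai-benchmark
from ai_benchmark import AIBenchmark

benchmark = AIBenchmark()

# Full run: measures both inference and training performance of the
# installed TensorFlow build on the available CPU / GPU.
results = benchmark.run()

# Inference-only or training-only runs are also available:
# results = benchmark.run_inference()
# results = benchmark.run_training()
```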
Spreadtrum Reborn: Performance Review of the Unisoc Tiger T710
For many years, Unisoc held a small share of the mobile SoC market and was primarily known for its budget processors for low-end Android devices. It looks like, with its recent rebranding, you should forget everything you knew about the company: even as a pre-release engineering sample, its new mid-range chipset easily beats the giants of the SoC industry.
Below we present a detailed performance analysis of the Unisoc Tiger T710. Please note that this is a pre-release sample (also known as ud710_3h10_native), so the results of the final commercial version might be slightly different.
HTC HD2: Testing the AI Performance of a 10-Year-Old Legend
HTC is going through a really tough time right now: no phone releases this year and only a small assortment of devices from 2018. But 10 years ago everything was quite different: being one of the most powerful and innovative companies of its time, it was defining the trends and standards of the mobile industry. In 2009, it revealed several milestone smartphones such as the HTC Magic, Hero, Touch2 and Diamond2, and, of course, one of the greatest hits ever produced by HTC - the HD2 (Leo), the pinnacle of Windows Mobile devices. With the first Qualcomm Snapdragon chipset, 576 MB of RAM, a stylish design and a huge display (4.3 inches was quite a challenge in 2009), it was the cherished dream of many geeks at the end of the 2000s. The beginning of its commercial life wasn't too fortunate, though: just a couple of months after its release, HTC announced the deprecation of Windows Mobile devices and switched completely to Android and WP7, leaving all existing owners of this mighty phone without any updates. But what first seemed like the end of Leo's life was actually just its beginning: since its hardware was almost identical to that of later HTC devices running Android, its faithful developer community soon ported Android 1.6-2.3 to the HD2, where it ran perfectly. And then... Windows Phone 7 and 8, Windows RT, Ubuntu and Ubuntu Phone, MeeGo and Firefox OS - all these operating systems got a chance to run on our hero. There were even some attempts to port iOS (!) to the HD2, though the source code of this project was never published.
The latest AI Benchmark version introduces the largest update since its first release. With new functionality, tests and measurements, it becomes a unique solution for assessing the real AI performance of mobile devices extensively and reliably. A detailed description of the changes introduced in this version is provided below.
AI Performance: Accuracy Matters!
Starting from now, we are introducing accuracy checks in the tests running on NPUs, GPUs and DSPs: increasing the speed of AI computations at the expense of their precision is no longer possible. While we checked the accuracy internally before, it was not displayed in the benchmark and was not taken into account by the scoring system. Over the past months, however, it became clear that it cannot be ignored any longer: while some chipsets demonstrated ideal results, other SoCs had serious precision issues - in some cases the error was up to 100 times higher than the normal values.
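To illustrate the kind of check involved (the exact metric and tolerances used by the benchmark are not detailed here, so treat this as a simplified sketch): the outputs produced on the accelerated path are compared against a float CPU reference, and runs that deviate too far are penalized. The function and threshold names below are hypothetical.

```python
import numpy as np

def max_relative_error(reference: np.ndarray, accelerated: np.ndarray,
                       eps: float = 1e-6) -> float:
    """Worst-case relative deviation of an accelerated run from the
    float CPU reference output; a large value signals a precision problem."""
    return float(np.max(np.abs(accelerated - reference) /
                        (np.abs(reference) + eps)))

# Hypothetical usage: run the same model and input on the CPU (reference)
# and on the NPU / GPU / DSP path under test, then compare the outputs.
# err = max_relative_error(cpu_output, npu_output)
# if err > TOLERANCE:          # TOLERANCE is a placeholder threshold
#     penalize_score(err)      # placeholder scoring hook
```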
The fastest AI chip, Samsung Galaxy S10 scores, and the recent ranking changes
During the past months, AI Benchmark scores were used in a number of events and publications, raising many questions regarding the performance of some newly presented chipsets and phones. We provide our official explanation of the results and the score updates for February 2019 below.
AI rush: Snapdragon 855, MediaTek P90, Kirin 980 or Exynos 9820 — who rules the game?
Snapdragon 855, currently placed at the top of our ranking, is without a doubt one of the fastest chipsets available on the market. It demonstrates very strong AI performance and provides hardware acceleration for both float and quantized neural networks: in the first case, inference runs on the Adreno 640 GPU, while quantized networks run on its built-in Hexagon 690 DSP. This combination of GPU and DSP allows Qualcomm to do without a separate NPU for accelerating AI computations, which leads to a smaller SoC and simplifies its development. However, this decision also has its costs: Snapdragon's GPU cannot be fully utilized for running neural networks, as its design was originally developed for pure computer graphics tasks, and thus only a small fraction of its power can be used for AI computations. This might also complicate the development of future products, as there are generally two ways of improving Snapdragon's AI capabilities: increasing GPU performance or radically changing the GPU design, though the latter would also require changing the whole graphics subsystem and drivers. A third option is to introduce a separate dedicated AI chip, which might actually be the case in the next high-end Qualcomm SoC.