November 02, 2023
Benchmarking is essential to the successful integration, development, and maintenance of machine learning frameworks. Developers and maintainers should be able to easily gauge how their frameworks perform compared to other implementations, prior code versions, or different boards, with respect to both runtime performance and other metrics. Despite this clear need, relatively few attempts have been made to address the issue. Further, existing attempts are generally not very automated, so they have high maintenance requirements and are of limited utility in software development workflows.
At Collabora, we developed MLBench, a flexible benchmarking system for the needs of benchmarking machine learning frameworks across different backends and boards. Given some implementations of models, sets of parameters with which to run each model, and a set of datasets, the system will automatically gather benchmarking information for each combination of parameters and datasets, and store it in a database. This benchmarking information is not limited to runtimes but also includes other metrics such as power consumption, memory and CPU usage, and precision.
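The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not MLBench's actual code: the `run_benchmark` placeholder, the `results` table layout, and the metric names are all assumptions made for the example.

```python
import itertools
import json
import sqlite3


def run_benchmark(model, params, dataset):
    """Hypothetical placeholder: a real system would invoke the framework here."""
    return {"runtime_s": 0.0, "memory_mb": 0.0}


def run_all(models, param_sets, datasets, db_path=":memory:"):
    """Benchmark every (model, params, dataset) combination and store each metric."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "model TEXT, params TEXT, dataset TEXT, metric TEXT, value REAL)"
    )
    for model, params, dataset in itertools.product(models, param_sets, datasets):
        metrics = run_benchmark(model, params, dataset)
        for name, value in metrics.items():
            # Parameters are serialized deterministically so rows can be grouped later.
            conn.execute(
                "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
                (model, json.dumps(params, sort_keys=True), dataset, name, value),
            )
    conn.commit()
    return conn
```

Storing one row per metric, rather than one column per metric, means new measurements such as power consumption could be added without changing the table layout.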
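As we will demonstrate below, MLBench is a versatile tool that can be adapted to many different use cases. It enables developers to easily compare their frameworks against other implementations, prior code versions, and different boards.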
We have also integrated MLBench into a continuous integration tool, allowing it to operate in a relatively low-maintenance, turnkey fashion.
A primary obstacle for a benchmarking system is the varying interfaces of the tested frameworks; for instance, frameworks may be written in different languages. In addition, the desired output (runtimes, performance, or other metrics) may be returned differently. This diversity necessitates the construction of wrapper scripts to standardize the program interfaces. For each method to be benchmarked, our system expects a wrapper script that handles the different aspects of running the framework.
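To make the idea of a wrapper concrete, here is a minimal sketch of one, assuming the framework can be launched as a command-line process. The `run_wrapped` function and the shape of the returned record are hypothetical; MLBench's real wrapper interface may look quite different.

```python
import subprocess
import sys
import time


def run_wrapped(command, framework_name):
    """Run a framework command and return its metrics in one standard shape."""
    start = time.perf_counter()
    completed = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return {
        "framework": framework_name,
        "returncode": completed.returncode,
        # Only wall-clock time is measured here; other metrics would be added similarly.
        "metrics": {"runtime_s": elapsed},
    }
```

For instance, `run_wrapped([sys.executable, "-c", "print('done')"], "example-framework")` returns a dictionary with the same keys regardless of what language the wrapped program is written in, which is the point of standardizing the interface.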
Assuming that wrapper scripts have been written for each framework of interest, the user must then specify a configuration file. This configuration file serves as a blueprint, detailing the models and datasets associated with each library. By utilizing the wrapper script, the system ensures consistent monitoring of different metrics to facilitate meaningful comparisons.
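As an illustration only, a configuration of this kind might look like the JSON below. The key names (`libraries`, `wrapper`, `models`, `datasets`), the file paths, and the `expand_jobs` helper are assumptions for the sketch and are not taken from MLBench's actual format.

```python
import json

# Hypothetical configuration: each library lists its wrapper, models, and datasets.
CONFIG_TEXT = """
{
  "libraries": {
    "framework_a": {
      "wrapper": "wrappers/framework_a.py",
      "models": ["inception_v4"],
      "datasets": ["dataset_1", "dataset_2"]
    },
    "framework_b": {
      "wrapper": "wrappers/framework_b.py",
      "models": ["inception_v4"],
      "datasets": ["dataset_1"]
    }
  }
}
"""


def expand_jobs(config):
    """Turn the configuration into a flat list of (library, wrapper, model, dataset) jobs."""
    jobs = []
    for library, entry in config["libraries"].items():
        for model in entry["models"]:
            for dataset in entry["datasets"]:
                jobs.append((library, entry["wrapper"], model, dataset))
    return jobs


jobs = expand_jobs(json.loads(CONFIG_TEXT))
```

Expanding the configuration into explicit jobs up front makes it easy to see exactly which combinations will be benchmarked before anything runs.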
With MLBench, comparisons can be made at a glance. You can assess runtime performance, memory usage trends, and more, making it simple to pinpoint the strengths and weaknesses of each configuration. MLBench streamlines the process of evaluating machine learning frameworks, enabling informed decision-making. The dashboard image below illustrates a side-by-side comparison of Inception V4 on Jetson Nano with TensorRT and onnxruntime, Rockpi RK3399 with tflite, and Coral TPU with tflite for EdgeTPU.
MLBench is a versatile tool that's adaptable to a range of setups, whether you have unique hardware or different software settings. Its flexible nature allows delving into multiple scenarios, helping to uncover insights about performance and architecture issues. Check out the MLBench dashboard for comprehensive benchmarking of machine learning models and hardware configurations. It offers in-depth performance evaluations and collects data on temperature, memory usage, power consumption, CPU/GPU utilization, and more. Visit MLBench for detailed information.
Writing unit tests to answer the question "Does my software work?" is standard, but it is not always common to answer performance-related questions, such as "Is my software fast?" Existing approaches to this problem are often manually invoked, limited in scope, or require a fairly large amount of maintenance.
At Collabora, we picked up the results from MLPerf and extended the benchmarking system; this allows users to quickly compare the performance of different frameworks with earlier revisions or other implementations. MLBench is flexible and easily configurable; thus, deploying for other frameworks and boards should be straightforward. As we explore the intricacies of benchmarking machine learning models on different hardware platforms, with various frameworks and datasets, our objective is clear: to provide the insights and tools needed to make informed decisions, optimize models, and stay at the forefront of AI innovation.
MLBench is fully open-source and the code is now readily available on GitHub. We hope that by releasing the code, we can engage with the community and push the envelope of what's possible. The community can undoubtedly benefit from having access to the code to build something even better.
With MLBench, developers can now quickly and easily determine how their own changes have affected the performance of their framework, and how their implementations fare against other frameworks unified across different boards.
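One simple way to judge how a change affected performance is to compare runtimes against a baseline revision and flag anything that slowed down by more than a tolerance. The sketch below assumes runtimes are available as plain dictionaries keyed by benchmark name; the function names and the 5% default tolerance are illustrative choices, not part of MLBench.

```python
def relative_change(baseline, current):
    """Return the fractional change from baseline to current (0.1 means 10% slower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


def find_regressions(baseline_runtimes, current_runtimes, tolerance=0.05):
    """Return benchmarks whose runtime grew by more than the given tolerance."""
    regressions = {}
    for name, base in baseline_runtimes.items():
        if name not in current_runtimes:
            continue  # benchmark was not run for the current revision
        change = relative_change(base, current_runtimes[name])
        if change > tolerance:
            regressions[name] = change
    return regressions
```

A tolerance is used because repeated runs of the same code rarely produce identical timings, so small differences should not be reported as regressions.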
If you have any questions or feedback, please don't hesitate to leave a comment below or join the discussion on our Gitter #lounge channel!