Benchmarking machine learning frameworks

Vineet Suryan
November 02, 2023

Introduction

Benchmarking is essential to the successful integration, development, and maintenance of machine learning frameworks. Developers and maintainers should be able to gauge with little effort how their frameworks perform compared to other implementations, prior code versions, or different boards, with respect to both runtime performance and other metrics. Despite this clear need, relatively few attempts have been made to address the issue [1]–[3]. Further, existing attempts are generally not very automated, so they require considerable maintenance and are of limited use in software development workflows.

At Collabora, we developed MLBench, a flexible system for benchmarking machine learning frameworks across different backends and boards. Given a set of model implementations, sets of parameters with which to run each model, and a set of datasets, the system automatically gathers benchmarking information for each combination of parameters and datasets and stores it in a database. This information is not limited to runtimes; it also includes other metrics such as power consumption, memory and CPU usage, and precision.
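To make the "every combination, stored in a database" idea concrete, here is a minimal sketch in Python. It is not MLBench's actual code: `run_model`, `benchmark_all`, the `results` table, and its columns are hypothetical names chosen for illustration, and the workload is a placeholder rather than a real model.

```python
import itertools
import json
import sqlite3
import time


def run_model(model, params, dataset):
    """Hypothetical stand-in for a real model run; times a placeholder workload."""
    start = time.perf_counter()
    _ = sum(x * params["scale"] for x in dataset)  # placeholder work, not inference
    return {"runtime_s": time.perf_counter() - start}


def benchmark_all(models, param_sets, datasets, conn):
    """Run every (model, params, dataset) combination and store one row per run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(model TEXT, params TEXT, dataset TEXT, runtime_s REAL)"
    )
    combos = itertools.product(models, param_sets, datasets.items())
    for model, params, (dataset_name, data) in combos:
        metrics = run_model(model, params, data)
        conn.execute(
            "INSERT INTO results VALUES (?, ?, ?, ?)",
            (model, json.dumps(params, sort_keys=True), dataset_name,
             metrics["runtime_s"]),
        )
    conn.commit()
```

With two models, two parameter sets, and two datasets, this sketch would record eight rows. A real system would store further metrics (power, memory, precision) alongside the runtime.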

As we will demonstrate below, MLBench is a versatile tool that can be adapted to many different use cases. It enables developers to easily:

compare the runtimes of different frameworks' implementations of the same models;
display the memory usage over time for a particular run;
show the changes in runtime for different revisions of a particular framework implementation for a particular setting; and
given some performance metric (e.g., precision or power consumption), compare the performance of different frameworks or different implementations of the same model.

We have also integrated MLBench into a continuous integration tool, allowing it to operate in a relatively low-maintenance, turnkey fashion.
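As an illustration of how runtime changes between revisions might be checked automatically in a CI job, the sketch below compares the median runtime of each revision with that of its predecessor. The function name `find_regressions`, the input layout, and the 10% tolerance are assumptions made for this example; they do not describe how MLBench itself detects regressions.

```python
from statistics import median


def find_regressions(history, tolerance=0.10):
    """Flag revisions whose median runtime exceeds the previous one by > tolerance.

    history: chronological list of (revision, [runtimes in seconds]) pairs.
    Returns a list of (revision, previous_median, current_median) tuples.
    """
    regressions = []
    for (_, prev_runs), (rev, runs) in zip(history, history[1:]):
        baseline = median(prev_runs)
        current = median(runs)
        if current > baseline * (1 + tolerance):
            regressions.append((rev, baseline, current))
    return regressions
```

Using the median rather than the mean makes the check less sensitive to a single noisy run, which matters on shared CI hardware.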

System overview

A primary obstacle for a benchmarking system is that the tested frameworks expose varying interfaces; for instance, frameworks may be written in different languages. In addition, the desired output (runtimes, performance, or other metrics) may be returned differently by each framework. This diversity necessitates wrapper scripts that standardize the program interfaces. For each method to be benchmarked, our system expects a wrapper script that handles the different aspects of running the framework.
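A wrapper of this kind could look roughly like the following sketch, which launches a framework as a subprocess, times it, and normalizes its output into a common record. The names `run_wrapped` and `parse_key_value`, the record fields, and the `key=value` output format are assumptions for illustration only; MLBench's real wrapper interface may differ.

```python
import subprocess
import sys
import time


def parse_key_value(stdout):
    """Parse lines of the form 'name=number' into a dict of floats."""
    metrics = {}
    for line in stdout.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            pass  # ignore values that are not numeric
    return metrics


def run_wrapped(command, parse_output):
    """Run a framework command and return a uniform record of the run."""
    start = time.perf_counter()
    proc = subprocess.run(command, capture_output=True, text=True, check=False)
    elapsed = time.perf_counter() - start
    record = {
        "command": command,
        "returncode": proc.returncode,
        "runtime_s": elapsed,  # wall-clock time, including process start-up
        "metrics": {},
    }
    if proc.returncode == 0:
        record["metrics"] = parse_output(proc.stdout)
    return record
```

For example, `run_wrapped([sys.executable, "-c", "print('accuracy=0.8')"], parse_key_value)` uses a tiny Python process as a stand-in for a framework. Each framework would supply its own parser, while the record layout stays the same.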

Assuming that wrapper scripts have been written for each framework of interest, the user must then specify a configuration file. This configuration file serves as a blueprint, detailing the models and datasets associated with each library. By utilizing the wrapper script, the system ensures consistent monitoring of different metrics to facilitate meaningful comparisons.
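The post does not show the configuration format, so the following is only a sketch of what such a blueprint might contain and how it could be expanded into individual benchmark jobs. The JSON layout, the keys (`libraries`, `wrapper`, `models`, `datasets`), the file paths, and the `expand_jobs` helper are all hypothetical.

```python
import json

# Hypothetical configuration: each library names its wrapper, models, and datasets.
EXAMPLE_CONFIG = """
{
  "libraries": {
    "framework_a": {
      "wrapper": "wrappers/framework_a.py",
      "models": ["inception_v4"],
      "datasets": ["dataset_small", "dataset_large"]
    },
    "framework_b": {
      "wrapper": "wrappers/framework_b.py",
      "models": ["inception_v4"],
      "datasets": ["dataset_small"]
    }
  }
}
"""


def expand_jobs(config_text):
    """Turn the configuration into one job per (library, model, dataset)."""
    config = json.loads(config_text)
    jobs = []
    for library, spec in config["libraries"].items():
        for model in spec["models"]:
            for dataset in spec["datasets"]:
                jobs.append({
                    "library": library,
                    "wrapper": spec["wrapper"],
                    "model": model,
                    "dataset": dataset,
                })
    return jobs
```

Calling `expand_jobs(EXAMPLE_CONFIG)` yields three jobs here: two for `framework_a` and one for `framework_b`. Keeping the wrapper path in each job lets a runner treat every framework the same way.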

With MLBench, comparisons can be read at a glance. You can assess runtime performance, memory usage trends, and more, making it simple to pinpoint the strengths and weaknesses of each configuration. MLBench streamlines the process of evaluating machine learning frameworks, enabling informed decision-making for optimal results. The dashboard image below illustrates a side-by-side comparison of Inception V4 on Jetson Nano with TensorRT and onnxruntime, Rockpi RK3399 with tflite, and Coral TPU with tflite for EdgeTPU.

MLBench is a versatile tool that's adaptable to a range of setups, whether you have unique hardware or different software settings. Its flexible nature allows delving into multiple scenarios, helping to uncover insights about performance and architecture issues. Check out the MLBench dashboard for comprehensive benchmarking of machine learning models and hardware configurations. It offers in-depth performance evaluations and collects data on temperature, memory usage, power consumption, CPU/GPU utilization, and more. Visit MLBench for detailed information.

Outlook

Writing unit tests to answer the question "Does my software work?" is standard, but it is not always common to answer performance-related questions, such as "Is my software fast?" Existing approaches to this problem are often manually invoked, limited in scope, or require a fairly large amount of maintenance.

At Collabora, we picked up the results from MLPerf and extended the benchmarking system; this allows users to quickly compare the performance of a framework against earlier revisions or other implementations. MLBench is flexible and easily configurable; thus, deploying it for other frameworks and boards should be straightforward. As we explore the intricacies of benchmarking machine learning models on different hardware platforms, with various frameworks and datasets, our objective is clear: to provide the insights and tools needed to make informed decisions, optimize models, and stay at the forefront of AI innovation.

MLBench is fully open-source and the code is now readily available on GitHub. We hope that by releasing the code, we can engage with the community and push the envelope of what's possible. The community can undoubtedly benefit from having access to the code to build something even better.

With MLBench, developers can now quickly and easily determine how their own changes have affected the performance of their framework, and how their implementations fare against other frameworks across different boards, all within a single unified system.

If you have any questions or feedback, please don't hesitate to leave a comment below or join the discussion on our Gitter #lounge channel!
