We're hiring!
*

Open Source OpenGL ES 3.1 on Mali GPUs with Panfrost

Alyssa Rosenzweig avatar

Alyssa Rosenzweig
June 11, 2021

Share this post:

Reading time:

Panfrost, the open source driver for Arm Mali, now supports OpenGL ES 3.1 on both Midgard (Mali T760 and newer) and Bifrost (Mali G31, G52, G72) GPUs. OpenGL ES 3.1 adds a number of features on top of OpenGL ES 3.0, notably including compute shaders. While Panfrost has had limited support for compute shaders on Midgard for use in TensorFlow Lite, the latest work extends the support to more GPUs and adds complementary features required by the OpenGL ES 3.1 specification, like indirect draws and no-attachment framebuffers.

The new feature support represents the cumulative effort of multiple Collaborans -- Boris Brezillon, Italo Nicola, and myself -- in tandem with the wider Mesa community. The OpenGL driver has seen over 1000 commits since the beginning of 2021, including several hundred targeting OpenGL ES 3.1 features. Our focus is Mali G52, where we are passing essentially all drawElements Quality Program and Khronos conformance tests and are aiming to become formally conformant. Nevertheless, thanks to a unified driver, many new features on Bifrost trickle down to Midgard allowing the older architecture still in wide use to improve long after the vendor has dropped support. On Mali T860, we are passing about 99.5% of tests required for conformant OpenGL ES 3.1. That number can only grow thanks to Mesa's continuous integration running these tests for every merge request and preventing Panfrost regressions. With a Vulkan driver in the works, Panfrost's API support is looking good.

Instruction scheduling

Since the last Panfrost update, we've added an instruction scheduler to the Bifrost compiler. To understand the motivation, recall the hardware design. The Bifrost instruction set pairs instructions into "tuples", one using the multipliers and the other using the adders. Up to 8 tuples are grouped into a "clause", a sequence of instructions with fixed latency that can execute back-to-back with no pipeline bubbles in the middle. The benefit to the hardware designers was that Bifrost's pipeline could be statically filled by the compiler, rather than adding logic in the hardware to dynamically dispatch instructions to different parts of the units like a superscalar chip. Unfortunately, that means the compiler becomes significantly more complicated, as it has to group instructions itself satisfying dozens of architectural invariants. If any condition fails to be met, the GPU will fault with an Invalid Instruction Encoding exception and abort execution. In Panfrost, we've approached this by formally modeling the constraints to produce a predicate (function returning a boolean) for whether a given instruction may be scheduled in a given position in the program. Then it's simple enough to schedule greedily by choosing instructions with this predicate according to some selection heuristic. Wait, selection heuristic?

An algorithm is "greedy" if it makes a locally optimal choice at every step. On the surface, it seems like that strategy would produce a globally optimal result. In special cases, this is true and the best algorithms to solve a problem are greedy. Unfortunately, greedy algorithms produce suboptimal results on many other problems, sometimes spectacularly so. Instruction scheduling is one of those cases: when the predicate shows two different instructions can be scheduled next, which one should be picked? It's not enough to always pick the first or pick one at random; while both strategies are locally optimal, both have poor global performance. We need a heuristic to choose the best instruction from the set of candidate instructions at each point, taking into account how it will affect our ability to schedule in the future. Coming up with good heuristics is tricky, and we have a great deal of room to grow in Panfrost, but the basic model is serving us well so far.

Towards zero overhead

Another large change to the driver since our last blog was the addition of dirty tracking, a common graphics driver optimization with a twist for Mali. Typical GPUs are stateful, and the driver emits commands to set pieces of graphics state -- for instance, setting uniforms -- before each draw. Mali GPUs present an unusual stateless interface. Instead of commands, the driver prepares descriptors containing large amounts of graphics state bundled together, and each draw has pointers to the different descriptors. In a sense, typical GPUs are programmed with an OpenGL-like state machine, whereas Mali is programmed with Vulkan-like pipelines.

There is conventional graphics wisdom that "state changes are expensive", so graphics programmers try to minimize the number of API calls they make. Drivers can help minimize state changes as well, by tracking which state is "dirty" (modified) and which state is "clean" (unmodified). Then the driver only has to emit commands for the dirty state, reducing its own CPU overhead as well as reducing work for the GPU to process.

Surprisingly, the same idea generalizes to Mali. It is inefficient for the driver to upload every Mali descriptor for every application draw call. Ideally, we could reuse the same descriptor for subsequent draw calls if we know the state hasn't changed. Dirty tracking lets us know exactly when the state has changed, allowing us to only upload new descriptors when required, and reusing the descriptor otherwise. On the surface, the purpose is simply reducing CPU overhead, since the GPU's programming model is stateless and therefore must redo work anyway. However, the GPU has several layers of caches, so reusing these descriptors can enable the GPU to use cached descriptors as opposed to invalidating its descriptor caches after every draw call. Implementing dirty tracking in Panfrost improved draws per second in one synthetic benchmark by about 400%. Not bad for a week's work.

Looking forward

In the coming months, we're aiming to polish the OpenGL ES 3.1 support in time for the Mesa 21.2 release next month. Next stop after that: Bifrost performance improvements and introducing support for the modern Valhall (Mali G77 and newer) architecture family.

Search the newsroom

Latest Blog Posts

Re-thinking framebuffers in PanVK

23/03/2026

PanVK’s new framebuffer abstraction for Mali GPUs removes OpenGL-specific constraints, unlocking more flexible tiled rendering features…

Running Mainline Linux, U-Boot, and Mesa on Rockchip: A year in review

02/03/2026

Get the recap of Nicolas Frattaroli's FOSDEM talk detailing Rockchip’s mainline progress, including Vulkan 1.4 and NPU support as a vital…

Now streaming: Collabora XDC 2025 presentations

02/12/2025

As an active member of the freedesktop community, Collabora was busy at XDC 2025. Our graphics team delivered five talks, helped out in…

Implementing Bluetooth LE Audio & Auracast on Linux systems

24/11/2025

LE Audio introduces a modern, low-power, low-latency Bluetooth® audio architecture that overcomes the limitations of classic Bluetooth®…

Strengthening KernelCI: New architecture, storage, and integrations

17/11/2025

Collabora’s long-term leadership in KernelCI has delivered a completely revamped architecture, new tooling, stronger infrastructure, and…

Font recognition reimagined with FasterViT-2

11/11/2025

Collabora extended the AdobeVFR dataset and trained a FasterViT-2 font recognition model on millions of samples. The result is a state-of-the-art…

Open Since 2005 logo

Our website only uses a strictly necessary session cookie provided by our CMS system. To find out more please follow this link.

Collabora Limited © 2005-2026. All rights reserved. Privacy Notice. Sitemap.