We're hiring!
*

Bifrost meets GNOME: Onward & upward to zero graphics blobs

Alyssa Rosenzweig avatar

Alyssa Rosenzweig
June 05, 2020

Share this post:

Reading time:

Bifrost встречается с GNOME: Вперед и вверх до нуля



In our last blog update for Panfrost, the free and open-source graphics driver for modern Mali GPUs, we announced initial support for the Bifrost architecture. We have since extended this support to all major features of OpenGL ES 2.0 and even some features of desktop OpenGL 2.1. With only free software, a Mali G31 chip can now run Wayland compositors with zero-copy graphics, including GNOME 3. We can run every scene in glmark2-es2, and 3D games like Neverball can be played. In addition, we can support hardware-accelerated video players mpv and Kodi. Screenshots above are from a Mali G31 board running Panfrost.

All of the above is included in upstream Mesa with no out-of-tree patches required, with the upcoming Bifrost support enabled via the PAN_MESA_DEBUG=bifrost environmental variable.

Collabora - GNOME Shell
GNOME Shell
  
Collabora - Neverball
Neverball

New opcodes

Bringing up these new applications required implementing many new floating-point arithmetic opcodes, including comparisons, selections, and additional type conversions. Further, I’ve added initial support for integer arithmetic and bitwise operations, used to implement integer types directly as well as booleans. While there are a number of arithmetic logic unit (ALU) opcodes required, this is not an obstacle on architectures with regular instruction encodings.

Unfortunately, Bifrost is not a regular architecture and has dozens of distinct instruction encodings in order to conserve space. Adding opcodes to the compiler is still routine, but requires adding quite a bit more code. Plus, the duplication can be error-prone, so as soon as I add a new opcode, I add comprehensive tests against the real hardware iterating through different combinations of operand size and modifiers to exercise all the packing special cases.

The upshot is that the testing coverage eliminates entire classes of compiler bugs which tend to plague new drivers, allowing our open source Bifrost driver to flourish despite such a quirky architecture.

Beyond new ALU opcodes, I extended the texture support to enable simple texture operations from vertex shaders, a pattern occurring in glmark2’s terrain scene. Mali GPUs use slightly different encodings for fragment and vertex texture operations, since fragment shaders can automatically compute the level-of-detail parameter based on neighboring fragments, whereas there is no notion of neighboring fragments in vertex shaders.

Finally, I added initial control flow support (branching) support for if/else statements and loops. As Bifrost is a Single Instruction, Multiple Thread (SIMT) architecture in which multiple threads run the same shader in lockstep, branching is a complicated affair if threads diverge. Most of the complexity is handled in hardware, but just enough seeps through that the branching implementation ends up a hair more complicated than that of Midgard. Still, it’s enough for glmark2’s loop scene, and there’s always room for improvement.

A simpler IR

Of couse, Bifrost progress is no obstacle to improving our Midgard support. Inspired by the lessons learned designing the Bifrost Intermediate Representation as previously blogged, I revisited our Midgard Intermediate Representation as well. The focus was two fold:

  • Simplify to enable faster, more effective optimizations in fewer lines of code.

  • Generalize the IR to support non-32-bit operation.

To do so, I implemented generic helpers for inferring instruction modifiers like saturation. Consider a shader that squares a variable and saturates it to the range [0, 1].

X = clamp(X * X, 0.0, 1.0);

In NIR, Mesa’s common intermediate representation used across drivers, this line might look like the following, using NIR’s fsat opcode to clamp to [0, 1]:

ssa_10 = fmul ssa_9, ssa_9
ssa_11 = fsat ssa_10

Our hardware has native support for saturating the results of floating-point instructions. There are a few approaches to take advantage of this. One is to use NIR’s builtin saturation handling, as Midgard’s compiler used to. A NIR pass can fuse the fsat instruction into the multiply, producing the NIR:

ssa_10 = fmul.sat ssa_9, ssa_9

Then our backend compiler can use the .sat flag directly. While this is an easy approach, it is inflexible, since the hardware might be able to use modifiers that NIR does not express. For instance, Mali GPUs have a .clamp_positive operation which does max(x, 0.0) on the result for free. If we wrote X = max(X * X, 0.0), NIR could give us code using a dedicated fclamp_positive instruction:

ssa_10 = fmul ssa_9, ssa_9
ssa_11 = fclamp_positive ssa_10

However, it could not fuse the modifier in without substantial changes affecting common code. The second approach would be to compile this to two instructions in the IR, and use a second propagation pass on our backend IR to fuse it together.

10 = fmul 9, 9                        10 = fmul.pos 9, 9
11 = fclamp_positive ssa_10

However, there’s a third option unifying both cases and simplifying the compiler: inferring the modifiers generically while translating NIR into our backend IR. This enables us to use architecture-specific modifiers, like .pos, while still having the original NIR available for efficient handling. This approach enabled us to replace hundreds of lines of optimizations for floating-point modifiers and bitwise inverses, while optimizing new patterns that the original design could not, promising savings in code complexity and performance improvements. Since it’s generic, it allows us to optimize not just Midgard programs, but soon Bifrost modifiers as well.

Midgard FP16

With a simpler compiler, I was able to add 16-bit support to the Midgard compiler to reduce register pressure and improve thread count (occupancy) due to the architecture’s register sharing mechanism. As previously blogged, our Bifrost compiler is built to support this from day 1, and through the lessons learned there, I was able to backport the improvements to Midgard.

To prepare, I added types into the IR to avoid compiler passes requiring type inference, a complex and error prone pursuit. Once type sizes were preserved cleanly, I added additional support to the Midgard compiler’s packing routines to handle some outstanding details of 16-bit instructions. Midgard is significantly simpler to pack than Bifrost; whereas 16-bit and 32-bit instructions on Bifrost involve separate instructions with dramatically differing opcodes and formats, Midgard has a one-size-fits-most approach which – despite its inherent limitations – is refreshing. Miscellaneous fixes were needed across the compiler; nevertheless, the simplified IR lived up to its design and is now able to support 16-bit operations.

The bulk of the code required for FP16 has now landed in upstream Mesa but is disabled by default pending further testing. Nevertheless, for the adventurous among you, you can set PAN_MESA_DEBUG=fp16 on a recent build of master. Beware: here be dragons.

Colour masks

Stepping away from the compiler, an interesting improvement is the new handling of draws with colour masked out. A typical draw in OpenGL that does not use blending or colour masks might look like:

glColorMask(true, true, true, true);
glDepthMask(true);
glDrawArrays(GL_TRIANGLES, 0, 15);

Since blending is disabled and all colour channels (RGBA) are written simultaneously, this draw does not need to read from the colour buffer (tilebuffer). But what if the draw does not write to any colour channels?

glColorMask(false, false, false, false);
glDepthMask(true);
glDrawArrays(GL_TRIANGLES, 0, 15);

Naively, the GPU would need to read the previous colour and write it back immediately - but that’s wasteful. Instead, we can detect the case where no colour is written, and elide all access to the colour buffer, skipping both the read and the write.

Could we skip the draw entirely? If there are no side effects, we can, but applications typically mask out colour while also unmasking the depth buffer, which is independent of the colour computation. Midgard has a solution.

Even if depth/stencil updates are required, as long as the shader only computes colour with no side effects, there’s no reason to run the shader. While Bifrost does not appear to, Midgard allows the driver to specify a draw with no shader, saving not only colour buffer read/write but also shader execution.

Community contributions

In addition to our work on Midgard performance, community Panfrost hacker Icecream95 has been improving the Midgard stack nonstop.

Since our last blog post, they contributed a major bug fix for handling discard instructions. For background, OpenGL conceptually first runs the fragment shader for each pixel on the screen and then performs depth testing. In practice, modern hardware attempts to perform depth tests before running the shader, known as “early-z” testing, in order to avoid needlessly executing the shader for occluded pixels.

However, games use discard, an OpenGL directive allowing shaders to eliminate fragments, which can interfere with optimizations like early-z. The driver is responsible for detecting these situations, disabling these optimizations, and enabling standards-compliant fallback paths including “late-z” testing. After Icecream95 investigated issues with Panfrost’s handling of depth testing in the presence of discard instructions, they were able to fix rendering bugs in many games including SuperTuxKart, OpenMW, and RVGL.

On the performance front, in the past they have significantly optimized Panfrost’s tiling routines and Mesa’s min/max index calculation, and added support for ASTC and ETC compressed textures.

Some Panfrost (Mali T760) screenshots of games improved by Icecream95’s patches:

Hats off to a great community contributor!

Performance counters

One final area that we’ve been working on is exposing Mali’s performance counters to userspace in Panfrost, allowing us to identify bottlenecks in the driver and other developers to identify bottlenecks in their application running on Panfrost. For about a year, we have had experimental support for passing the raw counters from kernelspace. Collaborans Antonio Caggiano and Rohan Garg, in conjunction with Icecream95 and other contributors, have been working on integrating these counters with Perfetto to enable high-level analysis with an elegant, free software user interface.

Perfetto with Panfrost on Mali T760

Looking ahead

In the past 3 months since we began work on Bifrost, fellow Collaboran Tomeu Vizoso and I have progressed from stubbing out the new compiler and command stream in March to running real programs by May. Driven by a reverse-engineering effort in tandem with the free software community, we are confident that against proprietary blobs and downstream hacks, open-source software will prevail.

Looking to the future, we plan to improve Bifrost’s coverage of OpenGL ES 2.0 to support more 3D games, now that the basic accelerated desktop is working. We also plan to improve Bifrost compiler performance, in order to approach the proprietary stack’s performance as we did for Midgard. Most of all, we’d like to build a community around the driver, with software freedom and an open first approach as core values.

It worked for Freedreno, Etnaviv, and Lima. It worked for Panfrost on Midgard. And I’m confident it will work again on Bifrost.

Happy hacking.

Search the newsroom

Latest Blog Posts

Re-thinking framebuffers in PanVK

23/03/2026

PanVK’s new framebuffer abstraction for Mali GPUs removes OpenGL-specific constraints, unlocking more flexible tiled rendering features…

Running Mainline Linux, U-Boot, and Mesa on Rockchip: A year in review

02/03/2026

Get the recap of Nicolas Frattaroli's FOSDEM talk detailing Rockchip’s mainline progress, including Vulkan 1.4 and NPU support as a vital…

Now streaming: Collabora XDC 2025 presentations

02/12/2025

As an active member of the freedesktop community, Collabora was busy at XDC 2025. Our graphics team delivered five talks, helped out in…

Implementing Bluetooth LE Audio & Auracast on Linux systems

24/11/2025

LE Audio introduces a modern, low-power, low-latency Bluetooth® audio architecture that overcomes the limitations of classic Bluetooth®…

Strengthening KernelCI: New architecture, storage, and integrations

17/11/2025

Collabora’s long-term leadership in KernelCI has delivered a completely revamped architecture, new tooling, stronger infrastructure, and…

Font recognition reimagined with FasterViT-2

11/11/2025

Collabora extended the AdobeVFR dataset and trained a FasterViT-2 font recognition model on millions of samples. The result is a state-of-the-art…

Open Since 2005 logo

Our website only uses a strictly necessary session cookie provided by our CMS system. To find out more please follow this link.

Collabora Limited © 2005-2026. All rights reserved. Privacy Notice. Sitemap.