Faith Ekstrand
June 25, 2026
The Panfrost compiler stack is getting a little long in the tooth. It was originally written for Bifrost and then Valhall support was sort of bolted on. It works but the Bifrost compiler is still, at its heart, a Bifrost compiler and we're running into the limitations of the original design.
After doing a thorough evaluation of what we actually want, where we are at now, and what it would take to get there, we came to the conclusion that we need a fresh start. In particular, what we're looking for in a new compiler is:
Proper 64-bit sources: The old IR (Intermediate Representation, the compiler's "working format") borrowed the Bifrost convention where 64-bit ops take two registers, one for each half of the 64-bit value. On Valhall (v9) and all later Mali GPUs, a 64-bit source is a single source in the instruction encoding that has to be aligned to an even register. While a translation from the Valhall convention to the Bifrost convention would be easy, translating in the other direction requires a bunch of tracking everywhere to know which sources are paired. It's a headache all over the IR.
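To make the asymmetry concrete, here is a minimal Rust sketch of the two conventions. It is not Kraid's actual IR: `Reg`, `SplitSrc64`, and `PairSrc64` are hypothetical names, and real sources carry far more state than a register index. The point is only that a pair always splits into two halves, while two halves form a pair only if they happen to be adjacent and even-aligned.

```rust
/// Hypothetical register index, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Reg(pub u8);

/// Bifrost-style convention: a 64-bit value is two separate 32-bit sources.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SplitSrc64 {
    pub lo: Reg,
    pub hi: Reg,
}

/// Valhall-style convention: one source naming an even-aligned register pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PairSrc64 {
    base: Reg,
}

impl PairSrc64 {
    /// Returns `None` for an odd base, since the pair must start on an even register.
    pub fn new(base: Reg) -> Option<Self> {
        if base.0 % 2 == 0 {
            Some(Self { base })
        } else {
            None
        }
    }

    /// Easy direction: a pair can always be split into its two halves.
    pub fn to_split(self) -> SplitSrc64 {
        // `base` is even, so it is at most 254 and `base + 1` cannot overflow.
        SplitSrc64 { lo: self.base, hi: Reg(self.base.0 + 1) }
    }
}

impl SplitSrc64 {
    /// Hard direction: only succeeds when the halves are adjacent and even-aligned.
    pub fn to_pair(self) -> Option<PairSrc64> {
        if self.lo.0.checked_add(1) == Some(self.hi.0) {
            PairSrc64::new(self.lo)
        } else {
            None
        }
    }
}
```

In this sketch, `to_split` is total while `to_pair` returns an `Option`. That fallibility is the "tracking everywhere" burden in miniature: every consumer of the split form has to prove the halves belong together.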
16-bit SSA defs (and maybe 8-bit, too): The Bifrost IR is fundamentally a 32-bit IR. Each SSA (Static Single Assignment) value maps to one or more 32-bit registers. For mediump and other smaller bit sizes, we support them as vectors that pack multiple values into a 32-bit register. This works okay as long as everything nicely packs. However, our experience with mediump in OpenGL ES is that there are often scalar values which don't get nicely packed into vectors. Those scalars end up taking a full 32-bit register even if only 16 bits of data are used. If we can pack two scalar values together into the same register, we may be able to reduce the number of registers used by a shader and get higher occupancy.
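A small sketch of the arithmetic behind that claim, with hypothetical helper names that are not taken from Kraid: two 16-bit scalars share one 32-bit register, so the register cost of N scalar 16-bit values drops from N to roughly N/2.

```rust
/// Pack two 16-bit scalars into the low and high halves of one 32-bit register.
pub fn pack_halves(lo: u16, hi: u16) -> u32 {
    (lo as u32) | ((hi as u32) << 16)
}

/// Recover both 16-bit scalars from a packed 32-bit register.
pub fn unpack_halves(reg: u32) -> (u16, u16) {
    (reg as u16, (reg >> 16) as u16)
}

/// Registers needed when every 16-bit scalar occupies a full 32-bit register.
pub fn regs_unpacked(num_scalars: u32) -> u32 {
    num_scalars
}

/// Registers needed when two 16-bit scalars can share one 32-bit register.
pub fn regs_packed(num_scalars: u32) -> u32 {
    num_scalars / 2 + num_scalars % 2
}
```

For example, five scalar 16-bit values would need five registers unpacked but only three when packed.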
Core IR definition separate from encoding: The Bifrost IR was originally built for Bifrost and it was based on an ISA (Instruction Set Architecture) definition in XML. Each instruction is a Bifrost hardware instruction and the translation from bi_instr to encoded bits is mostly auto-generated. This approach has its advantages but it means that the entire IR is intrinsically linked to the Bifrost instruction set, making problems like the 64-bit source issue very difficult to solve. Instead, we want to separate the core IR from the encoding so we can make choices in the IR that make sense from a compiler perspective and then map that IR onto the ISA as a second step. This will help us isolate hardware generation differences and make it easier to support any changes Arm throws at us in the future.
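One way to picture that two-step split is a core instruction type that knows nothing about bits, plus a per-generation encoder behind a trait. The sketch below is an assumption-laden toy: `Instr`, `Encode`, `GenA`, and `GenB` are hypothetical, and the bit layouts are invented for illustration rather than taken from any Mali encoding.

```rust
/// Hypothetical hardware-independent core-IR instruction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Instr {
    IAdd { dst: u8, a: u8, b: u8 },
    Mov { dst: u8, src: u8 },
}

/// Second step: map the core IR onto one hardware generation's bits.
pub trait Encode {
    fn encode(&self, instr: &Instr) -> u64;
}

/// Two made-up encodings standing in for different hardware generations.
pub struct GenA;
pub struct GenB;

/// Break an instruction into (opcode, dst, src0, src1) fields.
fn fields(instr: &Instr) -> (u64, u64, u64, u64) {
    match *instr {
        Instr::IAdd { dst, a, b } => (0x01, dst as u64, a as u64, b as u64),
        Instr::Mov { dst, src } => (0x02, dst as u64, src as u64, 0),
    }
}

impl Encode for GenA {
    /// Invented layout: opcode in bits 24..32, then dst, src0, src1 going down.
    fn encode(&self, instr: &Instr) -> u64 {
        let (op, dst, a, b) = fields(instr);
        (op << 24) | (dst << 16) | (a << 8) | b
    }
}

impl Encode for GenB {
    /// Invented layout: opcode in bits 0..8, then dst, src0, src1 going up.
    fn encode(&self, instr: &Instr) -> u64 {
        let (op, dst, a, b) = fields(instr);
        op | (dst << 8) | (a << 16) | (b << 24)
    }
}

/// Compiler passes only see `Instr`; the encoder is chosen at the very end.
pub fn encode_all(enc: &dyn Encode, instrs: &[Instr]) -> Vec<u64> {
    instrs.iter().map(|i| enc.encode(i)).collect()
}
```

With this shape, a change in instruction encoding only touches an `Encode` implementation, and the IR stays free to model things like 64-bit sources in whatever way suits the compiler.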
A new SSA-based register allocator: While the old IR uses SSA form, it still uses an iterative register allocation and spilling approach. This is slow, especially for large shaders which need to spill more than a few values. Even worse, there are cases where the current register allocation and spilling algorithm will simply fail to compile the shader.
HW unit tests: One of the things I learned while writing the Nouveau compiler (for NVIDIA GPUs) is the value of a hardware unit test suite. Unlike software unit tests, these compile tiny shaders and execute them on the GPU. This makes it easy to test the exact behavior of specific hardware instructions. Even though we have access to documentation from Arm, there are still details that aren't documented. But when you're writing a compiler, details matter. The ability to ferret out hardware corner cases is often essential to writing a compiler that's actually correct.
Better generalization of opcodes across data types: The Mali instruction set often has multiple forms of each instruction that operate on different data types. For instance, IAND comes in 4 different variants: IAND.v4i8, IAND.v2i16, IAND.i32, and IAND.i64. In the old Bifrost IR, these are treated as different opcodes. We want a better ability to reason about the different variants of an instruction so that we can write generic code which works on all variants of a given opcode.
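As a rough sketch of what "generic over variants" could look like, the opcode can be split into a base operation and a data type, so that code matching on the base operation applies to every variant at once. Everything here is hypothetical rather than Kraid's design. In particular, only IAND and its four variants come from the text above; the IOR and IXOR names are assumed for the sake of the example.

```rust
/// Base operation, independent of data type. IOr and IXor are assumed names.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BaseOp {
    IAnd,
    IOr,
    IXor,
}

/// The four integer type variants named for IAND.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IntType {
    V4I8,
    V2I16,
    I32,
    I64,
}

impl IntType {
    pub fn lanes(self) -> u32 {
        match self {
            IntType::V4I8 => 4,
            IntType::V2I16 => 2,
            IntType::I32 | IntType::I64 => 1,
        }
    }

    pub fn lane_bits(self) -> u32 {
        match self {
            IntType::V4I8 => 8,
            IntType::V2I16 => 16,
            IntType::I32 => 32,
            IntType::I64 => 64,
        }
    }

    pub fn suffix(self) -> &'static str {
        match self {
            IntType::V4I8 => "v4i8",
            IntType::V2I16 => "v2i16",
            IntType::I32 => "i32",
            IntType::I64 => "i64",
        }
    }
}

/// An opcode is a base operation plus a type, not one enum entry per variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Opcode {
    pub base: BaseOp,
    pub ty: IntType,
}

impl Opcode {
    pub fn mnemonic(self) -> String {
        let base = match self.base {
            BaseOp::IAnd => "IAND",
            BaseOp::IOr => "IOR",
            BaseOp::IXor => "IXOR",
        };
        format!("{}.{}", base, self.ty.suffix())
    }
}

/// What `op x, x` simplifies to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SameOperandFold {
    Operand,
    Zero,
}

/// Written once against `BaseOp`, so it covers all four type variants.
pub fn fold_same_operand(op: Opcode) -> SameOperandFold {
    match op.base {
        BaseOp::IAnd | BaseOp::IOr => SameOperandFold::Operand,
        BaseOp::IXor => SameOperandFold::Zero,
    }
}
```

The fold rule never inspects `ty`, which is exactly the kind of code that has to be duplicated or table-driven when each variant is its own opcode.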
Better modeling of swizzles and widens: v2i8 sources support an array of 16 swizzles which allow selecting any pair of bytes from the source. Other instructions support widening where you can, for instance, select a single byte or 16-bit half word and the hardware will expand it to 32 or 64 bits as needed by the instruction. In the old Bifrost compiler, we made the mistake of conflating these two concepts in an oversimplified way. This led to a number of bugs when we started trying to optimize shaders to take more advantage of swizzles and widens. We need to rethink the way we model swizzles and widens from the ground up to avoid confusion.

As you can see, this is a pretty long wishlist. While some of these changes could probably be made incrementally, some of them would be much harder to make without breaking things. Also, attempting to make large, sweeping changes like this in an incremental way risks breaking the upstream driver our customers rely on.
In the end, we decided on a full rewrite. This gives us the opportunity to not only make the improvements we want but also fix some of the structural issues that we could never fix incrementally. It also means that we can switch over to the new compiler only once it's proven out, avoiding the problem of upstream regressions while things are in flux. The risk, of course, is that a completely new compiler will have completely new bugs that are likely to surface when we switch. However, if we do a good job of fixing the structural issues, switching to the new compiler will hopefully fix more bugs than it creates.
We're also taking the opportunity with this rewrite to switch to Rust. We've had good success with Rust in the Nouveau driver stack (for NVIDIA GPUs) and we're hoping to repeat that success here. We're also able to reuse or share a bunch of minor things with the Nouveau compiler, which helps with development. Even though Mali and NVIDIA are different instruction sets, there are still opportunities to share code. Several of the core algorithms and data structures have less to do with the specifics of the instruction set than they do with general compiler theory. And thanks to Rust's efficient traits and callbacks, we're able to implement them in a pluggable way that lets Kraid and the Nouveau compiler maintain their own details while using the same algorithm.
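To illustrate the general pattern (not the code actually shared between the two compilers), here is a sketch of a backend-agnostic pass written against a trait. Dead-code elimination over a single straight-line SSA block is used purely as a convenient example. The `InstrInfo` trait and both toy instruction types are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical backend interface: what the shared pass needs to know.
pub trait InstrInfo {
    fn def(&self) -> Option<u32>;
    fn uses(&self) -> Vec<u32>;
    fn has_side_effects(&self) -> bool;
}

/// Backend-agnostic dead-code elimination over one straight-line SSA block.
pub fn eliminate_dead_code<I: InstrInfo>(instrs: Vec<I>) -> Vec<I> {
    let mut live: HashSet<u32> = HashSet::new();
    let mut kept = Vec::new();
    // Walk backwards so every use is seen before the def that feeds it.
    for instr in instrs.into_iter().rev() {
        let needed = instr.has_side_effects()
            || instr.def().map_or(false, |d| live.contains(&d));
        if needed {
            live.extend(instr.uses());
            kept.push(instr);
        }
    }
    kept.reverse();
    kept
}

/// Toy instruction set standing in for one backend.
#[derive(Debug, PartialEq)]
pub enum ToyMali {
    Load { dst: u32 },
    Add { dst: u32, a: u32, b: u32 },
    Store { src: u32 },
}

impl InstrInfo for ToyMali {
    fn def(&self) -> Option<u32> {
        match *self {
            ToyMali::Load { dst } | ToyMali::Add { dst, .. } => Some(dst),
            ToyMali::Store { .. } => None,
        }
    }
    fn uses(&self) -> Vec<u32> {
        match *self {
            ToyMali::Load { .. } => vec![],
            ToyMali::Add { a, b, .. } => vec![a, b],
            ToyMali::Store { src } => vec![src],
        }
    }
    fn has_side_effects(&self) -> bool {
        matches!(self, ToyMali::Store { .. })
    }
}

/// A differently shaped toy instruction standing in for another backend.
#[derive(Debug, PartialEq)]
pub struct ToyNv {
    pub dst: Option<u32>,
    pub srcs: Vec<u32>,
    pub side_effects: bool,
}

impl InstrInfo for ToyNv {
    fn def(&self) -> Option<u32> {
        self.dst
    }
    fn uses(&self) -> Vec<u32> {
        self.srcs.clone()
    }
    fn has_side_effects(&self) -> bool {
        self.side_effects
    }
}
```

Because `eliminate_dead_code` is generic, each backend gets its own monomorphized copy with no dynamic dispatch, while the algorithm itself lives in one place.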
We've already merged Kraid into Mesa and we're continuing development upstream. At the time the initial MR was created, PanVK + Kraid was passing a single Vulkan CTS test. As of the writing of this blog post, we're passing all of the SSBO (Shader Storage Buffer Object) layout tests and work has started on the new register allocator.
Currently, the new compiler is hidden behind a PAN_USE_KRAID=1 environment variable as well as a -Dpanfrost-rust Meson configuration option. This allows us to continue developing upstream without disturbing our users. Leaving the Meson option disabled also removes the Rust build-time dependency. Some of our customers are using very bespoke Linux distros and may not have Rust incorporated into their build systems yet. This buys them some time and allows them to continue building without Rust for the time being.
While I'm hesitant to promise too much, the goal is to have good things to show by the X.Org Developers Conference at the end of September. If things continue to go well, we will hopefully be able to switch over sometime next year.