December 20, 2019
Panfrost, Lima, and Arm developers together in Montreal during XDC 2019. From left to right: Lyude Paul (Panfrost), Ryan Houdek (Panfrost), Tomeu Vizoso (Panfrost, Collabora), Alyssa Rosenzweig (Panfrost, Collabora), John Einar Reitan (Arm), Rohan Garg (Panfrost, Collabora), Boris Brezillon (Panfrost, Collabora), Erico Nunes (Lima), Connor Abbott (Lima, Panfrost), Rob Herring (Panfrost)
If you have a device with a Mali T720 or T820 GPU, you’re in luck – your device is now supported in upstream Mesa at feature parity with other GPUs. Get out your Allwinner H6 or Amlogic S912 board, grab the latest Mesa, and enjoy a match of SuperTuxKart with fully free and open source mainline drivers!
When Panfrost began, we focused on the highest performance Mali GPUs found in Chromebooks. By contrast, Mali GPUs like T720 are designed for simplicity, where minimizing size is more important than maximizing performance.
Simplicity for the hardware, that is. For us, those changes mean new complexity – but we’re up to the challenge. Over the past month, Collaboran Tomeu Vizoso and I reverse-engineered the Mali T720 and adapted Panfrost for the new devices.
Much of our work focused on the tiler. As I blogged about over the summer, Mali GPUs are “tiling” architectures, meaning they divide the screen into many small “tiles” or “bins” and operate on those smaller sections of the screen to save memory bandwidth and improve power efficiency. The fastest Mali architectures use “hierarchical tiling”, where many different sizes of tiles are used at once. The tiler on these GPUs is simplified, with no support for hierarchical tiling. Instead, the driver selects a single tile size used for the entire screen, and supporting this model required changes to the driver. Fortunately, after my work on hierarchical tiling over the summer, we were able to figure out the non-hierarchical tiler and implement our findings in Panfrost with ease.
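To make the single-tile-size idea concrete, here is a minimal sketch in Python. It is not Panfrost code: the function names, the square tile shape, and the 16-pixel tile size are all assumptions chosen for illustration. It shows how a screen is covered by one fixed size of tile and how a primitive's bounding box maps to the bins it touches.

```python
def tile_grid(width, height, tile_size):
    """Return (columns, rows) of square tiles needed to cover the screen.

    tile_size is a hypothetical single size chosen for the whole screen.
    """
    cols = -(-width // tile_size)   # ceiling division
    rows = -(-height // tile_size)
    return cols, rows


def tiles_touched(bbox, tile_size, grid):
    """List the (col, row) of every tile a bounding box overlaps.

    bbox is (x0, y0, x1, y1) in pixels, with inclusive corners.
    """
    x0, y0, x1, y1 = bbox
    cols, rows = grid
    # Clamp to the grid so off-screen geometry does not produce bogus bins.
    first_col = max(x0 // tile_size, 0)
    last_col = min(x1 // tile_size, cols - 1)
    first_row = max(y0 // tile_size, 0)
    last_row = min(y1 // tile_size, rows - 1)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


grid = tile_grid(1920, 1080, 16)
bins = tiles_touched((10, 10, 20, 20), 16, grid)
```

With hierarchical tiling, the same bounding box would be sorted into several grids of different tile sizes at once; with a single size there is only one grid to consider.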
On the compiler side, these GPUs feature another simplification. Most instruction sets, including Midgard, are based on “registers”, where data can be written and read for computation. On Midgard, there are three types of registers: work registers, load/store registers, and texture registers. Work registers are general purpose, used for arithmetic. Load/store registers and texture registers, however, are special, used with load/store and texture instructions respectively. On most Mali chips, each of the three types has its own separate set of registers. But the simplified GPUs are a bit special, folding the texture registers into the work and load/store register spaces – a surprising and rather confusing discovery at first. Nevertheless, once we understood this unique phenomenon known as “interpipe register aliasing”, we were able to modify our compiler accordingly, fixing assorted issues relating to textures.
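As a rough illustration of what aliasing means for a register allocator, consider the Python sketch below. The alias table is entirely hypothetical: the register numbers are invented for the example and do not describe the real T720 layout or Panfrost's compiler. The point is only that two registers of different types can name the same physical storage, so the compiler must treat them as conflicting.

```python
# Hypothetical alias table: texture registers that share storage with
# registers in the work and load/store spaces. These numbers are made up
# for illustration and are NOT the real hardware layout.
ALIASES = {
    ("tex", 0): ("work", 7),
    ("tex", 1): ("ldst", 1),
}


def physical(reg):
    """Resolve a (type, index) register to the storage it actually occupies."""
    return ALIASES.get(reg, reg)


def conflicts(a, b):
    """Two registers conflict if they resolve to the same physical storage."""
    return physical(a) == physical(b)


# A value live in work register 7 would be clobbered by a write to tex 0.
clobbered = conflicts(("tex", 0), ("work", 7))
```

On a chip with three fully separate register sets, the alias table would be empty and registers of different types could never conflict.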
One final focus area was Mali’s framebuffer descriptors. OpenGL features “multiple render target” support, allowing an app to render to different render targets (surfaces) at once, useful for effects like deferred shading. Mali GPUs have supported this feature in hardware since the Mali T760, via the “multiple framebuffer descriptor”. Earlier Mali GPUs, however, do not support this feature in hardware, instead emulating support in software. These GPUs use the simplified “single framebuffer descriptor”. We improved support for handling these simplified descriptors, reverse-engineering and integrating features like transaction elimination. As a bonus, this work also benefits anyone with T6xx GPUs.
With these improvements and many other minor features and bug fixes, we brought T720 and T820 up to feature parity with our existing boards, and added these into our continuous integration infrastructure to ensure Panfrost continues to work beautifully. Panfrost is now ready for daily use on Mali GPUs from T720 to T860. All of the source code is upstream… so happy hacking :-)