We're hiring!
*

GNOME meets Panfrost

Alyssa Rosenzweig avatar

Alyssa Rosenzweig
June 26, 2019

Share this post:

Reading time:

In my last Panfrost blog post, I announced my internship goal: improve Panfrost to run GNOME3. GNOME is a popular Linux desktop making heavy use of OpenGL; to use GNOME with only free and open source software on a machine with Mali graphics, Panfrost is necessary.

Two months ahead of schedule, here I am, drafting this blog post from GNOME on my laptop running Panfrost!

Screenshot of GNOME with Panfrost.

A tiled architecture

Bring-up of GNOME required improving the driver's robustness and performance, focused on Mali's tiled architecture. Typically found in mobile devices, tiling GPU architectures divide the screen into many small tiles, like a kitchen floor, rendering each tile separately. This allows for unique optimizations but also poses unique challenges.

One natural question is: how big should tiles be? If the tiles are too big, there's no point to tiling, but if the tiles are too small, the GPU will repeat unnecessary work. Mali offers a hybrid answer: allow lots of different sizes! Mali's technique of "hierarchical tiling" allows the GPU to use tiles as small as 16x16 pixels all the way up to 2048x2048 pixels. This "sliding scale" allows different types of content to be optimized in different ways. The tiling needs of a 3D game like SuperTuxKart are different from those of a user interface like GNOME Shell, so this technique gets us the best of both worlds!

Although primarily handled in hardware, hierarchical tiling is configured by the driver; I researched this configuration mechanism in order to understand it and improve our configuration with respect to performance and memory usage.

Tiled architectures additionally present an optimization opportunity: if the driver can figure out a priori which 16x16 tiles will definitely not change, those tiles can be culled from rendering entirely, saving both read and write bandwidth. As a conceptual example, if the GPU composites your entire desktop while you're writing an email, there's no need to re-render your web browser in the other window, since that hasn't changed. I implemented an initial version of this optimization in Panfrost, accumulating the scissor state across draws within a frame, rendering only to the largest bounding box of the scissors. This optimization is particularly helpful for desktop composition, ideally improving performance on workloads like GNOME, Sway, and Weston.

…Of course, theory aside, mostly what GNOME needed was a good, old-fashioned bugfixing spree, because the answer is always under your nose. Turns out what really broke the desktop was a trivial bug in the viewport specification code. Alas.

Scoreboarding

Looking forward to sophisticated workloads as this open driver matures, I researched job "scoreboarding". For some background, the Mali hardware divides a frame into many small "jobs". For instance, a "vertex job" executes a vertex shader; a "tiler job" executes tiling (sorting geometry job into tiles at varying hierarchy levels). Many of these jobs have to execute in a specific order; for instance, geometry has to be output by a vertex job before a tiler job can read that geometry. Previously, these relationships were hard-coded into the driver, which was okay for simple workloads but does not scale well.

I have since replaced this code with an elegant dependency management system, based on the hardware's scoreboarding. Instead of hard-coding relationships, the driver can now specify high level dependencies, and a generic algorithm (based on topological sorting) works out the order of submission and scoreboard flags necessary to actualize the given requirements. The new scoreboarding implementation has enabled new features, like rasterizer discard, to be implemented with ease.

With these improvements and more, several new features have landed in the driver, fixing hundreds of failing dEQP tests since my last blog post, bringing us nearer to conformance on OpenGL ES 2.0 and beyond.


Add a Comment

Search the newsroom

Latest Blog Posts

Running Mainline Linux, U-Boot, and Mesa on Rockchip: A year in review

02/03/2026

Get the recap of Nicolas Frattaroli's FOSDEM talk detailing Rockchip’s mainline progress, including Vulkan 1.4 and NPU support as a vital…

Now streaming: Collabora XDC 2025 presentations

02/12/2025

As an active member of the freedesktop community, Collabora was busy at XDC 2025. Our graphics team delivered five talks, helped out in…

Implementing Bluetooth LE Audio & Auracast on Linux systems

24/11/2025

LE Audio introduces a modern, low-power, low-latency Bluetooth® audio architecture that overcomes the limitations of classic Bluetooth®…

Strengthening KernelCI: New architecture, storage, and integrations

17/11/2025

Collabora’s long-term leadership in KernelCI has delivered a completely revamped architecture, new tooling, stronger infrastructure, and…

Font recognition reimagined with FasterViT-2

11/11/2025

Collabora extended the AdobeVFR dataset and trained a FasterViT-2 font recognition model on millions of samples. The result is a state-of-the-art…

Expanding access to XR: Google Cardboard comes to Monado

31/10/2025

Collabora has advanced Monado's accessibility by making the OpenXR runtime supported by Google Cardboard and similar mobile VR viewers so…

Open Since 2005 logo

Our website only uses a strictly necessary session cookie provided by our CMS system. To find out more please follow this link.

Collabora Limited © 2005-2026. All rights reserved. Privacy Notice. Sitemap.