Open Source OpenGL ES 3.1 on Mali GPUs with Panfrost

Alyssa Rosenzweig
June 11, 2021

Panfrost, the open source driver for Arm Mali, now supports OpenGL ES 3.1 on both Midgard (Mali T760 and newer) and Bifrost (Mali G31, G52, G72) GPUs. OpenGL ES 3.1 adds a number of features on top of OpenGL ES 3.0, notably including compute shaders. While Panfrost has had limited support for compute shaders on Midgard for use in TensorFlow Lite, the latest work extends the support to more GPUs and adds complementary features required by the OpenGL ES 3.1 specification, like indirect draws and no-attachment framebuffers.

The new feature support represents the cumulative effort of multiple Collaborans -- Boris Brezillon, Italo Nicola, and myself -- in tandem with the wider Mesa community. The OpenGL driver has seen over 1000 commits since the beginning of 2021, including several hundred targeting OpenGL ES 3.1 features. Our focus is Mali G52, where we are passing essentially all drawElements Quality Program and Khronos conformance tests and are aiming to become formally conformant. Nevertheless, thanks to a unified driver, many new features on Bifrost trickle down to Midgard, allowing the older architecture, still in wide use, to improve long after the vendor has dropped support. On Mali T860, we are passing about 99.5% of tests required for conformant OpenGL ES 3.1. That number can only grow thanks to Mesa's continuous integration running these tests for every merge request and preventing Panfrost regressions. With a Vulkan driver in the works, Panfrost's API support is looking good.

Instruction scheduling

Since the last Panfrost update, we've added an instruction scheduler to the Bifrost compiler. To understand the motivation, recall the hardware design. The Bifrost instruction set pairs instructions into "tuples", one using the multipliers and the other using the adders. Up to 8 tuples are grouped into a "clause", a sequence of instructions with fixed latency that can execute back-to-back with no pipeline bubbles in the middle.

The benefit to the hardware designers was that Bifrost's pipeline could be statically filled by the compiler, rather than adding logic in the hardware to dynamically dispatch instructions to different parts of the units like a superscalar chip. Unfortunately, that means the compiler becomes significantly more complicated, as it has to group instructions itself while satisfying dozens of architectural invariants. If any condition fails to be met, the GPU will fault with an Invalid Instruction Encoding exception and abort execution.

In Panfrost, we've approached this by formally modeling the constraints to produce a predicate (a function returning a boolean) for whether a given instruction may be scheduled in a given position in the program. Then it's simple enough to schedule greedily by choosing instructions with this predicate according to some selection heuristic. Wait, selection heuristic?

An algorithm is "greedy" if it makes a locally optimal choice at every step. On the surface, it seems like that strategy would produce a globally optimal result. In special cases, this is true and the best algorithms to solve a problem are greedy. Unfortunately, greedy algorithms produce suboptimal results on many other problems, sometimes spectacularly so. Instruction scheduling is one of those cases: when the predicate shows two different instructions can be scheduled next, which one should be picked? It's not enough to always pick the first or pick one at random; while both strategies are locally optimal, both have poor global performance. We need a heuristic to choose the best instruction from the set of candidate instructions at each point, taking into account how it will affect our ability to schedule in the future. Coming up with good heuristics is tricky, and we have a great deal of room to grow in Panfrost, but the basic model is serving us well so far.
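To make the role of the heuristic concrete, here is a minimal sketch of a predicate-driven greedy scheduler in Python. It is a toy model, not Panfrost's compiler: the `Instr` type, the two-slot tuple, the rule that a result is only usable in a later tuple, and both heuristics are assumptions made up for illustration, and the real predicate encodes far more constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    # Hypothetical toy instruction: a name, the unit it needs, and its inputs.
    name: str
    unit: str                      # "fma" (multiplier) or "add" (adder)
    deps: frozenset = frozenset()  # names of instructions it reads from


def can_schedule(instr, slots, finished):
    """Toy predicate: the unit's slot is free and all inputs were produced
    by an earlier tuple (an assumption of this model, not a hardware rule)."""
    return instr.unit not in slots and instr.deps <= finished


def greedy_schedule(program, pick):
    """Fill tuples one at a time; `pick` is the selection heuristic."""
    pending, finished, tuples = list(program), set(), []
    while pending:
        slots = {}
        while True:
            candidates = [i for i in pending if can_schedule(i, slots, finished)]
            if not candidates:
                break
            chosen = pick(candidates, program)
            slots[chosen.unit] = chosen
            pending.remove(chosen)
        if not slots:
            raise ValueError("no instruction can be scheduled")
        tuples.append(slots)
        finished |= {i.name for i in slots.values()}
    return tuples


def pick_first(candidates, program):
    # Naive heuristic: take whatever comes first in program order.
    return candidates[0]


def chain_length(instr, program):
    # Length of the longest dependency chain starting at `instr`.
    users = [i for i in program if instr.name in i.deps]
    return 1 + max((chain_length(u, program) for u in users), default=0)


def pick_longest_chain(candidates, program):
    # Heuristic that looks ahead: prefer the instruction others wait on longest.
    return max(candidates, key=lambda i: chain_length(i, program))


PROGRAM = [
    Instr("x", "add"),
    Instr("y", "add"),
    Instr("p", "fma", frozenset({"y"})),
    Instr("q", "add", frozenset({"p"})),
]

print(len(greedy_schedule(PROGRAM, pick_first)))          # tuples used
print(len(greedy_schedule(PROGRAM, pick_longest_chain)))  # tuples used
```

Both runs are greedy and both obey the same predicate, yet in this toy program picking the first candidate needs four tuples while preferring the longest dependency chain needs three. The only difference is which of the two legal adder instructions goes first.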

Towards zero overhead

Another large change to the driver since our last blog post was the addition of dirty tracking, a common graphics driver optimization with a twist for Mali. Typical GPUs are stateful, and the driver emits commands to set pieces of graphics state -- for instance, setting uniforms -- before each draw. Mali GPUs present an unusual stateless interface. Instead of commands, the driver prepares descriptors containing large amounts of graphics state bundled together, and each draw has pointers to the different descriptors. In a sense, typical GPUs are programmed with an OpenGL-like state machine, whereas Mali is programmed with Vulkan-like pipelines.

There is conventional graphics wisdom that "state changes are expensive", so graphics programmers try to minimize the number of API calls they make. Drivers can help minimize state changes as well, by tracking which state is "dirty" (modified) and which state is "clean" (unmodified). Then the driver only has to emit commands for the dirty state, reducing its own CPU overhead as well as reducing work for the GPU to process.
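As a rough illustration of dirty tracking on a conventional stateful GPU, the sketch below keeps one dirty bit per piece of state and emits a "set" command only for bits that are set. The `Dirty` flags, the `Context` class and the command tuples are all hypothetical stand-ins, not any real driver's interface.

```python
from enum import IntFlag, auto


class Dirty(IntFlag):
    # Hypothetical state groups; a real driver tracks many more.
    UNIFORMS = auto()
    BLEND = auto()
    VIEWPORT = auto()


class Context:
    def __init__(self):
        self.state = {}          # flag -> current value
        self.dirty = Dirty(0)    # which state changed since the last draw
        self.commands = []       # stand-in for the GPU command stream

    def set_state(self, flag, value):
        # Redundant sets leave the state clean.
        if self.state.get(flag) != value:
            self.state[flag] = value
            self.dirty |= flag

    def draw(self):
        # Emit commands only for dirty state, then mark everything clean.
        for flag in Dirty:
            if self.dirty & flag:
                self.commands.append(("set", flag.name, self.state[flag]))
        self.dirty = Dirty(0)
        self.commands.append(("draw",))


ctx = Context()
ctx.set_state(Dirty.UNIFORMS, (1.0, 2.0))
ctx.set_state(Dirty.BLEND, "additive")
ctx.draw()                                  # emits two sets and a draw
ctx.set_state(Dirty.UNIFORMS, (3.0, 4.0))
ctx.draw()                                  # emits one set and a draw
ctx.set_state(Dirty.UNIFORMS, (3.0, 4.0))   # same value, nothing is dirty
ctx.draw()                                  # emits only the draw
```

Comparing values in `set_state` is one possible design; a driver could instead mark state dirty on every API call and accept a few redundant commands in exchange for a cheaper check.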

Surprisingly, the same idea generalizes to Mali. It is inefficient for the driver to upload every Mali descriptor for every application draw call. Ideally, we could reuse the same descriptor for subsequent draw calls if we know the state hasn't changed. Dirty tracking lets us know exactly when the state has changed, allowing us to upload new descriptors only when required and reuse the existing descriptor otherwise. On the surface, the purpose is simply reducing CPU overhead, since the GPU's programming model is stateless and therefore must redo work anyway. However, the GPU has several layers of caches, so reusing these descriptors can enable the GPU to use cached descriptors as opposed to invalidating its descriptor caches after every draw call. Implementing dirty tracking in Panfrost improved draws per second in one synthetic benchmark by about 400%. Not bad for a week's work.
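The descriptor variant of the idea can be sketched the same way. In this toy model, each draw is just a set of pointers to descriptors, and a new descriptor is uploaded only when its state is dirty. The `DescriptorCache` class, the descriptor kinds and the list standing in for GPU memory are assumptions for illustration and do not reflect Panfrost's actual data structures.

```python
class DescriptorCache:
    def __init__(self):
        self.memory = []    # stand-in for GPU-visible memory
        self.state = {}     # kind -> current CPU-side state
        self.dirty = set()  # kinds changed since their last upload
        self.current = {}   # kind -> address of the last uploaded descriptor
        self.uploads = 0

    def set_state(self, kind, value):
        if self.state.get(kind) != value:
            self.state[kind] = value
            self.dirty.add(kind)

    def _upload(self, kind):
        # Descriptors are immutable once written, so a change means a new one.
        self.memory.append((kind, self.state[kind]))
        self.uploads += 1
        return len(self.memory) - 1

    def draw(self):
        # Upload only dirty descriptors; clean ones keep their old address.
        for kind in sorted(self.dirty):
            self.current[kind] = self._upload(kind)
        self.dirty.clear()
        return dict(self.current)  # the pointers this draw would reference


cache = DescriptorCache()
cache.set_state("blend", "opaque")
cache.set_state("viewport", (0, 0, 640, 480))
draw1 = cache.draw()   # two uploads
draw2 = cache.draw()   # no uploads, same pointers as draw1
cache.set_state("viewport", (0, 0, 320, 240))
draw3 = cache.draw()   # one upload, blend pointer unchanged
```

Because `draw2` references exactly the same addresses as `draw1`, hardware that caches descriptors by address has a chance to hit in its cache, which is the effect the paragraph above describes.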

Looking forward

We're aiming to polish the OpenGL ES 3.1 support in time for the Mesa 21.2 release next month. Next stop after that: Bifrost performance improvements and introducing support for the modern Valhall (Mali G77 and newer) architecture family.

Comments (11)

José Manuel:
Jun 13, 2021 at 09:42 AM

Thank you all for your huge contribution to the open source world!

debiangamer:
Jun 15, 2021 at 08:15 AM

" many new features on Bifrost trickle down to Midgard allowing the older architecture still in wide use to improve long after the vendor has dropped support. On Mali T860, we are passing about 99.5% of tests required for conformant OpenGL ES 3.1."

The Panfrost driver is unusable with Mali T820 while you are working on this. The Panfrost driver is slow, spams kernel messages to syslog and crashes time to time. The fbdev driver with llvm is faster and stable. Will this long time bug get ever fixed?

https://gitlab.freedesktop.org/mesa/mesa/-/issues/3143

1. Daniel Stone:
  Jun 15, 2021 at 02:17 PM
  
  As you were told by Arm developers on dri-devel@, the message in dmesg is likely to be removed. However, the fact you are seeing those messages indicates that your system is already under severe memory pressure which is the root cause of your problems. Diagnosing and fixing this is your first step.
  
  T860 is the best-supported GPU for Midgard. T820 has some rough edges, and these will be fixed, however with 18 different GPU revisions - all with their own differences and quirks to support - not every GPU will be perfect all the time.
  
  1. debiangamer:
    Jun 15, 2021 at 03:58 PM
    
    "severe memory pressure which is the root cause of your problems. Diagnosing and fixing this is your first step."
    
    Yes, the Sunvell T95Z Plus TV box has 2GB RAM and Firefox is using that all. fbdev and llvmpipe handles memory pressure situation fine but not Panfrost. But the Xfce desktop runs slowly after boot when there is RAM available.
    
    1. Daniel Stone:
      Jun 15, 2021 at 06:28 PM
      
      llvmpipe can swap all its pages out to disk because it's a CPU renderer. We can't do that with GPU acceleration, because the GPU needs to be able to access those pages in memory, not on disk. Anyway, as discussed in the thread on dri-devel@, the message is going to be removed, but the error-path handling cannot be removed.
      
    2. debiangamer:
      Jun 15, 2021 at 06:39 PM
      
      Disabling the Xfce desktop compositing and using this command makes the Xfce desktop and Firefox works better with Panfrost: sudo xfconf-query -c xfwm4 -p /general/vblank_mode -t string -s "xpresent" --create
      However, the Panfrost kernel driver spams to dmesg:
      [ 921.575208] panfrost d00c0000.gpu: AS_ACTIVE bit stuck
      [ 921.579233] panfrost d00c0000.gpu: AS_ACTIVE bit stuck
      [ 921.590248] panfrost d00c0000.gpu: AS_ACTIVE bit stuck
      [ 921.922100] panfrost d00c0000.gpu: AS_ACTIVE bit stuck
      
      The spam message in panfrost_gem_shrinker_scan is disabled.
      // if (freed > 0)
      // pr_info_ratelimited("Purging %lu bytes\n", freed
      
      1. pepe:
        Jan 22, 2022 at 03:04 AM
        
        how do i disable that?
        
Walter ZAMBOTTI:
Jul 16, 2021 at 08:05 AM

I would just like to thank everyone working on the Panfrost project for a fantastic effort and result. The recent July 2021 updates have completely changed my ARM desktop experience (for the better).

Many thanks.

Walter ZAMBOTTI
Independent developer

System tested. ODROID N2 - Ubuntu 21.04 MATE - Kernel 5.13 - MESA 5:21.2.0

mctom:
Jul 16, 2021 at 08:48 AM

Just wanted to say big Thank You from Odroid community, people on the official forum are super excited with a significant boost of performance on their N2/N2+ computers. :)

christian ponzoni:
Sep 20, 2021 at 08:40 AM

thank you for all the effort!

ericek111:
Oct 01, 2021 at 12:42 PM

This sounds exciting! Are you planning on adding support for Mali G76, too? Or is it too different from your main targets?
