Open Source OpenGL ES 3.1 on Mali GPUs with Panfrost

Alyssa Rosenzweig
June 11, 2021

Panfrost, the open source driver for Arm Mali, now supports OpenGL ES 3.1 on both Midgard (Mali T760 and newer) and Bifrost (Mali G31, G52, G72) GPUs. OpenGL ES 3.1 adds a number of features on top of OpenGL ES 3.0, notably including compute shaders. While Panfrost has had limited support for compute shaders on Midgard for use in TensorFlow Lite, the latest work extends the support to more GPUs and adds complementary features required by the OpenGL ES 3.1 specification, like indirect draws and no-attachment framebuffers.

The new feature support represents the cumulative effort of multiple Collaborans -- Boris Brezillon, Italo Nicola, and myself -- in tandem with the wider Mesa community. The OpenGL driver has seen over 1000 commits since the beginning of 2021, including several hundred targeting OpenGL ES 3.1 features. Our focus is Mali G52, where we are passing essentially all drawElements Quality Program and Khronos conformance tests and are aiming to become formally conformant. Nevertheless, thanks to a unified driver, many new features on Bifrost trickle down to Midgard, allowing the older architecture, which is still in wide use, to improve long after the vendor has dropped support. On Mali T860, we are passing about 99.5% of the tests required for conformant OpenGL ES 3.1. That number can only grow, thanks to Mesa's continuous integration running these tests for every merge request and preventing Panfrost regressions. With a Vulkan driver in the works, Panfrost's API support is looking good.

Instruction scheduling

Since the last Panfrost update, we've added an instruction scheduler to the Bifrost compiler. To understand the motivation, recall the hardware design. The Bifrost instruction set pairs instructions into "tuples", one using the multipliers and the other using the adders. Up to 8 tuples are grouped into a "clause", a sequence of instructions with fixed latency that can execute back-to-back with no pipeline bubbles in the middle.

The benefit to the hardware designers was that Bifrost's pipeline could be statically filled by the compiler, rather than adding logic in the hardware to dynamically dispatch instructions to different execution units as a superscalar chip does. Unfortunately, that means the compiler becomes significantly more complicated, as it has to group instructions itself while satisfying dozens of architectural invariants. If any of these conditions is not met, the GPU will fault with an Invalid Instruction Encoding exception and abort execution.

In Panfrost, we've approached this by formally modeling the constraints to produce a predicate (a function returning a boolean) for whether a given instruction may be scheduled in a given position in the program. Then it's simple enough to schedule greedily, choosing among the instructions that satisfy this predicate according to some selection heuristic. Wait, selection heuristic?

An algorithm is "greedy" if it makes a locally optimal choice at every step. On the surface, it seems like that strategy would produce a globally optimal result. In special cases, this is true and the best algorithms to solve a problem are greedy. Unfortunately, greedy algorithms produce suboptimal results on many other problems, sometimes spectacularly so. Instruction scheduling is one of those cases: when the predicate shows two different instructions can be scheduled next, which one should be picked? It's not enough to always pick the first or pick one at random; while both strategies are locally optimal, both have poor global performance. We need a heuristic to choose the best instruction from the set of candidate instructions at each point, taking into account how it will affect our ability to schedule in the future. Coming up with good heuristics is tricky, and we have a great deal of room to grow in Panfrost, but the basic model is serving us well so far.
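To make the role of the predicate and the heuristic concrete, here is a toy sketch in Python. It is not Panfrost's actual code (the real scheduler is written in C and models far more constraints). Everything in it is an assumption made for illustration: the `Instr` type, the two slot names, the rule that a dependency must sit in an earlier tuple, and the "longest remaining dependency chain" heuristic. Clauses and the 8-tuple limit are left out entirely.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    unit: str          # "fma" (multiplier slot) or "add" (adder slot)
    deps: tuple = ()   # names of instructions whose results are read

def can_schedule(instr, slots, finished):
    """Toy predicate: the slot must be free and every dependency
    must already sit in an earlier tuple (a simplifying assumption)."""
    slot_free = instr.unit not in slots
    deps_ready = all(d in finished for d in instr.deps)
    return slot_free and deps_ready

def chain_lengths(program):
    """Length of the longest dependency chain starting at each instruction."""
    users = {i.name: [] for i in program}
    for i in program:
        for d in i.deps:
            users[d].append(i.name)
    memo = {}
    def height(name):
        if name not in memo:
            memo[name] = 1 + max((height(u) for u in users[name]), default=0)
        return memo[name]
    return {name: height(name) for name in users}

def schedule(program, priority):
    """Greedy scheduling: fill one tuple at a time, always taking the
    candidate with the highest priority (ties keep program order)."""
    pending, finished, tuples = list(program), set(), []
    while pending:
        slots = {}
        while True:
            candidates = [i for i in pending if can_schedule(i, slots, finished)]
            if not candidates:
                break
            pick = max(candidates, key=priority)
            slots[pick.unit] = pick.name
            pending.remove(pick)
        if not slots:
            raise ValueError("unsatisfiable dependencies")
        finished.update(slots.values())
        tuples.append(slots)
    return tuples

# Two independent instructions followed by a three-instruction chain.
program = [
    Instr("x1", "fma"), Instr("x2", "add"),
    Instr("c1", "fma"), Instr("c2", "add", ("c1",)), Instr("c3", "fma", ("c2",)),
]
naive = schedule(program, priority=lambda i: 0)                 # always pick the first
heights = chain_lengths(program)
smart = schedule(program, priority=lambda i: heights[i.name])   # critical path first
print(len(naive), len(smart))  # prints: 4 3
```

Both runs fill every tuple as full as the predicate allows, so both are locally optimal. The first-come choice spends the first tuple on the two independent instructions and then has to issue the chain one instruction per tuple, while starting the chain early lets the independent work fill the gaps.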

Towards zero overhead

Another large change to the driver since our last blog post was the addition of dirty tracking, a common graphics driver optimization with a twist for Mali. Typical GPUs are stateful, and the driver emits commands to set pieces of graphics state -- for instance, setting uniforms -- before each draw. Mali GPUs present an unusual stateless interface. Instead of commands, the driver prepares descriptors containing large amounts of graphics state bundled together, and each draw has pointers to the different descriptors. In a sense, typical GPUs are programmed with an OpenGL-like state machine, whereas Mali is programmed with Vulkan-like pipelines.

There is conventional graphics wisdom that "state changes are expensive", so graphics programmers try to minimize the number of API calls they make. Drivers can help minimize state changes as well, by tracking which state is "dirty" (modified) and which state is "clean" (unmodified). Then the driver only has to emit commands for the dirty state, reducing its own CPU overhead as well as reducing work for the GPU to process.
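As a rough illustration of dirty tracking on a conventional stateful GPU, consider the following Python sketch. The state groups, the opcode names and the `StatefulDriver` class are all made up for this example and do not correspond to any real driver or hardware.

```python
# Hypothetical state groups, one bit each.
VIEWPORT, BLEND, UNIFORMS = 1 << 0, 1 << 1, 1 << 2
ALL = VIEWPORT | BLEND | UNIFORMS
OPCODES = {VIEWPORT: "SET_VIEWPORT", BLEND: "SET_BLEND", UNIFORMS: "SET_UNIFORMS"}

class StatefulDriver:
    def __init__(self):
        self.state = {}
        self.dirty = ALL      # nothing has been sent to the GPU yet
        self.commands = []    # stand-in for the command stream

    def set_state(self, bit, value):
        # Only mark the group dirty if the value really changed.
        if self.state.get(bit) != value:
            self.state[bit] = value
            self.dirty |= bit

    def draw(self):
        # Emit commands only for dirty state, then the draw itself.
        for bit, opcode in OPCODES.items():
            if self.dirty & bit:
                self.commands.append((opcode, self.state.get(bit)))
        self.dirty = 0
        self.commands.append(("DRAW", None))

driver = StatefulDriver()
driver.set_state(VIEWPORT, (0, 0, 640, 480))
driver.draw()                        # 3 state commands + DRAW
driver.draw()                        # nothing dirty: just DRAW
driver.set_state(BLEND, "additive")
driver.draw()                        # SET_BLEND + DRAW
print(len(driver.commands))          # prints: 7
```

Without the dirty mask, the same three draws would emit twelve commands instead of seven.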

Surprisingly, the same idea generalizes to Mali. It is inefficient for the driver to upload every Mali descriptor for every application draw call. Ideally, we could reuse the same descriptor for subsequent draw calls if we know the state hasn't changed. Dirty tracking lets us know exactly when the state has changed, allowing us to only upload new descriptors when required, and reusing the descriptor otherwise. On the surface, the purpose is simply reducing CPU overhead, since the GPU's programming model is stateless and therefore must redo work anyway. However, the GPU has several layers of caches, so reusing these descriptors can enable the GPU to use cached descriptors as opposed to invalidating its descriptor caches after every draw call. Implementing dirty tracking in Panfrost improved draws per second in one synthetic benchmark by about 400%. Not bad for a week's work.
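The descriptor flavour of the same idea can be sketched as follows, again in Python and again with invented names: the descriptor groups, the fake GPU addresses and the `DescriptorCache` class are assumptions for illustration, not Panfrost's data structures. The point is only that a clean group keeps its old pointer, so consecutive draws reference the very same descriptor.

```python
import itertools

class DescriptorCache:
    GROUPS = ("viewport", "blend", "uniforms")  # hypothetical descriptor groups

    def __init__(self):
        self.state = {}
        self.dirty = set(self.GROUPS)   # nothing uploaded yet
        self.pointers = {}              # group -> fake GPU address
        self.uploads = 0
        self._next_address = itertools.count(0x1000, 0x40)

    def set_state(self, group, value):
        if self.state.get(group) != value:
            self.state[group] = value
            self.dirty.add(group)

    def _upload(self, group):
        # Stand-in for packing a descriptor and copying it to GPU memory.
        self.uploads += 1
        return next(self._next_address)

    def draw(self):
        # Re-upload only dirty groups; clean groups reuse their old pointer.
        for group in self.GROUPS:
            if group in self.dirty:
                self.pointers[group] = self._upload(group)
        self.dirty.clear()
        return dict(self.pointers)      # the pointers this draw refers to

cache = DescriptorCache()
cache.set_state("viewport", (0, 0, 640, 480))
first = cache.draw()                    # uploads all three descriptors
second = cache.draw()                   # no state change: same pointers
cache.set_state("uniforms", (1.0, 0.5))
third = cache.draw()                    # only the uniforms descriptor is new
print(cache.uploads)                    # prints: 4
```

Here `first == second`, and `third` differs from them only in its `"uniforms"` pointer.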

Looking forward

Over the coming weeks, we're aiming to polish the OpenGL ES 3.1 support in time for the Mesa 21.2 release next month. Next stop after that: Bifrost performance improvements and introducing support for the modern Valhall (Mali G77 and newer) architecture family.

Comments (7)

  1. José Manuel:
    Jun 13, 2021 at 09:42 AM

    Thank you all for your huge contribution to the open source world!

  2. debiangamer:
    Jun 15, 2021 at 08:15 AM

    " many new features on Bifrost trickle down to Midgard allowing the older architecture still in wide use to improve long after the vendor has dropped support. On Mali T860, we are passing about 99.5% of tests required for conformant OpenGL ES 3.1."

    The Panfrost driver is unusable with Mali T820 while you are working on this. The Panfrost driver is slow, spams kernel messages to syslog and crashes from time to time. The fbdev driver with llvm is faster and stable. Will this long-standing bug ever get fixed?

    https://gitlab.freedesktop.org/mesa/mesa/-/issues/3143

    1. Daniel Stone:
      Jun 15, 2021 at 02:17 PM

      As you were told by Arm developers on dri-devel@, the message in dmesg is likely to be removed. However, the fact you are seeing those messages indicates that your system is already under severe memory pressure which is the root cause of your problems. Diagnosing and fixing this is your first step.

      T860 is the best-supported GPU for Midgard. T820 has some rough edges, and these will be fixed, however with 18 different GPU revisions - all with their own differences and quirks to support - not every GPU will be perfect all the time.

      1. debiangamer:
        Jun 15, 2021 at 03:58 PM

        "severe memory pressure which is the root cause of your problems. Diagnosing and fixing this is your first step."

        Yes, the Sunvell T95Z Plus TV box has 2GB RAM and Firefox is using all of it. fbdev and llvmpipe handle the memory pressure situation fine, but Panfrost does not. But the Xfce desktop runs slowly even right after boot, when there is RAM available.

        1. Daniel Stone:
          Jun 15, 2021 at 06:28 PM

          llvmpipe can swap all its pages out to disk because it's a CPU renderer. We can't do that with GPU acceleration, because the GPU needs to be able to access those pages in memory, not on disk. Anyway, as discussed in the thread on dri-devel@, the message is going to be removed, but the error-path handling cannot be removed.

  3. Walter ZAMBOTTI:
    Jul 16, 2021 at 08:05 AM

    I would just like to thank everyone working on the Panfrost project for a fantastic effort and result. The recent July 2021 updates have completely changed my ARM desktop experience (for the better).

    Many thanks.

    Walter ZAMBOTTI
    Independent developer

    System tested. ODROID N2 - Ubuntu 21.04 MATE - Kernel 5.13 - MESA 5:21.2.0

  4. mctom:
    Jul 16, 2021 at 08:48 AM

    Just wanted to say big Thank You from Odroid community, people on the official forum are super excited with a significant boost of performance on their N2/N2+ computers. :)


