
From Panfrost to production, a tale of Open Source graphics

Alyssa Rosenzweig
November 03, 2020

Since our previous update on Panfrost, the open source stack for Arm's Mali Midgard and Bifrost GPUs, we've focused on taking our driver from its reverse-engineered origins on Midgard to a mature stack. We've overhauled both the Gallium driver and the backend compiler, and as a result, Mesa 20.3, scheduled for release at the end of the month, will feature some Bifrost support out of the box.

Aquarium demo with Panfrost on Mali G52

Autogenerating the data structures

For the first years of Panfrost, everything we knew about the hardware came from reverse-engineering, an understanding since superseded by canonical information on the hardware. For example, we now know the canonical names for data structures and instructions that we had named arbitrarily during reverse-engineering. To accelerate driver development by removing unknown magic fields and exposing logical errors, we needed to integrate this new information. Our approach was two-pronged:

  • GenXML packing for the GPU data structures used in Gallium
  • Custom XML packing for Bifrost instructions used in the compiler

GenXML is a tool developed for the open source Intel graphics drivers and modified for use in the VideoCore drivers. For Panfrost, we modified the Broadcom GenXML to produce our own Panfrost flavour of GenXML optimized for Mali GPUs.

While reverse-engineering, it was convenient to lay out data structures as ad hoc bitfields in a C header file:

struct mali_blend_mode {
        enum mali_blend_modifier clip_modifier : 2;
        unsigned unused_0 : 1;
        unsigned negate_source : 1;
        enum mali_dominant_blend dominant : 1;
        enum mali_nondominant_mode nondominant_mode : 1;
        unsigned unused_1 : 1;
        unsigned negate_dest : 1;
        enum mali_dominant_factor dominant_factor : 3;
        unsigned complement_dominant : 1;
} __attribute__((packed));

GenXML allows us to provide a simple, strongly typed, XML description of the machine:

<struct name="Blend Function" no-direct-packing="true">
  <!-- Blend equation: A + (B * C) -->
  <field name="A" size="2" start="0" type="Blend Operand A"/>
  <field name="Negate A" size="1" start="3" type="bool"/>
  <field name="B" size="2" start="4" type="Blend Operand B"/>
  <field name="Negate B" size="1" start="7" type="bool"/>
  <field name="C" size="3" start="8" type="Blend Operand C"/>
  <field name="Invert C" size="1" start="11" type="bool"/>
</struct>

Then, instead of filling out the bitfield directly, we can use automatically generated packing macros, which are easier to use because they abstract packing details away from the programmer. They are less error-prone: the packing writes to memory contiguously, and only once all data is finalized. This avoids subtle errors where bitfields lead to the CPU reading back GPU memory, which is slower than a contiguous write by an order of magnitude due to cache behaviour. They also handle integer overflow better. While bitfields are defined to wrap around (rarely the desired behaviour, often hiding bugs, and requiring an extra bitwise-and instruction at run-time), GenXML forbids overflow. As such, debug builds validate the range of integer values, identifying bugs early on, and release builds may omit the checks entirely and come out faster than bitfields. All in all, the direct benefits of GenXML for us are clear.
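To make the idea concrete, here is a rough, hand-written sketch of what a packer for the Blend Function description above could look like. It is not the code GenXML actually generates for Panfrost: the names `blend_function`, `pack_field`, and `pack_blend_function` are hypothetical, the operand enums are simplified to plain integers, and the 16-bit destination is an assumption chosen only because the six fields fit in 12 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unpacked form of the "Blend Function" struct.
 * Operand enums are simplified to plain integers for illustration. */
struct blend_function {
        unsigned a;     /* 2 bits starting at bit 0 */
        bool negate_a;  /* 1 bit at bit 3 */
        unsigned b;     /* 2 bits starting at bit 4 */
        bool negate_b;  /* 1 bit at bit 7 */
        unsigned c;     /* 3 bits starting at bit 8 */
        bool invert_c;  /* 1 bit at bit 11 */
};

/* Shift a value into place. Debug builds assert that it fits in the
 * field; building with NDEBUG compiles the check out entirely. */
static inline uint32_t
pack_field(uint32_t value, unsigned start, unsigned size)
{
        assert(value < (1u << size));
        return value << start;
}

/* Assemble the word in a local variable, then store it once, so the
 * destination (assumed to be GPU memory) is written and never read. */
static inline void
pack_blend_function(uint16_t *dst, const struct blend_function *f)
{
        uint32_t word = pack_field(f->a, 0, 2) |
                        pack_field(f->negate_a, 3, 1) |
                        pack_field(f->b, 4, 2) |
                        pack_field(f->negate_b, 7, 1) |
                        pack_field(f->c, 8, 3) |
                        pack_field(f->invert_c, 11, 1);

        *dst = (uint16_t) word;
}
```

In this sketch an out-of-range operand trips the assertion in a debug build instead of silently wrapping as a bitfield would, and the single store at the end is what keeps the write contiguous.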

More importantly, the process of transitioning to GenXML let us revisit every GPU data structure used in the driver, correcting misunderstandings as seen in the above blend function descriptor, which has a ripple effect in correcting long-standing bugs due to misunderstandings of the hardware. As a real example, the above blend function rework fixed a conspicuous bug in the fixed-function blending for subtract and reverse-subtract blend modes, avoiding a needless fallback to the slower "blend shader" path. Indeed, many of the changes necessitated by the GenXML refactor transcended aesthetic concerns and led to optimizations and bug fixes affecting all of our GPUs. That's a good refactor in my book.

Autogenerating the Bifrost disassembler

For the Bifrost compiler, we wanted to take a similar approach, auto-generating our handling of instruction encoding to account for new information about the complete Bifrost instruction set. Unfortunately, Bifrost packing is highly irregular, where _every_ instruction can have numerous encodings with no relation to any other instruction. This allows the hardware to perform unique optimizations to reduce code size and improve the instruction cache hit rate, but it complicates the compiler and prevents the use of off-the-shelf tooling like GenXML.

Our solution was to represent the instruction set with a custom XML-based file format designed to account for Bifrost's many quirks, serving as ground truth for the machine. The file lists every instruction, with entries like:

<ins name="+LD_CVT" staging="w" mask="0xff800" exact="0xc9000">
  <src start="0"/>
  <src start="3"/>
  <src start="6" mask="0xf7"/>
  <mod name="vecsize" start="9" size="2">
    <opt>none</opt>
    <opt>v2</opt>
    <opt>v3</opt>
    <opt>v4</opt>
  </mod>
</ins>

Notice in particular there is no opcode specified, unlike regular architectures. Instead, the exact and mask fields together specify that specific bits of the instruction must take particular values, acting as a generalized opcode.
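Read that way, recognizing an instruction reduces to a masked comparison. The following is a minimal sketch under that reading, using the `mask` and `exact` values from the `+LD_CVT` entry above; the helper name `matches_encoding` and the two macros are hypothetical and are not taken from the Panfrost source.

```c
#include <stdbool.h>
#include <stdint.h>

/* mask and exact values copied from the +LD_CVT entry in the XML */
#define LD_CVT_MASK  0xff800u
#define LD_CVT_EXACT 0xc9000u

/* An encoding matches when every bit selected by `mask` holds the
 * value given in `exact`; bits outside the mask are free to carry
 * sources and modifiers. */
static inline bool
matches_encoding(uint32_t bits, uint32_t mask, uint32_t exact)
{
        return (bits & mask) == exact;
}
```

For `+LD_CVT` the mask covers bits 11 through 19, which leaves bits 0 through 10 for the three sources and the `vecsize` modifier.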

From the instruction reference, we can autogenerate functions to pack instructions from the compiler's intermediate representation and to disassemble instructions from packed forms in the disassembler. Note that instruction encoding and disassembly are higher-level functions than the simple packs and unpacks handled by GenXML; the irregularity of Bifrost demands such an approach. Nevertheless, we now have a complete Bifrost (architecture version 7) disassembler open source and upstream, and as we've extended the compiler, the packing helpers have paid off beautifully.
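As an illustration of the kind of helper that could be derived from the `+LD_CVT` entry, the sketch below decodes and encodes its `vecsize` modifier (2 bits starting at bit 9). It is written by hand, not produced by the real generator. All function and table names are hypothetical, and it assumes that each `<opt>` is encoded as its position in the list, which the XML excerpt suggests but does not state.

```c
#include <stdint.h>

/* Options of the "vecsize" modifier, in the order listed in the XML.
 * Assumption: the encoded value is the index into this list. */
static const char *const ld_cvt_vecsize[4] = { "none", "v2", "v3", "v4" };

/* Extract an unsigned field of `size` bits beginning at bit `start` */
static inline uint32_t
extract_field(uint32_t bits, unsigned start, unsigned size)
{
        return (bits >> start) & ((1u << size) - 1);
}

/* Disassembly direction: packed bits to a printable modifier name */
static inline const char *
ld_cvt_vecsize_name(uint32_t bits)
{
        return ld_cvt_vecsize[extract_field(bits, 9, 2)];
}

/* Packing direction: clear the 2-bit field at bit 9, then insert the
 * option index, leaving every other bit of the encoding untouched */
static inline uint32_t
ld_cvt_set_vecsize(uint32_t bits, unsigned index)
{
        uint32_t field_mask = 0x3u << 9;

        return (bits & ~field_mask) | ((index & 0x3u) << 9);
}
```

Because both directions come from the same description, a packer and a disassembler built this way cannot disagree about where a field lives.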

With our new infrastructure in place, we could iterate quickly on compiler features like nested control-flow, complex texturing including texel fetch support from OpenGL ES 3.0 used in Mesa's blitter, and basic register spilling. In total, the open source Bifrost compiler can now handle the notorious gradient rendering shaders from glamor, the OpenGL backend for the X server. With these changes, Bifrost can now run X11 desktops like MATE, as well as X applications on Wayland desktops via Xwayland. Never mind _why_ loops and register spilling are used for 2D acceleration.

Speeding up Midgard

For those of you with GPUs like Mali T860, Panfrost's support for Midgard has improved as well. Though the Bifrost compiler is a separate code base, the improvements via GenXML benefit Midgard. Beyond that, over the summer we added support for Arm FrameBuffer Compression (AFBC) as a significant optimization for certain workloads.

Recent builds of Mesa will automatically compress framebuffer objects to save memory bandwidth, improve performance, and reduce power. Panfrost is even smart enough to compress textures as AFBC on the fly when it makes sense to do so, improving texturing performance for applications that do not support compressed texture formats like ETC directly. In the future, Panfrost will be able to compress the framebuffer itself en route to the display if paired with a compatible display controller, further reducing bandwidth on high resolution monitors. AFBC work was conducted on a Midgard GPU, but will be extended to Bifrost in the future.

The Midgard compiler also saw a flurry of activity: its scheduler now optimizes for register pressure, it supports atomic operations and atomic counters, and a long tail of bugs has been fixed.

Teamwork

The above work on Bifrost and GenXML has been a collaboration between Collaboran Boris Brezillon and myself. Additionally, our intern, Italo Nicola, has been improving support for compute shaders on Midgard and spearheaded the atomic operation support.

I'd like to give a special shoutout to the outstanding Icecream95, who has added features including 16-bit depth buffers, desktop-style occlusion counters, 8 simultaneous render targets, dual-source blending, and framebuffer fetch. Icecream95 has focused on desktop OpenGL support for Midgard GPUs and has made strides here, fixing and optimizing a spectrum of 3D games never thought to run on embedded hardware like ours.

Next stop: better performance and OpenGL 3.1, of course!

Comments (1)

  1. Aditya:
    Dec 12, 2020 at 08:38 PM

    Mesa 20.3 recently got picked up in my distribution of choice. Fantastic work! panfrost is stable and smooth. Getting respectable 30+ fps in supertuxkart with graphics level 2.

    Thank you all, much appreciated.
