June 06, 2022
The open source Panfrost driver for Mali GPUs now supports the new Valhall architecture with fully-conformant OpenGL ES 3.1 on Mali-G57, a Valhall GPU. The final Mesa patches are landing today, and the required kernel patches are queued for merge upstream.
Mali-G57 features in new MediaTek Chromebooks with the MT8192 and MT8195 system-on-chips. Collaborans AngeloGioacchino Del Regno and Nícolas F. R. A. Prado are spearheading the mainlining effort for these devices. With Mesa 22.2 and an appropriate kernel, accelerated graphics will work out of the box on Linux on these laptops.
Valhall is based on the older Bifrost architecture, for which we achieved conformance last year, so the driver changes follow directly from the hardware changes. Valhall has a streamlined instruction set architecture, which we previously reverse-engineered and documented, and a new instruction set needs a new compiler. As the Bifrost and Valhall instruction sets are closely related, we can reuse compiler passes like instruction selection and register allocation, while replacing other passes like scheduling.
Beyond the new instruction set, Valhall’s fixed-function hardware has been adapted to improve Vulkan performance. Consider transform feedback, a legacy feature that captures vertex shader outputs into an application buffer. Transform feedback is limited, expensive to implement in hardware, and superseded by compute shaders, but it remains for compatibility with older applications.
On previous Mali GPUs, vertex shaders store their outputs to a buffer provided by the driver, and fragment shaders read their inputs from that buffer. That makes transform feedback nearly free: the driver simply provides the application’s buffer as the output buffer. But the design is otherwise problematic. When transform feedback isn’t used, the driver must allocate the buffer itself, so it must determine the size, which is expensive with indexed rendering and impossible with indirect rendering. In the worst case, the driver must allocate a massive amount of memory and hope it’s enough.
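To make the sizing problem concrete, here is a minimal Python sketch of the decision a driver faces on the older design. It is an illustration, not Panfrost's code: the function name, its parameters, and the fixed per-vertex stride are all assumptions.

```python
from typing import Optional, Sequence


def vertex_output_bytes(stride: int,
                        count: Optional[int] = None,
                        indices: Optional[Sequence[int]] = None) -> Optional[int]:
    """Bytes needed for vertex outputs, or None if the CPU cannot know.

    stride:  bytes of output per vertex (hypothetical fixed layout)
    count:   vertex count of a direct, non-indexed draw
    indices: index buffer contents of an indexed draw
    """
    if indices is not None:
        if len(indices) == 0:
            return 0
        # Indexed draw: the driver has to scan the entire index buffer to
        # find the range of vertices referenced. This costs O(n) per draw.
        return (max(indices) - min(indices) + 1) * stride
    if count is not None:
        # Direct draw: the vertex count is known up front.
        return count * stride
    # Indirect draw: the draw parameters live in GPU memory, so the CPU
    # has nothing to size the allocation with.
    return None
```

The `None` case is where a driver is left guessing: it can only over-allocate and hope the guess is large enough.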
Valhall solves this. Now the hardware, rather than the driver, allocates a temporary buffer for vertex outputs, reducing overhead. Unfortunately, this new design breaks our “free” implementation of transform feedback.¹
How do we implement transform feedback without hardware support? Use compute shaders. Thanks to core code by Marek Olšák and Collaboran Faith Ekstrand, the driver can replace transform feedback with explicit stores to the transform feedback buffer. This approach is less efficient than our old trick, because we may store the same data twice. The hardware designers made a trade-off: prioritize the performance of modern applications using indexed and indirect draws over legacy applications using transform feedback. That was probably the right decision. However, transform feedback is mandatory in OpenGL ES 3.0, so we need to support it – even at a performance penalty – if only for Zink.
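The idea of replacing transform feedback with explicit stores can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Mesa lowering pass: the function name, the 16-byte stride, and the stand-in shader math are hypothetical, and it assumes a non-indexed draw where capture order matches vertex order.

```python
import struct


def run_vertex_shader_with_xfb(positions, xfb_buffer: bytearray, stride: int = 16):
    """Toy lowered vertex shader: compute each output, then store it explicitly.

    positions:  iterable of (x, y, z) tuples, one per vertex
    xfb_buffer: the application's transform feedback buffer
    stride:     bytes captured per vertex (four 32-bit floats here)
    """
    outputs = []
    for vertex_id, (x, y, z) in enumerate(positions):
        out = (x * 2.0, y * 2.0, z * 2.0, 1.0)  # stand-in for real shader math
        # Normal path: the output still goes to the rasterizer.
        outputs.append(out)
        # Lowered transform feedback: a plain store at vertex_id * stride.
        struct.pack_into("<4f", xfb_buffer, vertex_id * stride, *out)
    return outputs
```

Each output is written once for rasterization and once into the application buffer, which is the duplicated store mentioned above.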
There’s a similar pattern with another legacy feature: provoking vertex selection.
In a fragment shader, there are two ways to access vertex shader outputs. Usually, the outputs from each vertex of a triangle are smoothly interpolated to form a single value at each pixel. However, flatshading may be used instead: the output of a single “provoking” vertex is used for each triangle.
Which vertex? That depends on the ordering of vertices in the triangle. OpenGL uses the last vertex, and Vulkan uses the first. We should be able to specify the convention just once, since the API shouldn’t change within an application.
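As a small illustration of the two conventions, the following Python sketch picks the flat-shaded value for a single triangle. The function and the `"first"`/`"last"` strings are hypothetical names chosen for this example.

```python
def flat_shaded_value(triangle_outputs, convention: str):
    """Return the value every pixel of a flat-shaded triangle receives.

    triangle_outputs: per-vertex outputs, in the triangle's vertex order
    convention:       "first" (Vulkan default) or "last" (OpenGL default)
    """
    if convention == "first":
        return triangle_outputs[0]
    if convention == "last":
        return triangle_outputs[-1]
    raise ValueError(f"unknown provoking vertex convention: {convention!r}")
```

For a triangle whose vertices output red, green, and blue in that order, the Vulkan convention yields red and the OpenGL convention yields blue.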
Unfortunately, OpenGL and Vulkan extensions allow changing the provoking vertex to emulate the other API. In fact, desktop OpenGL allows changing the provoking vertex every draw. With older Mali, the driver respecifies the provoking vertex on every draw.
But Valhall doesn’t cater to desktop OpenGL, so the Valhall driver can only specify one provoking vertex per render pass. Our desktop OpenGL driver has to split the render pass whenever the provoking vertex changes. That works, but it’s slow. This change trades performance in old applications for simpler hardware.
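The splitting rule can be sketched as a simple grouping of draws. This Python example is a simplified model, not the driver's batching logic: the function name and the tuple representation of draws are assumptions.

```python
def split_render_passes(draws):
    """Group draws into render passes that each use one provoking vertex.

    draws: iterable of (draw_name, provoking_vertex) in submission order
    Returns a list of (provoking_vertex, [draw_name, ...]) passes.
    """
    passes = []
    for name, provoking in draws:
        if passes and passes[-1][0] == provoking:
            # Same convention as the current pass: keep batching.
            passes[-1][1].append(name)
        else:
            # First draw, or the convention changed: start a new pass.
            passes.append((provoking, [name]))
    return passes
```

An application that toggles the provoking vertex on every draw degenerates to one render pass per draw, which is where the slowdown comes from.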
And that’s okay. Process node improvements mean new hardware is usually faster than old hardware, yet software is written for contemporary hardware, so old software remains fast on new hardware. No matter how much legacy OpenGL cruft is removed to optimize new apps, old games will remain buttery smooth.
The future is Vulkan – and Valhall embraces it.
1. Valhall retains the legacy software-allocated path, but it can’t be used for transform feedback due to extra padding introduced with instancing. The legacy path is intended for 2D workloads, not 3D rendering.