A new era for Linux's low-level graphics

A new era for Linux's low-level graphics - Part 2

Daniel Stone
March 23, 2018

Share this post:

Reading time:

Following on from part 1 in the series, today we cover more developments in low-level graphics, which we’ve enabled through the whole stack. Support for buffer modifiers is something we’ve worked on in the kernel, Mesa, Wayland, Weston, Mutter and GNOME Shell, and X.Org. Meanwhile, our community continues to grow thanks to Google Summer of Code. Read on for more.

More performance from buffer modifiers

Another kernel feature from the same era we are now able to take full advantage of, is buffer modifiers.

Although we tend to think of images as being laid out in memory how they appear: left to right, then top to bottom, hardware would very much prefer you wouldn't. The most optimal way to access buffers is in tiled modes, and even supertiled modes (with up to three levels of tiles containing tiles). Not only this, but GPUs are beginning to be able to compress data quite well.

Whilst great for performance, until recently we didn't have a good way to annotate buffers with their modifiers. For some time, we could pretend that the buffers were linear and hope that we could get away with it, but with the increasing complexity of current use cases, these heuristics are not good enough. This illusion of being linear was also underpinned by magic side-channels: each driver implementing tiling support had its own way of telling userspace what the true configuration of the buffer was. But if you were not that driver, you had no way of knowing: if you want to share buffers across different GPUs, or even different drivers, you had to pessimise and disable all support for tiling, with an often unacceptable performance cost.

At Collabora, we started developing an EGL extension, which we contributed upstream to Khronos as well as support for Mesa. This extension allows GPU drivers to advertise which format modifiers (tiling/compression modes) they support, and to explicitly specify modifiers on import. We also wrote a Wayland extension with support in Weston and GStreamer, allowing Wayland to be used as a two-way transit for modifier information.

Last year, sponsored by Intel, we finished the task by adding support to the window systems. We helped finalise support for modifier advertisement in KMS, wrote an X11 equivalent to our Wayland protocol so legacy systems could also get the benefit of modifier support, wired up support in Mesa so it was able to discover the modifiers available to allocate with and communicate the chosen modifier back to the window system once done, wired this up for both Wayland and X11 clients running on any of EGL/GLX/Vulkan, implemented support for the Wayland protocol in Mutter (GNOME Shell's display server), Xwayland (for supporting legacy X11 clients in a Wayland session), and also the classic Xorg X server. Whilst we were there, we implemented support for the atomic modesetting API in the classic X.Org server, though not using planes as the X server is architecturally unable to do so.

With the groundwork having been laid by Mesa, Robert Foss implemented support for buffer modifiers in the open-source drm_hwcomposer implementation, allowing Android to get the benefit of this support. This built on top of support added to the Etnaviv open-source driver used in NXP i.MX SoC family by Lucas Stach of Pengutronix. The i.MX family is surprisingly complex, in that its GPU doesn't actually understand linear formats at all, requiring a completely separate copy to preserve the illusion that the buffer is laid out linearly. Expressing modifiers clearly and explicitly allows this copy to be elided where it is not necessary. Modifier support later spread to the open-source VC4 driver used in the Raspberry Pi (suffering similar shadow-copy problems), and now also the open-source NVIDIA Tegra driver.

The end result of all this work is that we have been able to eliminate the magic side channels which used to proliferate, and lay the groundwork for properly communicating this information across multiple devices as well. Devices supporting ARM's AFBC compression format are just beginning to hit the market, which share a single compression format between video decoder, GPU, and display controller. We are also beginning to see GPUs from different vendors share tiling formats, in order to squeeze the most performance possible from hybrid GPU systems.

Whilst it seems simple: take a 32-bit format token and 'just' add a 64-bit modifier token, the implementation was surprisingly complex. Part of the reason was that so much of the knowledge about modifiers was previously implicit, and mixing implicit and explicit systems is notoriously difficult. Unpicking all this took time and persistence, but it seems we've finally arrived at our end goal.

More performance from fewer copies in XWayland

Finally, over the past year I had the pleasure of mentoring Roman Gilg, as he worked on reducing copies in XWayland for Google SUmmer of Code 2018. Thanks to the tireless work of Martin Peres and others, Wayland was included in GSoC under the umbrella of the X.Org Foundation. Roman's project was inside the X server, making things better for people using legacy X11 clients such as Google Chrome or Steam, when running in a Wayland session.

When Wayland clients submit content to the server, they render to a buffer which the display server then uses directly until the client provides a new one. This is true if the content is displayed directly on a hardware overlay, or if it is composited on the GPU. However, in a composited X11 environment, the compositor only has a single buffer handle for each client window, even when it's getting updated. In this environment, when the client sends a buffer to the X server, the X server copies the content of the client buffer into the staging buffer it has created for the compositor to source window content from, and releases the client buffer immediately.

This extra copy not only causes extra GPU workload, but also an unsightly tearing effect: the server may be copying new content from the client buffer at the same time as the compositor is rendering. Even worse, whilst an X server running on bare hardware can skip this for full-screen applications if there is only one screen, Xwayland cannot skip it at all.

Roman's work involved a lot of work on the core of the X11 Present extension, allowing it to do copy-free 'flips' for any window if the backend supports it. He then implemented direct flips for Xwayland, so the client buffer would be directly passed through to the Wayland server without an extra copy. This means that windowed GPU-accelerated X11 clients can be faster under Wayland then when running natively under X11.

This work is currently being reviewed, mainly by the tireless Michel Dänzer, and it's looking hopeful for landing quite soon. This will provide a great performance boost to X11 games (such as through Steam) in particular, but even browsers and other clients. Congratulations to Roman for a successful Summer of Code!

A cast of many

Over its various iterations, at Collabora the above work work has also seen work from Derek Foreman, Emil Velikov, Louis-Francis Ratté-Boulianne, Robert Foss, Tomeu Vizoso, and Varad Gautam; externally, in particular Ben Widawsky, Chad Versace, Daniel Vetter, Fabien Dessenne, Faith Ekstrand, Kristian Høgsberg, Michel Dänzer, Sergi Granell, and Tomohito Esaki, have provided assistance, code, review, and moral support. Thanks also to those who have sponsored our work on this: Intel, Google, Renesas, Zodiac Inflight Innovation, as well as Collabora's own internal efforts.

A new era for Linux's low-level graphics - Part 1

Virtualizing GPU Access

GPU virtualization update

A new era for Linux's low-level graphics - Part 1

Virtualizing GPU Access

GPU virtualization update

Comments (4)

Robert:
Jul 14, 2018 at 11:06 AM

Hej Daniel, I just hit these two posts and want to thank you for this highly interesting insights! I've heard about a lot of the terms before, but missed an explanation of exactly this kind (hardware planes, the x11 present extension). Wow this is such a big list of cool stuff!

I'm super excited about this great developments. The fall distributions (fedora 29, ubuntu 18.10) will make a big leap in term of wayland support and performance!

Reply to this comment

Reply to this comment
Robert:
Jul 14, 2018 at 11:10 AM

I missed to mention: when you wrote this posts, did phoronix.com cover/link them? Because I think a lot of its readers would be very interested in this stuff!

Reply to this comment

Reply to this comment
Umur Gedik:
Sep 27, 2018 at 05:42 AM

There is so much going on linux graphics stack, but unfortunately there is only few (dated) documentation is available for the developers. I want to learn low level graphics capabilities of linux and want to contribute them, but I've no idea where to start. Would be really nice if you can point some documentations/articles telling about the stack and the terminology. Great article btw, I'm very excited about the future of desktops in linux.

Reply to this comment

Reply to this comment
1. Daniel Stone:
  Sep 27, 2018 at 09:54 AM
  
  We don't have that yet, but it's something we really want to work on. We have a GitLab issue here which is documenting our effort so far: https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/76
  
  Reply to this comment
  
  Reply to this comment

Add a Comment

Search the newsroom

Latest Blog Posts

Coccinelle for Rust progress report

25/06/2025

In collaboration with Inria, the French Institute for Research in Computer Science and Automation, Tathagata Roy shares the progress made…

Linux Media Summit 2025 recap

23/06/2025

Last month in Nice, active media developers came together for the annual Linux Media Summit to exchange insights and tackle ongoing challenges…

Constructor acquires, destructor releases

09/06/2025

In this final article based on Matt Godbolt's talk on making APIs easy to use and hard to misuse, I will discuss locking, an area where…

What if C++ had decades to learn?

21/05/2025

In this second article of a three-part series, I look at how Matt Godbolt uses modern C++ features to try to protect against misusing an…

Unleashing gst-python-ml: Python-powered ML analytics for GStreamer pipelines

12/05/2025

Powerful video analytics pipelines are easy to make when you're well-equipped. Combining GStreamer and Machine Learning frameworks are the…

Matt Godbolt sold me on Rust (by showing me C++)

06/05/2025

Gustavo Noronha helps break down C++ and shows how that knowledge can open up new possibilities with Rust.

About Collabora

Whether writing a line of code or shaping a longer-term strategic software development plan, we'll help you navigate the ever-evolving world of Open Source.

한국의 국기 한국어 버전의 Collabora.com 보기