Posted on 20/03/2018 by Daniel Stone
The latest enhancements to the DRM subsystem have made mainline Linux much more attractive: drivers are easier to write, applications are portable, and the community is friendlier and more collaborative than it has ever been.
Over the past couple of years, Linux's low-level graphics infrastructure has undergone a quiet revolution. Since experimental core support for the atomic modesetting framework landed a couple of years ago, the DRM subsystem in the kernel has seen roughly 300,000 lines of code changed and 300,000 new lines added, when the new AMD driver (~2.5m lines) is excluded. Lately Weston has undergone the same revolution, albeit on a much smaller scale.
Daniel Vetter's excellent two-part series on LWN covers the details quite well, but in short atomic has two headline features. The first is better display control: by grouping all configuration changes together, it is possible to change display modes more quickly and more reliably, especially if you have multiple monitors. The second is that it allows userspace to finally use overlay planes in the display controller for composition, bypassing the GPU.
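To make the first point concrete, here is a minimal sketch (not Weston's code) of how userspace can group a modeset for two outputs into a single atomic commit with libdrm. The object IDs, mode blobs and property IDs are placeholders, assumed to have been looked up beforehand with drmModeObjectGetProperties() and drmModeCreatePropertyBlob().

```c
/*
 * Hedged sketch only: group a modeset for two outputs into one atomic commit.
 * All IDs are assumed to have been resolved earlier; error handling is minimal.
 */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static int set_modes_atomically(int drm_fd,
                                const uint32_t crtc_id[2],
                                const uint32_t conn_id[2],
                                const uint32_t mode_blob_id[2],
                                uint32_t prop_crtc_active,
                                uint32_t prop_crtc_mode_id,
                                uint32_t prop_conn_crtc_id)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    int ret;

    if (!req)
        return -1;

    for (int i = 0; i < 2; i++) {
        /* Enable the CRTC, point it at its new mode, and bind the connector. */
        drmModeAtomicAddProperty(req, crtc_id[i], prop_crtc_active, 1);
        drmModeAtomicAddProperty(req, crtc_id[i], prop_crtc_mode_id, mode_blob_id[i]);
        drmModeAtomicAddProperty(req, conn_id[i], prop_conn_crtc_id, crtc_id[i]);
    }

    /* Both outputs change in a single ioctl: the kernel applies all of it or none. */
    ret = drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
    drmModeAtomicFree(req);
    return ret;
}
```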
A third, less heralded, feature is that the atomic core standardises user-visible behaviour. Before atomic, drivers had very wide latitude to implement whatever user-facing behaviour they liked; as a result, each chipset needed its own kernel driver and its own X11 driver as well. With the rewrite of the core, backed by a comprehensive test suite, we no longer need hardware-specific drivers to take full advantage of hardware features. And with the substantial rework of Weston's DRM backend, we can now exploit them: atomic gives us a smoother user experience, with better performance and lower power consumption, whilst remaining completely hardware-agnostic.
This has made mainline Linux much more attractive: the exact same generic codebases of GNOME and Weston that I'm using to write this blog post on an Intel laptop run equally well on AMD workstations, low-power NXP boards destined for in-flight entertainment, and high-end Renesas SoCs which might well be in your car. Now that drivers are easier to write and applications are portable, over ten new DRM drivers have landed in the upstream kernel since atomic modesetting was merged. These drivers are arriving in a much more friendly and collaborative community than we've ever had.
One of the headline features of atomic is the ability to use hardware planes for composition. To show windows to the user, display servers like Weston need to composite the content of multiple windows together into a single image - which is why they are also known as 'compositors'. With the exception of mouse cursors and fullscreen windows, the entire content of each output is a single flat image, created by using OpenGL ES, Pixman, or similar, to combine all the client images together.
Using the GPU for composition isn't as complex as rendering scenes from a game, but it still has a real cost. If your GPU is already straining at its limits - say you are playing the latest game in a window, or running ShaderToy in your browser - then adding the load of composition is the last thing you want to do. Conversely, if you aren't using the GPU at all, GPU composition will stop it from switching off entirely, adding a surprising amount of power consumption.
Display controllers can show more than just one image and a mouse cursor, though. Most hardware has a couple of overlay planes, logically positioned between the primary image and the mouse cursor. These planes are also known as 'sprites', which will ring a bell for those familiar with 1980s games. Those games did exactly what we want to do: display one static background image, with smaller images shown on top of it, without having to redraw the whole scene every time.
Using these overlay planes not only frees up the GPU for more client work (or allows it to power down), but in many cases gives us better image quality. Hardware vendors put a lot of work into their display controllers, with higher-quality image scaling and YUV-to-RGB colourspace conversion, as well as general quality optimisations. This is especially true on dedicated hardware like set-top boxes.
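To put a client's buffer on one of these planes, a compositor fills in the plane's standard KMS properties (FB_ID, CRTC_ID, and the source and destination rectangles) as part of an atomic request. The sketch below is illustrative rather than Weston's actual code: the plane_props struct and the framebuffer ID are assumptions, and the source coordinates use KMS's 16.16 fixed-point format.

```c
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Property IDs for one plane, assumed to have been resolved once at startup
 * with drmModeObjectGetProperties(); the names match the standard KMS
 * plane properties. */
struct plane_props {
    uint32_t fb_id, crtc_id;
    uint32_t src_x, src_y, src_w, src_h;
    uint32_t crtc_x, crtc_y, crtc_w, crtc_h;
};

/* Add the state for showing framebuffer 'fb' (e.g. created from a client
 * buffer with drmModeAddFB2()) on an overlay plane, unscaled, at (x, y). */
static void plane_add_state(drmModeAtomicReq *req, uint32_t plane_id,
                            const struct plane_props *p, uint32_t crtc_id,
                            uint32_t fb, int32_t x, int32_t y,
                            uint32_t w, uint32_t h)
{
    drmModeAtomicAddProperty(req, plane_id, p->fb_id, fb);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_id, crtc_id);

    /* Source rectangle: the whole buffer, in 16.16 fixed point. */
    drmModeAtomicAddProperty(req, plane_id, p->src_x, 0);
    drmModeAtomicAddProperty(req, plane_id, p->src_y, 0);
    drmModeAtomicAddProperty(req, plane_id, p->src_w, (uint64_t)w << 16);
    drmModeAtomicAddProperty(req, plane_id, p->src_h, (uint64_t)h << 16);

    /* Destination rectangle: the window's position on screen, no scaling. */
    drmModeAtomicAddProperty(req, plane_id, p->crtc_x, x);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_y, y);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_w, w);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_h, h);
}
```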
Weston has had limited support for overlay planes since the very early days, but these were disabled as the legacy KMS API made them all but unusable. Long before the atomic KMS API was marked stable, we began work on an atomic DRM patchset for Weston to help push this work forward. Quite some time later, after a rewrite of most of Weston's KMS backend, we have finally managed to land atomic support in Weston 4.0, using the core atomic API with solid and reliable state tracking.
We need such solid state tracking in order to brute-force a configuration. In spite (or because) of their capability, overlay planes have a number of surprising limits on how they can be used. These include per-plane scaling limits ('no more than 8x, or no less than 1/4x'), global bandwidth limits, shared scaler and compression units, and even limits on the number of overlay planes which can be active on a single scanline (row) of the display. Android often handles this with per-platform HWComposer modules, which analyse a scene and produce the optimal result.
Without this hard-coded platform-specific knowledge, the best we can do is try, try again. Atomic gives us the ability to compute and test a configuration, to see ahead of time if it will work. Weston uses this to come up with the best configuration it can, by repeatedly testing all the different possibilities.
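As a rough illustration of that loop, building on the hypothetical plane_add_state() helper above: a candidate plane assignment is added to the request, tested with DRM_MODE_ATOMIC_TEST_ONLY, and rolled back if the kernel rejects it, in which case the window falls back to GPU composition. This is a sketch of the technique, not Weston's implementation.

```c
#include <stdbool.h>

/* Try to place one window on an overlay plane. Returns true if the kernel
 * accepts the candidate state; otherwise the request is rolled back and the
 * window should be composited with the GPU instead. */
static bool try_window_on_plane(int drm_fd, drmModeAtomicReq *state,
                                uint32_t plane_id,
                                const struct plane_props *p,
                                uint32_t crtc_id, uint32_t fb,
                                int32_t x, int32_t y, uint32_t w, uint32_t h)
{
    /* Remember how much of the request was already valid before this attempt. */
    int cursor = drmModeAtomicGetCursor(state);

    plane_add_state(state, plane_id, p, crtc_id, fb, x, y, w, h);

    /* Ask the kernel whether this configuration would work, without
     * touching the hardware. */
    if (drmModeAtomicCommit(drm_fd, state, DRM_MODE_ATOMIC_TEST_ONLY, NULL) == 0)
        return true;

    /* Rejected (scaling limit, bandwidth, shared scaler, ...): undo this
     * plane's properties and leave the window to GPU composition. */
    drmModeAtomicSetCursor(state, cursor);
    return false;
}
```

Repeating this check for each visible window, most promising candidates first, is how a hardware-agnostic compositor can converge on the best configuration the display controller will accept.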
The patches on top of our core state-tracking and atomic API work are still in review, though the comments so far are largely positive. Weston 4.0 will therefore use atomic where possible and exercise our new state-tracking paths. On top of this, the next release of Weston will add the code to use overlay planes where we can, finally delivering on atomic's promise of using the display hardware to the fullest extent possible. We expect that release will also include support for DRM leases, allowing time- or safety-critical clients, such as VR or automotive information systems, to drive the display hardware directly without intervention from the compositor.
This is Part I of a two-part blog post highlighting the latest enhancements to Linux's low-level graphics infrastructure. Part II later this week will look at performance improvements from buffer modifiers, and from fewer copies in XWayland.
Comments (7)
Liam:
Mar 21, 2018 at 07:24 AM
Hi Daniel,
You mention that Android uses a per-backend hwcomposer in order to optimally compose a scene while we have to poke around in the dark until we can find a wall. Since atomic allows us to gather info when attempting various kms operations can that be used to build an ad hoc platform specific model of the display controller as more of the states are probed? This seems a bit similar to the udev database.
Thanks!
Daniel Stone:
Mar 21, 2018 at 01:24 PM
Hi Liam,
Building up a model like that would be really interesting, but I'm not entirely sure how viable it is in the long term. Part of the problem is just describing the restrictions - it ends up looking a lot like code - but also part of the problem is the variation. It's not just per-platform that the restrictions vary, but environmental factors like which clocks a particular board is using, total system memory bandwidth usage, and even thermal throttling. Luckily, testing configurations is cheap enough that we can do this a lot to figure it out by brute force ...
Another option is, now that we have an openly-developed drm_hwcomposer, to try to make more use of it in Weston et al.
Liam:
Mar 22, 2018 at 04:47 AM
Hi Daniel,
First, thanks for the response.
I'm sure I'm missing the point but, to state the obvious, what's the issue with it "looking a lot like code" since it would be code that would heuristically set and explore the space?
The biggest issues that occurred to me were the combinatorial space and bandwidth limits. That these can also throttle just multiplies the search space.
In a way, this reminds me of the problems the memory allocator project has been trying to solve, but with the added complication that each query is limited to a yes/no response. Additionally, you've got the power issue which suggests an intersection with the scheduling problem that the power-aware schedulers are trying to "solve". As you know, the Android solution has been the device-dependent energy models which, at a glance, seem analogous to the platform-dependent hwc modules. Lots of similar problems are being worked on in different parts of the kernel:)
Lastly, I'm not really sure what is gained by using the hwc shim other than the ability to run non-SF compositors on top of Android drivers.
Best/Liam
Spanky:
Mar 21, 2018 at 04:36 PM
Wow! Thank you guys for this complex development work. Especially for doing it in the open, in a way that properly doesn't break stability.
Here's the thing. I thought I was technical until I read this. Could you add just a quick word (in English LOL) about what all this means in the big picture? I can guess it's moving towards more stability, speed and new hardware compatibility; but really I did not understand 1% of all that.
Nodericks:
Mar 22, 2018 at 12:29 PM
Interesting !
RRW:
Mar 22, 2018 at 01:16 PM
2.5 million lines of code for the AMD driver? Is that true? With 2.5 million lines of code you could have your own OS and a few apps. What is it about this hardware that necessitates a 2.5 million lines driver?
Liam:
Mar 22, 2018 at 03:32 PM
It's essentially a computer within a computer. It has its own: firmware, sophisticated power management, memory (with multi-level cache), general purpose ISA, io ports, and so on. The days of gpu=graphics accelerator are long over.
So, yes, it pretty much requires its own os.