Posted on 20/03/2018 by Daniel Stone
The latest enhancements to the DRM subsystem have made mainline Linux much more attractive: drivers are easier to write, applications are portable, and the community is friendlier and more collaborative than it has ever been.
Over the past couple of years, Linux's low-level graphics infrastructure has undergone a quiet revolution. Since experimental core support for the atomic modesetting framework landed a couple of years ago, the DRM subsystem in the kernel has seen roughly 300,000 lines of code changed and 300,000 new lines added, when the new AMD driver (~2.5m lines) is excluded. Lately Weston has undergone the same revolution, albeit on a much smaller scale.
Daniel Vetter's excellent two-part series on LWN covers the details quite well, but in short atomic has two headline features. The first is better display control: by grouping all configuration changes together, it is possible to change display modes more quickly and more reliably, especially if you have multiple monitors. The second is that it allows userspace to finally use overlay planes in the display controller for composition, bypassing the GPU.
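To make the first point concrete, here is a minimal sketch (not Weston's code) of how userspace can group a modeset for two outputs into a single atomic commit with libdrm. The object IDs, mode blobs and property IDs are placeholders, assumed to have been looked up beforehand with drmModeObjectGetProperties() and drmModeCreatePropertyBlob().

```c
/*
 * Hedged sketch only: group a modeset for two outputs into one atomic commit.
 * All IDs are assumed to have been resolved earlier; error handling is minimal.
 */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static int set_modes_atomically(int drm_fd,
                                const uint32_t crtc_id[2],
                                const uint32_t conn_id[2],
                                const uint32_t mode_blob_id[2],
                                uint32_t prop_crtc_active,
                                uint32_t prop_crtc_mode_id,
                                uint32_t prop_conn_crtc_id)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    int ret;

    if (!req)
        return -1;

    for (int i = 0; i < 2; i++) {
        /* Enable the CRTC, point it at its new mode, and bind the connector. */
        drmModeAtomicAddProperty(req, crtc_id[i], prop_crtc_active, 1);
        drmModeAtomicAddProperty(req, crtc_id[i], prop_crtc_mode_id, mode_blob_id[i]);
        drmModeAtomicAddProperty(req, conn_id[i], prop_conn_crtc_id, crtc_id[i]);
    }

    /* Both outputs change in a single ioctl: the kernel applies all of it or none. */
    ret = drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
    drmModeAtomicFree(req);
    return ret;
}
```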
A third, less heralded, feature is that the atomic core standardises user-visible behaviour. Before atomic, drivers had very wide latitude to implement whatever user-facing behaviour they liked; as a result, each chipset needed its own kernel driver and its own X11 driver as well. With the rewrite of the core, backed by a comprehensive test suite, we no longer need hardware-specific drivers to take full advantage of hardware features. And with the substantial rework of Weston's DRM backend, we can now exploit them: atomic gives us a smoother user experience, with better performance and lower power consumption, whilst remaining completely hardware-agnostic.
This has made mainline Linux much more attractive: the exact same generic codebases of GNOME and Weston that I'm using to write this blog post on an Intel laptop run equally well on AMD workstations, low-power NXP boards destined for in-flight entertainment, and high-end Renesas SoCs which might well be in your car. Now that drivers are easier to write and applications are portable, over ten new DRM drivers have landed in the upstream kernel since atomic modesetting was merged. These drivers are arriving in a much more friendly and collaborative community than we've ever had.
One of the headline features of atomic is the ability to use hardware planes for composition. To show windows to the user, display servers like Weston need to composite the content of multiple windows together into a single image - which is why they are also known as 'compositors'. With the exception of mouse cursors and fullscreen windows, the entire content of each output is a single flat image, created by using OpenGL ES, Pixman, or similar, to combine all the client images together.
Using the GPU for composition isn't as complex as rendering scenes from a game, but it still has a real cost. If your GPU is already straining at its limits - say you are playing the latest game in a window, or running ShaderToy in your browser - then adding the load of composition is the last thing you want to do. Conversely, if you aren't using the GPU at all, GPU composition will stop it from switching off entirely, adding a surprising amount of power consumption.
Display controllers can show more than just one image and a mouse cursor, though. Most hardware has a couple of overlay planes, logically positioned between the primary image and the mouse cursor. These planes are also known as 'sprites', which will ring a bell for those familiar with 1980s games. Those games did exactly what we want to do: display one static background image, with smaller images shown on top of it, without having to redraw the whole scene every time.
Using these overlay planes not only frees up the GPU for more client work (or allows it to power down), but in many cases gives us better image quality. Hardware vendors put a lot of work into their display controllers, with higher-quality image scaling and YUV-to-RGB colourspace conversion, as well as general quality optimisations. This is especially true on dedicated hardware like set-top boxes.
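To put a client's buffer on one of these planes, a compositor fills in the plane's standard KMS properties (FB_ID, CRTC_ID, and the source and destination rectangles) as part of an atomic request. The sketch below is illustrative rather than Weston's actual code: the plane_props struct and the framebuffer ID are assumptions, and the source coordinates use KMS's 16.16 fixed-point format.

```c
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Property IDs for one plane, assumed to have been resolved once at startup
 * with drmModeObjectGetProperties(); the names match the standard KMS
 * plane properties. */
struct plane_props {
    uint32_t fb_id, crtc_id;
    uint32_t src_x, src_y, src_w, src_h;
    uint32_t crtc_x, crtc_y, crtc_w, crtc_h;
};

/* Add the state for showing framebuffer 'fb' (e.g. created from a client
 * buffer with drmModeAddFB2()) on an overlay plane, unscaled, at (x, y). */
static void plane_add_state(drmModeAtomicReq *req, uint32_t plane_id,
                            const struct plane_props *p, uint32_t crtc_id,
                            uint32_t fb, int32_t x, int32_t y,
                            uint32_t w, uint32_t h)
{
    drmModeAtomicAddProperty(req, plane_id, p->fb_id, fb);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_id, crtc_id);

    /* Source rectangle: the whole buffer, in 16.16 fixed point. */
    drmModeAtomicAddProperty(req, plane_id, p->src_x, 0);
    drmModeAtomicAddProperty(req, plane_id, p->src_y, 0);
    drmModeAtomicAddProperty(req, plane_id, p->src_w, (uint64_t)w << 16);
    drmModeAtomicAddProperty(req, plane_id, p->src_h, (uint64_t)h << 16);

    /* Destination rectangle: the window's position on screen, no scaling. */
    drmModeAtomicAddProperty(req, plane_id, p->crtc_x, x);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_y, y);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_w, w);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_h, h);
}
```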
Weston has had limited support for overlay planes since the very early days, but these were disabled as the legacy KMS API made them all but unusable. Long before the atomic KMS API was marked stable, we began work on an atomic DRM patchset for Weston to help push this work forward. Quite some time later, after a rewrite of most of Weston's KMS backend, we have finally managed to land atomic support in Weston 4.0, using the core atomic API with solid and reliable state tracking.
We need such solid state tracking in order to brute-force a configuration. In spite (or because) of their capability, overlay planes have a number of surprising limits on how they can be used. These include per-plane scaling limits ('no more than 8x, or no less than 1/4x'), global bandwidth limits, shared scaler and compression units, and even limits on the number of overlay planes which can be active on a single scanline (row) of the display. Android often handles this with per-platform HWComposer modules, which analyse a scene and produce the optimal result.
Without this hard-coded platform-specific knowledge, the best we can do is try, try again. Atomic gives us the ability to compute and test a configuration, to see ahead of time if it will work. Weston uses this to come up with the best configuration it can, by repeatedly testing all the different possibilities.
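As a rough illustration of that loop, building on the hypothetical plane_add_state() helper above: a candidate plane assignment is added to the request, tested with DRM_MODE_ATOMIC_TEST_ONLY, and rolled back if the kernel rejects it, in which case the window falls back to GPU composition. This is a sketch of the technique, not Weston's implementation.

```c
#include <stdbool.h>

/* Try to place one window on an overlay plane. Returns true if the kernel
 * accepts the candidate state; otherwise the request is rolled back and the
 * window should be composited with the GPU instead. */
static bool try_window_on_plane(int drm_fd, drmModeAtomicReq *state,
                                uint32_t plane_id,
                                const struct plane_props *p,
                                uint32_t crtc_id, uint32_t fb,
                                int32_t x, int32_t y, uint32_t w, uint32_t h)
{
    /* Remember how much of the request was already valid before this attempt. */
    int cursor = drmModeAtomicGetCursor(state);

    plane_add_state(state, plane_id, p, crtc_id, fb, x, y, w, h);

    /* Ask the kernel whether this configuration would work, without
     * touching the hardware. */
    if (drmModeAtomicCommit(drm_fd, state, DRM_MODE_ATOMIC_TEST_ONLY, NULL) == 0)
        return true;

    /* Rejected (scaling limit, bandwidth, shared scaler, ...): undo this
     * plane's properties and leave the window to GPU composition. */
    drmModeAtomicSetCursor(state, cursor);
    return false;
}
```

Repeating this check for each visible window, most promising candidates first, is how a hardware-agnostic compositor can converge on the best configuration the display controller will accept.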
The patches on top of our core state-tracking and atomic API work are still in review, though the comments so far are largely positive. Weston 4.0 will therefore use atomic where possible and exercise our new state-tracking paths. On top of this, the next release of Weston will add the code to use overlay planes where we can, finally delivering on atomic's promise of using the display hardware to the fullest extent possible. We expect that release will also include support for DRM leases, allowing time- or safety-critical clients, such as VR or automotive information systems, to drive the display hardware directly without intervention from the compositor.
This is Part I of a two-part blog post highlighting the latest enhancements to Linux's low-level graphics infrastructure. Part II later this week will look at performance improvements from buffer modifiers, and from fewer copies in XWayland.
Comments (7)
Liam:
Mar 21, 2018 at 07:24 AM
Hi Daniel,
You mention that Android uses a per-backend hwcomposer in order to optimally compose a scene while we have to poke around in the dark until we can find a wall. Since atomic allows us to gather info when attempting various kms operations can that be used to build an ad hoc platform specific model of the display controller as more of the states are probed? This seems a bit similar to the udev database.
Thanks!
Daniel Stone:
Mar 21, 2018 at 01:24 PM
Hi Liam,
Building up a model like that would be really interesting, but I'm not entirely sure how viable it is in the long term. Part of the problem is just describing the restrictions - it ends up looking a lot like code - but also part of the problem is the variation. It's not just per-platform that the restrictions vary, but environmental factors like which clocks a particular board is using, total system memory bandwidth usage, and even thermal throttling. Luckily, testing configurations is cheap enough that we can do this a lot to figure it out by brute force ...
Another option is, now that we have an openly-developed drm_hwcomposer, to try to make more use of it in Weston et al.
Liam:
Mar 22, 2018 at 04:47 AM
Hi Daniel,
First, thanks for the response.
I'm sure I'm missing the point but, to state the obvious, what's the issue with it "looking a lot like code" since it would be code that would heuristically set and explore the space?
The biggest issues that occurred to me were the combinatorial space and bandwidth limits. That these can also throttle just multiplies the search space.
In a way, this reminds me of the problems the memory allocator project has been trying to solve, but with the added complication that each query is limited to a yes/no response. Additionally, you've got the power issue which suggests an intersection with the scheduling problem that the power-aware schedulers are trying to "solve". As you know, the Android solution has been the device-dependent energy models which, at a glance, seem analogous to the platform-dependent hwc modules. Lots of similar problems are being worked on in different parts of the kernel:)
Lastly, I'm not really sure what is gained by using the hwc shim other than the ability to run non-SF compositors on top of Android drivers.
Best/Liam
Spanky:
Mar 21, 2018 at 04:36 PM
Wow! Thank you guys for this complex development work. Especially for doing it in the open, in a way that properly doesn't break stability.
Here's the thing. I thought I was technical until I read this. Could you add just a quick word (in English LOL) about what all this means in the big picture? I can guess it's moving towards more stability, speed and new hardware compatibility; but really I did not understand 1% of all that.
Nodericks:
Mar 22, 2018 at 12:29 PM
Interesting !
RRW:
Mar 22, 2018 at 01:16 PM
2.5 million lines of code for the AMD driver? Is that true? With 2.5 million lines of code you could have your own OS and a few apps. What is it about this hardware that necessitates a 2.5 million lines driver?
Liam:
Mar 22, 2018 at 03:32 PM
It's essentially a computer within a computer. It has its own: firmware, sophisticated power management, memory (with multi-level cache), general purpose ISA, io ports, and so on. The days of gpu=graphics accelerator are long over.
So, yes, it pretty much requires its own os.