August 31, 2020
Having a comprehensive and accelerated graphics stack is essential in today's world. But where would we be without one, or how do we as developers handle the lack or instability of drivers during very early hardware bring up?
Simple - we use drivers which do all the rendering via the CPU. An interesting fact is that during the dawn of 3D graphics CPU rendering was not uncommon.
With time, GPU devices became more popular, powerful and cheaper. Thus CPU (software) rendering, is not so wide spread these days and mainly used as a fallback.
Within this article, we'll provide a high-level introduction of the Linux graphics stack, how it is used within ChromeOS and the work done to improve software rendering (while simultaneously improving GPU rendering, by reducing the boilerplate needed in applications).
|Original photo by Anete Lusina on Unsplash.|
Mesa is de-facto the open-source driver implementation for accelerated graphics in the Linux world. Although as we can see there is ongoing work to make it a significant component on Microsoft Windows.
The drivers themselves, provide our rendering APIs - OpenGL and GLES To bridge those with the OS windowing system (like Wayland, X11 or KMS) we need additional API - EGL or GLX, which is part of the drivers provided by Mesa.
One of our goals is to be as flexible as possible, while minimising the amount of legacy code required - so in our case we're using OpenGL/GLES and EGL. In particular we are making use of the EGL_MESA_platform_surfaceless extension. It allows us to use OpenGL or GLES and render into a memory area, not requiring integration with the display subsystem.
Effectively allowing us to have a single code path, devoid of deep and convoluted logic about Wayland, X11 or KMS. So, job well done you say?
Not quite, there is this thing called authentication which is stopping us.
The Linux kernel graphics subsystem is called Direct Rendering Manager or DRM for short. Not to be confused with Digital Rights Management which is a completely separate thing.
The DRM subsystem has a wide variety of jobs. At the high-level they fall into one of several groups - vendor agnostic API for managing the display controller (KMS), GPU rendering, buffer allocation and cross process/context sharing.
Each application using DRM is called a client. There can be multiple clients using a DRM driver at any given time.
In the early days, both hardware and drivers, lacked the relevant separation between clients. It meant that, we needed a way to 'allow' only a given number or set of clients. This is something orchestrated by the first and most versatile client - the master. The master has the powers to grant access via an authentication session triggered by each client.
To accommodate for such requests, both Wayland and Xorg, had to wrap around the native DRM/KMS authentication method and define their own respective protocols. Thus programs, or their respective toolkits like GTK or Qt, were responsible for detecting the session type (Wayland, Xorg or KMS) and triggering the corresponding authentication code path.
Fast forward a couple of decades - both hardware and drivers have become more capable and secure, providing robust isolation across clients. Yet the authentication remained :-\
So despite our use of EGL_MESA_platform_surfaceless, we still needed some Wayland/X11 specific code. But why? As said a few moments ago, modern drivers are robust enough. Can we can simply remove the authentication requirement? Turns out we can.
As a nice bonus point, this also helps programs which were missing any of the three authentication paths - Wayland, X11 or KMS. If the said program was not open-source, we had no way of adding the required code. Therefore there was only one way to make it work - set the CAP_SYS_ADMIN capability, effectively running the program with root privileges. Something which many of us feel uncomfortable with as it grants complete access to the system.
Indeed, I did.
Today there are two types of software GL drivers, or rasterizers as they're being called. Ones that talk through X and ones which use an unaccelerated "dumb" framebuffers, exposed by the DRM driver.
The dumb buffers themselves, are a special subclass of objects exposed by DRM - they are exposed in a consistent vendor-agnostic manner, yet they lack any acceleration by the rendering device. Hence applications are allowed to draw onto those buffers only via the CPU - our usecase.
By using them, we can omit the final piece of legacy X11/Xorg code for ChromeOS, in its platform abstraction layer Ozone.
Although, using dumb buffers also requires authentication. So removing the requirement, as per our last section, effectively allows us to use software GL drivers within ChromeOS.
Among the first parts of my investigation, was to examine the inner workings of the authentication mechanism both within the kernel DRM subsystem, and the respective wrappers for Xorg and Wayland.
The results where promising - the APIs were designed only to grant permission, with no mention of revocation. This indicated that there was no explicit tracking within Xorg or Wayland but only within DRM itself.
For the next step, I embarked on auditing of all GPU DRM drivers in the kernel. While many drivers required a trivial patch to remove the DRM_AUTH notation, others like QXL where found to lack any client separation. In one particular case, additional work was required by the driver maintainers to make the existing code more robust.
With the individual drivers complete, it was time for the final kernel piece - the dumb buffers handling, as part of core DRM. Those proved safe and my observations were confirmed by the DRM maintainers.
As seemingly all the work was done, it was time to verify my analysis and observations.
I've started small: To verify that the overall authentication removal works as expected, I've hacked a few standalone utilities like vainfo. Additionally experimented with removing the authentication requests from Mesa and running GL applications against it. Without much surprise everything was working like a charm - regardless whether the application was using GLX or EGL, or was ran within a Xorg or a weston (a Wayland compositor) session.
The final step was to build ChromiumOS and run it. The initial run was within QEMU backed by the virgl GPU driver. For this use-case, I've deployed my earlier Mesa hacks, to verify the authentication, or lack thereof, did not cause any issues. With that working as expected, I also confirmed that software rendering was working fine on an actual Chromebook device. Despite that software GL drivers were used, the overall experience of booting the system and light web browsing was surprisingly smooth and responsive.
In a true, open-first manner, all the patches have been upstreamed. In particular all the patches are in Linux 5.6. If you're using an older kernel you can apply the patches manually:
Did you know you could run a permissively-licensed MTP implementation with minimal dependencies on an embedded device? Here's a step-by-step…
Earlier this year, the Rust compiler gained support for LLVM source-base code coverage. In this post we'll explain how to setup a CI job…
Over the past few months, I've been working on a side project to improve Meson sub-project support. The best stress test is to build projects…
The most complete automated testing and continuous integration tool for the Linux kernel continues to evolve at a rapid pace. Here's a look…
In the embedded world, many modern SoCs such as the ST Microelectronics STM32MP1 now include coprocessor cores which can be used for a wide…
Our recent efforts on the Hantro kernel driver have resulted in the addition of H.264 decoding support and multiple performance improvements.…