August 13, 2014
Over the past several years at Collabora, we have worked on Linux's graphics stack from top to bottom, from kernel-level hardware enablement through to the end applications. A particular focus has always been performance: not only increasing average throughput and performance metrics, but ensuring consistent results every time. One of the core underpinnings of the Linux graphics stack from its very inception has been the X Window System, which recently celebrated its 29th anniversary. Collabora have been one of the most prolific contributors to X.Org for the past several years, supporting its core development, but over the past few years we have also been working on its replacement - Wayland.
Replacing something such as X is not to be taken lightly; we view Wayland as the culmination of the last decade of the work by the entire open-source graphics community. Wayland reached 1.0 maturity in 2012, and since then has shipped in millions of smart TVs, set-top boxes, IVI systems, and more. This week at SIGGRAPH together with ARM, we have been showcasing some of our recent development on Wayland, as well as on the entire graphics stack, to provide best-in-class media playback with GStreamer.
Wayland's core value proposition for end users is simple: every frame must be perfect. What we mean by that, is that the user will never see any unintended or partially-rendered content, or any graphical glitches such as tearing. In contrast to X11, where the server performs rendering on behalf of its clients, which not only requires expensive parallelisation-destroying synchronisation with the GPU, but is often an unwanted side effect of unrelated requests, Wayland's buffer-oriented model places the client firmly in control of what the user will see. The user will only ever be shown exactly the content that the client requests, in the exact way that it requests it: painstaking care has been taken to ensure that not only do these intermediate states not exist, but that any unnecessary synchronisation has been removed. The combination of perfect frames and lower latency results in a natural, fluid-feeling user experience.
Much of the impetus for Wayland's development came from ARM-based devices, such as smart TVs and set-top boxes, digital signage, and mobile, where not only is power efficiency key, but increased demands such as 4K media mean in order to ship a functioning product in the first place, the hardware must be pushed right to the margins of its capabilities. In order to achieve these demanding targets, the window system must make full use of all IP blocks provided by the platform, particularly hardware media decoders and any video overlays provided by the display controller. Not only must it use these blocks, but it must eliminate any copies of the content made along the way.
X11 has two core problems which preclude it making full use of these features. Firstly, as X11 provides a rendering-command rather than a buffer-driven interface to clients, it is extremely difficult to integrate with hardware media decoders without making a copy of the full decoded media frame, consuming valuable memory bandwidth and time. Secondly, the X11 server is fundamentally unaware of the scene graph produced by the separate compositor, which precludes use of hardware overlays: the only interface it provides for doing this is OpenGL ES rendering, requiring another copy of the content. This increased memory bandwidth and power usage makes it extremely difficult to ship compelling products in a media-led environment.
By contrast, Wayland's buffer-driven model is a natural fit for the hardware media engines of today and tomorrow, and the integration of the display server and compositor makes it easy to use the full functionality of the display controller to provide low-power media display, whilst reserving as much memory bandwidth as possible for other applications to run without having to contend with media playback for crucial system resources, or to push systems to their limits, such as 4K content on relatively low-spec systems.
To complement our hundreds of man-years of work on the industry-standard GStreamer media framework, which has proven to scale from playback on mobile devices to serving huge live broadcast streams, Collabora has worked to ensure that Wayland provides a first-class experience when used together with GStreamer. Our recent development work on both Wayland itself and GStreamer's Wayland support, ensures that GStreamer can realise its full potential when used together with Wayland. All media playback naturally occurs in a 'zero-copy' fashion, from hardware decoding engines into either the 3D GPU or display controller, thanks to DMA-BUF buffer passing, new in version 3.16 of the Linux kernel.
The Wayland subsurface mechanism allows videos to be streamed separately to UI content, rather than combined by the client as they are today in X11. This separation allows the display server to make a frame-by-frame decision as to how to present it: using power-efficient hardware overlays, or using the more flexible and capable 3D GPU. This step allows maximum UI flexibility whilst also making the most of hardware IP blocks. The scaling mechanism also allows the compositor to scale the video at the last minute, potentially using high-quality scaling and filtering engines within the display controller, as well as reducing precious memory bandwidth usage when upscaling videos.
Deep buffer queues are also possible for the first time, with both GStreamer and Wayland supporting ahead-of-time buffer queueing, where every buffer has a target time attached. Under this model, it is possible for the client to queue up a large number of frames in advance, offload them all to the compositor, and then go to sleep whilst they are autonomously displayed, saving CPU usage and power. Wayland also provides GStreamer with feedback on when exactly their buffers were shown on screen, allowing it to automatically adjust its internal pipeline and clock for the tightest possible A/V sync.
In contrast to the X11 model of providing a driver specific to the combination of X server version, display controller and 3D GPU, Wayland offers vendors the ability to deploy drivers written according to external, well-tested, vendor-independent APIs. These drivers are required to perform only limited, well-scoped tasks, making validation, performance testing, and support much easier than under X11. This model makes it possible for vendors to deploy a single well-tested solution for Wayland, and for end users to deploy them in the knowledge that they will have reliable performance and functionality.
We are demonstrating all this at SIGGRAPH, on the ARM booth at stand #933 in the Mobility Pavilion on the Exhibition Hall. We are showing a side-by-side comparison of Wayland and X11 on Samsung Chromebook 2 machines (Samsung Exynos 5800 Octa hardware, with an ARM Mali-T628 GPU), demonstrating Collabora's expertise from the very bottom of the stack to the very top. Collabora's in-house Singularity OS runs a Linux 3.16-rc5 kernel, containing changes bound for upstream to improve and stabilise hardware support, and an early preview of atomic modesetting support inside the Exynos kernel modesetting driver for the display controller.
The Wayland machine runs Weston with the new DMA-BUF and buffer-queueing extensions on top of atomic modesetting, demonstrating that videos played through GStreamer can be seamlessly switched between display controller hardware overlays and the Mali 3D GPU, using the DMA-BUF import EGL extension. The X11 machine runs the ChromeOS X11 driver, with a client which plays video through OpenGL ES at all times. The power usage, frame 'lateness' (difference between target display time and actual time), and CPU usage are shown, with Wayland providing a dramatic improvement in all these metrics.
Note: graphs are for illustration purposes only. Data is accurate.
Did you know you could run a permissively-licensed MTP implementation with minimal dependencies on an embedded device? Here's a step-by-step…
Earlier this year, the Rust compiler gained support for LLVM source-base code coverage. In this post we'll explain how to setup a CI job…
Over the past few months, I've been working on a side project to improve Meson sub-project support. The best stress test is to build projects…
The most complete automated testing and continuous integration tool for the Linux kernel continues to evolve at a rapid pace. Here's a look…
In the embedded world, many modern SoCs such as the ST Microelectronics STM32MP1 now include coprocessor cores which can be used for a wide…
Our recent efforts on the Hantro kernel driver have resulted in the addition of H.264 decoding support and multiple performance improvements.…