
Wayland on MALI

Posted on 13/08/2014 by Daniel Stone

Fast, efficient and reliable media with Wayland

Over the past several years at Collabora, we have worked on Linux's graphics stack from top to bottom, from kernel-level hardware enablement through to the end applications. A particular focus has always been performance: not only increasing average throughput and performance metrics, but ensuring consistent results every time. One of the core underpinnings of the Linux graphics stack from its very inception has been the X Window System, which recently celebrated its 29th anniversary. Collabora have been one of the most prolific contributors to X.Org for the past several years, supporting its core development, but over the past few years we have also been working on its replacement - Wayland.

Replacing something such as X is not to be taken lightly; we view Wayland as the culmination of the last decade of work by the entire open-source graphics community. Wayland reached 1.0 maturity in 2012, and since then has shipped in millions of smart TVs, set-top boxes, IVI systems, and more. This week at SIGGRAPH, together with ARM, we have been showcasing some of our recent development on Wayland, as well as on the entire graphics stack, to provide best-in-class media playback with GStreamer.

 

'Every frame is perfect'

Graph showing frame latency

Wayland's core value proposition for end users is simple: every frame must be perfect. By that we mean that the user will never see any unintended or partially-rendered content, or any graphical glitches such as tearing. In X11, the server performs rendering on behalf of its clients. This not only requires expensive, parallelisation-destroying synchronisation with the GPU, but also means that intermediate states can reach the screen, often as an unwanted side effect of unrelated requests. Wayland's buffer-oriented model instead places the client firmly in control of what the user will see. The user will only ever be shown exactly the content that the client requests, in the exact way that it requests it: painstaking care has been taken to ensure that not only do such intermediate states not exist, but that any unnecessary synchronisation has been removed. The combination of perfect frames and lower latency results in a natural, fluid-feeling user experience.

 

 

Power and resource efficient

Graph showing frame latency

Much of the impetus for Wayland's development came from ARM-based devices, such as smart TVs and set-top boxes, digital signage, and mobile. On these devices power efficiency is key, and increased demands such as 4K media mean that the hardware must be pushed right to the margins of its capabilities simply to ship a functioning product in the first place. To achieve these demanding targets, the window system must make full use of all IP blocks provided by the platform, particularly hardware media decoders and any video overlays provided by the display controller. Not only must it use these blocks, but it must also eliminate any copies of the content made along the way.

X11 has two core problems which preclude it making full use of these features. Firstly, as X11 provides a rendering-command rather than a buffer-driven interface to clients, it is extremely difficult to integrate with hardware media decoders without making a copy of the full decoded media frame, consuming valuable memory bandwidth and time. Secondly, the X11 server is fundamentally unaware of the scene graph produced by the separate compositor, which precludes use of hardware overlays: the only interface it provides for doing this is OpenGL ES rendering, requiring another copy of the content. This increased memory bandwidth and power usage makes it extremely difficult to ship compelling products in a media-led environment.
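To give a rough sense of what each extra copy costs, the back-of-the-envelope sketch below estimates the memory traffic added by one full-frame copy per displayed frame. The resolution, frame rate and pixel format (NV12, 12 bits per pixel) are illustrative assumptions, not measurements from the demo described in this post.

```python
def copy_bandwidth_bytes_per_second(width, height, fps, bits_per_pixel=12):
    """Estimate memory traffic added by one full-frame copy per displayed frame.

    A copy reads every byte of the frame once and writes it once, so it costs
    twice the frame size in memory traffic.
    """
    frame_bytes = width * height * bits_per_pixel // 8
    return 2 * frame_bytes * fps


# Assumed example: 4K (3840x2160) NV12 video at 60 frames per second.
per_copy = copy_bandwidth_bytes_per_second(3840, 2160, 60)
print(f"{per_copy / 1e9:.2f} GB/s of extra traffic per copy")
```

Under these assumptions a single copy costs roughly 1.5 GB/s, and the two copies described above would double that, before counting the traffic of decoding and scanning out the frame itself.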

By contrast, Wayland's buffer-driven model is a natural fit for the hardware media engines of today and tomorrow, and the integration of the display server and compositor makes it easy to use the full functionality of the display controller to provide low-power media display. This reserves as much memory bandwidth as possible, either so that other applications can run without contending with media playback for crucial system resources, or so that systems can be pushed to their limits, for example with 4K content on relatively low-spec hardware.

 

A first-class media experience

To complement our hundreds of man-years of work on the industry-standard GStreamer media framework, which has proven to scale from playback on mobile devices to serving huge live broadcast streams, Collabora has worked to ensure that Wayland provides a first-class experience when used together with GStreamer. Our recent development work on both Wayland itself and GStreamer's Wayland support ensures that GStreamer can realise its full potential when used together with Wayland. All media playback naturally occurs in a 'zero-copy' fashion, from hardware decoding engines into either the 3D GPU or display controller, thanks to DMA-BUF buffer passing, new in version 3.16 of the Linux kernel.

The Wayland subsurface mechanism allows videos to be streamed separately to UI content, rather than combined by the client as they are today in X11. This separation allows the display server to make a frame-by-frame decision as to how to present the video: using power-efficient hardware overlays, or using the more flexible and capable 3D GPU. This step allows maximum UI flexibility whilst also making the most of hardware IP blocks. The scaling mechanism also allows the compositor to scale the video at the last minute, potentially using high-quality scaling and filtering engines within the display controller, as well as reducing precious memory bandwidth usage when upscaling videos.
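The frame-by-frame decision described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the `Surface` type, the `assign_paths` function and the three checks it performs (a free overlay, a supported pixel format, no blending required) are hypothetical simplifications, and a real compositor such as Weston weighs considerably more than this.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    name: str
    pixel_format: str
    needs_blending: bool  # e.g. translucent UI drawn over this surface


def assign_paths(surfaces, overlay_formats, free_overlays):
    """Decide, for one frame, whether each surface uses an overlay or the GPU."""
    assignment = {}
    for surface in surfaces:
        fits = (free_overlays > 0
                and surface.pixel_format in overlay_formats
                and not surface.needs_blending)
        if fits:
            assignment[surface.name] = "overlay"
            free_overlays -= 1
        else:
            # Fall back to the more flexible (but more power-hungry) GPU path.
            assignment[surface.name] = "gpu"
    return assignment


# Assumed scene: one video subsurface plus a translucent UI surface.
scene = [Surface("video", "NV12", False), Surface("ui", "ARGB8888", True)]
result = assign_paths(scene, overlay_formats={"NV12"}, free_overlays=1)
print(result)
```

Because the decision is re-evaluated for every frame, the same video can move to the GPU path as soon as, say, a translucent menu is drawn over it, and back to an overlay once the menu disappears.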

Deep buffer queues are also possible for the first time, with both GStreamer and Wayland supporting ahead-of-time buffer queueing, where every buffer has a target time attached. Under this model, it is possible for the client to queue up a large number of frames in advance, offload them all to the compositor, and then go to sleep whilst they are autonomously displayed, saving CPU usage and power. Wayland also provides GStreamer with feedback on when exactly its buffers were shown on screen, allowing it to automatically adjust its internal pipeline and clock for the tightest possible A/V sync.
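The queueing model above can be sketched as a small simulation. It is not the Wayland protocol or the GStreamer API: `FrameQueue`, `play`, the display policy (show the newest frame whose target time has passed) and all of the timing values are assumptions made for illustration.

```python
import heapq


class FrameQueue:
    """Toy compositor-side queue of frames, each tagged with a target time."""

    def __init__(self):
        self._heap = []  # (target_time_ms, frame_id), earliest target first

    def queue(self, target_time_ms, frame_id):
        heapq.heappush(self._heap, (target_time_ms, frame_id))

    def on_vblank(self, now_ms):
        """Return the (target, frame_id) to show at this refresh, or None.

        Assumed policy: a frame becomes eligible once its target time has
        passed; if several are eligible, only the newest is shown.
        """
        due = None
        while self._heap and self._heap[0][0] <= now_ms:
            due = heapq.heappop(self._heap)
        return due


def play(targets_ms, vblanks_ms):
    """Queue every frame up front, then let the 'compositor' display them."""
    frames = FrameQueue()
    for frame_id, target in enumerate(targets_ms):
        frames.queue(target, frame_id)
    # From here on the client could sleep; only the compositor runs.
    feedback = []
    for now in vblanks_ms:
        shown = frames.on_vblank(now)
        if shown is not None:
            target, frame_id = shown
            feedback.append((frame_id, target, now))
    return feedback


# Assumed numbers: roughly 24 fps video on a roughly 60 Hz display.
feedback = play([0, 42, 83, 125], [0, 17, 33, 50, 67, 83, 100, 117, 133])
for frame_id, target, shown_at in feedback:
    print(f"frame {frame_id}: target {target} ms, shown {shown_at} ms")
```

The `feedback` list plays the role of the presentation feedback mentioned above: by comparing each target time with the time the frame was actually shown, a media framework could nudge its clock or pipeline to keep audio and video aligned.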

 

Easier deployment and support

In contrast to the X11 model of providing a driver specific to the combination of X server version, display controller and 3D GPU, Wayland offers vendors the ability to deploy drivers written according to external, well-tested, vendor-independent APIs. These drivers are required to perform only limited, well-scoped tasks, making validation, performance testing, and support much easier than under X11. This model makes it possible for vendors to deploy a single well-tested solution for Wayland, and for end users to deploy it in the knowledge that they will have reliable performance and functionality.

We are demonstrating all this at SIGGRAPH, at the ARM booth (stand #933) in the Mobility Pavilion of the Exhibition Hall. We are showing a side-by-side comparison of Wayland and X11 on Samsung Chromebook 2 machines (Samsung Exynos 5800 Octa hardware, with an ARM Mali-T628 GPU), demonstrating Collabora's expertise from the very bottom of the stack to the very top. Collabora's in-house Singularity OS runs a Linux 3.16-rc5 kernel, containing changes bound for upstream to improve and stabilise hardware support, and an early preview of atomic modesetting support inside the Exynos kernel modesetting driver for the display controller.

The Wayland machine runs Weston with the new DMA-BUF and buffer-queueing extensions on top of atomic modesetting, demonstrating that videos played through GStreamer can be seamlessly switched between display controller hardware overlays and the Mali 3D GPU, using the DMA-BUF import EGL extension. The X11 machine runs the ChromeOS X11 driver, with a client which plays video through OpenGL ES at all times. The power usage, frame 'lateness' (difference between target display time and actual time), and CPU usage are shown, with Wayland providing a dramatic improvement in all these metrics.
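Frame 'lateness' as defined above is simply the actual display time minus the target display time. The sketch below shows one way such a metric could be summarised; the helper names and the sample timestamps are made up purely for illustration and are not data from the demo.

```python
from statistics import mean


def lateness_ms(target_ms, actual_ms):
    """Lateness: how long after its target time a frame reached the screen."""
    return actual_ms - target_ms


def summarise(samples):
    """Summarise (target_ms, actual_ms) pairs into a few headline numbers."""
    values = [lateness_ms(target, actual) for target, actual in samples]
    return {
        "mean_ms": mean(values),
        "worst_ms": max(values),
        "late_frames": sum(value > 0 for value in values),
    }


# Made-up sample timestamps (target, actual) in milliseconds.
summary = summarise([(0, 0), (42, 50), (83, 83), (125, 133)])
print(summary)
```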

 Note: graphs are for illustration purposes only. Data is accurate.

Comments (6)

  1. Sumit Semwal:
    Aug 15, 2014 at 06:48 AM

    Hi,

    AFAIK, the dma-buf framework was first included in the Linux kernel in 3.4, so I think the post is a little off on the 'newness' of it in the kernel :-)

  2. seniorivn:
    Apr 04, 2015 at 10:49 PM

    Do you think this means that porting the Mer project and Sailfish OS to devices with a Mali GPU is possible?

  3. daniels:
    Apr 07, 2015 at 03:01 PM

    @seniorivn: Yes it does, although SailfishOS currently uses libhybris to work with Android-provided drivers; this would need to be changed to the standard GBM/EGL system. I don't expect this would be a particularly big problem though. If you have an existing Android system with drivers for Mali GPUs, then you should be able to directly reuse that.

  4. esbeeb:
    Dec 05, 2015 at 10:41 AM

    What license is this released under? Where is the source code? Can the source code be merged into the Linux Kernel? Or some Debian package? Where can activity in this respect be seen? As far as I can tell, this "progress" explained above is effectively just vaporware, which hasn't really left the Collabora office. I sure don't see it "in the wild." Or, if it is actually "in the wild", it must be incredibly difficult to set up and use, not being merged upstream to wherever the code ought to go, *which would actually result in mass deployment*.

    Would someone from Collabora please explain things like licensing, where to download, where this has been merged upstream, etc? Why haven't these innovations really seen the light of day? Will we see this software running on a C.H.I.P (which has Mali graphics)? It's been well over a year, and I'm not seeing any "movement" on this in the Open Source world...

  5. daniels:
    Dec 10, 2015 at 05:13 AM

    esbeeb,
    A lot of progress has actually been made on this work, in the open. Firstly, to get it out of the way: the Mali driver itself consists of an open (kernel) and closed (userspace) portion. The kernel portion is always available under the GPLv2, and can be found in numerous places, including from ARM's own download site. A userspace build which supports Wayland is not yet available from ARM directly, however this is on their roadmap and is expected to be available shortly.

    However, the Mali driver is not specifically required for the work we have done, which is much more generally applicable, including to open drivers. Gustavo Padovan's work on converting the Exynos KMS driver to atomic modesetting is already available in released kernels, and continues to be pushed upstream, as does Nicolas Dufresne's work on the MFC media decoder and GStreamer V4L2 integration. The work done on the dmabuf Wayland extension for zerocopy media has been available in Weston for a couple of months, after an extensive period of review; the most recent patchset for continued enhancements was sent to the Wayland mailing list a couple of weeks ago. The presentation-timing protocol for accurate media timing has been in Weston for well over a year now, and the GStreamer waylandsink work which uses all this is again under review. The GStreamer and dmabuf patchsets have taken quite some time to get through review, as they have relied on large changes to the world around us: further EGL extensions, and a redesign of how GStreamer uses dmabuf and how that integrates with other components.

    In short, everything under our control was developed in the open, and has been pushed upstream. If there's anything else you think is missing, please let us know; if you would like to see Wayland support in the Mali driver, please let your vendor (in this case, the ChIP team) know.

    1. esbeeb:
      Jan 09, 2016 at 09:03 AM

      Thanks for all this juicy info. It heartens me to hear about all this progress mentioned. Thank you, Collabora, for being good sports and pushing your changes upstream, "in the open". I didn't realize how many separate components there are in making this all "just work". I hope more Youtube videos get posted, to show newly-working functionality *as it works in upstream* (not just in testing scenarios, which are difficult for most people to replicate).


Collabora Ltd © 2005-2017. All rights reserved.