Weston repaint scheduling

Posted on 11/02/2015 by Pekka Paalanen

Now that Presentation feedback has finally landed in Weston (feedback, flags), people are starting to pay attention to the output timings as now you can better measure them. I have seen a couple of complaints already that Weston has an extra frame of latency, and this is true. I also have a patch series to fix it that I am going to propose.

To explain how the patch series affects Weston's repaint loop, I made some JSON-timeline recordings before and after, and produced some graphs with Wesgr. Here I will explain how the repaint loop works timing-wise.

The old algorithm

The old repaint scheduling algorithm in Weston repaints immediately on receiving the pageflip completion event. This maximizes the time available for the compositor itself to repaint, but it also means that clients can never hit the very next vblank / pageflip.

Figure 1. The old algorithm, the client paints as response to frame callbacks.

Frame callback events are sent at the "post repaint" step. This gives clients almost a full frame's time to draw and send their content before the compositor goes to "begin repaint" again. In Figure 1. you see, that if a client paints extremely fast, the latency to screen is almost two frame periods. The frame latency can never be less than one frame period, because the compositor samples the surface contents (the "repaint flush" point) immediately after the previous vblank.

Figure 2. The old algorithm, the client paints as response to Presentation feedback events.

While frame callback driven clients still get to the full frame rate, the situation is worse if the client painting is driven by presentation_feedback.presented events. The intent is to draw and show a new frame as soon as the old frame was shown. Because Weston starts repaint immediately on the pageflip completion, which is essentially the same time when Presentation feedback is sent, the client cannot hit the repaint of this frame and gets postponed to the next. This is the same two frame latency as with frame callbacks, but here the framerate is halved because the client waits for the frame to be actually shown before continuing, as is evident in Figure 2.

Figure 3. The old algorithm, client posts a frame while the compositor is idle.

Figure 3. shows a less relevant case, where the compositor is idle while a client posts a new frame ("damage commit"). When the compositor is idle graphics-wise (the gray background in the figure), it is not repainting continuously according to the output scanout cycle. To start painting again, Weston waits for an extra vblank first, then repaints, and then the new frame is shown on the next vblank. This is also a 1-2 frame period latency, but it is unrelated to the other two cases, and is not changed by the patches.

The modification to the algorithm

The modification is simple, yet perhaps counter-intuitive at first. We reduce the latency by adding a delay. The "delay before repaint" is in all the figures, and the old algorithm is essentially using a zero delay. The compositor's repaint is delayed so that clients have a chance to post a new frame before the compositor samples the surface contents.

A good amount of delay is a hard question. Too small delay and clients do not have time to act. Too long delay and the compositor itself will be in danger of missing the vblank deadline. I do not know what a good amount is or how to derive it, so I just made it configurable. You can set the repaint window length in milliseconds in weston.ini. The repaint window is the time from starting repaint to the deadline, so the delay is the frame period minus the repaint window. If the repaint window is too long for a frame period, the algorithm will reduce to the old behaviour.

The new algorithm

The following figures are made with a 60 Hz refresh and a 7 millisecond repaint window.

Figure 4. The new algorithm, the client paints as response to frame callback.

When a client paints as response to the frame callback (Figure 4), it still has a whole frame period of time to paint and post the frame. The total latency to screen is a little shorter now, by the length of the delay before compositor's repaint. It is a slight improvement.

Figure 5. The new algorithm, the client paints as response to Presentation feedback.

A significant improvement can be seen in Figure 5. A client that uses the Presentation extension to wait for a frame to be actually shown before painting again is now able to reach the full output frame rate. It just needs to paint and post a new frame during the delay before compositor's repaint. This mode of operation provides the shortest possible latency to screen as the client is able to target the very next vblank. The latency is below one frame period if the deadlines are met.


This is a relatively simple change that should reduce display latency, but analyzing how exactly it affects things is not trivial. That is why Wesgr was born.

This change does not really allow clients to wait some additional time before painting to reduce the latency even more, because nothing tells clients when the compositor will repaint exactly. The risk of missing an unknown deadline grows the later a client paints. Would knowing the deadline have practical applications? I'm not sure.

These figures also show the difference between the frame callback and Presentation feedback. When a client's repaint loop is driven by frame callbacks, it maximizes the time available for repainting, which reduces the possibility to miss the deadline. If a client drives its repaint loop by Presentation feedback events, it minimizes the display latency at the cost of increased risk of missing the deadline.

All the above ignores a few things. First, we assume that the time of display is the point of vblank which starts to scan out the new frame. Scanning out a frame actually takes most of the frame period, it's not instantaneous. Going deeper, updating the framebuffer during scanout period instead of vblank could allow reducing latency even more, but the matter becomes complicated and even somewhat subjective. I hear some people prefer tearing to reduce the latency further. Second, we also ignore any fencing issues that might come up in practise. If a client submits a GPU job that takes a long while, there is a good chance it will cause everything to miss a deadline or more.

As usual, this work and most of the development of JSON-timeline and Wesgr were sponsored by Collabora.

PS. Latency and timing issues are nothing new. Owen Taylor has several excellent posts on related subjects in his blog.

Original post

Comments (0)

Add a Comment

Allowed tags: <b><i><br>Add a new comment:

Latest Blog Posts

ipcpipeline: Splitting a GStreamer pipeline into multiple processes


Earlier this year I worked on a certain GStreamer plugin that is called “ipcpipeline”. This plugin provides elements that make it possible…

Quick hack: Experiments with crosvm


Running crosvm outside Chromium OS is quite easy, with the only complication being that minijail isn't widely packaged in distros. In these…

Tracing memory leaks in the NFC Digital Protocol stack


Kmemleak allows you to track possible memory leaks inside the Linux kernel. Basically, it tracks dynamically allocated memory blocks in…

Who knew we still had low-hanging fruit?


Earlier this month I had the pleasure of attending the Web Engines Hackfest, hosted by Igalia at their offices in A Coruña, and also sponsored…

Performance analysis in Linux (continued)


In this post, I will show one more example of how easy it is to disrupt performance of a modern CPU, and also run a quick discussion on…

XDC 2017 - Links to recorded presentations (videos)


Many thanks to Google for recording all the XDC2017 talks. To make them easier to watch, here are direct links to each talk recorded at…

Open Since 2005

We use cookies on this website to ensure that you get the best experience. By continuing to use this website you are consenting to the use of these cookies. To find out more please follow this link.

Collabora Ltd © 2005-2017. All rights reserved. Website sitemap.