
Mainline Explicit Fencing - Part 1


Gustavo Padovan
September 13, 2016


When it comes to buffer sharing synchronization in the kernel there are two ways of doing it: Implicit Fencing and Explicit Fencing. The difference between them lies in whether the kernel shares synchronization information with userspace: it is either implicit, with no fencing information exposed, or explicit, with all of that information made available to userspace.

The fencing synchronization mechanism allows buffers to be shared without the risk of a driver or userspace reading an incomplete buffer, or writing to a buffer that is still in use elsewhere in the system. Fencing orders these operations so that reads and writes only happen once the buffer is no longer used by other drivers. For example, when a GPU job is queued, a fence is associated with the buffer in that job. Other drivers can use that fence for synchronization: they won't touch the buffer until a signal from the fence is received. The signal means the buffer is now free to be used. Similarly, the GPU driver can wait for the buffer to come off the screen before rendering to it again.

The central piece here is the fence, an element attached to each buffer whenever a request involving that buffer is sent to the kernel. The fence can be used by userspace or other drivers to wait for the work to finish. So once the work is finished the fence signals, and the waiters can proceed and do whatever they want with the buffer.
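To make this concrete, here is a minimal sketch of what waiting on an explicit fence can look like when the kernel exposes the fence to userspace as a file descriptor (the model mainline adopted with sync_file, which later parts of this series cover). The fence_fd here is assumed to have been returned by an earlier job submission; waiting is then just a poll() on it.

    #include <poll.h>
    #include <stdio.h>

    /* Block until the fence signals or timeout_ms expires.
     * Returns 0 once the buffer behind the fence is safe to use. */
    static int wait_fence(int fence_fd, int timeout_ms)
    {
        struct pollfd fds = { .fd = fence_fd, .events = POLLIN };
        int ret = poll(&fds, 1, timeout_ms);

        if (ret > 0)
            return 0;        /* fence signaled */
        if (ret == 0)
            return -1;       /* still busy after timeout_ms */
        perror("poll");      /* error while waiting */
        return -1;
    }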

While Implicit Fencing helps a lot with buffer synchronization, there are cases where it can stall the whole desktop compositing. Imagine the following compositor flow: there are three buffers to process, A, B and C. A and B are sent for rendering in parallel, while C is to be composed from both A and B. But the compositor is only notified when both buffers are rendered, so if B takes too long the compositing of the whole desktop is blocked waiting for B, and C won't be displayed in time.
 

Figure 1. A compositor processing two buffers in parallel. With Implicit Fencing, if B takes too long the desktop compositor freezes.


However, with Explicit Fencing the compositor has one fence for each buffer and is notified as each buffer finishes rendering. So if A renders quickly and B takes too long, the compositor can decide not to wait for B and proceed with the scanout of C composed from the new A and an old version of B. The fencing information allows the compositor to be smart and make decisions that avoid freezing the screen, for example.
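A minimal sketch of that decision, assuming each buffer's fence is exposed to the compositor as a file descriptor (the fence_a/fence_b descriptors and the composition step are hypothetical): polling with a timeout of zero never blocks, so the compositor can simply fall back to the old copy of any buffer that is not ready yet.

    #include <poll.h>
    #include <stdbool.h>

    /* Non-blocking check: has this fence signaled yet? */
    static bool fence_signaled(int fence_fd)
    {
        struct pollfd fds = { .fd = fence_fd, .events = POLLIN };
        return poll(&fds, 1, 0) > 0;    /* 0 ms timeout, never blocks */
    }

    static void compose_frame(int fence_a, int fence_b)
    {
        bool use_new_a = fence_signaled(fence_a);
        bool use_new_b = fence_signaled(fence_b);

        /* Compose C from (use_new_a ? new A : old A) and
         * (use_new_b ? new B : old B), then scan out C. */
        (void)use_new_a;
        (void)use_new_b;
    }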

As of today, the Linux kernel only has generic APIs for Implicit Fencing; although some drivers already support Explicit Fencing, their APIs are device specific. Android currently has its own implementation through the Android Sync Framework, which will be explained in the next article.

Explicit Fencing works in a Producer-Consumer fashion. In a pipeline that renders on the GPU and scans out to the screen, it synchronizes the kernel drivers involved: when submitting a new rendering job to the GPU (the Producer side), userspace gets back a fence for the submitted buffer. That means userspace doesn't need to block waiting for the job to complete; a signal is sent when the job is finished. As userspace doesn't need to block, and has a fence for the buffer, it can proceed right away with the syscall asking the display hardware (the Consumer side) to scan out the buffer that is yet to be processed. With explicit fencing the kernel is taught to wait for the fence to signal before starting the scanout.
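As an illustration of the hand-off to the Consumer, here is a rough sketch using the DRM/KMS atomic API, where a render fence can be attached to a plane through the IN_FENCE_FD property so the kernel postpones the scanout until it signals. The mainline interfaces are the subject of a later article; render_fence_fd is assumed to come from the GPU driver's submission ioctl, and plane_id / in_fence_fd_prop_id are assumed to have been looked up beforehand.

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int queue_scanout(int drm_fd, uint32_t plane_id,
                             uint32_t in_fence_fd_prop_id, int render_fence_fd)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();
        int ret;

        if (!req)
            return -1;

        /* Attach the render fence to the plane: the kernel delays the
         * actual scanout until this fence signals, so userspace never
         * blocks on the GPU job. */
        drmModeAtomicAddProperty(req, plane_id, in_fence_fd_prop_id,
                                 render_fence_fd);

        /* ... add FB_ID, CRTC_ID and the usual geometry properties ... */

        ret = drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
        drmModeAtomicFree(req);
        return ret;
    }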

A new fence is returned to userspace when the buffer is submitted to the kernel for scanout on the display hardware. That fence will signal when the buffer is no longer being displayed and is thus ready to be reused by another rendering job. When userspace gets this fence back it can submit a new rendering job to the GPU without waiting; the wait is done on the kernel side by the GPU driver, and once the fence signals the rendering on that buffer can start.
 

Figure 2. The fence travels all the way to userspace and on to the next element in the pipeline. The yellow arrows represent the fences in userspace.
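A rough sketch of how that second fence can be obtained with the same DRM/KMS atomic API: the OUT_FENCE_PTR property on the CRTC asks the kernel to write back a fence fd for the commit, which can then be handed to the GPU driver as the in-fence of the next rendering job. Again, the mainline details come later in this series, and out_fence_ptr_prop_id is assumed to have been looked up beforehand.

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int commit_with_out_fence(int drm_fd, drmModeAtomicReq *req,
                                     uint32_t crtc_id,
                                     uint32_t out_fence_ptr_prop_id,
                                     int32_t *out_fence_fd)
    {
        /* Ask the kernel to return a fence fd for this commit. */
        drmModeAtomicAddProperty(req, crtc_id, out_fence_ptr_prop_id,
                                 (uint64_t)(uintptr_t)out_fence_fd);

        /* After the commit, *out_fence_fd can be passed to the GPU driver
         * as the in-fence of the next rendering job, so the wait for the
         * buffer to leave the screen happens inside the kernel. */
        return drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
    }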

 

Last but not least, debuggability of the graphics pipeline is improved. Having access to the fence in userspace helps a lot in understanding what is happening in the pipeline. Previously, with Implicit Fencing, no information was available, so it was hard to figure out what was going on, and each vendor was trying to implement its own Implicit Fencing mechanism. Now, with a standard Explicit Fencing mechanism, it is easier to build debug/tracing infrastructure that can be used to investigate issues on any system.
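For instance, with the fence exposed as a file descriptor, a debugging tool can simply ask the kernel about its state. A minimal sketch using the sync_file information ioctl (the interface itself is covered later in this series; field semantics follow linux/sync_file.h):

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/sync_file.h>

    static void dump_fence(int fence_fd)
    {
        struct sync_file_info info;

        memset(&info, 0, sizeof(info));
        if (ioctl(fence_fd, SYNC_IOC_FILE_INFO, &info) == 0)
            /* status: 1 = signaled, 0 = still active, < 0 = error */
            printf("fence '%.32s': status %d, %u fences\n",
                   info.name, info.status, info.num_fences);
    }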

The next article will explain the Android Sync Framework, and after that the work on mainline to support explicit fencing will be described.

Continue reading (Mainline Explicit Fencing - Part 2)
