February 01, 2021
Apitrace is a very good tool for recording the OpenGL calls of applications, so that one can debug OpenGL issues and test the performance and correctness of the rendering without having access to the application. The latter is already done in the Mesa CI. However, especially with games, intros, menus, and loading screens can make the traces rather large before the actual, interesting rendering happens. This puts a heavy computational burden on the CI runners that could be avoided if the trace could be trimmed to render just the frame(s) of interest. As an alternative, RenderDoc can be used to capture and replay exactly one frame, but it doesn’t support compatibility contexts, a feature that is used by many games.
Enter gltrim, a tool that has recently been added to apitrace and that is designed to trim traces to user-selected target frame(s). Unlike with the already available trim tool, the resulting traces can always be replayed. However, given the complexity of the object relationships that can be created in OpenGL, it is not guaranteed that the rendering produced by the trimmed trace is also correct. For this reason gltrim provides means to keep additional states and setup frames. Below you’ll find a few notes about the possible object inter-dependencies in OpenGL, a sketch of how gltrim works, and a few examples of how it is applied in practice.
OpenGL offers many ways to fill textures with data:
The output to framebuffers also depends on many states that need to be tracked for each draw call. Considering that textures created this way can in turn be used as a source for draw calls that manipulate or fill other textures, the dependency chains can be arbitrarily long.
Buffer data can also be changed in various ways: Apart from direct uploads, the contents of buffers can be manipulated by using shaders, or by copying the data from one buffer to another. With OpenGL 4.4 buffers can be mapped persistently, so that memory copies need to be tracked. Finally, the rendering output may not be cleared at each frame start, so that the actual displayed output is the result of the rendering done in multiple frames. Given all these possible inter-dependencies between OpenGL objects, doing an exhaustive tracking of the objects and states required to reproduce a certain frame from scratch can be very time and memory intensive.
RenderDoc works around this by actually executing the OpenGL calls, so that a captured frame does not contain all the calls needed to re-create the output from scratch, but instead also contains intermediate texture and buffer data. In the worst case one might capture a frame that recreates the output by only uploading texture data and doing a blit.
With the approach implemented in gltrim the aim is to re-create the output from scratch, always keeping all the needed draw calls and state changes. Early attempts at exhaustive tracking did not work well: on the one hand, they were quite slow and needed a lot of memory; on the other hand, the fact that some games build up the displayed content over a number of frames made an approach in which only the target frames are specified quite unreliable.
Instead, an alternative was implemented that accepts that one might not get it right the first time when trying to trim a trace. Here, all OpenGL objects are tracked, but only a lightweight tracking of dependencies is done. Some setup calls, like the calls to create the rendering context, are always copied to the output trace. When the first target frame of one or more consecutive frames is analysed, the calls that set the current states and the state of the currently bound objects are copied to the output. Then, whenever an object is bound, all the calls that are required to reproduce its state, and also those of directly dependent objects, are copied.
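The lightweight tracking described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not gltrim's actual implementation: the `TrackedObject` class, the `calls_for_bind` helper, and all call numbers are hypothetical. Each object remembers the calls that built its state and its direct dependencies, and binding an object pulls in only those two sets.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedObject:
    """Hypothetical record for one GL object (texture, buffer, framebuffer, ...)."""
    name: str
    calls: list = field(default_factory=list)       # call numbers that set up this object
    depends_on: list = field(default_factory=list)  # direct dependencies only


def calls_for_bind(obj):
    """Return the sorted call numbers to copy when `obj` is bound.

    Only the object itself and its direct dependencies are followed;
    no transitive closure is computed, which keeps the tracking cheap.
    """
    required = set(obj.calls)
    for dep in obj.depends_on:
        required.update(dep.calls)
    return sorted(required)


# A texture that was rendered to through an older framebuffer,
# and a second framebuffer that uses this texture as an attachment.
source_fbo = TrackedObject("source framebuffer", calls=[40, 41])
texture = TrackedObject("texture", calls=[120, 121], depends_on=[source_fbo])
framebuffer = TrackedObject("framebuffer", calls=[200, 201], depends_on=[texture])

print(calls_for_bind(framebuffer))  # [120, 121, 200, 201]
```

In this toy model, binding `framebuffer` does not pull in calls 40 and 41, because `source_fbo` is only an indirect dependency. Gaps of this kind are one plausible reason why a first trimming attempt can render incorrectly and additional states or setup frames are needed.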
With this method trimming a trace is pretty fast and has a low memory footprint. While the methods with more exhaustive tracking may take hours to complete, here trimming a trace is a matter of minutes, or for short traces even only seconds. The output trace consists of one frame doing all the setup calls, and the target frame(s).
When the trimming is done, qapitrace can and should be used to check the visual output. If the result is not correct, there are various options to improve it. On the one hand, one can specify that all simple state changes are kept. This especially helps with errors in the brightness of the output. The impact on the size of the trace is usually low, and the final run-time of the trace does not increase a lot either. On the other hand, if whole objects are missing in the output, gltrim has the option to add extra frames to the setup frame. Usually, when a scene changes in a game, for instance when a new level is loaded, objects are created and data is uploaded. These frames can often be identified easily because they consist of an above-average number of OpenGL calls, and gltrim offers an option to list such frames when trimming is complete. By specifying these additional frames as setup frames, they are copied verbatim (except for the frame-end buffer swap) to the output setup frame, and so are the calls of dependencies of objects used in these additional frames.
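To make the idea of "frames with an above-average number of calls" concrete, here is a minimal sketch. It does not reproduce how gltrim computes its list; the function names, the `factor` threshold, and the per-frame call counts are all made up for illustration.

```python
from statistics import mean


def largest_frames(calls_per_frame, top):
    """Return the `top` (frame, call_count) pairs with the most calls, largest first."""
    ranked = sorted(calls_per_frame.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


def above_average_frames(calls_per_frame, factor=1.5):
    """Return frame numbers whose call count exceeds `factor` times the mean."""
    threshold = factor * mean(calls_per_frame.values())
    return sorted(frame for frame, count in calls_per_frame.items() if count > threshold)


# Made-up call counts: frames 1 and 4 stand in for loading frames.
counts = {0: 900, 1: 15000, 2: 1100, 3: 1000, 4: 9000, 5: 1000}

print(largest_frames(counts, 2))     # [(1, 15000), (4, 9000)]
print(above_average_frames(counts))  # [1, 4]
```

A fixed multiple of the mean is only one possible heuristic; a ranked top-N list leaves the decision about which frames are "large" to the user.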
To obtain a trace that re-creates the intended output correctly, usually some experimenting is needed, but given the relatively short turnaround time for trimming a trace, this is a feasible approach.
In the best case the application of gltrim is straightforward. In the first example, a trace of the game Doom3 running on the open source implementation dhewm3 was taken and trimmed:
apitrace gltrim -f 666 -o doom3_trim_frame666.trace doom3.trace
In this case the original trace was trimmed from a file size of 300 MB down to 32 MB, with the target frame 666 rendering as in the original trace.
For a trace taken from Alien: Isolation the naive approach to trimming doesn’t result in a trace that correctly reproduces the original image, that is, running
apitrace gltrim -f 2000 -o AI_trim_frame2000.trace AI.trace
results in some textures not being rendered with the correct brightness (Figure 1).
|Figure 1: Original rendering (left), naive trimming (right); note the lower intensities in some areas.|
Trimming the trace while keeping all state changes fixes the problem; the command looks like
apitrace gltrim -f 2000 -k -o AI_trim_keep_states_frame2000.trace AI.trace
Sometimes when trimming traces in the naive way geometry goes missing (Figure 2).
|Figure 2: Original rendering (left) versus naive trimming (right). Geometry is missing in the trimmed output because the background might have been drawn in another frame and was not retained.|
Here one might analyse the trace for large frames, which usually contain extra setup calls that might include the creation of the missing parts. To obtain a list of the frames with the most OpenGL calls, run:
gltrim -f 4000 -t 10 Unturned.trace -o Unturned-virgl-f4000.trace
resulting in an output like:
...
Calls per frame:
Frame = 56743
Frame = 42453
Frame = 33820
Frame = 26747
Frame = 24379
Frame = 6145
Frame = 5848
Frame = 5392
Frame = 5313
Frame = 5282
With this information one can start to add additional setup frames. Here the frames with a significantly larger number of calls that are closest to the target frame are 3806, 3813, and 3814. Adding only the very large frame 3806 is not sufficient (see Figure 3, left), but it is required to get the geometry right. Adding frame 3814 as well finally results in correct rendering.
|Figure 3: Rendering after adding the frame with the highest number of GL calls, 3806, to the setup frame (left), and after also adding the large frame closest to the target frame, 3814 (right). Note that adding only frame 3806 as a setup frame corrects the geometry, but the shading of the mountains is still off compared to the original (Figure 2, left); adding another setup frame fixes this (right).|
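The selection of candidate setup frames in the Unturned example can be sketched as a small helper. This is an illustration only, not part of gltrim: `setup_candidates` is a hypothetical name, and while frames 3806, 3813, 3814 and the target frame 4000 come from the example above, frames 12 and 950 are invented placeholders for earlier loading frames.

```python
def setup_candidates(large_frames, target_frame):
    """Return the large frames that precede the target, closest to the target first."""
    earlier = [frame for frame in large_frames if frame < target_frame]
    return sorted(earlier, reverse=True)


# 3806, 3813 and 3814 are taken from the example; 12 and 950 are made up.
large = [12, 950, 3806, 3813, 3814]

print(setup_candidates(large, 4000))  # [3814, 3813, 3806, 950, 12]
```

Ordering by proximity is just one way to decide what to try first; as the example shows, the largest frame (3806) may be needed as well, so some experimenting remains necessary.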
Civilization V creates the board tiles over time using the same combination of texture and framebuffer object, so that tracking the dependency becomes difficult. Direct trimming results in uninitialized tile memory (Figure 4, upper right). Adding setup frames starting from the frame right after the loading screen removes the error. In this case a total of 45 additional frames is required to get correct rendering.
|Figure 4: Civilization V: rendering of the original trace (upper left) and after trimming to one frame without setup frames (upper right), with 15 setup frames starting at a large frame (middle left), 25 setup frames (middle right), 35 setup frames (lower left), and with 45 setup frames (lower right), which finally results in correct rendering.|
Another example of output created over time is motion blur, where the rendering output is accumulated over a series of frames. An example where this occurs is SOMA (Figure 5).
|Figure 5: Original (upper left), trimming to target frame only (upper right), trimming to target frame only while keeping all states (lower left), adding one setup frame and keeping all states (lower right). This latter output seems close but is actually not correct.|
Here keeping the whole motion blurred scene might be needed to get the pixel-exact same rendering.
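What "pixel-exact" means here can be made precise with a small comparison helper. The sketch below assumes both renderings are already available as raw, tightly packed RGB byte buffers of the same size; decoding image files is left out, and the function name and sample data are hypothetical.

```python
def count_differing_pixels(reference, candidate, channels=3):
    """Count pixels that differ between two raw, equally sized pixel buffers."""
    if len(reference) != len(candidate):
        raise ValueError("buffers must have the same size")
    if len(reference) % channels:
        raise ValueError("buffer size is not a multiple of the channel count")
    return sum(
        reference[i:i + channels] != candidate[i:i + channels]
        for i in range(0, len(reference), channels)
    )


# Two made-up 2-pixel RGB images; the second pixel is slightly darker.
original = bytes([10, 10, 10, 200, 0, 0])
trimmed = bytes([10, 10, 10, 198, 0, 0])

print(count_differing_pixels(original, trimmed))  # 1
```

A result of zero corresponds to a pixel-exact match. For effects such as motion blur, a small non-zero count may look fine to the eye and still indicate that the accumulated history is incomplete.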
gltrim provides the means to trim apitrace traces to contain only a few frames that can still render correctly. It is a semi-automatic tool that requires some manual interaction to check and optimize the output, but the obtained trimmed traces have a significantly lower size (depending on the input trace as low as 10% of the original trace), and a lower run-time (usually about half the time the original trace trimmed to end at the target frame needs to run, sometimes even less).
By removing the intro and menu frames the resulting traces are way more useful for doing performance analysis, and by reducing the memory footprint and the run time, it becomes more feasible to use traces of resource-intensive games in the Mesa CI. In theory, it should be possible to wrap the tool in a script that automatically picks setup frames and options to optimize the output, which is one thing we might look into in the future.