Tracing stateless video hardware decoding in V4L2

Deborah Brouwer
December 02, 2022

Earlier this year, I joined Collabora for a six-month internship to learn how V4L2 (Video4Linux2) supports stateless video hardware decoding. My project was to build a utility that traced and replayed stateless decoding from a userspace perspective. The utility, called the v4l2-tracer, is intended to be part of v4l-utils, a collection of utilities and libraries to handle media devices. The code is currently under review on the mailing list: [PATCH v4] utils: add v4l2-tracer utility.

Although there are many excellent tracing tools, such as strace, the v4l2-tracer traces V4L2 stateless decoding more comprehensively. It adds the ability to replay (i.e. "retrace") the traced activity, portably, between different userspace environments. The project was inspired by another tool, apitrace, which provides the same tracing and retracing functionality for certain graphics APIs. Although the ability to trace V4L2 stateless decoding is interesting in itself, replaying the trace is additionally helpful for:

reliably reproducing bugs;
uniform testing of stateless drivers; and
testing specific driver behaviors through precise error injection.

Each of the two main functions, tracing and retracing, is explained below.

Tracing

To create a trace, the v4l2-tracer preloads a small, custom library that intercepts specific system calls made by userspace applications when decoding.
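The preloading step can be pictured with a short Python sketch. It assumes the usual dynamic-linker mechanism on Linux, the LD_PRELOAD environment variable; how the v4l2-tracer sets this up internally is not covered here, and both the helper name `preload_env` and the library path are placeholders for illustration.

```python
import os


def preload_env(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD.

    lib_path is a hypothetical path to an interposing shared library.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD accepts a colon-separated list; our library goes first so
    # that its wrappers are found before the C library's own symbols.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env
```

A launcher could then start the application to be traced with something like `subprocess.run(command, env=preload_env("/path/to/tracer-library.so"))`, so that calls such as open, mmap and ioctl pass through the preloaded library first.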

Along with basic calls that open and close devices and map memory, the v4l2-tracer primarily traces ioctls used in the V4L2 Memory-to-Memory Stateless Video Decoder Interface. Tracing these ioctls can be complex because V4L2 stateless decoding depends on a userspace application, like GStreamer, to provide crucial decoding metadata to the driver through ioctl arguments.

In V4L2 stateless decoding, userspace must parse the encoded bitstream to extract the information needed for the decoding of every frame, and then pass it to the decoder. For example, the stateless decoder does not know if any particular frame can be decoded just by itself or if it needs information from neighboring frames. If other frames are needed, the decoder doesn't know which ones. Userspace provides this crucial "state" information on a frame-by-frame basis through an ioctl with arguments that set the stateless codec controls. Subsequent ioctls will associate these controls with a unique request using the Request API. The request connects the encoded video data to its control information. The v4l2-tracer traces all of this state information and writes it to a JSON-formatted trace file.
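To make the role of the request more concrete, here is a toy Python model of the per-frame flow. It is only a sketch: `ToyRequest` and its fields are invented for illustration and do not mirror kernel data structures, the control names in the usage note are hypothetical, and the comments name the real ioctls that each step loosely stands for.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """Toy stand-in for one media request (one frame's worth of work)."""
    fd: int
    controls: dict = field(default_factory=dict)
    bitstream: bytes = b""
    queued: bool = False


def set_controls(req, controls):
    # Loosely mirrors VIDIOC_S_EXT_CTRLS called with a request file descriptor.
    req.controls.update(controls)


def queue_output_buffer(req, data):
    # Loosely mirrors VIDIOC_QBUF on the OUTPUT queue with the request attached.
    req.bitstream = data


def queue_request(req):
    # Loosely mirrors MEDIA_REQUEST_IOC_QUEUE. The check below is a toy
    # simplification: the frame needs both its data and its metadata.
    if not req.controls or not req.bitstream:
        raise ValueError("request needs both controls and encoded data")
    req.queued = True
```

For example, a frame could be submitted with `req = ToyRequest(fd=5)`, then `set_controls(req, {"key_frame": True})`, `queue_output_buffer(req, b"\x00\x01")` and finally `queue_request(req)`. The point of the model is that the controls and the encoded data travel together, which is exactly the pairing the trace file has to capture.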

In addition to system calls, the v4l2-tracer also traces the encoded video data passed to the kernel driver through OUTPUT buffers. The decoded video data, returned on CAPTURE buffers, is not traced by default because it is not needed for the retrace function and it significantly increases the trace file's size. Optional flags turn on the tracing of the decoded video data or write it to a separate .yuv file, which can provide a good sanity check for the decoding.

Here is an example of a command to trace the stateless decoding of a VP8 compressed file:

v4l2-tracer trace gst-launch-1.0 -- filesrc location=test-25fps.vp8 ! parsebin ! v4l2slvp8dec ! videocodectestsink

It will produce a time-stamped trace file such as:

90608_trace.json

In this example, the userspace application is a GStreamer pipeline with the stateless decoding element v4l2slvp8dec. Of course, this will only work on a machine with a stateless VP8 hardware decoder and the right kernel driver. For my internship, I used a Rockpi 4B, which has a Rockchip RK3399 SoC and uses the Hantro VPU driver. If you want to test the v4l2-tracer without the hardware, one option is to try the virtual stateless decoder driver that is currently under development. This test driver will not actually decode any data, but it will accept a GStreamer pipeline and return a test pattern along with debug information on the CAPTURE buffer. Another option is to use the existing virtual codec driver, vicodec, which can emulate a stateless hardware codec for the patent-free FWHT (Fast Walsh-Hadamard Transform) format.

Retracing

The second main function of the v4l2-tracer is retracing. The JSON-formatted trace file that is the output of the trace function becomes the input for the retracing function. Here is the simplest retrace command:

v4l2-tracer retrace 90608_trace.json

It will produce a new retrace file that can be compared with the original trace file.

90608_trace_retrace.json

The newly generated retrace file should be nearly identical to the original trace file except for changes to the video and media devices, file descriptors, and memory addresses.
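One way to make this comparison less noisy is to drop the fields that are expected to differ before comparing. The Python sketch below does that for arbitrary JSON data. The key names in `VOLATILE_KEYS` are assumptions chosen for illustration, not the actual field names used in v4l2-tracer files, so they would need to be adjusted after looking at a real trace.

```python
import json

# Hypothetical names for fields that legitimately change between runs.
VOLATILE_KEYS = frozenset({"fd", "address", "path"})


def strip_volatile(node, volatile=VOLATILE_KEYS):
    """Return a copy of parsed JSON data without the volatile keys."""
    if isinstance(node, dict):
        return {key: strip_volatile(value, volatile)
                for key, value in node.items() if key not in volatile}
    if isinstance(node, list):
        return [strip_volatile(item, volatile) for item in node]
    return node


def traces_match(trace, retrace, volatile=VOLATILE_KEYS):
    """Compare two parsed traces while ignoring the volatile keys."""
    return strip_volatile(trace, volatile) == strip_volatile(retrace, volatile)


def files_match(trace_path, retrace_path, volatile=VOLATILE_KEYS):
    """Load two JSON files from disk and compare them with traces_match."""
    with open(trace_path) as first, open(retrace_path) as second:
        return traces_match(json.load(first), json.load(second), volatile)
```

With the file names from the example above, the call would be `files_match("90608_trace.json", "90608_trace_retrace.json")`. A plain equality check keeps the sketch short; a real tool would more likely report where the two files diverge.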

When retracing, the v4l2-tracer reads the trace file and mimics the original userspace application that was traced. The v4l2-tracer makes all the same system calls and writes the same encoded video data to the OUTPUT buffers in exactly the same order and with exactly the same parameters as in the original trace file. The retracing function runs independently from the original userspace application that was traced.

A trace file generated on one machine can be retraced on another machine as long as a stateless hardware decoder and its V4L2 driver are available. Since the /dev/media and /dev/video device numbers will usually change between different machines, the v4l2-tracer will attempt to match the driver from the trace file with the device numbers available in the retrace environment. Alternatively, to use a different driver, the user can set specific video and/or media device nodes for the retracer to use. For example, to retrace on /dev/video6 and /dev/media3 the command is:

v4l2-tracer -d6 -m3 retrace 90608_trace.json
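The device selection described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the tracer's actual logic: the function name `pick_device` is invented, and the mapping from device node to driver name is passed in as plain data rather than queried from the kernel.

```python
def pick_device(traced_driver, available, override=None):
    """Choose a device node for retracing.

    traced_driver: driver name recorded in the trace file.
    available: dict mapping device node paths to driver names.
    override: node chosen explicitly by the user, if any.
    """
    if override is not None:
        # An explicit choice wins, even if the driver differs from the trace.
        return override
    for node, driver in available.items():
        if driver == traced_driver:
            return node
    return None  # no matching driver in this environment
```

For instance, `pick_device("hantro-vpu", {"/dev/video0": "uvcvideo", "/dev/video3": "hantro-vpu"})` returns `"/dev/video3"`, while passing `override="/dev/video6"` returns that node regardless of the driver behind it.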

What's Next

The v4l2-tracer has lots of room to grow. So far, the v4l2-tracer fully supports the tracing of the MPEG2, VP8, H.264, and FWHT compression formats. The stateless controls for the VP9 and HEVC formats will also be traced and retraced, since they are part of the V4L2 uAPI, but more work is needed to write the decoded video data for these formats to .yuv files. The v4l2-tracer could also be adapted to trace stateless encoding in addition to decoding. Eventually, the v4l2-tracer could be used in more automated testing of stateless V4L2 drivers, for example, by randomly editing the trace files to inject errors for fuzz testing.
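As a rough idea of what randomly editing a trace file could look like, the Python sketch below replaces one integer value somewhere in a parsed JSON trace. It is a hypothetical starting point rather than an existing feature: the function names are invented, and it knows nothing about which fields are meaningful to a driver.

```python
import copy
import random


def collect_int_paths(node, path=()):
    """List the key/index paths of every integer nested inside JSON data."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            paths.extend(collect_int_paths(value, path + (key,)))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            paths.extend(collect_int_paths(value, path + (index,)))
    elif isinstance(node, int) and not isinstance(node, bool) and path:
        paths.append(path)
    return paths


def mutate_one_int(trace, rng):
    """Return (mutated copy, path changed); the original is left untouched."""
    mutated = copy.deepcopy(trace)
    paths = collect_int_paths(mutated)
    if not paths:
        return mutated, None
    path = rng.choice(paths)
    parent = mutated
    for key in path[:-1]:
        parent = parent[key]
    # Overwrite the chosen value with a random unsigned 32-bit integer.
    parent[path[-1]] = rng.randint(0, 2**32 - 1)
    return mutated, path
```

Passing a seeded generator, as in `mutate_one_int(trace, random.Random(1234))`, keeps each injected error reproducible, and the returned path records which parameter was altered when a retrace of the mutated file misbehaves.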

Wrapping Up

This internship project challenged me daily to learn, solve problems, and build new skills. It was the first time I had developed on a single-board computer and cross-compiled Linux for the ARM architecture needed by the board. I was introduced to GStreamer pipelines and how to build and configure them to run on the development board. Although I knew, theoretically, what a stateless decoder did, I didn't really understand what I was dealing with until I started to trace the hundreds of parameters parsed in userspace, watched the OUTPUT and CAPTURE buffer queues in action, and then received decoded video frames back, out-of-order, with extraneous padding skewing their display.

I cannot thank my mentors at Collabora, Daniel Almeida, Nícolas F. R. A. Prado, and Nicolas Dufresne, enough for proposing this project and for their daily guidance and support. I am also deeply appreciative of the advice we received from Linux media subsystem co-maintainer Hans Verkuil, and from the stateless codec developers at Collabora, along with the open-source community. This internship has enhanced not only my skills but also my confidence and commitment to open-source development, and I hope to be contributing for many more years ahead.
