Adding VP9 and MPEG2 stateless support in v4l2codecs for GStreamer

Daniel Almeida
June 23, 2021

Earlier this year, from January to April 2021, I worked on adding support for stateless decoders to GStreamer as part of a multimedia internship at Collabora. The following is a recap of the completed work.

Stateless decoders, you say?

Before talking about stateless decoders, one must understand what stateful decoders are first. Here's a definition, sourced from the Linux kernel documentation:

A stateful video decoder takes complete chunks of the bytestream (e.g. Annex-B H.264/HEVC stream, raw VP8/9 stream) and decodes them into raw video frames in display order. The decoder is expected not to require any additional information from the client to process these buffers.

Stateful decoders do the processing of video data into actual frames directly, without requiring extra metadata to do so. A stateful decoder IP can parse the bitstream and extract the required data on its own. It will keep track of decoder-specific information by itself, producing video frames in the right display order.

This decoder-specific information is what makes video decoding work for a particular codec: the current set of reference frames, the current position in the bitstream, whether any special decoding mode is enabled for the current frame, the frame's length, and any compression-specific metadata.

This means that a client program can be much more lightweight, knowing that most of the responsibilities have been shifted onto another part of the decoding pipeline. It also means increasingly complex decoder hardware in what amounts to a black box running its own, mostly proprietary, firmware.

The current trend in the industry favors a different approach, centered around stateless codecs. In contrast with its stateful counterpart, a stateless codec shifts much of the bookkeeping onto userspace. The application becomes responsible for programming the chip with any metadata it may need in order to decode the bitstream, and the resulting frames may not necessarily be in display order. This approach makes for simpler hardware at the cost of more complex userspace programs. Since userspace cannot directly program the underlying hardware, this drives the need for increasingly refined kernel APIs to abstract the ever-increasing number of multimedia accelerators available on the market.

Birth of the Request API

Stateless codecs need to be programmed with the necessary metadata at every frame in order to work, a use case that wasn't the focus of previous APIs in the Linux media subsystem. The control API was initially designed to set hardware parameters such as brightness, saturation and gain. Its evolution, the extended control API, was designed to allow the implementation of more complex driver APIs for standards such as MPEG, but it was inflexible in the sense that the values would persist until explicitly changed. It was also impossible to link a particular set of controls with their corresponding bitstream and picture buffers, meaning that a given set of metadata could be associated with a totally different frame than it was originally intended for: a recipe for disaster!

Thankfully, the new Request API was designed from the ground up to support modern devices. It provided a way to associate bitstream buffers, picture buffers and controls together under a request object, such that applying per-frame metadata became possible. Requests themselves would now be queued, dequeued and recycled as necessary, providing userspace applications with a rich set of tools to program the underlying video decoding unit as they saw fit through a uAPI.
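The difference between persistent controls and per-request controls can be illustrated with a toy model. The sketch below is plain Python and not the kernel uAPI: `ToyDecoder`, `Request` and the control name `frame_header` are hypothetical names made up for this illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """Bundles one bitstream buffer with the controls meant for it."""
    bitstream: str
    controls: dict = field(default_factory=dict)


class ToyDecoder:
    """Hypothetical device model; real hardware is driven through ioctls."""

    def __init__(self):
        self.global_controls = {}      # values persist until explicitly changed
        self.buffer_queue = deque()    # buffers queued without a request
        self.request_queue = deque()   # self-contained requests

    # Old style: controls and buffers travel separately.
    def set_control(self, name, value):
        self.global_controls[name] = value

    def queue_buffer(self, bitstream):
        self.buffer_queue.append(bitstream)

    def decode_buffers(self):
        # Every queued buffer sees whatever the controls hold *now*.
        decoded = []
        while self.buffer_queue:
            bitstream = self.buffer_queue.popleft()
            decoded.append((bitstream, self.global_controls.get("frame_header")))
        return decoded

    # Request style: each buffer carries its own controls.
    def queue_request(self, request):
        self.request_queue.append(request)

    def decode_requests(self):
        decoded = []
        while self.request_queue:
            request = self.request_queue.popleft()
            decoded.append((request.bitstream, request.controls.get("frame_header")))
        return decoded


def persistent_controls_demo():
    decoder = ToyDecoder()
    for n in range(3):
        decoder.set_control("frame_header", f"header-{n}")
        decoder.queue_buffer(f"frame-{n}")
    return decoder.decode_buffers()


def request_demo():
    decoder = ToyDecoder()
    for n in range(3):
        decoder.queue_request(Request(f"frame-{n}", {"frame_header": f"header-{n}"}))
    return decoder.decode_requests()
```

In `persistent_controls_demo()` all three frames end up paired with the last header that was set, which is exactly the mismatch described above. In `request_demo()` each frame keeps the header it was queued with.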

v4l2codecs

With the new APIs in place, it was only a matter of time for the emergence of new userspace implementations.

v4l2codecs is a GStreamer plugin written from the ground up by Nicolas Dufresne, GStreamer maintainer and Principal Multimedia developer at Collabora. It targets the new Request API, effectively adding GStreamer support for stateless codecs under Linux, ushering the platform into a new age for multimedia hardware.

v4l2codecs consists of classes to abstract key kernel ioctls for dealing with request and buffer management, wrapping them into GStreamer objects that can be used by GStreamer decoders when processing a given bitstream. Here is a rundown of some of the features:

  • Interface with the Media Controller API.
  • Enumerate and set V4L2 formats, converting between their GStreamer counterparts as needed.
  • Facilitate the process of negotiation at the decoder element.
  • Allocate and recycle requests and video buffers.
  • Wrap allocated buffers under GStreamer BufferPools for efficient memory management.
  • Set controls.
  • Queue and dequeue bitstream and picture buffers.
  • Queue and dequeue requests.
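As an illustration of the format conversion item above, here is a small sketch of how V4L2 pixel formats, which are identified by four-character codes, could be mapped to GStreamer video format names. The packing in `v4l2_fourcc` follows the kernel's macro of the same name. The lookup table is a hand-picked subset for illustration and is not the plugin's actual table, and `to_gst_format` is a hypothetical helper.

```python
def v4l2_fourcc(a, b, c, d):
    """Pack four ASCII characters into a 32-bit code, first character lowest."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


# Illustrative subset only: V4L2 fourcc -> GStreamer video format name.
V4L2_TO_GST = {
    v4l2_fourcc(*"NV12"): "NV12",
    v4l2_fourcc(*"YU12"): "I420",
    v4l2_fourcc(*"YUYV"): "YUY2",
}

# Reverse mapping, used when going from negotiated caps back to V4L2.
GST_TO_V4L2 = {gst: fourcc for fourcc, gst in V4L2_TO_GST.items()}


def to_gst_format(fourcc):
    """Return the GStreamer format name, or None if the format is unknown."""
    return V4L2_TO_GST.get(fourcc)
```

Returning `None` for unknown formats lets the caller skip formats it cannot expose downstream instead of failing the whole enumeration.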

For all practical purposes, v4l2codecs is a framework which allows other developers to quickly add support for new stateless codecs under Linux, leaving codec-specific details, such as bridging the gap between parsed values and their uAPI counterparts, to a codec-specific implementation.

Before I started the internship, two working codec implementations were already in place: the all-important H.264 and VP8.

Internship, you say?

I started at Collabora on 18th January 2021 as a Consultant Software Engineer intern to work on adding support for VP9 and MPEG2 to v4l2codecs and the related kernel side. Another goal was to improve conformance testing for VP9.

I had some experience with codecs from my previous work on vidtv, a kernel module I had written as a mentee under a Linux Foundation program geared towards the Linux kernel. It was during its development that I first came across a bitstream specification - MPEG-TS, anyone? - and I must say, I quite liked it. In fact, I was mesmerized to see the amount of careful work that went into the details of how my television worked, work that I had selfishly taken for granted while watching it.

Coming into Collabora to do VP9 work seemed like a natural progression and frankly, I felt right at home.

The first thing that needed to be done was, of course, adding a corresponding class to deal with the particularities of VP9, relaying the calls to the rest of the v4l2codecs framework. That's GstV4l2CodecsVp9Dec.

These classes work by subclassing a base decoder class, say GstVp9Decoder, which itself is a subclass of GstVideoDecoder. This base class will use a parser to extract data into a picture object, GstVp9Picture, and relay a few calls to our class. These are:

  • (a) GStreamer lifecycle related: start, stop, set_format, finish, flush, drain
  • (b) decoding process related: {new | start | decode | end | output}_picture

During the course of (b) we get the chance to negotiate, extract the relevant bits from GstVp9Picture to fill our V4L2 controls and finally enqueue our buffers and the request object. We then poll on the request to retrieve a freshly decoded frame, at which point we pass the picture buffer downstream and the process repeats.
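The flow above can be sketched as a base class that drives a subclass through the picture hooks in order. This is a Python model for illustration only, not the GStreamer API: `BaseDecoder` and `ToyStatelessDecoder` are hypothetical, and which step happens in which hook is an assumption made for the sake of the example.

```python
class BaseDecoder:
    """Hypothetical stand-in for a base class such as GstVp9Decoder."""

    def parse(self, data):
        # A real parser extracts header fields; here we only wrap the bytes.
        return {"data": data}

    def handle_frame(self, data):
        picture = self.parse(data)
        # The base class calls the subclass hooks in a fixed order.
        self.new_picture(picture)
        self.start_picture(picture)
        self.decode_picture(picture)
        self.end_picture(picture)
        return self.output_picture(picture)

    def new_picture(self, picture):
        pass

    def start_picture(self, picture):
        pass

    def decode_picture(self, picture):
        pass

    def end_picture(self, picture):
        pass

    def output_picture(self, picture):
        return picture


class ToyStatelessDecoder(BaseDecoder):
    """Assumed split of the work across hooks, recorded in a log."""

    def __init__(self):
        self.negotiated = False
        self.log = []

    def new_picture(self, picture):
        if not self.negotiated:          # negotiate once, on the first picture
            self.negotiated = True
            self.log.append("negotiate")

    def start_picture(self, picture):
        picture["request"] = {"controls": {}, "queued": False}
        self.log.append("allocate request")

    def decode_picture(self, picture):
        # Copy the parsed values into the per-request controls.
        picture["request"]["controls"]["size"] = len(picture["data"])
        self.log.append("fill controls")

    def end_picture(self, picture):
        picture["request"]["queued"] = True
        self.log.append("queue request")

    def output_picture(self, picture):
        self.log.append("poll and push downstream")
        return picture
```

Calling `ToyStatelessDecoder().handle_frame(b"...")` walks through all five hooks once; a second call skips the negotiation step.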

Roadblocks make us stronger, apparently

Naturally, even small projects come with their own set of obstacles.

To begin with, VP9 uses arithmetic encoding. This is a newer form of data compression, an evolution on top of the revered Huffman coding algorithm used quite extensively in video compression technology.

Arithmetic encoding basically attempts to reduce the number of bits used to code symbols if they repeat often in the bitstream. The VP9 specification goes as far as to say that some symbols can be encoded using a fraction of a bit. Naturally, new technologies do not come for free, and for arithmetic encoding the price one must pay is keeping track of the probability of any given symbol occurring in the bitstream, updating these probabilities accordingly as frames are decoded.
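To make the "fraction of a bit" idea concrete: an ideal arithmetic coder spends about -log2(p) bits on a symbol whose probability is p, so a very likely symbol costs well under one bit. The sketch below only computes that ideal cost; it is not an arithmetic coder, and the function names are made up for illustration.

```python
import math


def ideal_cost_bits(probability):
    """Information content, in bits, of a symbol with the given probability."""
    return -math.log2(probability)


def message_cost_bits(symbols, probabilities):
    """Sum of the ideal per-symbol costs for a whole message."""
    return sum(ideal_cost_bits(probabilities[s]) for s in symbols)
```

For example, with `probabilities = {0: 0.9, 1: 0.1}`, a message of nine `0` symbols and one `1` has an ideal cost of a little under five bits, whereas a code that spends at least one whole bit per symbol needs ten.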

The VP9 specification provides two mechanisms to do so:

  1. The probabilities can be explicitly changed in the frame headers, in which case they are themselves compressed in order to save space.
  2. The decoder keeps track of how often a given syntax element is decoded and can be told to automatically adjust the probabilities at the end of the frame to match the observed frequencies.
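The second mechanism can be sketched as blending the previous probability with the frequency observed during the frame. The function below is loosely modelled on the adaptation step described in the VP9 specification, but it is a simplification: the name `adapt_probability` is hypothetical, and the default values of `count_sat` and `max_update` are illustrative and should not be read as the constants used for any particular syntax element.

```python
def adapt_probability(prev_prob, count0, count1, count_sat=20, max_update=128):
    """Blend an 8-bit probability of "symbol is 0" toward observed counts.

    prev_prob is expressed out of 256 and lies in the range 1..255.
    count0 and count1 are how often each outcome was decoded in the frame.
    """
    total = count0 + count1
    if total == 0:
        return prev_prob  # nothing observed, keep the old probability

    # Observed frequency of 0, rounded and clamped to the valid range.
    observed = min(255, max(1, (count0 * 256 + total // 2) // total))

    # The more samples we saw (up to count_sat), the more we trust them.
    weight = max_update * min(total, count_sat) // count_sat

    # Weighted average of old and observed values, with rounding.
    return (prev_prob * (256 - weight) + observed * weight + 128) >> 8
```

With these defaults, a probability of 128 stays at 128 when both outcomes were seen equally often, and moves towards 255 as zeros dominate the counts.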

The problem, in this case, was that the current VP9 parser implementation in GStreamer did not parse (1), so support for that had to be added as well.

Although relatively straightforward given a well written specification, writing or augmenting bitstream parsers can quickly become nightmarish to debug when a single mistake sneaks in somewhere. Often there will be absolutely no indication of where the error originated other than the checks at the end that inform us whether we parsed the whole frame perfectly or not. That meant diffing the results with a known working implementation in hopes I'd find the point where the failure originated. Yelp, not fun at all...

A second major issue was uncovered by a colleague mid-way through development, and it had to do with the kernel uAPI we were trying to upstream.

Turns out the update method described in (2) was implemented by means of a bi-directional control. Our userspace application had to read the symbol counts provided by the hardware in order to update the probabilities to be used on the current frame, which created a dependency: request n depended on request n-1.

Such a dependency was immediately at odds with the general workflow behind the Request API: one in which users would queue multiple requests and have them processed at a time convenient for the system. This discovery prompted the rework of the VP9 kernel uAPI in order to remove the bi-directionality and thus make the requests completely independent of one another. Since the destaging of the VP9 kernel uAPI is being undertaken by Collabora itself, our GStreamer implementation helped us validate its design in an objective way. This is why having a GStreamer implementation as part of a multimedia project is so paramount: it helps tremendously with validation.
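A back-of-the-envelope model shows why the dependency hurts. The two functions below are a simplification with made-up names and arbitrary time units: `prepare` stands for the userspace work needed to build a request and `decode` for the time the hardware spends on it. They are not measurements of any real device.

```python
def total_time_dependent(frames, prepare, decode):
    """Request n needs data read back from request n-1: strictly serial."""
    return frames * (prepare + decode)


def total_time_independent(frames, prepare, decode):
    """Self-contained requests: preparing frame n overlaps decoding n-1."""
    if frames == 0:
        return 0
    # Two-stage pipeline: fill it once, then the slower stage sets the pace.
    return prepare + (frames - 1) * max(prepare, decode) + decode
```

With 100 frames, `prepare=2` and `decode=5`, the dependent model takes 700 time units while the independent one takes 502, because the hardware never has to wait for userspace between frames.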

MPEG2: a walk in the park!

After hustling a bit with the VP9 decoder, I was very pleased to see that getting an MPEG2 decoder off the ground was very straightforward. I think this is a testament to how easy it is to add new codecs under v4l2codecs under normal conditions. This should mean the framework can house codecs for many other formats in the future, benefitting the Linux ecosystem in general.

I was very happy to see that this work prompted yet another Collaboran, Ezequiel Garcia, to move forward with the destaging of the MPEG2 uAPI as we were now more confident since it worked just fine with our userspace implementation!

Improving throughput in v4l2codecs

Once the MPEG2 work was completed, I looked to pursue improvements on the v4l2codecs codebase itself. Since it is a new plugin, there are plenty of opportunities to add new features that can translate into more performance or just a better overall experience. A pretty straightforward way to improve throughput without changing much of the plugin itself is by supporting so-called "render delays", which are artificial delays introduced between parsing and decoding.

Render delays are basically a way to ensure a decoder will not go idle. The main idea is enqueuing a few requests before asking the hardware to start the decoding process, thereby creating a surplus of requests to be processed by the driver at all times. This increases throughput for transcoding and was implemented on the codecs in v4l2codecs, meaning users can transparently enjoy enhanced performance on VP8, VP9 and MPEG2 right out of the box, whereas support for this functionality was already available on H.264 from the start.
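The idea can be sketched as a small queue that keeps a few requests in flight and only waits on the oldest one once the queue is deeper than the configured delay. `DelayedOutput` is a hypothetical helper written for this illustration and does not mirror the plugin's actual code.

```python
from collections import deque


class DelayedOutput:
    """Keep up to `render_delay` requests in flight before waiting on one."""

    def __init__(self, render_delay):
        self.render_delay = render_delay
        self.in_flight = deque()

    def submit(self, picture):
        """Queue a picture and return the pictures that must be output now."""
        self.in_flight.append(picture)
        ready = []
        while len(self.in_flight) > self.render_delay:
            # A real decoder would wait on the oldest request here.
            ready.append(self.in_flight.popleft())
        return ready

    def drain(self):
        """At end of stream, output everything that is still in flight."""
        ready = list(self.in_flight)
        self.in_flight.clear()
        return ready
```

With `render_delay=2`, the first two calls to `submit()` return nothing, and from the third call onwards each submission releases the oldest picture, so the driver always has work queued. With `render_delay=0`, every picture is waited on immediately, which is the behaviour without render delays.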

Conformance testing

As the end of my internship drew near, I started to focus on improving conformance testing for our newly-minted VP9 decoder.

Here's the thing about these kinds of tests: they are very useful for regression testing, but they are also useful on their own, because in order to know whether your decoder does the right thing you actually need some verification at some point. Also, you can't go anywhere these days without a strategy for regression testing and for video codecs this means automatically comparing your results with a canonical reference by means of a test suite.

Owing to the work of another Collaboran - Andrzej Pietrasiewicz - support for VP9 was added to Fluster, a testing framework written in Python for decoder conformance. This means our implementation is tested by an automated testing framework using the official test vectors for VP9, and results are on par with the competition, if not slightly higher.
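In spirit, a conformance run boils down to decoding each test vector and comparing the result against a known-good reference. The sketch below assumes the reference is an MD5 checksum of the raw decoded frames. It is not Fluster's API: `frames_md5`, `run_suite` and the shape of `vectors` are hypothetical.

```python
import hashlib


def frames_md5(frames):
    """MD5 over the concatenation of all decoded frames (bytes objects)."""
    digest = hashlib.md5()
    for frame in frames:
        digest.update(frame)
    return digest.hexdigest()


def run_suite(decode, vectors):
    """Run `decode` over every vector and report pass, fail or error.

    vectors maps a test name to (bitstream bytes, expected MD5 hex string).
    decode takes the bitstream and returns an iterable of decoded frames.
    """
    results = {}
    for name, (bitstream, expected) in vectors.items():
        try:
            actual = frames_md5(decode(bitstream))
        except Exception:
            results[name] = "error"   # the decoder crashed or rejected the stream
            continue
        results[name] = "pass" if actual == expected else "fail"
    return results
```

Keeping "error" separate from "fail" makes it easier to tell a decoder that produced wrong pixels from one that could not handle the stream at all.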

Wrapping up

This project was fantastic to work on as an intern. A true testament to Collabora's very well structured internship program even in these new, uncertain times. This was a very comprehensive introduction to the world of multimedia with GStreamer, alongside engineers with more than a decade of experience on the matter ready to jump in and help at a moment's notice.

For more information on Collabora's internships, keep an eye on our careers page!

Checking out the code

You can check out the code written during this internship by having a look at the merge requests below!

Comments (1)

  1. Salvador:
    Aug 07, 2021 at 10:33 AM

    Please a need to test this. What OS I need, what kernel, give me some hints. I use armbian focal on mainline right now. Give me a hint on how get this to work. So far I have zero vpu on mainline.

