October 02, 2023
This is the second part of a series explaining how Collabora is helping shape the video virtualization story for Chromebooks in general. For some background, readers can check out part one, which discusses the importance of hardware acceleration for video codecs and describes VirtIO Video and its usage within CrosVM and, ultimately, ChromeOS.
This second installment explores the Rust libraries Collabora developed to decode video and how these libraries are used within ARCVM to eventually remove CrosVM's dependency on the Chrome codec stack. It then addresses why we chose not to use VirtIO GPU for virtualization on this project.
Finally, it sets the stage for our third and last installment, which will discuss "V4L2 on VirtIO" - a new tentative protocol set to potentially replace VirtIO Video - as well as our plans for cros-codecs and cros-libva in general.
Let's start by discussing stateless and stateful codec APIs and how they relate to the software we have written.
Before diving into cros-codecs, it is important to acknowledge the two main types of APIs used to interface with codec devices and to clarify what is meant by ‘state’ in this context. For the sake of simplicity, this discussion focuses on decoder devices.
For a decoder device to operate, it must keep track of data as it progresses through the media being decoded. Exactly what ‘data’ refers to here is mandated by the specification of the codec in use, but it usually comprises the set of reference frames in use, as well as other metadata extracted while decoding previous frames.
It is important to realize that this metadata is contained within the bitstream itself, in the form of frame headers and other bitstream structures kept separate from the compressed slice or tile data. It is the job of a parser to extract it before the decoding of the actual frame can start. Once the metadata is available, a separate component can store it and act upon it to direct the decoding process as necessary.
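As a minimal sketch of what such state can look like, the Rust snippet below keeps a bounded set of reference frames and evicts the oldest one when the set is full, loosely modelled on H.264's sliding-window reference marking. The `FrameHeader` and `ReferenceState` types and the `max_refs` limit are hypothetical simplifications for illustration; real codecs specify much richer rules.

```rust
use std::collections::VecDeque;

/// Metadata a parser would extract from a frame header (hypothetical, simplified).
#[derive(Debug, Clone, Copy)]
struct FrameHeader {
    frame_num: u32,
    is_reference: bool,
}

/// State carried across frames: the frames that later frames may predict from.
struct ReferenceState {
    max_refs: usize,
    refs: VecDeque<u32>,
}

impl ReferenceState {
    fn new(max_refs: usize) -> Self {
        Self { max_refs, refs: VecDeque::with_capacity(max_refs) }
    }

    /// References visible to the frame that is about to be decoded.
    fn current_refs(&self) -> Vec<u32> {
        self.refs.iter().copied().collect()
    }

    /// Update the state once a frame has been decoded.
    fn update(&mut self, header: &FrameHeader) {
        if !header.is_reference {
            return; // non-reference frames leave the reference set untouched
        }
        if self.refs.len() == self.max_refs {
            self.refs.pop_front(); // evict the oldest reference (sliding window)
        }
        self.refs.push_back(header.frame_num);
    }
}

fn main() {
    let mut state = ReferenceState::new(2);
    let headers = [
        FrameHeader { frame_num: 0, is_reference: true },
        FrameHeader { frame_num: 1, is_reference: false },
        FrameHeader { frame_num: 2, is_reference: true },
        FrameHeader { frame_num: 3, is_reference: true },
    ];
    for h in &headers {
        println!("frame {} predicts from {:?}", h.frame_num, state.current_refs());
        state.update(h);
    }
    println!("final refs: {:?}", state.current_refs()); // final refs: [2, 3]
}
```

A stateful device keeps something like `ReferenceState` hidden inside its driver or firmware; with a stateless device, userspace has to maintain it and hand the relevant parts to the hardware on every frame.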
Now, where exactly that state is kept is a matter of hardware design. Devices that act like a black box, ingesting raw bitstream (or raw YUV frames, in the case of an encoder) while keeping track of any required state within the driver or firmware layer, are said to be stateful. Devices that instead start from a clean slate and must be fed all the required metadata on every frame are said to be stateless.
This API distinction is important because it mandates very different userspace implementations to actually drive the device at hand.
With stateful devices, userspace components are greatly simplified at the cost of more complex hardware, as can be seen in the picture below. Note that the device returns the frames in display order, further simplifying the userspace code.
On the other hand, with stateless devices the hardware is simpler at the cost of more complex userspace code, as can be seen in the picture below. The device returns the frames in decode order, imposing a further reordering step on userspace, and it must be fed metadata on every frame in order to work at all.
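The reordering step can be sketched with a small buffer that holds decoded frames until it is safe to emit the one that comes first in display order. This assumes each frame carries a display-order number (standing in for H.264/H.265 picture order count) and that the maximum reordering depth, called `max_delay` here, is known in advance; both names are hypothetical, and a real decoder derives these values from the bitstream.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A decoded frame as returned by a stateless device, in decode order.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct DecodedFrame {
    /// Display position (a stand-in for H.264/H.265 picture order count).
    order: u32,
}

/// Holds frames until they can be emitted in display order.
struct ReorderBuffer {
    max_delay: usize,
    pending: BinaryHeap<Reverse<DecodedFrame>>, // min-heap on `order`
}

impl ReorderBuffer {
    fn new(max_delay: usize) -> Self {
        Self { max_delay, pending: BinaryHeap::new() }
    }

    /// Accept a frame in decode order; return a frame ready for display, if any.
    fn push(&mut self, frame: DecodedFrame) -> Option<DecodedFrame> {
        self.pending.push(Reverse(frame));
        if self.pending.len() > self.max_delay {
            self.pending.pop().map(|Reverse(f)| f)
        } else {
            None
        }
    }

    /// Drain the remaining frames in display order at the end of the stream.
    fn flush(&mut self) -> Vec<DecodedFrame> {
        let mut out = Vec::new();
        while let Some(Reverse(f)) = self.pending.pop() {
            out.push(f);
        }
        out
    }
}

fn main() {
    // I, P, B, B in decode order, with display positions 0, 3, 1, 2.
    let decode_order = [0, 3, 1, 2];
    let mut buf = ReorderBuffer::new(2);
    let mut displayed = Vec::new();
    for order in decode_order {
        if let Some(f) = buf.push(DecodedFrame { order }) {
            displayed.push(f.order);
        }
    }
    displayed.extend(buf.flush().into_iter().map(|f| f.order));
    println!("{:?}", displayed); // [0, 1, 2, 3]
}
```

Emitting a frame as soon as more than `max_delay` frames are pending keeps latency bounded, while `flush` drains whatever is left once the stream ends.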
The distinction between stateless and stateful devices is in turn exposed by the APIs used to interface with them, such as VA-API (stateless) or V4L2 (both stateful and stateless), among others. In particular, this means that any project using a stateless interface to drive a video codec device must implement a slew of components to handle the arithmetic decoding, parsing, and state tracking necessary to properly work with the API.
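To give a flavour of the parsing a stateless stack must do itself, many H.264 header fields are coded as unsigned Exp-Golomb values, written ue(v) in the specification: N leading zero bits, a one bit, then N suffix bits, decoding to 2^N - 1 plus the suffix. The `BitReader` below is a hypothetical minimal sketch, not the parser used by cros-codecs; among other things, it does not strip the emulation prevention bytes found in real H.264 NAL units.

```rust
/// Minimal MSB-first bit reader over a byte slice (hypothetical sketch).
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // current position, in bits
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Read a single bit, or None once the data is exhausted.
    fn read_bit(&mut self) -> Option<u32> {
        let byte = *self.data.get(self.pos / 8)?;
        let bit = (byte >> (7 - (self.pos % 8))) & 1;
        self.pos += 1;
        Some(bit as u32)
    }

    /// Read `n` bits (n <= 32) as an unsigned integer, most significant first.
    fn read_bits(&mut self, n: u32) -> Option<u32> {
        let mut value = 0u32;
        for _ in 0..n {
            value = (value << 1) | self.read_bit()?;
        }
        Some(value)
    }

    /// Read an unsigned Exp-Golomb code, ue(v) in the H.264 syntax tables.
    fn read_ue(&mut self) -> Option<u32> {
        let mut leading_zeros = 0;
        while self.read_bit()? == 0 {
            leading_zeros += 1;
            if leading_zeros > 31 {
                return None; // malformed, or the value would not fit in a u32
            }
        }
        let suffix = self.read_bits(leading_zeros)?;
        Some((1u32 << leading_zeros) - 1 + suffix)
    }
}

fn main() {
    // Bits 1 | 010 | 011 | 00100, zero-padded: ue(v) values 0, 1, 2, 3.
    let data: [u8; 2] = [0b1010_0110, 0b0100_0000];
    let mut reader = BitReader::new(&data);
    let values: Vec<u32> = (0..4).filter_map(|_| reader.read_ue()).collect();
    println!("{:?}", values); // [0, 1, 2, 3]
}
```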
This sets the stage for what exactly cros-codecs contains.
Creating a new hardware-accelerated video stack is no easy feat, but it is well within Collabora’s domain of expertise, owing to our extensive experience with the multimedia stack on Linux and to our continuous work on stateless decoders over the past years.
Cros-codecs is a project written in Rust from the ground up for the purpose of providing safe access to codec devices. It contains parsers, state trackers, and different backends that it can use to submit work to the actual hardware for the major codec standards out there like VP8, VP9, H.264, and H.265. Support for AV1 is also planned.
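That split between codec-specific logic and a pluggable hardware backend can be sketched with a Rust trait. The `StatelessBackend` trait and the `FrameParams`, `Decoder` and `MockBackend` types below are hypothetical and only illustrate the shape of such a design; they are not the actual cros-codecs API. The decoder tracks state and builds the per-frame metadata, while the backend only submits work.

```rust
/// Per-frame metadata produced by the parser and state tracker (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct FrameParams {
    frame_num: u32,
    reference_frames: Vec<u32>,
}

/// What a hardware backend must provide (hypothetical trait).
trait StatelessBackend {
    /// Submit one frame's compressed data together with all the metadata a
    /// stateless device needs; returns a handle to the decoded picture.
    fn submit(&mut self, params: &FrameParams, slice_data: &[u8]) -> Result<u32, String>;
}

/// Codec-level decoder: owns parsing and state tracking, delegates hardware work.
struct Decoder<B: StatelessBackend> {
    backend: B,
    next_frame_num: u32,
    last_reference: Option<u32>,
}

impl<B: StatelessBackend> Decoder<B> {
    fn new(backend: B) -> Self {
        Self { backend, next_frame_num: 0, last_reference: None }
    }

    fn decode(&mut self, slice_data: &[u8]) -> Result<u32, String> {
        // A real parser would extract these values from the frame headers.
        let params = FrameParams {
            frame_num: self.next_frame_num,
            reference_frames: self.last_reference.into_iter().collect(),
        };
        let handle = self.backend.submit(&params, slice_data)?;
        // Simplified state tracking: every frame becomes the next reference.
        self.last_reference = Some(params.frame_num);
        self.next_frame_num += 1;
        Ok(handle)
    }
}

/// Test double standing in for real hardware; records what was submitted.
#[derive(Default)]
struct MockBackend {
    submitted: Vec<FrameParams>,
}

impl StatelessBackend for MockBackend {
    fn submit(&mut self, params: &FrameParams, _slice_data: &[u8]) -> Result<u32, String> {
        self.submitted.push(params.clone());
        Ok(params.frame_num)
    }
}

fn main() {
    let mut decoder = Decoder::new(MockBackend::default());
    for chunk in [&b"frame0"[..], &b"frame1"[..]] {
        decoder.decode(chunk).unwrap();
    }
    // The second submission carries frame 0 as its reference.
    println!("{:?}", decoder.backend.submitted);
}
```

Keeping the backend behind a trait means a VA-API backend and, for instance, a V4L2 stateless backend could share the same parsing and state-tracking code, and a mock backend makes that code testable without hardware.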
It is slated to be used in production by CrosVM in the near future. In turn, this means deprecating libvda, and, with it, CrosVM’s dependency on Chrome. More importantly though, this library has no dependency on ChromeOS and can be used in other contexts. A great achievement for the Rust ecosystem in particular and for the open source community in general!
For Google specifically, this means that cros-codecs can be reused internally in other video-related projects as it grows. Given its ambitious scope, cros-codecs is still very much a work in progress: for now, decoding is only supported through VA-API, and encoding is not supported yet. As such, there is a roadmap in place to bring in some missing features, including:
Collabora remains committed to improving cros-codecs together with the ChromeOS engineers, given the direct impact this has on the ChromeOS ecosystem. We are also working on adjacent codebases to make this a reality, one of which happens to be yet another new project written from the ground up in Rust: cros-libva.
As previously mentioned, cros-codecs must submit work to other libraries in the system in order to interface with hardware. And it should preferably do so in safe Rust to the greatest extent possible.
As such, some work was needed to bridge the gap between libva - written in C and the only supported backend in cros-codecs as of now - and the rest of the safe Rust codebase.
The strategy of wrapping unsafe C APIs in safe Rust code is well known in the Rust community: it guarantees safety to the upper layers by checking that the required invariants hold before calling into C. We adopted this approach in what became yet another standalone project.
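A minimal sketch of that pattern follows. To keep it self-contained, it wraps a tiny hypothetical C-style API (`c_create_surface` and `c_destroy_surface`, defined in Rust with the C ABI) rather than real libva calls, and it is not the cros-libva API. The safe `Surface` type rejects invalid input before crossing the FFI boundary, turns status codes and null pointers into a `Result`, and frees the object in `Drop`, so callers never need to write `unsafe`.

```rust
use std::ptr::{self, NonNull};

/// Object owned by the "C" side (hypothetical stand-in for a libva handle).
#[repr(C)]
struct RawSurface {
    width: u32,
    height: u32,
}

// --- Hypothetical C-style API: raw pointers, status codes, manual cleanup. ---

/// Writes a new surface to `*out` and returns 0, or returns -1 on bad input.
/// Safety: `out` must be valid for writes; the result must be freed exactly once.
unsafe extern "C" fn c_create_surface(width: u32, height: u32, out: *mut *mut RawSurface) -> i32 {
    if width == 0 || height == 0 {
        return -1;
    }
    unsafe { *out = Box::into_raw(Box::new(RawSurface { width, height })) };
    0
}

/// Safety: `surface` must come from `c_create_surface` and not be freed twice.
unsafe extern "C" fn c_destroy_surface(surface: *mut RawSurface) {
    if !surface.is_null() {
        drop(unsafe { Box::from_raw(surface) });
    }
}

// --- Safe wrapper: upholds the requirements so callers never need `unsafe`. ---

pub struct Surface {
    raw: NonNull<RawSurface>,
}

impl Surface {
    pub fn new(width: u32, height: u32) -> Result<Self, String> {
        // Check the requirements up front instead of trusting the C side.
        if width == 0 || height == 0 {
            return Err("width and height must be non-zero".to_string());
        }
        let mut raw: *mut RawSurface = ptr::null_mut();
        // SAFETY: `raw` is a valid, writable location for the out-pointer.
        let status = unsafe { c_create_surface(width, height, &mut raw) };
        if status != 0 {
            return Err(format!("surface creation failed with status {status}"));
        }
        NonNull::new(raw)
            .map(|raw| Surface { raw })
            .ok_or_else(|| "library returned a null surface".to_string())
    }

    pub fn size(&self) -> (u32, u32) {
        // SAFETY: `raw` is non-null and stays valid until `drop` runs.
        let s = unsafe { self.raw.as_ref() };
        (s.width, s.height)
    }
}

impl Drop for Surface {
    fn drop(&mut self) {
        // SAFETY: this wrapper owns the pointer and releases it exactly once.
        unsafe { c_destroy_surface(self.raw.as_ptr()) };
    }
}

fn main() {
    let surface = Surface::new(1920, 1080).expect("valid dimensions");
    println!("{:?}", surface.size()); // (1920, 1080)
    assert!(Surface::new(0, 1080).is_err());
} // `surface` is released here automatically
```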
Cros-libva, therefore, wraps libva in a safe Rust API that cros-codecs can consume. This is the final step of the Rust journey before work can actually be submitted to hardware for processing: after that, it’s up to the VA-API driver in the system to process the work in accordance with the codec standard in use. The picture below summarizes the different blocks involved in decoding video with our stack.
Note that once the frames are delivered to CrosVM, it still has to interface with the VirtIO Video driver in the system to make them available to the guest OS and, lastly, to the guest userspace application using the VirtIO Video driver, as per the picture below:
In tandem with our work on cros-libva, Collabora has also contributed initial stateless decoder support to v4l2r.
V4l2r is a library written in Rust to interface with V4L2 drivers in safe code. In this context, this project is similar to cros-libva, in that it provides a safe Rust API that cros-codecs can consume in order to submit work packages to the hardware.
This deserves attention because, in some Chromebooks, the codec hardware is simply not part of the GPU, as VA-API expects it to be.
To this end, Andrzej Pietrasiewicz has been working on adding stateless support to v4l2r for other codecs, which should enable him to eventually write a V4L2 stateless backend in cros-codecs proper, essentially enabling support for an array of different Chromebooks out there.
Improving v4l2r is also important as it grows in relevance in the ecosystem. At the Linux Media Summit 2023 (an annual gathering of V4L2 developers), there were discussions about userspace Rust bindings for V4L2, and v4l2r was considered as a possible official solution. There have also been discussions about using v4l2r to improve the testing and CI landscape for V4L2 in general.
One of the very first questions that arise when presenting this work is whether video codec acceleration could instead be virtualized through the existing VirtIO GPU stack, in order to communicate with the video codec IPs within GPU chips. Naturally, other companies have taken this path, and it has proven successful, with patches already sent to Mesa.
The main issue, as already pointed out, is that this approach completely overlooks the great number of Chromebook devices where the codec IP is not within the GPU. Looking beyond Google’s use cases, it also leaves out plenty of devices in the embedded world that could potentially benefit from this work.
Moreover, choosing the V4L2 stateful API as the front-end for the VirtIO Video kernel driver means relying on a mature and proven interface. Its ‘black box’ approach, as discussed in the section “On the nature of stateless and stateful codecs”, also has the advantage of hiding the heavy virtualization machinery that underpins this technology from the guest userspace application, which is simply unaware that any virtualization is taking place.
This concludes the discussion about the current status of cros-codecs and cros-libva. Stay tuned for the next installment, which will discuss "V4L2 on VirtIO" - a new tentative protocol set to potentially replace VirtIO Video - as well as our future plans for cros-codecs and cros-libva in general.