
Adding stateless support to vicodec

Dafna Hirschfeld
October 09, 2019


Prior to joining Collabora, I took part in Round 17 of the Outreachy internships, which ran from December 2018 to March 2019. Outreachy is a paid, remote internship program. Its goal is to support people from groups underrepresented in tech, and help newcomers to free software and open source make their first contributions. Open to applicants around the world, Outreachy internships run twice a year.

Once your application is approved, you must pick an open source project to make a contribution to, in hopes of being selected as an intern, and teamed with experienced mentors. You can read more about the program here.

In my case, I was selected as an intern to work on the media subsystem of the Linux kernel, and my mentors were Helen Koike (who is now my colleague at Collabora!) and Hans Verkuil (who works for Cisco and has been working on the media subsystem for around 15 years).

Virtual Drivers in the Media Subsystem

In the media subsystem there are a few drivers that are 'virtual' in the sense that they do not interact with any specific hardware; they are implemented purely in software. The main purpose of these drivers is to test userspace applications. Since no specific hardware or architecture is needed, userspace applications can always interact with these drivers and rely on them for running their own tests. It is therefore important that these drivers implement the APIs accurately and have wide support.

The virtual drivers are: vivid, vim2m, vimc and vicodec. During my Outreachy internship I worked on the vicodec driver.

The vicodec driver and the FWHT format

vicodec, which stands for Virtual Codec, is a driver that implements a codec based on the Fast Walsh-Hadamard Transform. FWHT was designed to be fast and simple, and to share characteristics with other video codecs so that it faces the same issues. Applications can interact with vicodec to compress videos to this format and decompress them. You can read more details about the FWHT format here.

The request API

A common problem that arises in decoding is that in many cases sequential frames have different properties, such as dimensions, pixel format and so on. With the traditional codec API, known as the stateful API, the properties are configured before the decoding/encoding stream starts. So when a frame has a property that differs from the configuration, the decoding stream must stop, reconfigure and then start again - this sequence is called a 'dynamic resolution change'. This causes a lag and is impractical if the frames' properties change too often.

To address this issue, a new API called Request API was recently introduced. The idea is that each frame is part of a 'request', which is basically a list of elements that are clustered together. The application first composes the request and then it asks the kernel to process it. In the context of stateless codecs, a request is a combination of the frame buffers and a list of properties that can also include pointers to reference frames. Each frame is processed separately without the need to stop and restart the decoding/encoding stream.

The stateless vicodec and testing it with v4l-utils

During my Outreachy internship, I added a stateless implementation to vicodec. Applications can now interact with the vicodec driver through either the stateful or the stateless API. With the stateless API, the userspace application has to do the 'hard work': parse the frames' headers and keep track of the reference frames and their order. The v4l-utils package is a good place to find code examples of how to use the various media APIs. I used it during my internship and added support for the stateless vicodec.

Here is some hands-on guidance on how to use it.

(tl;dr: the final script to test the stateless decoder is here).

The vicodec driver currently implements a stateless decoder and a stateful encoder and decoder (there is no stateless encoder yet). The driver exposes three device nodes, /dev/videoX, one for each supported implementation.

The way to test vicodec is to first run the encoder to generate encoded fwht format files and then run the decoder on those files.

For that, we should first prepare video files in decoded (raw) formats. During my internship I used a video from jell.yfish.us (screenshot below). You can read more about it in my internship blog. I wrote a script that generates decoded formats from that video with various pixel formats.

jell.yfish.us - Screenshot of video showing jellyfish moving in the water

Here are commands to download the video and generate a few videos of dimensions 700x1000 of decoded formats in a directory images:

wget http://jell.yfish.us/media/jellyfish-10-mbps-hd-h264.mkv
mkdir images
ffmpeg -i jellyfish-10-mbps-hd-h264.mkv -c:v rawvideo -pix_fmt yuv420p -f rawvideo  images/jelly-1920-1080.YU12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuyv422 -f rawvideo images/jelly-700-1000.YUYV -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-700-1000.422P -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt gray -f rawvideo images/jelly-700-1000.GREY -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv420p -f rawvideo images/jelly-700-1000.YU12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt nv12 -f rawvideo images/jelly-700-1000.NV12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt rgb24 -f rawvideo images/jelly-700-1000.RGB3 -loglevel quiet

To play the files, you can run, for example:

ffplay -loglevel warning -v info -f rawvideo -pixel_format yuv422p -video_size "700x1000"  images/jelly-700-1000.422P

Now we can use the files in images to first test the vicodec encoder. Here is a command example:

v4l2-ctl -d0 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P

This will generate a 'fwht' compressed file called jelly_700-1000-422P.fwht from the decoded file images/jelly-700-1000.422P. The parameter -d0 indicates the use of /dev/video0 for that.

Then, to test the decoder, you can generate back a decoded format from the fwht file. This can be done with either the stateful decoder, exposed in my case at /dev/video1, or the stateless decoder, exposed at /dev/video2.

v4l2-ctl -d1 -x width=700,height=1000 -v width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-from jelly_700-1000-422P.fwht --stream-to out-700-1000.422P

Running the above command with -d2 instead of -d1 will use the stateless decoder and not the stateful decoder.

Now we want to test the stateless decoder on a more interesting video. For that we will take two videos with different dimensions and merge them together such that each frame has different dimensions from the previous one.

I wrote a utility for that. Compile it with gcc merge_fwht_frames.c -o merge_fwht_frames, then run it without parameters to see how to use it:

dafna@ubuntu:~/jelly$ ./merge_fwht_frames
usage: ./merge_fwht_frames <file1> <file2> <out file> <max width> <max height>

The utility takes the two files to merge, an output file name, and two arguments giving the highest value of each dimension (width and height) across both videos. So the following set of commands will generate a fwht file called merged-dim.fwht that is composed of two merged videos, one of dimensions 700x1000 and one of dimensions 800x900:

ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-700-1000.422P -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=800:900:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-800-900.422P -loglevel quiet

v4l2-ctl -d0 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P
v4l2-ctl -d0 --set-selection-output target=crop,width=800,height=900 -x width=800,height=900,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_800-900-422P.fwht --stream-from images/jelly-800-900.422P
./merge_fwht_frames jelly_700-1000-422P.fwht jelly_800-900-422P.fwht merged-dim.fwht 800 1000

And now decoding merged-dim.fwht with the stateless decoder:

v4l2-ctl -d2 -x width=800,height=1000 -v width=800,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-from merged-dim.fwht --stream-to out-800-1000.422P

Now we have a decoded video file out-800-1000.422P which is composed of frames with alternating dimensions. This is a 422P format, in which each pixel takes 2 bytes, so the first frame size is 2*700*1000=1400000 bytes, the second is 2*800*900=1440000 bytes, the third is again 1400000 bytes, and so on.
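As a quick sanity check, these frame sizes, and the expected total file size when each source contributes 450 frames (the count used in the script below), follow from simple shell arithmetic:

```shell
# 422P averages 2 bytes per pixel: a full-resolution Y plane plus
# Cb and Cr planes at half horizontal resolution (1 + 0.5 + 0.5 bytes).
frm1_sz=$((700 * 1000 * 2))
frm2_sz=$((800 * 900 * 2))
echo "$frm1_sz"                          # 1400000
echo "$frm2_sz"                          # 1440000
# Expected merged file size with 450 frames from each video:
echo "$((450 * (frm1_sz + frm2_sz)))"    # 1278000000
```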

The following script will separate the file out-800-1000.422P into two files:

size=$(stat --printf="%s" out-800-1000.422P)

# each of the two source videos contributes 450 frames
frm1_sz=$((700 * 1000 * 2))
ex_size1=$(($frm1_sz * 450))
frm2_sz=$((800 * 900 * 2))
ex_size2=$(($frm2_sz * 450))

if [ $(($ex_size1 + $ex_size2)) != $size ]; then
        echo "expected size = $(($ex_size1 + $ex_size2))"
        echo "actual   size = $size"
        exit 1
fi

double_frame=$(($frm1_sz + $frm2_sz))

i=0
while [ $i -lt 450 ]; do
        dd if=out-800-1000.422P obs=$double_frame ibs=$double_frame skip=$i count=1 2>/dev/null > tmp
        head -c $frm1_sz tmp >> out-mrg-700-1000.422P
        tail -c $frm2_sz tmp >> out-mrg-800-900.422P
        rm tmp
        i=$(($i + 1))
done
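The dd/head/tail de-interleaving used above can be sanity-checked on a tiny synthetic file first; this is a sketch with made-up 4-byte and 3-byte 'frames' instead of real video data:

```shell
# Toy "merged" file: 3 pairs of a 4-byte frame ("AAAA") and a 3-byte frame ("BBB").
printf 'AAAABBBAAAABBBAAAABBB' > merged.bin

frm1_sz=4
frm2_sz=3
double_frame=$(($frm1_sz + $frm2_sz))
rm -f out1.bin out2.bin

i=0
while [ $i -lt 3 ]; do
        # Read the i-th pair of frames as one block, then split it in two.
        dd if=merged.bin ibs=$double_frame obs=$double_frame skip=$i count=1 2>/dev/null > tmp
        head -c $frm1_sz tmp >> out1.bin
        tail -c $frm2_sz tmp >> out2.bin
        rm tmp
        i=$(($i + 1))
done

cat out1.bin    # AAAAAAAAAAAA
cat out2.bin    # BBBBBBBBB
```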

And now play the two files:

ffplay -loglevel warning -v info -f rawvideo -pixel_format "yuv422p" -video_size "700x1000"  out-mrg-700-1000.422P
ffplay -loglevel warning -v info -f rawvideo -pixel_format "yuv422p" -video_size "800x900"  out-mrg-800-900.422P

You will see glitches at the edges of the videos: constant-color squares at the bottom of the out-mrg-700-1000.422P video and on the left of the out-mrg-800-900.422P video.

This is because the fwht format uses the previous frame as a reference frame, and since the dimensions of the previous frame don't match the current ones, these squares appear where the frames don't overlap.
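The size of these artifact regions follows directly from the dimension mismatch. Assuming the frames are aligned at a common corner, the mismatch can be computed quickly:

```shell
# Frame dimensions of the two interleaved streams:
w1=700;  h1=1000
w2=800;  h2=900
# The reference data only covers the region where both frames overlap:
ow=$(( w1 < w2 ? w1 : w2 ))
oh=$(( h1 < h2 ? h1 : h2 ))
echo "overlap region: ${ow}x${oh}"                                      # 700x900
echo "rows of the 700x1000 frames with no reference:   $(( h1 - oh ))"  # 100
echo "columns of the 800x900 frames with no reference: $(( w2 - ow ))"  # 100
```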

vicodec exposes the video_gop_size control, which controls the period between I-frames (I-frames are frames that can be decoded without referencing other frames).

If you run v4l2-ctl -d0 --list-ctrls you will see:

dafna@ubuntu:~/outreachy$ v4l2-ctl -d0 --list-ctrls

User Controls

   min_number_of_output_buffers 0x00980928 (int)    : min=1 max=1 step=1 default=1 value=1 flags=read-only

Codec Controls

          video_gop_size 0x009909cb (int)    : min=1 max=16 step=1 default=10 value=10
          fwht_i_frame_qp_value 0x00990a22 (int)    : min=1 max=31 step=1 default=20 value=20
          fwht_p_frame_qp_value 0x00990a23 (int)    : min=1 max=31 step=1 default=20 value=20

So the default value for video_gop_size is 10, which means that there is an I-frame every 10 frames. If we set this value to 1, every frame will be an I-frame and we will not get those artifacts. We do this by adding the --set-ctrl video_gop_size=1 option to the decoding and encoding commands.
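For example, the earlier encoder command becomes the following, unchanged except for the added control (this assumes, as above, that the vicodec driver is loaded with the encoder at /dev/video0):

```shell
v4l2-ctl -d0 --set-ctrl video_gop_size=1 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P
```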

The final script can be found here.

