Adding stateless support to vicodec

Dafna Hirschfeld
October 09, 2019

Prior to joining Collabora, I took part in Round 17 of the Outreachy internships, which ran from December 2018 to March 2019. Outreachy is a paid, remote internship program. Its goal is to support people from groups underrepresented in tech, and help newcomers to free software and open source make their first contributions. Open to applicants around the world, Outreachy internships run twice a year.

Once your application is approved, you must pick an open source project to make a contribution to, in hopes of being selected as an intern, and teamed with experienced mentors. You can read more about the program here.

In my case, I was selected as an intern to work on the media subsystem of the Linux kernel. My mentors were Helen Koike (who is now my colleague at Collabora!) and Hans Verkuil (who works for Cisco and has been working on the media subsystem for around 15 years).

Virtual Drivers in the Media Subsystem

The media subsystem has a few drivers that are 'virtual' in the sense that they do not interact with any specific hardware and are implemented purely in software. Their main purpose is to test userspace applications. Since no specific hardware or architecture is needed, userspace applications can always interact with those drivers and rely on them for running their own tests. It is therefore important that those drivers implement the APIs accurately and have wide support.

The virtual drivers are: vivid, vim2m, vimc and vicodec. During my Outreachy internship I worked on the vicodec driver.

The vicodec driver and the FWHT format

vicodec, which stands for Virtual Codec, is a driver that implements a codec format based on the Fast Walsh-Hadamard Transform (FWHT). FWHT was designed to be fast and simple, and to share the characteristics of other video codecs so that it faces the same issues. Applications can interact with vicodec to compress videos to this format and to decompress them. You can read more details about the FWHT format here.

The request API

A common problem in decoding is that sequential frames often have different properties, such as dimensions or pixel format. With the traditional codec API, called stateful codecs, the properties are configured before the decoding/encoding stream starts. So when a frame has a property that differs from the configuration, the decoding stream has to stop, reconfigure and then start again - this sequence is called 'Dynamic resolution change'. This causes a lag and is impractical if the frames' properties change too often.

To address this issue, a new API called the Request API was recently introduced. The idea is that each frame is part of a 'request', which is basically a list of elements that are clustered together. The application first composes the request and then asks the kernel to process it. In the context of stateless codecs, a request is a combination of the frame buffers and a list of properties that can also include pointers to reference frames. Each frame is processed separately, without the need to stop and restart the decoding/encoding stream.

Stateless vicodec and testing it with v4l-utils

During my Outreachy internship, I added a stateless implementation to vicodec. Applications can now interact with the vicodec driver through either the stateful or the stateless API. With the stateless API, the userspace application has to do the 'hard work': parse the frames' headers and keep track of the reference frames and their order. The v4l-utils package is a good place to find code examples of how to use the various media APIs. I used it during my internship and added support for stateless vicodec to it.

Here is a hands-on walkthrough of how to use it.

(tl;dr: the final script to test the stateless decoder is here).

The vicodec driver currently implements a stateful encoder, a stateful decoder and a stateless decoder. It exposes three device nodes of the form /dev/videoX, one for each supported implementation.
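Which /dev/videoX numbers vicodec gets depends on the other video devices present on the machine. The following is a minimal sketch for finding them, assuming that `v4l2-ctl --list-devices` prints each device as a header line followed by indented node paths; the helper name `vicodec_nodes` is made up for this illustration.

```shell
# Print the /dev/video* nodes listed under the vicodec entry.
# Reads the output of `v4l2-ctl --list-devices` from stdin.
# Assumption: every device entry is an unindented header line
# followed by indented node paths.
vicodec_nodes() {
    awk '
        # An unindented line starts a new device entry
        /^[^ \t]/ { in_vicodec = ($0 ~ /vicodec/) }
        # Indented video nodes belong to the current entry
        in_vicodec && /^[ \t]+\/dev\/video/ { sub(/^[ \t]+/, ""); print }
    '
}
```

It can be used as `v4l2-ctl --list-devices | vicodec_nodes`. The sketch does not tell you which node is the encoder and which are the decoders.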

The way to test vicodec is to first run the encoder to generate encoded fwht format files and then run the decoder on those files.

For that we first need a raw (decoded) video. During my internship I used a video from jell.yfish.us (screenshot below). You can read more about it in my internship blog. I wrote a script that generates raw videos in various pixel formats from that video.

jell.yfish.us - Screenshot of video showing jellyfish moving in the water

Here are the commands to download the video and generate, in a directory called images, a few raw 700x1000 videos in various pixel formats:

wget http://jell.yfish.us/media/jellyfish-10-mbps-hd-h264.mkv
mkdir images
ffmpeg -i jellyfish-10-mbps-hd-h264.mkv -c:v rawvideo -pix_fmt yuv420p -f rawvideo  images/jelly-1920-1080.YU12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuyv422 -f rawvideo images/jelly-700-1000.YUYV -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-700-1000.422P -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt gray -f rawvideo images/jelly-700-1000.GREY -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv420p -f rawvideo images/jelly-700-1000.YU12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt nv12 -f rawvideo images/jelly-700-1000.NV12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt rgb24 -f rawvideo images/jelly-700-1000.RGB3 -loglevel quiet
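The six crop commands differ only in the ffmpeg pixel format and the file extension, so they can also be generated in a loop. This is a small sketch that prints the commands rather than running them; the helper name `crop_cmd` is made up for this illustration.

```shell
# Print (do not run) the ffmpeg command that crops the 1920x1080 raw source
# to 700x1000 and converts it to the given pixel format.
# $1 = ffmpeg pixel format name, $2 = file extension (V4L2 fourcc)
crop_cmd() {
    printf 'ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt %s -f rawvideo images/jelly-700-1000.%s -loglevel quiet\n' "$1" "$2"
}

# One "pixel format:extension" pair per output file
for pair in yuyv422:YUYV yuv422p:422P gray:GREY yuv420p:YU12 nv12:NV12 rgb24:RGB3; do
    crop_cmd "${pair%%:*}" "${pair##*:}"
done
```

Piping the output of this snippet to `sh` runs the commands, once the images directory and the 1920x1080 source file exist.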

In order to play the files, you can run for example:

ffplay -loglevel warning -v info -f rawvideo -pixel_format yuv422p -video_size "700x1000"  images/jelly-700-1000.422P

Now we can use the files in images to first test the vicodec encoder. Here is a command example:

v4l2-ctl -d0 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P

This will generate an FWHT-compressed file called jelly_700-1000-422P.fwht from the raw file images/jelly-700-1000.422P. The -d0 option selects /dev/video0 as the device to use.

Then, to test the decoder, you can regenerate a raw video from the fwht file. This can be done with either the stateful decoder, which in my case is exposed as /dev/video1, or the stateless decoder, exposed as /dev/video2.

v4l2-ctl -d1 -x width=700,height=1000 -v width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-from jelly_700-1000-422P.fwht --stream-to out-700-1000.422P

Running the above command with -d2 instead of -d1 will use the stateless decoder and not the stateful decoder.

Now we want to test the stateless decoder on a more interesting video. For that we will take two videos with different dimensions and merge them so that each frame has different dimensions from the previous one.

I wrote a utility for that. Compile it with gcc merge_fwht_frames.c -o merge_fwht_frames. Running it without parameters shows how to use it:

dafna@ubuntu:~/jelly$ ./merge_fwht_frames 
usage: ./merge_fwht_frames

The utility takes the two files to merge, the name of the output file, and two arguments containing the highest value for each dimension (width and height, in the order used in the command below). So the following set of commands will generate a fwht file called merged-dim.fwht that is composed of two merged videos, one of dimensions 700x1000 and one of dimensions 800x900:

ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-700-1000.422P -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=800:900:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-800-900.422P -loglevel quiet

v4l2-ctl -d0 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P
v4l2-ctl -d0 --set-selection-output target=crop,width=800,height=900 -x width=800,height=900,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_800-900-422P.fwht --stream-from images/jelly-800-900.422P
./merge_fwht_frames jelly_700-1000-422P.fwht jelly_800-900-422P.fwht merged-dim.fwht 800 1000

And now decoding merged-dim.fwht with the stateless decoder:

v4l2-ctl -d2 -x width=800,height=1000 -v width=800,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-from merged-dim.fwht --stream-to out-800-1000.422P

Now we have a decoded video file out-800-1000.422P which is composed of frames with alternating dimensions. In the 422P format each pixel takes 2 bytes on average, so the first frame is 2*700*1000=1400000 bytes, the second is 2*800*900=1440000 bytes, the third is again 1400000 bytes and so on.
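The same arithmetic applies to the other raw formats generated earlier, with a different number of bits per pixel for each. A small sketch, where the helper name `frame_bytes` is made up for this illustration and even frame dimensions are assumed:

```shell
# Size in bytes of one raw frame.
# $1 = format extension used in this post, $2 = width, $3 = height
# Bits per pixel: GREY 8, YU12/NV12 12, YUYV/422P 16, RGB3 24
frame_bytes() {
    case $1 in
        GREY)      bpp=8 ;;
        YU12|NV12) bpp=12 ;;
        YUYV|422P) bpp=16 ;;
        RGB3)      bpp=24 ;;
        *) echo "unknown format: $1" >&2; return 1 ;;
    esac
    echo $(( $2 * $3 * bpp / 8 ))
}

frame_bytes 422P 700 1000   # prints 1400000
frame_bytes 422P 800 900    # prints 1440000
```

Dividing the size of a raw file by this value gives the number of frames it contains, which is a quick sanity check on decoder output.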

The following script will separate the file out-800-1000.422P into two files:

size=$(stat --printf="%s" out-800-1000.422P)

# 422P takes 2 bytes per pixel; 450 frames are expected for each resolution
frames=450
frm1_sz=$((700 * 1000 * 2))
ex_size1=$(($frm1_sz * $frames))
frm2_sz=$((800 * 900 * 2))
ex_size2=$(($frm2_sz * $frames))
ex_size=$(($ex_size1 + $ex_size2))

if [ "$ex_size" -ne "$size" ]; then
        echo "expected size = $ex_size"
        echo "actual   size = $size"
        exit 1
fi

# One 700x1000 frame followed by one 800x900 frame
double_frame=$(($frm1_sz + $frm2_sz))

i=0
while [ "$i" -lt "$frames" ]
do
        # Read the i-th pair of frames, then split it into its two halves
        dd if=out-800-1000.422P of=tmp bs=$double_frame skip=$i count=1 2>/dev/null
        head -c $frm1_sz tmp >> out-mrg-700-1000.422P
        tail -c $frm2_sz tmp >> out-mrg-800-900.422P
        rm tmp
        i=$(($i + 1))
done
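The same splitting idea can be written as a reusable function and tried on a tiny synthetic file before running it on the real video. This is only a sketch: the function name `split_alternating` and its argument order are made up for this illustration.

```shell
# Split a file made of alternating chunks (A, B, A, B, ...) into two files.
# $1 = input, $2 = size of an A chunk, $3 = size of a B chunk,
# $4 = output for A chunks, $5 = output for B chunks
split_alternating() {
    src=$1; sz_a=$2; sz_b=$3; out_a=$4; out_b=$5
    total=$(( $(wc -c < "$src") ))
    pair=$((sz_a + sz_b))
    : > "$out_a"
    : > "$out_b"
    off=0
    # Stop as soon as a complete A+B pair no longer fits
    while [ $((off + pair)) -le "$total" ]; do
        # tail -c +N starts output at byte N (1-based)
        tail -c +$((off + 1)) "$src" | head -c "$sz_a" >> "$out_a"
        tail -c +$((off + sz_a + 1)) "$src" | head -c "$sz_b" >> "$out_b"
        off=$((off + pair))
    done
}
```

For the file in this post the call would be `split_alternating out-800-1000.422P 1400000 1440000 out-mrg-700-1000.422P out-mrg-800-900.422P`. Unlike the script above, it truncates the output files first, so running it twice does not append duplicate frames.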

And now play the two files:

ffplay -loglevel warning -v info -f rawvideo -pixel_format "yuv422p" -video_size "700x1000"  out-mrg-700-1000.422P
ffplay -loglevel warning -v info -f rawvideo -pixel_format "yuv422p" -video_size "800x900"  out-mrg-800-900.422P

You will see glitches on the edges of the videos, such as constant-color squares at the bottom of the out-mrg-700-1000.422P video and on the left of the out-mrg-800-900.422P video.

This is because the fwht format uses the previous frame as a reference frame and since the dimensions of the previous frame don't match the current ones, there are those squares where frames don't overlap.

vicodec has a video_gop_size control, which sets the period of I-frames (I-frames are frames that can be decoded without reference to other frames).

If you run v4l2-ctl -d0 --list-ctrls you will see:

dafna@ubuntu:~/outreachy$ v4l2-ctl -d0 --list-ctrls

User Controls

   min_number_of_output_buffers 0x00980928 (int)    : min=1 max=1 step=1 default=1 value=1 flags=read-only

Codec Controls

          video_gop_size 0x009909cb (int)    : min=1 max=16 step=1 default=10 value=10
          fwht_i_frame_qp_value 0x00990a22 (int)    : min=1 max=31 step=1 default=20 value=20
          fwht_p_frame_qp_value 0x00990a23 (int)    : min=1 max=31 step=1 default=20 value=20

So the default value of video_gop_size is 10, which means that there is an I-frame every 10 frames. If we set this value to 1, then each frame will be an I-frame and we will not get those artifacts. We do this by adding the --set-ctrl video_gop_size=1 option to the decoding and encoding commands.
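To make the effect of the period concrete, here is a small sketch that lists which frame indices would be I-frames. It assumes that frame 0 is an I-frame and that a new one starts every video_gop_size frames; the driver's exact counting may differ, and the helper name `i_frame_indices` is made up for this illustration.

```shell
# List the indices of the I-frames among the first $2 frames
# for a GOP size of $1.
# Assumption: frame 0 is an I-frame, then one every $1 frames.
i_frame_indices() {
    gop=$1; total=$2; i=0; out=""
    while [ "$i" -lt "$total" ]; do
        out="$out${out:+ }$i"
        i=$((i + gop))
    done
    echo "$out"
}

i_frame_indices 10 25   # prints: 0 10 20
i_frame_indices 1 4     # prints: 0 1 2 3
```

With a GOP size of 1 every index is listed, so no frame depends on a previous frame of different dimensions.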

The final script can be found here.

Enjoy!
