October 09, 2019
Prior to joining Collabora, I took part in Round 17 of the Outreachy internships, which ran from December 2018 to March 2019. Outreachy is a paid, remote internship program. Its goal is to support people from groups underrepresented in tech, and help newcomers to free software and open source make their first contributions. Open to applicants around the world, Outreachy internships run twice a year.
Once your application is approved, you must pick an open source project to make a contribution to, in hopes of being selected as an intern, and teamed with experienced mentors. You can read more about the program here.
In my case, I was selected as an intern to work on the media subsystem of the Linux kernel, and my mentors were Helen Koike (who is now my colleague at Collabora!) and Hans Verkuil (who works for Cisco and has been working on the media subsystem for around 15 years).
In the media subsystem there are a few drivers that are 'virtual' in the sense that they do not interact with any specific hardware and are implemented purely in software. Their main purpose is to test userspace applications. Since no specific hardware or architecture is needed, userspace applications can always interact with those drivers and rely on them for running their own tests. Therefore, it is important that those drivers implement the APIs accurately and support a wide range of features.
The virtual drivers are: vivid, vim2m, vimc and vicodec. During my Outreachy internship I worked on the vicodec driver.
vicodec, which stands for Virtual Codec, is a driver that implements a codec format based on the Fast Walsh-Hadamard Transform (FWHT). FWHT was designed to be fast and simple, and to share the characteristics of other video codecs so that it faces the same issues. Applications can interact with vicodec to compress videos to this format and to decompress them. You can read more details about the FWHT format here.
A common problem that arises in decoding is that in many cases, sequential frames have different properties such as dimensions, pixel format and so on. With the traditional codec API, known as the stateful codec API, the properties are configured before the decoding/encoding stream starts. So when a frame has a property that differs from the configuration, the decoding stream has to stop, reconfigure and then start again - this sequence is called 'Dynamic resolution change'. This causes a lag and is impractical if the frames' properties change too often.
To address this issue, a new API called the Request API was recently introduced. The idea is that each frame is part of a 'request', which is basically a list of elements that are clustered together. The application first composes the request and then asks the kernel to process it. In the context of stateless codecs, a request is a combination of the frame buffers and a list of properties that can also include pointers to reference frames. Each frame is processed separately without the need to stop and restart the decoding/encoding stream.
During my Outreachy internship, I added a stateless implementation to vicodec. Applications can now interact with the vicodec driver through either the stateful or the stateless API. In the stateless API, the userspace application has to do the 'hard work': parse the frames' headers and keep track of the reference frames and their order. The v4l-utils package is a good place for code examples of how to use the various media APIs. I used it during my internship and added support for stateless vicodec to it.
Here is a hands-on walkthrough of how to use it.
tl;dr: the final script to test the stateless decoder is here.
The vicodec driver currently implements a stateful encoder, a stateful decoder and a stateless decoder. The driver exposes three device nodes, /dev/videoX, one for each supported implementation.
The way to test vicodec is to first run the encoder to generate encoded fwht format files and then run the decoder on those files.
For that we should first prepare a video in a decoded (raw) format. During my internship I used a video from jell.yfish.us. You can read more about it in my internship blog. I wrote a script that generates raw files in various pixel formats from that video.
Here are the commands to download the video and generate a few raw videos of dimensions 700x1000 in the images directory:
wget http://jell.yfish.us/media/jellyfish-10-mbps-hd-h264.mkv
mkdir images
ffmpeg -i jellyfish-10-mbps-hd-h264.mkv -c:v rawvideo -pix_fmt yuv420p -f rawvideo images/jelly-1920-1080.YU12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuyv422 -f rawvideo images/jelly-700-1000.YUYV -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-700-1000.422P -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt gray -f rawvideo images/jelly-700-1000.GREY -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv420p -f rawvideo images/jelly-700-1000.YU12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt nv12 -f rawvideo images/jelly-700-1000.NV12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt rgb24 -f rawvideo images/jelly-700-1000.RGB3 -loglevel quiet
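The crop commands differ only in the target pixel format and the file suffix, so they could also be generated in a loop. The following is a minimal sketch, not the script I actually used: the function name gen_crop_cmds is hypothetical, and it only prints the commands so that they can be reviewed before being run. It assumes images/jelly-1920-1080.YU12 already exists.

```shell
# Print one ffmpeg crop/convert command per pixel format.
# Each pair is "<ffmpeg pix_fmt>:<file suffix>", copied from the commands in this post.
# gen_crop_cmds is a hypothetical helper name used only for this sketch.
gen_crop_cmds() {
    width=$1
    height=$2
    for pair in yuyv422:YUYV yuv422p:422P gray:GREY yuv420p:YU12 nv12:NV12 rgb24:RGB3; do
        fmt=${pair%%:*}   # part before the colon: ffmpeg pixel format name
        ext=${pair##*:}   # part after the colon: suffix of the output file
        printf 'ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=%s:%s:0:0 -pix_fmt %s -f rawvideo images/jelly-%s-%s.%s -loglevel quiet\n' \
            "$width" "$height" "$fmt" "$width" "$height" "$ext"
    done
}
```

Running gen_crop_cmds 700 1000 prints the six crop commands; piping the output to sh executes them.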
In order to play the files, you can run for example:
ffplay -loglevel warning -v info -f rawvideo -pixel_format yuv422p -video_size "700x1000" images/jelly-700-1000.422P
Now we can use the files in images to first test the vicodec encoder. Here is a command example:
v4l2-ctl -d0 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P
This will generate a 'fwht' compressed file called jelly_700-1000-422P.fwht from the decoded file images/jelly-700-1000.422P. The parameter -d0 indicates the use of /dev/video0 for that.
Then, to test the decoder, you can generate a decoded file back from the fwht file. This can be done with either the stateful decoder, exposed in my case as /dev/video1, or the stateless decoder, which has its own device node:
v4l2-ctl -d1 -x width=700,height=1000 -v width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-from jelly_700-1000-422P.fwht --stream-to out-700-1000.422P
Running the above command with -d2 instead of -d1 will use the stateless decoder and not the stateful decoder.
Now we want to test the stateless decoder on a more interesting video. For that we will take two videos with different dimensions and merge them together such that each frame has different dimensions from the previous one.
I wrote a utility for that. Compile it with gcc merge_fwht_frames.c -o merge_fwht_frames; running it without parameters shows how to use it:
dafna@ubuntu:~/jelly$ ./merge_fwht_frames
usage: ./merge_fwht_frames
The utility takes the two files to merge, the name of the output file, and two arguments containing the highest value of each dimension (width and height). So the following set of commands will generate a fwht file called merged-dim.fwht that is composed of two merged videos, one of dimensions 700x1000 and one of dimensions 800x900:
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-700-1000.422P -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=800:900:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-800-900.422P -loglevel quiet
v4l2-ctl -d0 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P
v4l2-ctl -d0 --set-selection-output target=crop,width=800,height=900 -x width=800,height=900,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_800-900-422P.fwht --stream-from images/jelly-800-900.422P
./merge_fwht_frames jelly_700-1000-422P.fwht jelly_800-900-422P.fwht merged-dim.fwht 800 1000
And now decoding merged-dim.fwht with the stateless decoder:
v4l2-ctl -d2 -x width=800,height=1000 -v width=800,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-from merged-dim.fwht --stream-to out-800-1000.422P
Now we have a decoded video file out-800-1000.422P which is composed of frames with alternating dimensions. In the 422P format each pixel takes 2 bytes on average, so the first frame is 2*700*1000=1400000 bytes, the second is 2*800*900=1440000 bytes, the third is again 1400000 bytes and so on.
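The same arithmetic can be written as a couple of small shell helpers. This is only a sketch: the function names frame_size and pair_offset are hypothetical, and the 2 bytes per pixel is the 422P average used in this post (other pixel formats need a different factor).

```shell
# Size in bytes of one raw frame: width * height * bytes per pixel.
# frame_size and pair_offset are hypothetical helper names for this sketch.
frame_size() {
    echo $(($1 * $2 * $3))
}

# Byte offset of the n-th (0-based) pair of frames in the interleaved file,
# where each pair is one 700x1000 frame followed by one 800x900 frame.
pair_offset() {
    first=$(frame_size 700 1000 2)
    second=$(frame_size 800 900 2)
    echo $(($1 * (first + second)))
}

frame_size 700 1000 2   # size of the first frame in bytes
frame_size 800 900 2    # size of the second frame in bytes
```

Since the frames alternate, the file can be treated as a sequence of fixed-size pairs, which is what makes it easy to split back into two videos.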
The following script will separate the file out-800-1000.422P into two files:
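size=$(stat --printf="%s" out-800-1000.422P)
# 422P takes 2 bytes per pixel on average; 450 frames of each size are expected
frm1_sz=$((700 * 1000 * 2))
ex_size1=$(($frm1_sz * 450))
frm2_sz=$((800 * 900 * 2))
ex_size2=$(($frm2_sz * 450))
ex_size=$(($ex_size1 + $ex_size2))
if [ "$ex_size" -ne "$size" ]; then
    echo "expected size = $ex_size"
    echo "actual size = $size"
    exit 1
fi
# each iteration handles one pair: a 700x1000 frame followed by an 800x900 frame
double_frame=$(($frm1_sz + $frm2_sz))
i=0
# keep going while there is still an unread pair in the file
while [ $(($i * $double_frame)) -lt "$size" ]
do
    dd if=out-800-1000.422P obs=$double_frame ibs=$double_frame skip=$i count=1 >> tmp
    head -c $frm1_sz tmp >> out-mrg-700-1000.422P
    tail -c $frm2_sz tmp >> out-mrg-800-900.422P
    rm tmp
    i=$(($i + 1))
done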
And now play the two files:
ffplay -loglevel warning -v info -f rawvideo -pixel_format "yuv422p" -video_size "700x1000" out-mrg-700-1000.422P
ffplay -loglevel warning -v info -f rawvideo -pixel_format "yuv422p" -video_size "800x900" out-mrg-800-900.422P
You will see glitches on the edges of the videos, such as constant-color squares at the bottom of the out-mrg-700-1000.422P video and on the left of the other video.
This is because the fwht format uses the previous frame as a reference frame, and since the dimensions of the previous frame don't match the current ones, those squares appear where the frames don't overlap.
The vicodec driver has a video_gop_size control which sets the period of I-frames (I-frames are frames that do not need to reference other frames in order to be decoded).
If you run v4l2-ctl -d0 --list-ctrls you will see:
dafna@ubuntu:~/outreachy$ v4l2-ctl -d0 --list-ctrls
User Controls
    min_number_of_output_buffers 0x00980928 (int) : min=1 max=1 step=1 default=1 value=1 flags=read-only
Codec Controls
    video_gop_size 0x009909cb (int) : min=1 max=16 step=1 default=10 value=10
    fwht_i_frame_qp_value 0x00990a22 (int) : min=1 max=31 step=1 default=20 value=20
    fwht_p_frame_qp_value 0x00990a23 (int) : min=1 max=31 step=1 default=20 value=20
So the default value for video_gop_size is 10, which means that there is an I-frame every 10 frames. If we set this value to 1 then each frame will be an I-frame, so we will not have those artifacts. We do this by adding the --set-ctrl video_gop_size=1 option to the decoding and encoding commands.
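As an illustration, here is a sketch of a helper that prints the encoding command from earlier in this post with the GOP size option added. The function name encode_cmd is hypothetical, and placing --set-ctrl right after -d0 is my own choice for the sketch. It prints the command rather than running it, so it can be checked first.

```shell
# Print the vicodec encoding command for a 422P file with a given GOP size.
# encode_cmd is a hypothetical helper name; the device (-d0) and the file
# naming follow the encoder examples earlier in this post.
encode_cmd() {
    width=$1
    height=$2
    gop=$3
    printf 'v4l2-ctl -d0 --set-ctrl video_gop_size=%s --set-selection-output target=crop,width=%s,height=%s -x width=%s,height=%s,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_%s-%s-422P.fwht --stream-from images/jelly-%s-%s.422P\n' \
        "$gop" "$width" "$height" "$width" "$height" "$width" "$height" "$width" "$height"
}
```

For example, encode_cmd 700 1000 1 prints the command that encodes the 700x1000 video with every frame as an I-frame, and encode_cmd 800 900 1 does the same for the 800x900 video.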
The final script can be found here.