Paving the way for high bitrate video streaming with GStreamer's RTP elements

Antonio Ospite
August 20, 2020

RTP is the dominant protocol for low latency audio and video transport. It sits at the core of many systems used in a wide array of industries, from WebRTC, to SIP (IP telephony), and from RTSP (security cameras) to RIST and SMPTE ST 2022 (broadcast TV backend). 

Being a flexible, Open Source framework, GStreamer is used in a variety of applications. Its RTP stack has been battle tested in multiple use-cases across all of the aforementioned industries, giving it the distinct advantage of being able to apply optimisations from one use case to another. Without a doubt, GStreamer has one of the most mature and complete RTP stacks available.

Additional unit tests, as well as key fixes and performance improvements to the GStreamer RTP elements, have recently landed in GStreamer 1.18.

Among these, buffer list support in the receive path of rtpsession provides an important boost in throughput, opening the gate to high bitrate video streaming.

Let's go deeper on that.

Pushing a buffer in GStreamer

One of the essential tasks of GStreamer is to move (push) buffers from an upstream element to the next downstream element, making the pipeline progress.

But what does pushing a buffer mean from a low level point of view?

Elements are connected through pads. Each element has a pad for each possible connection. A pad can be either a "source pad", which the element uses to output buffers, or a "sink pad", which it uses to input buffers. To create a connection between two elements, the application programmer links the source pad of one element to the sink pad of another. When an element wishes to send a buffer with data to the next element, it "pushes" the buffer onto its source pad, which chains it to the linked sink pad, which in turn calls into the next element.

The basic tool that an element uses to push a buffer is the gst_pad_push function:

GstFlowReturn gst_pad_push (GstPad * pad, GstBuffer * buffer);

A buffer push is actually an intricate series of function calls and lock acquisitions. The sequence is as follows:

  1. The first element calls the gst_pad_push() function on its source pad.
  2. The source pad takes its own mutex, updates its internal state and releases it.
  3. The source pad takes a reference (increases an atomic counter) on the connected sink pad.
  4. The source pad calls the chain function of the connected sink pad.
  5. The sink pad takes its stream lock, which is a recursive mutex.
  6. The sink pad takes its own mutex and updates its internal state.
  7. The sink pad takes a reference on its parent (the second element), which means it increases an atomic counter.
  8. The sink pad releases its own mutex.
  9. The sink pad calls into the second element's chain function (the actual code of the element).
  10. The sink pad releases the reference (decreases an atomic counter) on its parent.
  11. The sink pad releases the stream lock, the recursive mutex.
  12. The source pad releases the reference (decreases an atomic counter) on the connected sink pad.
  13. The source pad takes its own mutex, updates its internal state and releases it.

As you can see from this incomplete list, each transfer of a buffer, even though it happens on a single thread, involves a number of mutex locks and atomic operations, which are relatively costly on modern pipelined processors. When profiling a GStreamer pipeline, this is the part that causes the most overhead when transmitting a large number of small buffers.
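To make the cost concrete, the steps listed above can be tallied in a small model. This is only a sketch: it counts the operations instead of performing them, it covers only the steps named in the list (which is itself incomplete), and the names PushCost, model_single_push and model_push_buffers are hypothetical, not part of GStreamer.

```c
#include <assert.h>
#include <stddef.h>

/* Tally of the synchronisation work done while pushing buffers. */
typedef struct {
  size_t mutex_locks; /* mutex acquisitions, plain or recursive */
  size_t atomic_ops;  /* reference count increments and decrements */
  size_t chain_calls; /* calls into the downstream element's code */
} PushCost;

/* Walk the listed steps once, counting instead of locking.
 * This models the sequence; it is not GStreamer code. */
void model_single_push (PushCost *cost)
{
  cost->mutex_locks++;  /* step 2: source pad mutex */
  cost->atomic_ops++;   /* step 3: reference the sink pad */
  cost->mutex_locks++;  /* step 5: sink pad stream lock */
  cost->mutex_locks++;  /* step 6: sink pad mutex */
  cost->atomic_ops++;   /* step 7: reference the parent element */
  cost->chain_calls++;  /* step 9: the element's chain function */
  cost->atomic_ops++;   /* step 10: release the parent reference */
  cost->atomic_ops++;   /* step 12: release the sink pad reference */
  cost->mutex_locks++;  /* step 13: source pad mutex again */
}

/* Cost of pushing n_buffers one at a time. */
PushCost model_push_buffers (size_t n_buffers)
{
  PushCost cost = { 0, 0, 0 };

  for (size_t i = 0; i < n_buffers; i++)
    model_single_push (&cost);

  /* Every individual push enters the downstream element exactly once. */
  assert (cost.chain_calls == n_buffers);
  return cost;
}
```

In this model, model_push_buffers (1000) reports 4000 mutex acquisitions and 4000 atomic operations for 1000 calls into the downstream element, so the synchronisation work grows linearly with the number of buffers.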

Is it possible to do better?

Pushing buffer lists

GStreamer has a mechanism called "buffer list" which can be used to reduce the overhead of pushing buffers one at a time.

The entry point for an element to use this functionality is the gst_pad_push_list function.

GstFlowReturn gst_pad_push_list (GstPad * pad, GstBufferList * list);

Buffer lists group a number of buffers together so that they are forwarded through the pipeline in one operation. This can significantly reduce the overhead, because the sequence of operations described above happens once per list rather than once per buffer.
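A rough back-of-the-envelope sketch of the saving follows. It assumes a raw 320x240 RGB frame (the size used in the pipelines later in this post), a hypothetical payload size of 1400 bytes per packet, and that RTP and payload headers can be ignored. How a real payloader splits a frame, and how many buffers it places in each list, may well differ. The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Packets needed for one raw video frame, ignoring all header overhead. */
size_t packets_per_frame (size_t width, size_t height,
    size_t bytes_per_pixel, size_t payload_size)
{
  size_t frame_bytes = width * height * bytes_per_pixel;

  assert (payload_size > 0);
  return (frame_bytes + payload_size - 1) / payload_size; /* round up */
}

/* Pad traversals needed to move n_buffers downstream when they are grouped
 * into lists of at most list_len buffers each.
 * list_len == 1 models pushing every buffer individually. */
size_t pad_traversals (size_t n_buffers, size_t list_len)
{
  assert (list_len > 0);
  return (n_buffers + list_len - 1) / list_len; /* round up */
}
```

Under these assumptions a frame is 230400 bytes and needs 165 packets. Pushed individually, that is 165 traversals of the locking sequence per frame; grouped into a single list, it is one.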

In case some elements do not support chaining buffer lists, GStreamer provides a fall-back mechanism like gst_pad_chain_list_default to push buffers one by one under the hood. This means that elements can always implement processing buffers in a list independently from the level of support in other elements.

This is nice for compatibility and allows incremental refinements. However, to actually avoid the bottlenecks of pushing individual buffers and get the biggest performance improvements, all elements in a pipeline should natively support chaining buffer lists (i.e. have their own chain list function installed on their sink pads).
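The difference between the fall-back and native support can be sketched with a toy model of a sink pad. This is not the GStreamer implementation: the Pad, Buffer and BufferList types, the Flow values and every function name below are hypothetical stand-ins, and stopping at the first failed buffer is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FLOW_OK = 0, FLOW_ERROR = -1 } Flow;
typedef struct { int id; } Buffer;
typedef struct { Buffer *items; size_t len; } BufferList;

typedef struct Pad Pad;
typedef Flow (*ChainFunc) (Pad *pad, Buffer *buf);
typedef Flow (*ChainListFunc) (Pad *pad, BufferList *list);

struct Pad {
  ChainFunc chain;          /* per-buffer entry point, always present */
  ChainListFunc chain_list; /* optional; NULL selects the fall-back */
  size_t chain_calls;       /* how often chain was entered */
  size_t chain_list_calls;  /* how often chain_list was entered */
};

/* Fall-back: hand the buffers to the per-buffer chain function one by one,
 * stopping at the first failure (an assumption of this model). */
Flow chain_list_fallback (Pad *pad, BufferList *list)
{
  for (size_t i = 0; i < list->len; i++) {
    Flow ret = pad->chain (pad, &list->items[i]);
    if (ret != FLOW_OK)
      return ret;
  }
  return FLOW_OK;
}

/* Deliver a list: use the element's own list handler when it has one. */
Flow pad_chain_list (Pad *pad, BufferList *list)
{
  assert (pad->chain != NULL);
  if (pad->chain_list != NULL)
    return pad->chain_list (pad, list);
  return chain_list_fallback (pad, list);
}

/* Example element callbacks that only count how often they are entered. */
Flow counting_chain (Pad *pad, Buffer *buf)
{
  (void) buf;
  pad->chain_calls++;
  return FLOW_OK;
}

Flow counting_chain_list (Pad *pad, BufferList *list)
{
  (void) list;
  pad->chain_list_calls++;
  return FLOW_OK;
}
```

A pad set up with only counting_chain is entered once per buffer when given a list, while a pad that also installs counting_chain_list is entered once per list. Either way the upstream element can push lists; only the second case keeps the per-list saving inside the element.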

Buffer lists in rtpsession

The RTP specification, described in RFC 3550, defines a set of rules for the association of participants during a conversation using RTP. This association is called an "RTP Session".

In GStreamer, the core element that implements the session management is rtpsession.

The rtpsession element already had support for buffer lists in its send path but not in its receive path.

Let's consider the following pipeline built around the rtpsession element:

gst-launch-1.0 -e \
    rtpsession name=rtpsess \
    videotestsrc ! imagefreeze num-buffers=10000 ! video/x-raw,format=RGB,width=320,height=240 ! rtpvrawpay ! rtpsess.recv_rtp_sink \
    rtpsess.recv_rtp_src ! fakesink async=false sync=false

A test stream is generated (imagefreeze is used to reduce CPU usage in this case), split in RTP packets, processed by rtpsession, and consumed by a fakesink element.

The upstream element (rtpvrawpay) and downstream element (fakesink) could already chain buffer lists, but rtpsession could not.

After enabling buffer lists in rtpsession, the element throughput improved dramatically.

A simplified visual interpretation can be obtained using flamegraphs: when pushing individual buffers the call graph is deeper, whereas when pushing buffer lists the call graph is more balanced.

Real-world scenario considerations

To be fair, this huge improvement is only achievable in controlled use cases. In a generic real-world scenario the boost is currently limited by other factors.

Usually the rtpsession element is not used directly but via rtpbin, which, depending on the scenario, also connects it to other elements (like rtpjitterbuffer, rtpstorage, rtpssrcdemux). The input may also come from a remote source, through an element like udpsrc.

Consider this more realistic pipeline:

gst-launch-1.0 -e \
    rtpbin name=rtpbin \
    udpsrc port=5000 caps="application/x-rtp,media=(string)video,clock-rate=(int)90000,encoding-name=RAW,payload=96,sampling=RGB,depth=(string)8,width=(string)320,height=(string)240" ! queue ! rtpbin.recv_rtp_sink_0 \
    rtpbin. ! fakesink async=false sync=false \
    udpsrc port=5001 caps=application/x-rtcp ! queue ! rtpbin.recv_rtcp_sink_0 \
    rtpbin.send_rtcp_src ! queue ! udpsink host= port=5003 sync=false async=false

This is the receiving pipeline for one sender. The two udpsrc elements receive RTP and RTCP respectively; rtpbin handles all the RTP details, delivers the media data to fakesink, and sends RTCP replies to the other participant via udpsink.

Unless all elements support pushing buffer lists natively there will still be bottlenecks due to individual buffer pushes.

A comparison of before and after enabling buffer lists in rtpsession, using a pipeline with udpsrc and rtpbin, shows that the improvement is there, but it is not as dramatic as in the controlled scenario.


The improvements in rtpsession available in GStreamer 1.18 are an important step towards a more efficient RTP implementation in high bitrate scenarios, but further work is needed (e.g. enabling buffer lists in udpsrc) to carry more of the theoretical gains over into practical usage.


Collabora Ltd © 2005-2020. All rights reserved. Privacy Notice. Sitemap.