
Unleashing gst-python-ml: Python-powered ML analytics for GStreamer pipelines

Aaron Boxer
May 12, 2025

Creating powerful video analytics pipelines is easy if you have the right tools. In this post, we will show you how to effortlessly build a broad range of machine learning (ML)-enabled video pipelines using just two building blocks: GStreamer and Python. We will focus on simplicity and functionality, deferring performance tuning to a future deep dive.

The core of our pipeline is GStreamer, everyone's favorite multimedia framework. Over the past few years, Collabora has contributed extensive ML capabilities to upstream GStreamer, adding support for ONNX and LiteRT inference and introducing a fine-grained, extensible metadata framework to persist model outputs.

We now take the next step by unleashing gst-python-ml: a pure Python framework that makes it easy to build powerful ML-enabled GStreamer pipelines using standard Python packages. With just a few lines of Python, or a single gst-launch-1.0 command, you can now run complex models across multiple streams, complete with tracking, captioning, speech and text processing, and much more.
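
To give a flavour, here is a minimal sketch of what "a few lines of Python" can look like: it drives the same kind of tracking pipeline used in the gst-launch-1.0 example later in this post via Gst.parse_launch(). The video path is a placeholder.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

Gst.init(None)

# Same elements as the command-line example further down: YOLO tracking plus overlay.
pipeline = Gst.parse_launch(
    'filesrc location=/path/to/video ! decodebin ! videoconvert ! '
    'pyml_yolo model-name=yolo11m track=True ! pyml_overlay ! '
    'videoconvert ! autovideosink'
)

loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect('message::eos', lambda *_: loop.quit())
bus.connect('message::error', lambda *_: loop.quit())

pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)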

Features

The framework consists of a set of base classes that can be easily extended to create new ML elements, along with a set of tested, fully functional elements that support the following features and models (a sketch of the underlying element pattern follows the list):

  1. Object Detection with Yolo, FasterRCNN, MaskRCNN, or any TorchVision object detection model
  2. Segmentation with MaskRCNN or Yolo
  3. Tracking with Yolo 
  4. Video Captioning with Phi3.5 Vision
  5. Translation with Marian 
  6. Transcription with Whisper
  7. Speech to Text with Whisper
  8. Text to Speech with WhisperSpeech
  9. Text to Image with Stable Diffusion
  10. Bird's eye view of sports matches
  11. Batch inference
  12. Multiple incoming streams
  13. Large Language Models (LLMs) with any HuggingFace Hub LLM
  14. Serializing ML metadata to Kafka server for real-time post-processing
  15. Overlay to display ML metadata such as bounding boxes and tracks
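
New elements are created by extending the framework's base classes; the exact API is best explored in the repository. As a rough, minimal sketch of the mechanism those base classes build on, here is the standard pattern for a GStreamer element written in Python and registered through gstreamer1.0-python3-plugin-loader. The names used here (PymlPassthrough, pyml_passthrough) are hypothetical.

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstBase', '1.0')
from gi.repository import Gst, GObject, GstBase

# GStreamer is already initialized when the Python plugin loader imports
# this file from a plugin directory.

class PymlPassthrough(GstBase.BaseTransform):
    # Hypothetical minimal element: passes raw video through unchanged.
    # A real ML element would run inference in do_transform_ip() and
    # attach the results to the buffer as metadata.
    __gstmetadata__ = ('pyml_passthrough', 'Filter/Video',
                       'Minimal example element', 'gst-python-ml example')
    __gsttemplates__ = (
        Gst.PadTemplate.new('sink', Gst.PadDirection.SINK,
                            Gst.PadPresence.ALWAYS,
                            Gst.Caps.from_string('video/x-raw')),
        Gst.PadTemplate.new('src', Gst.PadDirection.SRC,
                            Gst.PadPresence.ALWAYS,
                            Gst.Caps.from_string('video/x-raw')),
    )

    def do_transform_ip(self, buf):
        # Run a model on buf here and attach its output as metadata.
        return Gst.FlowReturn.OK

# Registration hooks picked up by the Python plugin loader.
GObject.type_register(PymlPassthrough)
__gstelementfactory__ = ('pyml_passthrough', Gst.Rank.NONE, PymlPassthrough)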

Running a Pipeline

For a taste of the ease and simplicity of gst-python-ml, we present a few sports analytics sample pipelines.

1. Here are all the steps needed to run a Yolo tracking pipeline on Ubuntu:

apt install -y gstreamer1.0-plugins-base gstreamer1.0-plugins-base-apps \
    gstreamer1.0-plugins-good gstreamer1.0-plugins-bad \
    gir1.2-gst-plugins-bad-1.0 python3-gst-1.0 gstreamer1.0-python3-plugin-loader

pip install pygobject pycairo torch torchvision transformers numpy ultralytics gst-python-ml

gst-launch-1.0 filesrc location=/path/to/video ! decodebin ! videoconvert ! \
    pyml_yolo model-name=yolo11m track=True ! pyml_overlay ! videoconvert ! autovideosink

2. Here is a soccer match processed with this pipeline:

3. Multiple video sources are also supported.

[Video: Collabora soccer tracking]
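
One simple way to handle two sources in a single pipeline is to give each its own pyml_yolo/pyml_overlay branch and composite the results side by side. The sketch below is only an illustration (file names and sizes are placeholders, and gst-python-ml's own batching support may offer a better route); the description string can be dropped into the earlier Gst.parse_launch() snippet or run with gst-launch-1.0.

# Two branches, composited side by side. Placeholder file names and sizes.
description = (
    'compositor name=mix sink_1::xpos=640 ! videoconvert ! autovideosink '
    'filesrc location=match_cam1.mp4 ! decodebin ! videoconvert ! videoscale ! '
    'video/x-raw,width=640,height=360 ! '
    'pyml_yolo model-name=yolo11m track=True ! pyml_overlay ! videoconvert ! mix. '
    'filesrc location=match_cam2.mp4 ! decodebin ! videoconvert ! videoscale ! '
    'video/x-raw,width=640,height=360 ! '
    'pyml_yolo model-name=yolo11m track=True ! pyml_overlay ! videoconvert ! mix. '
)
pipeline = Gst.parse_launch(description)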

4. Another supported sports analytics feature is the creation of a bird's eye view of a game, to show a quick overview of the field:

[Video: Collabora bird's eye view]

5. gst-python-ml shows its true power when using hybrid vision + language models to enable features that are simply not available in any other GStreamer-based analytics framework, whether open source or commercial. For example, video captioning is supported using the Phi3.5 Vision model. Each video frame can be automatically captioned, and these captions can be further processed to automatically summarize a game or to detect significant events such as goals.

[Video: Collabora video captioning]
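
If the captions are serialized alongside the rest of the ML metadata to a Kafka server (feature 14 above), this kind of post-processing can live entirely outside the pipeline. The sketch below is only an illustration: the topic name and JSON layout are assumptions, to be matched to however your pipeline serializes its metadata.

import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    'pyml-metadata',                       # hypothetical topic name
    bootstrap_servers='localhost:9092',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
)

for message in consumer:
    # Assumed layout: each record carries a per-frame caption string.
    caption = message.value.get('caption', '')
    if 'goal' in caption.lower():
        print('Possible goal detected:', caption)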

These are just a few of the features we have built with gst-python-ml - the possibilities are endless.

Development

gst-python-ml is distributed as a PyPI package. All elements are first-class GStreamer elements that can be added to any GStreamer pipeline, and they work with any Linux distribution's GStreamer packages from version 1.24 onward.
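
A quick way to confirm the elements are visible to your distribution's GStreamer is to look them up from Python; here is a small sanity-check sketch using the pyml_yolo element from the earlier example.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

# If this prints None, check that gstreamer1.0-python3-plugin-loader is
# installed and that the gst-python-ml elements are on the plugin path.
print(Gst.ElementFactory.find('pyml_yolo'))
print('GStreamer version:', Gst.version_string())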

Development takes place on our GitHub repository — we welcome contributions, feedback and new ideas.

As we continue building gst-python-ml, we are actively looking for collaborators and partners. Our goal is to make ML workflows in GStreamer powerful and accessible — whether for real-time media analysis, content generation, or intelligent pipelines in production environments.

If you would like to know more about Collabora's work on GStreamer ML, please contact us.
