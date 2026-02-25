Support for the VDPU381 and VDPU383 Rockchip video decoders has been merged into the Upstream Linux kernel. These decoders are found on two modern SoCs, the RK3588 and the RK3576 respectively, and bring improved hardware decoding capabilities for H.264 and HEVC to mainline Linux.

This post highlights what we added, how we fixed a subtle IOMMU reset issue, a deliberate design choice in how we program hardware registers, and the introduction of new V4L2 UAPI controls required specifically for this hardware.

The IOMMU restore issue

One of the more subtle issues encountered during development was related to IOMMU state restoration. It stems from how the VDPU381/383 integrates its IOMMU core.

On these IP cores, the IOMMU core is embedded inside the decoder itself. As a result, when the decoder is reset, typically to recover from a decoding error, the internal IOMMU is also reset. This reset clears all address mappings that had previously been programmed by the driver.

From the kernel’s point of view, those mappings were still valid and cached. In reality, the hardware had lost them entirely, leading to failed memory accesses or stalled decoding after error recovery.

The fix was to explicitly restore the cached IOMMU mappings after a decoder reset. The driver briefly attaches another, empty IOMMU domain and then switches back to the default domain, which forces the hardware to be reprogrammed without requiring any change to the IOMMU driver:

[PATCH] media: rkvdec: Restore iommu addresses on errors

This change ensures reliable recovery from decoding errors and avoids subtle failures that only show up after a reset. It has also been applied to other affected Rockchip IP cores, such as the RGA, Rockchip's Raster Graphics Accelerator.
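The reason for the detour through an empty domain can be illustrated with a small userspace model. Everything in it is hypothetical: `mock_iommu`, its single `dte_addr` "register", and the `attach_domain()` helper only mimic the behavior described above and are not the kernel IOMMU API. The model assumes that attaching the domain that is already active is a no-op. Re-attaching the default domain on its own therefore never touches the hardware, while switching to a different domain first does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model, not the kernel IOMMU API. */
struct mock_domain {
	uint32_t pgtable_base;		/* 0 for the empty domain */
};

struct mock_iommu {
	uint32_t dte_addr;			/* "hardware" register, cleared on reset */
	const struct mock_domain *attached;	/* software view of the active domain */
};

/* Assumption of the model: attaching the already active domain is a no-op,
 * so the hardware register is only written when the domain changes. */
static void attach_domain(struct mock_iommu *mmu, const struct mock_domain *dom)
{
	assert(dom != NULL);
	if (mmu->attached == dom)
		return;
	mmu->attached = dom;
	mmu->dte_addr = dom->pgtable_base;
}

/* A decoder reset also resets the embedded IOMMU; the software view is untouched. */
static void decoder_reset(struct mock_iommu *mmu)
{
	mmu->dte_addr = 0;
}

/* Bounce through an empty domain so that switching back to the default
 * domain really reprograms the hardware. */
static void restore_mappings(struct mock_iommu *mmu,
			     const struct mock_domain *empty,
			     const struct mock_domain *def)
{
	attach_domain(mmu, empty);
	attach_domain(mmu, def);
}
```

In this model, re-attaching the default domain directly leaves the register cleared, and only the detour through the empty domain restores it. The actual patch relies on the kernel IOMMU API. The model only shows why the detour is needed.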

New V4L2 UAPI controls for HEVC long-term and short-term reference sets

Supporting VDPU381/383 also required extending the V4L2 stateless HEVC UAPI with two new controls. These decoders rely on explicit Reference Picture Set (RPS) programming, split into:

Short-Term RPS

Long-Term RPS

Most HEVC decoders can decode frames using the Sequence Parameter Set (SPS) data without being given the long-term (LT) and short-term (ST) reference sets explicitly, but the Rockchip decoders cannot. Unlike the VeriSilicon decoders, the Rockchip decoders also do not implement a method to skip those sets. As a result, new UAPI controls were introduced to allow userspace to pass fully described short-term and long-term RPS tables to the kernel.
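As an example of what "fully described" involves, an explicitly coded short-term RPS stores its POC offsets as successive differences (`delta_poc_s0_minus1`, `delta_poc_s1_minus1`), and the decoder has to derive the actual offsets relative to the current picture. The sketch below performs that derivation for the non-predicted case (`inter_ref_pic_set_prediction_flag` equal to 0), following the H.265 semantics. The `st_rps` struct is hypothetical. Its field names follow the H.265 syntax elements, and it does not reproduce the layout of the new V4L2 controls.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DPB 16	/* H.265 maximum DPB size */

/* Hypothetical container for one explicitly coded short-term RPS
 * (inter_ref_pic_set_prediction_flag == 0); not the V4L2 UAPI layout. */
struct st_rps {
	uint8_t num_negative_pics;
	uint8_t num_positive_pics;
	uint16_t delta_poc_s0_minus1[MAX_DPB];
	uint16_t delta_poc_s1_minus1[MAX_DPB];
};

/* Derive DeltaPocS0[] (pictures before the current one, negative offsets)
 * and DeltaPocS1[] (pictures after it, positive offsets). */
static void derive_delta_pocs(const struct st_rps *rps,
			      int32_t delta_s0[MAX_DPB], int32_t delta_s1[MAX_DPB])
{
	int32_t poc = 0;

	assert(rps->num_negative_pics + rps->num_positive_pics <= MAX_DPB);

	/* Each entry lies delta_poc_s0_minus1[i] + 1 further in the past. */
	for (int i = 0; i < rps->num_negative_pics; i++) {
		poc -= rps->delta_poc_s0_minus1[i] + 1;
		delta_s0[i] = poc;
	}

	/* Each entry lies delta_poc_s1_minus1[i] + 1 further in the future. */
	poc = 0;
	for (int i = 0; i < rps->num_positive_pics; i++) {
		poc += rps->delta_poc_s1_minus1[i] + 1;
		delta_s1[i] = poc;
	}
}
```

With `delta_poc_s0_minus1 = {0, 1}` and `delta_poc_s1_minus1 = {3}`, the derived offsets are -1 and -3 for the two preceding references and +4 for the following one.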

We also added support for the new controls to the visl driver, which emits ftrace events containing all the control parameters. This is useful when working on a userspace implementation.

We added support for the new controls in GStreamer 1.28, and preliminary work has been done for FFmpeg.

Compatibility with Vulkan Video Decode

A key design goal for these controls was compatibility with the Vulkan Video Decode API.

The data structures closely mirror Vulkan’s HEVC reference picture descriptions, which means:

Userspace implementations can share logic between V4L2 and Vulkan backends

Translation layers remain thin and mechanical

No loss of semantic information when moving between APIs

This alignment helps ensure that Linux media APIs evolve in a consistent direction and reduces friction for projects supporting multiple decode stacks.

Using structs for register programming

The VDPU381/383 driver uses C structs to represent the full register layout, instead of relying on ad-hoc writel() calls or regmap. This decision was driven by specific hardware requirements rather than style preferences.

Writing all registers, including the default values

For these decoders, it is safer to write all registers, even those that match their documented default values. Skipping a write because the register should already hold its default value can leave the hardware in an inconsistent internal state and cause decoding to fail.

Using a struct makes it straightforward to define a complete register image, and it guarantees that every register is programmed explicitly with a memcpy()-style copy.
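Here is a minimal userspace sketch of this idea, built around a made-up four-register block. The struct is the register image, and a designated initializer holds placeholder default values. `offsetof()` checks pin the layout to the hypothetical register map at compile time, and the write loop copies every 32-bit word, so a register holding its default value is still written. The register names, offsets and values are illustrative, not the VDPU381/383 register map. The `mmio` pointer stands in for the device's register window. In the kernel, a helper such as `__iowrite32_copy()` or a loop of `writel()` calls would play that role.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register block; names and offsets are illustrative only. */
struct dec_regs {
	uint32_t ctrl;		/* 0x00 */
	uint32_t timeout;	/* 0x04 */
	uint32_t clk_gate;	/* 0x08 */
	uint32_t cache_cfg;	/* 0x0c */
};

/* Compile-time checks that the struct matches the (hypothetical) register map. */
static_assert(offsetof(struct dec_regs, timeout) == 0x04, "timeout offset");
static_assert(offsetof(struct dec_regs, cache_cfg) == 0x0c, "cache_cfg offset");
static_assert(sizeof(struct dec_regs) == 0x10, "block size");

/* Placeholder default values: they are part of the image, so they are always written. */
static const struct dec_regs dec_regs_defaults = {
	.ctrl = 0x00000000,
	.timeout = 0x000fffff,
	.clk_gate = 0x00000001,
	.cache_cfg = 0x00000003,
};

/* Copy the whole image, one 32-bit word at a time, into the register window. */
static void write_reg_image(volatile uint32_t *mmio, const struct dec_regs *regs)
{
	uint32_t words[sizeof(*regs) / sizeof(uint32_t)];

	memcpy(words, regs, sizeof(words));
	for (size_t i = 0; i < sizeof(words) / sizeof(words[0]); i++)
		mmio[i] = words[i];
}
```

A job starts from a copy of the default image and only changes the fields it needs. Every register is still written on each run.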

Register write order matters

Even when all registers are written, the order of the writes is significant: writing the correct values in the wrong sequence can still break the decoder. This is mainly because Rockchip tests the hardware with its own multimedia library, MPP, which always writes all registers in order, so the hardware is less robust against arbitrary register access patterns.

Struct-based programming enforces a deterministic and reviewable ordering of register writes. In contrast, scattered writel() calls provide no structural guarantees and make it easy to accidentally reorder writes during refactoring.
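The ordering guarantee can be made visible by recording every write in a trace. The sketch below does this so that a test can check that the register blocks are written in strictly ascending offset order on every run. The `mmio_trace` type, the `mmio_write()` helper, the `job_regs` layout and the block offsets (0x000, 0x100, 0x200) are all hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRACE 64

/* Hypothetical write trace standing in for the device's register window. */
struct mmio_trace {
	size_t count;
	uint32_t offset[MAX_TRACE];
	uint32_t value[MAX_TRACE];
};

static void mmio_write(struct mmio_trace *t, uint32_t offset, uint32_t value)
{
	assert(t->count < MAX_TRACE);
	t->offset[t->count] = offset;
	t->value[t->count] = value;
	t->count++;
}

/* Copy a register block word by word, always in ascending offset order. */
static void write_block(struct mmio_trace *t, uint32_t base,
			const void *block, size_t size)
{
	const unsigned char *src = block;

	assert(size % sizeof(uint32_t) == 0);
	for (size_t off = 0; off < size; off += sizeof(uint32_t)) {
		uint32_t v;

		memcpy(&v, src + off, sizeof(v));
		mmio_write(t, base + (uint32_t)off, v);
	}
}

/* Hypothetical register image split into blocks at fixed offsets. */
struct job_regs {
	uint32_t common[4];	/* written at 0x000 */
	uint32_t codec[4];	/* written at 0x100 */
	uint32_t addr[4];	/* written at 0x200 */
};

/* The block order is defined in this one place instead of across the driver. */
static void write_job_regs(struct mmio_trace *t, const struct job_regs *r)
{
	write_block(t, 0x000, r->common, sizeof(r->common));
	write_block(t, 0x100, r->codec, sizeof(r->codec));
	write_block(t, 0x200, r->addr, sizeof(r->addr));
}
```

With this structure, changing the write order means editing a single function, which is easy to spot in review. Scattered writel() calls offer no such single place to check.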

One of the commits in the series explicitly documents these constraints and explains why ordering and default writes are required.

More details can be found in the struct-switching patch.

Why not regmap?

While regmap is often a good fit for register-heavy drivers, we wanted to keep flexibility for future multi-core support, where the registers for the next frame could be prepared while the hardware is still working on the previous one. To do that, the register values must not be tied to a specific core's address, so that they can be applied to whichever core becomes available first. This is not possible with regmap.

Multi-core support on RK3588 (WIP)

The RK3588 features two VDPU381 cores, enabling parallel decoding in hardware. However, Upstream multi-core support is not yet enabled.

Supporting multi-core decoding correctly requires:

Scheduling decode jobs across multiple independent decoder cores

Managing per-core resets and IOMMUs, including the restore behavior

The struct-based register approach was chosen in part to prepare for this future work. With it, the driver can:

Prepare a complete register image per job

Apply that image to whichever decoder core becomes available first
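The prepare/apply split can be sketched in C. The job's register image is built with no reference to any core. Only at submission time is it copied into the register window of the first idle core. The `reg_image` and `dec_core` types, the `busy` flag and the `mmio` array are hypothetical stand-ins. A real driver would also need locking, interrupt handling and the per-core reset and IOMMU restore, all of which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 8

/* Register image for one job: plain data, not tied to any core. */
struct reg_image {
	uint32_t regs[NUM_REGS];
};

/* Hypothetical decoder core: its own register window and a busy flag. */
struct dec_core {
	uint32_t mmio[NUM_REGS];	/* stands in for the core's MMIO region */
	bool busy;
};

/* Copy a prepared image to the first idle core; returns its index or -1. */
static int submit_to_first_idle(struct dec_core *cores, size_t n,
				const struct reg_image *img)
{
	assert(cores != NULL && img != NULL);

	for (size_t i = 0; i < n; i++) {
		if (cores[i].busy)
			continue;
		memcpy(cores[i].mmio, img->regs, sizeof(img->regs));
		cores[i].busy = true;	/* would be cleared on job completion */
		return (int)i;
	}
	return -1;			/* all cores busy: the job stays queued */
}
```

Because the image is plain data, it can be filled in while both cores are still busy with earlier frames. Only the final copy depends on which core is free.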

Multi-core support for RK3588 is actively being worked on and is one of the next steps for the driver.

Parallel video decoding

The main difficulty with multi-core decoding is that frames of an H.264 or HEVC stream usually reference previous frames, which must already be fully decoded. Scheduling jobs from a single stream across multiple decoder cores is therefore hard and may not yield a significant performance boost.

Our current implementation, which has not been upstreamed yet, doesn't do that.

Instead, it decodes multiple streams in parallel, so that frames running concurrently on different cores do not depend on each other. One of the main complications was managing the IOMMU cores, but more on that later.
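The scheduling rule can be sketched as follows. A queued job may only start if no other frame of the same stream is currently decoding. Each stream is therefore decoded in order, while different streams run concurrently on different cores. The `job` and `core` types and the `schedule()` function are invented for illustration and are not taken from the work-in-progress implementation. The IOMMU handling mentioned above is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_STREAM (-1)

/* Hypothetical queued job: the stream it belongs to and whether it started. */
struct job {
	int stream_id;
	bool started;
};

/* Hypothetical core state: the stream it is currently decoding, if any. */
struct core {
	int stream_id;	/* NO_STREAM when idle */
};

static bool stream_running(const struct core *cores, size_t ncores, int stream_id)
{
	for (size_t i = 0; i < ncores; i++)
		if (cores[i].stream_id == stream_id)
			return true;
	return false;
}

static int find_idle_core(const struct core *cores, size_t ncores)
{
	for (size_t i = 0; i < ncores; i++)
		if (cores[i].stream_id == NO_STREAM)
			return (int)i;
	return -1;
}

/*
 * Start queued jobs in queue order. A job is skipped while another frame of
 * its stream is decoding, since it may reference that frame. Returns the
 * number of jobs started.
 */
static size_t schedule(struct core *cores, size_t ncores,
		       struct job *queue, size_t njobs)
{
	size_t started = 0;

	for (size_t j = 0; j < njobs; j++) {
		struct job *job = &queue[j];
		int c;

		if (job->started || stream_running(cores, ncores, job->stream_id))
			continue;

		c = find_idle_core(cores, ncores);
		if (c < 0)
			break;		/* every core is busy */

		cores[c].stream_id = job->stream_id;
		job->started = true;
		started++;
	}
	return started;
}
```

In this sketch, queue order decides which stream gets a freed core. A real scheduler might also want fairness between streams.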

Summary

The upstreaming of VDPU381 and VDPU383 support required more than just enabling new hardware:

A 17-patch series adding decoder support, in addition to dt-bindings and device tree nodes

New V4L2 HEVC UAPI controls for explicit short-term and long-term RPS handling

Fixing a non-obvious IOMMU restore issue caused by decoder-embedded IOMMU resets

Adopting a struct-based register programming model to enforce completeness, ordering, and future multi-core readiness

The result is a more robust and maintainable driver that aligns with Upstream expectations while accommodating the realities of modern Rockchip media hardware.

What's next?

The decoders also support other codecs, such as:

AV1 on RK3576, which has very preliminary support

VP9 on RK3588, for which D.V.A.B. Sarma added support

Christian Hewitt added support for the VDPU346 on RK356X SoCs, which is a variant of VDPU381.

As mentioned earlier, multi-core support is on its way for RK3588.