
NVMe: Officially faster for emulated controllers!

Helen Koike
June 13, 2017

The Doorbell Buffer Config command

When I last wrote about NVMe, the feature to improve NVMe performance in emulated environments was just a living discussion and a work-in-progress patch. However, it has now been officially released in the NVMe Specification Revision 1.3 under the name "Doorbell Buffer Config command", along with an implementation that is already in the mainline Linux kernel! \o/

You can already feel the difference in performance if you compile kernel 4.12-rc1 (or later) and run it in a virtual machine hosted on Google Compute Engine. Google actually updated their hypervisor as soon as the feature was ratified by the NVMe working group, even before it was publicly released.

There were very few changes from the original proposal, i.e. opcodes, return values, and fancier names: the buffers (as described in my last post) are now called the Shadow Doorbell and EventIdx buffers.

In short, the first one mimics the Doorbell registers in memory, allowing the emulated controller to fetch the Doorbell value when convenient instead of waiting for the Doorbell register to be written. For its part, the EventIdx provides a hint given by the emulated controller to tell the host if the Doorbell register needs to be updated (in case the emulated controller is not fetching the Doorbell value from the Shadow Doorbell buffer). You can check section 7.13 of the specification for an example of usage.
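To make the interplay between the two buffers more concrete, here is a minimal Python sketch of the host-side decision. It is not the driver code: the names `need_event`, `QueueDoorbell` and `mmio_writes` are made up for illustration, and the wrap-safe 16-bit comparison is an assumption modeled on the virtio-style event index check, which is how I understand the Linux driver to decide whether the real Doorbell register still needs to be written.

```python
MASK16 = 0xFFFF  # indices are compared with 16-bit wrap-around arithmetic


def need_event(event_idx, new_idx, old_idx):
    """Return True if moving the doorbell from old_idx to new_idx
    steps past event_idx, i.e. the controller asked to be notified."""
    return ((new_idx - event_idx - 1) & MASK16) < ((new_idx - old_idx) & MASK16)


class QueueDoorbell:
    """Hypothetical model of one queue's doorbell state."""

    def __init__(self):
        self.shadow = 0       # Shadow Doorbell entry, written by the host
        self.event_idx = 0    # EventIdx entry, written by the emulated controller
        self.mmio_writes = 0  # how many costly Doorbell register writes happened

    def ring(self, new_idx):
        old = self.shadow
        self.shadow = new_idx  # always update the in-memory shadow value
        if need_event(self.event_idx, new_idx, old):
            self.mmio_writes += 1  # only now touch the real Doorbell register


def demo():
    q = QueueDoorbell()
    q.ring(1)        # controller is idle: the register write is needed
    q.ring(2)        # controller is polling the shadow buffer: no write
    q.ring(3)        # still polling: no write
    q.event_idx = 3  # controller stops polling and asks for a notification
    q.ring(4)        # passes event_idx: the register write is needed again
    return q.mmio_writes
```

In this toy run, four doorbell updates cost only two register writes. Skipping those writes is where the gain comes from, since each one is expensive for an emulated controller to handle.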

Results

The following test results were obtained on an n1-standard-4 machine (4 vCPUs, 15 GB memory) on Google Compute Engine, running kernel 4.12.0-rc5, using the following command:

$ sudo fio --time_based --name=benchmark --runtime=30 \
    --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 \
    --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=1 \
    --rw=randread --blocksize=4k --randrepeat=0

Results (in Input/Output Operations per Second):
Without Shadow Doorbell and EventIdx buffers: 43.9K IOPS
With Shadow Doorbell and EventIdx buffers: 184K IOPS
Gain ~= 4.2 times

Screenshot - Without Shadow Doorbell and EventIdx buffers


Screenshot - With Shadow Doorbell and EventIdx buffers


Enjoy your enhanced numbers of IOPS! :D
