Exploring Rust for Vulkan drivers, part 1

Faith Ekstrand
February 02, 2023

Today, all of the Vulkan drivers in Mesa are written in C. Some use C++ for their back-end shader compiler or other components, but all of the Vulkan API entrypoints are implemented in straight C. While C has historically been the language of choice for Linux kernel and driver code, that doesn't mean it's the optimal language for writing graphics drivers. It has long had a reputation for software bugs and security vulnerabilities. Error handling and concurrency have been a constant source of hard-to-diagnose bugs. Recently, we even had one bug which was caused by accidentally truncating a 64-bit unsigned integer to 32 bits deep inside the common Vulkan synchronization code.

Over the course of the last decade, Rust has emerged as a new programming language for writing safe low-level code. The language itself is built from the ground up with code safety and security in mind. Not only does it have multi-threading primitives built in, but they're designed in such a way as to make it difficult to write code which deadlocks or forgets to take a lock. It doesn't have implicit type casting, so accidental sign promotion or integer truncation doesn't happen. I've been contemplating the idea of using it in Mesa for a few years now. Specifically, I'd like to know if it's practical to write a Vulkan driver mostly in Rust and if doing so would bring enough benefit to be worth the effort. This blog post is intended to be the first in a series exploring the area of using Rust to write Mesa Vulkan drivers.
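
To make the point about integer truncation concrete, here is a minimal sketch. The function names are hypothetical and exist only for illustration. Assigning a u64 to a u32 without a conversion does not compile, so the narrowing has to be written out, either as a checked conversion or as a visible cast.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: narrow a 64-bit value to 32 bits.
/// `let x: u32 = value;` would be a compile error, so the conversion has
/// to be spelled out, and `try_from` reports values that do not fit.
pub fn narrow_checked(value: u64) -> Option<u32> {
    u32::try_from(value).ok()
}

/// An explicit `as` cast still truncates, but it is visible in the source
/// and easy to search for during review.
pub fn narrow_truncating(value: u64) -> u32 {
    value as u32
}
```

With these definitions, narrow_checked(1 << 32) yields None rather than silently returning zero, which is the kind of mistake the C bug described above came from.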

I don't expect to see any actual Mesa Vulkan drivers in Rust for a few years yet. My current goal is merely to explore the possibility. When the time comes that someone does choose to write a Vulkan driver in Rust, I want us to be ready. This exploration may also be useful for informing the Rust community about language features which would make the task easier. Converting existing Vulkan drivers to Rust is an explicit non-goal at this time.

Why Rust?

The first response I've heard from many developers when I suggest using Rust is that C++ is already there, already in use in Mesa, and has many of the same features as Rust. There's no particularly significant reason why C++ couldn't be used for implementing Vulkan API entrypoints, but so far none of the driver teams in the Mesa community have chosen to use it for the API portion of their driver. The VkOn12 driver used C++ initially because it made working with D3D12 easier but switched to C before it was merged into Mesa.

While modern C++ has many features which can help with these issues if applied correctly, they're all opt-in and it's still easy to write C-like code with all the same bugs. Using these features incorrectly or mixing C and C++ patterns for things like error handling can make bugs even subtler and harder to find and fix. Unlike C++, Rust's safety features are built into the language from day one and they intentionally make the unsafe C patterns hard while making the safe patterns easy. Rust also takes a very different approach, eschewing the object-oriented programming model in favor of its traits system. When used effectively, Rust traits provide a powerful programming model while avoiding unnecessary heap allocation and virtual function tables in most cases. In Android, a recent Google blog post showed that using Rust has reduced the number of memory safety vulnerabilities vs. the preexisting C and C++ code.
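
As a rough illustration of the traits point, the sketch below uses a made-up Format trait (not a Mesa interface). The generic function is resolved at compile time for each concrete type, so no vtable or heap allocation is involved. Dynamic dispatch is still available, but only when requested with dyn.

```rust
/// Hypothetical trait, for illustration only; not a Mesa interface.
pub trait Format {
    fn bytes_per_pixel(&self) -> u32;
}

pub struct Rgba8;
pub struct R16;

impl Format for Rgba8 {
    fn bytes_per_pixel(&self) -> u32 {
        4
    }
}

impl Format for R16 {
    fn bytes_per_pixel(&self) -> u32 {
        2
    }
}

/// Generic over `F`: the compiler emits one copy per concrete type and
/// resolves `bytes_per_pixel` statically (no vtable, no heap allocation).
pub fn row_pitch<F: Format>(format: &F, width: u32) -> u32 {
    format.bytes_per_pixel() * width
}

/// Dynamic dispatch through a vtable, used only because `dyn` asks for it.
pub fn row_pitch_dyn(format: &dyn Format, width: u32) -> u32 {
    format.bytes_per_pixel() * width
}
```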

While Rust's track record certainly is impressive, that doesn't automatically make it the right choice for graphics drivers or for a project like Mesa. Adding another language like Rust increases the mental overhead for developers as they have to switch between languages as they move through the code-base. Unlike C++ which is mostly an extension of C and tries to use similar syntax when possible, Rust is a very different language from either C or C++. The core principles are all there but the syntax and many of the paradigms are different.

Rust is also much harder than C++ to integrate into a large pre-existing C or C++ code-base like Mesa because Rust code must be compiled into a separate static library from the C/C++ code. Every code boundary between C/C++ code and Rust has to go through a sanitized C interface, like you would between shared libraries. Often this involves an extra build step of generating the Rust half of the interface with bindgen. Each code boundary also introduces mental overhead as Rust concepts have no direct mapping to C/C++, and so every call to a C function from Rust must happen within an unsafe block. If you're not careful, it's easy for these interface mismatches to leak everywhere and for your Rust code to be littered with unsafe blocks. Containing them is fairly easy for a self-contained portion of the project, such as a back-end shader compiler, as the interfaces can be made quite small. For an actual driver, however, which has to tightly integrate with the pre-existing C code, it's substantially more difficult.
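
One common way to keep unsafe from spreading is to wrap each C entry point once in a safe function whose signature encodes the C contract. The sketch below is only an illustration: to stay self-contained, a Rust function with the C ABI (c_sum_u32, a hypothetical name) stands in for the C side that bindgen would normally describe.

```rust
/// Stand-in for a C function (hypothetical): sums `count` values read
/// from `values`. The caller must pass a pointer valid for `count` reads.
pub unsafe extern "C" fn c_sum_u32(values: *const u32, count: usize) -> u32 {
    let mut total = 0u32;
    for i in 0..count {
        // Raw pointer reads are only allowed in unsafe code.
        total = total.wrapping_add(unsafe { *values.add(i) });
    }
    total
}

/// Safe wrapper: a slice carries pointer and length together, so the
/// contract of `c_sum_u32` holds for every possible caller and the
/// `unsafe` block stays in this one place.
pub fn sum_u32(values: &[u32]) -> u32 {
    unsafe { c_sum_u32(values.as_ptr(), values.len()) }
}
```

Callers use sum_u32(&[1, 2, 3]) without writing unsafe themselves. The difficulty described above is that a driver has many more such boundaries than a shader compiler does.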

Last year, rusticl, a new OpenCL state tracker written in Rust, was merged into Mesa. In many ways, rusticl was the ideal first Rust project for Mesa because it's fairly small and isolated within the code-base. Writing a Vulkan driver in Rust will be more difficult because it needs to integrate tightly with the shared Vulkan runtime component in Mesa which is all written in C. With the right abstractions, however, I think writing a Rust Vulkan driver should be possible and bring with it all the safety and convenience benefits of Rust.

Goals

Before getting into specifics, we should lay out a few goals or guiding principles for our Rust wrappers.

1. Rust idiomatic

While aesthetics is always subjective, we want our wrappers to be as clean and as Rust-like as possible. Rust is a beautiful language that makes coding fun and we don't want to lose that due to bad abstractions. It's tempting to diverge from common patterns when developing new abstractions because of certain details of your use-case. While it may seem like a good idea at the time, these little divergences often make it harder to use the abstractions than necessary.

Keeping in line with Rust's idioms also enhances our ability to take advantage of Rust's safety features. The entire goal of using Rust is to make the language's safety features work for you and let the compiler prove your error and concurrency handling correct. The last thing we want is for a Rust-based driver to have unsafe blocks scattered throughout. That would defeat most of the point of using Rust in the first place.

2. Track mutability and lifetimes

This sounds obvious when talking about Rust but it's a significant divergence from our C interfaces. The const keyword in C and C++ is fraught with issues and many C developers consider it to be mostly useless. As such, we use const sparingly in most of Mesa. C and C++ have no real concept of object lifetimes and certainly can't enforce anything. Rust's concepts of mutability and lifetimes, on the other hand, are core to the language and essential for its memory and concurrency safety features. If we want Rust to work for us, we need to accurately map the documented constraints and invariants of the C API onto those concepts.

Fortunately, Vulkan already has similar concepts and they map fairly well to Rust. Whenever a Vulkan object is created or destroyed, the parent object is passed to both the create and destroy functions. This ensures that the lifetime of the child object is contained within the lifetime of the parent object. In Rust terms, this means it's safe for the child object to contain a non-mutable reference to the parent object. Vulkan also defines which entrypoint parameters must be externally synchronized by the client. Externally synchronized objects follow the same rules as mutable references in Rust.
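
A minimal sketch of how these two rules might look in Rust types follows. Device, Image and CommandBuffer here are simplified, hypothetical stand-ins rather than the proposed wrappers. The child borrows its parent, so it cannot outlive it, and the externally synchronized operation takes a mutable reference.

```rust
/// Hypothetical stand-ins for driver objects; not Mesa's real types.
pub struct Device {
    pub name: String,
}

/// The image borrows its parent, so the borrow checker rejects any program
/// in which the `Device` is dropped while an `Image` still exists.
pub struct Image<'dev> {
    pub device: &'dev Device,
    pub width: u32,
}

pub struct CommandBuffer {
    pub commands: Vec<u32>,
}

impl Device {
    /// Takes a shared reference: creating a child does not require
    /// exclusive access to the parent.
    pub fn create_image<'dev>(&'dev self, width: u32) -> Image<'dev> {
        Image { device: self, width }
    }
}

impl CommandBuffer {
    /// Externally synchronized in Vulkan terms: `&mut self` guarantees that
    /// nothing else is accessing this command buffer during the call.
    pub fn record(&mut self, command: u32) {
        self.commands.push(command);
    }
}
```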

3. Use Vulkan allocators whenever possible

Each time a new Vulkan object is created, the client can optionally provide an allocator object which is a set of callbacks and a data pointer. This allows the application to provide arena allocators which may be faster than the native thread-safe allocator used by the driver. Also, when working with certain threading frameworks popular with game engines, there may be restrictions on heap allocation. The engine's arena allocators are aware of these restrictions while the system allocator may not be.

Unfortunately, stable Rust does not currently have a way to plug a different allocator into individual containers or objects. Rust RFC #1398 has an initial implementation that exists in Rust nightly builds and there is a working group which is actively discussing the feature. However, the working group has been active for almost three years now, so it will probably be a while before the feature becomes stable.

Building an abstraction

A few weeks ago I finally started turning some of the ideas in my brain into code and posted a draft merge request which proposes some initial Rust wrappers for Vulkan in Mesa. So far, I've mostly focused on allocating Vulkan objects and integrating them with the Mesa Vulkan runtime's object model.

Allocation

The first problem to solve is allocation. Technically, we could use Rust's built-in containers and ignore the Vulkan allocators entirely. However, that would lead to all sorts of CTS warnings and is generally not kind to applications. To allow for Vulkan allocators, I wrote a new VkBox type which takes a Vulkan allocator and memory scope. It's not an exact drop-in replacement for std::boxed::Box, but it's close enough that any mismatches should be manageable.
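
To give a feel for what such a type involves, here is a heavily simplified sketch on stable Rust. It is not the VkBox from the merge request. Callbacks, SYSTEM and CbBox are hypothetical names, the callbacks take a Layout instead of Vulkan's size, alignment and scope arguments, and zero-sized types are not handled.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ops::Deref;
use std::ptr::NonNull;

/// Hypothetical, simplified stand-in for VkAllocationCallbacks.
#[derive(Clone, Copy)]
pub struct Callbacks {
    pub alloc: fn(Layout) -> *mut u8,
    pub free: fn(*mut u8, Layout),
}

fn sys_alloc(layout: Layout) -> *mut u8 {
    unsafe { alloc(layout) }
}

fn sys_free(ptr: *mut u8, layout: Layout) {
    unsafe { dealloc(ptr, layout) }
}

/// Callbacks backed by the Rust global allocator.
pub const SYSTEM: Callbacks = Callbacks { alloc: sys_alloc, free: sys_free };

/// Owning pointer that remembers which callbacks must free it.
pub struct CbBox<T> {
    ptr: NonNull<T>,
    cb: Callbacks,
}

impl<T> CbBox<T> {
    /// Returns `None` when the allocator reports out-of-memory.
    ///
    /// Safety: `cb.alloc` must return null or memory valid for the given
    /// layout, and `cb.free` must release memory obtained from `cb.alloc`.
    pub unsafe fn new(cb: Callbacks, value: T) -> Option<CbBox<T>> {
        let layout = Layout::new::<T>();
        assert!(layout.size() > 0, "zero-sized types are not handled here");
        let ptr = NonNull::new((cb.alloc)(layout) as *mut T)?;
        // Move the value into the freshly allocated memory.
        unsafe { ptr.as_ptr().write(value) };
        Some(CbBox { ptr, cb })
    }
}

impl<T> Deref for CbBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for CbBox<T> {
    fn drop(&mut self) {
        // Run T's destructor, then hand the memory back to the callbacks.
        unsafe { std::ptr::drop_in_place(self.ptr.as_ptr()) };
        (self.cb.free)(self.ptr.as_ptr() as *mut u8, Layout::new::<T>());
    }
}
```

The box stores its callbacks so that the memory is always returned to the allocator it came from. That is the same reason Vulkan passes the allocator to both the create and destroy entrypoints.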

Subclassing Vulkan runtime base objects

The more difficult problem is deriving from the base object structs in the shared Vulkan runtime in Mesa. This is required to take advantage of the wealth of shared Vulkan code in Mesa.

Embedding the parent structure in a Rust struct is easy enough. As long as the struct is declared #[repr(C)] and the base struct is the first member, the pointers will line up just like in C. What's more difficult is getting initialization right. The typical initialization pattern in C looks something like this:

struct nvk_image {
   struct vk_image vk;

   /* NVK-specific fields */
};

VKAPI_ATTR VkResult VKAPI_CALL
nvk_CreateImage(VkDevice _device,
                const VkImageCreateInfo *pCreateInfo,
                const VkAllocationCallbacks *pAllocator,
                VkImage *pImage)
{
   VK_FROM_HANDLE(nvk_device, device, _device);
   struct nvk_image *image;
   VkResult result;

   image = vk_zalloc2(&device->vk.alloc, pAllocator, sizeof(*image), 8,
                      VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
   if (!image)
      return vk_error(device, VK_ERROR_OUT_OF_HOST_MEMORY);

   result = vk_image_init(&device->vk, &image->vk, pCreateInfo);
   if (result != VK_SUCCESS) {
      vk_free2(&device->vk.alloc, pAllocator, image);
      return result;
   }

   /* Set NVK-specific fields */

   *pImage = nvk_image_to_handle(image);

   return VK_SUCCESS;
}

This doesn't work in Rust because Rust requires all memory to have defined contents at all times and as such doesn't allow for partial initialization of structures. One option would be to implement new() for each base object something like this:

impl vk_image {
    pub fn new(
        dev: &vk_device,
        pCreateInfo: *const VkImageCreateInfo,
    ) -> Result<vk_image, VkResult> {
        unsafe {
            let mut image = std::mem::zeroed::<vk_image>();
            match vk_image_init(dev, &mut image, pCreateInfo) {
                VK_SUCCESS => Ok(image),
                err => Err(err),
            }
        }
    }
}

Unfortunately, many of our vk_foo base objects don't support moving via mem::replace(). This is usually due to our liberal use of intrusive linked lists in Mesa. In Rust terminology, this is equivalent to not supporting the Unpin trait. Instead, we have to allocate memory for the object, call vk_foo_init(), and then never move that object. This entirely rules out the usual Rust assignment pattern for object initialization.
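
The problem with moving can be shown with a tiny self-referential struct. Node is a hypothetical stand-in for an object that sits on an intrusive list, where some pointer refers back into the object itself. Once the value is moved, that pointer still refers to the old location.

```rust
/// Hypothetical stand-in for an object that holds a pointer into itself,
/// as an intrusive list head does when the list is empty.
pub struct Node {
    pub value: u32,
    pub self_ptr: *const Node,
}

/// Records the node's current address inside the node.
pub fn link_to_self(node: &mut Node) {
    node.self_ptr = node;
}

/// True while the stored pointer still refers to this node.
pub fn is_consistent(node: &Node) -> bool {
    std::ptr::eq(node.self_ptr, node)
}
```

After link_to_self(&mut a), is_consistent(&a) holds. After let b = Box::new(a), the value lives at a new address while self_ptr still holds the old one, so the check fails for b.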

To work around this, the VkBox struct has an in-place initialization constructor which takes a callback:

pub unsafe fn new_cb<F: FnOnce(NonNull<T>) -> VkResult>(
    alloc: &VkAllocationCallbacks,
    scope: VkSystemAllocationScope,
    f: F,
) -> Result<VkBox<T>>;

and a VkBox<vk_image> can be created as follows:

let image = unsafe {
    VkBox::new_cb(alloc, scope, |ptr| {
        vk_image_init(&dev.vk, ptr, pCreateInfo)
    })
}?;

If the init callback returns VK_SUCCESS, an Ok(VkBox) will be returned. Otherwise, the backing memory for the VkBox will be freed and Err(err) will be returned. We can use the Rust ? operator to neatly handle the error condition.
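
The following is a simplified sketch of the same idea using only the standard library. It is not the actual new_cb implementation. Base, base_init and new_in_place are hypothetical names, a plain i32 stands in for VkResult with 0 meaning success, and std's Box replaces the Vulkan allocator.

```rust
use std::mem::MaybeUninit;
use std::ptr::NonNull;

/// Hypothetical stand-in for a vk_foo base struct.
pub struct Base {
    pub id: u32,
}

/// Mimics a C `vk_foo_init()`: writes through the pointer and returns 0 on
/// success. An `id` of 0 simulates an initialization failure.
pub unsafe fn base_init(out: *mut Base, id: u32) -> i32 {
    if id == 0 {
        return -1;
    }
    unsafe { out.write(Base { id }) };
    0
}

/// Allocates first, then lets `init` fill the memory at its final address,
/// so the object never moves after initialization.
///
/// Safety: if `init` returns 0 it must have fully initialized the pointee.
pub unsafe fn new_in_place<T, F>(init: F) -> Result<Box<T>, i32>
where
    F: FnOnce(NonNull<T>) -> i32,
{
    let mut slot: Box<MaybeUninit<T>> = Box::new(MaybeUninit::uninit());
    let ptr = NonNull::new(slot.as_mut_ptr()).expect("Box is never null");
    match init(ptr) {
        // Success: reinterpret the allocation as an initialized Box<T>.
        0 => Ok(unsafe { Box::from_raw(Box::into_raw(slot) as *mut T) }),
        // Failure: `slot` is freed on return without running T's destructor.
        err => Err(err),
    }
}
```

A caller would write something like unsafe { new_in_place::<Base, _>(|p| base_init(p.as_ptr(), 7)) }? and get either a fully initialized box or the error code.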

Unfortunately, this doesn't fully solve the problem. Most driver Vulkan objects are going to be more than just the vk_foo base struct. There will be other member data as well. Also, there is a vk_image_finish function which must be called on drop.

To handle both of these problems, I've implemented two more variations on Box: VkObjBox and VkBaseObjBox, both of which are generalized on two type parameters: the base type and the driver type. Semantically, the VkBaseObjBox looks like VkBox<vk_foo> with an extra bit for handling drop properly. The VkObjBox struct looks like a box that contains two things: the vk_foo base struct and the driver's Foo struct. The base struct is initialized with a callback similarly to what we did above while the driver Foo struct is initialized via the usual Rust paradigms. The final pattern looks something like this:

struct Image {
    /* Driver image fields */
}

fn create_image(
    dev: &VkObj<vk_device, Device>,
    info: *const VkImageCreateInfo,
    alloc: *const VkAllocationCallbacks,
) -> Result<VkObjBox<vk_image, Image>> {
    let vk = unsafe {
        VkBaseObjBox::new2_cb(
            &dev.vk().alloc,
            alloc,
            vk_image_finish,
            &|vk: NonNull<vk_image>| {
                vk_image_init(dev.vk_ptr(), vk.as_ptr(), info)
            },
        )
    }?;

    /* Stuff which may use vk */

    Ok(VkObjBox::new(vk, Image {
        /* Driver image fields */
    }))
}

While a bit clunky, it's not really any worse than the 10 lines of C code we have to type every time to do the same thing. Once you've constructed the initial VkBaseObjBox inside the unsafe block, the remaining code is safe from there, including creating the VkObjBox. If an error is encountered, the drop handler on VkBaseObjBox will properly tear down the partially initialized image object by calling vk_image_finish. Once the VkObjBox is created, both halves of the image object are fully initialized and can be used safely.
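
The teardown behavior can be sketched with ordinary Rust drop handling. All names below (Base, BaseBox, ObjBox, base_finish, create) are hypothetical simplifications, and the finished flag exists only so that the example can observe when the finish function runs.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical base object; `finished` lets the example observe teardown.
pub struct Base {
    pub finished: Rc<Cell<bool>>,
}

/// Stand-in for a vk_foo_finish() function.
fn base_finish(base: &mut Base) {
    base.finished.set(true);
}

/// Owns an initialized base object and finishes it when dropped.
pub struct BaseBox {
    base: Box<Base>,
}

impl Drop for BaseBox {
    fn drop(&mut self) {
        base_finish(&mut self.base);
    }
}

/// Both halves together. Fields drop in declaration order, so the driver
/// data is dropped first and the base object is finished last.
pub struct ObjBox<D> {
    driver: D,
    base: BaseBox,
}

impl<D> ObjBox<D> {
    pub fn driver(&self) -> &D {
        &self.driver
    }

    pub fn base(&self) -> &Base {
        &self.base.base
    }
}

/// Two-stage construction: base first, then the driver half.
pub fn create(flag: Rc<Cell<bool>>, width: u32) -> Result<ObjBox<u32>, &'static str> {
    let base = BaseBox { base: Box::new(Base { finished: flag }) };
    // Any early return from here on drops `base`, which runs base_finish().
    if width == 0 {
        return Err("invalid width");
    }
    Ok(ObjBox { driver: width, base })
}
```

No explicit cleanup path is needed in create(): the error return drops the partially built object, and a successful object is finished when its owner drops it.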

The intention is that we'll eventually auto-generate a wrapper function for vkCreateImage() that looks something like this:

unsafe extern "C" fn drv_CreateImage(
    device: VkDevice,
    pCreateInfo: *const VkImageCreateInfo,
    pAllocator: *const VkAllocationCallbacks,
    pImage: *mut VkImage,
) -> VkResult {
    let device = VkObj::<vk_device, Device>::ref_for_handle(device);

    match create_image(device, pCreateInfo, pAllocator) {
        Ok(image) => unsafe {
            *pImage = VkObjBox::into_handle(image);
            VK_SUCCESS
        }
        Err(err) => err,
    }
}

unsafe extern "C" fn drv_DestroyImage(
    device: VkDevice,
    image: VkImage,
    pAllocator: *const VkAllocationCallbacks,
) {
    unsafe { VkObjBox::<vk_image, Image>::from_handle(image, pAllocator) };
}
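
The handle conversions in these wrappers boil down to giving up and retaking ownership of a heap allocation. A minimal sketch with std's Box follows. It assumes that the handle is simply the object's address stored in a u64, and Image and Handle are hypothetical stand-ins here.

```rust
/// Hypothetical driver object.
pub struct Image {
    pub width: u32,
}

/// Assumption for this sketch: the handle is the object's address.
pub type Handle = u64;

/// Gives up ownership: the object now lives until `from_handle` reclaims it.
pub fn into_handle(image: Box<Image>) -> Handle {
    Box::into_raw(image) as usize as Handle
}

/// Borrows the object behind a handle without taking ownership.
///
/// Safety: `handle` must come from `into_handle` and must not have been
/// passed to `from_handle` yet.
pub unsafe fn ref_for_handle<'a>(handle: Handle) -> &'a Image {
    unsafe { &*(handle as usize as *const Image) }
}

/// Retakes ownership; dropping the returned Box destroys the object.
///
/// Safety: same requirements as `ref_for_handle`, and no references
/// obtained from the handle may still be in use.
pub unsafe fn from_handle(handle: Handle) -> Box<Image> {
    unsafe { Box::from_raw(handle as usize as *mut Image) }
}
```

In this model the create entrypoint ends with into_handle, and the destroy entrypoint is little more than from_handle followed by an implicit drop.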

Conclusion

What I've done so far barely scratches the surface of what we need to do to fully wrap the Mesa Vulkan runtime into something Rust-friendly. As I have time, I hope to do further experiments and write more blog posts about my findings. So far, for the problems I've looked at, I've been able to find solutions which will keep the majority of the driver code ergonomic and, most importantly, safe.
