
Oxidizing bmap-tools: rewriting a Python project in Rust


Rafael Garcia Ruiz
March 03, 2023



Back in September, I joined Collabora as an intern to work on Rust-related projects for six months. It's been a great experience and I would recommend it to anyone who is passionate about FOSS and wants to work in an inclusive environment with very skilled and supportive people!

During the internship, my goal was to continue "oxidising" bmaptool, a tool for creating the block map (bmap) for a file and copying files using that block map. Oxidising means rewriting code, in this case a project written in Python, in Rust. Projects are usually oxidised for several reasons, the main one being memory safety. Mozilla has an interesting article on Oxidation if you would like to learn more about the general reasons why Rust is so great.

Of course, rewriting a project in Rust is pointless without good reasons: if a solution to a problem already exists, there should be no need to rewrite it :-). With this project, the main goal was to remove the Python dependencies and instead create a statically linked binary, which should save disk space and, in future, allow the bmap sparse file format to be used in other Rust projects. Another reason for the project was for me to gain experience with some more advanced Rust topics and have some fun along the way!

We decided to call the new project bmap-rs and host the source code on GitHub under a permissive open-source licence to allow the wider community to benefit from the project and make contributions easier.

More than just copying

bmaptool is a generic tool for flashing sparse images to a block device or file using a custom file format called bmap. The idea is that large files, like raw system image files, can be copied or flashed a lot faster and more reliably with bmaptool than with traditional tools like dd or cp, because the bmap file format lets you flash only the used parts of the system image and also verifies each written block. The tool's main use is flashing system images to block devices, but it can also be used for general image flashing. The feature we were mainly interested in was the copy subcommand:

bmaptool copy <input> <output>

The input parameter can be a local file or a remote URL; the output parameter can be a local file or a block device. We wanted bmap-rs to handle that particular job in the short term, even though other bmaptool features would also be useful later. The goal, then, was to be able to run the following command with the same functionality as the original project:

bmap-rs copy <input> <output>
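To make the shape of that command concrete, here is a minimal, dependency-free sketch of how the `copy` arguments could be interpreted. It assumes that any input starting with `http://` or `https://` is remote and everything else is a local path; the `Source` enum and `classify_input` function are hypothetical names for illustration, not bmap-rs's actual API, and a real tool would more likely use an argument-parsing crate than raw `std::env::args`.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical type: where `<input>` comes from. bmap-rs's real types may differ.
#[derive(Debug, PartialEq)]
enum Source {
    Local(PathBuf),
    Remote(String),
}

/// Treat anything with an http:// or https:// scheme (case-insensitively)
/// as a remote URL; everything else is a local path.
fn classify_input(input: &str) -> Source {
    let lower = input.to_ascii_lowercase();
    if lower.starts_with("http://") || lower.starts_with("https://") {
        Source::Remote(input.to_string())
    } else {
        Source::Local(PathBuf::from(input))
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    // Expected shape: bmap-rs copy <input> <output>
    if args.len() != 4 || args[1] != "copy" {
        let prog = args.first().map(String::as_str).unwrap_or("bmap-rs");
        eprintln!("usage: {prog} copy <input> <output>");
        return;
    }
    let source = classify_input(&args[2]);
    // The output is always local: a regular file or a block device node.
    let output = PathBuf::from(&args[3]);
    println!("would copy {source:?} -> {output:?}");
}
```

Keeping the output as a plain path reflects the fact that, on Linux, a block device such as an SD card is opened through a device node just like a file.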

As the project had already been started by a colleague, my first step was to import the existing project into GitHub and prepare it as an open-source project: set up a CI pipeline to make sure things build, add a licence file, write a correct README and do everything else needed to open it to contributions. From that moment on, I handled bug reports, feature requests, pull requests and dependency updates as part of the team. From this I learnt about the never-ending responsibilities of maintaining open-source software!

The development roadmap of the internship was as follows:

  • Parse the XML .bmap format from a local file

  • Stream image contents from an HTTP/HTTPS source

  • Load the bmap file from the same HTTP/HTTPS source as the image

  • Stream a gzip-compressed image file over HTTPS and verify the written checksums against the parsed bmap

  • Write the image contents to a normal file

  • Write the image contents to a block device (e.g. SD card or USB flash drive)

Bmap file

But what exactly is a bmap file and why is it so useful for this purpose? Well, it is an XML file which contains a list of mapped areas plus some additional information about the file it was created from. For example:

  • SHA256 checksum of the bmap file itself

  • SHA256 checksum of the mapped areas

  • the original file size

  • amount of mapped data
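The "amount of mapped data" follows from the list of mapped areas. As we understand the format, each mapped area appears as a `<Range>` element whose text is an inclusive span of block numbers such as `0-1`, or a single number for a one-block range. The sketch below parses only that range text (XML parsing itself is omitted) and totals the mapped bytes; `BlockRange`, `parse_index` and `mapped_bytes` are hypothetical names, not the bmap-parser crate's API.

```rust
/// Inclusive range of block indices from the text of a bmap `<Range>`
/// element: "0-1" covers blocks 0 and 1, "5" covers block 5 only.
#[derive(Debug, PartialEq, Clone, Copy)]
struct BlockRange {
    first: u64,
    last: u64,
}

fn parse_index(s: &str) -> Result<u64, String> {
    s.trim()
        .parse::<u64>()
        .map_err(|e| format!("bad block index {s:?}: {e}"))
}

impl BlockRange {
    fn parse(text: &str) -> Result<Self, String> {
        let text = text.trim();
        let (first, last) = match text.split_once('-') {
            Some((a, b)) => (parse_index(a)?, parse_index(b)?),
            None => {
                let n = parse_index(text)?;
                (n, n)
            }
        };
        // Reject "7-3" style ranges so block_count() cannot underflow.
        if last < first {
            return Err(format!("inverted range {text:?}"));
        }
        Ok(BlockRange { first, last })
    }

    fn block_count(&self) -> u64 {
        self.last - self.first + 1
    }
}

/// Upper bound on the bytes to copy: mapped blocks times block size.
/// (The image's final block may be partial, so the true figure can be smaller.)
fn mapped_bytes(ranges: &[BlockRange], block_size: u64) -> u64 {
    ranges.iter().map(|r| r.block_count() * block_size).sum()
}

fn main() {
    let ranges: Vec<BlockRange> = ["0-1", " 5 ", "8-11"]
        .iter()
        .map(|t| BlockRange::parse(t).expect("valid range"))
        .collect();
    // 2 + 1 + 4 = 7 blocks of 4096 bytes
    println!("{} bytes mapped", mapped_bytes(&ranges, 4096));
}
```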

Because the bmap holds a checksum for each mapped area, once each part is copied to the destination we can check that it was copied correctly and is not corrupted. Because the data is mapped, we can also avoid reading or copying "holes" (runs of zeroes) and copy only the parts of the image that are actually used. Here's an example of a bmap file.
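A minimal sketch of the hole-skipping copy, assuming the mapped areas have already been parsed into inclusive `(first, last)` block pairs. The function name `copy_mapped` is hypothetical, and checksum verification is left out because the Rust standard library has no SHA-256 (a crate such as `sha2` would be needed). Using only the `Read`, `Write` and `Seek` traits means the same code works for a `File`, a block device node, or an in-memory `Cursor`.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

/// Copy only the mapped ranges from `src` to `dst`, seeking over holes.
/// `ranges` holds inclusive block ranges; `image_size` clamps the final
/// block, which may be shorter than `block_size`. Unmapped blocks are
/// never read or written: they are unused by the image, so whatever the
/// destination holds there does not matter.
fn copy_mapped<R, W>(
    src: &mut R,
    dst: &mut W,
    ranges: &[(u64, u64)],
    block_size: u64,
    image_size: u64,
) -> io::Result<u64>
where
    R: Read + Seek,
    W: Write + Seek,
{
    let mut buf = vec![0u8; block_size as usize];
    let mut copied = 0;
    for &(first, last) in ranges {
        let start = first * block_size;
        let end = ((last + 1) * block_size).min(image_size);
        src.seek(SeekFrom::Start(start))?;
        dst.seek(SeekFrom::Start(start))?;
        let mut remaining = end.saturating_sub(start);
        while remaining > 0 {
            let chunk = remaining.min(block_size) as usize;
            src.read_exact(&mut buf[..chunk])?;
            dst.write_all(&buf[..chunk])?;
            remaining -= chunk as u64;
            copied += chunk as u64;
        }
    }
    dst.flush()?;
    Ok(copied)
}

fn main() -> io::Result<()> {
    // 14-byte toy image with 4-byte blocks; only blocks 0 and 3 are mapped.
    let mut src = Cursor::new((1..=14u8).collect::<Vec<u8>>());
    let mut dst = Cursor::new(Vec::<u8>::new());
    let n = copy_mapped(&mut src, &mut dst, &[(0, 0), (3, 3)], 4, 14)?;
    println!("copied {n} bytes: {:?}", dst.into_inner());
    Ok(())
}
```

With a `Cursor<Vec<u8>>` the skipped gap is zero-filled; with a fresh regular file the skipped areas become sparse holes. When the last mapped range ends before the image does, a real implementation writing to a regular file would presumably also call `File::set_len(image_size)` so the output has the full image size.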

Remote input

Although the initial project already contained the copying algorithm for local files, it could not write to block devices or copy remote files. Allowing copies to block devices turned out to be a simple fix. Allowing remote input, on the other hand, was a bigger task. To make an HTTP request, the code had to wait for the response, so we needed to create an asynchronous context for that feature. The tool also needed to fetch the bmap file remotely and accept a URL as the input argument on the command line.
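The post doesn't say exactly how bmap-rs locates the bmap file next to a remote image, so the following is only one plausible heuristic, sketched with a hypothetical `bmap_candidates` function: try `<image>.bmap` first, then drop one extension at a time from the last path segment. It only computes candidate names; the actual downloads would happen inside the asynchronous HTTP context described above. Query strings and fragments are deliberately not handled.

```rust
/// Hypothetical helper: candidate bmap locations for an image path or URL.
/// Appends ".bmap", then repeatedly strips the last extension, touching only
/// the final path segment (so "example.com" or "/v1.2/" are left alone).
fn bmap_candidates(image: &str) -> Vec<String> {
    let seg_start = image.rfind('/').map_or(0, |i| i + 1);
    let mut stem = image.to_string();
    let mut out = Vec::new();
    loop {
        out.push(format!("{stem}.bmap"));
        match stem[seg_start..].rfind('.') {
            // dot > 0 keeps a hidden-file name like ".img" from becoming empty.
            Some(dot) if dot > 0 => stem.truncate(seg_start + dot),
            _ => break,
        }
    }
    out
}

fn main() {
    for url in bmap_candidates("https://example.com/images/sdcard.img.gz") {
        println!("{url}");
    }
}
```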

Other enhancements were made along the way, like a progress bar and the ability to copy an image without a bmap file, which can be useful when an image has no holes. These features are not strictly required, but they should improve the experience of using the tool and allow it to fully replace bmaptool. Finally, we published the crates on Crates.io, Rust's package registry: bmap-rs and bmap-parser are now available for anyone to use and try out!

What's next?

The intended use for this work is integration into the tests that run on real hardware in Collabora's LAVA lab. Some tests boot a minimal Linux system from a network filesystem (NFS), then use bmaptool to flash an image to the target block device, for instance an SD card or eMMC. The device can then reboot into the flashed filesystem and run tests on the image.

Using bmaptool in this way increases the size of the NFS image, since it has to include the Python runtime and other libraries. Rust, by comparison, lets us generate a small statically-linked binary that does the same job, and it opens the door to further improvements in future, for instance booting an EFI binary to complete the flashing rather than booting a complete Linux system.

Once it is integrated with LAVA, it should make every project that uses bmap files more efficient, which will benefit other teams at Collabora. Knowing that is the most rewarding part of this achievement.

There are still some bmaptool features that would be interesting for bmap-rs to have, such as the create command, which generates the bmap for a file, and support for more decompression algorithms.
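As a rough idea of what a create command has to compute, here is a simplified sketch that scans an in-memory image block by block and reports inclusive ranges of blocks containing any non-zero byte. This is a swapped-in technique, not bmaptool's: as we understand it, bmaptool asks the filesystem for the file's hole layout (FIEMAP, or SEEK_DATA/SEEK_HOLE) instead of reading every byte, so allocated-but-zero blocks still count as mapped there. The function name `nonzero_block_ranges` is hypothetical, and checksum generation is omitted.

```rust
/// Simplified stand-in for bmap creation: inclusive (first, last) ranges of
/// blocks that contain at least one non-zero byte. A short final block is
/// treated like any other block.
fn nonzero_block_ranges(image: &[u8], block_size: usize) -> Vec<(u64, u64)> {
    let mut ranges: Vec<(u64, u64)> = Vec::new();
    for (i, block) in image.chunks(block_size).enumerate() {
        if block.iter().all(|&b| b == 0) {
            continue; // an all-zero block is treated as a hole
        }
        let i = i as u64;
        // Extend the previous range when this block directly follows it.
        if let Some(r) = ranges.last_mut() {
            if r.1 + 1 == i {
                r.1 = i;
                continue;
            }
        }
        ranges.push((i, i));
    }
    ranges
}

fn main() {
    // Eight 4-byte blocks; data in blocks 0, 1 and 4 only.
    let mut image = vec![0u8; 32];
    image[0] = 1;
    image[5] = 1;
    image[19] = 9;
    println!("{:?}", nonzero_block_ranges(&image, 4));
}
```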

Wrapping Up

During the internship, I've had to constantly learn new skills and challenge myself. For the first time, I've acted as the maintainer of a project, keeping it up to date and managing it with an open-source, open-first philosophy. I've learned Rust from scratch and ended up using some advanced features of the language, including async. Participating in the development of bmap-rs and acting as its maintainer during this time has allowed me to improve my Rust skills, my overall ability to contribute to open source, and my confidence.

This experience has also helped me learn about the profession itself. I have a clearer idea of what kind of engineer I want to become, which areas I intend to explore further and which skills I want to acquire in the future. I see more clearly than ever that I want my work to be oriented towards open source, so it can be reused and shared, helping many others. Likewise, I'm looking forward to finishing my degree and rejoining the team better equipped to make an impact.

I'm really grateful to my mentor Christopher Obbard and also to Gustavo Noronha. Their involvement and support during this experience helped me a lot. I really appreciate how Sjoerd Simons and Ryan Gonzalez reviewed my code with their Rust experience and knowledge. I'm sure everything I've learned during this internship will help me make a bigger impact with my future contributions to open source, and seeing how my work can be useful to other people has given me a real sense of fulfilment.


Collabora Ltd © 2005-2024. All rights reserved.