Tathagata Roy
June 25, 2025
In collaboration with Inria (the French Institute for Research in Computer Science and Automation), Tathagata Roy shares the progress made over the past year on the CoccinelleForRust project, co-sponsored by Collabora.
Coccinelle is a tool for automatic program matching and transformation that was originally developed for making large-scale changes to the Linux kernel source code (i.e., C code). Matches and transformations are driven by user-specified transformation rules in the form of abstracted patches, referred to as semantic patches. As the Linux kernel, and systems software more generally, starts to adopt Rust, we are developing Coccinelle for Rust to make the power of Coccinelle available to Rust codebases.
This diff illustrates a patch in which the type_of function was being called before confirming that the target item's trait was implemented. A straightforward fix using CfR (Coccinelle for Rust) is to find every expression of the form
expression.type_of(impl_id)
and replace it with
expression.type_of(impl_id).subst_identity()
There are roughly fifty occurrences of this pattern in the diff, so updating them all by hand would be tedious. The semantic patch that performs this transformation automatically is:
@change_ty_of@
expression exp1, impl_id;
@@
-exp1.type_of(impl_id)
+exp1.type_of(impl_id).subst_identity()
While the example above could also be handled with a complicated and unreadable regular expression, real transformations quickly become more complex.
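To make the comparison concrete, here is a rough sketch of a hand-written, purely textual equivalent of change_ty_of. It is not part of CfR: add_subst_identity is a hypothetical helper that looks for ".type_of(", balances parentheses to find the end of the call, and appends ".subst_identity()". Because it knows nothing about Rust syntax, it rewrites matches inside comments and string literals just as readily as real code.

```rust
/// Naive, text-based version of the change_ty_of rule (illustration only).
/// Appends `.subst_identity()` after every `.type_of(...)` call, tracking
/// parenthesis depth so that nested calls in the argument are handled.
/// Comments and string literals are NOT distinguished from code.
fn add_subst_identity(src: &str) -> String {
    let needle = ".type_of(";
    let mut out = String::with_capacity(src.len());
    let mut rest = src;
    while let Some(pos) = rest.find(needle) {
        // Copy everything up to and including `.type_of(`.
        let open_end = pos + needle.len();
        out.push_str(&rest[..open_end]);
        rest = &rest[open_end..];
        // Find the parenthesis that closes this call.
        let mut depth = 1usize;
        let mut close = None;
        for (i, c) in rest.char_indices() {
            match c {
                '(' => depth += 1,
                ')' => {
                    depth -= 1;
                    if depth == 0 {
                        close = Some(i);
                        break;
                    }
                }
                _ => {}
            }
        }
        match close {
            Some(i) => {
                out.push_str(&rest[..=i]);
                out.push_str(".subst_identity()");
                rest = &rest[i + 1..];
            }
            // Unbalanced input: leave the remainder unchanged.
            None => break,
        }
    }
    out.push_str(rest);
    out
}
```

Even this small helper has to reimplement part of a parser (parenthesis matching), and it still cannot tell code from comments; a semantic patch gets both for free from the Rust grammar.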
The following semantic patch changes a function signature and updates all of its call sites.
@change_sig@
expression x;
identifier fname, sname;
@@
impl sname {
...
pub(crate) fn fname(&self,
- guard: &RevocableGuard<'_>
) -> Result { ... }
...
}
@modify_calls@
expression x, guard;
identifier change_sig.fname;
@@
x.fname(
...
- guard
)
The rule change_sig finds every method that takes a guard parameter of type &RevocableGuard<'_> and removes that parameter. The rule modify_calls then updates the calls to those methods by removing the corresponding argument.
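For illustration, the two hypothetical modules below show Rust code that change_sig and modify_calls would be expected to match, before and after the patch. RevocableGuard, Result, Device, reset and caller are stand-ins invented for this sketch, not the kernel's real definitions. Note that modify_calls only removes the argument at the call site; the caller's own guard parameter is not touched.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for the kernel type named in the semantic patch.
pub struct RevocableGuard<'a> {
    _marker: PhantomData<&'a ()>,
}

impl<'a> RevocableGuard<'a> {
    pub fn new() -> Self {
        RevocableGuard { _marker: PhantomData }
    }
}

/// Code as it might look before applying change_sig and modify_calls.
pub mod before {
    use super::RevocableGuard;
    pub type Result = std::result::Result<(), ()>;

    pub struct Device;

    impl Device {
        // Matched by change_sig: `&self` plus `guard: &RevocableGuard<'_>`.
        #[allow(unused_variables)]
        pub(crate) fn reset(&self, guard: &RevocableGuard<'_>) -> Result {
            Ok(())
        }
    }

    pub fn caller(dev: &Device, g: &RevocableGuard<'_>) -> Result {
        // Matched by modify_calls: `dev.reset(g)` becomes `dev.reset()`.
        dev.reset(g)
    }
}

/// The same code after the patch.
pub mod after {
    use super::RevocableGuard;
    pub type Result = std::result::Result<(), ()>;

    pub struct Device;

    impl Device {
        pub(crate) fn reset(&self) -> Result {
            Ok(())
        }
    }

    // Only the call was rewritten; `g` is now unused and would need
    // a separate rule to remove it from `caller`'s signature.
    #[allow(unused_variables)]
    pub fn caller(dev: &Device, g: &RevocableGuard<'_>) -> Result {
        dev.reset()
    }
}
```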
This semantic patch can be applied to a whole codebase: once a guard variable is no longer needed, it can be removed everywhere at once. It can also serve as an integration test to check that no such code is introduced by new pull requests.
Note: the ... when construct (constraints on what an ellipsis may match) is not yet supported.
A Computation Tree Logic (CTL) engine is at the heart of Coccinelle: semantic patches are translated into CTL formulas, which are then matched against each Rust file. Before adopting this engine, CfR used an ad-hoc method for matching patterns of code. The engine is the same as the one used in Coccinelle for C, with a few minor changes, most of which are idiomatic adaptations that preserve the same behavior. More information on the engine and its language (CTL-VW) can be found in the POPL paper. Because the engine is the standard one, each step of the matching process can be logged, which allows us to learn from and reuse the design patterns of Coccinelle for C, including its critical test cases.
The expression-dominated nature of Rust makes the matching and transformation process a bit different from that of C. For example, in the following semantic patch:
@@
expression e;
@@
-foo(e);
In C, the statement foo(e); is guaranteed to appear as an immediate child of a block, i.e.:
{ // <- start of a block
foo(e); // <- this statement
}
Blocks in C are present only in specific parts of the Abstract Syntax Tree, like in function definitions, loops, or conditional blocks. However, in Rust, blocks are expressions, which can appear anywhere an expression is allowed. For example:
while { f(&mut a); a > 1 } {
//
}
This makes searching much more computationally intensive. Several optimizations were therefore implemented in CfR to address this problem, including replacing lists in the CTL engine with RefCells and hash tables.
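To see why the search space grows, consider a toy AST (hypothetical, and far simpler than what CfR or Rust Analyzer use) in which a block may appear inside any expression. A search for the statement-level pattern foo(e); must then descend into every sub-expression, including the condition of a while loop, rather than only into the places where C allows blocks.

```rust
/// A toy Rust-like expression AST (hypothetical, for illustration only).
enum Expr {
    Var(String),
    Lit(i64),
    Ref(Box<Expr>),                       // &mut e
    Call(String, Vec<Expr>),              // f(args)
    Binary(Box<Expr>, String, Box<Expr>), // l op r
    Block(Vec<Stmt>, Option<Box<Expr>>),  // { stmt; ...; tail }
    While(Box<Expr>, Box<Expr>),          // while cond body
}

/// An expression statement `e;`.
struct Stmt(Expr);

/// Counts statements of the form `name(...);` anywhere inside `e`.
/// Since any expression may contain a block, every sub-expression is visited.
fn count_stmt_calls(e: &Expr, name: &str) -> usize {
    match e {
        Expr::Var(_) | Expr::Lit(_) => 0,
        Expr::Ref(inner) => count_stmt_calls(inner, name),
        Expr::Call(_, args) => args.iter().map(|a| count_stmt_calls(a, name)).sum(),
        Expr::Binary(l, _, r) => count_stmt_calls(l, name) + count_stmt_calls(r, name),
        Expr::While(cond, body) => count_stmt_calls(cond, name) + count_stmt_calls(body, name),
        Expr::Block(stmts, tail) => {
            let mut n = 0;
            for Stmt(s) in stmts {
                // The statement itself may be the call we are looking for...
                if let Expr::Call(f, _) = s {
                    if f.as_str() == name {
                        n += 1;
                    }
                }
                // ...and its sub-expressions may hide further blocks.
                n += count_stmt_calls(s, name);
            }
            if let Some(t) = tail {
                n += count_stmt_calls(t, name);
            }
            n
        }
    }
}
```

For while { foo(&mut a); a > 1 } { foo(b); bar(foo(c)); } this finds two statement-level calls to foo: one in the loop condition and one in the body. The foo(c) call is an argument, not a statement, so it does not count.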
While developing the parser for SmPL, we decided not to reinvent the wheel by writing a Rust parser from scratch. SmPL contains custom syntax such as dots (...), disjunctions, and modifiers (+ and -). In the latest version, we parse only these constructs ourselves and hand off the rest to Rust Analyzer.
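A minimal sketch of that kind of split, assuming (for illustration) that modifiers sit in the first column as in Coccinelle for C: each body line is tagged as context, minus or plus, and the "before" and "after" sides are rebuilt as plain Rust text that could then be handed to a Rust parser. Modifier, classify and side are hypothetical names; CfR's actual parser also handles dots and disjunctions and works on syntax trees rather than raw text.

```rust
/// How a line of a SmPL rule body is treated (simplified).
#[derive(Debug, PartialEq)]
enum Modifier {
    Context,
    Minus,
    Plus,
}

/// Tags each body line with the modifier found in its first column
/// and strips that modifier from the code.
fn classify(body: &str) -> Vec<(Modifier, &str)> {
    body.lines()
        .map(|line| match line.as_bytes().first() {
            Some(b'-') => (Modifier::Minus, &line[1..]),
            Some(b'+') => (Modifier::Plus, &line[1..]),
            _ => (Modifier::Context, line),
        })
        .collect()
}

/// Rebuilds one side of the patch: the "before" side keeps context and
/// minus lines, the "after" side keeps context and plus lines.
fn side(lines: &[(Modifier, &str)], after: bool) -> String {
    lines
        .iter()
        .filter(|(m, _)| match m {
            Modifier::Context => true,
            Modifier::Minus => !after,
            Modifier::Plus => after,
        })
        .map(|(_, code)| code.trim())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Applied to the body of change_ty_of, the two sides are exp1.type_of(impl_id) and exp1.type_of(impl_id).subst_identity(), both ordinary Rust expressions.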
A rule describes a set of changes to apply under a given environment of metavariable bindings. Rules can inherit metavariable values from one another, which makes it possible to transform code in different parts of a file.
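Inheritance can be pictured as one rule producing a list of environments (metavariable bindings) and a later rule running once per environment. The sketch below models change_sig and modify_calls that way. Env, Call, change_sig_envs, modify_calls and render are all hypothetical, and signatures are matched as plain strings purely for brevity.

```rust
use std::collections::HashMap;

/// Metavariable bindings produced by one match of a rule.
type Env = HashMap<String, String>;

/// A method call `receiver.method(args)`, kept as plain strings.
struct Call {
    receiver: String,
    method: String,
    args: Vec<String>,
}

/// Stand-in for change_sig: binds `fname` for each matching method signature.
fn change_sig_envs(sigs: &[&str]) -> Vec<Env> {
    sigs.iter()
        .filter_map(|sig| {
            let name = sig
                .strip_prefix("fn ")?
                .strip_suffix("(&self, guard: &RevocableGuard<'_>)")?;
            let mut env = Env::new();
            env.insert("fname".to_string(), name.to_string());
            Some(env)
        })
        .collect()
}

/// Stand-in for modify_calls: runs once per inherited environment, so the
/// inherited `fname` is fixed to the value bound by change_sig.
fn modify_calls(calls: &mut [Call], envs: &[Env]) {
    for env in envs {
        let fname = &env["fname"];
        for call in calls.iter_mut() {
            // `x.fname(- guard)`: drop the single argument.
            if &call.method == fname && call.args.len() == 1 {
                call.args.clear();
            }
        }
    }
}

/// Renders a call back to source text.
fn render(c: &Call) -> String {
    format!("{}.{}({})", c.receiver, c.method, c.args.join(", "))
}
```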
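Ellipses, written as ... in SmPL, connect two fragments of code. For example, the following rule matches a call to drop_queue(q) that is followed, on every control-flow path, by a call to pop(q):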
@@
expression q;
@@
drop_queue(q);
...
pop(q);
This is implemented in CTL using the AU ("for all paths, until") operator.
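As a rough sketch of what AU computes, the code below evaluates A[phi U psi] over a small, hand-built control-flow graph by least-fixpoint iteration. The Cfg type and node numbering are hypothetical, and real CTL-VW formulas also carry metavariable bindings and witnesses, which this ignores. The ellipsis is modelled, in simplified form, as requiring that at a drop_queue(q) node, AX A[not pop U pop] holds: on every path from the next node, pop(q) is eventually reached.

```rust
use std::collections::HashSet;

/// A control-flow graph: `succ[n]` lists the successors of node `n`.
struct Cfg {
    succ: Vec<Vec<usize>>,
}

impl Cfg {
    /// AX: nodes that have at least one successor, all of which are in `set`.
    fn ax(&self, set: &HashSet<usize>) -> HashSet<usize> {
        (0..self.succ.len())
            .filter(|&n| !self.succ[n].is_empty() && self.succ[n].iter().all(|s| set.contains(s)))
            .collect()
    }

    /// A[phi U psi]: on every path, `phi` holds until a `psi` node is reached.
    /// Least fixpoint: start from the `psi` nodes and repeatedly add `phi`
    /// nodes whose successors all satisfy the formula already.
    fn au(&self, phi: &HashSet<usize>, psi: &HashSet<usize>) -> HashSet<usize> {
        let mut sat = psi.clone();
        loop {
            let new: Vec<usize> = self
                .ax(&sat)
                .into_iter()
                .filter(|n| phi.contains(n) && !sat.contains(n))
                .collect();
            if new.is_empty() {
                return sat;
            }
            sat.extend(new);
        }
    }
}
```

With drop_queue(q) at node 0 followed by a branch, the pattern matches only if pop(q) is reached on both branches; if one branch skips pop(q), node 0 no longer satisfies the formula.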
Disjunctions allow a rule to match one of several alternatives:
f1(10);
( // <--- disjunction start
foo(1);
|
bar(10);
) // <--- disjunction end
f2();
This matches either:
f1(10);
foo(1);
f2();
or
f1(10);
bar(10);
f2();
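A simplified way to think about a disjunction is as a set of alternatives inside a sequence pattern. The sketch below uses a hypothetical Pat type and matches a straight-line list of statement strings, whereas CfR matches over the syntax tree and control-flow graph. It tries every branch of a disjunction and continues matching the rest of the pattern after it.

```rust
/// A pattern element for a straight-line sequence of statements.
enum Pat<'a> {
    Stmt(&'a str),           // must match this statement exactly
    Disj(Vec<Vec<Pat<'a>>>), // ( alt1 | alt2 | ... )
}

/// All lengths `k` such that `pats` matches exactly `code[..k]`.
fn prefix_lens(pats: &[Pat], code: &[&str]) -> Vec<usize> {
    match pats.split_first() {
        None => vec![0],
        Some((Pat::Stmt(s), rest)) => {
            if code.first() == Some(s) {
                prefix_lens(rest, &code[1..]).into_iter().map(|k| k + 1).collect()
            } else {
                vec![]
            }
        }
        Some((Pat::Disj(alts), rest)) => {
            let mut out = Vec::new();
            for alt in alts {
                // Each way the branch can match, followed by each way the rest can.
                for k in prefix_lens(alt, code) {
                    for j in prefix_lens(rest, &code[k..]) {
                        out.push(k + j);
                    }
                }
            }
            out
        }
    }
}

/// True if `pats` matches the whole statement sequence `code`.
fn seq_matches(pats: &[Pat], code: &[&str]) -> bool {
    prefix_lens(pats, code).contains(&code.len())
}
```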
Transforming macros posed a problem because their arguments do not have to follow standard Rust syntax. For example, should the following semantic patch match
@@
expression e;
@@
foo!(
- e
+ 2
);
either of these invocations?
foo!(a b c)
foo!(a)
To avoid such discrepancies, we support only macros that look like function calls, for example foo!(a, b, c) or foo![a; b; c].
The transformed code is formatted with rustfmt, and the result is then compared with the formatting of the original code. This way, only the transformed code is reformatted, without disturbing the formatting of the rest of the file. Note: pretty printing is still a work in progress for Rust macros, as they are notoriously hard to deal with and rustfmt therefore leaves them alone.
Our current aim is to bring Coccinelle for Rust on par with Coccinelle for C in terms of basic functionality. In the coming months, we are going to work on:
If you want to try out CoccinelleForRust, it is available on GitLab. Please feel free to reach out to us at the email addresses listed on our website, CoccinelleForRust; we would be happy to answer your questions!