Marcus Edel
November 11, 2025
Fonts are everywhere, from posters and product labels to presentation slides and user interfaces. Recognizing them automatically from images has long been a challenging problem because the visual differences between typefaces are so fine-grained. In 2015, the DeepFont system introduced a new approach using convolutional neural networks (CNNs), along with the AdobeVFR dataset. Today, we revisit this problem with modern data and a modern architecture: FasterViT-2, a high-performance hybrid vision transformer, which we use to set a new standard in font classification.
The original AdobeVFR dataset offered a strong foundation for visual font recognition. Collabora has since expanded and modernized it to reflect today’s typographic landscape and usage in screen-based environments.
Overview of different fonts from the dataset.
These changes make the dataset more reflective of the visual variability found in real-world design and screen environments.
To fully leverage the new dataset, we employed FasterViT-2, a hybrid vision transformer designed for both speed and accuracy.
FasterViT-2 combines CNN-like fast local feature learning with transformer-style global modeling through its Hierarchical Attention (HAT) mechanism. It introduces carrier tokens to summarize local windows and enable efficient long-range dependencies—critical for capturing the nuanced differences between fonts.
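The carrier-token idea can be illustrated with a small toy sketch. This is not the FasterViT-2 implementation: it assumes a 1-D token sequence instead of a 2-D feature map, uses mean pooling to form each carrier token, and omits all learned projections, MLPs, and normalization. The function names (`attend`, `hierarchical_attention`) are made up for this illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (no learned weights)."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def mean_vector(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def hierarchical_attention(tokens, window_size):
    """Toy hierarchical attention: local windows plus one carrier token each."""
    windows = [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]
    # 1. Summarize each window into a carrier token (assumption: mean pooling).
    carriers = [mean_vector(w) for w in windows]
    # 2. Carrier tokens attend to each other, exchanging information globally.
    carriers = attend(carriers, carriers, carriers)
    # 3. Each window attends over its own tokens plus its updated carrier token,
    #    so local tokens see a summary of the rest of the sequence.
    out = []
    for window, carrier in zip(windows, carriers):
        context = window + [carrier]
        out.extend(attend(window, context, context))
    return out


# Four 2-D tokens split into two windows of two tokens each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
mixed = hierarchical_attention(tokens, window_size=2)
print(len(mixed))  # one output vector per input token
```

In this sketch, full attention would compare every token with every other token, while the hierarchical version compares tokens only within their window plus a single carrier token, and lets the much smaller set of carrier tokens handle the long-range exchange.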
This combination allows FasterViT-2 to surpass both traditional CNNs and earlier ViT-based models in accuracy and in practicality.
We trained FasterViT-2 on the extended AdobeVFR dataset and evaluated its performance on the newly curated real-world test set.
Example font classification, showing the top-5 predictions.
| Model | Top-1 Accuracy | Top-5 Accuracy | Inference Throughput (img/sec, 4090) |
|---|---|---|---|
| DeepFont (2015) | 62.3% | 81.4% | ~450 |
| FasterViT-2 (ours) | 87.4% | 92.1% | 3161 |
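This establishes FasterViT-2 as the new state-of-the-art for real-world font classification. Compared with DeepFont, top-1 accuracy rises from 62.3% to 87.4% (a gain of 25.1 percentage points) and top-5 accuracy from 81.4% to 92.1%, while inference throughput increases roughly sevenfold, from about 450 to 3161 images per second. That makes the model viable for both offline and real-time applications.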
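For readers unfamiliar with the metrics in the table, the following minimal sketch shows how top-1 and top-5 accuracy are typically computed from per-class scores. The scores and labels are invented for illustration and are not output from our model; `top_k_accuracy` is a hypothetical helper, not part of any evaluation code described here.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 4 samples over 6 font classes (not real model output).
scores = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # true class 0 is ranked first
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true class 0 is ranked second
    [0.05, 0.10, 0.15, 0.20, 0.24, 0.26],  # true class 0 is ranked last
    [0.10, 0.60, 0.10, 0.10, 0.05, 0.05],  # true class 1 is ranked first
]
labels = [0, 0, 0, 1]

top1 = top_k_accuracy(scores, labels, k=1)  # 2 of 4 samples -> 0.5
top5 = top_k_accuracy(scores, labels, k=5)  # 3 of 4 samples -> 0.75
print(top1, top5)
```

Top-5 accuracy is always at least as high as top-1, since a correct first guess also counts as a top-5 hit. For fonts, where many typefaces are near-duplicates of each other, the top-5 figure is often the more useful one in interactive tools that present a shortlist.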
While our primary goal was to advance font recognition, FasterViT-2 has also proven useful in broader applications.
As part of Collabora’s winning submission to the ICME 2025 Video Super-Resolution Challenge—specifically Track 3: Screen Sharing Videos—FasterViT-2 serves as a semantic font recognition module. Its predictions help guide super-resolution models to better preserve and enhance text during upscaling, improving the clarity of screen-shared content such as code editors, terminal windows, and slide presentations. We will write more about the challenge soon.
This integration shows that font classification can go beyond recognition, serving as a perceptual cue for more intelligent, content-aware visual enhancement.
By combining a significantly updated dataset with the power of FasterViT-2, we’ve set a new benchmark for visual font recognition. This work highlights how modern architectural advances can tackle fine-grained classification tasks with greater precision and efficiency.
FasterViT-2 is not only a state-of-the-art model for font identification, but it is also a stepping stone for improving visual quality in downstream tasks, such as screen content upscaling and intelligent design tools.