Labeling tools are great, but what about quality checks?

Jakub Piotr Cłapa
January 17, 2023



Modern datasets contain hundreds of thousands to millions of labels that must be kept accurate. In practice, random errors in the dataset tend to average out and can often be ignored, but systematic biases transfer directly to the model. After quick initial wins in areas where abundant data is readily available, deep learning needs to become more data efficient to help solve difficult business problems. In the words of deep learning pioneer Andrew Ng:

In many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. – Andrew Ng: Unbiggen AI - IEEE Spectrum

Over the course of 2022, we worked on an open-source tool that combines novel unsupervised machine-learning pipelines with a new user interface concept that, together, help annotators and machine-learning engineers identify and filter out label errors.

Key takeaways

  • Even carefully curated AI datasets have errors that can be spotted and fixed to improve the accuracy of resulting models.
  • Existing labeling tools do not have good support for doing quality assurance.
  • Fixing labels on around 3% of the affected annotations improved the model's error rate by almost 2 percentage points, although exact results will depend on the dataset and task.
  • Thanks to MLfix, even a big dataset like the Mapillary Traffic Sign Dataset could be fully verified and fixed by a single person over a few days of work.

The QA problem in data labeling

Labeling is a difficult cognitive task, and accurate labels require a serious Quality Assurance (QA) process. Most existing labeling tools (both commercial and open source) have only minimal support for review. Frequently, the QA process is more difficult (and expensive!) than the initial labeling, since you are forced to use an interface optimized for drawing bounding boxes to verify whether all labels were assigned correctly. Here is the process described by a leading annotation service provider:

Annotations are reviewed four times in order to confirm accuracy. Two annotators label a given object, a supervisor then checks the quality of their work. – keymakr, a leading annotation provider

How hard can it be?

Can you spot the mistake in the following photo? If you can't, we can't blame you. This is hard because it requires expert knowledge and a lot of cognitive resources to read all the labels, remember what each of these signs should look like, and finally spot the ones that are incorrect.

What if instead we show the exact same data like this:

Now it's not so difficult to spot the one speed limit sign that does not fit with the rest (the 30 km/h speed limit). It only requires you to keep a single type of object in your working memory at a time, and it taps into the intuitive skill of spotting items that stand out from the rest. It also takes an order of magnitude less time.

This insight directly led to the creation of MLfix. Using the streamlined interface lets us perform the QA process more than 10 times faster and avoid missing up to 30% of the errors.
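The single-class review idea above can be sketched in a few lines: group all annotation crops by their ground-truth label so a reviewer scans one class at a time. This is a minimal illustration of the concept, not MLfix's actual code; the record fields (`id`, `label`) are hypothetical.

```python
from collections import defaultdict

def group_by_label(annotations):
    """Group annotation records by ground-truth label so a reviewer
    can scan one class at a time and spot the outliers."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann["label"]].append(ann)
    return dict(groups)

# Hypothetical annotation records for illustration:
anns = [
    {"id": 1, "label": "speed-limit-60"},
    {"id": 2, "label": "speed-limit-30"},
    {"id": 3, "label": "speed-limit-60"},
]
groups = group_by_label(anns)
# One review page per class: all 60 km/h crops shown together,
# so a stray 30 km/h sign stands out immediately.
```

Shown this way, the reviewer's task shifts from "read and check every label" to "spot the item that doesn't belong", which is a much cheaper perceptual judgment.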

MLfix in action

The video below shows a user quickly scrolling through 40 objects belonging to 5 classes and finding 6 mislabeled examples.

You can also try it yourself on a selection of 60km/h speed limit signs coming from the Mapillary Traffic Sign Dataset. Note that depending on demand the live demo can take some time to start.

Slicing the data in many ways

MLfix can be used as a standalone tool, but it can also be embedded directly into Jupyter notebooks that are used by data scientists to prepare and train deep learning networks. Thanks to that, MLfix can tap into all the metadata you have about your dataset and also utilize networks you've trained to help you with the QA process. You can:

  1. Slice the images based on the ground truth label:

  2. Show visually similar images together (based on the LPIPS metric or a novel sorting network pretrained in an unsupervised manner):

  3. Show the output of your model (sorted by loss) on the validation set images to fish out mistakes. Here we are looking at the ground-truth class other-sign that the model believed to be the do-not-enter sign; we can see that it was right most of the time:
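The third slicing mode above – sorting validation examples by loss – can be sketched as follows. High-loss examples are where the model and the label disagree most strongly, which makes them good candidates for annotation mistakes. This is an illustrative sketch, not MLfix's API; the `prob_of_true_class` field stands in for whatever per-example probabilities your model produces.

```python
import math

def rank_by_loss(examples):
    """Sort validation examples by cross-entropy loss, highest first.
    The loss is the negative log-probability the model assigned to
    the ground-truth class, so low-probability labels rise to the top."""
    return sorted(
        examples,
        key=lambda ex: -math.log(ex["prob_of_true_class"]),
        reverse=True,
    )

# Hypothetical validation records for illustration:
val = [
    {"id": "a", "prob_of_true_class": 0.98},  # model agrees with the label
    {"id": "b", "prob_of_true_class": 0.02},  # strong disagreement: inspect!
    {"id": "c", "prob_of_true_class": 0.60},
]
suspects = rank_by_loss(val)
# Reviewing from the top of this list concentrates effort on the
# examples most likely to be mislabeled.
```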

All right, but is it worth doing?

We made a comparison on the Mapillary Traffic Sign Dataset, which is an extensive dataset of 206 thousand traffic signs divided into 401 classes. Among these, there are 6,400 annotations of speed limit signs, and with MLfix, in about 30 minutes we could find and remove 3% of them that were erroneous. In other words, we corrected 0.11% of all the labels in the whole dataset.

We trained image classification models (based on the ResNet50 backbone) on both the original and fixed datasets 20 times and averaged the accuracy metrics. After fixing the dataset, the model error rate went down from 7.28% to 7.05%, and the error rate for speed signs improved by almost 2 percentage points (from 10.42% to 8.49%), which is a significant improvement for a very modest amount of effort. More information about these experiments (including the code to reproduce the results) can be found in the GitHub repo - jpc/mlfix-mapillary-traffic-signs. The accuracy histograms show that the improvement is consistent over multiple training runs:
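Averaging over repeated runs matters here: training is noisy, so a single run cannot reliably show a fraction-of-a-percentage-point improvement. A minimal sketch of the comparison, with illustrative numbers only (not the actual experiment logs):

```python
from statistics import mean

def mean_error_rate(error_rates):
    """Average the error rate (in %) over repeated training runs;
    individual runs vary too much to compare one-to-one."""
    return mean(error_rates)

# Illustrative per-run error rates; the real experiment used 20 runs.
original = [7.4, 7.1, 7.3, 7.3]
fixed    = [7.1, 7.0, 7.0, 7.1]
improvement = mean_error_rate(original) - mean_error_rate(fixed)
# A consistent gap between the two averages, as in the histograms
# below, is what shows the fix helped rather than noise.
```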


Our work would not have been possible without the help of countless open-source resources. We hope MLfix will help the annotation community build the next generation of innovative technology.

If you have questions or ideas, join us on our Gitter #lounge channel or leave a comment in the comment section.
