
Understanding computer vision & AI, part 1


Marcus Edel
August 13, 2020


Following our recent presentation at Open Source Summit North America, "Living on the Edge: Pure Open Source AI Stack with Panfrost, GStreamer and Tensorflow Lite" (video available here), many people expressed interest in learning more about solving real-world problems with computer vision. With that in mind, here is a new blog series on computer vision, object detection, and building a system on the edge.
 

An active area in the field of computer vision is object detection, where the goal is not only to localize objects of interest within an image but also to assign a label to each of them. Many recent successes in object detection stem from advances in deep learning, particularly deep convolutional neural networks. Much of the initial focus was on improving accuracy, which led to increasingly complex object detection networks such as SSD, R-CNN, Mask R-CNN, and other extended variants of these networks. While such networks demonstrated state-of-the-art object detection performance, their computational and memory requirements made them very challenging, if not impossible, to deploy on edge and mobile devices. This greatly limits their adoption in applications such as robotics, video surveillance, and autonomous driving, where local embedded processing is required.



(Figure: example semantic segmentation and object detection results on automotive scenes)

Among the most important types of obstacles to detect are vehicles, traffic signs, and other traffic participants.


Deep Learning - Computer Vision Pipeline

Developing a computer vision system can be accomplished in 3 steps:


  1. Collect Data
  2. Model Training
  3. Evaluation

 

Collect Data

The process of collecting data depends on the type of project. Data can come from various sources such as files, databases, or sensors. Often, the collected data can't be used directly for analysis because of missing values, extreme outliers, unstructured text, or noise. Therefore, data preparation is usually the first step:


  1. Gathering the Data: organizing the data as files or inside a database.
  2. Exploration and Sanitization: exploring and visualizing the data to identify the most informative features, and removing outliers or errors that could skew the model or introduce bias.
  3. Transformation: normalizing, translating, encoding, etc., the data so that it can be used to train the model.
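As a minimal sketch of the transformation step, assume each image arrives as a flat list of 8-bit RGB pixels. The function below rescales every pixel to [0, 1] and then standardizes each color channel to zero mean and unit variance. The name `normalize_image` and the flat-list input layout are illustrative assumptions, not part of any particular framework.

```python
from statistics import mean, pstdev


def normalize_image(pixels):
    """Scale 8-bit RGB pixels to [0, 1], then standardize each channel.

    `pixels` is a list of (r, g, b) tuples with values in 0..255
    (a hypothetical flattened image layout used for illustration).
    """
    # Rescale from 0..255 to 0.0..1.0
    scaled = [tuple(c / 255.0 for c in px) for px in pixels]
    # Per-channel statistics computed over the whole image
    channels = list(zip(*scaled))
    means = [mean(ch) for ch in channels]
    # Fall back to 1.0 for a constant channel to avoid dividing by zero
    stds = [pstdev(ch) or 1.0 for ch in channels]
    # Zero mean, unit variance for each channel
    return [tuple((c - m) / s for c, m, s in zip(px, means, stds)) for px in scaled]
```

Real pipelines usually compute the mean and standard deviation over the whole training set rather than per image, and reuse those statistics at inference time.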

As an example, suppose we want to predict when cars are about to cut into our lane. To acquire labeled data, we can detect when a car moves from a neighboring lane into the main lane, then rewind the video feed and label the preceding frames as showing a car that is about to cut in.
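The "rewind" labeling idea can be sketched as follows, assuming we already know, for every frame, which lane a tracked car is in (for example from a vehicle tracker). Whenever the car enters the main lane, the frames within a fixed window before that moment are labeled as "about to cut in". The lane encoding, the `horizon` window length, and the function name are hypothetical choices for illustration.

```python
def label_cut_ins(lanes, main_lane=0, horizon=3):
    """Return one boolean label per frame: True if the car cuts into
    `main_lane` within the next `horizon` frames.

    `lanes[i]` is the lane index of a single tracked car in frame i
    (a hypothetical per-frame track, e.g. produced by a tracker).
    """
    labels = [False] * len(lanes)
    for i in range(1, len(lanes)):
        # A cut-in event: the car moves from another lane into the main lane
        if lanes[i] == main_lane and lanes[i - 1] != main_lane:
            # "Rewind": label the frames leading up to the event
            for j in range(max(0, i - horizon), i):
                labels[j] = True
    return labels


# Car drives in lane 1, then cuts into the main lane (0) at frame 4
print(label_cut_ins([1, 1, 1, 1, 0, 0], horizon=2))
```

Because the labels are derived from what happened later in the recording, large amounts of training data can be labeled without manual annotation.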

 

Model Training

Deep learning systems learn by fitting their parameters to a training dataset, so they largely imitate the patterns present in that data. Gradient descent is one of the most popular optimization algorithms and one of the most common ways to train neural networks. A typical training workflow looks like this:


  1. Define the neural network. In its most basic form, a neural network is built out of neurons that are connected to other neurons via weighted connections. The information (the image data) flows through these connections from one set of neurons to the next, and the final set produces the response (the detected objects). The weights are the relative strengths of the connections between neurons; after training, they encode what the network has learned, loosely comparable to a human brain that has learned, for example, to multiply numbers or to recognize objects. The goal of training (steps 2 to 5) is to update the weights so that the network gets better at detecting and identifying objects.
  2. Process the input through the network. In the context of object detection, the input is a collection of images, as defined in the data collection step.
  3. Compute the loss: how far the output is from being correct. In the context of object detection, the loss measures how well the network detects the objects in an image.
  4. Propagate the gradients back through the network's parameters (backpropagation). The gradient is the vector of partial derivatives of the loss with respect to each of the network's weights; moving the weights against the gradient reduces the loss.
  5. Update the weights of the network, e.g. using the simple update rule weight = weight - learning_rate * gradient, which combines the weights defined in step 1 with the gradients calculated in step 4. The learning rate is a gradient descent hyperparameter that has to be tailored to the task at hand.
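To make the training steps concrete, here is a minimal sketch in plain Python that trains a single neuron (logistic regression) with gradient descent on a tiny, linearly separable toy dataset. It is far simpler than an object detection network, and the data, learning rate, and epoch count are made-up illustrative values, but the loop has the same structure: forward pass, loss, gradients, weight update.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, learning_rate=0.5, epochs=200):
    """Train a single neuron y = sigmoid(w * x + b) with gradient descent.

    `samples` is a list of (x, target) pairs with target in {0, 1}.
    Returns the weight, the bias, and the average loss of the last epoch.
    """
    # Step 1: define the "network": one weight and one bias, starting at zero
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, target in samples:
            # Step 2: process the input through the network (forward pass)
            y = sigmoid(w * x + b)
            # Step 3: binary cross-entropy loss, i.e. how far y is from the target
            loss -= target * math.log(y) + (1 - target) * math.log(1 - y)
            # Step 4: gradients of the loss w.r.t. w and b (chain rule; for
            # sigmoid + cross-entropy the error term simplifies to y - target)
            grad_w += (y - target) * x
            grad_b += y - target
        # Step 5: update rule weight = weight - learning_rate * gradient
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b, loss / n


w, b, loss = train([(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)])
print(w, b, loss)
```

Deep learning frameworks automate step 4 with automatic differentiation, so in practice only the network, the loss, and the optimizer have to be specified.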

 

Evaluation

Model evaluation is an integral part of the model development process. It helps to find the model that best represents our data and to estimate how well the chosen model will perform on unseen data.
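A common way to estimate performance on unseen data is to hold out part of the dataset so the model never sees it during training. A minimal sketch, assuming labeled examples stored as (input, label) pairs and a model given as a plain prediction function; `train_val_split`, `accuracy`, and the 20% split are illustrative choices.

```python
import random


def train_val_split(examples, val_fraction=0.2, seed=0):
    """Shuffle `examples` and split them into (train, validation) lists."""
    shuffled = examples[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(predict, examples):
    """Fraction of (input, label) pairs for which predict(input) == label."""
    if not examples:
        return 0.0
    correct = sum(1 for x, label in examples if predict(x) == label)
    return correct / len(examples)
```

The model is trained only on the first list, and `accuracy` (or, for object detection, a detection metric) is reported on the second.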

To improve the model, we tune its hyperparameters: parameters that determine the network structure (e.g. the number of neurons or the activation functions) or the training process (e.g. the gradient descent learning rate, or a momentum term added to the weight update rule). Tuning these parameters is an unavoidable and important step toward better performance. Methods such as grid search and random search (for example scikit-learn's GridSearchCV and RandomizedSearchCV) can be used to explore the different parameter combinations.
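The difference between the two search strategies can be sketched without any library: grid search evaluates every combination from fixed lists of values, while random search evaluates a fixed number of randomly sampled combinations. The `fake_score` function is a hypothetical stand-in for "train with these hyperparameters and return the validation score", and the parameter names and values are made up for illustration.

```python
import itertools
import random


def grid_search(score, grid):
    """Evaluate every combination in `grid` (dict of name -> list of values)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if best is None or s > best[0]:
            best = (s, params)
    return best


def random_search(score, space, n_trials=10, seed=0):
    """Evaluate `n_trials` random combinations drawn from `space`."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        s = score(params)
        if best is None or s > best[0]:
            best = (s, params)
    return best


def fake_score(params):
    # Hypothetical stand-in for training + validation; best at lr=0.01, momentum=0.9
    return -abs(params["learning_rate"] - 0.01) - abs(params["momentum"] - 0.9)


grid = {"learning_rate": [0.1, 0.01, 0.001], "momentum": [0.0, 0.5, 0.9]}
best_score, best_params = grid_search(fake_score, grid)
# best_params == {"learning_rate": 0.01, "momentum": 0.9}
```

Grid search grows exponentially with the number of hyperparameters, which is why random search with a fixed trial budget is often preferred when training is expensive.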




Object Detection

Object detection combines image classification with localization: unlike plain classification, an image may contain multiple objects, each of which must be both located and classified. In recent years, object detection performance has improved dramatically thanks to detection networks built on convolutional neural networks (CNNs).
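Localization is usually expressed as a bounding box, and a standard way to measure how well a predicted box matches a ground-truth box is intersection over union (IoU). A minimal sketch, assuming boxes given as (x_min, y_min, x_max, y_max) pixel coordinates; the `Detection` type is an illustrative structure, not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a class label, a confidence score and a box."""
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


car = Detection("car", 0.9, (0, 0, 2, 2))
print(iou(car.box, (1, 1, 3, 3)))  # overlap of 1 pixel^2 out of a union of 7
```

A predicted box is typically counted as correct when its IoU with a ground-truth box of the same class exceeds a threshold such as 0.5.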

Some examples of object detection include:

  • Drawing a bounding box and labeling each object in a street scene.
  • Drawing a bounding box and labeling each object in a VR environment.
  • Drawing a bounding box and labeling each object in a factory.

In the next part, we will go over some of the recent developments in the object detection domain and explain them in detail.

Demo

This demo performs object detection or semantic segmentation on the first 10 seconds of the specified video. Note that the demo uses default parameter values, which are not necessarily suited to the given problem; in practice, these values should be tailored to the task at hand.

The models are trained on the COCO dataset (Common Objects in Context), which is a large-scale object detection, segmentation, and captioning dataset, consisting of 80k training and 40k validation images with 80 classes.
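COCO annotations store each bounding box as [x, y, width, height], where (x, y) is the top-left corner in pixels. Many tools instead work with corner coordinates, so a small conversion helper is often needed. A minimal sketch; the sample annotation below is made up for illustration and contains only a few of the fields found in COCO annotation files.

```python
def coco_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def corners_to_coco(box):
    """Convert (x_min, y_min, x_max, y_max) back to COCO [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# A made-up annotation with a subset of the COCO annotation fields
annotation = {"image_id": 42, "category_id": 3, "bbox": [100.0, 50.0, 80.0, 40.0]}
print(coco_to_corners(annotation["bbox"]))  # prints (100.0, 50.0, 180.0, 90.0)
```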

Note that it can take a couple of minutes before a result is shown.


Collabora Ltd © 2005-2024. All rights reserved.