August 13, 2020
Following our recent presentation at Open Source Summit North America, "Living on the Edge: Pure Open Source AI Stack with Panfrost, GStreamer and TensorFlow Lite" (video available here), many people showed interest in learning more about solving real-world problems with computer vision. With that in mind, here is a new blog series on computer vision, object detection, and building a system on the edge.
An active area of computer vision is object detection, where the goal is not only to localize objects of interest within an image but also to assign a label to each of them. Much of the recent success in object detection stems from advances in deep learning, particularly deep convolutional neural networks (CNNs). The initial focus was on improving accuracy, which led to increasingly complex object detection networks such as SSD, R-CNN, Mask R-CNN, and extended variants of these networks. While such networks demonstrated state-of-the-art detection performance, their computational and memory requirements made them very challenging, if not impossible, to deploy on edge and mobile devices. This greatly limits their adoption in applications such as robotics, video surveillance, and autonomous driving, where local embedded processing is required.
Developing a computer vision system can be broken down into three steps: data collection and preparation, model training, and model evaluation.
The process of collecting data depends on the type of project. Data can be collected from various sources such as files, databases, or sensors. Often the collected data can't be used directly for analysis because of missing values, extreme outliers, unorganized text, or noise. Therefore, data preparation is often the first step.
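As an illustration of the kinds of fixes data preparation involves, here is a minimal sketch for purely numeric feature data (rows are samples, columns are features). The function name `prepare_features`, the choice to drop incomplete rows rather than impute them, and the 3-sigma clipping threshold are all hypothetical choices for this example; cleaning unorganized text data is not covered.

```python
import numpy as np

def prepare_features(raw: np.ndarray, clip_sigma: float = 3.0) -> np.ndarray:
    """Hypothetical cleaning step for a 2-D array of numeric features."""
    data = raw.astype(float)
    # Drop samples (rows) that contain missing values (NaN).
    data = data[~np.isnan(data).any(axis=1)]
    # Clip extreme values to within clip_sigma standard deviations of each column mean.
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    data = np.clip(data, mean - clip_sigma * std, mean + clip_sigma * std)
    # Standardize each column to zero mean and unit variance (avoid dividing by zero).
    std = data.std(axis=0)
    std[std == 0] = 1.0
    return (data - data.mean(axis=0)) / std
```

Dropping rows is the simplest policy; when data is scarce, filling missing values (for example with the column median) is often preferable.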
As an example, suppose we want to predict when cars are about to cut into a lane. To acquire labeled data, we can observe when a car moves from a neighboring lane into the main lane, then rewind the video feed and label the preceding frames as showing a car that is about to cut in.
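The "rewind and label" idea can be sketched as follows, assuming we already know the frame index at which each cut-in was observed and that a fixed look-back window (the hypothetical `lookback` parameter, in frames) counts as "about to cut in".

```python
def label_cut_in_frames(num_frames, cut_in_frames, lookback):
    """Per-frame labels: 1 if a cut-in is observed within the next `lookback` frames."""
    labels = [0] * num_frames
    for event in cut_in_frames:
        # "Rewind" from the frame where the car entered the lane and mark the frames before it.
        start = max(0, event - lookback)
        for i in range(start, min(event, num_frames)):
            labels[i] = 1
    return labels
```

For instance, with 10 frames, a cut-in observed at frame 5, and a look-back of 3 frames, frames 2, 3 and 4 are labeled positive.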
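Deep learning systems learn to imitate the patterns in their training datasets: training adjusts the network's weights to minimize a loss function that measures how far the predictions are from the labels. Gradient descent is one of the most popular optimization algorithms and one of the most common ways to optimize neural networks. A typical workflow repeats the same loop: run the network on training data, compute the loss, compute the gradient of the loss with respect to each weight, and move each weight a small step, scaled by the learning rate, in the opposite direction of its gradient.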
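A minimal sketch of that loop, assuming the simplest possible "network" (a single weight and bias fitting a line, y ≈ w * x + b) and full-batch gradient descent on mean squared error instead of a real CNN. The function name and default values are illustrative only.

```python
import numpy as np

def gradient_descent(x, y, lr=0.1, epochs=1000):
    """Fit y ≈ w * x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Forward pass: predictions and their errors.
        err = (w * x + b) - y
        # Gradients of MSE = mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2.0 / n * np.dot(err, x)
        grad_b = 2.0 / n * err.sum()
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Fitting noise-free data generated from y = 3x + 1 on [0, 1] with `lr=0.5` and 2000 epochs recovers w and b very close to 3 and 1. A learning rate that is too large makes the updates overshoot and diverge, which is one reason the learning rate is treated as a hyperparameter.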
Model evaluation is an integral part of the model development process. It helps us find the model that best represents our data and estimate how well the chosen model will perform on unseen data.
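A common way to approximate "unseen data" is to hold out part of the labeled data and never train on it. The sketch below assumes a classification task scored by plain accuracy; the helper names `train_val_split` and `accuracy` and the 20% validation fraction are hypothetical.

```python
import numpy as np

def train_val_split(x, y, val_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them as validation data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(round(len(x) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

The model is fit only on the training portion, and `accuracy` on the validation portion serves as the estimate of performance on new data.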
To improve the model, we tune its hyperparameters: parameters that determine the network structure (such as the number of neurons or the activation functions) or the training process (such as the gradient descent learning rate, or adding momentum to the weight update rule). Tuning these parameters is an inevitable and important step toward better performance. Methods like GridSearch, which tries every combination of candidate values, and RandomizedSearch, which samples a fixed number of random combinations, can be used to explore the different parameters.
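To show the difference between the two strategies, here is a simplified sketch of both searches over a small hyperparameter space. It assumes an `objective` function that trains and evaluates a model for one parameter set and returns a validation score (higher is better); library implementations such as scikit-learn's `GridSearchCV` and `RandomizedSearchCV` additionally use cross-validation, which is omitted here.

```python
import itertools
import random

def grid_search(objective, space):
    """Evaluate every combination in `space` (name -> list of values); return the best."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def random_search(objective, space, n_iter, seed=0):
    """Evaluate n_iter randomly sampled combinations from `space`; return the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search caps the cost at `n_iter` evaluations, which is why it is often preferred for large spaces.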
Object detection combines image classification with localization: an image may contain multiple objects, and each of them must be both located (typically with a bounding box) and classified. In recent years, object detection performance has improved dramatically thanks to detection networks built on the structure of CNNs.
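The localization half of the task is commonly scored with Intersection over Union (IoU), the overlap area of a predicted and a ground-truth box divided by the area of their union. A minimal sketch, assuming axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7. A detection is typically counted as correct when its class matches and its IoU with a ground-truth box exceeds a chosen threshold.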
Some examples of object detection include:
→ In the next part, we will go over some of the recent developments in the object detection domain and explain their details.
This demo performs object detection or semantic segmentation on the first 10 seconds of the specified video. Note that we use default values, which are not necessarily good for the given problem, so we suggest tailoring the values to the task at hand.
The models are trained on the COCO (Common Objects in Context) dataset, a large-scale object detection, segmentation, and captioning dataset consisting of 80k training and 40k validation images covering 80 object classes.
Note that it can take a couple of minutes before a result is shown.