August 13, 2020
Following our recent presentation at Open Source Summit North America, "Living on the Edge: Pure Open Source AI Stack with Panfrost, GStreamer and Tensorflow Lite" (video available here), many attendees showed interest in learning more about solving real-world problems with computer vision. With that in mind, here is a new blog series on computer vision, object detection, and building a system on the edge.
An active area in the field of computer vision is object detection, where the goal is not only to localize objects of interest within an image but also to assign a label to each of them. Considerable recent successes in object detection stem from modern advances in deep learning, particularly deep convolutional neural networks. Much of the initial focus was on improving accuracy, leading to increasingly complex object detection networks such as SSD, R-CNN, Mask R-CNN, and other extended variants of these networks. While such networks demonstrated state-of-the-art object detection performance, they were very challenging, if not impossible, to deploy on edge and mobile devices due to computational and memory constraints. This greatly limits their adoption in applications such as robotics, video surveillance, and autonomous driving, where local embedded processing is required.
Developing a computer vision system can be accomplished in 3 steps:
The process of collecting data depends on the type of project. The data can be collected from various sources such as files, databases, or sensors. Often the collected data can't be used directly for analysis, because it contains missing values, extreme outliers, unorganized text, or noise. Therefore, data preparation is often the first step:
As an example, suppose we want to predict when cars are about to cut into a lane. In order to acquire labeled data, we can observe when a car changes from a neighboring lane into the main lane and then rewind the video feed to label that a car is about to cut into the lane.
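The "rewind and label" idea in the lane example can be sketched in a few lines. This is only an illustration under stated assumptions: the function `label_cut_ins`, the per-frame lane encoding (an integer lane index per frame for one tracked car), and the `lookback` window are all hypothetical and not taken from any real pipeline.

```python
def label_cut_ins(lane_per_frame, main_lane=0, lookback=3):
    """Return one 0/1 label per frame.

    A frame is labeled 1 if the tracked car enters `main_lane`
    within the next `lookback` frames (hypothetical encoding).
    """
    labels = [0] * len(lane_per_frame)
    for i in range(1, len(lane_per_frame)):
        # The car was in another lane and is now in the main lane.
        entered = (lane_per_frame[i] == main_lane
                   and lane_per_frame[i - 1] != main_lane)
        if entered:
            # "Rewind": mark the frames just before the lane change.
            for j in range(max(0, i - lookback), i):
                labels[j] = 1
    return labels


# Toy track: the car drives in lane 1, then moves into lane 0 at frame 5.
lanes = [1, 1, 1, 1, 1, 0, 0, 0]
labels = label_cut_ins(lanes, main_lane=0, lookback=3)
print(labels)
```

Here the three frames before the lane change receive the positive label, which is the part of the video where the model should learn to anticipate the cut-in.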
Deep learning systems often learn to imitate their training datasets. Gradient descent is one of the most popular algorithms to perform optimization and one of the most common ways to optimize neural networks. A typical workflow looks like:
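To make gradient descent concrete, here is a minimal sketch that fits a straight line to toy data by repeatedly stepping against the gradient of the mean squared error. The function name `fit_line`, the data, the learning rate, and the step count are illustrative assumptions; a real network would use a deep learning framework and mini-batches rather than this hand-written loop.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by plain (full-batch) gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move the parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

The learning rate controls the step size: too large and the updates diverge, too small and training takes many more steps.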
Model Evaluation is an integral part of the model development process. It helps to find the best model that represents our data and how well the chosen model performs on unseen data.
To improve the model we tune the hyper-parameters: parameters that determine the network structure (number of neurons in the network, network activation functions) or that control training (the gradient descent learning rate, or additional terms such as momentum in the weight update rule). Tuning these parameters is an inevitable and important step to obtain better performance. Methods like GridSearch and RandomizedSearch can be used to navigate through the different parameter combinations.
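The difference between the two search strategies can be sketched without any library. The example below is a simplified stand-in, not the scikit-learn implementation: `train_and_score`, `grid_search`, and `random_search` are hypothetical helpers, and the "model" is a one-parameter line fitted with momentum gradient descent on made-up data.

```python
import itertools
import random


def train_and_score(learning_rate, momentum, steps=50):
    """Fit y = w * x with momentum gradient descent; return validation MSE."""
    train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy training set
    val = [(4.0, 8.0), (5.0, 10.0)]               # toy validation set
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        velocity = momentum * velocity - learning_rate * grad
        w += velocity
    return sum((w * x - y) ** 2 for x, y in val) / len(val)


def grid_search(learning_rates, momenta):
    """Try every combination and keep the one with the lowest validation loss."""
    candidates = itertools.product(learning_rates, momenta)
    return min(candidates, key=lambda c: train_and_score(*c))


def random_search(n_trials, seed=0):
    """Sample combinations at random (learning rate on a log scale)."""
    rng = random.Random(seed)
    candidates = [(10 ** rng.uniform(-4, -1), rng.uniform(0.0, 0.9))
                  for _ in range(n_trials)]
    return min(candidates, key=lambda c: train_and_score(*c))


best_grid = grid_search([0.001, 0.01, 0.1], [0.0, 0.5, 0.9])
best_random = random_search(n_trials=20)
print("grid:", best_grid, "random:", best_random)
```

Grid search evaluates every combination, so its cost grows quickly with the number of hyper-parameters, while random search evaluates a fixed budget of samples and can cover wide ranges more cheaply.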
Object detection combines image classification with localization: a single image may contain multiple objects, each of which must be both localized and classified. In recent years, the performance of object detection has dramatically improved due to the emergence of object detection networks that build on the structure of CNNs.
Some examples of object detection include:
→ In the next part, we will go over some of the recent developments in the object detection domain and explain their details.
This demo performs object detection or semantic segmentation on the first 10 seconds of the specified video. Note that we use default values, which are not necessarily good for the given problem, so it is suggested that the values be tailored to the task at hand.
The models are trained on the COCO dataset (Common Objects in Context), which is a large-scale object detection, segmentation, and captioning dataset, consisting of 80k training and 40k validation images with 80 classes.
Note it can take a couple of minutes before a result is shown.