August 13, 2020
Following our recent presentation at Open Source Summit North America, "Living on the Edge: Pure Open Source AI Stack with Panfrost, GStreamer and Tensorflow Lite" (video available here), many people showed interest in learning more about solving real-world problems with computer vision. With that in mind, here is a new blog series on computer vision, object detection, and building a system on the edge.
An active area in the field of computer vision is object detection, where the goal is not only to localize objects of interest within an image but also to assign a label to each of them. Much of the recent success in object detection stems from modern advances in deep learning, particularly deep convolutional neural networks. Early work focused mainly on improving accuracy, which led to increasingly complex object detection networks such as SSD, R-CNN, Mask R-CNN, and extended variants of these networks. While such networks demonstrated state-of-the-art object detection performance, their computational and memory requirements made them very challenging, if not impossible, to deploy on edge and mobile devices. This greatly limits their adoption in applications such as robotics, video surveillance, and autonomous driving, where local embedded processing is required.
Developing a computer vision system can be accomplished in 3 steps:
The process of collecting data depends on the type of project. Data can be collected from various sources such as files, databases, sensors, etc. Often the collected data can't be used directly for analysis because of missing values, extremely large values, unstructured text, or noise. Therefore, data preparation is often the first step:
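As a rough illustration of the cleaning problems listed above, here is a minimal sketch for a single 1-D series of sensor readings. The function name `prepare`, the use of `None` for missing readings, and the specific techniques (median imputation, clipping to an assumed valid range, and a trailing moving average for noise) are all assumptions chosen for illustration; real projects pick these per dataset.

```python
from statistics import median


def prepare(values, low, high, window=3):
    """Clean a 1-D series of readings (hypothetical example, not a fixed recipe)."""
    # 1. Impute missing readings (None) with the median of the observed values.
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]

    # 2. Clip extremely large or small values into an assumed valid range.
    clipped = [min(max(v, low), high) for v in filled]

    # 3. Reduce noise with a trailing moving average over `window` samples.
    smoothed = []
    for i in range(len(clipped)):
        start = max(0, i - window + 1)
        chunk = clipped[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


readings = [1.0, None, 3.0, 1000.0, 2.0]  # one gap and one outlier
print(prepare(readings, low=0.0, high=10.0))
```

The order matters: imputing before clipping and smoothing keeps a single missing or extreme reading from distorting its neighbors in the moving average.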
As an example, suppose we want to predict when cars are about to cut into a lane. To acquire labeled data, we can observe when a car moves from a neighboring lane into the main lane, then rewind the video feed and label the preceding frames as a car about to cut into the lane.
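The "rewind and label" idea can be sketched in a few lines, assuming we already have a per-frame lane assignment for one tracked car (for example, from a tracker; that input, the helper name `label_cut_ins`, and the `horizon` parameter are hypothetical). Whenever the car switches from another lane into the main lane, the `horizon` frames before the switch are labeled as "about to cut in".

```python
def label_cut_ins(lanes, main_lane, horizon):
    """Return one label per frame: True if the car enters the main lane within `horizon` frames."""
    labels = [False] * len(lanes)
    for t in range(1, len(lanes)):
        # A cut-in event: the car was in another lane and is now in the main lane.
        if lanes[t - 1] != main_lane and lanes[t] == main_lane:
            # "Rewind" the feed: mark the frames leading up to the event.
            for k in range(max(0, t - horizon), t):
                labels[k] = True
    return labels


# Lane index of one tracked car per frame (0 = main lane, 1 = neighboring lane).
lanes = [1, 1, 1, 1, 0, 0]
print(label_cut_ins(lanes, main_lane=0, horizon=2))
# [False, False, True, True, False, False]
```

Because the labels come from what happens later in the recording, this kind of labeling can be automated offline, even though the trained model must make the prediction before the event.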
Deep learning systems learn to imitate their training datasets by minimizing a loss function. Gradient descent, which repeatedly adjusts the model's weights in the direction that reduces this loss, is one of the most popular optimization algorithms and the most common way to optimize neural networks. A typical workflow looks like:
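To make the gradient descent step concrete, here is a minimal sketch on the smallest possible "network": a single linear unit `y ≈ w*x + b` trained with mean squared error. The data, learning rate, and epoch count are illustrative assumptions; a real detector has millions of weights and computes gradients with backpropagation, but the update rule has the same shape.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w*x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

The learning rate is exactly the kind of training hyper-parameter discussed below: too large and the updates diverge, too small and training needs many more epochs.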
Model evaluation is an integral part of the model development process. It helps us find the model that best represents our data and estimate how well the chosen model will perform on unseen data.
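A common way to estimate performance on unseen data is to hold out part of the dataset and never train on it. The sketch below assumes a toy 1-D classification problem and a hypothetical "model" that just learns a threshold; the helper names are illustrative, not from any library. The point is the split: accuracy measured on the training set can look perfect while accuracy on the held-out set is the honest estimate.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle and split samples so the model is evaluated on data it never saw."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, samples):
    """Fraction of (x, label) pairs that the model predicts correctly."""
    correct = sum(1 for x, label in samples if model(x) == label)
    return correct / len(samples)


def fit_threshold(train):
    """Toy 'training': pick the threshold that best separates the two classes."""
    candidates = sorted(x for x, _ in train)
    best = max(candidates, key=lambda t: accuracy(lambda x: x >= t, train))
    return lambda x: x >= best


# Hypothetical 1-D data: the label is True when x >= 50.
data = [(x, x >= 50) for x in range(100)]
train, test = train_test_split(data)
model = fit_threshold(train)
print("train accuracy:", accuracy(model, train))
print("test accuracy:", accuracy(model, test))
```

The same split is what makes comparing candidate models fair: each one is scored on identical data that none of them was fitted to.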
To improve the model, we tune its hyper-parameters: parameters that determine the network structure (number of neurons in the network, activation functions) or the training process (the gradient descent learning rate, or additions such as momentum in the weight update rule). Tuning these parameters is an unavoidable and important step toward better performance. Methods like GridSearch and RandomizedSearch can be used to explore the different parameter combinations.
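The difference between the two search strategies can be sketched without any framework. Here `validation_loss` is a hypothetical stand-in for "train a model with these hyper-parameters and return its validation loss" (its minimum is placed at `lr=0.01`, `momentum=0.9` purely for illustration). Grid search tries every combination of listed values; random search samples a fixed number of points from continuous ranges. Libraries such as scikit-learn package the same ideas as `GridSearchCV` and `RandomizedSearchCV`, with cross-validation added.

```python
import itertools
import random


def validation_loss(lr, momentum):
    """Hypothetical stand-in for training + validation; minimum at lr=0.01, momentum=0.9."""
    return (lr - 0.01) ** 2 * 1e4 + (momentum - 0.9) ** 2


def grid_search(grid):
    """Exhaustively try every combination of the listed values."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def random_search(space, n_iter, seed=0):
    """Sample n_iter random combinations from (low, high) ranges."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {n: rng.uniform(lo, hi) for n, (lo, hi) in space.items()}
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


print(grid_search({"lr": [0.001, 0.01, 0.1], "momentum": [0.0, 0.5, 0.9]}))
print(random_search({"lr": (0.001, 0.1), "momentum": (0.0, 0.99)}, n_iter=20))
```

Grid search cost grows multiplicatively with each added hyper-parameter, while random search lets you fix the budget (`n_iter`) up front, which is why it is often preferred when training runs are expensive.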
Object detection combines image classification with localization: an image may contain multiple objects, each of which must be both located and classified. In recent years, object detection performance has improved dramatically thanks to detection networks built on the structure of convolutional neural networks (CNNs).
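One way to picture a detector's output is as a list of detections, each pairing a class label (the classification part) with a bounding box (the localization part). The sketch below assumes boxes in `(x_min, y_min, x_max, y_max)` pixel coordinates; the `Detection` class and the sample values are hypothetical. It also shows intersection over union (IoU), a widely used measure, not taken from the original text, of how well a predicted box overlaps a ground-truth box.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a class label (classification) plus a box (localization)."""
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A single image can yield several detections, each with its own label and box.
detections = [
    Detection("car", 0.92, (10, 10, 110, 60)),
    Detection("person", 0.81, (150, 20, 180, 100)),
]
ground_truth_car = (20, 10, 120, 60)
print(iou(detections[0].box, ground_truth_car))
```

For the car above, the boxes overlap in a 90 × 50 region out of a 5500-pixel union, giving an IoU of about 0.82; a detection is typically counted as correct only when its label matches and its IoU exceeds a chosen threshold.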
Some examples of object detection include:
→ In the next part, we will go over some of the recent developments in the object detection domain and explain their details.
This demo performs object detection or semantic segmentation on the first 10 seconds of the specified video. Note that we use default values, which are not necessarily good for the given problem, so it is suggested that the values used be tailored to the task at hand.
The models are trained on the COCO dataset (Common Objects in Context), which is a large-scale object detection, segmentation, and captioning dataset, consisting of 80k training and 40k validation images with 80 classes.
Note it can take a couple of minutes before a result is shown.