
Carlafox: Towards reliable open-source 3D perception

Vineet Suryan
April 05, 2023


Overview


Extracting precise 3D object information is one of the prime goals of comprehensive scene understanding. However, labeling errors are common in existing open-source 3D perception datasets, and they can have serious downstream consequences. To tackle this issue, we used Carlafox to automatically generate an error-free synthetic dataset for 3D perception.

Deep 3D object detectors can be confused during training by the inherent ambiguity in ground-truth 3D bounding box annotations caused by occlusions, missing points, or manual annotation errors, which lowers detection accuracy. Existing methods largely overlook these issues and treat the labels as deterministic. With a virtual simulation, where the labels are known exactly, enormous datasets can be created at negligible cost. Research has shown that combining simulated and real data makes AI models more accurate, and our results show that simulated data can significantly reduce the amount of training on real data required to achieve satisfactory levels of accuracy.

Key Takeaways

  • Simulated data is becoming more crucial than ever in autonomous driving applications, both for testing pre-trained models and for developing new models.
  • For neural network models to generalize to real-world applications, it is imperative that the underlying dataset contain a variety of driving scenarios and that the simulated sensor readings closely resemble those of real-world sensors.
  • Carlafox can export high-quality, synchronized LiDAR and camera data with object annotations, and offers a configuration that accurately reflects a real-life sensor array.
  • Furthermore, we use the Carlafox tool to generate a dataset of more than 10,000 samples and use it to train SFA3D, a fast open-source 3D object detection neural network.
  • For testing, we integrate the model back into Carlafox and visualize its predictions against the ground-truth data from the simulator.

Data Collection

Carlafox, a web-based CARLA visualizer, substantially simplifies the arduous task of generating a synthetic dataset for 3D object detection. We use Carlafox to set up sensor configurations, create diverse weather conditions, and generate data from different maps in the KITTI format. One advantage of the dataset is that the open-source CARLA simulator recreates the same LiDAR and camera configurations that were used to capture the original KITTI data.
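
Under the hood, such a rig corresponds to CARLA's Python API. The following is a minimal sketch, assuming a CARLA server running on the default port; the sensor blueprint names and attributes are real CARLA identifiers chosen to approximate KITTI's Velodyne HDL-64E and camera resolution, while the vehicle blueprint, mounting transforms, and output paths are illustrative assumptions (Carlafox configures all of this through its web UI):

```python
import carla

# Connect to a running CARLA server (default host/port assumed).
client = carla.Client('localhost', 2000)
world = client.get_world()
blueprints = world.get_blueprint_library()

# Ego vehicle to carry the sensor rig (blueprint choice is arbitrary).
ego_bp = blueprints.find('vehicle.tesla.model3')
ego = world.spawn_actor(ego_bp, world.get_map().get_spawn_points()[0])

# LiDAR approximating KITTI's Velodyne HDL-64E: 64 channels, 10 Hz, 120 m range.
lidar_bp = blueprints.find('sensor.lidar.ray_cast')
lidar_bp.set_attribute('channels', '64')
lidar_bp.set_attribute('range', '120.0')
lidar_bp.set_attribute('rotation_frequency', '10')
lidar_bp.set_attribute('points_per_second', '1300000')

# RGB camera at KITTI resolution (1242x375).
cam_bp = blueprints.find('sensor.camera.rgb')
cam_bp.set_attribute('image_size_x', '1242')
cam_bp.set_attribute('image_size_y', '375')

# Mounting heights are illustrative; a faithful rig would match KITTI's
# measured extrinsics.
lidar = world.spawn_actor(
    lidar_bp, carla.Transform(carla.Location(x=0.0, z=1.73)), attach_to=ego)
camera = world.spawn_actor(
    cam_bp, carla.Transform(carla.Location(x=0.27, z=1.65)), attach_to=ego)

# Each sensor delivers data via a callback; writing frames out in the
# KITTI directory layout is what Carlafox automates.
lidar.listen(lambda d: d.save_to_disk('velodyne/%06d.ply' % d.frame))
camera.listen(lambda i: i.save_to_disk('image_2/%06d.png' % i.frame))
```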

The objective is to offer a challenging dataset for assessing and improving approaches to complicated vision tasks such as 3D object detection. In total, the dataset contains 12,807 cars, 10,252 pedestrians, and 11,624 cyclists. It provides 2D and 3D bounding box annotations for the classes Car, Pedestrian, and Cyclist, both LiDAR and camera sensor data, and the corresponding sensor calibration matrices.
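
Annotations are stored one object per line in the standard KITTI label format. A minimal, hypothetical parser for such a line might look like this (field order per the KITTI devkit):

```python
from dataclasses import dataclass

@dataclass
class KittiLabel:
    """One object annotation in KITTI label format."""
    cls: str          # 'Car', 'Pedestrian' or 'Cyclist'
    truncated: float  # 0.0 (fully visible) .. 1.0 (fully truncated)
    occluded: int     # 0 visible, 1 partly, 2 largely occluded, 3 unknown
    alpha: float      # observation angle, [-pi, pi]
    bbox: tuple       # 2D image box in pixels: (left, top, right, bottom)
    dims: tuple       # 3D size in metres: (height, width, length)
    loc: tuple        # 3D location in camera coordinates: (x, y, z)
    rot_y: float      # yaw around the camera Y axis, [-pi, pi]

def parse_label_line(line: str) -> KittiLabel:
    f = line.split()
    return KittiLabel(
        cls=f[0], truncated=float(f[1]), occluded=int(f[2]), alpha=float(f[3]),
        bbox=tuple(map(float, f[4:8])),
        dims=tuple(map(float, f[8:11])),
        loc=tuple(map(float, f[11:14])),
        rot_y=float(f[14]))
```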

Figure 1: 3D bounding boxes projected on LiDAR point cloud.

 

Figure 2: 2D & 3D bounding boxes with occlusion and unique id.

Training 3D Perception Models on the CARLA dataset

Due to its numerous applications across industries such as robotics and autonomous driving, 3D object detection has been gaining increasing attention from businesses and academia. LiDAR sensors are commonly used in robotics and autonomous vehicles to capture 3D scene data as sparse and irregular point clouds, which have been shown to serve as helpful cues for 3D scene perception and comprehension.

We trained several LiDAR-based networks, namely PointRCNN, PV-RCNN, and SFA3D, as well as a multimodal (RGB + LiDAR) 3D object detection model, MVX-Net, on the CARLA synthetic dataset, but fine-tuned only one of them, SFA3D, mainly because it is faster and uses less memory without much loss in performance. That said, any of the other models might have performed better if optimized and tuned beyond a baseline training run, as shown in the following panel.

A closer look into SFA3D

SFA3D (Super Fast and Accurate 3D object detection) operates on 3D LiDAR point clouds. Its backbone is a ResNet-based Keypoint Feature Pyramid Network (KFPN), originally proposed in RTM3D.

The model takes a bird's-eye-view (BEV) map as input, encoded from the height, intensity, and density of the 3D LiDAR point cloud. As output, it produces a heatmap for the main center, along with the center offset, the heading angle, the object dimensions, and the z coordinate.
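
A minimal numpy sketch of this BEV encoding, assuming a 608x608 grid over a 50 m x 50 m area in front of the vehicle (the grid size and boundaries follow SFA3D's KITTI defaults; the exact per-channel normalization in SFA3D differs slightly):

```python
import numpy as np

def make_bev_map(points, H=608, W=608, x_range=(0.0, 50.0),
                 y_range=(-25.0, 25.0), z_range=(-2.73, 1.27)):
    """Encode an (N, 4) point cloud [x, y, z, intensity] into a
    3-channel BEV map: height, intensity, density."""
    x, y, z, r = points.T
    # Keep only points inside the region of interest.
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, r = x[keep], y[keep], z[keep], r[keep]

    # Discretize metric coordinates to grid cells.
    xi = (H * (x - x_range[0]) / (x_range[1] - x_range[0])).astype(np.int64)
    yi = (W * (y - y_range[0]) / (y_range[1] - y_range[0])).astype(np.int64)

    bev = np.zeros((3, H, W), dtype=np.float32)
    # Height channel: highest normalized z per cell.
    np.maximum.at(bev[0], (xi, yi), (z - z_range[0]) / (z_range[1] - z_range[0]))
    # Intensity channel: strongest return per cell (SFA3D instead takes the
    # intensity of the topmost point; simplified here).
    np.maximum.at(bev[1], (xi, yi), r)
    # Density channel: log-normalized point count per cell.
    counts = np.zeros((H, W), dtype=np.float32)
    np.add.at(counts, (xi, yi), 1.0)
    bev[2] = np.minimum(1.0, np.log(counts + 1.0) / np.log(64.0))
    return bev
```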

Figure 3: SFA3D predictions on held-out test set.


As for the loss functions, focal loss is used for the main center heatmap and l1 loss for the heading angle (yaw), while balanced l1 loss is employed for the z coordinate and the three dimensions (height, width, and length). We trained the model for a total of 300 epochs, setting equal weights for the aforementioned loss components and using a cosine LR scheduler with an initial learning rate of 0.001 and a batch size of 32 (on two RTX 2080Ti). Refer to the following wandb panels for results with SFA3D experiments on all towns and Town01, respectively.
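
In PyTorch terms, the objective and schedule look roughly like the following sketch (the head names and dict layout are hypothetical, `model` is assumed to be the SFA3D network, and the center-offset l1 term is included for completeness):

```python
import math
import torch
import torch.nn.functional as F

def focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """CenterNet-style penalty-reduced focal loss for the center heatmap;
    `pred` is assumed to be sigmoid output in (0, 1)."""
    pos = gt.eq(1.0).float()
    pred = pred.clamp(1e-6, 1.0 - 1e-6)
    pos_loss = -((1.0 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = -((1.0 - gt) ** beta) * (pred ** alpha) \
        * torch.log(1.0 - pred) * (1.0 - pos)
    return (pos_loss.sum() + neg_loss.sum()) / pos.sum().clamp(min=1.0)

def balanced_l1_loss(pred, gt, alpha=0.5, gamma=1.5):
    """Balanced l1 loss (Libra R-CNN), used for z and the box dimensions."""
    diff = torch.abs(pred - gt)
    b = math.exp(gamma / alpha) - 1.0
    return torch.where(
        diff < 1.0,
        alpha / b * (b * diff + 1.0) * torch.log(b * diff + 1.0) - alpha * diff,
        gamma * diff + gamma / b - alpha).mean()

def total_loss(pred, target):
    """Equal-weight sum of the loss components."""
    return (focal_loss(pred['heatmap'], target['heatmap'])   # main center
            + F.l1_loss(pred['offset'], target['offset'])    # center offset
            + F.l1_loss(pred['yaw'], target['yaw'])          # heading angle
            + balanced_l1_loss(pred['z'], target['z'])       # z coordinate
            + balanced_l1_loss(pred['dim'], target['dim']))  # h, w, l

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Cosine learning-rate decay over the full 300-epoch run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
```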

Optimising with TensorRT

TensorRT enables developers to optimize inference by leveraging CUDA libraries. It supports both INT8 and FP16 post-training quantization, which greatly reduces application latency, a requirement for many real-time services as well as autonomous and embedded applications.

As a first step, we convert the SFA3D PyTorch model to ONNX and use the ONNX parser to convert the ONNX model to TensorRT. We could also bypass the parser and convert directly from PyTorch to TensorRT, but doing so would require rewriting the SFA3D network with TensorRT's network-definition API, which would be time-intensive for a negligible speed benefit, although it could pay off on an embedded device like a Jetson Nano.
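
A minimal sketch of this flow with the TensorRT 8 Python API (the input shape assumes SFA3D's default 3-channel 608x608 BEV map, and `model` is the trained network on the GPU):

```python
import torch
import tensorrt as trt

# Step 1: export the trained PyTorch model to ONNX.
dummy_bev = torch.randn(1, 3, 608, 608, device='cuda')
torch.onnx.export(model, dummy_bev, 'sfa3d.onnx',
                  opset_version=11, input_names=['bev_map'])

# Step 2: parse the ONNX file and build a TensorRT engine with FP16 enabled.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open('sfa3d.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16 post-training quantization
engine = builder.build_serialized_network(network, config)
with open('sfa3d.engine', 'wb') as f:
    f.write(engine)
```

The bundled trtexec utility achieves the same from the command line, e.g. trtexec --onnx=sfa3d.onnx --fp16 --saveEngine=sfa3d.engine.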

In addition, we examined benchmarks across multiple frameworks, such as TVM and ONNX Runtime, to ensure that TensorRT is the best-performing option. From the above results, it is clear that TensorRT helps obtain higher throughput on the same hardware, and quantization to FP16 boosts performance even further. On an RTX 2080Ti, TensorRT may be the most efficient solution for SFA3D, but another framework, such as Apache TVM, might perform better on a different device with the same or another network; results will vary with the hardware.
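
For reference, the PyTorch baseline in such comparisons can be measured with a simple timing harness like this hypothetical helper (TensorRT and TVM engines would be timed through their own execution APIs):

```python
import time
import torch

@torch.no_grad()
def mean_latency_ms(model, shape=(1, 3, 608, 608), runs=100, warmup=20):
    """Average GPU inference latency of a PyTorch model, in milliseconds."""
    x = torch.randn(*shape, device='cuda')
    for _ in range(warmup):       # warm up CUDA kernels / cuDNN autotuner
        model(x)
    torch.cuda.synchronize()      # make sure warm-up work has finished
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()      # wait for all queued kernels to complete
    return (time.perf_counter() - start) / runs * 1e3
```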

Testing

To make it easier to compare the model's predictions with CARLA's ground truth, we integrated the model into Carlafox and made its predictions available in a separate Foxglove image panel. For more details on the Carlafox visualizer, please refer to this dedicated blog post.

Outlook

Numerous open-source resources paved the way for us to accomplish this work. In the future, we plan to fine-tune the trained models with the official KITTI dataset. Because of the expense of acquiring real-world data, the use of synthetic data for training machine learning models has grown in popularity in recent years. This is especially true for autonomous driving, given the rigorous requirement of generalizing to a wide range of driving conditions, so we hope our findings help others in research and development.

If you have questions or ideas on how to leverage synthetic data for 3D perception, join us on our Gitter #lounge channel or leave a comment in the comment section.
