Carlafox: Towards reliable open-source 3D perception

Vineet Suryan
April 05, 2023



Extracting precise 3D object information is one of the prime goals of comprehensive scene understanding. However, labeling errors are common in current open-source 3D perception datasets, and they can degrade the models trained on them. To tackle this issue, we used Carlafox to automatically generate a synthetic dataset for 3D perception whose labels come directly from the simulator and are therefore free of annotation errors.

Deep 3D object detectors can become confused during training because ground-truth 3D bounding box annotations are inherently ambiguous: occlusions, missing data, and manual annotation errors all lower detection accuracy. Existing methods largely overlook these issues and treat the labels as deterministic. A virtual simulation with known labels, by contrast, makes it possible to create very large datasets at almost no labeling cost. Prior research indicates that combining simulated and real data helps AI models become more accurate, and our results show that simulated data can significantly reduce the amount of training on real data required to reach satisfactory accuracy.

Key Takeaways

  • Simulated data is becoming more crucial than ever in autonomous driving applications, both for testing pre-trained models and for developing new models.
  • For neural network models to generalize to real-world applications, the underlying dataset must contain a variety of driving scenarios, and the simulated sensor readings must closely resemble those of real-world sensors.
  • Carlafox can export high-quality, synchronized LiDAR and camera data with object annotations, and it offers a configuration that accurately reflects a real-life sensor array.
  • We use Carlafox to generate a dataset of 10,000+ samples and train SFA3D, a fast open-source 3D object detection neural network, on it.
  • For testing, we integrate the model back into Carlafox and visualize its predictions against the ground-truth data from the simulator.

Data Collection

Carlafox, a web-based CARLA visualizer, substantially simplifies the arduous task of generating synthetic datasets for 3D object detection. We use Carlafox to set up sensor configurations, create diverse weather conditions, and generate data from different maps in the KITTI format. One advantage of the dataset is that it uses the open-source CARLA simulator to recreate the same LiDAR and camera configuration that was used to record the original KITTI data.

The objective is to offer a challenging dataset for assessing and improving approaches to complex vision tasks such as 3D object detection. In total, the dataset has 12,807 cars, 10,252 pedestrians, and 11,624 cyclists. It contains 2D and 3D bounding box annotations for the classes Car, Pedestrian, and Cyclist, together with LiDAR and camera sensor data and the corresponding sensor calibration matrices.
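Since the data is exported in the KITTI format, each object annotation is one whitespace-separated line of 15 fields. The following is a minimal sketch of how such a line could be read, assuming the standard KITTI object label layout. The names `KittiLabel` and `parse_kitti_label` are hypothetical helpers for illustration, and the sample values are illustrative rather than taken from the Carlafox dataset.

```python
from typing import NamedTuple, Tuple


class KittiLabel(NamedTuple):
    """One object annotation in the standard KITTI label layout."""
    obj_type: str                            # e.g. Car, Pedestrian, Cyclist
    truncated: float                         # 0.0 (fully visible) to 1.0
    occluded: int                            # occlusion level
    alpha: float                             # observation angle in radians
    bbox: Tuple[float, ...]                  # 2D box: left, top, right, bottom (pixels)
    dimensions: Tuple[float, ...]            # 3D size: height, width, length (metres)
    location: Tuple[float, ...]              # 3D centre x, y, z in camera coordinates
    rotation_y: float                        # yaw around the camera Y axis (radians)


def parse_kitti_label(line: str) -> KittiLabel:
    """Split a label line into typed fields (hypothetical helper)."""
    f = line.split()
    return KittiLabel(
        obj_type=f[0],
        truncated=float(f[1]),
        occluded=int(f[2]),
        alpha=float(f[3]),
        bbox=tuple(float(v) for v in f[4:8]),
        dimensions=tuple(float(v) for v in f[8:11]),
        location=tuple(float(v) for v in f[11:14]),
        rotation_y=float(f[14]),
    )


# Illustrative values only, not a sample from the generated dataset.
sample = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
label = parse_kitti_label(sample)
print(label.obj_type, label.dimensions, label.location)
```

Keeping the 2D box, 3D dimensions, and 3D location as separate tuples mirrors how the annotations are used later: the 2D box for image overlays and the 3D fields for LiDAR-based detectors.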

Figure 1: 3D bounding boxes projected on LiDAR point cloud.


Figure 2: 2D & 3D bounding boxes with occlusion and unique id.

Training 3D Perception Models on CARLA dataset

Due to its numerous applications across industries such as robotics and autonomous driving, 3D object detection has been gaining attention from both businesses and academia. LiDAR sensors are commonly used in robotics and autonomous vehicles to capture 3D scene data as sparse, irregular point clouds, which have been shown to provide helpful cues for 3D scene perception and comprehension.

We trained several LiDAR-based networks on the CARLA synthetic dataset, namely PointRCNN, PVRCNN, and SFA3D, as well as MVXNet, a multimodal (RGB + LiDAR) 3D object detection model. We fine-tuned only one of them, SFA3D, mainly because it is faster and uses less memory without much loss in performance. Any of the other models might have performed better had it been optimized and tuned beyond a baseline training run, as shown in the following panel.

A closer look into SFA3D

SFA3D (Super Fast and Accurate 3D Object Detection) operates on 3D LiDAR point clouds. The backbone of the detector is the ResNet-based Keypoint Feature Pyramid Network (KFPN), which was proposed in RTM3D.

The model takes a bird's-eye-view (BEV) map as input. The BEV map is encoded from the height, intensity, and density of the 3D LiDAR point cloud. As output, the model produces a heatmap for the main center, the center offset, the heading angle, the dimensions of the object, and the z coordinate.
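To make the input encoding more concrete, here is a simplified sketch of how a point cloud could be rasterized into height, intensity, and density channels. It is not the SFA3D implementation: the function name `make_bev_map`, the region of interest, the grid size, and the log normalization of the density channel are all assumptions chosen for illustration, and it is written in plain Python for readability where a real pipeline would vectorize the loop.

```python
import math


def make_bev_map(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                 z_range=(-2.73, 1.27), size=608):
    """Encode (x, y, z, intensity) points as three size x size BEV channels.

    All default ranges and the grid size are illustrative assumptions.
    """
    height = [[0.0] * size for _ in range(size)]
    intensity = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    x_res = (x_range[1] - x_range[0]) / size
    y_res = (y_range[1] - y_range[0]) / size
    z_span = z_range[1] - z_range[0]

    for x, y, z, refl in points:
        # Drop points outside the region of interest.
        if not (x_range[0] <= x < x_range[1]
                and y_range[0] <= y < y_range[1]
                and z_range[0] <= z < z_range[1]):
            continue
        # Map metric coordinates to a grid cell, clamped against rounding.
        row = min(size - 1, int((x - x_range[0]) / x_res))
        col = min(size - 1, int((y - y_range[0]) / y_res))
        # Height channel: tallest point in the cell, normalized to [0, 1].
        height[row][col] = max(height[row][col], (z - z_range[0]) / z_span)
        # Intensity channel: strongest return in the cell (a simplification).
        intensity[row][col] = max(intensity[row][col], refl)
        counts[row][col] += 1

    # Density channel: log-scaled point count per cell, capped at 1 (assumed).
    density = [[min(1.0, math.log(c + 1) / math.log(64)) for c in row]
               for row in counts]
    return height, intensity, density


# Tiny example: two points share a cell, the third lies outside the range.
cloud = [(10.0, 0.0, 0.0, 0.5), (10.0, 0.0, 1.0, 0.2), (100.0, 0.0, 0.0, 1.0)]
h, i, d = make_bev_map(cloud, z_range=(-2.0, 2.0), size=4)
print(h[0][2], i[0][2], d[0][2])
```

Stacking the three channels gives an image-like tensor, which is why a 2D backbone such as the ResNet-based KFPN can process LiDAR data directly.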

Figure 3: SFA3D predictions on held-out test set.

As for the loss functions, focal loss is used for the main center heatmap and L1 loss for the heading angle (yaw). Balanced L1 loss is used for the z coordinate and the three dimensions (height, width, and length). We trained the model for a total of 300 epochs with equal weights for these loss components, using a cosine LR scheduler with an initial learning rate of 0.001 and a batch size of 32 (on two RTX 2080 Ti GPUs). Refer to the following wandb panels for results with SFA3D experiments on all towns and Town01, respectively.
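As a rough illustration of the learning rate schedule, the sketch below computes a plain cosine annealing curve for 300 epochs starting at 0.001. The exact scheduler variant is not stated here, so the absence of warmup and the minimum learning rate of 0 are assumptions, and `cosine_lr` is a hypothetical helper name.

```python
import math


def cosine_lr(epoch, total_epochs=300, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate for a given epoch.

    Assumes no warmup and decay to min_lr at the final epoch.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at a few points of the 300-epoch run.
for epoch in (0, 75, 150, 225, 300):
    print(epoch, cosine_lr(epoch))
```

The curve starts at the initial learning rate, passes through half of it at the midpoint, and flattens out towards the end, which keeps late updates small.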

Optimising with TensorRT

TensorRT enables developers to optimize inference by leveraging CUDA libraries. It supports both INT8 and FP16 post-training quantization, which greatly reduces application latency, a requirement for many real-time services as well as autonomous and embedded applications.

As a first step, we convert the SFA3D PyTorch model to ONNX and then use the ONNX parser to convert the ONNX model to TensorRT. We could also bypass the parser and convert directly from PyTorch to TensorRT, but doing so would require us to rewrite the SFA3D network with the TensorRT network-definition API. That would be time-intensive and yield a negligible speed benefit, although it could be more efficient on an embedded device such as a Jetson Nano.
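The two-step conversion can be sketched as follows. This is one possible way to do it, not necessarily the exact tooling used for the results in this post: the export relies on `torch.onnx.export`, and the engine is built with TensorRT's `trtexec` command-line tool, which uses the ONNX parser internally. The helper names, the input shape `(1, 3, 608, 608)`, the input name `bev_map`, the opset version, and the file names are all assumptions for illustration.

```python
def export_to_onnx(model, onnx_path, input_shape=(1, 3, 608, 608)):
    """Trace a PyTorch model with a dummy BEV input and write an ONNX file.

    Assumes the model's forward pass returns tensors (or a tuple of tensors).
    """
    import torch  # imported lazily so the helper below works without PyTorch

    model.eval()
    dummy_input = torch.randn(*input_shape)  # assumed BEV input shape
    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        input_names=["bev_map"],  # hypothetical input name
        opset_version=11,         # assumed opset
    )


def trtexec_command(onnx_path, engine_path, fp16=True):
    """Build the trtexec argument list that turns an ONNX file into an engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # enable FP16 precision
    return cmd


# Print the command; it could be run with subprocess.run(cmd, check=True).
print(" ".join(trtexec_command("sfa3d.onnx", "sfa3d.engine")))
```

Going through ONNX keeps the PyTorch model definition untouched, which is the main reason this route takes far less effort than rewriting the network with the network-definition API.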

In addition, we benchmarked other frameworks, such as TVM and ONNX, to check whether TensorRT is the best-performing option. The results above show that TensorRT achieves higher throughput on the same hardware, and quantization to FP16 boosts performance even more. On an RTX 2080 Ti, TensorRT may be the most efficient solution for SFA3D, but another framework, such as Apache TVM, could perform better on a different device with the same or another network, so results may vary depending on the hardware.
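A comparison like this boils down to timing the same input through each backend. The sketch below shows a generic way to measure latency and throughput for any inference callable. It is not the benchmark harness used for the results in this post, and `measure_throughput`, the warmup count, and the iteration count are assumptions.

```python
import time


def measure_throughput(infer, sample, warmup=10, iters=100):
    """Return mean latency (ms) and throughput (frames/s) of infer(sample).

    For GPU backends, infer should block until the result is ready,
    otherwise asynchronous execution makes the timing too optimistic.
    """
    # Warm-up runs let caches and lazy initialization settle.
    for _ in range(warmup):
        infer(sample)

    start = time.perf_counter()
    for _ in range(iters):
        infer(sample)
    elapsed = time.perf_counter() - start

    return {"latency_ms": 1000.0 * elapsed / iters, "fps": iters / elapsed}


# Stand-in workload; a real comparison would pass each framework's
# inference function with the same BEV input.
stats = measure_throughput(lambda x: sum(x), list(range(10000)))
print(stats)
```

Running every backend through the same function with the same input on the same device keeps the comparison fair, which matters because the ranking can change from one device to another.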


To make it easier to compare the model's predictions with CARLA's ground truth, we integrated the model into Carlafox and made its predictions available in a separate Foxglove image panel. For more details on the Carlafox visualizer, please refer to this dedicated blog post.


Numerous open-source resources paved the way for this work. In the future, we plan to fine-tune the trained models on the official KITTI dataset. Because acquiring real-world data is expensive, the use of synthetic data for training machine learning models has grown in popularity in recent years. This is especially true for autonomous driving, where models must generalize to a wide range of driving conditions, so we hope our findings help others in research and development.

If you have questions or ideas on how to leverage synthetic data for 3D perception, join us on our Gitter #lounge channel or leave a comment in the comment section.

Collabora Ltd © 2005-2024. All rights reserved.