What You See Is What You Get:
Exploiting Visibility for 3D Object Detection

CVPR 2020

Peiyun Hu1 Jason Ziglar2 David Held1 Deva Ramanan1,2
1Robotics Institute, Carnegie Mellon University
2Argo AI

What is a good representation for 3D sensor data? Here is a bird's-eye-view LiDAR scene with highlighted regions that may contain an object. Many contemporary deep networks process 3D point clouds, a representation that makes it hard to distinguish the two regions (left). But depth sensors provide more than 3D points: they also provide estimates of freespace between the sensor and each measured 3D point. We visualize freespace by raycasting (right), where green is free and white is unknown. In this paper, we introduce deep 3D networks that leverage freespace to significantly improve 3D object detection accuracy.
PDF / BibTeX
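The teaser describes recovering freespace by casting rays from the sensor through each LiDAR return. As a rough illustration only (not the paper's implementation), the sketch below rasterizes a bird's-eye-view freespace map by marching from the sensor origin to every return; the grid size, cell resolution, and the bev_freespace helper are all hypothetical choices.

```python
import numpy as np

def bev_freespace(points_xy, grid_size=200, cell=0.5, origin=(100, 100), steps=200):
    """Rasterize a bird's-eye-view freespace map from LiDAR returns.

    Cell values: 0 = unknown, 1 = free, 2 = occupied. For every return we walk
    the ray from the sensor origin to the hit cell, marking traversed cells free.
    """
    grid = np.zeros((grid_size, grid_size), dtype=np.uint8)  # all unknown
    for x, y in points_xy:
        # cell index of the return (x, y are in meters, sensor at `origin`)
        ix, iy = int(origin[0] + x / cell), int(origin[1] + y / cell)
        if not (0 <= ix < grid_size and 0 <= iy < grid_size):
            continue
        # sample the ray from the sensor cell toward the return
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            fx = int(origin[0] + t * (ix - origin[0]))
            fy = int(origin[1] + t * (iy - origin[1]))
            if grid[fx, fy] != 2:       # never overwrite an occupied cell
                grid[fx, fy] = 1        # space along the ray is observed free
        grid[ix, iy] = 2                # the return itself is occupied
    return grid

# Example: two returns produce two free rays ending in occupied cells;
# everything off those rays stays unknown, as in the teaser figure.
freespace = bev_freespace(np.array([[10.0, 3.0], [-5.0, 12.0]]))
```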

Abstract

Recent advances in 3D sensing have created unique challenges for computer vision. One fundamental challenge is finding a good representation for 3D sensor data. Most popular representations (such as PointNet) were proposed in the context of processing truly 3D data (e.g., points sampled from mesh models), ignoring the fact that 3D sensed data such as a LiDAR sweep is in fact 2.5D. We argue that representing 2.5D data as collections of (x, y, z) points fundamentally destroys hidden information about freespace. In this paper, we demonstrate that such knowledge can be efficiently recovered through 3D raycasting and readily incorporated into batch-based gradient learning. We describe a simple approach to augmenting voxel-based networks with visibility: we add a voxelized visibility map as an additional input stream. In addition, we show that visibility can be combined with two crucial modifications common to state-of-the-art 3D detectors: synthetic data augmentation of virtual objects and temporal aggregation of LiDAR sweeps over multiple time frames. On the NuScenes 3D detection benchmark, we show that, by adding an additional stream for visibility input, we can significantly improve the overall detection accuracy of a state-of-the-art 3D detector.
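To make the "additional input stream" idea concrete, here is a minimal, hypothetical PyTorch sketch of early fusion: a raycast visibility volume (collapsed into BEV channels) is concatenated with BEV features from the voxelized point cloud before a convolution. The channel counts, the VisibilityFusion module, and the per-height-slice encoding are assumptions for illustration, not the released architecture.

```python
import torch
import torch.nn as nn

class VisibilityFusion(nn.Module):
    """Toy two-stream fusion: concatenate a BEV feature map computed from the
    point cloud with a height-collapsed visibility volume, then convolve."""

    def __init__(self, point_channels=64, vis_channels=8, out_channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(point_channels + vis_channels, out_channels,
                              kernel_size=3, padding=1)

    def forward(self, point_feats, visibility):
        # point_feats: (B, C_p, H, W) BEV features from the voxelized sweep
        # visibility:  (B, C_v, H, W) raycast visibility, e.g. one channel per
        #              height slice encoding unknown / free / occupied
        x = torch.cat([point_feats, visibility], dim=1)  # early fusion
        return self.fuse(x)

# Example usage with random tensors standing in for real features.
net = VisibilityFusion()
fused = net(torch.randn(2, 64, 200, 200), torch.rand(2, 8, 200, 200))
```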

Qualitative results

How to interpret the following visualizations

Figure: legend for labels and coordinate frame.

Under different conditions

Figure: example detections under different conditions.

3D detection results with overlaid freespace


Code

Our code is available at https://github.com/peiyunh/wysiwyg.

Acknowledgments

This work was supported by the CMU Argo AI Center for Autonomous Vehicle Research.