3D Object Detection with Cube R-CNN

Cube R-CNN is described in the paper "Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild" and is released in this repository.

Overview

A description of the model and its architecture is shown below.

Training Data

Cube R-CNN was trained on Omni3D, a large benchmark for 3D object detection in the wild.

Demo: Inference on Any Image

The model detects objects in 3D from a single image. It covers 50 distinct object categories, including car, truck, chair, table, cabinet, and books. To predict objects at the correct metric scale, the model assumes that the camera's focal length is known. Users can still provide any focal length, in which case the predictions are on a "relative" scale rather than a metric one.

For example, we can predict 3D objects in COCO images using a user-defined focal length of 4.0, as shown below.

The output above was produced by our demo:

python demo/demo.py \
--config cubercnn://omni3d/cubercnn_DLA34_FPN.yaml \
--input-folder "datasets/image_inputs" \
--threshold 0.25 --focal 4.0 --display \
MODEL.WEIGHTS cubercnn://omni3d/cubercnn_DLA34_FPN.pth \
OUTPUT_DIR output/demo
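A simple pinhole camera model shows why the focal length determines metric scale. The following is a minimal sketch under that assumption, not Cube R-CNN's actual depth prediction. An object of real height H at depth z spans h = f * H / z pixels, so a depth recovered from a known object size is proportional to the focal length f. The helper names and all numeric values (focal lengths, object heights, pixel heights) are hypothetical.

```python
"""Sketch: how an assumed focal length sets metric depth under a pinhole camera.

Assumption: pinhole camera, where an object of real height H (metres) at depth
z (metres) spans h = f * H / z pixels for a focal length f (pixels).
Illustration only; this is not Cube R-CNN's depth head.
"""


def depth_from_height(focal_px: float, real_height_m: float, pixel_height: float) -> float:
    """Solve the pinhole relation h = f * H / z for the depth z."""
    return focal_px * real_height_m / pixel_height


def depths(focal_px: float, objects: list[tuple[float, float]]) -> list[float]:
    """Depths for (real_height_m, pixel_height) pairs under a single focal length."""
    return [depth_from_height(focal_px, H, h) for H, h in objects]


# Hypothetical objects: a 1.5 m car spanning 100 px and a 0.9 m chair spanning 30 px.
objects = [(1.5, 100.0), (0.9, 30.0)]

true_f = 1000.0  # hypothetical true focal length, in pixels
user_f = 500.0   # hypothetical user-supplied focal length, half the true value

metric = depths(true_f, objects)    # [15.0, 30.0]
relative = depths(user_f, objects)  # [7.5, 15.0]

# Every depth is scaled by the same factor user_f / true_f,
# so the ratio between the two depths is unchanged.
print(metric, relative, relative[1] / relative[0])
```

In this sketch, supplying half the true focal length halves every depth, while the ratio between the two objects' depths stays at 2.0. This is one way to read the "relative" scale described earlier.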

Checkpoints

You can find model checkpoints in the original model zoo.

Intended Use and Limitations

Cube R-CNN is a data-driven method trained on an annotated dataset, Omni3D. The purpose of the project is to advance 3D computer vision and 3D object recognition. The dataset contains a pedestrian category, which we acknowledge as a potential issue if the model is used in unethical applications.

The limitations of our approach include erroneous predictions, especially for faraway objects, and mistakes in predicting rotation and depth. Our evaluation includes an analysis across object depths and sizes to better characterize performance.
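A simple pinhole camera model (h = f * H / z) suggests one reason why far-away objects are harder. The calculation below is illustrative and is not the analysis reported in our evaluation. A fixed one-pixel error in an object's projected height is a larger fraction of a small, distant object. To first order, the resulting depth error grows with the square of depth: |Δz| ≈ z² Δh / (f H). The function names, the 1 px error, and the focal length and object height used below are hypothetical.

```python
"""Sketch: why a fixed pixel error hurts far-away objects more (pinhole model).

Assumptions: pinhole camera with h = f * H / z, and a hypothetical error of
`pixel_error` px in the object's measured image height. Not the paper's protocol.
"""


def depth_error(focal_px: float, real_height_m: float, true_depth_m: float,
                pixel_error: float = 1.0) -> float:
    """Depth error when the projected height is under-measured by `pixel_error` px."""
    pixel_height = focal_px * real_height_m / true_depth_m  # h = f * H / z
    if pixel_height <= pixel_error:
        raise ValueError("object is too small in the image for this pixel error")
    noisy_depth = focal_px * real_height_m / (pixel_height - pixel_error)
    return noisy_depth - true_depth_m


def first_order_error(focal_px: float, real_height_m: float, true_depth_m: float,
                      pixel_error: float = 1.0) -> float:
    """Linearised error |dz/dh| * dh = z**2 * dh / (f * H)."""
    return true_depth_m ** 2 * pixel_error / (focal_px * real_height_m)


if __name__ == "__main__":
    f, H = 1000.0, 1.5  # hypothetical focal length (px) and a 1.5 m tall object
    for z in (10.0, 20.0, 40.0, 80.0):
        print(f"z = {z:5.1f} m  error = {depth_error(f, H, z):6.3f} m  "
              f"first-order = {first_order_error(f, H, z):6.3f} m")
```

In this model, doubling the depth roughly quadruples the depth error caused by the same 1 px mistake.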