Ph.D. thesis, Computer Engineering & Information Technology, General Graduate School, University of Ulsan

LiDAR-Camera-Based High-Performance 3-D Object Detection for Autonomous Driving

Abstract: With the rapid development of autonomous vehicles, three-dimensional (3-D) object detection, which aims to estimate the size and precise location of objects in the real world, has become increasingly important. A typical intelligent vehicle is currently equipped with at least one LiDAR, one radar, and one RGB camera. Although radar is widely used in industry, few researchers use it to validate new algorithms; this work therefore focuses on LiDAR and camera sensors for 3-D object detection. The LiDAR collects 3-D data of the surroundings, referred to as a point cloud, and the camera captures a high-resolution RGB image. The two devices provide two important and different types of data. However, it is non-trivial to extract and fuse point-cloud and RGB-image features efficiently and quickly enough for high-performance 3-D object detection.
This thesis focuses on detecting 3-D objects with deep learning methods. First, we review related work on LiDAR-camera-based 3-D object detection (Chapter 2). Second, we use three attention mechanisms to enhance global and local representative features (Chapter 3). Third, we propose fusing LiDAR and camera features at an early stage for 3-D object detection (Chapter 4).
In Chapter 2, we review related work: network architectures, object detection frameworks, fusion methods for multi-sensor 3-D object detection, and the relevant datasets and metrics.
In Chapter 3, this thesis presents TAO3D, a novel one-stage 3-D object detection framework based on three attention mechanisms, which takes a raw point cloud and an RGB image as inputs. The three attention mechanisms are used to obtain discriminative features. First, the height attention (HA) mechanism is introduced as an auxiliary attention module before the RGB image is fed into the network. Second, the global feature attention (GFA) mechanism models long-range dependencies in the channel and spatial dimensions simultaneously during feature extraction. Finally, the region of interest attention (RA) mechanism weights the RGB-image ROIs and the bird's-eye-view (BEV) ROIs using two learnable parameters.
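The abstract states only that RA weights the image ROIs and BEV ROIs with two learnable parameters. The sketch below is a minimal illustration under stated assumptions: each ROI is already pooled into a fixed-length feature vector, the two weighted features are combined by element-wise sum, and the two parameters are plain scalars. The names `roi_attention_fuse`, `sgd_step`, `alpha`, and `beta` are hypothetical; `sgd_step` exists only to show why the weights count as learnable (they receive gradients like any other parameter).

```python
from typing import List, Sequence, Tuple


def roi_attention_fuse(img_roi: Sequence[float], bev_roi: Sequence[float],
                       alpha: float, beta: float) -> List[float]:
    """Fuse pooled ROI features as alpha * img_roi + beta * bev_roi.

    Assumption: fusion by weighted element-wise sum of equal-length vectors.
    """
    if len(img_roi) != len(bev_roi):
        raise ValueError("ROI features must have the same length")
    return [alpha * a + beta * b for a, b in zip(img_roi, bev_roi)]


def sgd_step(img_roi: Sequence[float], bev_roi: Sequence[float],
             target: Sequence[float], alpha: float, beta: float,
             lr: float) -> Tuple[float, float]:
    """One gradient-descent update of (alpha, beta) for L = 0.5 * ||fused - target||^2."""
    fused = roi_attention_fuse(img_roi, bev_roi, alpha, beta)
    residual = [f - t for f, t in zip(fused, target)]
    # dL/dalpha = sum(residual * img_roi); dL/dbeta = sum(residual * bev_roi)
    grad_alpha = sum(r * a for r, a in zip(residual, img_roi))
    grad_beta = sum(r * b for r, b in zip(residual, bev_roi))
    return alpha - lr * grad_alpha, beta - lr * grad_beta


if __name__ == "__main__":
    print(roi_attention_fuse([1.0, 2.0], [3.0, 4.0], alpha=0.5, beta=0.25))
    print(sgd_step([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], 1.0, 1.0, lr=0.5))
```

In a real network the two scalars would live inside the model and be updated by the framework's optimizer; a softmax over the two weights is one possible variant if they should sum to one, though the abstract does not say whether TAO3D normalizes them.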
In Chapter 4, this thesis presents an early-fusion method that exploits both LiDAR and camera data for fast 3-D object detection with a single backbone, achieving a good balance between accuracy and efficiency. Specifically, it proposes a novel point-wise fusion strategy between point clouds and RGB images. The method first extracts point-wise features directly from the raw RGB image, guided by the raw point cloud. It then fuses the image and point-cloud point-wise features and feeds them into a 3-D neural network. Because the architecture uses only one backbone for feature extraction, the model is much faster than state-of-the-art LiDAR-camera fusion methods.
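One way to picture "extracting point-wise features from the raw RGB image based on the raw point cloud" is to project each LiDAR point into the image and attach the pixel colour beneath it to that point's own features. The sketch below assumes the points are already expressed in the camera frame (for KITTI, this would mean applying the Velodyne-to-camera and rectification transforms beforehand), uses a 3x4 projection matrix, samples the nearest pixel, and fuses by concatenation. All names (`project_point`, `pointwise_fuse`) and data layouts are hypothetical and not taken from the thesis.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical data layouts, for illustration only:
Point = Tuple[float, float, float, float]               # (x, y, z, reflectance), camera frame
Image = Sequence[Sequence[Tuple[float, float, float]]]  # image[row][col] -> (r, g, b)
Matrix3x4 = Sequence[Sequence[float]]                   # 3x4 camera projection matrix


def project_point(p: Point, proj: Matrix3x4) -> Optional[Tuple[float, float]]:
    """Project a 3-D point to pixel coordinates (u, v); None if it lies behind the camera."""
    x, y, z, _ = p
    homo = (x, y, z, 1.0)
    u_h, v_h, w = (sum(m * h for m, h in zip(row, homo)) for row in proj)
    if w <= 0:
        return None
    return u_h / w, v_h / w


def pointwise_fuse(points: Sequence[Point], image: Image,
                   proj: Matrix3x4) -> List[List[float]]:
    """Concatenate each point's LiDAR features with the RGB value under its projection.

    Points behind the camera or outside the image are dropped, so every
    returned row is [x, y, z, reflectance, r, g, b].
    """
    height, width = len(image), len(image[0])
    fused = []
    for p in points:
        uv = project_point(p, proj)
        if uv is None:
            continue
        col, row = round(uv[0]), round(uv[1])  # nearest-pixel sampling
        if 0 <= row < height and 0 <= col < width:
            fused.append(list(p) + list(image[row][col]))
    return fused


if __name__ == "__main__":
    proj = [[1.0, 0.0, 1.0, 0.0],   # toy pinhole camera: focal length 1, principal point (1, 1)
            [0.0, 1.0, 1.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]
    image = [[(10.0 * r + c, 0.0, 0.0) for c in range(3)] for r in range(3)]
    points = [(0.0, 0.0, 1.0, 0.5), (0.0, 0.0, -1.0, 0.2), (10.0, 0.0, 1.0, 0.9)]
    print(pointwise_fuse(points, image, proj))  # [[0.0, 0.0, 1.0, 0.5, 11.0, 0.0, 0.0]]
```

Because the fused rows are ordinary per-point feature vectors, a single point-cloud backbone can consume them directly, which is the property the abstract credits for the speed advantage over two-backbone fusion designs. Bilinear sampling would be a smoother alternative to the nearest-pixel lookup used here.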
The proposed methods achieve notable gains in both accuracy and speed on the KITTI 3-D object detection benchmark suite.