Ulsan Univ. Repository Thesis General Graduate School Computer Engineering & Information Technology 2. Theses (Ph.D)

A High-efficiency Real-time Facial Emotion Recognizer Using Deep Learning

Abstract: Facial emotion recognition localizes human faces and predicts their expressions. It identifies the texture of facial elements during an interaction. Facial expressions are a form of nonverbal communication that conveys feelings through facial cues. Facial expression recognition is an active research field in human-robot interaction, where it plays an important role in supporting assistive robots. A real-time system is required to increase the robot's abilities and to prevent misunderstandings caused by dynamic personal activities. Moreover, practical applications demand real-time performance from computer vision techniques on low-cost computing devices. This thesis focuses on human facial emotion recognition in real-world scenarios: estimating the location of faces and quickly identifying their expressions. Network efficiency is not pursued at the expense of the prediction performance of each module. The thesis therefore proposes a high-performance, efficient deep learning architecture for detecting the face area and classifying emotion from a live video stream. Each module works separately and operates smoothly at real-time speed. The complete real-time facial emotion recognizer is a two-stage CNN-based architecture consisting of a face detector and a facial expression classifier. The proposed face detector plays an essential role in separating the face area from the background. It also avoids the prediction errors of a single-label classification system when an image contains more than one face. It uses several shallow convolutional layers that form a lightweight architecture suitable for real-time detection, without neglecting precision in localizing faces of varying sizes and poses.
The proposed face detector contains two main parts: a backbone that discriminates specific components and a multi-level detection module that estimates the locations of faces at multiple scales. It also uses several techniques to improve training performance, such as a balanced loss function and tuning of the parameter configuration. The facial expression classification module categorizes seven fundamental human emotions: neutral, fear, surprise, disgust, sad, happy, and anger. This module also emphasizes efficiency in computation and parameter count to support a lightweight, fast integrated system. The efficient facial expression framework uses a sequential attention network to enhance the backbone performance. It includes three modules: global attention, which highlights the global context of features, and channel attention and dimension attention, which concentrate on the relationships among local elements in the channel and spatial dimensions. In addition, it offers the Efficient Partial Transfer (EPT) module as an efficient extractor of facial features from an image. Augmentation with various facial poses increases the reliability and capability of recognizing non-frontal facial expressions, which supports the proposed system's performance and enables deployment in real-world scenarios. Experimental results for each module show satisfactory performance on each benchmark dataset and accuracy competitive with other methods. This is attributed to the proposed modules, which increase performance without adding redundant computation or parameters. The lightweight network can precisely learn specific and global feature characteristics across data variations. Additionally, system integration demonstrates that the emotion recognizer operates at real-time speed on CPU-based devices and an edge device.
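
The two-stage flow described in the abstract (detect faces first, then classify each face crop independently) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the names `recognize`, `detect_faces`, `classify_face`, and `crop`, the nested-list image representation, and the order of the `EMOTIONS` tuple are all assumptions made for illustration, and the detector and classifier are passed in as placeholder callables rather than the proposed CNN modules.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only.
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), end-exclusive
Image = List[List[int]]           # grayscale image as rows of pixel values

# The seven emotion classes named in the abstract; the index order is an assumption.
EMOTIONS = ("neutral", "fear", "surprise", "disgust", "sad", "happy", "anger")


def crop(image: Image, box: Box) -> Image:
    """Cut the face region out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def recognize(
    image: Image,
    detect_faces: Callable[[Image], Sequence[Box]],
    classify_face: Callable[[Image], Sequence[float]],
) -> List[Tuple[Box, str]]:
    """Stage 1 finds face boxes; stage 2 labels each crop separately.

    Classifying one crop at a time is what lets a single-label
    classifier handle images that contain several faces.
    """
    results = []
    for box in detect_faces(image):
        scores = classify_face(crop(image, box))
        best = max(range(len(scores)), key=lambda i: scores[i])
        results.append((box, EMOTIONS[best]))
    return results
```

In a live video setting, `recognize` would be called once per frame, with the two callables wrapping whichever detector and classifier models are deployed.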