Ulsan Univ. Repository Thesis General Graduate School Electricity Electronics & Computer Engineering 2. Theses (Ph.D)

Design and Implementation of an Automatic Person Indexing and Retrieval System Based on Overlay Text in News Interview Video Sequences

Abstract: With the advent of the digital age, consumers and professionals have created a vast amount of video data over the last few decades, and advances in data capture, storage, and communication technologies have made this data widely available to consumer and professional applications. This tremendous increase in the use of video data creates a need for effective methods to manage these multimedia resources by their content. In response, many researchers have worked to develop powerful indexing systems that ensure easy access to relevant information, as well as navigation and organization, within vast repositories of video data.
Recognizing the overlay text embedded in images and videos provides high-level semantic clues that greatly enhance automatic image and video indexing. This text gives a more concise and direct description of the video content. Overlay text therefore plays an important role in automated content analysis tasks such as scene understanding, indexing, browsing, and retrieval.
In particular, the overlay text in broadcast news video sequences conveys more meaningful information about the content than in any other type of video. The detection and recognition of overlay text have become an active topic in news video analysis, with applications such as identifying a person or place, the name and date of a newsworthy event, stock market figures, other news statistics, and news summaries.
This dissertation proposes a novel approach to extracting meaningful content information from broadcast news video sequences through the collaborative integration of image understanding and natural language processing. As a concrete example, we developed a person browser system that associates faces with overlaid name text in videos. Given news videos as a knowledge source, the system automatically extracts face and name-text associations as content information. The proposed framework consists of a text detection module, a face detection module, and a person indexing module.
In the preprocessing step, the proposed system creates sub-clips based on the beginning frame so that it focuses only on the frames with overlay text. The text detection module detects and extracts the overlay text, separates the name text line, and recognizes the text by optical character recognition (OCR). The face detection module extracts a face thumbnail that serves as the representative thumbnail of the interviewee. The person indexing module automatically generates the index metadata by named entity recognition (NER). Finally, the person indexing database is built automatically by combining the recognized text with the face thumbnail.
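The data flow described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the names `PersonRecord`, `make_sub_clips`, and `build_index` are hypothetical, the OCR/NER step and the face detector are replaced by plain callables, and the sample name and frame flags are invented for illustration. The sketch only shows how frames with overlay text might be grouped into sub-clips and how a recognized name might be paired with a face thumbnail frame.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PersonRecord:
    """One entry of the person index (hypothetical structure)."""
    name: str             # name text recognized from the overlay
    start_frame: int      # first frame of the sub-clip
    end_frame: int        # last frame of the sub-clip (inclusive)
    thumbnail_frame: int  # frame chosen for the face thumbnail


def make_sub_clips(has_overlay: List[bool]) -> List[Tuple[int, int]]:
    """Group consecutive frames flagged as containing overlay text into (start, end) runs."""
    clips: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(has_overlay):
        if flag and start is None:
            start = i                      # a new run of overlay frames begins
        elif not flag and start is not None:
            clips.append((start, i - 1))   # the run ended on the previous frame
            start = None
    if start is not None:                  # run continues to the last frame
        clips.append((start, len(has_overlay) - 1))
    return clips


def build_index(
    has_overlay: List[bool],
    read_name: Callable[[int], str],   # stand-in for text detection + OCR + NER
    find_face: Callable[[int], bool],  # stand-in for the face detector
) -> List[PersonRecord]:
    """Pair each sub-clip's recognized name with the first frame that shows a face."""
    records: List[PersonRecord] = []
    for start, end in make_sub_clips(has_overlay):
        name = read_name(start)
        if not name:
            continue                       # no person name recognized in this sub-clip
        face_frame = next((f for f in range(start, end + 1) if find_face(f)), None)
        if face_frame is None:
            continue                       # no face available for a thumbnail
        records.append(PersonRecord(name, start, end, face_frame))
    return records


# Invented sample data: two runs of overlay frames, only the first yields a name.
flags = [False, True, True, True, False, False, True, True]
names = {1: "Kim Minsu", 6: ""}
faces = {2, 3, 7}
index = build_index(flags, lambda f: names.get(f, ""), lambda f: f in faces)
print(index)
```

In this sketch a sub-clip is dropped when either the name or the face is missing, so every record in the index associates exactly one name with one thumbnail. That is one possible design choice, not something the abstract specifies.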
The successful results of person information extraction indicate that the proposed methodology, which integrates image understanding techniques with natural language processing, is headed in the right direction toward our goal of accessing the real content of multimedia information.