Ulsan Univ. Repository Research Laboratory Engineering Research

A Study on Speech Recognition Using Dynamic Neural Networks (동적 신경회로망을 이용한 음성인식에 관한 연구)

Alternative Title: A Study on the Speech Recognition Using Dynamic Neural Network

Abstract: A speech recognition system based on recurrent artificial neural networks was constructed and its behavior analyzed. Two network variants were tested: the partially connected recurrent (PCR) network and the partially connected Elman (PCE) network. The study focused on how the networks respond to time-varying input sequences. The speech samples were the three Korean consonants "ㄱ" (G), "ㄷ" (D), and "ㅂ" (B), taken from a variety of vowel-consonant-vowel (VCV) contexts. Both networks recognized 100% of the training samples. On untrained test samples, the PCR network achieved a recognition rate of 62.2% and the PCE network 73.3%. The relatively low rates are attributed mainly to the highly transient nature of the speech samples used. Directions for an improved system are proposed based on an analysis of the experimental results.
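For readers unfamiliar with Elman-type networks, the following is a minimal sketch of how such a network might process a time-varying sequence of feature frames and produce a score for each of the three consonant classes. It is not the authors' implementation. The record does not specify the layer sizes, the input features, or which recurrent connections are kept in the "partially connected" variants, so the random connection mask, the `keep` fraction, the tanh and softmax nonlinearities, and every name below (`make_elman`, `classify`, `LABELS`) are assumptions made for illustration. The weights are random and untrained.

```python
import math
import random

LABELS = ["G", "D", "B"]  # the three consonant classes named in the abstract


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def make_elman(n_in, n_hid, n_out, keep=0.5, seed=0):
    """Build an untrained Elman-style network with a masked context matrix.

    ASSUMPTION: the mask is a random pattern keeping roughly a `keep`
    fraction of context-to-hidden connections; the connectivity actually
    used in the study is not given in this record.
    """
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    mask = [[1.0 if rng.random() < keep else 0.0 for _ in range(n_hid)]
            for _ in range(n_hid)]
    return {
        "W_in": rand_matrix(n_hid, n_in),    # input -> hidden
        "W_ctx": rand_matrix(n_hid, n_hid),  # context (previous hidden) -> hidden
        "mask": mask,                        # 1 = connection kept, 0 = removed
        "W_out": rand_matrix(n_out, n_hid),  # hidden -> output
    }


def classify(net, frames):
    """Feed frames one at a time and return class probabilities at the end."""
    n_hid = len(net["W_ctx"])
    context = [0.0] * n_hid  # context units start at zero
    # Apply the mask so removed connections contribute nothing.
    W_ctx = [[w * m for w, m in zip(w_row, m_row)]
             for w_row, m_row in zip(net["W_ctx"], net["mask"])]
    for x in frames:
        from_input = matvec(net["W_in"], x)
        from_context = matvec(W_ctx, context)
        # The new hidden state becomes the context for the next frame.
        context = [math.tanh(a + r) for a, r in zip(from_input, from_context)]
    return softmax(matvec(net["W_out"], context))


if __name__ == "__main__":
    net = make_elman(n_in=4, n_hid=5, n_out=len(LABELS))
    frames = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]  # made-up feature frames
    probs = classify(net, frames)
    print(dict(zip(LABELS, probs)))
```

Because the context units carry the previous hidden state forward, the output after the last frame depends on the whole sequence rather than on the final frame alone, which is the property that makes this family of networks a candidate for transient sounds such as VCV consonants. Setting `keep=0.0` removes every recurrent connection and reduces the sketch to a frame-by-frame feedforward classifier.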