Academic paper, Journal of Korean Institute of Information Technology (한국정보기술학회논문지), published June 2020. KCI citations: 1

Sound Event Detection Based on CRNN Using Derivative Features

곽진열 (Keimyung University); 정용주 (Keimyung University)

Vol. 18, No. 6, pp. 89-96

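Abstract (translated from the Korean original)

In this paper, derivative features were used for sound event detection based on deep neural networks. For every frame of the audio signal, log-mel-filterbank values were extracted through frequency analysis. First- and second-order derivative features were then extracted from them using the correlation between frames. The baseline detector was a CRNN (Convolutional Recurrent Neural Network), the model most widely used in recent sound event detection. The 64-dimensional log-mel-filterbank values and their first and second derivatives were arranged as independent input feature maps. Global average pooling was added at the output of the CRNN. This lets the model train on strongly labeled (strong label) audio data as well as on weakly labeled (weak label) and unlabeled (un-label) data. The derivative features gave consistent performance improvements across a variety of training setups. In experiments on the DCASE Challenge 2018/2019 audio data, the proposed derivative features yielded a relative F-score improvement of up to 16.9%.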
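The abstract says the first- and second-order derivatives are obtained from the correlation between neighboring frames, but it does not give the formula. The sketch below makes three assumptions, none of which the abstract specifies:

* It uses the common regression (HTK-style) delta formula.
* It uses a half-window of N = 2.
* It pads the edges with the nearest frame.

The function names `delta` and `stack_feature_maps` are hypothetical. Plain Python lists stand in for a T x 64 log-mel array.

```python
from typing import List

# A feature matrix: one row per frame, one column per mel band (e.g. T x 64).
Matrix = List[List[float]]


def delta(features: Matrix, width: int = 2) -> Matrix:
    """Regression-based temporal derivative (HTK-style delta).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)

    Frames outside the signal are replaced by the nearest edge frame.
    N = `width` = 2 is an assumed default, not a value from the paper.
    """
    num_frames = len(features)
    if num_frames == 0:
        return []
    num_bands = len(features[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    result: Matrix = []
    for t in range(num_frames):
        row = []
        for b in range(num_bands):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = features[min(t + n, num_frames - 1)][b]  # c_{t+n}, clamped at the end
                behind = features[max(t - n, 0)][b]              # c_{t-n}, clamped at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        result.append(row)
    return result


def stack_feature_maps(log_mel: Matrix, width: int = 2) -> List[Matrix]:
    """Return [static, 1st derivative, 2nd derivative] as three separate maps."""
    first = delta(log_mel, width)
    second = delta(first, width)  # 2nd derivative taken as the delta of the delta
    return [log_mel, first, second]
```

Each returned matrix keeps the T x 64 shape, so the three can be fed to the convolutional front end as three channels. This is one reading of "independent input feature maps". Computing the second derivative as the delta of the delta is likewise an assumption.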

Abstract

In this paper, we used derivative features for sound event detection based on deep neural networks. For each frame of the audio signal, we extracted log-mel-filterbank values by frequency analysis. We then computed their first and second derivatives by exploiting the correlation between neighboring frames. The baseline detector was a CRNN, currently the most popular model for audio event detection. The 64-dimensional log-mel-filterbank outputs and their first and second derivatives were arranged as independent input feature maps. A global average pooling layer was added at the output of the CRNN so that weakly labeled and unlabeled audio data, as well as strongly labeled data, could be used in training. Across various training setups, the derivative features gave consistent performance improvements. In experiments on the DCASE Challenge 2018/2019 audio data, we obtained a relative F-score improvement of up to 16.9%.
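Global average pooling collapses the CRNN's frame-level event probabilities into one clip-level score per class. That is what lets clips with only weak (clip-level) labels contribute a training loss.

The sketch below is not from the paper:

* It assumes sigmoid outputs in [0, 1].
* It assumes binary cross-entropy for both the weak (pooled) loss and the strong (frame-level) loss.
* The helper names `global_average_pool`, `weak_label_loss` and `strong_label_loss` are hypothetical.

The abstract does not describe how the unlabeled clips are used, so that part is not sketched.

```python
from math import log
from typing import List, Sequence

# frame_probs[t][c] = assumed sigmoid probability that class c is active at frame t.


def global_average_pool(frame_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average over the time axis: (T x C) frame scores -> (C) clip scores."""
    num_frames = len(frame_probs)
    num_classes = len(frame_probs[0])
    return [
        sum(frame[c] for frame in frame_probs) / num_frames
        for c in range(num_classes)
    ]


def binary_cross_entropy(pred: Sequence[float], target: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean BCE over classes; predictions are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * log(p) + (1.0 - y) * log(1.0 - p)
    return total / len(pred)


def weak_label_loss(frame_probs: Sequence[Sequence[float]],
                    clip_labels: Sequence[float]) -> float:
    """Weakly labeled clip: compare the pooled clip scores with clip-level tags."""
    return binary_cross_entropy(global_average_pool(frame_probs), clip_labels)


def strong_label_loss(frame_probs: Sequence[Sequence[float]],
                      frame_labels: Sequence[Sequence[float]]) -> float:
    """Strongly labeled clip: compare every frame with its frame-level targets."""
    losses = [binary_cross_entropy(p, y) for p, y in zip(frame_probs, frame_labels)]
    return sum(losses) / len(losses)
```

In this sketch, a clip tagged only as "class 1 occurs somewhere" is trained through `weak_label_loss`. Strongly labeled clips use `strong_label_loss` on every frame. Both losses share the same frame-level network outputs, so one model can learn from both kinds of data.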

Publisher:: Korean Institute of Information Technology (한국정보기술학회)
DOI:: http://dx.doi.org/10.14801/jkiit.2020.18.6.89
Classification:: Miscellaneous Engineering