ENVIRONMENTAL SOUND RECOGNITION USING SPECTROGRAM IMAGE FEATURES
Amogh Hiremath
Abstract: Most prior research on audio recognition has focused on speech and music. Only in recent years has Environmental Sound Recognition gained importance, with dozens of emerging works. For audio classification, many previous efforts use acoustic features such as Mel-frequency Cepstral Coefficients (MFCCs), Zero Crossing Rate (ZCR), Root Mean Square Energy (RMSE), spectral centroid, spectral bandwidth and other frequency-domain features derived from the spectrogram of the audio. In this paper, we take a slightly different approach to feature extraction: we summarize short audio clips of about five seconds by segmenting out the most prominent part of the audio signal. We then compute a spectrogram image of the segmented audio and divide it into sub-bands along the frequency axis. For each sub-band, we extract first order statistics and Gray Level Co-occurrence Matrix (GLCM) features. In the classification stage, we combine two Support Vector Machine (SVM) classifiers to obtain the final result. The first classifier uses the first order statistics and GLCM features. The second classifier uses acoustic features such as MFCCs, ZCR, RMSE, spectral centroid, spectral bandwidth and other frequency-domain features derived from the spectrogram of the audio. We evaluate our approach on two publicly available datasets, ESC-10 and Freiburg-106, using five-fold cross validation for ESC-10 and ten-fold cross validation for Freiburg-106. Experiments show that the proposed approach outperforms the baselines and provides results comparable to the state of the art.
Keywords: Environmental Sound Classification, First Order Statistics, GLCM, Spectrogram, SVM
DOI: https://doi.org/10.15623/ijret.2017.0610015
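The sub-band feature extraction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The abstract does not give the number of sub-bands, the number of gray levels, the GLCM offset, or the exact statistics, so the values below (4 bands, 16 levels, a one-frame horizontal offset, mean/standard deviation/skewness/kurtosis, and contrast/energy/homogeneity) are assumptions chosen for illustration, as are all function names. The prominent-segment selection and the two SVM classifiers are omitted.

```python
import numpy as np

def spectrogram_image(signal, n_fft=256, hop=128, levels=16):
    """Log-magnitude spectrogram quantized to `levels` gray levels (freq x time)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T      # (n_fft // 2 + 1, n_frames)
    log_mag = np.log1p(mag)
    span = log_mag.max() - log_mag.min()
    scaled = (log_mag - log_mag.min()) / (span if span > 0 else 1.0)
    return np.minimum((scaled * levels).astype(int), levels - 1)

def glcm(img, levels, dy=0, dx=1):
    """Normalized symmetric co-occurrence matrix for a non-negative offset (dy, dx)."""
    h, w = img.shape
    a = img[:h - dy, :w - dx]                        # reference pixels
    b = img[dy:, dx:]                                # neighbors at the offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)          # count gray-level pairs
    m = m + m.T                                      # make the matrix symmetric
    return m / m.sum()

def glcm_features(p):
    """Contrast, energy and homogeneity (assumed choice of GLCM properties)."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    return [contrast, energy, homogeneity]

def first_order_stats(band):
    """Mean, standard deviation, skewness and kurtosis of the gray levels."""
    x = band.astype(float).ravel()
    mean, std = x.mean(), x.std()
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    return [mean, std, np.mean(z ** 3), np.mean(z ** 4)]

def subband_features(img, n_bands=4, levels=16):
    """Concatenate per-band statistics, splitting along the frequency axis."""
    feats = []
    for band in np.array_split(img, n_bands, axis=0):
        feats += first_order_stats(band) + glcm_features(glcm(band, levels))
    return np.array(feats)

# Synthetic two-second clip at 8 kHz: a 440 Hz tone plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 8000.0
clip = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
features = subband_features(spectrogram_image(clip))
print(features.shape)  # (28,): 4 bands x (4 statistics + 3 GLCM properties)
```

A feature vector of this kind could then be passed to one SVM classifier, while a second SVM is trained on conventional acoustic features, in line with the two-classifier design the abstract describes.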