A paper titled "CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice Expressions" has been accepted at Interspeech 2023, a premier conference on spoken language processing and technology. The paper introduces a novel dataset of 950 audio samples spanning six distinct classes of voice expressions, collected from 42 individuals who generously donated their voice recordings.
The lack of a dataset of non-verbal voice expressions (NVVEs) hinders the development of accurate machine-learning models that can classify expressions such as “mm-hmm” or “uh-uh,” which occur frequently in spontaneous speech. This limitation restricts progress in enhancing human-computer interaction and in providing assistive systems for individuals with speech disorders who may have difficulty communicating in conventional spoken language. Moreover, applications such as emotion recognition, sentiment analysis, and speech-based behavioral analysis can benefit from the recognition of such expressions, as ASR systems often struggle to transcribe non-verbal fillers. This can result in the omission of critical information, since these simple expressions can carry significant meaning and convey the speaker’s opinions during a conversation.
The CNVVE dataset serves as a valuable resource for training and evaluating machine-learning models that accurately interpret and classify non-verbal expressions. By providing a benchmark, it also enables researchers and practitioners to compare and advance the performance of non-verbal voice expression classification systems.
The researchers at the AC department trained a classifier on features derived from mel-spectrograms, a widely used audio representation. This approach effectively captures the relevant information in the audio samples, allowing the model to discern the different non-verbal voice expressions. Additionally, the team explored data augmentation techniques to further enhance the model's performance. A 5-fold cross-validation yielded a test accuracy of 96.6%, surpassing the baseline model.
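The paper's own feature-extraction code is not reproduced here, but the general idea of log-mel-spectrogram features can be sketched in plain NumPy. All parameter values below (16 kHz sample rate, 512-point FFT, 160-sample hop, 40 mel bands) are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with center frequencies spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):            # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame and window the signal, compute the power spectrum per frame,
    # then project it onto the mel filterbank and take the log.
    window = np.hanning(n_fft)
    n_frames = 1 + max(0, len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # shape: (n_frames, n_mels)
```

In practice a library such as librosa or torchaudio would be used for this step; the sketch only illustrates what "features derived from mel-spectrograms" means: each short audio clip becomes a small time-by-frequency matrix that a classifier can consume.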
The acceptance of this paper at Interspeech 2023 highlights the significance of advancing the study of non-verbal voice expressions. It opens new avenues for natural and intuitive communication, improves human-computer interaction, and lays the foundation for future research and development in the field. To learn more about the CNVVE dataset and the research findings, please refer to the paper and don't hesitate to contact the authors for any inquiries.
R. Hedeshy, R. Menges, S. Staab. CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice Expressions. In: Proceedings of the 24th INTERSPEECH Conference, Dublin, Ireland, August 20-24, 2023.