The production and consumption of music in the contemporary era generate large volumes of data and create a need for automated, more effective management of these data. Automated music mood detection is an active task in the field of Music Information Retrieval (MIR). The first approach to correlating music and mood was made in 1990 by Gordon Bruner, who researched the way that musical emotion affects marketing. In 2016, Lidy and Schindler trained a CNN for genre and mood classification based on audio. Subsequent work developed a multi-modal Deep Learning system combining CNN and LSTM architectures and concluded that multi-modal approaches outperform single-channel models.

This work examines and compares single-channel and multi-modal approaches to music mood detection using Deep Learning architectures. Our first approach utilizes the audio signal and the lyrics of a musical track separately, while the second applies a unified multi-modal analysis to classify the given data into mood classes.
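To make the single-channel versus multi-modal contrast concrete, the following is a minimal illustrative sketch in PyTorch of a fused audio-lyrics mood classifier: a CNN branch over a mel-spectrogram, an LSTM branch over lyric tokens, and a joint classification head. All layer sizes, the vocabulary size, and the four mood classes are assumptions made for the example, not the configuration of the systems discussed here.

    # Illustrative sketch only: CNN (audio) + LSTM (lyrics) fused before classification.
    import torch
    import torch.nn as nn

    class MultiModalMoodNet(nn.Module):
        def __init__(self, vocab_size=20000, embed_dim=128, n_mood_classes=4):
            super().__init__()
            # Audio branch: small CNN over a 1-channel mel-spectrogram.
            self.audio_cnn = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),          # -> 32*4*4 = 512 features
            )
            # Lyrics branch: token embedding followed by an LSTM over the word sequence.
            self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lyrics_lstm = nn.LSTM(embed_dim, 128, batch_first=True)
            # Fusion head: concatenate both representations and classify into mood classes.
            self.classifier = nn.Sequential(
                nn.Linear(512 + 128, 128), nn.ReLU(), nn.Dropout(0.3),
                nn.Linear(128, n_mood_classes),
            )

        def forward(self, spectrogram, lyric_tokens):
            # spectrogram: (batch, 1, n_mels, n_frames); lyric_tokens: (batch, seq_len)
            audio_feat = self.audio_cnn(spectrogram)
            _, (hidden, _) = self.lyrics_lstm(self.embedding(lyric_tokens))
            lyric_feat = hidden[-1]                                   # last LSTM hidden state
            return self.classifier(torch.cat([audio_feat, lyric_feat], dim=1))

    # Smoke test with random inputs of plausible shapes.
    model = MultiModalMoodNet()
    logits = model(torch.randn(2, 1, 96, 256), torch.randint(1, 20000, (2, 50)))
    print(logits.shape)  # torch.Size([2, 4])

A single-channel variant in this sketch would simply drop one branch and classify from the remaining representation; the multi-modal version instead lets the classifier see both modalities at once, which is the design choice the compared approaches differ on.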