Tag Archives: multimodality
Last year I started collaborating with my colleague Gloria Haro, from UPF, who works on image processing, to combine audio and image descriptors for music analysis. We had a student who worked on this for several months, and we are now opening a PhD position to advance the topic further.
Anyone interested please apply! This is the official call:
The Music Technology Group (MTG) and the Image Processing Group (GPI) of the Department of Information and Communication Technologies, Universitat Pompeu Fabra in Barcelona, are opening a joint PhD fellowship on the topic of “Audio-Visual Approaches for Music Content Description”, to start in the Fall of 2015.
Music is a highly multimodal concept, where various types of heterogeneous information are associated with a music piece (audio, musician’s gestures and facial expression, lyrics, etc.). This has recently led researchers to approach music through its various facets, giving rise to multimodal music analysis studies (Essid and Richard, 2012).
The research will address the complementarity of audio and image description technologies in order to improve the accuracy and meaningfulness of state-of-the-art music description methods. These methods are the core of content-based music information retrieval tasks.
Several standard tasks could benefit from it:
- Synchronization of audio / video streams
- Audio-visual quality assessment
- Structural analysis and segmentation
- Discovery of repeated themes & sections
- Automatic video mashup generation
- Music similarity computation
- Genre / style classification
- Artist identification
- Emotion (mood) characterization
- Optical music recognition (OMR)
Supervisors: Emilia Gómez (MTG) / Gloria Haro (GPI)
Applicants should have experience in audio and image signal processing, and hold an MSc in a related field (e.g. telecommunications, electrical engineering, mathematics, physics or computer science). Experience in scientific programming (Matlab/Python/C++) and excellent English are essential. A musical background and expertise in multimedia information retrieval are also valuable.
The grant involves teaching assistance (up to 60 h a year), so an interest in teaching is also valued.
More information on grant details:
Provisional starting date: November 2015
Interested candidates should send a motivation letter, a CV (preferably with references), and academic transcripts to Prof. Emilia Gómez (firstname.lastname@example.org) and Prof. Gloria Haro (email@example.com) before September 10th. Please include [PhD Audio-Visual] in the subject line.
Candidates will also have to apply to the PhD program of the DTIC at UPF.
- S. Essid and G. Richard, “Fusion of Multimodal Information in Music Content Analysis”, in M. Müller, M. Goto and M. Schedl (Eds) “Multimodal Music Processing”, Dagstuhl Follow-ups, volume 3, pp. 37-53, ISBN 978-3-939897-37-8, 2012.
- M. Müller, M. Goto and M. Schedl (Eds) “Multimodal Music Processing”, Dagstuhl Follow-ups, volume 3, ISBN 978-3-939897-37-8, 2012.
- A. Schindler and A. Rauber, “A Music Video Information Retrieval Approach to Artist Identification”, CMMR, 2013.
- Y. Wang, Z. Liu and J.-C. Huang, “Multimedia Content Analysis Using Both Audio and Visual Clues”, IEEE Signal Processing Magazine, volume 17, November 2000. doi:10.1109/79.888862
- Yue Wu, Tao Mei, Ying-Qing Xu, Nenghai Yu, Shipeng Li, “MoVieUp: Automatic Mobile Video Mashup”, IEEE Transactions on Circuits and Systems for Video Technology, 2015.
This week, I am attending a focused seminar on Multimodal Music Processing. The organizers managed to bring together an amazing group of researchers from different areas of music technology. We are discussing the challenges of combining different modalities of information in music processing systems.
What do we mean by multi-modality?
A “modality” can be defined as “any of the various types of sensation, such as vision or hearing” or even “any of the five senses”. How can we take advantage of information from our five senses in music processing systems? If we also consider the “context” and the “user” as information sources, we then have a huge amount of information to be efficiently combined.
This is the main challenge of the whole area: handling and combining data and information for a particular task.
I tried to apply multimodality to my current project in flamenco music and I realized we are facing a multi-modal, multi-disciplinary and multi-cultural problem.
– Multi-modal: we are dealing with the integration of different knowledge sources: music, expression, context, cultural information, anthropological data, listener judgments, text, image and movement.
– Multi-disciplinary: each modality is formalized differently, so how can we bring knowledge from different disciplines into music processing systems? The disciplines involved are music content processing, knowledge discovery, musicology (flamenco scholars), cognition, anthropology and literature.
– Multi-cultural: how can we refine music processing systems so that they are meaningful to people from different musical backgrounds and cultures?
So I am gathering nice ideas at this seminar!