New open-access journal Transactions of ISMIR, open for submissions

We have launched the Transactions of the International Society for Music Information Retrieval (TISMIR), the open-access journal of the ISMIR society, published by Ubiquity Press. I am serving as Editor-in-Chief, together with Simon Dixon and Anja Volk.

Transactions of the International Society for Music Information Retrieval

TISMIR publishes novel scientific research in the field of music information retrieval (MIR).

We welcome submissions from a wide range of disciplines: computer science, musicology, cognitive science, library & information science and electrical engineering.

We are currently accepting submissions.

View our submission guidelines for more information.


Special Issue at IEEE Multimedia Magazine

I have been working for a while now on editing a Special Issue of IEEE Multimedia Magazine, which gathers state-of-the-art research on multimedia methods and technologies aimed at enriching music performance, production and consumption.

I have had the chance to co-edit this issue with my colleagues Cynthia Liem (TU Delft, The Netherlands) and George Tzanetakis (University of Victoria, Canada), and I am very happy with the outcomes.

This is the second time I have acted as co-editor for a journal (the first was at JNMR, on computational ethnomusicology), and I learnt a lot from the process. Editors have to ensure good submissions, good reviews and good recommendations, while keeping the coherence and theme that we wanted to convey as a message to our community. Yes: access, distribution and experiences in music are changing with new technologies. I am very happy with the outcomes! Check our editorial paper here, and the full issue here.

And I love the design!

Journal paper and open dataset for source separation in orchestral music

As part of the PHENICX project, we have recently published our research results on audio source separation, which is the main research topic of one of our PhD students, Marius Miron.

During this work, we developed a method for orchestral music source separation along with a new dataset: the PHENICX-Anechoic dataset. The methods were integrated into the PHENICX project for tasks such as orchestra focus/instrument enhancement. To our knowledge, this is the first time source separation has been objectively evaluated in such a complex scenario.

This is the complete reference to the paper:

M. Miron, J. Carabias-Orti, J. J. Bosch, E. Gómez and J. Janer, “Score-informed source separation for multi-channel orchestral recordings”, Journal of Electrical and Computer Engineering (2016).

Abstract: This paper proposes a system for score-informed audio source separation for multichannel orchestral recordings. The orchestral music repertoire relies on the existence of scores. Thus, a reliable separation requires a good alignment of the score with the audio of the performance. To that extent, automatic score alignment methods are reliable when allowing a tolerance window around the actual onset and offset. Moreover, several factors increase the difficulty of our task: a high reverberant image, large ensembles having rich polyphony, and a large variety of instruments recorded within a distant-microphone setup. To solve these problems, we design context-specific methods such as the refinement of score-following output in order to obtain a more precise alignment. Moreover, we extend a close-microphone separation framework to deal with the distant-microphone orchestral recordings. Then, we propose the first open evaluation dataset in this musical context, including annotations of the notes played by multiple instruments from an orchestral ensemble. The evaluation aims at analyzing the interactions of important parts of the separation framework on the quality of separation. Results show that we are able to align the original score with the audio of the performance and separate the sources corresponding to the instrument sections.

The PHENICX-Anechoic dataset includes audio and annotations useful for different MIR tasks such as score-informed source separation, score following, multi-pitch estimation, transcription or instrument detection, in the context of symphonic music. This dataset is based on the anechoic recordings described in this paper:

Pätynen, J., Pulkki, V., and Lokki, T., “Anechoic recording system for symphony orchestra,” Acta Acustica united with Acustica, vol. 94, nr. 6, pp. 856-865, November/December 2008.

For more information about the dataset and how to download it, you can visit the PHENICX-Anechoic web page.

Paper on melodic similarity in flamenco now online

Our paper on melodic similarity is finally online! The paper is titled

Melodic Contour and Mid-Level Global Features Applied to the Analysis of Flamenco Cantes

This work focuses on the topic of melodic characterization and similarity in a specific musical repertoire: a cappella flamenco singing, more specifically in debla and martinete styles. We propose the combination of manual and automatic description. First, we use a state-of-the-art automatic transcription method to account for general melodic similarity from music recordings. Second, we define a specific set of representative mid-level melodic features, which are manually labelled by flamenco experts. Both approaches are then contrasted and combined into a global similarity measure. This similarity measure is assessed by inspecting the clusters obtained through phylogenetic algorithms and by relating similarity to categorization in terms of style. Finally, we discuss the advantage of combining automatic and expert annotations as well as the need to include repertoire-specific descriptions for meaningful melodic characterization in traditional music collections.
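The idea of combining an automatic and an expert-based measure can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the two measures are merged, so the max-normalisation, the weighted average, the `weight` parameter and the function names `normalize` and `combine` are all hypothetical, and the distance values are invented.

```python
def normalize(matrix):
    """Scale a distance matrix to [0, 1] by dividing by its largest entry."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [row[:] for row in matrix]
    return [[d / peak for d in row] for row in matrix]


def combine(auto_dist, expert_dist, weight=0.5):
    """Weighted average of two normalised distance matrices.

    `weight` is the share given to the automatic measure (an assumption,
    not a value taken from the paper).
    """
    a, e = normalize(auto_dist), normalize(expert_dist)
    return [
        [weight * x + (1 - weight) * y for x, y in zip(row_a, row_e)]
        for row_a, row_e in zip(a, e)
    ]


# Invented pairwise distances between three recordings.
auto_dist = [[0, 2, 4], [2, 0, 1], [4, 1, 0]]      # e.g. from transcriptions
expert_dist = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]    # e.g. from expert labels
global_dist = combine(auto_dist, expert_dist)
```

A matrix like `global_dist` is the kind of input that clustering or tree-building methods take; normalising first keeps one measure from dominating simply because of its scale.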

This is the result of joint work by the COFLA group, to which I contribute technologies for the automatic transcription and melody description of music recordings.

This is an example of how we compare flamenco tonás using melodic similarity and phylogenetic trees:


And this is a video example of the type of styles we analyze in this paper, done by Nadine Kroher based on her work at the MTG:

You can read the full paper online:

CANTE: Open Algorithm, Code & Data for the Automatic Transcription of Flamenco Singing

Over the last months, several journal publications related to our research on flamenco & technology are finally online.

One of them is work with my former PhD student, Nadine Kroher (who has now moved to Universidad de Sevilla), on the automatic transcription of flamenco singing. Flamenco singing is really challenging in terms of computational modelling, given its ornamented character and variety, and we have designed a system for its automatic transcription, focusing on polyphonic recordings.


The proposed system outperforms state-of-the-art singing transcription systems with respect to voicing accuracy, onset detection, and overall performance when evaluated on flamenco singing datasets. We hope it will be a contribution not only to flamenco research but also to research on other singing styles.

You can read about our algorithm in the paper we published in IEEE TASP, where we present the method, strategies for evaluation and a comparison with state-of-the-art approaches. You can not only read about it but actually try it, as we have published open source software for the algorithm, plus a music dataset for its comparative evaluation, cante2midi (I will talk about flamenco corpora in another post). All of this is meant to foster research reproducibility and motivate people to work on flamenco music.





CfP: Multimedia Technologies for Enriched Music Performance, Production, and Consumption

Publication: Jan.-Mar. 2017
Submission deadline: Feb. 1st 2016

I am co-editing, together with my colleagues Cynthia Liem and George Tzanetakis, a special issue of IEEE Multimedia related to music.

Internet access, mobile devices, social networks, and automated multimedia technologies enabling sophisticated information analysis and access have radically changed the ways in which people find entertainment, discover new interests, and generally express themselves online — seemingly without any physical or social barriers. Thanks to the increasing affordability of sensing, storage, and sharing, we note that information takes increasingly rich and hybrid multimedia forms, in which multimodal information streams co-occur in various social consumption settings.

This phenomenon also has enabled opportunities in the music domain. In music performance, novel opportunities for expression are found, exploiting (live) analysis and novel interaction mechanisms with musical data in multiple modalities. In music production, sophisticated multimedia data analysis techniques can both lead to more efficient and scalable workflows, as well as richer and better interfaces. In music consumption, the music data richness and its contextual and social embedding lead to novel consumer experiences stimulating music appreciation. Concerts turn into multimodal, multiperspective, and multilayer digital artifacts that can be easily explored, customized, personalized, (re)enjoyed and shared among various types of users; similar notions and opportunities hold for the consumption of general music recordings.

The goal of this special issue is to gather state-of-the-art research on multimedia methods and technologies aimed at enriching music performance, production and consumption. We solicit novel, original work that is not published or under review elsewhere.

Topics of interest include, but are not limited to:

  • Processing of multimodal music data streams (e.g. audio, video, images, score, text, gesture…) for music performance, production and/or consumption
  • Multimedia content description and indexing for music performance, production and/or consumption
  • Multimedia information retrieval methods for music performance, production and/or consumption
  • Novel interaction mechanisms for music performance, production and/or consumption
  • Novel user interfaces for music performance, production and/or consumption
  • Novel user experience paradigms for music performance, production and/or consumption
  • Social networking and sharing for music performance, production and/or consumption
  • Digital mechanisms for remote music performers and audiences
  • Active listening, audience immersion, and inclusion of new music audiences
  • User-awareness, personalization and intent in music performance, production and/or consumption
  • Context-awareness and automatic context adaptation in music performance, production and/or consumption

Submission Guidelines

Submissions should not exceed 6,500 words, with each table and figure counting for 200 words. Manuscripts should be submitted electronically, selecting this special issue option.

Guest Editors

Detailed call for papers

Correlation between musical descriptors and emotions recognized in Beethoven’s Eroica

Last Wednesday I presented a poster at the Ninth Triennial Conference of the European Society for the Cognitive Sciences of Music (ESCOM 2015), which took place at the Royal Northern College of Music, Manchester, UK. It was a very interesting conference, including a very nice symposium on understanding musical audiences and inspiring talks on music education, psychology and wellbeing. I was really impressed by how music can improve quality of life, from our early years to the end of our lives.

The work I presented was led by Erika Trent, a student from MIT who spent last summer at my lab thanks to the MIT Spain program. It was a very productive stay!

In this study we analysed the emotions that listeners perceive when listening to Beethoven's Symphony No. 3, Eroica, the PHENICX target piece, played by the Royal Concertgebouw Orchestra Amsterdam. We then quantified the correlation between listeners’ perceived emotions from music and 1) musical descriptors, and 2) listeners’ backgrounds (country of origin, musical knowledge, exposure to classical music and knowledge of Eroica).
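A minimal sketch of how a correlation between a musical descriptor and an emotion rating can be computed, assuming Pearson's r (this post does not state which coefficient the study used). The function name `pearson` and all the numbers are invented for illustration; they are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: one key-clarity score and one mean "peacefulness"
# rating per excerpt (made up for this example).
key_clarity = [0.82, 0.61, 0.45, 0.90, 0.30]
peacefulness = [4.1, 3.2, 2.5, 4.6, 1.9]
r = pearson(key_clarity, peacefulness)
```

A value of `r` close to 1 or -1 indicates a strong linear relation; judging significance additionally requires a statistical test, which this sketch leaves out.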

One conclusion of this study is that tonal strength (i.e. key clarity) correlates significantly with listener ratings of peacefulness, joyful activation, tension and sadness. Other significant correlations between emotion ratings and musical descriptors agree with the literature. This agreed with our hypothesis, the excerpts being different parts of the same musical piece.

But there are two other unexpected and interesting findings that we may need to investigate further.

First, we found out that listeners of varying backgrounds agree most on their ratings of sadness, compared to other emotions. Would that be similar for other musical pieces?

Second, listeners of similarly unmusical backgrounds, and listeners of young ages, recognise similar emotions in the same music. In contrast, listeners with more musical experience recognise different emotions in the same music. Could this be caused by personal biases?

Interesting results that might corroborate the need for personalisation in music recommendation engines!

You can read the whole paper and access the poster here. 


Music Information Retrieval: Recent Developments and Applications (152 pages, 311 references, amazing reviewers!)

After one year of hard work we finished our paper (152 pages, 311 references!) on Music Information Retrieval: Recent Developments and Applications. I collaborated with Markus Schedl and Julián Urbano on this amazing project for Foundations and Trends in Information Retrieval (h-index=15, Q1 Computer Science). We tried to cover all existing techniques, approaches and key references in MIR, and to reflect the interest of our community in combining audio description, context mining, user modelling and proper evaluation methodologies.

We hope it will be an interesting reference for our community, and we also hope this paper can serve to motivate people from outside our field and introduce them to it.

I would really like to thank the anonymous reviewers and the editor, Mark Sanderson. I don’t know who the reviewers are but I think they deserve to be on the author list! Great suggestions, discussions, restructuring and edits for a great outcome.



Music Information Retrieval: Recent Developments and Applications surveys the young but established field of research that is Music Information Retrieval (MIR). In doing so, it pays particular attention to the latest developments in MIR, such as semantic auto-tagging and user-centric retrieval and recommendation approaches.

Music Information Retrieval: Recent Developments and Applications starts by reviewing the well-established and proven methods for feature extraction and music indexing, from both the audio signal and contextual data sources about music items, such as web pages or collaborative tags. These in turn enable a wide variety of music retrieval tasks, such as semantic music search or music identification (“query by example”). Subsequently, it elaborates on the current work on user analysis and modeling in the context of music recommendation and retrieval, addressing the recent trend towards user-centric and adaptive approaches and systems. A discussion follows about the important aspect of how various MIR approaches to different problems are evaluated and compared. It concludes with a discussion about the major open challenges facing MIR.

Music Information Retrieval: Recent Developments and Applications is an invaluable reference for researchers, students or practitioners working on, or with an interest in MIR.

Paper & Matlab framework for hierarchical multi-scale set-class analysis

Journal of Mathematics and Music

As part of his recent PhD thesis, Agustín Martorell has studied the potential of multi-scale representations in music analysis. In particular, he focuses on the description of tonality from score representations and on the analysis of pitch-class sets. We have recently published the results of this study in Journal of Mathematics and Music: Mathematical and Computational Approaches to Music Theory, Analysis, Composition and Performance. The paper is now online!

Several analyses are discussed within the paper while addressing the problem of visualization. As a result of the work, there is also a MATLAB Toolbox that you are able to download from here.

Agustín Martorell & Emilia Gómez


This work presents a systematic methodology for set-class surface analysis using temporal multi-scale techniques. The method extracts the set-class content of all the possible temporal segments, addressing the representational problems derived from the massive overlapping of segments. A time versus time-scale representation, named class-scape, provides a global hierarchical overview of the class content in the piece, and it serves as a visual index for interactive inspection. Additional data structures summarize the set-class inclusion relations over time and quantify the class and subclass content in pieces or collections, helping to decide about sets of analytical interest. Case studies include the comparative subclass characterization of diatonicism in Victoria’s masses (in Ionian mode) and Bach’s preludes and fugues (in major mode), as well as the structural analysis of Webern’s Variations for piano op. 27, under different class-equivalences.
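To give a flavour of what extracting the set-class content of all possible segments involves, here is a small Python sketch. It is not the MATLAB toolbox and not the paper's implementation. The names `canonical_class` and `class_content` are hypothetical, segments are taken over a list of note events rather than over time, and the canonical form used (smallest span, ties broken lexicographically) is one common convention; published conventions such as Forte's and Rahn's differ on a few sets.

```python
def canonical_class(pitches, use_inversion=True):
    """Canonical representative of a pitch-class set.

    Sets are equivalent under transposition, and also under inversion
    when `use_inversion` is True. Among all candidates starting on 0,
    the one with the smallest span wins; ties are broken lexicographically.
    """
    pcs = {p % 12 for p in pitches}
    if not pcs:
        return ()
    forms = [pcs]
    if use_inversion:
        forms.append({(-p) % 12 for p in pcs})
    candidates = [
        tuple(sorted((p - root) % 12 for p in form))
        for form in forms
        for root in form
    ]
    return min(candidates, key=lambda c: (c[-1], c))


def class_content(events, use_inversion=True):
    """Map every contiguous segment (start, end) of `events` to its class."""
    return {
        (start, end): canonical_class(events[start:end], use_inversion)
        for start in range(len(events))
        for end in range(start + 1, len(events) + 1)
    }


# A C major arpeggio as MIDI note numbers (illustrative input).
segments = class_content([60, 64, 67])
```

The number of segments grows quadratically with the number of events, which is the massive overlap that the class-scape representation described above is meant to make manageable.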


Forum on transcription in the journal “Twentieth-Century Music”

I contributed by means of an enriching interview to the “Forum on Transcription”, authored by Jason Stanyek (University of Oxford) in the journal Twentieth-Century Music. As stated on the web site, this journal disseminates research on all aspects of music in the long twentieth century to a broad readership. Emphasis is placed upon the presentation of the full spectrum of scholarly insight, with the goal of fostering exchange and debate between disciplinary fields.

I share an interesting conversation about transcription with Parag Chordia. In this conversation with Jason we discussed the challenges and potential of audio analysis tools for computer-assisted transcription and description of music recordings. I gave some examples of my work on the transcription of flamenco singing that is being carried out within the COFLA project.

You can find the results of the forum and the rest of a very impressive special issue on transcription on the web.


