[2021-Dec-29] Automatic Music Generation with Transformers

Institute of Information Systems and Applications


Prof. Yi-Hsuan Yang (楊弈軒)

Chief Music Scientist, Taiwan AI Labs



13:20-15:00 Wednesday 29-Dec-2021

Hosted by:

Prof. Hung-Kuo Chu


In this talk, I will first give a brief overview of recent deep learning-based approaches to automatic music generation. I will then present our own research, which employs self-attention-based architectures, a.k.a. Transformers, for music generation. A naive approach with Transformers would treat music as a sequence of text-like tokens. However, our research shows that Transformers generate higher-quality music when music is not treated simply as text. In particular, our "Pop Music Transformer" model, published at ACM Multimedia 2020, employs a novel beat-based representation that informs self-attention models of the bar-beat metrical structure present in music. This approach greatly improves the rhythmic structure of the generated music. A more recent model, the "Compound Word Transformer", published at AAAI 2021, exploits the fact that a musical note is associated with multiple attributes, such as pitch, duration, and velocity. Instead of predicting the tokens for these attributes one by one at inference time, the Compound Word Transformer predicts them jointly. This greatly reduces the sequence length needed to model a full-length song and makes it easier to model the dependencies among the attributes.
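As a rough illustration of why grouping note attributes shortens the sequence, the toy sketch below compares a flat, text-like tokenization with a grouped one. It is not the vocabulary or code of either model: the `Note` type, the token names such as `Pitch_60`, and the duration unit are assumptions made only for this example.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int      # MIDI pitch number
    duration: int   # length in 16th-note steps (assumed unit)
    velocity: int   # MIDI velocity


def flat_tokens(notes):
    """Text-like view: one token per attribute, emitted one after another."""
    tokens = []
    for n in notes:
        tokens += [
            f"Pitch_{n.pitch}",
            f"Duration_{n.duration}",
            f"Velocity_{n.velocity}",
        ]
    return tokens


def compound_tokens(notes):
    """Grouped view: all attributes of a note share a single time step."""
    return [(n.pitch, n.duration, n.velocity) for n in notes]


# Three notes of a C major arpeggio (hypothetical data).
notes = [Note(60, 4, 80), Note(64, 4, 72), Note(67, 8, 90)]
flat = flat_tokens(notes)
compound = compound_tokens(notes)
print(len(flat), len(compound))
```

With three attributes per note, the grouped sequence has one third as many time steps as the flat one, which is the kind of length reduction the abstract refers to.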


Dr. Yi-Hsuan Yang is the Chief Music Scientist at Taiwan AI Labs, where he leads a team working on music AI technologies. He is also an Associate Research Fellow at the Research Center for IT Innovation, Academia Sinica. He received his Ph.D. degree in Communication Engineering from National Taiwan University in 2010. His research lies at the crossroads of music information retrieval and machine learning, with a focus in recent years on GAN- and Transformer-based automatic music generation. Dr. Yang was a recipient of the 2011 IEEE Signal Processing Society Young Author Best Paper Award, the 2014 Ta-You Wu Memorial Research Award of the Ministry of Science and Technology, Taiwan, the 2015 Young Scholars’ Creativity Award from the Foundation for the Advancement of Outstanding Scholarship, and the 2019 Multimedia Rising Stars Award from the IEEE International Conference on Multimedia and Expo. He is an author of the book Music Emotion Recognition (CRC Press 2011). He was a Technical Program Co-Chair of the International Society for Music Information Retrieval Conference (ISMIR) in 2014. He also served as an Associate Editor for the IEEE Transactions on Multimedia and the IEEE Transactions on Affective Computing, both from 2016 to 2019. Dr. Yang is a senior member of the IEEE. His team developed well-known music AI models such as MidiNet, MuseGAN, Pop Music Transformer, and KaraSinger.

All faculty and students are welcome to join.