JinyuEgg

Music Technology student

What is "music information extraction"? A brief introduction


Recently, thanks to Animal Crossing: New Horizons, the speech synthesis in the game has become popular. Taking advantage of this momentum, I would like to share some of the background stories behind it. How did the Animal Crossing developers come up with this way of handling sound? How do voice and music connect with technology? And once programmers understand music theory, what imaginative things can they build?

As a first-year graduate student, the author ventures to introduce an interdisciplinary field that combines music, signal processing and machine learning: Music Information Retrieval (MIR).

Friends who know sound engineering know that in music production, the sound we enjoy every day comes from combining technology with music creation. On the other side of music, combining technology with the process of listening to and understanding music gives us the field this article introduces: "music information extraction". Because it approaches music from the perspective of listening and understanding, MIR is also known as Music Informatics or Machine Listening.

So what can we do with machines that "understand" music? Let's listen to a different "Golden Song":

This is a short demo of Spleeter, a source separation tool released by Deezer. If the machine can tell which sounds in a complete song come from the singer and which come from the accompaniment, we can separate the vocals from the accompaniment, as in the video.


Friends who know sound engineering will know that when making a song, we usually record different instruments with different microphones onto separate tracks, and then combine them into a single audio file through mixing. To use a cooking analogy, this is like combining and cooking different raw ingredients into a finished dish. Reversing the process, however, is like taking a finished plate of tomato scrambled eggs and separating it back into eggs, tomatoes and oil to recover the complete raw ingredients: this is what source separation does.

However, anyone who remembers thermodynamics from high school physics or has studied information theory will know that such an entropy-reducing process runs against the law of entropy increase: you cannot separate a glass of 50-degree warm water back into half a glass of 0-degree ice water and half a glass of 100-degree boiling water. So how can we break sonic tomato scrambled eggs back down into a sonic tomato and a sonic egg? This is where knowledge about sound becomes important.
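To make the "scrambled eggs" point concrete, here is a minimal sketch (NumPy assumed; the track names and random signals are purely illustrative) that treats mixing as a sample-by-sample sum of tracks. Moving any signal from one track to the other leaves the mix unchanged, so the mix alone cannot tell us which split is the "real" one; a separation algorithm needs extra knowledge about what each source sounds like.

```python
import numpy as np

rng = np.random.default_rng(0)
vocals = rng.standard_normal(8)          # illustrative "vocal track" samples
accompaniment = rng.standard_normal(8)   # illustrative "accompaniment track" samples
mix = vocals + accompaniment             # mixing down: a sample-by-sample sum

# Move an arbitrary signal from one track to the other...
offset = rng.standard_normal(8)
other_vocals = vocals + offset
other_accompaniment = accompaniment - offset

# ...and the mix is identical, even though the "sources" are different.
print(np.allclose(mix, other_vocals + other_accompaniment))  # True
print(np.allclose(vocals, other_vocals))                      # False
```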

We know that sound (that is, a sound wave) is essentially a regular vibration of air or another medium, and different sound sources vibrate in different ways. For example, a drum hit is short and powerful and has almost no pitch (although, as drummers know, drumheads still need to be tuned), while a violin note is long and even, with a very clear pitch.
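The drum/violin contrast can be measured. The sketch below is a toy that assumes NumPy and two made-up synthetic sounds rather than real recordings: `drum_like` is a rapidly decaying noise burst and `violin_like` is a steady tone built from harmonics of 440 Hz. Two simple descriptors already tell them apart. Spectral flatness is close to 1 for noise and close to 0 for a pitched tone. The temporal centroid shows where the energy sits in time.

```python
import numpy as np

SR = 16000  # sample rate in Hz (an assumption for this toy example)

def drum_like(duration=0.5, sr=SR, seed=0):
    """Short and nearly pitchless: white noise with a fast exponential decay."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * sr)) / sr
    return rng.standard_normal(t.size) * np.exp(-t / 0.03)

def violin_like(f0=440.0, duration=0.5, sr=SR):
    """Long, even and clearly pitched: a steady sum of the first 8 harmonics of f0."""
    t = np.arange(int(duration * sr)) / sr
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 9))

def spectral_flatness(x):
    """Geometric mean / arithmetic mean of the power spectrum (near 1: noisy, near 0: tonal)."""
    power = np.abs(np.fft.rfft(x)) ** 2 + 1e-12  # small floor avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def temporal_centroid(x, sr=SR):
    """Energy-weighted mean time in seconds: small for short, percussive sounds."""
    energy = x ** 2
    t = np.arange(x.size) / sr
    return float(np.sum(t * energy) / np.sum(energy))

drum, violin = drum_like(), violin_like()
for name, x in (("drum", drum), ("violin", violin)):
    print(f"{name:6s} flatness={spectral_flatness(x):.3f}  centroid={temporal_centroid(x):.3f} s")
```

On these toy signals, the drum-like burst should come out with a flatness of roughly 0.5 and an energy centroid near the start of the clip. The violin-like tone should have a flatness near 0 and a centroid in the middle.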

Based on this, we can let an algorithm learn the spectral characteristics of different instruments and use them to separate instruments from each other, or vocals from accompaniment.

Different instruments have different short-term spectral characteristics. Columns, from left to right: clarinet, distorted guitar, vocals, flute, piano, saxophone, trumpet, violin.
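Spleeter learns these characteristics from data with a neural network. The sketch below swaps in a much simpler, rule-based stand-in: harmonic/percussive separation by median filtering. It relies on the same intuition. In a spectrogram, pitched sounds form horizontal lines because they are steady over time, while drum hits form vertical lines because they are broadband at one instant. This is not Spleeter's method. NumPy, the STFT settings, the filter width and the toy signals are all assumptions chosen for illustration.

```python
import numpy as np

SR = 16000               # sample rate in Hz (assumed)
N_FFT, HOP = 512, 128    # STFT frame length and hop (assumed)

def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex spectrogram with shape (frequency_bins, frames)."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.array([x[s:s + n_fft] * window for s in starts])
    return np.fft.rfft(frames, axis=1).T

def running_median(mag, width, axis):
    """Median over `width` (odd) neighbours along one axis, edges padded by repetition."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (width // 2, width // 2)
    padded = np.pad(mag, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=axis)
    return np.median(windows, axis=-1)

def hpss_masks(spec, width=17):
    """Binary harmonic/percussive masks from median-filtered magnitudes."""
    mag = np.abs(spec)
    harmonic = running_median(mag, width, axis=1)    # smooth along time: steady tones survive
    percussive = running_median(mag, width, axis=0)  # smooth along frequency: broadband hits survive
    mask_h = harmonic >= percussive
    return mask_h, ~mask_h

# Toy "song": a steady 440 Hz tone (violin-like) plus four short noise bursts (drum-like).
t = np.arange(SR) / SR
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
burst = rng.standard_normal(160) * np.exp(-np.arange(160) / 32)
clicks = np.zeros_like(t)
for onset in (0.125, 0.375, 0.625, 0.875):
    start = int(onset * SR)
    clicks[start:start + 160] += burst

mask_h, mask_p = hpss_masks(stft(tone + clicks))  # masks come from the mixture only

def share(source, mask):
    """Fraction of a source's spectrogram energy that falls inside a mask."""
    power = np.abs(stft(source)) ** 2
    return float((power * mask).sum() / power.sum())

print(f"tone energy in harmonic mask:    {share(tone, mask_h):.2f}")
print(f"click energy in percussive mask: {share(clicks, mask_p):.2f}")
```

The separate sources are used only to check how much of each one lands in the right mask. Multiplying the mixture's spectrogram by a mask and inverting the STFT with overlap-add would give the separated audio.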




Next, I will introduce a skill that many people wish they had: transcription, that is, working out the score of a song by ear.

Many friends who are learning guitar have a song they dream of being able to play (and it is surely not "Vast Ocean"). But many people, me included, run into the same problem when they start learning: where do you find the sheet music? If an algorithm could analyze the recording and tell me which chords are used in each measure, becoming a guitar master would be just around the corner!

Yes! Now a machine can do it for you. Here is a demo I made myself:

I play a simple "1645" progression in C major (C, Am, F, G), and the algorithm identifies the tempo (BPM), the rhythm pattern, and the chord I play on each beat! Isn't that cool? (No.) Of course, behind this lies a close combination of signal processing and music theory. If you are interested, the author can explain it in a later post.
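The demo's own pipeline is not described here. The following is only a hedged sketch of one common ingredient, tempo estimation. It computes an onset envelope, which measures how sharply the energy rises in each short frame. It then finds the lag at which the envelope best matches a shifted copy of itself, using autocorrelation. NumPy, the 10 ms hop, the 60 to 180 BPM search range and the synthetic click track (`click_track`, a hypothetical helper) are all assumptions. Rhythm patterns and per-beat chords would need further steps.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed)
HOP = 160   # 10 ms hop, i.e. 100 envelope values per second (assumed)

def onset_envelope(x, frame=512, hop=HOP):
    """Positive jumps in log frame energy: large where a beat or note starts."""
    starts = range(0, len(x) - frame + 1, hop)
    energy = np.array([np.sum(x[s:s + frame] ** 2) for s in starts])
    return np.maximum(np.diff(np.log1p(energy)), 0.0)

def estimate_bpm(x, sr=SR, hop=HOP, bpm_range=(60, 180)):
    """Pick the autocorrelation peak of the onset envelope within a plausible tempo range."""
    env = onset_envelope(x, hop=hop)
    env = env - env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]  # lags 0, 1, 2, ...
    frames_per_second = sr / hop
    min_lag = int(round(60 * frames_per_second / bpm_range[1]))
    max_lag = int(round(60 * frames_per_second / bpm_range[0]))
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60 * frames_per_second / lag

def click_track(bpm, seconds=8.0, sr=SR):
    """Hypothetical test signal: the same short noise burst on every beat."""
    rng = np.random.default_rng(0)
    burst = rng.standard_normal(400) * np.exp(-np.arange(400) / 60)
    x = np.zeros(int(seconds * sr))
    for beat in np.arange(0.25, seconds - 0.1, 60.0 / bpm):
        start = int(round(beat * sr))
        x[start:start + 400] += burst
    return x

print(estimate_bpm(click_track(120)))  # expected to be close to 120
```

Restricting the search range is a design choice. Autocorrelation also peaks at double and half the beat period, so the range decides which of those "octaves" of the tempo is reported.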

The chord information is hidden in a matrix like this
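A matrix like this is often a chromagram. For each moment, it holds 12 numbers saying how much energy falls on each pitch class (C, C#, ..., B), regardless of octave. A simple way to read chords from it is template matching: compare the chroma vector with idealized major and minor triads and pick the closest one. The sketch below works on a single frame and assumes NumPy, a synthesized test signal (`play`, a hypothetical helper) and only 24 major/minor templates. It feeds in the C major "1645" progression: C, Am, F, G.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed)
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma(x, sr=SR):
    """12-bin pitch-class profile: fold each FFT bin's energy onto its nearest pitch class."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    keep = (freqs > 50) & (freqs < 5000)  # skip DC/rumble and very high bins
    midi = np.round(69 + 12 * np.log2(freqs[keep] / 440.0)).astype(int)
    profile = np.zeros(12)
    np.add.at(profile, midi % 12, spectrum[keep])
    return profile / (np.linalg.norm(profile) + 1e-12)

def chord_templates():
    """Unit-length binary templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            template = np.zeros(12)
            template[[(root + i) % 12 for i in intervals]] = 1.0
            templates[PITCH_CLASSES[root] + suffix] = template / np.linalg.norm(template)
    return templates

def recognize_chord(x, sr=SR):
    """Name of the triad template with the highest cosine similarity to the chroma."""
    c = chroma(x, sr)
    templates = chord_templates()
    return max(templates, key=lambda name: float(templates[name] @ c))

def play(midi_notes, seconds=0.5, sr=SR):
    """Hypothetical test input: each note as 3 decaying harmonics (a crude plucked sound)."""
    t = np.arange(int(seconds * sr)) / sr
    out = np.zeros_like(t)
    for n in midi_notes:
        f0 = 440.0 * 2 ** ((n - 69) / 12)
        for k in range(1, 4):
            out += np.sin(2 * np.pi * k * f0 * t) / k
    return out * np.exp(-3 * t)

# "1645" in C major: C, Am, F, G (simple voicings as MIDI note numbers)
progression = [[60, 64, 67], [57, 60, 64], [53, 57, 60], [55, 59, 62]]
print([recognize_chord(play(chord)) for chord in progression])  # ['C', 'Am', 'F', 'G']
```

A real system would compute one chroma vector per beat (which is where tempo tracking comes in) and would usually smooth the chord sequence over time instead of deciding each frame independently.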

Besides source separation and chord recognition, MIR has many other applications: song identification, plagiarism detection, melody extraction, and even music composition. When all of this is combined with artificial intelligence (neural networks), even more imaginative results emerge! Here I would like to share LakhNES, an algorithm that automatically generates 8-bit music:
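In its simplest form, melody extraction means estimating the pitch of a single melodic line frame by frame. The minimal sketch below assumes NumPy and a clean monophonic signal; extracting a melody from a full mix is much harder. For each frame, it picks the autocorrelation peak within an assumed 80 to 1000 Hz range and converts the result to a note name. `sine_melody` is a hypothetical helper that plays C4, D4 and E4 as pure tones.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed)
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_f0(frame, sr=SR, fmin=80.0, fmax=1000.0):
    """Fundamental frequency of one frame from its strongest autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0, 1, 2, ...
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

def melody(x, sr=SR, frame=1024, hop=512):
    """Frame-by-frame f0 track (in Hz) of a monophonic signal."""
    return [estimate_f0(x[s:s + frame], sr) for s in range(0, len(x) - frame + 1, hop)]

def hz_to_note(f):
    """Nearest equal-tempered note name, with A4 = 440 Hz and C4 = MIDI 60."""
    midi = int(round(69 + 12 * np.log2(f / 440.0)))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

def sine_melody(midi_notes, note_seconds=0.25, sr=SR):
    """Hypothetical test input: each MIDI note played as a pure sine tone."""
    t = np.arange(int(note_seconds * sr)) / sr
    return np.concatenate([np.sin(2 * np.pi * 440.0 * 2 ** ((n - 69) / 12) * t) for n in midi_notes])

notes = [hz_to_note(f) for f in melody(sine_melody([60, 62, 64]))]
print(notes)
```

Frames that straddle two notes can give in-between answers. Real melody extractors add voicing detection and temporal smoothing to handle this.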


All the music shown here was generated automatically by a neural network trained on 8-bit music samples. Perhaps before long, even 8-bit games will be generated automatically.
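LakhNES itself uses a neural network, and reproducing it is far beyond a short example. To show the basic idea of "learn from examples, then sample new music", the sketch below swaps in a much simpler technique: a first-order Markov chain over pitches. It renders the result as a square wave, a rough nod to the NES pulse channels. The toy training tunes, note length and sample rate are assumptions.

```python
import numpy as np
from collections import defaultdict

def train_markov(sequences):
    """Count pitch-to-pitch transitions and turn them into probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a, nexts in counts.items():
        total = sum(nexts.values())
        model[a] = (list(nexts.keys()), np.array(list(nexts.values()), dtype=float) / total)
    return model

def generate(model, start, length, seed=0):
    """Random walk through the learned transitions, starting from `start`."""
    rng = np.random.default_rng(seed)
    out = [start]
    while len(out) < length and out[-1] in model:
        choices, probs = model[out[-1]]
        out.append(choices[rng.choice(len(choices), p=probs)])
    return out

def render_square(midi_notes, note_seconds=0.125, sr=16000):
    """Square-wave rendering, one fixed-length note per pitch (a rough 8-bit timbre)."""
    t = np.arange(int(note_seconds * sr)) / sr
    waves = [np.sign(np.sin(2 * np.pi * 440.0 * 2 ** ((n - 69) / 12) * t)) for n in midi_notes]
    return 0.2 * np.concatenate(waves)

# Hypothetical toy "training set": two short tunes as MIDI note numbers.
tunes = [[60, 62, 64, 65, 67, 65, 64, 62, 60], [67, 65, 64, 62, 60, 62, 64, 67]]
model = train_markov(tunes)
generated = generate(model, start=60, length=16)
audio = render_square(generated)
print(generated)
```

A Markov chain only remembers the previous note, so its output wanders without long-range structure. Capturing that structure is exactly what larger neural models try to do.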

I wonder how many of you have heard of "GEB" (Gödel, Escher, Bach), a wonderful book that brings together computing, logic, visual art and music. When it was written, the music, artificial intelligence and art it discussed seemed unrelated. Today they are fusing in ways that perhaps even its author did not imagine.


Thank you for reading! This is my first contribution to Matters, and I have reposted a short popular-science article I wrote earlier. I look forward to more exchanges with friends on Matters!

CC BY-NC-ND 2.0
