How to Use Python to Detect Music Onsets
As a full-stack developer and music technology enthusiast, I've always been fascinated by the intersection of code and music. One particularly interesting challenge is automatically detecting the onsets, or beginnings, of musical notes and percussive hits in an audio file.
Once you can detect onsets, a wide range of applications becomes possible, from music transcription and audio alignment to adaptive audio effects and generative music systems. In this in-depth tutorial, I'll show you how to use two powerful Python libraries, Librosa and Aubio, to detect onsets in your own music files.
But first, let's make sure we understand exactly what onsets are and why they're worth detecting. In musical terminology, an onset is the exact moment a new note or sound begins. This could be the start of a sung syllable, a piano key press, a drum hit, or any other musical event.
Essentially, onsets give us a way to map out the temporal structure of a piece of music. Once you know where the onsets are, you know the timing of the musical events, which is key information needed for things like automatically transcribing the rhythm of a song or syncing up a beat counter to music.
While our human ears are remarkably good at perceiving onsets, getting a computer to detect them accurately in an audio file is a complex undertaking. Under the hood, onset detection algorithms have to do some sophisticated digital signal processing, such as computing spectral flux, phase deviation, or complex domain detection functions.
Fortunately, we don't have to implement those mathematically intense algorithms ourselves, because the Python libraries Librosa and Aubio provide convenient high-level interfaces for music onset detection. Let's take a closer look at each library and see how to use them.
Detecting Onsets with Librosa
Librosa is a feature-rich Python package for music and audio analysis. It provides an extensive range of tools for working with audio data, including loading audio files, calculating various spectral features, beat tracking, and much more.
To get started with Librosa, first make sure you have it installed. You can install Librosa using pip:
pip install librosa
With Librosa installed, let's see how to use it to detect onsets. We'll start by loading an audio file:
import librosa
# Load the audio file
y, sr = librosa.load('example.mp3')
Here we use librosa.load() to read in an audio file. It returns two values: y, the audio time series data as a numpy array, and sr, the sampling rate of the audio.
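These two values are all you need to reason about timing. As a quick sanity check, the duration of the loaded audio is just the number of samples divided by the sampling rate. A minimal sketch, using a synthesized sine tone in place of example.mp3 so it runs without any audio file:

```python
import numpy as np

# Synthesize one second of a 440 Hz sine tone at 22,050 Hz, which is
# the rate librosa.load() resamples to by default
sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

# Duration in seconds = number of samples / sampling rate
duration = len(y) / sr
print(duration)  # 1.0
```

Note that librosa.load() resamples audio to 22,050 Hz by default; pass sr=None if you want to keep the file's native sampling rate.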
Next, to detect the onsets, we pass the audio data and sample rate to librosa.onset.onset_detect():
# Detect onsets
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
The onset_detect() function returns an array of frame indices corresponding to the detected onsets. Under the hood, it computes a spectral-flux style onset strength envelope and then picks peaks in it. If you want different detection behavior, you can compute your own onset strength envelope and pass it in via the onset_envelope parameter, as we'll see shortly.
We can convert the array of onset frames to a list of onset times in seconds like this:
# Convert frames to time
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
print(onset_times)
This gives us a numpy array of onset timestamps in seconds, which might look something like:
[0.046 0.48322 0.97161 1.39605 1.90366 2.32651 2.78823 3.20807]
And there we have it – a list of all the detected onset times in the audio file!
We can also plot the onsets visually using Librosa's display module. The following code will plot the audio waveform and overlay vertical lines indicating the detected onsets:
import matplotlib.pyplot as plt
import librosa.display
# Plot the onsets
fig, ax = plt.subplots(figsize=(12, 4))
librosa.display.waveshow(y, sr=sr, ax=ax)
ax.vlines(onset_times, ymin=-1, ymax=1, color='r', linestyle='--', label='Onsets')
ax.legend()
plt.show()
This produces a plot like the following, clearly showing that the detected onsets align with the start of major waveform peaks, which usually correspond to the beginning of musical events:
One of the great things about Librosa is that it provides a lot of control over the onset detection process. We can fine-tune things like the onset detection algorithm, the frame size, the hop length between frames, and the amplitude threshold for peak picking.
For example, to base the detection on a custom onset strength envelope, we can compute one with librosa.onset.onset_strength() and pass it in via the onset_envelope parameter:
# Compute the onset strength envelope ourselves, then pick peaks in it
o_env = librosa.onset.onset_strength(y=y, sr=sr)
onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)
And we can adjust the FFT window size and the hop length like this (n_fft is forwarded through to the underlying spectrogram computation):
onset_frames = librosa.onset.onset_detect(y=y, sr=sr,
                                          n_fft=2048,
                                          hop_length=512)
Generally speaking, a larger FFT window gives more frequency resolution but coarser temporal resolution, while a smaller hop length gives finer-grained onset timestamps at a higher computational cost. Experiment with different parameters to see what gives you the best results for your particular audio.
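To make the trade-off concrete: each detected frame index maps to a timestamp that is a multiple of hop_length / sr seconds, so the hop length sets the best-case timing resolution of your onsets. A quick sketch of the arithmetic:

```python
sr = 22050  # librosa's default sampling rate

for hop_length in (256, 512, 1024):
    # Each analysis frame advances by hop_length samples, so onset
    # timestamps are quantized to multiples of hop_length / sr seconds
    resolution_ms = 1000 * hop_length / sr
    print(f"hop_length={hop_length}: ~{resolution_ms:.1f} ms resolution")
```

So at the default hop of 512 samples, onset times are accurate to roughly 23 milliseconds at best, which is usually fine for musical purposes but worth knowing if you need sample-accurate alignment.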
Detecting Onsets with Aubio
While Librosa tends to be more well-known, Aubio is another excellent library for onset detection and other audio/music analysis tasks. Aubio is a highly optimized C library with Python bindings, which makes it very fast.
Installing Aubio is just as easy as Librosa:
pip install aubio
Aubio's API works a little differently from Librosa's: rather than analyzing a whole array at once, it streams through the audio in small hops. To detect onsets with Aubio, we first create a source object to read the file and an onset object to do the detection:
from aubio import source, onset

# Open the audio file for streaming
filename = 'example.mp3'
win_size = 1024
hop_size = win_size // 2
samplerate = 0  # 0 tells aubio to use the file's native sampling rate
src = source(filename, samplerate, hop_size)
samplerate = src.samplerate

# Create the onset detector
onset_mode = 'hfc'
onsets = onset(onset_mode, win_size, hop_size, samplerate)
Here we're using the high-frequency content ('hfc') onset detection mode, but Aubio supports other modes like 'specflux', 'phase', and 'specdiff'. The window and hop sizes control the granularity of the analysis.
With the source and onset objects created, we can step through the audio file and collect the onset timestamps:
timestamps = []
total_frames = 0
while True:
    # Read the next hop_size samples from the file
    samples, read = src()
    # Check whether an onset occurs in this chunk
    if onsets(samples):
        # get_last_s() returns the time of the most recent onset in seconds
        timestamps.append(onsets.get_last_s())
    total_frames += read
    if read < hop_size:
        # A short read means we've reached the end of the file
        break
print(timestamps)
This code steps through the audio in hop-sized chunks, runs the onset detector on each one, and checks whether an onset was detected in that chunk. If so, it gets the timestamp of the onset in seconds and appends it to the timestamps list.
Like with Librosa, we can also visualize the detected onsets using a plot. One thing to watch out for: after the loop, the samples variable only holds the last hop-sized chunk, so we re-read the whole file to get the full waveform:
import numpy as np
import matplotlib.pyplot as plt

# Re-read the full audio signal for plotting
src = source('example.mp3', samplerate, hop_size)
chunks = []
while True:
    samples, read = src()
    chunks.append(samples[:read])
    if read < hop_size:
        break
audio = np.concatenate(chunks)

# Plot the waveform with the detected onsets overlaid
fig = plt.figure(figsize=(12, 4))
plt.plot(np.arange(len(audio)) / float(samplerate), audio, label='Audio')
plt.vlines(timestamps, ymin=-1, ymax=1, color='r', linestyle='--', label='Onsets')
plt.legend()
plt.show()
This gives a plot very similar to the Librosa one, with the audio waveform and onset markers:
So as you can see, both Librosa and Aubio provide quite straightforward APIs for detecting onsets in music audio files. They also have a lot of similarities under the hood, using many of the same standard spectral processing and peak picking techniques.
That said, there are some differences between the two libraries to be aware of. Aubio prides itself on being very computationally efficient, so it may be a better choice if speed is a priority. Librosa offers a wider selection of onset detection functions and more customization options. It also has a larger overall feature set beyond just onset detection.
In terms of supported audio file formats, Librosa is compatible with a wider range, including MP3, OGG, WAV, FLAC, etc. With Aubio, you may run into issues with certain formats like MP3, where reported durations can be inaccurate. So audio format support is something to consider.
Whichever library you choose, I'd encourage you to experiment with the different onset detection modes and parameters to find what works best for your use case. Onset detection accuracy can be quite sensitive to things like the audio mix, percussiveness of the sounds, presence of background noise, and so on. It's normal to need to do some tuning to get the best results.
Going Further with Music Onset Detection
Hopefully this tutorial has given you a practical foundation for getting started with music onset detection in Python. Detecting onsets is really just the tip of the iceberg when it comes to music information retrieval and computational analysis of audio.
With the onset timestamps, you can go on to do beat tracking, tempo estimation, audio-to-score alignment, and much more. Some fun ideas for projects could be an automatic drumbeat transcription tool, a real-time visual beat detector, or an app that syncs haptic feedback to onsets in music.
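For instance, a very rough tempo estimate can be derived from nothing more than the inter-onset intervals. Here is a sketch using a hypothetical list of onset timestamps; real tempo estimators (such as librosa's beat tracker) are far more robust than this:

```python
import numpy as np

# Hypothetical onset timestamps in seconds, e.g. collected by either
# of the detection pipelines above
timestamps = [0.046, 0.483, 0.972, 1.396, 1.904, 2.327]

# Inter-onset intervals: the spacing between consecutive events
iois = np.diff(timestamps)

# If onsets fall roughly on the beat, 60 / median interval gives BPM;
# the median is used so one missed or spurious onset doesn't skew it
tempo_bpm = 60.0 / np.median(iois)
print(f"estimated tempo: {tempo_bpm:.0f} BPM")
```

This naive approach breaks down as soon as onsets land on subdivisions or off-beats, which is exactly why dedicated beat-tracking algorithms exist.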
You could also experiment with using onsets to drive generative music systems, such as dynamically triggering MIDI or audio samples in a DAW based on the detected onset positions in an existing piece of music.
As you dive deeper into this field, you might want to explore some of the more advanced algorithms and approaches, such as adaptive whitening, neural networks, and supervised machine learning techniques. With modern deep learning models achieving human-level performance on many MIR tasks, it's an exciting time for this area of research.
If you're interested in learning more, I'd recommend checking out the documentation and tutorials for Librosa and Aubio, as well as academic papers and open-source projects shared on sites like ISMIR (International Society for Music Information Retrieval) and DCASE (Detection and Classification of Acoustic Scenes and Events). There are also some great books on music information retrieval and audio signal processing that go into much more mathematical depth on topics like spectral analysis and feature extraction.
At the end of the day, as with any engineering or creative pursuit, the key is just to experiment, have fun, and keep learning. Whether you're a musician, researcher, or developer, I think you'll find that music onset detection is a fascinating and rewarding application of digital signal processing. So dive in, and happy onset hunting!