First Look at projectM
Roar. Bears, as I look at how I can draw my singing videos, I think it would be useful to look at audio visualisation software, or music players that have visualisations built in. Then I found one example project called projectM[1]. It seems to reimplement Milkdrop, a famous visualisation plugin for Winamp that I know of. Its source code should be helpful in learning the tricks used in it.
The two questions to answer are:
- What audio features are extracted for drawing?
- How are those features drawn?
The GitHub repo linked above only contains development libraries and a simple SDL2 programme for testing visuals, but the most important bits - the code that extracts features from audio and draws them - are all in there.
Features from Audio
Features used for visualisation include:

- Waveform (left/right channel)
- Spectrum (left/right channel)
- Bass, mid and treb (volume of the first, second and third sixths of the spectrum)
- Volume (average of bass, mid and treb)
- Attenuated (averaged over a longer interval) values of bass, mid, treb and volume
These are stored in a FrameAudioData object. projectM does not handle audio input itself; samples must be converted to floating point (float) or integer (uint8 or int16) values before being passed to PCM::Add. When new samples come in, the waveform and spectrum in the PCM object are updated. FrameAudioData instances, however, are created only when PCM::GetFrameAudioData is called, and this is also when the bass, mid, treb and volume values (both plain and attenuated) are calculated.
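As a concrete illustration, here is a minimal self-contained sketch of how such per-frame features could be computed. This is not projectM's actual code: the struct and function names are mine, and the smoothing factor for the attenuated values is a made-up assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-frame feature container, mirroring the values listed above.
struct FrameFeatures {
    float bass = 0, mid = 0, treb = 0, vol = 0;             // instantaneous
    float bassAtt = 0, midAtt = 0, trebAtt = 0, volAtt = 0; // attenuated
};

// Average of spectrum[begin, end).
static float Band(const std::vector<float>& spectrum, std::size_t begin, std::size_t end)
{
    float sum = 0;
    for (std::size_t i = begin; i < end; ++i)
        sum += spectrum[i];
    return end > begin ? sum / static_cast<float>(end - begin) : 0.0f;
}

// Bass, mid and treb are averages of the first, second and third sixths of
// the spectrum; vol is their mean. The attenuated values follow as
// exponential moving averages (the factor 0.05f is an assumption).
void UpdateFeatures(FrameFeatures& f, const std::vector<float>& spectrum)
{
    const std::size_t sixth = spectrum.size() / 6;
    f.bass = Band(spectrum, 0 * sixth, 1 * sixth);
    f.mid  = Band(spectrum, 1 * sixth, 2 * sixth);
    f.treb = Band(spectrum, 2 * sixth, 3 * sixth);
    f.vol  = (f.bass + f.mid + f.treb) / 3.0f;

    const float a = 0.05f; // higher = reacts faster, lower = smoother
    f.bassAtt += a * (f.bass - f.bassAtt);
    f.midAtt  += a * (f.mid  - f.midAtt);
    f.trebAtt += a * (f.treb - f.trebAtt);
    f.volAtt  += a * (f.vol  - f.volAtt);
}
```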
These features are typical of what I use in visualisation myself. But projectM makes its own adjustments to each of them to keep the visualisation smooth.
When updating the frequency spectrum, projectM uses a rolling average of the current and the new frame, with a raised cosine window applied, as input to the FFT (Fast Fourier Transform). Raised cosine windows are commonly seen in wavelet analysis as they have a finite time variance and a finite frequency variance[2]. After this, it optionally equalises the frequencies on a log scale, emphasising lower frequencies.
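As a rough sketch of that input preparation, assuming a 50/50 blend of the previous and new frames and a standard Hann window (which may not match projectM's exact constants):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Blend the previous frame's samples with the new ones, then apply a raised
// cosine (Hann) window before handing the buffer to the FFT. Both inputs are
// assumed to have the same length.
std::vector<float> PrepareFftInput(const std::vector<float>& previous,
                                   const std::vector<float>& current)
{
    const std::size_t n = current.size();
    std::vector<float> out(n);
    constexpr float pi = 3.14159265358979f;
    for (std::size_t i = 0; i < n; ++i) {
        const float blended = 0.5f * (previous[i] + current[i]);
        // Hann window: 0.5 * (1 - cos(2*pi*i / (n - 1)))
        const float window = 0.5f * (1.0f - std::cos(
            2.0f * pi * static_cast<float>(i) / static_cast<float>(n - 1)));
        out[i] = blended * window;
    }
    return out; // this buffer would then be passed to the FFT
}
```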
When updating the time waveform, there is an additional waveform alignment step. Consider the case where the time window shows 2 seconds of waveform, but a repeated syllable is being sung every 1.9 seconds. Without the alignment step, the bump in the waveform would appear to drift to the left, but ideally it should stay at the same place in the window. The alignment is done by computing a sliding mean absolute error at every octave of a downsampled pyramid (the "mips") of a weighted waveform, and choosing the shift that minimises the error.
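A single-resolution sketch of that alignment search follows; projectM additionally runs it over the pyramid of downsampled octaves and weights the samples, which this sketch omits:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Try every shift of the new waveform against the previous one and keep the
// shift with the smallest mean absolute error, so a repeating bump stays put
// from frame to frame. Assumes both buffers have the same length and that
// maxShift is smaller than that length.
std::size_t BestShift(const std::vector<float>& previous,
                      const std::vector<float>& current,
                      std::size_t maxShift)
{
    std::size_t best = 0;
    float bestError = std::numeric_limits<float>::max();
    const std::size_t n = previous.size() - maxShift; // overlapping region
    for (std::size_t shift = 0; shift < maxShift; ++shift) {
        float error = 0;
        for (std::size_t i = 0; i < n; ++i)
            error += std::fabs(previous[i] - current[i + shift]);
        error /= static_cast<float>(n);
        if (error < bestError) {
            bestError = error;
            best = shift;
        }
    }
    return best; // render the new waveform starting from this offset
}
```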
The beat detection values are calculated in Loudness objects by mixing the current value with two average values and taking the ratio of the current value over those averages, each taken over a different range in time. I have yet to check whether this is a well-known algorithm and whether there's a paper for it.
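Here is a small sketch of that ratio idea using two exponential moving averages over different time ranges; the smoothing constants and the threshold are assumptions, not values taken from projectM's Loudness class:

```cpp
// Keep two moving averages of a loudness value, one fast and one slow, and
// compare the current value against them; a ratio well above 1 suggests a
// sudden rise in loudness, i.e. a beat.
struct BeatDetector {
    float shortAvg = 0.0f;
    float longAvg = 0.0f;

    bool Update(float loudness)
    {
        shortAvg += 0.2f * (loudness - shortAvg); // short time range
        longAvg += 0.02f * (loudness - longAvg);  // long time range
        // Ratio of the current value over the two averages.
        const float ratio = loudness / (0.5f * (shortAvg + longAvg) + 1e-6f);
        return ratio > 1.5f; // threshold is a made-up assumption
    }
};
```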
Drawing the Features
The centre of a projectM visualisation is a preset file. The repo includes some simple ones in the presets/tests folder, but you never know how creative this file can get until you see the idle preset in the IdlePreset class[3]!
ProjectM parses the preset file and creates a MilkdropPreset object, which initialises, amongst other things, frame buffers, static shaders (for warping and composite effects), meshes for custom waveforms and shapes, and most importantly, a PresetState object.
A PresetState object holds preset variables, audio data, initialisation, per-frame and per-pixel shader code (per-point for custom waveforms) and shader code for other things. If you know OpenGL, you might think of per-frame updates as uniform updates, per-pixel shaders as fragment shaders and per-point shaders as vertex shaders. In projectM, almost anything custom can have a per-frame context.
So where are the audio data used? From what I found, at these places at least:

- Shader uniforms. This is the most intuitive guess, but the least obvious to find. The bass, mid, treb and volume values are set as a vec4 uniform called _c3, and the attenuated versions as another called _c4, then expanded to variables with proper names. The idle preset, for example, uses these values in its per-frame code. (A sketch of this follows after this list.)
- Per-frame, per-pixel and per-point contexts, where the above shader uniforms are used. Preset authors are free to do anything with the bass, mid, treb and volume data.
- Main waveform (i.e., not the custom waveforms). In particular, the WaveformMath class's virtual method GenerateVertices has access to the entire preset state, which includes the time waveform, the spectrum and the eight beat detection values. The main waveform is selected with the variable nWaveMode, and either the spectrum or the time waveform can be used. Based on the selection, either the time waveform data or the spectrum data are used to create the vertex data.
- Custom waveforms. These use audio data for vertices similarly to the main waveform, but they haven't got a class like WaveformMath; their use of audio data happens in the per-point context.
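As referenced in the first item above, here is a sketch of how those two uniforms could be uploaded in plain OpenGL. The uniform names _c3 and _c4 come from the source; everything else (the function, the loader header, the parameter layout) is my own illustration, not projectM's actual shader plumbing:

```cpp
#include <GL/glew.h> // or whichever OpenGL loader the build uses

// Upload the instantaneous audio values as _c3 and the attenuated ones as
// _c4 on an already-linked shader program.
void UploadAudioUniforms(GLuint program,
                         float bass, float mid, float treb, float vol,
                         float bassAtt, float midAtt, float trebAtt, float volAtt)
{
    glUseProgram(program);
    glUniform4f(glGetUniformLocation(program, "_c3"), bass, mid, treb, vol);
    glUniform4f(glGetUniformLocation(program, "_c4"), bassAtt, midAtt, trebAtt, volAtt);
}
```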
Conclusion
I was slightly disappointed after reading the source code, as I expected projectM to use much more advanced signal processing techniques than waveforms, spectra and beats. But at the same time, I deeply admire how much thought its developers put into the details of calculating these data to make the visualisation look smooth.
I was also surprised at how these simple features can create such flashy visuals in their demo playlist[4]. If the data source is not that different, then it must be the drawing process that makes all the difference. Just what kind of maths was in the shaders that created those? Studying the idle preset and the built-in waveforms could be a useful exercise.
projectM lacks one feature that I want in my videos: lyric (or pre-recorded timed sequence) display. This is another piece of data needed to make the videos I want. The way projectM separates feature computation from drawing code, and the way it manages double buffers to implement frame warping, will both be helpful for such development.
1. projectM-visualizer. (n.d.). projectM: Cross-platform music visualization library. Open-source and Milkdrop-compatible. GitHub. Retrieved May 3, 2026, from https://github.com/projectM-visualizer/projectm
2. NOC17 EE09. (2017). Week8-Lecture21.1 [Video]. YouTube. https://www.youtube.com/watch?v=0Jh0Xnm0L-8
3. projectM-visualizer. (n.d.). projectm/src/libprojectM/MilkdropPreset/IdlePreset.cpp at v4.1.6. GitHub. Retrieved May 3, 2026, from https://github.com/projectM-visualizer/projectm/blob/v4.1.6/src/libprojectM/MilkdropPreset/IdlePreset.cpp
4. cybermischa. (2025). ProjectM Gstreamer offline render test 9 [Video]. YouTube. https://www.youtube.com/watch?v=jJmLQGhYWys