Sonic visualiser spectrogram

7/26/2023

Patrick Flandrin is a physicist and signal-processing researcher whose name I first encountered as co-author (with François Auger) of a 1995 IEEE Transactions on Signal Processing paper called “Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method”. This crunchy publication (21 pages, dozens of equations and figures) took a pleasing idea - replacing the familiar grid-format time-frequency spectrogram with a field of precisely localised points calculated using both magnitude and phase of the frequency bins, rather than only magnitude as a traditional spectrogram does - and set out the mathematics of applying it to a number of different time-frequency and time-scale representations.

Illustration from Auger & Flandrin (1995)

I read this paper about 15 years ago and didn’t understand it. I have since realised this is partly because it isn’t all that clear with its notation, but there is also a big gap between the naive programmer’s view (that’s mine) of a spectrogram and the mathematical analysis used in the paper.

For a programmer, a spectrogram comes from taking short overlapping slices of a sampled signal, multiplying each by a smoothing window shape, applying a Fourier transform to each (the whole procedure being a short-time Fourier transform), and taking the magnitudes of the complex output bins to get one column of the spectrogram per slice of input. The short slices are because you want a fixed, smallish number of output bins, and you have various tradeoffs - time and frequency resolution and computational efficiency - to consider in that. The smoothing window is because your Fourier transform - a thing which matches up sinusoids of different frequencies against a signal to identify which ones would add up to it - operates on an infinite signal, consisting of the input you give it repeated forever in both directions: this will have a discontinuity each time it wraps around, and the smoothing window removes some of the frequency artifacts from these discontinuities. There is nothing particularly mathematical about the implementation of this, and any intuition used by the programmer is a mixture of the visual and techniques from the world of engineering. The language used in a publication like the DAFx book is typical in this world.

The Auger & Flandrin paper instead comes from a world that summarises a spectrogram as a two-dimensional Wigner-Ville distribution filtered with a smoothing window leading to a time-frequency representation of the Cohen’s class. Signals are finite-energy functions over infinite domains, and a spectrogram is a double integral over time and angular frequency. Both time-domain functions and time-frequency representations are continuous, and practical questions about overlap and window length don’t arise. I can dimly remember this world, because my undergraduate degree - who am I kidding, my only degree - started out as pure maths, but I haven’t inhabited it for any of my working life.

So I didn’t really understand the paper, and a programmer has plenty to do, and that is one reason why Sonic Visualiser’s “Peak-Frequency Spectrogram” layer calculates instantaneous frequencies from the phase difference between successive columns, something which I found much easier to understand. (It turns out there are other good reasons one could make this choice, but I didn’t know that.)

Returning to the paper recently, I learned that Flandrin had written a book on the subject, and I bought a copy hoping it might bridge the conceptual gap.
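
The programmer's-eye view of a spectrogram described in this post, and the phase-difference idea behind the “Peak-Frequency Spectrogram” layer, can be sketched in a few lines of Python with NumPy. This is only an illustration and not Sonic Visualiser's actual code: the function names, the Hann window, and the default window size and hop are all assumptions chosen for the example.

```python
import numpy as np

def stft_columns(signal, window_size=1024, hop=256):
    """Cut the signal into overlapping slices, window each, and FFT it.

    Returns complex bins with shape (n_slices, window_size // 2 + 1).
    window_size and hop are illustrative defaults, not values from the post.
    """
    window = np.hanning(window_size)  # smoothing window (Hann is an assumption)
    n_slices = 1 + (len(signal) - window_size) // hop
    slices = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_slices)
    ])
    return np.fft.rfft(slices, axis=1)

def spectrogram(columns):
    """Traditional spectrogram: magnitudes only, phase is discarded."""
    return np.abs(columns)

def phase_difference_frequencies(columns, sample_rate, window_size, hop):
    """Estimate a frequency per bin from the phase change between
    successive columns (a phase-vocoder style estimate).

    Returns shape (n_slices - 1, n_bins), in Hz.
    """
    n_bins = columns.shape[1]
    bin_centres = np.arange(n_bins) * sample_rate / window_size
    # Phase advance per hop for a sinusoid sitting exactly on each bin centre.
    expected = 2 * np.pi * hop * np.arange(n_bins) / window_size
    phase = np.angle(columns)
    deviation = phase[1:] - phase[:-1] - expected
    # Wrap the deviation into [-pi, pi) before converting it to Hz.
    deviation = (deviation + np.pi) % (2 * np.pi) - np.pi
    return bin_centres + deviation * sample_rate / (2 * np.pi * hop)
```

For a steady 440 Hz sine sampled at 8000 Hz, the magnitude spectrogram can only say that the energy is near a bin centre (the bins are about 7.8 Hz apart with a 1024-sample window), while the phase-difference estimate at the peak bin lands much closer to the true frequency. The estimate is only meaningful in bins that actually contain a stable sinusoidal component; elsewhere the phases are dominated by noise and leakage.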
