Sometimes I get curious and want to break a song down to its core melodies. FFmpeg has a showcqt filter (a constant-Q transform, similar to an FFT) that converts an audio stream into a visual based on the notes. I also realized you can do this in real time with decent latency using ALSA.

ffmpeg -f alsa -ac 2 -i pulse -filter_complex "[0:a]showcqt=s=1280x720:fps=30[out]" -map "[out]" -f sdl -
  • By default this will try to capture a microphone. Use 'pavucontrol' to change the captured input to the loopback (monitor) source for your speakers.
  • I tried to use '-f pulse' instead, but it ends up very stuttery and with much higher latency, whereas the ALSA version is much smoother.
  • You can change the resolution and fps as needed. I sometimes daisy-chain multiple effects.
  • Output as SDL has much lower latency in my testing. I used to pipe it into mpv with no buffer, but this is even faster.
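
The notes above mention changing the resolution and fps and chaining effects. A minimal sketch of both is below. The function names (cqt_filter, cqt_waves_filter, visualize) are hypothetical, and pairing showcqt with showwaves through asplit and vstack is only one illustrative way to chain effects, not necessarily the combination used here. The trailing format=yuv420p is an assumption, added to keep the stacked output in a pixel format the SDL output accepts.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; none of these names come from ffmpeg.

# Build a plain showcqt filtergraph: cqt_filter WIDTH HEIGHT FPS
cqt_filter() {
    printf '[0:a]showcqt=s=%sx%s:fps=%s[out]' "$1" "$2" "$3"
}

# Build a chained graph: cqt_waves_filter WIDTH HEIGHT FPS WAVES_HEIGHT
# The audio is split in two, one copy feeds showcqt and the other showwaves,
# then the two videos are stacked vertically (both must share the same width).
cqt_waves_filter() {
    printf '[0:a]asplit=2[a][b];' 
    printf '[a]showcqt=s=%sx%s:fps=%s[cqt];' "$1" "$2" "$3"
    printf '[b]showwaves=s=%sx%s:mode=line:rate=%s[waves];' "$1" "$4" "$3"
    printf '[cqt][waves]vstack,format=yuv420p[out]'
}

# Start the live visualizer with the same ALSA input and SDL output as above.
# Usage: visualize "$(cqt_filter 1920 1080 60)"
visualize() {
    ffmpeg -f alsa -ac 2 -i pulse -filter_complex "$1" -map "[out]" -f sdl -
}

# Print the graphs without starting ffmpeg, to check them first.
cqt_filter 1920 1080 60; echo
cqt_waves_filter 1280 720 30 240; echo
```

To actually run one, call for example: visualize "$(cqt_waves_filter 1280 720 30 240)". The showcqt size must be even in both dimensions.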

An older revision of this command, which I keep for reference:

ffmpeg -f pulse -ac 2 -i default -filter_complex "[0:a]showcqt=s=1280x720:fps=30[out]" -map "[out]" -r 30 -c:v rawvideo -f matroska - | mpv --no-cache --untimed --no-demuxer-thread -

This stutters a bit more, but might be preferred in some cases. It also uses PulseAudio directly. I switched to rawvideo instead of other codecs because it has much lower latency: there is no compression or decompression step. The larger stream size doesn't matter, since we're only piping it into mpv for playback anyway.
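
To get a feel for how much data the rawvideo pipe carries, here is a rough back-of-the-envelope sketch. It assumes the frames are yuv420p (1.5 bytes per pixel), which may not match the pixel format ffmpeg actually negotiates, and raw_rate is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate bytes per second of uncompressed video: raw_rate WIDTH HEIGHT FPS
# Assumption: yuv420p, i.e. 12 bits (3/2 bytes) per pixel.
raw_rate() {
    # width * height * 3/2 bytes per frame, times frames per second
    echo $(( $1 * $2 * 3 / 2 * $3 ))
}

raw_rate 1280 720 30   # prints 41472000, roughly 41 MB/s
```

That is far too much to store or stream comfortably, but trivial for a local pipe between two processes.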

  • scripts/ffmpeg/visualize_music_into_notes.txt
  • Last modified: 2022-04-13 15:12
  • by Tony