How to Read an FFT Spectrum: A Practical Field Guide

By WuTools editorial team · Updated 2026-05-08

Look at the diagram below: that staircase of spikes is an FFT (Fast Fourier Transform) plot of a single piano note (A4, 440 Hz). The tallest spike on the left is the fundamental — the pitch you actually hear. The smaller, evenly-spaced spikes to its right are harmonics at 2×, 3×, 4× the fundamental frequency — they give the note its timbre, the reason a piano sounds different from a flute playing the same note. Read an FFT this way and the rest of the page makes sense: every bump, peak, skirt, and slope tells you something concrete about the original waveform. This guide walks through what the FFT actually computes, how to read frequency bins, the difference between magnitude and power, what real-world signals (sine, voice, noise) look like, and which knobs in our Spectrum Analyzer control which features.

What an FFT actually shows

FFT of a piano A4 note. Fundamental at 440 Hz (the perceived pitch); harmonics at 880, 1320, 1760 Hz (the timbre).

The FFT decomposes a finite chunk of a signal into a sum of sinusoids. The horizontal axis is frequency (Hz). The vertical axis is the amplitude (or power) of each frequency component present in that chunk. In the diagram above the chunk contains a piano note: the FFT recovers the 440 Hz fundamental plus a stack of harmonics at integer multiples (880, 1320, 1760 Hz). If instead the chunk contained a pure 440 Hz sine wave, only the leftmost spike would appear; if it contained white noise, the plot would be a ragged but roughly flat line from 0 Hz all the way up to the Nyquist frequency (half the sample rate).
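The decomposition described above can be tried in a few lines. This is a minimal sketch using NumPy, not the code behind our Spectrum Analyzer; the 48 kHz sample rate and the harmonic amplitudes are assumptions chosen so that every partial lands exactly on a bin.

```python
import numpy as np

fs = 48_000          # sample rate in Hz (assumed)
n = 48_000           # one second of samples, so bins are exactly 1 Hz apart
t = np.arange(n) / fs

# Hypothetical stand-in for a piano A4: fundamental plus three weaker harmonics.
partials = {440: 1.0, 880: 0.5, 1320: 0.25, 1760: 0.125}
note = sum(amp * np.sin(2 * np.pi * f * t) for f, amp in partials.items())

spectrum = np.abs(np.fft.rfft(note)) / (n / 2)   # scale so a unit sine reads 1.0
freqs = np.fft.rfftfreq(n, d=1 / fs)             # frequency of each bin in Hz

# The four largest bins should sit at the four partial frequencies.
top = np.argsort(spectrum)[-4:]
peaks = sorted((float(freqs[i]), round(float(spectrum[i]), 3)) for i in top)
print(peaks)
```

The printed list pairs each peak frequency with its recovered amplitude. A real piano recording would show broader peaks and a noise floor, because its partials rarely sit exactly on bin centres.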

The FFT does not tell you when each frequency happened — it averages the entire chunk. To see how frequencies evolve over time you need a spectrogram (a stack of FFTs), which our Waveform Viewer overlays on top of the time-domain signal.

Frequency bins and resolution

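For real-valued input, an N-point FFT produces N/2 + 1 distinct frequency bins from 0 Hz up to and including the Nyquist frequency (half the sample rate). Adjacent bins are fs/N hertz apart, so a 4096-point FFT at 48 kHz gives roughly 11.7 Hz per bin. Two pure tones closer together than about one bin width merge into a single peak, and once a tapered window is applied you often need closer to two bin widths of separation to see two distinct peaks. To resolve closer tones, increase N (a longer FFT) or, if the tones sit well below the new Nyquist frequency, lower the sample rate while keeping N the same.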
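The bin arithmetic is easy to check by hand or in code. The sketch below uses two hypothetical helper names (`bin_width_hz`, `nearest_bin`) and an assumed pair of tones at 440 Hz and 445 Hz to show when two tones share a bin.

```python
def bin_width_hz(sample_rate, n_fft):
    """Spacing between adjacent FFT bins, in hertz."""
    return sample_rate / n_fft

def nearest_bin(freq_hz, sample_rate, n_fft):
    """Index of the bin whose centre is closest to freq_hz."""
    return round(freq_hz / bin_width_hz(sample_rate, n_fft))

fs = 48_000
for n_fft in (4096, 32768):
    width = bin_width_hz(fs, n_fft)
    # Columns: FFT length, bin width in Hz, bin of the 440 Hz tone, bin of the 445 Hz tone.
    print(n_fft, round(width, 2), nearest_bin(440, fs, n_fft), nearest_bin(445, fs, n_fft))
```

At 4096 points both tones map to the same bin; at 32768 points they land several bins apart and show up as separate peaks.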

There is a tradeoff: a longer FFT averages over more time, so transient events get smeared. Voice analysis typically uses 1024–4096 points (about 21–85 ms at 48 kHz). Music analysis uses 8192 points or more for fine pitch resolution. Vibration analysis on rotating machinery sometimes uses 65536 points to separate close shaft harmonics.
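To put numbers on the tradeoff, the sketch below tabulates frame duration against bin width for the FFT lengths mentioned above, assuming a 48 kHz sample rate. The helper name `frame_ms` is illustrative.

```python
def frame_ms(n_fft, sample_rate):
    """Duration of one FFT frame in milliseconds."""
    return 1000 * n_fft / sample_rate

fs = 48_000
rows = [(n, frame_ms(n, fs), fs / n) for n in (1024, 4096, 8192, 65536)]
for n, ms, hz in rows:
    print(f"{n:>6} points  {ms:8.1f} ms  {hz:6.2f} Hz/bin")
```

Frame duration in seconds multiplied by bin width in hertz always comes out to 1, which is the tradeoff in one line: halving the bin width doubles the time each frame has to cover.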

Magnitude vs power vs dB

Three vertical-axis conventions are common. Magnitude is the linear amplitude of each component; tall peaks dominate, small features vanish. Power is magnitude squared, which exaggerates the tall peaks even more. Decibels (20·log10(magnitude) or 10·log10(power)) compress the dynamic range so a 1000:1 amplitude ratio becomes a 60 dB visual gap — small features become legible.
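The three conventions are related by simple formulas. A small sketch, with illustrative function names, showing that the amplitude and power forms of the dB formula agree:

```python
import math

def amplitude_to_db(ratio):
    """Decibels from a linear amplitude (magnitude) ratio."""
    return 20 * math.log10(ratio)

def power_to_db(ratio):
    """Decibels from a power ratio (magnitude squared)."""
    return 10 * math.log10(ratio)

amp_ratio = 1000                      # the 1000:1 amplitude ratio from the text
print(amplitude_to_db(amp_ratio))     # dB from the amplitude ratio
print(power_to_db(amp_ratio ** 2))    # same gap, computed from the power ratio
```

Both lines give the same 60 dB gap, which is why it doesn't matter whether an analyzer computes dB from magnitude or from power as long as it uses the matching factor (20 or 10).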

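For audio work, default to a dB scale unless you're hunting for one specific tone. Loudness perception is roughly logarithmic, so a dB axis matches what you hear, and it keeps quiet components visible next to loud ones. Engineering specs such as noise floor and SNR are almost always reported in dB; THD is quoted in either dB or percent.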

Patterns to recognize

Four common FFT shapes. F0 = vocal fundamental; 60/120/180 = mains hum and its harmonics.

Pure sine wave: one tall, narrow spike at the tone frequency. If it has wide skirts at the base, that's a windowing artifact (next section).

Voice: a fundamental at the speaker's pitch (~100 Hz men, ~200 Hz women) plus a stack of harmonics at 2×, 3×, 4× the fundamental, decaying upward. The spacing between peaks is the fundamental — a quick way to estimate vocal pitch from a plot. Try this on a recording in our Key Detector.

White noise: a flat line bouncing within a few dB across the whole band.

Pink noise: a line falling at 3 dB/octave (10 dB/decade). Common test signal for room acoustics.

Hum / mains pickup: very narrow spike at 50 Hz (Europe, Asia) or 60 Hz (North America), often with smaller spikes at 100/120 Hz, 150/180 Hz, etc.

Clipped signal: a forest of harmonics extending well above the fundamental. Easy to confirm because the time-domain waveform shows visible flat tops.

Reverb / room modes: low-frequency peaks at the room's standing-wave frequencies, typically below 300 Hz. Usually 5-20 dB above the smooth baseline.
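The "spacing between peaks is the fundamental" trick from the voice pattern can be sketched in code. This is an illustration under stated assumptions, not how our Key Detector works: the signal is a synthetic stand-in for a voiced sound (ten harmonics of a hypothetical 120 Hz pitch), and the 5% peak threshold is an arbitrary choice that suits this clean signal.

```python
import numpy as np

fs, n = 16_000, 8192
t = np.arange(n) / fs
f0_true = 120.0   # hypothetical speaker pitch

# Crude voiced-sound stand-in: ten harmonics whose amplitude falls as 1/k.
voice = sum(np.sin(2 * np.pi * k * f0_true * t) / k for k in range(1, 11))

mag = np.abs(np.fft.rfft(voice * np.hanning(n)))
freqs = np.fft.rfftfreq(n, d=1 / fs)

# A bin counts as a peak if it beats both neighbours and reaches 5% of the tallest bin.
inner = mag[1:-1]
is_peak = (inner > mag[:-2]) & (inner >= mag[2:]) & (inner > 0.05 * mag.max())
peak_freqs = freqs[1:-1][is_peak]

# The median spacing between neighbouring peaks estimates the fundamental.
f0_estimate = float(np.median(np.diff(peak_freqs)))
print(len(peak_freqs), round(f0_estimate, 1))
```

The estimate is only as fine as the bin width (about 2 Hz here). Real speech needs a more robust peak picker, since noise and formants make some harmonics weak or missing.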

Windowing — why peaks have skirts

Same single tone, two windows. Rectangular leaks energy into neighbouring bins (wide skirts). Hann taper concentrates the energy back into one peak.

The FFT assumes the chunk it analyses repeats forever. For most real signals it doesn't, so the discontinuity at the chunk boundary creates spectral leakage: a single tone's energy spreads into neighbouring bins instead of staying in one. The fix is to multiply the chunk by a tapered window (Hann, Hamming, Blackman, Kaiser, Flat-top) before the FFT, which tapers the edges toward zero. The diagram above shows the same single tone analysed with no window (left, energy leaking sideways) versus a Hann window (right, energy concentrated in one peak).

Each window has a different tradeoff: Hann is the everyday default — narrow main lobe, decent side-lobe rejection. Blackman trades a wider main lobe for better side-lobe suppression (good for finding small tones near big ones). Flat-top has the widest main lobe but the best amplitude accuracy — used when you need to measure peak height precisely. Rectangular (no window) has the narrowest main lobe but the worst leakage — only useful when you've made the chunk an exact multiple of the tone period.
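Leakage is easy to measure for yourself. The sketch below, assuming a 1 kHz tone at 48 kHz with a 4096-point FFT (so the tone falls between two bins), compares how much energy shows up 20 bins away from the peak with no window and with NumPy's `np.hanning` window. The 20-bin offset is an arbitrary choice for illustration.

```python
import numpy as np

fs, n = 48_000, 4096
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)   # 1 kHz falls between bins 85 and 86

def leakage_db(window, offset=20):
    """Level `offset` bins above the peak, relative to the peak, in dB."""
    mag = np.abs(np.fft.rfft(tone * window))
    peak = int(np.argmax(mag))
    return float(20 * np.log10(mag[peak + offset] / mag[peak]))

rect_db = leakage_db(np.ones(n))        # rectangular, i.e. no window
hann_db = leakage_db(np.hanning(n))     # Hann taper
print(round(rect_db, 1), round(hann_db, 1))
```

Both numbers are negative; the Hann figure should be tens of dB lower than the rectangular one, which is the "skirts" difference seen in the diagram.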

Linear vs logarithmic frequency axis

A linear axis gives 0–10 kHz the same width as 10–20 kHz. That's a poor match for hearing-related analysis: the ear hears pitch logarithmically, so 100 Hz and 200 Hz are a full octave apart, while 10000 Hz and 10100 Hz differ by only about a sixth of a semitone. On a linear 0–20 kHz axis, everything below 1 kHz (where most musical fundamentals live) gets squashed into the leftmost 5% of the plot.
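The pitch arithmetic behind this can be checked with the standard cents formula (1200 cents per octave, 100 per semitone). The sketch also compares how much of the plot width the region below 1 kHz occupies on each axis type; the 20 Hz lower limit for the log axis is an assumption, since a log axis cannot start at 0 Hz.

```python
import math

def cents(f1, f2):
    """Musical interval between two frequencies, in cents."""
    return 1200 * math.log2(f2 / f1)

def linear_fraction(f, f_max):
    """Share of a 0..f_max linear axis that lies below f."""
    return f / f_max

def log_fraction(f, f_min, f_max):
    """Share of an f_min..f_max logarithmic axis that lies below f."""
    return math.log(f / f_min) / math.log(f_max / f_min)

print(cents(100, 200))                              # one octave
print(round(cents(10_000, 10_100), 1))              # well under one semitone
print(linear_fraction(1000, 20_000))                # share of a linear axis below 1 kHz
print(round(log_fraction(1000, 20, 20_000), 3))     # share of a log axis below 1 kHz
```

On the log axis the same sub-1 kHz region takes up more than half the width, which is why bass and midrange detail becomes readable.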

Use logarithmic frequency for any audio task: voice analysis, music, room acoustics, hearing aids. Use linear frequency for scientific instruments, vibration analysis on rotating machinery (where you care about specific shaft harmonics), and ultrasonic work where features are evenly spaced. Our Spectrum Analyzer defaults to logarithmic for that reason.

Reading the noise floor

The flat or gently sloping baseline below all the peaks is the noise floor, set by quantization in the ADC, electronic noise in the recording chain, and ambient acoustic noise in the room. The height of the peaks above it (in dB) indicates the dynamic range or SNR of your recording. For a 16-bit recording the theoretical total quantization noise is around −96 dBFS; for 24-bit, −144 dBFS. Real-world noise is often 20–40 dB worse than the 24-bit figure because of the analog chain. Note that an FFT plot spreads the noise across many bins, so the per-bin floor you see drops as the FFT gets longer, by about 3 dB per doubling of N.
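The bit-depth figures come from a one-line formula, sketched below. The second helper is an addition for context: it estimates how far below the total noise level the per-bin floor of an N-point FFT sits, assuming white noise spread evenly over N/2 bins and ignoring the small correction a window introduces. Both function names are illustrative.

```python
import math

def ideal_dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

def fft_processing_gain_db(n_fft):
    """Approximate gap between total white-noise level and the per-bin FFT floor."""
    return 10 * math.log10(n_fft / 2)

for bits in (16, 24):
    print(bits, round(ideal_dynamic_range_db(bits), 1))
print(round(fft_processing_gain_db(4096), 1))
```

This is why the floor on a long FFT of a 16-bit file can appear well below −96 dBFS: the total noise has not changed, it has only been divided among more bins.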

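If a tone you're trying to measure is within 10 dB of the floor, increase the FFT length, increase the gain (carefully, because clipping ruins the spectrum), or average multiple FFTs. Averaging the power spectra of N frames leaves the average floor level where it is but shrinks its random bin-to-bin scatter by roughly a factor of √N, so a steady tone that was hidden in the fuzz stands out without being smeared. To actually lower the floor by 10·log10(N) dB you need coherent (time-synchronous) averaging of frames that are phase-aligned to the tone.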
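To see what averaging does to the floor in practice, here is a sketch that averages the power spectra of 64 Hann-windowed frames of white noise containing a weak tone. Everything in it is an assumption for illustration: the seed, the tone level, the frame count, and the choice of bins 200 to 499 as a tone-free stretch of floor.

```python
import numpy as np

def averaged_power_db(signal, n_fft, n_frames):
    """Average the power spectra of n_frames non-overlapping Hann-windowed frames."""
    window = np.hanning(n_fft)
    power = np.zeros(n_fft // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * n_fft:(i + 1) * n_fft] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    return 10 * np.log10(power / n_frames)

rng = np.random.default_rng(0)
fs, n_fft, n_frames = 48_000, 1024, 64
t = np.arange(n_fft * n_frames) / fs
tone_hz = 100 * fs / n_fft                      # sits exactly on bin 100
signal = rng.standard_normal(t.size) + 0.1 * np.sin(2 * np.pi * tone_hz * t)

single = averaged_power_db(signal, n_fft, 1)
averaged = averaged_power_db(signal, n_fft, n_frames)

floor_bins = slice(200, 500)                    # a stretch of floor away from the tone
scatter_single = float(np.std(single[floor_bins]))
scatter_avg = float(np.std(averaged[floor_bins]))
tone_excess = float(averaged[100] - np.median(averaged[floor_bins]))
print(round(scatter_single, 2), round(scatter_avg, 2), round(tone_excess, 2))
```

The first two numbers are the bin-to-bin scatter of the floor in dB before and after averaging; the third is how far the tone bin pokes above the averaged floor. In a single frame the tone is only a few dB above the mean floor and is lost in the scatter; after averaging the scatter is small enough for it to stand clear.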

Related WuTools

Spectrum Analyzer — Live FFT with selectable window, length, and log/linear axis
Waveform Viewer — Time-domain plus spectrogram side-by-side
Tone Generator — Make a clean test tone to verify your analyzer settings
Key Detector — Pitch and musical-key detection from FFT chroma
Audio Equalizer — Reshape the spectrum after you understand it

Frequently asked questions

Why does my pure sine wave look like a triangle on the FFT?

Spectral leakage from rectangular (no) windowing. The tone's frequency probably doesn't sit exactly on a bin centre, so its energy spreads. Switch to a Hann or Blackman window and the triangle becomes a much narrower spike.

What's the difference between a spectrum and a spectrogram?

A spectrum is one FFT — a snapshot of frequency content over a slice of time. A spectrogram is many FFTs stacked side by side as time advances, usually shown as a heatmap with time on the X-axis, frequency on the Y, and amplitude as colour.

How do I increase frequency resolution without losing time resolution?

You can't — they're inversely related (Heisenberg-style uncertainty). A longer FFT means better frequency bins but worse time localization. The compromise is the Short-Time Fourier Transform with overlapping windows, which most spectrograms use.
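A bare-bones Short-Time Fourier Transform fits in a dozen lines. This sketch is not the implementation behind our tools; the function name `stft_magnitude`, the 1024-point frame, and the 256-sample hop (75% overlap) are assumptions, as is the test signal that jumps from 440 Hz to 880 Hz halfway through.

```python
import numpy as np

def stft_magnitude(signal, n_fft=1024, hop=256):
    """Magnitude STFT: one Hann-windowed FFT every `hop` samples."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # shape: (n_frames, n_fft // 2 + 1)

fs = 48_000
t = np.arange(fs) / fs
# Test signal (assumed): 440 Hz for the first half second, 880 Hz for the second.
signal = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))

spec = stft_magnitude(signal)
freqs = np.fft.rfftfreq(1024, d=1 / fs)
first = float(freqs[spec[0].argmax()])     # strongest bin in the first frame
last = float(freqs[spec[-1].argmax()])     # strongest bin in the last frame
print(spec.shape, first, last)
```

Each row of `spec` is one column of a spectrogram. The peak frequencies are only accurate to within one bin (about 47 Hz here), which is the price of the short frame; a longer `n_fft` sharpens the frequency reading and blurs the moment of the jump.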

Why is the FFT only useful up to half my sample rate?

Nyquist's theorem: a signal sampled at fs Hz can only represent frequencies up to fs/2. Anything above that gets aliased, that is, folded back into lower frequencies. For real-valued input the FFT bins above N/2 are mirror images (complex conjugates) of the bins below it, so they carry no extra information and are discarded.
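The folding can be worked out with a short function. A minimal sketch, assuming a real-valued tone and no anti-aliasing filter in front of the sampler; the name `alias_frequency` is illustrative.

```python
def alias_frequency(freq_hz, sample_rate):
    """Frequency at which a real tone at freq_hz appears after sampling."""
    folded = freq_hz % sample_rate
    return min(folded, sample_rate - folded)

fs = 48_000
for f in (1_000, 30_000, 50_000):
    print(f, alias_frequency(f, fs))
```

A 1 kHz tone is below fs/2 and passes through unchanged, while tones at 30 kHz and 50 kHz fold back to 18 kHz and 2 kHz and are indistinguishable from real tones at those frequencies once sampled.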

What does a spike at 60 Hz mean?

Mains-power hum picked up by the recording chain (or 50 Hz in Europe and most of Asia). Common with single-coil pickups, ground loops, cheap power supplies, or laptops on AC. Often accompanied by smaller harmonics at 120, 180, 240 Hz.

Why do FFT plots use dB instead of percent?

Audio dynamic range routinely covers 80–100 dB (10000:1 to 100000:1). On a percent scale anything below 1% is invisible, but those quiet frequencies are often what you care about (room reverb, distortion harmonics, noise floor).

Can FFT detect non-stationary signals like speech?

A single long FFT will smear a speech signal because it averages all the phonemes together. Use a spectrogram (a stream of short FFTs, typically 20–40 ms wide), or look at our Key Detector, which handles per-frame analysis.

What's a flat-top window for?

Measuring the exact amplitude of an isolated tone. Flat-top windows have wide main lobes (bad for resolving close tones) but very flat tops: the peak height stays within a small fraction of a dB of the true tone amplitude wherever the tone sits between bins. Test-bench amplitude verification typically uses flat-top.
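The amplitude-accuracy claim can be tested on the worst case, a tone sitting exactly halfway between two bins. The sketch below builds both windows from their cosine-sum coefficients. The five flat-top coefficients are a commonly published set and are an assumption here, so check them against your own library before relying on them; the helper names are illustrative.

```python
import numpy as np

def cosine_sum_window(coeffs, n):
    """Periodic cosine-sum window: a0 - a1*cos(x) + a2*cos(2x) - ..."""
    x = 2 * np.pi * np.arange(n) / n
    return sum((-1) ** k * a * np.cos(k * x) for k, a in enumerate(coeffs))

HANN = (0.5, 0.5)
# Commonly published five-term flat-top coefficients (assumed; verify against your library).
FLAT_TOP = (0.21557895, 0.41663158, 0.277263158, 0.083578947, 0.006947368)

fs, n = 48_000, 4096
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 100.5 * fs / n * t)   # amplitude 1.0, halfway between bins 100 and 101

def measured_amplitude(window):
    """Peak bin height, rescaled to undo the window's overall gain."""
    mag = np.abs(np.fft.rfft(tone * window))
    return float(2 * mag.max() / window.sum())

hann_amp = measured_amplitude(cosine_sum_window(HANN, n))
flat_amp = measured_amplitude(cosine_sum_window(FLAT_TOP, n))
print(round(hann_amp, 3), round(flat_amp, 3))
```

The true amplitude is 1.0. The Hann reading comes out noticeably low because the tone's energy is split between two bins, while the flat-top reading stays very close to 1.0.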