From Air to Bits: The Complete Path of Digital Audio
If you've ever built an audio pipeline without really understanding the underlying physics, or you need a holistic refresher on the subject, this one's for you. This article aims to develop an intuition for the entire audio lifecycle, following a single sound from the physical phenomenon all the way to network packets.

I find it best to learn with a simple example, and a picture is worth a thousand words, so here's our audio "hello world":
A source emits "hello world"; a sink receives the physical waves, converts them to meaningful electrical signals, and finally sends them across the wire. Simple, right?
Sound as Pressure Waves
What we call sound is really a pressure disturbance: an external force pushes the molecules of a medium together and apart, and that pattern of varying density travels through the medium at the speed of sound. In our case the medium is ambient air (≈ 101 kPa).
These pressure waves alternate between two kinds of regions as they propagate through the medium: compressions, where air molecules crowd together, and rarefactions, where they spread out.
Mathematically, this traversal through ambient air is expressed by the wave equation, shown below in one dimension for simplicity:
$$\frac{\partial^2 p}{\partial t^2} = c^2 \frac{\partial^2 p}{\partial x^2}$$
This second-order equation describes how the pressure $p$ behaves in space and time. The acceleration of the pressure in time $t$ at a point is proportional to how the wave curves in space $x$ around that point, with the speed of sound $c$ setting the constant of proportionality.
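To make the wave equation less abstract, here is a minimal sketch that checks numerically that a travelling sine wave satisfies it. The 440 Hz tone, the 343 m/s speed of sound, the sample point, and the function names are all illustrative assumptions, not values from the article.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s

def pressure(x, t, freq=440.0, amplitude=1.0):
    """Right-travelling sinusoidal pressure wave p(x, t) = A sin(kx - wt)."""
    omega = 2 * math.pi * freq  # angular frequency, rad/s
    k = omega / C               # wavenumber, rad/m
    return amplitude * math.sin(k * x - omega * t)

def second_derivative(f, u, h):
    """Central finite-difference estimate of f''(u)."""
    return (f(u + h) - 2 * f(u) + f(u - h)) / (h * h)

def wave_equation_sides(x, t, dx=1e-3, dt=1e-6):
    """Return both sides of d2p/dt2 = c^2 * d2p/dx2 at the point (x, t)."""
    d2p_dt2 = second_derivative(lambda tt: pressure(x, tt), t, dt)
    d2p_dx2 = second_derivative(lambda xx: pressure(xx, t), x, dx)
    return d2p_dt2, C ** 2 * d2p_dx2

lhs, rhs = wave_equation_sides(x=0.3, t=0.001)
print(f"time side: {lhs:.1f}, space side: {rhs:.1f}")
```

The two printed numbers should agree closely, with the small difference coming from the finite-difference approximation.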
The Speed of Sound
It's worth defining the speed of sound on Earth, where the medium is air:
$$c = \sqrt{\frac{\gamma R T}{M}}$$
Here $\gamma$ is the heat capacity ratio of the gas (~1.4 for air), $R$ is the universal gas constant, $T$ is temperature in Kelvin, and $M$ is the molar mass of air. The key takeaway: sound travels faster in hotter, lighter gases.
Why? A hotter gas has faster-moving molecules, so a disturbance is handed from one molecule to the next more quickly. Lighter gases follow a similar principle: less inertia, so the molecules are easier to set in motion.
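As a quick sanity check, the ideal-gas formula for the speed of sound, sqrt(gamma * R * T / M), can be evaluated directly. This is a sketch under stated assumptions: the molar mass of dry air (about 0.0289647 kg/mol) and the helium values are standard reference figures rather than numbers from the article, and the function name is hypothetical.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol*K)

def speed_of_sound(temp_kelvin, gamma=1.4, molar_mass=0.0289647):
    """Ideal-gas speed of sound in m/s; defaults describe dry air."""
    return math.sqrt(gamma * R * temp_kelvin / molar_mass)

air_20c = speed_of_sound(293.15)   # air at 20 degrees C, roughly 343 m/s
air_40c = speed_of_sound(313.15)   # hotter air: faster
helium = speed_of_sound(293.15, gamma=1.66, molar_mass=0.0040026)  # lighter gas: much faster

print(f"air, 20 C: {air_20c:.1f} m/s")
print(f"air, 40 C: {air_40c:.1f} m/s")
print(f"helium, 20 C: {helium:.1f} m/s")
```

Both trends from the takeaway show up: raising the temperature nudges the speed up, and swapping air for helium roughly triples it.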
For a simple sinusoidal wave, the speed $c$, frequency $f$, and wavelength $\lambda$ are related by:
$$c = f\lambda$$
This is the relationship that connects what you hear (frequency, perceived as pitch) to the physical scale of the wave (wavelength). Some concrete numbers to build intuition:
| Frequency | Wavelength | What it sounds like |
|---|---|---|
| 20 Hz | 17 m | Lowest audible rumble |
| 300 Hz | 1.1 m | Low end of speech |
| 3.4 kHz | 10 cm | Upper range for speech intelligibility |
| 20 kHz | 1.7 cm | Upper limit of human hearing |
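
The wavelengths in the table can be reproduced by dividing the speed of sound by each frequency. The sketch below assumes a speed of 343 m/s (air at about 20 °C); the function name is illustrative.

```python
C = 343.0  # assumed speed of sound in air at about 20 degrees C, m/s

def wavelength(freq_hz, c=C):
    """Wavelength in metres of a sinusoid at freq_hz travelling at speed c."""
    return c / freq_hz

for freq in (20, 300, 3_400, 20_000):
    print(f"{freq:>6} Hz -> {wavelength(freq):.4f} m")
```

The spread is worth noticing: audible wavelengths range from the length of a bus down to the width of a fingertip, which is why low and high frequencies interact so differently with rooms and microphones.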
The Microphone: Pressure to Motion
So we now understand how sound travels through air a bit more, but how does that get turned into meaningful data? The microphone, of course. It is responsible for receiving the pressure waves and converting them from mechanical vibrations to electrical signals.
There are, of course, many different mic designs; we will focus on a basic mechanism to develop intuition for the device:
The star of the show is the microphone's diaphragm: a thin membrane exposed to air on one side. The incoming pressure variations push the entire membrane in and out; think of a drum skin flexing as sound hits it. The force on the diaphragm is the pressure difference across it, integrated over its area:
$$F(t) = \int_A \Delta p(t)\, dA$$
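To get a feel for the magnitudes involved, here is a rough sketch that assumes the pressure difference is uniform across a circular diaphragm, so the integral collapses to pressure times area. The 1 Pa pressure (about 94 dB SPL) and the 12.7 mm radius are illustrative assumptions, not values from the article.

```python
import math

def diaphragm_force(delta_p_pa, radius_m):
    """Force in newtons on a circular diaphragm under a uniform pressure difference."""
    area = math.pi * radius_m ** 2  # diaphragm area, m^2
    return delta_p_pa * area

# Hypothetical example: 1 Pa across a diaphragm of 12.7 mm radius.
force = diaphragm_force(1.0, 0.0127)
print(f"{force * 1e3:.3f} mN")
```

The result is around half a millinewton, which hints at why the diaphragm has to be so light and compliant.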
The diaphragm itself can be modeled as a damped harmonic oscillator, the classic spring-mass-damper system from physics:
$$m\ddot{x} + b\dot{x} + kx = F(t)$$
where $x$ is the diaphragm's displacement, $m$ is its mass, $k$ is its stiffness, $b$ is damping, and $F(t)$ is the driving force from the incoming pressure wave. While $F(t)$ is active, the sound wave continuously drives the diaphragm, and it tracks the pressure variations in real time.
In contrast, when $F(t)$ decays, the damping term $b\dot{x}$ is what causes the diaphragm to settle back to rest rather than ringing indefinitely.
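The role of damping once the driving force stops can be seen in a small simulation. This is a minimal sketch, not a model of any real microphone: it assumes the standard spring-mass-damper equation with mass `mass`, damping coefficient `damping`, and stiffness `stiffness`, sets the driving force to zero, and releases the diaphragm from a displaced position. All parameter values are hypothetical.

```python
import math

def simulate_release(mass, damping, stiffness, x0=1.0, dt=1e-4, steps=20_000):
    """Integrate mass*x'' + damping*x' + stiffness*x = 0 from rest at x0.

    Uses semi-implicit Euler, which is adequate for a small time step.
    Returns the list of displacements, one per step.
    """
    x, v = x0, 0.0
    trace = []
    for _ in range(steps):
        a = -(damping * v + stiffness * x) / mass  # acceleration with no driving force
        v += a * dt                                # update velocity first
        x += v * dt                                # then position, using the new velocity
        trace.append(x)
    return trace

# Hypothetical parameters: 1 kg mass with a 10 Hz natural frequency, 2 s of simulated time.
MASS = 1.0
STIFFNESS = (2 * math.pi * 10.0) ** 2

undamped = simulate_release(MASS, 0.0, STIFFNESS)
damped = simulate_release(MASS, 12.57, STIFFNESS)

print("undamped, swing over last 0.1 s:", max(abs(x) for x in undamped[-1000:]))
print("damped, final displacement:", abs(damped[-1]))
```

With zero damping the swing stays near its starting amplitude for the whole run, while the damped case has effectively returned to rest.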
This model has a natural resonant frequency, set by the stiffness $k$ and the mass $m$, at which the system oscillates most vigorously when driven by an external force:
$$f_0 = \frac{1}{2\pi}\sqrt{\frac{k}{m}}$$
You've experienced resonance before: push a child on a swing at just the right rhythm and they soar; push at the wrong rhythm and you fight the motion. A diaphragm works the same way: frequencies near $f_0$ get amplified, while others don't.
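The natural frequency of a spring-mass system, sqrt(stiffness / mass) / (2 * pi), is easy to explore numerically. The sketch below uses made-up values for mass and stiffness purely to show the trend; they are not measurements of a real diaphragm.

```python
import math

def resonant_frequency(stiffness, mass):
    """Undamped natural frequency in Hz of a spring-mass system."""
    return math.sqrt(stiffness / mass) / (2 * math.pi)

# Hypothetical diaphragm: 1 milligram of moving mass, 4000 N/m stiffness.
mass = 1e-6
stiffness = 4_000.0

print(f"baseline:      {resonant_frequency(stiffness, mass):.0f} Hz")
print(f"4x stiffness:  {resonant_frequency(4 * stiffness, mass):.0f} Hz")
print(f"4x mass:       {resonant_frequency(stiffness, 4 * mass):.0f} Hz")
```

Because of the square root, quadrupling the stiffness doubles the resonant frequency and quadrupling the mass halves it, so a light, stiff diaphragm pushes the resonance toward the top of the audio band.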
Whether this resonance is a problem depends on the damping term. The ratio of the damping to its critical value (the damping ratio, often written $\zeta$) determines how the diaphragm behaves:
- Underdamped: the diaphragm rings like a struck bell, amplifying frequencies near $f_0$. Cheap microphones often exhibit this; you hear it as a harsh, tinny coloration on voices or a brittle sibilance on "s" sounds.
- Overdamped: the diaphragm is sluggish, like pushing through honey. It can't keep up with rapid pressure changes, so high frequencies get rolled off and everything sounds muffled.
- Critically damped: the sweet spot. The diaphragm tracks the incoming pressure wave as faithfully as possible and settles back to rest without overshooting. This produces a flat frequency response — meaning the mic reproduces all frequencies at roughly equal sensitivity, without artificially boosting or attenuating any part of the spectrum.
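The three regimes above can be told apart with a single number. The sketch below assumes the standard damping ratio for a spring-mass-damper, damping / (2 * sqrt(stiffness * mass)), where a value of exactly 1 marks critical damping. The function names and the example values are hypothetical.

```python
import math

def damping_ratio(mass, damping, stiffness):
    """Ratio of the actual damping to the critical damping 2*sqrt(stiffness*mass)."""
    return damping / (2 * math.sqrt(stiffness * mass))

def classify(zeta, tol=1e-9):
    """Name the damping regime for a given damping ratio."""
    if math.isclose(zeta, 1.0, abs_tol=tol):
        return "critically damped"
    return "underdamped" if zeta < 1.0 else "overdamped"

# Hypothetical system: mass 1 kg, stiffness 4 N/m, so critical damping is 4 N*s/m.
for b in (1.0, 4.0, 8.0):
    zeta = damping_ratio(1.0, b, 4.0)
    print(f"damping={b}: zeta={zeta:.2f} -> {classify(zeta)}")
```

In a real design the ratio would rarely land on exactly 1, so the tolerance in `classify` is only there to keep the illustration tidy.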
In practice, well-designed microphones target critical or slightly overdamped behavior. A tiny bit of overdamping sacrifices negligible high-frequency response but adds a safety margin against resonant ringing — a worthwhile trade-off when the goal is accurate reproduction of the original sound.
So the diaphragm is moving — faithfully tracking the pressure wave. But mechanical motion alone isn't useful to any downstream system. In part 2, we'll see how microphones convert that motion into electrical signals through different transduction mechanisms.