
Music and physics: the digitization of the acoustic signal

Who said that physics, electronics and programming have nothing to do with music?

A sound, in fact, is simply a wave propagating through the air: it makes our eardrum vibrate, generating a nerve impulse directed to the brain.

Just like radio and other electromagnetic waves, sound waves can be captured, analyzed and modulated as we please.

And this is exactly what a sound card does: it collects an external (analog) sound signal and “digitizes” it, that is, translates it into a language that can be understood and processed by our PC.

This digitization process is divided into two basic parts:

  • sampling: values of the signal are read at regular time intervals;
  • quantization: each sampled value is approximated by the nearest number the computer can represent.

The term “sampling” literally means taking samples of a variable signal at regular intervals of time.

As can be seen from the images below, given a starting analog signal, the smaller the distance between points (i.e., the higher the sampling rate), the closer the sampled curve is to the original one.
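Sampling can be sketched in a few lines of Python (a toy model: the `sample` helper and the 440 Hz test tone are illustrative, not how a real sound card works):

```python
import math

def sample(signal, duration_s, rate_hz):
    """Take values of a continuous signal at regular time intervals."""
    n = int(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(n)]

# A 440 Hz sine wave (concert-pitch A) stands in for the analog signal.
def tone(t):
    return math.sin(2 * math.pi * 440 * t)

coarse = sample(tone, 0.01, 4_000)   # 4 kHz: 40 samples in 10 ms
fine = sample(tone, 0.01, 44_100)    # CD rate: 441 samples in 10 ms
print(len(coarse), len(fine))
```

The higher the rate, the more points describe the same 10 ms of signal, and the closer the sampled curve follows the original.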

At this point one might ask: does an ideal sampling rate exist?

Unfortunately for us, no, it does not. The sampling rate needed for an acceptable result depends on many factors, such as the number of sound sources to be captured: properly recording the melody produced by an orchestra certainly requires more effort than recording a single instrument.

There is, however, a formula that comes to our aid: the Nyquist-Shannon sampling theorem, which tells us that, in order not to lose information needed to reconstruct the original analog signal afterwards, the sampling frequency must be at least twice the maximum frequency present in the acquired signal.

Since the human ear perceives sounds only up to about 20 kHz, the sampling frequency of sound cards must be at least 40 kHz (which is why audio CDs use signals sampled at 44.1 kHz).
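One way to see why the theorem matters is to compute the apparent ("alias") frequency that a tone folds down to when the rate is too low; `alias_frequency` is an illustrative helper, valid for a single pure tone below the sampling rate:

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency of a pure tone after sampling
    (frequencies fold around half the sampling rate)."""
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)

# A 25 kHz tone sampled at only 40 kHz shows up as a spurious 15 kHz tone:
print(alias_frequency(25_000, 40_000))  # 15000
# A 10 kHz tone at CD rate respects Nyquist and keeps its frequency:
print(alias_frequency(10_000, 44_100))  # 10000
```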

After sampling, each collected sample is approximated by the nearest numerical value the computer can represent.

Thus, the signal is said to have been “quantized” in that it has been transformed into a quantity (a number) that we can easily understand and interpret.

As one can easily guess, the more numbers the computer can assign (i.e., the greater its precision), the closer the quantized signal will be to the original in terms of intensity.
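A minimal sketch of uniform quantization (the `quantize` helper and the chosen bit depths are illustrative) shows the approximation error shrinking as precision grows:

```python
def quantize(x, bits):
    """Round a sample in [-1, 1] to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2 / (levels - 1)          # spacing between representable values
    return round((x + 1) / step) * step - 1

value = 0.3                          # an arbitrary sampled amplitude
for bits in (4, 8, 16):
    q = quantize(value, bits)
    print(f"{bits:2d} bits -> {q:.6f} (error {abs(q - value):.6f})")
```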

However, this is not enough: although these numbers are more than enough for us to analyze the signal, the PC needs them to be transformed one last time.

Thus, an encoding operation is carried out in which the numbers are translated into the binary system, the basic language of all digital technology, in which every element is represented as a sequence of 1's and 0's.
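For instance, audio CDs store each quantized sample as a signed 16-bit integer (linear PCM); that last translation into bits can be sketched as follows (`encode_16bit` is an illustrative helper):

```python
def encode_16bit(q):
    """Encode a quantized sample in [-1.0, 1.0] as a signed 16-bit
    integer, shown as the string of 0s and 1s the machine stores."""
    n = int(q * 32767)                 # scale to the 16-bit integer range
    return format(n & 0xFFFF, '016b')  # two's-complement bit pattern

print(encode_16bit(0.5))   # 0011111111111111
print(encode_16bit(-1.0))  # 1000000000000001
```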

At this point, we have finally managed to convert an analog signal (such as a simple guitar arpeggio) into a digital signal that we can read and manipulate on our computer through the use of appropriate software.

Now that we have properly sampled, quantized and encoded our signal, obtaining an audio track on our computer, what do we do with it?

The main problem, in these cases, is that the resulting file is extremely large, and therefore cumbersome to download, share, or even simply move from one area of the PC to another.

For this reason, audio codecs (short for "coder-decoder") are now ubiquitous in the world of digital music: they are hardware or software tools (mainly software) that encode a stream of data so that it can later be stored and transported more easily.

The goal of codecs is, therefore, to compress a file to make it more manageable. Compression reduces the size of the file, so processing it requires fewer resources and less computing power; the disadvantage is the inevitable loss of some of the information originally contained in it.

How, then, do we know which information to keep and which to remove?

There are specific compression algorithms that serve this very purpose and are used in the creation of compressed audio files. One of these is the so-called "time/frequency" approach used in the best-known audio format in existence: mp3.

With this approach, the file is divided into a series of time windows and, within each of them, all the components that the human ear cannot perceive anyway are discarded as useless.

Thus, the spectrum of the acoustic signal is taken at each instant and all frequencies "masked" by adjacent ones are discarded. The term "masking effect" refers to a perceptual phenomenon whereby high-amplitude (i.e., intense) frequency components of an acoustic signal mask the weaker frequencies adjacent to them, making them inaudible to our ear.
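The principle can be sketched on a toy spectrum (the `apply_masking` helper, the 100 Hz masking range and the 10% threshold are illustrative assumptions, not the real mp3 psychoacoustic model):

```python
def apply_masking(spectrum, mask_range_hz=100, threshold=0.1):
    """Drop any frequency component whose amplitude falls below
    `threshold` times that of a stronger component nearby."""
    kept = {}
    for f, a in spectrum.items():
        masked = any(
            abs(f - g) <= mask_range_hz and a < threshold * b
            for g, b in spectrum.items() if g != f
        )
        if not masked:
            kept[f] = a
    return kept

# A loud 1000 Hz tone masks a faint neighbour at 1040 Hz,
# while an equally faint but distant 5000 Hz tone survives.
spectrum = {1000: 1.0, 1040: 0.05, 5000: 0.05}
print(apply_masking(spectrum))  # {1000: 1.0, 5000: 0.05}
```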

This is just one of many algorithms used to produce a file in mp3 format. The name is in fact short for MPEG-1 Audio Layer III, the third "layer" of the format: in the first layer only frequency masking was applied to reduce the file's size, while the later layers (up to the third, the one still in use today) add further data-compression algorithms such as, for example, Huffman coding.
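Huffman coding itself is simple to sketch: frequent symbols get short bit strings, rare ones longer (a toy build over characters rather than real audio data; `huffman_codes` is an illustrative helper):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix code where frequent symbols get fewer bits."""
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two rarest subtrees...
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))  # ...merged into one
        i += 1
    return heap[0][2]

data = "aaaabbc"
codes = huffman_codes(data)
encoded = "".join(codes[s] for s in data)
print(codes, len(encoded))  # 10 bits instead of 7 bytes in plain ASCII
```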

This concludes our brief overview of how a simple audio file, the kind we listen to every day through our Bluetooth headphones, comes into being, showing that physics and digitization are all around us… even in music!


Andrea Gorreri

Cover photo by Kevin Wuhrmann from Pixabay