In the world of audio engineering, there is frequent debate surrounding sample rates – that is, the number of samples of audio per second that make up a digital sound. So, I sought to find some basic answers surrounding this concept.
The Theory
All sound can be accurately, mathematically described as a complex series of sine wave functions changing over time. That means that a sound can be calculated and reproduced by breaking it down into those sine waves.
That’s our first axiom.
Second axiom – that human hearing extends approximately over the range of 20 Hz–20 kHz. This one is debated a lot – but its truth is not necessary for this test. I only need a proper understanding of the fundamentals behind digital audio.
Third, and this one is pretty safely proven – any sine wave can be described with more than 2 sample points per cycle, and further, the number of points given to describe the sine wave is irrelevant as long as it’s greater than 2.
This is where our 44,100 samples/second (44.1 kHz) rate comes from. It means that we will have more than 2 points along an amplitude vs. time scale for any sound up to 22.05 kHz. The idea is that we can accurately reproduce all the sine waves going up to the extreme of audible hearing.
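That reconstruction idea can be sketched numerically. This is a toy example of my own, not part of the original test: it samples a 10 kHz sine at 44.1 kHz (about 4.4 points per cycle) and rebuilds a point that falls between samples using truncated sinc interpolation, the textbook reconstruction formula.

```python
import math

fs = 44_100.0   # sample rate (Hz)
f = 10_000.0    # test tone, well below the 22.05 kHz Nyquist limit
N = 2_000       # number of samples to work with

# Sample the sine wave: more than 2 points per cycle (44.1k / 10k = 4.41)
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(N)]

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(t):
    """Truncated ideal (sinc) reconstruction at time t seconds."""
    return sum(s * sinc(t * fs - n) for n, s in enumerate(samples))

# Check a point halfway between two samples, far from the window edges
t = 1000.5 / fs
error = abs(reconstruct(t) - math.sin(2 * math.pi * f * t))
print(f"reconstruction error: {error:.2e}")  # tiny: the wave is recovered
```

The error is small only because the tone sits comfortably below Nyquist; the sections below are about what happens when it doesn't.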
The Problem
But what happens to frequencies over 22 kHz? Well, we don’t reproduce them accurately – we reproduce them inaccurately. Oops. It’s called aliasing. This is why analog-to-digital converters have filters built into them – to stop unwanted ultrasonic frequencies from confusing the digital conversion. The problem with filters is twofold – one is that they don’t totally eliminate sound – they simply reduce it – although they do so by a lot. The other problem is that they cause phase distortion to occur around the cutoff band.
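Aliasing is easy to demonstrate in a few lines. A hypothetical sketch (my numbers, not from the original test): a 26 kHz tone sampled at 44.1 kHz produces, up to a sign flip, exactly the same samples as an audible 18.1 kHz tone (44.1 − 26), so without an input filter the converter records the wrong pitch.

```python
import math

fs = 44_100.0
f_ultra = 26_000.0        # ultrasonic content above Nyquist (22.05 kHz)
f_alias = fs - f_ultra    # 18,100 Hz: where it folds back into the audible band

# Sample both tones; aside from a sign flip, the converter cannot tell them apart
ultra = [math.sin(2 * math.pi * f_ultra * n / fs) for n in range(100)]
alias = [-math.sin(2 * math.pi * f_alias * n / fs) for n in range(100)]

max_diff = max(abs(a - b) for a, b in zip(ultra, alias))
print(f"max sample difference: {max_diff:.2e}")  # ~0: the two tones alias
```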
The Solution
Make a system that accepts much higher frequency content. If we grab 88,200 samples/second (88.2 kHz), our filtering and phasing occurs around 44.1 kHz, which is WAY above human hearing. The only drawback is that our recording will take twice the amount of computer power to work with, and take up twice as much storage space.
The Argument
The debate begins. Is the difference of 44.1 vs. 88.2 audible? Does it warrant the extra computer power? Some people will go as far as to say that taking 44.1 recordings and upsampling them to 88.2 files will improve the sound, even if the goal is to go back down to 44.1 (which is the sample rate that CDs are burned at).
My Test
I wanted to see what would happen if I took a 44.1 file and boosted the living heck out of the high end, and compared that to an 88.2 upsampling of the same file with the exact same boost. In theory, the 44.1 file should have distortion from boosting the frequencies up near 22 kHz – while the 88.2 file should have significantly less distortion.
But first, I had to test another part of the theory. Does upsampling to 88.2 cause any distortion? To test this, I upsampled my test song up to 88.2, printed it, then downsampled it back to 44.1. I then Null Tested the two files (flipped the phase on one of the prints and summed them). If the two completely cancelled out, I knew that they were exactly the same. If there was any sound remaining, I knew that the upsampling caused some form of distortion. The result was a perfect Null.
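The null-test procedure can be sketched as code. This is a toy on short lists of sample values (not Pro Tools): invert the polarity of one print, sum, and check whether anything survives.

```python
# A null test: flip the polarity of one signal and sum. If the two
# signals are identical, the residual is pure silence.
def null_test(a, b):
    residual = [x + (-y) for x, y in zip(a, b)]
    return max(abs(r) for r in residual)

# Hypothetical sample values standing in for the two prints
original  = [0.1, -0.4, 0.25, 0.0]
roundtrip = [0.1, -0.4, 0.25, 0.0]   # e.g. after up- then downsampling
print(null_test(original, roundtrip))  # 0.0 -> a perfect Null
```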
Next, I took my test song and trimmed each version down 16 dB to prevent any square wave distortion and inter-sample clipping. I then took a digital EQ, one that does not do any internal upsampling (I used the stock Pro Tools EQ), and did a 12 dB boost at 10 kHz. I then printed them out.
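For reference, those dB figures translate to linear gain via g = 10^(dB/20). A quick sketch of the arithmetic (not part of the original session):

```python
# Decibel changes map to linear amplitude gain as g = 10 ** (dB / 20)
def db_to_gain(db):
    return 10 ** (db / 20)

trim  = db_to_gain(-16)   # ~0.158: plenty of headroom before the boost
boost = db_to_gain(12)    # ~3.98: the 12 dB EQ boost at 10 kHz
print(trim, boost, trim * boost)  # net change is still a 4 dB cut
```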
I did three Null Tests. One where I took the 44.1 print and upsampled it to compare to the 88.2 print, one where I took the 88.2 print and downsampled it, and one where I Null Tested the Null Tests!
Conclusion
I found the 88.2 EQ’d version to sound subjectively better. My ear alone told me that the fidelity was of higher quality. The Null Tests showed that there was, in fact, a difference in the sound – a ghostly broadband version of the entire song was left behind. The Nulls turned out to be the same regardless of whether they were created by upsampling one file or downsampling the other.
I’ve included the results:
File: Null Test 44.1.wav
File Size: 6.1 MB
File: sample test 88.2 print.wav
File Size: 12.1 MB
File: sample test 44.1 print.wav
File Size: 6.1 MB
Charlie,
Your explanation is exactly what I ran into myself when I was doing a master’s degree in music technology at NYU. But nobody I talked to seemed to understand me, and they kept thinking I was the one who didn’t understand something. You’re the first person besides myself I’ve heard make this case.
It seems clear – when you look at how many samples there are to represent the higher frequencies, there are just not enough – and you get, like you said, zero crossings and peaks in all the wrong places – there’s no possible way to get a pretty, or even close, representation of those frequencies, yet they are allowed through the filter!
Surely the experts at least see this?
The example you are giving only occurs when the sampling frequency is exactly 2x the frequency of the sampled signal (the so-called critical frequency). To account for this peculiarity, the Shannon theorem needs to be understood thus: fs > 2B, i.e. the sampling frequency needs to be MORE than twice the highest frequency to be sampled. Now in your example, sampled at, say, a 20.1 kHz sampling rate, your 10 kHz sine wave can be easily reconstructed.
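This fs > 2B point can be checked numerically. A small sketch of my own (not the commenter's): at 20.1 kHz, even a 10 kHz sine that starts exactly at a zero crossing cannot vanish, because successive samples drift through every phase of the wave instead of landing on the crossings forever.

```python
import math

fs, f = 20_100.0, 10_000.0   # sampling just barely above twice the frequency
phase = 0.0                  # the phase that lands on zero crossings when fs = 2f

# One full beat of the near-critical sampling (~20 ms, 402 samples):
# the sample points walk through every phase of the 10 kHz wave
samples = [math.sin(2 * math.pi * f * n / fs + phase) for n in range(402)]
peak = max(abs(s) for s in samples)
print(f"peak sample: {peak:.4f}")  # ~1.0: the tone's amplitude is captured
```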
I don’t know if this helps, but I can hear the song on the null file – with my Sony STR-DE597P amp and Sony MDR-NC7 headphones – using Media Player Classic with AC3Filter v1.51, PCM 24-bit AS IS (no change) output.
I put the amp volume up to 70 (MAX); it was tinny but clear.
I think playing back at 24 bit will make a bigger difference than sample frequency.
I’m no expert, but wouldn’t errors (imperfections) in Pro Tools’ sample rate conversion algorithms invalidate the null test?
Oops, the explanation holds, but the math above is off.
10 kHz has a wave period of 1/10,000th of a second, hitting positive max 1/40,000th of a second after the first zero crossing (the start), and negative max 3/40,000ths of a second after the start; draw that on the board.
Then move it 1/4 wavelength forward or backward in time – leaving the timing of the sampling points the same (in other words, move the wave, but don’t move the sampling points in time) – and the wave becomes sampled at the zero crossings – and therefore: it completely disappears on output!
Most real world situations fall somewhere in between, thus: hardly sonic perfection!
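The blackboard exercise above can be run as code. A toy sketch: at exactly 2x the signal frequency, one phase of a 10 kHz sine is sampled at the peaks and looks fine, while the quarter-wavelength-shifted phase is sampled at the zero crossings and vanishes.

```python
import math

fs, f = 20_000.0, 10_000.0   # sampling at exactly 2x: the critical rate

# Same wave, two starting phases: one sampled at its peaks,
# one shifted 1/4 wavelength so the samples land on zero crossings
peaks = [math.sin(2 * math.pi * f * n / fs + math.pi / 2) for n in range(8)]
zeros = [math.sin(2 * math.pi * f * n / fs) for n in range(8)]

print([round(p, 6) for p in peaks])  # alternating +1/-1: looks fine
print([round(z, 6) for z in zeros])  # all ~0: the wave "disappears"
```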
Um, the axioms aren’t quite right; god knows why this obvious hole isn’t spoken about more often – probably because the record industry once needed to sell boatloads of CDs at 44.1. But showing how wrong this assumption is turns out to be fairly simple: your ears will simply confirm it.
Let’s even out the sample rate to make it easier to visualize: say we have a 20 kHz sampling frequency – and we want to sample the Nyquist maximum, 10 kHz. It only takes two points on a pure sine wave to describe it perfectly, max & min: the wave starts at zero, hits positive max at 1/40,000th of a second, is back at zero at 1/20,000th, hits negative max at 3/40,000ths, and at 1/10,000th is back to zero. Sample it every 1/20,000th of a second, starting at that first max, and you land exactly on the max & min points: awesome, it works!!!!
But in the real world, the odds of just by chance sampling that 10kHz wave at exactly the max & min points are, um, small … in fact, it gets much, much worse …
If you sample that incoming real-world wave, instead of some academic perfection, and it just happens to be shifted 1/4 wavelength either forward or backward in time: THE WAVE COMPLETELY DISAPPEARS! Draw it on a blackboard, or whatever; see for yourself.
So much for “sample at double the highest frequency required for sonic perfection”.
Obviously, most waves fall somewhere in between, and obviously there’s much more to this than I can get into here, but suffice it to say, 44.1 16-bit often sounds bad not so much because you’re not hearing ultrasonic stuff, but because the high end you’re actually hearing is kind of messed up – so while it’s there (unlike so much of the old analog tech), what’s there winds up sounding brittle and harsh.
Increasing the sample rate so your high end is more accurately described solves this; my guess is quadruple instead of double is probably enough.
Certainly, by the time you’re at 24-bit, 192 kHz, we’ve finally hit audio nirvana – we can record better than the human ear can hear – so, oddly enough, it simply becomes a matter of what kind of vaseline we want to smear on the lens 😉
Oh sheesh, maybe you ought to be a doctor of this sh*t, you’ve just disproved one of the most widely accepted scientific theories of the 20th century… no… wait… my bad, you just don’t understand a word of it.