Friday, February 8, 2013

The CALM Act; Television Program VS Commercial Loudness Part 1

On December 2nd, 2010, the US Congress passed the CALM Act (the Commercial Advertisement Loudness Mitigation Act), which essentially created rules to try to normalize the volume levels between television commercials and the television programs themselves.

For decades consumers had complained about the WILD differences in levels between the television programs and the blisteringly loud ads that interrupted those programs.  Congresswoman Anna Eshoo decided it was up to the US Government to step in and save the day and for once a bill received bipartisan support.

With the legislation finally taking effect a few months ago, it's brought to the foreground a number of discussions and, in turn, stark misconceptions about the concept of loudness, our perception of sound versus its measurement and the craftspeople who create motion picture soundtracks and albums.  Finding myself fuming after reading some of the more glaring errors, I decided to sit down and try to dispel some of the more frustrating perceptions about the world I work in every day.


INTRODUCTION: Reddit Made Me Do It

I'm a pretty regular reader (read: addict) of the social news site reddit.com, and in the wake of the CALM Act users posted a number of hive-minded rants about either the volume of TV commercials, the variation of volume in feature film soundtracks or the superior sound quality of pre-1980s record albums.

By the time these rants are showered with enough upvotes to reach the front page of reddit's rankings, they've been piled on with hundreds and thousands of user comments, usually dominated by endless "Me too"'s.  However, peppered within the chorus of sympathetic souls are dozens of incorrect assumptions, urban legends and expert opinions from someone who has used Cool Edit Pro.

Trying to add an informed response to the din at that point in the discussion is a lot like trying to talk to a tornado.  I thought, instead, I would present the information to those folks who are interested in the foundations of the problem in hopes that next time glaring misinformation is spread, it might be better discussed, if not defused entirely.

An explanation of the CALM Act serves as a great starting point for a larger discussion incorporating concepts of audio storage, measurement, dynamics, loudness and perception which affect disciplines spanning film, television and music.  I hope in the context of explaining the CALM Act that I can also help explain the causes behind some of your biggest pet peeves in sound and soundtracks and what you can do about them.


First We Start By Measuring The Invisible 


In order to understand what it means when one sound is 'louder' than another, one need first think about how sound can be measured.  Like many forces in nature, sound is sometimes difficult to quantify.  Sound can happen over a long period of time or be nearly instantaneous.  Sound can exist in a purely electrical form, digital form or a physical form.  Sound also exists not only in relation to a listener but also in an objective form that is independent from a listener.

But in order for the CALM Act to say 'this commercial cannot be louder than the program' we have to decide how we will determine how loud something is... and if we want to mandate rules everyone can follow and can be enforced, we need to agree on a standard to measure and compare loudness.


When sound is recorded it is converted to electrical energy and stored.  Sound can be measured in terms of how much electrical energy is in that recorded sound or in terms of actual sound pressure coming out of a speaker when it is reproduced.  Since the sound pressure coming out of the speaker can be affected by a number of variables (not the least of which is how loud you have the volume set on your TV), we need to focus on measuring something less relative.  The following examples show how recorded sound is typically measured in terms of the recorded electrical level.


Peak Levels.

One of the most common measurements for electrical sound signal is a PEAK measurement.  Peak is exactly as it sounds, how loud is the loudest peak in a given sound- often, more specifically, the highest peak of electrical signal of a recorded sound wave.  Most graphic meters you see on the front of audio and video equipment are measuring and displaying peak information in regards to audio.






When looking at the digital storage of audio, the Peak scale has a fixed maximum, meaning that there is an absolute limit to how loud a sound can be and still be stored by the system.  The scale by which this is measured is called dBFS (Decibels relative to Full Scale).  The absolute maximum value is 0 dBFS, and all values leading up to this maximum are expressed as negative numbers.
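To put numbers on that scale, here is a minimal Python sketch. It assumes samples are stored as floating point values where 1.0 is full scale; the function name `to_dbfs` is just an illustrative choice, not part of any particular tool.

```python
import math

def to_dbfs(sample, full_scale=1.0):
    """Convert a linear sample amplitude to dBFS (decibels relative to full scale)."""
    magnitude = abs(sample)
    if magnitude == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(magnitude / full_scale)

print(to_dbfs(1.0))  # a sample at full scale sits at 0 dBFS
print(to_dbfs(0.5))  # half of full scale is roughly 6 dB below the maximum
print(to_dbfs(0.1))  # one tenth of full scale is roughly 20 dB below the maximum
```

Every value below full scale comes out negative, which is why levels on this scale are always quoted as negative numbers.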



One way to think of the dBFS scale is to picture the old High Striker game from the carnival: a vertical scale with a fixed maximum.  The harder you hit it, the higher it goes until it maxes out.





The problem with a Peak measurement in determining how 'loud' something is, is that it is often not a very accurate way to compare two sounds.  Sound occurs over a span of time, so how loud it is for one very small instant might not tell you much about how you hear the sound as a whole.  Some specialized audio meters (like VU meters) respond very slowly, showing a much more 'averaged' level, which gives you a much better reflection of how loud a sound might be perceived to be, but it still doesn't tell the whole story.


The Problem With The Peaks.

If you've got your thinking caps on and you read that digital audio has a fixed maximum, you've got to wonder how a given CD can sound louder than another.  Does that mean some albums are just turned down under the maximum level?  No, not by a long shot...

Imagine for a second a boy bouncing on his bed.  He is bouncing higher and higher until his hair starts to brush against the ceiling of the bedroom.  He's reached the maximum height allowed in this apartment.




But he really wants to go higher- so he jumps harder and as he reaches the ceiling he tucks his head a bit and continues to just brush against the maximum.




And still he wants to go even higher- so he bounces and jumps even harder, this time tucking his head and his shoulders as he reaches the ceiling.





Finally he gives it all he's got and jumps as hard as he can.  He reaches the ceiling with ease and folds his body at the waist so his head, shoulders and whole torso brush right against the maximum height for the apartment.


So the question is: did he really jump higher?  Well, technically we determined there was a maximum height represented by the ceiling so he was never able to jump any higher than that.  But, on average, more of his body was close to the maximum while the peak remained the same.  By tucking in his upper parts when he reached the ceiling, more of his body was at maximum-- so he perceived himself as going higher and higher. 


And a very similar thing happens with much of the music released in the last decade or three.  Engineers utilize a device called a limiter which, using our example above, essentially serves to automatically duck the head, shoulders and torso of the audio as it approaches the maximum.  And they often go one further: with the limiter in place they begin to raise the level of the apartment floor so almost every jump results in some ducking of the head and torso: the boy's body is practically pinned to the ceiling!

This results in a perception of 'louder' -- not because it is higher in maximum level but because more of the sound is at the maximum level.
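Here is a rough Python sketch of that effect. It uses a hard clamp as a crude stand-in for a limiter (real limiters reduce gain smoothly rather than chopping the waveform), and the test tone, the gain of 4 and the function names are all illustrative assumptions.

```python
import math

def peak(samples):
    """Highest absolute sample value."""
    return max(abs(s) for s in samples)

def rms(samples):
    """Root mean square level of the whole signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def push_into_ceiling(samples, gain, ceiling=1.0):
    """Raise the level by `gain`, then clamp anything beyond the ceiling."""
    return [max(-ceiling, min(ceiling, s * gain)) for s in samples]

# One second of a 100 Hz sine at 48 kHz whose peaks just touch full scale
original = [math.sin(2 * math.pi * 100 * n / 48000) for n in range(48000)]
squashed = push_into_ceiling(original, gain=4.0)

print(peak(original), peak(squashed))  # both peaks sit at the ceiling
print(rms(original), rms(squashed))    # the squashed version has the higher RMS
```

The peak reading cannot tell the two signals apart, even though far more of the squashed one is sitting up against the maximum.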

You can see this by looking at visual representations of music from different eras.  Here's Queen/Bowie's Under Pressure captured from the original Vinyl recorded in 1981 (left) versus This Is War from 30 Seconds to Mars from 2009 (right):



Both songs have a similar structure, with a stripped down opening that builds to a more active crescendo, but you can clearly see that when each song hits its loudest passages the Queen recording has much more variance in the levels while the 30 Seconds to Mars track is squashed at the maximum throughout.  The actual peak level for each is the same, just more of the song on the right is pushed close to that maximum, resulting in a perceived increase in loudness for the listener.

Now it's important to note that limiters weren't invented in the 1990s, never before used by modern man.  Limiters and other dynamic range altering devices have been around for a long time- however the modern versions create far fewer 'artifacts', allowing the signal to be more and more smashed without as much obvious degradation of sound.  The sound is degraded and in many cases will cause ear fatigue much sooner, but just not degraded in a way that the average listener would object to.

And thus the legendary album loudness wars are born.  Despite a finite 'speed limit' to the maximum sound able to be stored, there is a perceived difference in loudness created by cramming more and more signal closer to maximum.  This created insecure musicians and labels who commanded engineers to match the perceived loudness of the albums released by their peers.  Every little push into the ceiling inspires another and another until average listeners do hear a difference and begin to prefer the sound on the old records.


And so, to bring this back to the topic of comparing loudness, looking at the Peak level of a given signal isn't the best indicator of how 'loud' it is.  We can manipulate sounds within a fixed Peak system to maximize perceived loudness without affecting peak.  Just saying two sounds have the same peak level says very little about their comparative loudness, so we need another way to measure and compare sounds.




RMS Levels.

Many people with a background in science, engineering or math have probably come across the concept of RMS, or Root Mean Square.  It's essentially a way to measure the magnitude of something that has variation over time.  A signal as complex as a sound wave has a lot of variation over a short period of time.



Without delving into specific formulas, RMS serves in audio as a way of looking at the mean level of a signal, usually calculated a few milliseconds at a time.  This results in measuring more of the 'body' of the signal instead of just the loudest part.  In our example of the bed-bouncing boy, measuring RMS would be somewhat akin to tracking the movement of his whole midsection as he bounced instead of just the peak.  As he jumped 'higher' by manipulating his body, we would be able to see that increase using RMS measurements, even though the peak remained unchanged.
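For the curious, a windowed RMS reading can be sketched in a few lines of Python. The 10 ms window, the 48 kHz sample rate and the test tone are assumptions picked for illustration, not values taken from any particular meter.

```python
import math

def rms_db(samples):
    """RMS level of a block of samples, in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)  # same as 20 * log10(sqrt(mean_square))

def windowed_rms_db(samples, window):
    """Split the signal into back-to-back windows and measure each one."""
    return [
        rms_db(samples[start:start + window])
        for start in range(0, len(samples) - window + 1, window)
    ]

SAMPLE_RATE = 48000
# Half a second of a quiet 1 kHz tone followed by half a second at full scale
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
signal = [0.1 * s for s in tone[:SAMPLE_RATE // 2]] + tone[SAMPLE_RATE // 2:]

levels = windowed_rms_db(signal, window=480)  # 480 samples = 10 ms at 48 kHz
print(len(levels))  # one reading per 10 ms window
print(levels[0])    # quiet half: roughly -23 dB
print(levels[-1])   # loud half: roughly -3 dB
```

Note that even the full-scale sine reads about 3 dB below its peak, because RMS measures the body of the wave rather than its tallest point.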

Now we're getting somewhere!  RMS measurements can't be as easily tricked by using a limiter: looking at our examples from before, the crescendo passage from the Queen song measures, on average, a full 5 dB lower than the passage from the 30 Seconds To Mars track on an RMS scale, despite having an identical peak level.  By pushing those levels up against the ceiling the measured RMS level was raised along the way!

So, it seems we could use RMS measurement as the value to determine 'loudness' and declare one thing too loud compared to another.  Well... not quite yet.


Leq: One Value To Rule Them All. 

So now we have RMS as a way to measure loudness so we can compare two sources and determine if one seems louder than the other.  However traditional RMS audio metering is based on measuring a few millisecond snippet of signal.  As a result, plotting an RMS measurement over time might look something like this:


This makes it difficult to try to compare two different sources using RMS readings.  The curves are never going to line up perfectly, so we're still trying to compare apples and oranges in most cases.  Ideally we would just have a single value to represent the loudness of a passage so we can compare the numbers and say definitively and simply what is louder.

Enter Leq - Equivalent Continuous Sound Level.  Leq is basically a single value calculated from a range of RMS values.  It allows the assignment of a single number to a sound sample to represent the total energy of that sound over its duration.
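One detail worth seeing in code: Leq averages the energy, not the dB readings themselves. The sketch below assumes a list of RMS readings in dB taken over equal-length windows; the function name and the example readings are made up for illustration.

```python
import math

def leq_from_db(levels_db):
    """Energy-average a series of equal-length RMS readings (in dB) into one value."""
    powers = [10 ** (level / 10) for level in levels_db]  # dB back to linear power
    return 10 * math.log10(sum(powers) / len(powers))

readings = [-30.0, -30.0, -10.0, -10.0]
print(leq_from_db(readings))           # about -13 dB
print(sum(readings) / len(readings))   # a plain average of the dB values gives -20
```

Because decibels are logarithmic, the loud windows dominate the result, so the single value lands much closer to the loud passages than a simple average of the readings would suggest.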

Bingo!  Now we can take two sounds and quickly say "This number is bigger than that number, so this is louder than that!" 

Right?  Right?  Nope, we're still not done yet.



Barry White vs Fran Drescher.

So as we've said, the first problem with RMS as a sound measurement is that plotting it over time results in multiple values that make comparison difficult.  Leq addressed this by boiling down continuous measurements to a single value.


The second problem with these kinds of measurements is that they don't take into account the listener.  These measurements are an objective metering of the energy contained within a signal.  However, as it turns out, measured energy and perceived loudness do not correlate all that well.

Human hearing has its own sensitivity curve.  We are more sensitive to sounds in a certain frequency range, centered around the range produced by the typical human voice.  So while two sounds might contain the same measured energy, we might perceive the loudness very differently depending on the frequency content of the sound.

A good real-world example of this is Barry White versus Fran Drescher.  Barry White, a soul-singer with a famously distinctive bass voice.  Fran Drescher, a comedic actress with an equally famous pinched and nasal voice.

The frequency of Fran's vocal delivery excels mostly in the range where human hearing is most sensitive.  Barry on the other hand, delivers in a frequency range that is typically below the range where humans are most sensitive.  As a result, vocal performances delivered by these two might be measured to be the same in terms of objective energy, however our perception of how loud each sounds to our ears will be greatly different. 

So now we see that not only do we need to measure an average level over time and assign it a single value, but it also needs to be specifically related to how an average human will hear the sound.  We need to value or 'weight' some frequencies differently to better measure not just the objective energy of a sound but how that sound is actually perceived by an average listener.

With that in mind everyone initially decided on using a system called Leq(a) to measure television loudness (Leq with a weighting toward speech frequencies).  However, that algorithm didn't do so well at measuring music, so with a little tinkering they found a nice compromise in an algorithm called BS.1770.
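To get a feel for what a weighting curve does, here is a sketch of the standard A-weighting curve, assuming that is what the "(a)" in Leq(a) refers to. The 100 Hz and 3 kHz test frequencies are illustrative stand-ins for a very low voice and a nasal one, not measurements of anyone's actual voice.

```python
import math

def a_weighting_db(freq_hz):
    """Gain in dB that the A-weighting curve applies at a given frequency."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.0 dB offset normalizes the curve to roughly 0 dB at 1 kHz
    return 20 * math.log10(numerator / denominator) + 2.0

for freq in (100, 1000, 3000):
    # 100 Hz is cut by around 19 dB, 1 kHz is left alone, 3 kHz gets a slight boost
    print(freq, round(a_weighting_db(freq), 1))
```

Two tones with identical measured energy would therefore get very different weighted readings, with the low one counted as much quieter than the one sitting in the ear's sensitive range.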


LKFS

So this concept of a single value weighted based on human hearing was refined, standardized and adopted by the ITU, and is now known as a measurement algorithm called BS.1770, which results in a measurement known as LKFS (BS.1770 is the method of measuring, sort of like a ruler, and LKFS is the unit of measurement you get from using that ruler, sort of like inches).  LKFS is now the standard measurement scale for loudness level of broadcast television.
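For a rough idea of how the ruler turns audio into a number, here is a sketch of only the final summation step of BS.1770. It assumes each channel has already been run through the standard's K-weighting filter (not implemented here) and it leaves out the gating that later revisions of the standard add, so treat it as an illustration rather than a compliant meter. The function and dictionary names are my own.

```python
import math

# Channel weights from BS.1770: front channels count fully,
# surround channels are weighted slightly higher.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}

def loudness_lkfs(mean_squares):
    """Combine per-channel mean-square values into one loudness figure.

    `mean_squares` maps a channel name to the mean square of that channel's
    signal AFTER K-weighting. The filter itself is assumed, not implemented.
    """
    total = sum(CHANNEL_WEIGHTS[ch] * z for ch, z in mean_squares.items())
    if total <= 0:
        return float("-inf")  # silence has no finite loudness value
    return -0.691 + 10 * math.log10(total)

print(loudness_lkfs({"L": 0.5}))            # one channel carrying the signal
print(loudness_lkfs({"L": 0.5, "R": 0.5}))  # the same signal in both channels reads higher
```

The shape is the same as Leq: average the weighted energy, then express it in decibels as one number for the whole segment.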

Television programs and commercials are now measured (per segment in the case of shows) using the LKFS measurement.  The segment is assigned a value representing its overall average loudness, called the long term level.  Most television networks also have a requirement of how far louder and softer you can be in relation to the long term measurement (controlling how dynamic the program is, a topic we'll discuss in a future installment) called the short term level.


Whew.  Now we have a single-value, weighted measure by which to compare and therefore mandate and enforce loudness.


So now we can make some rules... Tune in next time for THE CALM ACT part 2.

Jumping Boy Artwork courtesy of the amazing Samantha Kimball