Saturday, April 20, 2013

The CALM Act; Television Program VS Commercial Loudness Part 2: Electric Boogaloo

On December 2nd 2010 the US Congress passed the CALM Act: The Commercial Advertisement Loudness Mitigation Act which, essentially, created rules to try to normalize the volume levels between television commercials and the television programs themselves.   

For decades consumers had complained about the WILD differences in levels between the television programs and the blisteringly loud ads that interrupted those programs.  Congresswoman Anna Eshoo decided it was up to the US Government to step in and save the day and for once a bill received bipartisan support.

With the legislation finally taking effect a few months ago, it's brought to the foreground a number of discussions and, in turn, stark misconceptions about the concept of loudness, our perception of sound versus its measurement and the craftspeople who create motion picture soundtracks and albums.  Finding myself fuming after reading some of the more glaring errors I decided to sit down and try to dispel some of the more frustrating perceptions about the world I work in every day.

In PART ONE of my ramblings, I outlined some of the complications with quantifying sound into a workable measurement and touched on some of the confusions involved with the idea of comparing loudness.  Now in PART TWO we'll see specifically how these measurements are applied, the technologies used and what the CALM act actually mandates to mitigate commercial advertising loudness.

LKFS: One bling to rule them all.

When last we left our hero, we had established a single-value, weighted measure based on dialog by which to compare, and therefore mandate and enforce, loudness: LKFS.  It became the standard by which to measure loudness in the realm of digital television.

So now the CALM act just says everything broadcast on TV has to be measured to the same value on the LKFS scale, right?

Not exactly.

All the CALM Act actually says is that the FCC needs to make mandatory something called A/85:

"[...]the Federal Communications Commission shall prescribe pursuant to the Communications Act of 1934 a regulation that is limited to incorporating by reference and making mandatory the ‘Recommended Practice: Techniques for Establishing and Maintaining Audio Loudness for Digital Television’ (A/85), and any successor thereto, approved by the Advanced Television Systems Committee[...]"

Short of actually mandating specific loudness of broadcast television, the CALM ACT simply says "Hey FCC, enforce this existing set of guidelines on commercial broadcasts with the goal of mitigating the loudness problem."

So what is this A/85 and what exactly does it recommend?  Well, a lot actually.

A/85 (and A/53) or: How I Learned to Stop Worrying and Love BS.1770.

ATSC document A/85 (aka "Recommended Practice: Techniques for Establishing and Maintaining Audio Loudness") is a 75 page document that covers most elements relating to digital television broadcasting as it applies to audio.  It includes sections on the BS.1770 standard for measuring audio (the name of the algorithm that determines the LKFS measurement I discussed in PART ONE), metadata of the audio formats included in DTV and their roles in loudness, dynamic range compression features of the DTV audio system, proper audio monitoring, loudspeaker placement and more.  It really is quite a document; I highly recommend that anyone working in audio download a copy for themselves and check it out!  I don't know that I'd say it's quite a 'For dummies' read, but it's much more digestible than most of the technical documents the ATSC generates on a regular basis.

But within the 75 pages of useful information, the chewy center at the core of the entire system recommended in A/85 (and thereby mandated by the CALM act) actually points to a second document called A/53: Part 5 (aka "ATSC Digital Standard -- AC-3 Audio System Characteristics").  Hidden among hundreds of "shall" "should" and "may" provisions in A/53 is this gem:

"The value of the dialnorm parameter in the AC-3 elementary bit stream shall indicate the loudness of the encoded audio content (typically of the average spoken dialogue) using LKFS units."

Ah-ha!  That fixes everything!  You don't see it?  Well then maybe I should explain...

Dialnorm, AC-3, Dolby Digital and how Batman is tangentially involved.

In the late 1980's and early 1990's, the folks at Dolby Labs developed a system [initially] for implementing digital audio in cinemas.  The system included a codec for compressing digital audio as well as a method for carrying that digital soundtrack on film prints to be decoded in the cinema.  The technical name for the format they developed was AC-3, commercially known by its marketing name: Dolby Digital.  Dolby Digital premiered in theaters with the release of Batman Returns in the summer of 1992.


Most people familiar with the name Dolby Digital associate it with 'Surround Sound,' however it's important to realize that 'Surround Sound' existed well before Dolby Digital (dating back to Disney's Fantasia in 1940) and continues to exist independent of Dolby Digital (surround soundtracks can be carried by other competing systems like DTS and SDDS).  And, equally important, the Dolby Digital format supports channel configurations ranging from single channel mono up through six channel 5.1 surround format.   So Dolby Digital is not necessarily surround sound and surround sound is not necessarily Dolby Digital.

But what Dolby Digital (AC-3) is is a handy, flexible, high-quality audio compression technique allowing the inclusion of multiple channels of audio in a relatively low bandwidth bitstream: it's a groovy way to carry lots of complex digital sound in a really small file.  

AC-3 is so handy, flexible, high quality and low bandwidth that it migrated from the cinema onto laserdisc. AC-3 then leapt from laserdisc onto DVD.  From DVD, AC-3 then became a supported format for HD-DVD, Blu-ray, Video Games and, possibly most importantly, the audio standard for broadcast Digital Television.

In the AC-3 digital stream, along with the sound itself, each packet carries along descriptive metadata: 'data about the data.'  An audio packet containing metadata is a lot like you walking around a party with a t-shirt on that tells everyone you meet a little bit about you.

Obscure John Cusack reference.

Each packet of AC-3 audio contains a whole host of parameters that tell the receiving system all kinds of descriptive information about the audio stream that is contained within.  Back in 1992 when the AC-3 packets carried the breathtakingly bad performance of Michelle Pfeiffer as Catwoman in glorious digital sound to cinemas across the land, not all of these metadata parameters were especially useful.  However, as AC-3 evolved from a cinema format to a multi-environment format, these parameters suddenly became extremely useful.

Buried within the metadata parameters like bitstream mode, audio coding mode, room type, preferred stereo downmix mode, downmix levels and many many many more is a value called Dialnorm, short for dialog normalization.

Alright, enough teasing already: What the f*** is Dialnorm?!?

All AC-3 data streams need to be turned back into sound.  The process of taking this digital stream and turning it into the soundtrack you hear starts with something called a decoder.  These Dolby decoders are built into any devices that support AC-3 soundtracks (DVD players, televisions, surround sound audio receivers, etc).  All Dolby decoders are required to read and respond to the specific metadata included in the AC-3 data stream.

Dialnorm is a metadata parameter within these AC-3 streams that, in theory, tells the system the loudness of the audio stream- specifically the average level of the dialog.  If used correctly, this flag would report the measured dialog level of the audio stream to the decoder.  The decoder then takes this value and shifts the overall loudness so the dialog level matches a pre-determined level.

It goes like this: 

You grab your copy of 'Mega Shark Versus Giant Octopus' down from the shelf and throw it in the DVD player.  The soundtrack on this DVD has a dialnorm value of -27 in the metadata of the AC-3 stream.  When that stream hits the decoder it will recognize the -27 value and will turn the whole soundtrack down by 4 dB to -31 (-31 being the predetermined value that Dolby has deemed ideal for soundtrack consistency).

You get bored of the Giant Octopus and grab a different DVD, this time the complete first season of 'The Fall Guy.' This one has a dialnorm value of -22.  The stream hits the decoder and it lowers the overall level by 9 dB to also play at -31.

If everything was flagged correctly, the listening levels of these two soundtracks on playback should be very similar -- you shouldn't need to manually adjust your volume despite the fact that 'The Fall Guy' DVD has nearly twice the average dialog level of 'Mega Shark': the decoder (which is built right into your TV, DVD player or audio receiver) did it for you.
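For the arithmetically inclined, here is a minimal sketch of that offset in Python.  It only assumes what the two DVD examples describe: a target of -31 and a dialnorm value read from the stream.  The function names are made up for illustration and are not part of any Dolby software.

```python
TARGET_LEVEL = -31  # the playback dialog level used in the examples above


def dialnorm_offset_db(dialnorm):
    """Level change in dB the decoder applies (negative means 'turn down')."""
    return TARGET_LEVEL - dialnorm


def linear_gain(offset_db):
    """Convert a dB offset into a plain multiplier on sample amplitude."""
    return 10 ** (offset_db / 20)


for title, dialnorm in [("Mega Shark Versus Giant Octopus", -27),
                        ("The Fall Guy", -22)]:
    offset = dialnorm_offset_db(dialnorm)
    print(f"{title}: dialnorm {dialnorm}, shift {offset} dB, "
          f"amplitude x{linear_gain(offset):.2f}")
```

Every sample gets multiplied by the same number, which is why the shift changes the level and nothing else.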

It's important to note that this is just a wholesale shift in the overall level of the program.  Unlike old analog compression systems (which I'll discuss in more detail in PART THREE) that actually changed the characteristics of the original sound to better fit into the limitations of the analog broadcast system, the dialnorm offset process doesn't actually change the quality or character of original sound, just the level.

Now we apply this same concept to a digital broadcast television system.  Digital television uses AC-3 as the audio carrier system, complete with a dialnorm metadata parameter.  As you switch between channels, watch different programs or sit through commercial breaks, your Dolby decoder would be receiving metadata from each audio stream and using the dialnorm parameter to automatically 'normalize' the level of dialog between different programs, allowing you to munch away on your Cool Ranch Doritos in aural bliss.

If you were to scroll back up this page and re-read the passages above, there is an important caveat repeated over and over again: "if flagged correctly." This is how the system would ideally work, so long as the metadata parameter of dialnorm was set to correctly reflect the actual level of dialog in the soundtrack.  

Unfortunately, for the first few decades of Dolby Digital, this parameter was rarely (never) set correctly.  The default dialnorm setting on the industry standard AC-3 professional encoder was -27 and I would bet you a diddled-eyed Joe to a damned-if-I-know that any DVD you pull off your shelf and run through a Dolby decoder would show it was flagged with dialnorm at -27, regardless of the actual average dialog level.  Every soundtrack was just encoded with the default setting because very few people really knew what dialnorm did and those that did weren't real clear on what scale to use to measure the level.  

The system was in place but nobody was using it.

And therein lies the beauty of the CALM Act, A/85 and A/53:Part 5 we started with above: These documents mandate not only the usage of the dialnorm parameter:

"The value of the dialnorm parameter in the AC-3 elementary bit stream shall indicate the loudness of the encoded audio content (typically of the average spoken dialogue) using LKFS units."

but they also set forth a specific and standardized way to measure the audio (the BS.1770 algorithm and the LKFS scale we talked about in PART ONE).  Now everyone is comparing apples-to-apples and broadcasters are legally required to accurately include that data in their broadcast.

So now if Stanley Steemer wants to make a commercial that is twice as loud as the episode of Toddlers and Tiaras you're trying to enjoy, the now legally mandated metadata measurement derived using the now industry standardized method of BS.1770 would tell your decoder all about it and your decoder would turn old Stanley down to match the level of the program.  Boom, commercial advertisement loudness mitigated.  And all it took was a federal law.  Thanks Congress! 

So the world is safe, the bad guy got his comeuppance and we go home with the prom queen, right?

Not so fast little shaver.  Having accurate metadata anchored around an average level of dialog and allowing the decoder to automatically normalize dialog level between different audio streams is an amazing improvement for the listener, however average levels rarely tell the whole story.  While the yearly average temperature of my hometown might be 50 degrees, that doesn't necessarily mean you wanna be outside in a light jacket in the middle of February.  There is still a question of the dynamics of the modern audio material (how much louder and how much quieter the sound gets above and below that measured average).

So join me again in PART THREE as we discuss the damned-if-you-do, damned-if-you-don't world of dynamics in audio mixing and the tools that exist to allow the listeners to determine their own individual sonic destinies.

Monday, March 4, 2013

Looking for a documentary on Netflix Instant?

Having trouble choosing a film in the vast sea of titles now available on NETFLIX streaming?  

Below are a couple documentaries that I sound edited or supervised which are currently available via instant streaming on NETFLIX.

Check them out!

Two Days in April : Follows the story of 4 college football players signed by the sports agency IMG, as they bring them to a training facility in Florida and both physically and mentally prepare them for the NFL draft.  From Director Don Argott (Rock School, Art of the Steal, Last Days Here).

Spirit of the Marathon : The first ever feature-length film to capture the essence, drama and unique spectacle of the famed 26.2-mile race, the production features five runners - three amateurs and two elites - as they train for and ultimately run the Chicago Marathon. 

Woody Allen: A Documentary : Iconic writer, director, actor, comedian, and musician Woody Allen allowed his life and creative process to be documented on-camera for the first time. With this unprecedented access, Emmy-winning, Oscar-nominated filmmaker Robert Weide (Curb Your Enthusiasm, Mother Night) followed the notoriously private film legend over a year and a half to create the ultimate film biography. 

Friday, February 8, 2013

The CALM Act; Television Program VS Commercial Loudness Part 1

On December 2nd 2010 the US Congress passed the CALM Act: The Commercial Advertisement Loudness Mitigation Act which, essentially, created rules to try to normalize the volume levels between television commercials and the television programs themselves.  

For decades consumers had complained about the WILD differences in levels between the television programs and the blisteringly loud ads that interrupted those programs.  Congresswoman Anna Eshoo decided it was up to the US Government to step in and save the day and for once a bill received bipartisan support.

With the legislation finally taking effect a few months ago, it's brought to the foreground a number of discussions and, in turn, stark misconceptions about the concept of loudness, our perception of sound versus its measurement and the craftspeople who create motion picture soundtracks and albums.  Finding myself fuming after reading some of the more glaring errors I decided to sit down and try to dispel some of the more frustrating perceptions about the world I work in every day.

INTRODUCTION: Reddit Made Me Do It

I'm a pretty regular reader (read: addict) of the social news site, and in the wake of the CALM act users posted a number of hive-minded rants about either the volume of TV commercials, the variation of volume in feature film soundtracks or the superior sound quality of pre-1980's record albums.

By the time these rants are showered with enough upvotes to reach the front page of reddit's rankings, they've been piled on with hundreds and thousands of user comments: usually dominated by endless "Me too"'s.  However, peppered within the chorus of sympathetic souls are dozens of incorrect assumptions, urban legends and expert opinions from someone who has used Cool Edit Pro.

Trying to add an informed response to the din at that point in the discussion is a lot like trying to talk to a tornado.  I thought, instead, I would present the information to those folks who are interested in the foundations of the problem in hopes that next time glaring misinformation is spread, it might be better discussed, if not defused entirely.

An explanation of the CALM act serves as a great starting point for a larger discussion incorporating concepts of audio storage, measurement, dynamics, loudness and perception which affect disciplines spanning film, television and music.  I hope in the context of explaining the CALM act that I can also help explain the causes behind some of your biggest pet peeves in sound and soundtracks and what you can do about them.

First We Start By Measuring The Invisible 

In order to understand what it means when one sound is 'louder' than another, one need first think about how sound can be measured.  Like many forces in nature, sound is sometimes difficult to quantify.  Sound can happen over a long period of time or be nearly instantaneous.  Sound can exist in a purely electrical form, digital form or a physical form.  Sound also exists not only in relation to a listener but also in an objective form that is independent from a listener.

But in order for the CALM Act to say 'this commercial cannot be louder than the program' we have to decide how we will determine how loud something is... and if we want to mandate rules everyone can follow and can be enforced, we need to agree on a standard to measure and compare loudness.

When sound is recorded it is converted to electrical energy and stored.  Sound can be measured in terms of how much electrical energy is in that recorded sound or in terms of actual sound pressure coming out of a speaker when it is reproduced.  Since the sound pressure coming out of the speaker can be affected by a number of variables (not the least of which is how loud you have the volume set on your TV), we need to focus on measuring something less relative.  The following examples are how recorded sound is typically measured in terms of the recorded electrical level.

Peak Levels.

One of the most common measurements for electrical sound signal is a PEAK measurement.  Peak is exactly as it sounds, how loud is the loudest peak in a given sound- often, more specifically, the highest peak of electrical signal of a recorded sound wave.  Most graphic meters you see on the front of audio and video equipment are measuring and displaying peak information in regards to audio.

When looking at the digital storage of audio, the Peak scale has a fixed maximum- meaning that there is an absolute amount the loudest sound can be and still be stored by the system.   The scale by which this is measured is called dBFS (Decibels relative to Full Scale) and the absolute maximum value is 0dBFS and all values leading up to this maximum are expressed as negative numbers.

One way to think of the dBFS scale is to picture the old High Striker game from the carnival: a vertical scale with a fixed maximum.  The harder you hit it, the higher it goes until it maxes out.
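If you'd rather see the High Striker in numbers, here is a rough Python sketch of a peak reading on the dBFS scale.  It assumes 16-bit integer samples with full scale taken as 32768; both the function name and that choice of full scale are my own for illustration.

```python
import math

FULL_SCALE_16BIT = 32768  # assumed full-scale magnitude for 16-bit audio


def peak_dbfs(samples, full_scale=FULL_SCALE_16BIT):
    """Peak level of a block of integer samples, in dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(peak / full_scale)


print(peak_dbfs([32768, -1200, 40]))   # a sample at the very top of the scale
print(peak_dbfs([16384, -1200, 40]))   # a peak at half of full scale
```

Anything below the top of the scale comes out as a negative number, and halving the peak amplitude costs roughly 6 dB.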

The problem with a Peak measurement in determining how 'loud' something is, is that it is often not a very accurate way to compare two sounds.  Sound occurs over a span of time, so how loud it is for one very small instant might not tell you much about how you hear the sound as a whole.  Some specialized audio meters (like VU meters) measure peak levels with a very slow response (showing a much more 'averaged' peak level) which gives you a much better reflection of how loud a sound might be perceived to be, but it still doesn't tell the whole story.

The Problem With The Peaks.

If you've got your thinking caps on and you read that digital audio has a fixed maximum, you've got to wonder how a given CD can sound louder than another.  Does that mean some albums are just turned down under the maximum level?  No, not by a long shot...

Imagine for a second a boy bouncing on his bed.  He is bouncing higher and higher until his hair starts to brush against the ceiling of the bedroom.  He's reached the maximum height allowed in this apartment.

But he really wants to go higher- so he jumps harder and as he reaches the ceiling he tucks his head a bit and continues to just brush against the maximum.

And still he wants to go even higher- so he bounces and jumps even harder, this time tucking his head and his shoulders as he reaches the ceiling.

Finally he gives it all he's got and jumps as hard as he can.  He reaches the ceiling with ease and folds his body at the waist so his head, shoulders and whole torso brush right against the maximum height for the apartment.

So the question is: did he really jump higher?  Well, technically we determined there was a maximum height represented by the ceiling so he was never able to jump any higher than that.  But, on average, more of his body was close to the maximum while the peak remained the same.  By tucking in his upper parts when he reached the ceiling, more of his body was at maximum-- so he perceived himself as going higher and higher. 

And a very similar thing happens with much of the music released in the last decade or three.  Engineers utilize a device called a limiter which, using our example above, essentially serves to automatically duck the head, shoulders and torso of the audio as it approaches the maximum.  And they often go one further: with the limiter in place they begin to raise the level of the apartment floor so almost every jump results in some ducking of the head and torso: the boy's body is practically pinned to the ceiling!

This results in a perception of 'louder' -- not because it is higher in maximum level but because more of the sound is at the maximum level.
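Here is the bouncing boy as a toy Python sketch.  Fair warning: this is a crude stand-in, not how a real limiter works.  It simply turns the signal up and clamps whatever pokes past the ceiling, where real limiters are far gentler.  The test tone, the gain of 4 and the function names are all invented for illustration.

```python
import math


def peak(x):
    return max(abs(v) for v in x)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def crude_limiter(x, gain, ceiling=1.0):
    """Raise the floor by `gain`, then clamp anything past the ceiling."""
    return [max(-ceiling, min(ceiling, v * gain)) for v in x]


# A 100 Hz tone that swells from quiet to loud, scaled so its peak is 1.0.
n = 4800
raw = [(0.2 + 0.8 * i / (n - 1)) * math.sin(2 * math.pi * 100 * i / 48000)
       for i in range(n)]
top = peak(raw)
dynamic = [v / top for v in raw]

squashed = crude_limiter(dynamic, gain=4.0)

for name, signal in [("dynamic", dynamic), ("squashed", squashed)]:
    print(f"{name}: peak {peak(signal):.2f}, rms {rms(signal):.2f}")
```

Both versions hit the same ceiling, but the squashed one spends far more of its time up there, so its average level comes out higher.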

You can see this by looking at visual representations of music from different eras.  Here's Queen/Bowie's Under Pressure captured from the original Vinyl recorded in 1981 (left) versus This Is War from 30 Seconds to Mars from 2009 (right):

Both songs have a similar structure: with a stripped down opening that builds to a more active crescendo, but you can clearly see when each song hits their loudest passages the Queen recording has much more variance in the levels while the 30 Seconds to Mars track is squashed at the maximum throughout.  The actual peak level for each is the same, just more of the song on the right is pushed close to that maximum, resulting in a perceived increase in loudness for the listener.

Now it's important to note that limiters weren't invented in the 1990s, never before used by modern man.  Limiters and other dynamic range altering devices have been around for a long time; however, the modern versions create far fewer 'artifacts', allowing the signal to be more and more smashed without as much obvious degradation of sound.  The sound is degraded and in many cases will cause ear fatigue much sooner, but just not degraded in a way that the average listener would object to.

And thus the legendary album loudness wars are born.  Despite a finite 'speed limit' to the maximum sound able to be stored, there is a perceived difference in loudness created by cramming more and more signal closer to maximum.  This created insecure musicians and labels who commanded engineers to match the perceived loudness of the albums released by their peers.  Every little push into the ceiling inspires another and another until average listeners do hear a difference and begin to prefer the sound on the old records.

And so, to bring this back to the topic of comparing loudness, looking at the Peak level of a given signal isn't the best indicator of how 'loud' it is.  We can manipulate sounds within a fixed Peak system to maximize perceived loudness without affecting peak.  Just saying two sounds have the same peak level says very little about their comparative loudness, so we need another way to measure and compare sounds.

RMS Levels.

Many people with a background in science, engineering or math have probably come across the concept of RMS, or Root Mean Square.  It's essentially a way to measure the magnitude of a something that has variation over time.   A signal as complex as a sound wave has a lot of variation over a short period of time.

Without delving into specific formulas, RMS serves in audio as a way of looking at the mean level of a signal, usually calculated a few milliseconds at a time.  This results in measuring more of the 'body' of the signal instead of just the loudest part.  In our example of the bed-bouncing boy, measuring RMS would be somewhat akin to tracking the movement of his whole midsection as he bounced instead of just the peak. As he jumped 'higher' by manipulating his body, we would be able to see that increase using RMS measurements, even though the peak remained unchanged.

Now we're getting somewhere!  RMS measurements can't be as easily tricked by using a limiter: looking at our examples from before, the crescendo passage from the Queen song measures, on average, a full 5 dB lower than the passage from the 30 Seconds To Mars track on an RMS scale, despite having identical peak level.  By pushing those levels up against the ceiling the measured RMS level was raised along the way!

So, it seems we could use RMS measurement as the value to determine 'loudness' and declare one thing too loud compared to another.  Well... not quite yet.
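For anyone who does want a peek at the formula, here is a small Python sketch of a windowed RMS reading.  It assumes samples stored as floats where 1.0 is full scale, and it uses non-overlapping windows for simplicity; real meters differ in window length and smoothing, and the function names are mine.

```python
import math


def rms_db(window):
    """RMS level of one window in dB, with full scale (1.0) as the reference."""
    mean_square = sum(v * v for v in window) / len(window)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)


def windowed_rms(samples, window_size):
    """One RMS reading per consecutive, non-overlapping window."""
    return [rms_db(samples[i:i + window_size])
            for i in range(0, len(samples) - window_size + 1, window_size)]


# One second of a 100 Hz tone that swells, read in 10 ms windows at 48 kHz.
tone = [(i / 48000) * math.sin(2 * math.pi * 100 * i / 48000)
        for i in range(48000)]
readings = windowed_rms(tone, window_size=480)
print(len(readings), "readings, from",
      round(readings[1], 1), "dB up to", round(readings[-1], 1), "dB")
```

Notice that you get a whole list of readings back, one per window, rather than one tidy number.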

Leq: One Value To Rule Them All. 

So now we have RMS as a way to measure loudness so we can compare two sources and determine if one seems louder than the other.  However traditional RMS audio metering is based on measuring a few millisecond snippet of signal.  As a result, plotting an RMS measurement over time might look something like this:

This makes it difficult to try to compare two different sources using RMS readings.  The curves are never perfectly going to line up so we're still trying to compare apples and oranges in most cases.  Ideally we would just have a single value to represent the loudness of a passage and so we can just compare the numbers and say definitively and simply what is louder.

Enter Leq - Equivalent Continuous Sound Level.  Leq is basically a single value calculated from a range of RMS values.  It allows the assignment of a single number to a sound sample to represent the total energy of that sound over its duration.
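As a rough sketch of the idea in Python: take a run of RMS readings in dB, convert each back to power, average the power, and convert the result back to dB.  This assumes every reading covers the same length of time, and the function name is just a label I picked.

```python
import math


def leq(window_levels_db):
    """Collapse equal-length RMS readings (in dB) into one equivalent level."""
    powers = [10 ** (level / 10) for level in window_levels_db]
    mean_power = sum(powers) / len(powers)
    return 10 * math.log10(mean_power)


quiet_then_loud = [-30.0, -10.0]
print(round(leq(quiet_then_loud), 2))                       # energy average
print(sum(quiet_then_loud) / len(quiet_then_loud))          # naive dB average
```

The two printed numbers disagree on purpose: because the average is taken on energy rather than on the dB values themselves, the loud stretch dominates the result.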

Bingo!  Now we can take two sounds and quickly say "This number is bigger than that number, so this is louder than that!" 

Right?  Right?  Nope, we're still not done yet.

Barry White vs Fran Drescher.

So as we've said, the first problem with RMS as a sound measurement is that plotting it over time results in multiple values that makes comparison difficult.  Leq addressed this by boiling down continuous measurements to a single value. 

The second problem with these kinds of measurements is that they don't take into account the listener.  These measurements are an objective metering of energy contained within, however measured energy and perceived loudness do not correlate all that well as it turns out.

Human hearing has its own sensitivity curve.  We are more sensitive to sounds in a certain frequency range, centered around the range produced by a typical human voice.  So while two sounds might contain the same measured energy, we might perceive the loudness very differently depending on the frequency content of the sound.

A good real-world example of this is Barry White versus Fran Drescher.  Barry White, a soul-singer with a famously distinctive bass voice.  Fran Drescher, a comedic actress with an equally famous pinched and nasal voice.

The frequency of Fran's vocal delivery excels mostly in the range where human hearing is most sensitive.  Barry on the other hand, delivers in a frequency range that is typically below the range where humans are most sensitive.  As a result, vocal performances delivered by these two might be measured to be the same in terms of objective energy, however our perception of how loud each sounds to our ears will be greatly different. 

So now we see that not only do we need to measure an average level over time and assign it a single value, but it also needs to be specifically related to how an average human will hear the sound.  We need to value or 'weight' some frequencies differently to better measure not just the objective energy of a sound but how that sound is actually perceived by an average listener.

With that in mind everyone initially decided on using a system called Leq(a) to measure television loudness (Leq with a weighting toward speech frequencies).  However that algorithm didn't do so good at measuring music so with a little tinkering they found a nice compromise in an algorithm called BS.1770.
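To put some numbers on the weighting idea, here is a Python sketch of the standard A-weighting curve, the 'A' in that first system.  To be clear, this is not the weighting BS.1770 uses; it is only here to show what a hearing-based curve does to different frequencies.  The 100 Hz and 2500 Hz test tones are stand-ins I picked for a very low and a very nasal voice, not measurements of anyone's actual singing or talking.

```python
import math


def a_weighting_db(freq_hz):
    """Standard A-weighting in dB for a pure tone at freq_hz (about 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    numerator = (12194 ** 2) * f2 ** 2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194 ** 2))
    return 20 * math.log10(numerator / denominator) + 2.00


for label, freq in [("low rumble", 100), ("reference", 1000), ("nasal", 2500)]:
    print(f"{label} ({freq} Hz): {a_weighting_db(freq):+.1f} dB")
```

Two tones with the same measured energy get very different weighted readings: the low one is knocked down heavily while the one in the speech range is left roughly alone.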


So this concept of a single value weighted based on human hearing was refined, standardized and adopted by the ITU, and is now known as a measurement algorithm called BS.1770, which results in a measurement known as LKFS (BS.1770 is the method of measuring, sort of like a ruler, and LKFS is the unit of measurement you get from using that ruler, sort of like inches).  LKFS is now the standard measurement scale for loudness level of broadcast television.

Television programs and commercials are now measured (per segment in the case of shows) using the LKFS measurement.  The segment is assigned a value representing its overall average loudness, called the long term level.  Most television networks also have a requirement of how far louder and softer you can be in relation to the long term measurement (controlling how dynamic the program is, a topic we'll discuss in a future installment) called the short term level.

Whew.  Now we have a single-value, weighted measure by which to compare, and therefore mandate and enforce, loudness.

So now we can make some rules... Tune in next time for THE CALM ACT part 2.

Jumping Boy Artwork courtesy of the amazing Samantha Kimball