Saturday, April 20, 2013

The CALM Act; Television Program VS Commercial Loudness Part 2: Electric Boogaloo

On December 2nd, 2010, the US Congress passed the CALM Act (the Commercial Advertisement Loudness Mitigation Act), which essentially created rules to try to normalize the volume levels between television commercials and the television programs themselves.

For decades, consumers had complained about the WILD differences in levels between television programs and the blisteringly loud ads that interrupted them. Congresswoman Anna Eshoo decided it was up to the US Government to step in and save the day, and for once a bill received bipartisan support.

With the legislation finally taking effect a few months ago, it has brought to the foreground a number of discussions and, in turn, stark misconceptions about the concept of loudness, our perception of sound versus its measurement, and the craftspeople who create motion picture soundtracks and albums. Finding myself fuming after reading some of the more glaring errors, I decided to sit down and try to dispel some of the more frustrating perceptions about the world I work in every day.

In PART ONE of my ramblings, I outlined some of the complications with quantifying sound into a workable measurement and touched on some of the confusion involved with the idea of comparing loudness. Now in PART TWO we'll see specifically how these measurements are applied, the technologies used and what the CALM Act actually mandates to mitigate commercial advertising loudness.

LKFS: One bling to rule them all.

When last we left our hero, we had established a single-value, weighted measure based on dialog by which to compare, and therefore mandate and enforce, loudness: LKFS. It became the standard by which to measure loudness in the realm of digital television.

So now the CALM Act just says everything broadcast on TV has to be measured to the same value on the LKFS scale, right?

Not exactly.

All the CALM Act actually says is that the FCC needs to make mandatory something called A/85:

"[...]the Federal Communications Commission shall prescribe pursuant to the Communications Act of 1934 a regulation that is limited to incorporating by reference and making mandatory the ‘Recommended Practice: Techniques for Establishing and Maintaining Audio Loudness for Digital Television’ (A/85), and any successor thereto, approved by the Advanced Television Systems Committee[...]"

Rather than actually mandating a specific loudness for broadcast television, the CALM Act simply says, "Hey FCC, enforce this existing set of guidelines on commercial broadcasts with the goal of mitigating the loudness problem."

So what is this A/85 and what exactly does it recommend?  Well, a lot actually.

A/85 (and A/53) or: How I Learned to Stop Worrying and Love BS.1770.

ATSC document A/85 (aka "Recommended Practice: Techniques for Establishing and Maintaining Audio Loudness") is a 75-page document that covers most elements of digital television broadcasting as they apply to audio. It includes sections on the BS.1770 standard for measuring audio (the name of the algorithm that determines the LKFS measurement I discussed in PART ONE), the metadata of the audio formats included in DTV and their roles in loudness, the dynamic range compression features of the DTV audio system, proper audio monitoring, loudspeaker placement and more. It really is quite a document; I highly recommend that anyone working in audio download a copy and check it out! I don't know that I'd call it a 'For Dummies' read, but it's much more digestible than most of the technical documents the ATSC generates on a regular basis.
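For the curious, here's roughly what a BS.1770 measurement boils down to. This is a minimal sketch, not a compliant meter. It assumes 48 kHz floating-point audio and uses the published 48 kHz K-weighting filter coefficients. It computes a single ungated value over the whole clip, which is closer in spirit to the original BS.1770-1 than to later revisions that add gating. The function names `biquad` and `ungated_loudness` are my own, purely for illustration.

```python
import math

# K-weighting filter coefficients for a 48 kHz sample rate, as published in
# ITU-R BS.1770. Stage 1 is a high-frequency shelf; stage 2 is the "RLB" high-pass.
SHELF_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
SHELF_A = (1.0, -1.69065929318241, 0.73248077421585)
HIGHPASS_B = (1.0, -2.0, 1.0)
HIGHPASS_A = (1.0, -1.99004745483398, 0.99007225036621)


def biquad(samples, b, a):
    """Filter a list of samples through one Direct Form I biquad section."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def ungated_loudness(channels, weights=None):
    """Ungated BS.1770-style loudness (LKFS) of 48 kHz float channels.

    Channel weights are 1.0 for left/right/centre and about 1.41 for the
    surrounds; the LFE channel is simply left out of the measurement.
    """
    if weights is None:
        weights = [1.0] * len(channels)
    total = 0.0
    for samples, weight in zip(channels, weights):
        k_weighted = biquad(biquad(samples, SHELF_B, SHELF_A), HIGHPASS_B, HIGHPASS_A)
        mean_square = sum(v * v for v in k_weighted) / len(k_weighted)
        total += weight * mean_square
    # The -0.691 constant cancels the K-weighting gain at 997 Hz.
    return -0.691 + 10 * math.log10(total)
```

As a sanity check, BS.1770 says a full-scale 997 Hz sine in one front channel should read about -3.01 LKFS. One second of that tone is a reasonable first test for a sketch like this.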

But within the 75 pages of useful information, the chewy center at the core of the entire system recommended in A/85 (and thereby mandated by the CALM Act) actually points to a second document called A/53: Part 5 (aka "ATSC Digital Television Standard, Part 5 -- AC-3 Audio System Characteristics"). Hidden among hundreds of "shall," "should" and "may" provisions in A/53 is this gem:

"The value of the dialnorm parameter in the AC-3 elementary bit stream shall indicate the loudness of the encoded audio content (typically of the average spoken dialogue) using LKFS units."

Ah-ha!  That fixes everything!  You don't see it?  Well then maybe I should explain...

Dialnorm, AC-3, Dolby Digital and how Batman is tangentially involved.

In the late 1980s and early 1990s, the folks at Dolby Labs developed a system, initially for implementing digital audio in cinemas. The system included a codec for compressing digital audio as well as a method for carrying that digital soundtrack on film prints to be decoded in the cinema. The technical name for the format they developed was AC-3, commercially known by its marketing name: Dolby Digital. Dolby Digital premiered in theaters with the release of Batman Returns in the summer of 1992.


Most people familiar with the name Dolby Digital associate it with 'Surround Sound.' However, it's important to realize that surround sound existed well before Dolby Digital (dating back to Disney's Fantasia in 1940) and continues to exist independently of it (surround soundtracks can be carried by competing systems like DTS and SDDS). Equally important, the Dolby Digital format supports channel configurations ranging from single-channel mono up through six-channel 5.1 surround (later extensions such as Dolby Digital Plus go on to carry 7.1). So Dolby Digital is not necessarily surround sound, and surround sound is not necessarily Dolby Digital.

But what Dolby Digital (AC-3) is is a handy, flexible, high-quality audio compression technique allowing the inclusion of multiple channels of audio in a relatively low bandwidth bitstream: it's a groovy way to carry lots of complex digital sound in a really small file.  
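To put a rough number on "really small," here's some back-of-the-envelope arithmetic comparing uncompressed 5.1 PCM with an AC-3 stream. The 24-bit source and the 384 kbit/s AC-3 rate are my own assumptions for illustration; real-world rates vary by service and medium.

```python
# Back-of-the-envelope: uncompressed 5.1 PCM versus a 5.1 AC-3 stream.
SAMPLE_RATE_HZ = 48_000     # DTV audio runs at 48 kHz
BIT_DEPTH = 24              # assumption: a 24-bit PCM source
CHANNELS = 6                # 5.1 = five full-range channels plus LFE
AC3_BITRATE_BPS = 384_000   # assumption: a common 5.1 AC-3 bit rate

pcm_bps = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS
ratio = pcm_bps / AC3_BITRATE_BPS
print(f"PCM: {pcm_bps / 1e6:.3f} Mbit/s, AC-3: {AC3_BITRATE_BPS / 1e3:.0f} kbit/s, "
      f"about {ratio:.0f}:1")
```

Under those assumptions, the AC-3 stream carries the same six channels in roughly one eighteenth of the bandwidth.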

AC-3 is so handy, flexible, high quality and low bandwidth that it migrated from the cinema onto laserdisc. AC-3 then leapt from laserdisc onto DVD. From DVD, AC-3 became a supported format for HD-DVD, Blu-ray and video games and, possibly most importantly, the audio standard for US broadcast digital television.

In the AC-3 digital stream, along with the sound itself, each packet carries along descriptive metadata: 'data about the data.'  An audio packet containing metadata is a lot like you walking around a party with a t-shirt on that tells everyone you meet a little bit about you.

Obscure John Cusack reference.

Each packet of AC-3 audio contains a whole host of parameters that tell the receiving system all kinds of descriptive information about the audio stream that is contained within.  Back in 1992 when the AC-3 packets carried the breathtakingly bad performance of Michelle Pfeiffer as Catwoman in glorious digital sound to cinemas across the land, not all of these metadata parameters were especially useful.  However, as AC-3 evolved from a cinema format to a multi-environment format, these parameters suddenly became extremely useful.

Buried within metadata parameters like bitstream mode, audio coding mode, room type, preferred stereo downmix mode, downmix levels and many, many, many more is a value called Dialnorm, short for dialog normalization.
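To make that t-shirt a little more concrete, here's a toy model of a few of those header fields. This is a hypothetical, heavily simplified structure, not a real bitstream parser. The field names `acmod`, `lfeon` and `dialnorm` follow the AC-3 specification's naming. The class name and its methods are made up for illustration.

```python
from dataclasses import dataclass

# acmod (audio coding mode) codes from the AC-3 spec: front/rear channel layout.
ACMOD_LAYOUTS = {
    0: "1+1 (dual mono)", 1: "1/0 (mono)", 2: "2/0 (stereo)", 3: "3/0",
    4: "2/1", 5: "3/1", 6: "2/2", 7: "3/2",
}


@dataclass
class Ac3Metadata:
    """Hypothetical, simplified view of a few fields carried in an AC-3 frame."""
    acmod: int      # audio coding mode (3 bits)
    lfeon: bool     # is the low-frequency effects channel present?
    dialnorm: int   # raw 5-bit code: 1..31 means -1..-31 dB; 0 is reserved

    def layout(self) -> str:
        """Human-readable channel layout, e.g. '3/2 + LFE' for 5.1."""
        return ACMOD_LAYOUTS[self.acmod] + (" + LFE" if self.lfeon else "")

    def dialnorm_db(self) -> int:
        """Dialog level the stream claims, in dB."""
        if not 1 <= self.dialnorm <= 31:
            raise ValueError("dialnorm code must be 1..31 (0 is reserved)")
        return -self.dialnorm


meta = Ac3Metadata(acmod=7, lfeon=True, dialnorm=27)
print(meta.layout(), meta.dialnorm_db())  # 3/2 + LFE -27
```

Note that the widest layout AC-3 itself can describe is 3/2 plus LFE, which is good old 5.1.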

Alright, enough teasing already: What the f*** is Dialnorm?!?

All AC-3 data streams need to be turned back into sound. The process of taking this digital stream and turning it into the soundtrack you hear starts with something called a decoder. Dolby decoders are built into any device that supports AC-3 soundtracks (DVD players, televisions, surround sound audio receivers, etc.). All Dolby decoders are required to read and respond to the specific metadata included in the AC-3 data stream.

Dialnorm is a metadata parameter within these AC-3 streams that, in theory, tells the system the loudness of the audio stream, specifically the average level of the dialog. If used correctly, this flag would report the measured dialog level of the audio stream to the decoder. The decoder then takes this value and shifts the overall level so the dialog matches a predetermined level.

It goes like this: 

You grab your copy of 'Mega Shark Versus Giant Octopus' down from the shelf and throw it in the DVD player. The soundtrack on this DVD has a dialnorm value of -27 in the metadata of the AC-3 stream. When that stream hits the decoder, the decoder recognizes the -27 value and turns the whole soundtrack down by 4 dB to -31 (-31 being the predetermined value that Dolby has deemed ideal for soundtrack consistency).

You get bored of the Giant Octopus and grab a different DVD, this time the complete first season of 'The Fall Guy.' This one has a dialnorm value of -22. The stream hits the decoder, and the decoder lowers the overall level by 9 dB so it also plays at -31.
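The decoder's math in both examples is nothing more than a subtraction: gain = -31 - dialnorm. Here's a minimal sketch of that offset. The function name is hypothetical. It only models the level shift described above and ignores everything else a real decoder does.

```python
TARGET_DIALOG_LEVEL = -31  # the reference level the decoder normalizes dialog to


def dialnorm_gain_db(dialnorm: int) -> int:
    """Gain in dB a decoder applies so dialog lands at -31 (never a boost)."""
    if not -31 <= dialnorm <= -1:
        raise ValueError("dialnorm must be between -31 and -1")
    return TARGET_DIALOG_LEVEL - dialnorm


print(dialnorm_gain_db(-27))  # -4  ('Mega Shark Versus Giant Octopus')
print(dialnorm_gain_db(-22))  # -9  ('The Fall Guy')
```

Because dialnorm can't go below -31, the result is always zero or negative. The decoder only ever turns things down.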

If everything was flagged correctly, the listening levels of these two soundtracks on playback should be very similar. You shouldn't need to manually adjust your volume, despite the fact that 'The Fall Guy' DVD has nearly twice the average dialog level of 'Mega Shark' (5 dB higher); the decoder (which is built right into your TV, DVD player or audio receiver) did it for you.
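Where does "nearly twice" come from? Decibels are logarithmic, so a level difference in dB converts to a sound-pressure (amplitude) ratio as 10^(dB/20). Here's a quick check of the 5 dB gap between the two discs. The helper name is mine.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a sound-pressure (amplitude) ratio."""
    return 10 ** (db / 20)


difference_db = -22 - (-27)  # 'The Fall Guy' versus 'Mega Shark'
print(difference_db, round(db_to_amplitude_ratio(difference_db), 2))  # 5 1.78
```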

It's important to note that this is just a wholesale shift in the overall level of the program. Old analog compression systems (which I'll discuss in more detail in PART THREE) actually changed the characteristics of the original sound to better fit the limitations of the analog broadcast system. The dialnorm offset doesn't change the quality or character of the original sound, just its level.
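A tiny demonstration of why a wholesale shift leaves the character alone: a constant gain moves every sample by the same number of dB, so the distance between the quietest and loudest moments doesn't budge. The sample values here are made up for illustration.

```python
import math


def level_db(x: float) -> float:
    """Level of a single sample value in dB relative to full scale (1.0)."""
    return 20 * math.log10(abs(x))


def apply_gain(samples, gain_db):
    """Scale every sample by the same factor, like a dialnorm offset."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


# Hypothetical sample values: a quiet line of dialog and a loud explosion peak.
quiet, loud = 0.05, 0.8
shifted = apply_gain([quiet, loud], -4)  # a 4 dB dialnorm-style offset

before = level_db(loud) - level_db(quiet)
after = level_db(shifted[1]) - level_db(shifted[0])
print(round(before, 2), round(after, 2))  # 24.08 24.08
```

A compressor, by contrast, would apply different gain to the quiet and loud parts, and that gap would shrink.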

Now apply this same concept to a digital broadcast television system. Digital television uses AC-3 as the audio carrier, complete with a dialnorm metadata parameter. As you switch between channels, watch different programs or sit through commercial breaks, your Dolby decoder would receive metadata from each audio stream and use the dialnorm parameter to automatically 'normalize' the level of dialog between different programs, allowing you to munch away on your Cool Ranch Doritos in aural bliss.
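Here's the channel-surfing scenario as a sketch, assuming every segment is flagged correctly (dialnorm equals its measured dialog level). The lineup and its levels are invented for illustration. The decoder re-reads dialnorm as each stream arrives, and the dialog always comes out at -31.

```python
# Hypothetical broadcast lineup: (segment, measured dialog level in LKFS).
lineup = [("program", -24), ("commercial", -16), ("commercial", -20), ("program", -24)]


def playback_level(measured: int, dialnorm: int) -> int:
    """Dialog level after the decoder applies its -31 - dialnorm dB offset."""
    return measured + (-31 - dialnorm)


for name, measured in lineup:
    # Correct flagging: the dialnorm value matches the measured dialog level.
    print(name, playback_level(measured, dialnorm=measured))  # always -31
```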

If you were to scroll back up this page and re-read the passages above, you'd notice an important caveat repeated over and over again: "if flagged correctly." This is how the system would ideally work, so long as the dialnorm parameter was set to correctly reflect the actual level of dialog in the soundtrack.

Unfortunately, for roughly the first two decades of Dolby Digital, this parameter was rarely (never) set correctly. The default dialnorm setting on the industry-standard professional AC-3 encoder was -27, and I would bet you a diddled-eyed Joe to a damned-if-I-know that any DVD you pull off your shelf and run through a Dolby decoder would show it was flagged with dialnorm at -27, regardless of the actual average dialog level. Every soundtrack was just encoded with the default setting because very few people really knew what dialnorm did, and those who did weren't real clear on what scale to use to measure the level.
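Here's why the default flag breaks everything. If every stream says -27, the decoder applies the same -4 dB to all of them, and the original level differences pass straight through to your living room. This sketch compares the spread of playback dialog levels under correct flags and under the default -27. The lineup levels are hypothetical.

```python
# Hypothetical measured dialog levels (LKFS) for a few segments.
lineup = {"program": -24, "loud commercial": -16, "quiet promo": -28}
DEFAULT_DIALNORM = -27  # the encoder default everyone left alone


def playback_level(measured: int, dialnorm: int) -> int:
    """Dialog level after the decoder applies its -31 - dialnorm dB offset."""
    return measured + (-31 - dialnorm)


def spread(levels: dict) -> int:
    """Difference in dB between the loudest and quietest segment."""
    return max(levels.values()) - min(levels.values())


correct = {name: playback_level(m, m) for name, m in lineup.items()}
default = {name: playback_level(m, DEFAULT_DIALNORM) for name, m in lineup.items()}
print(spread(correct), spread(default))  # 0 12
```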

The system was in place but nobody was using it.

And therein lies the beauty of the CALM Act, A/85 and A/53: Part 5 we started with above: These documents mandate not only the usage of the dialnorm parameter:

"The value of the dialnorm parameter in the AC-3 elementary bit stream shall indicate the loudness of the encoded audio content (typically of the average spoken dialogue) using LKFS units."

but they also set forth a specific and standardized way to measure the audio (the BS.1770 algorithm and the LKFS scale we talked about in PART ONE). Now everyone is comparing apples to apples, and broadcasters are legally required to accurately include that data in their broadcasts.
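Connecting the measurement to the metadata might look something like this sketch. It takes a BS.1770 reading, rounds it to whole dB and clamps it into the range dialnorm can express. It also shows the unsigned 5-bit code that would sit in the stream. The rounding and clamping policy is my own assumption, not something taken from A/85, and the function names are hypothetical.

```python
def lkfs_to_dialnorm(measured_lkfs: float) -> int:
    """Round a loudness reading to whole dB and clamp it to dialnorm's -31..-1 range.

    Assumption: content quieter than -31 or louder than -1 is simply clamped.
    Note Python's round() uses round-half-to-even on exact .5 values.
    """
    return max(-31, min(-1, round(measured_lkfs)))


def dialnorm_code(dialnorm: int) -> int:
    """The unsigned 5-bit value carried in the stream (1..31 for -1..-31 dB)."""
    return -dialnorm


value = lkfs_to_dialnorm(-23.6)
print(value, dialnorm_code(value))  # -24 24
```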

So now if Stanley Steemer wants to make a commercial that is twice as loud as the episode of Toddlers and Tiaras you're trying to enjoy, the now legally mandated metadata measurement derived using the now industry standardized method of BS.1770 would tell your decoder all about it and your decoder would turn old Stanley down to match the level of the program.  Boom, commercial advertisement loudness mitigated.  And all it took was a federal law.  Thanks Congress! 

So the world is safe, the bad guy got his comeuppance and we go home with the prom queen, right?

Not so fast, little shaver. Having accurate metadata anchored around an average level of dialog, and letting the decoder automatically normalize dialog level between different audio streams, is an amazing improvement for the listener. However, averages rarely tell the whole story. While the yearly average temperature of my hometown might be 50 degrees, that doesn't necessarily mean you wanna be outside in a light jacket in the middle of February. There is still the question of the dynamics of modern audio material (how much louder and how much quieter the sound gets above and below that measured average).
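The February-jacket problem in code: two toy signals with exactly the same average (RMS) level but wildly different peaks. These are raw sample values, not K-weighted loudness, so treat this as an illustration of the idea rather than a BS.1770 measurement.

```python
import math


def rms_db(samples):
    """Average (RMS) level in dB relative to full scale."""
    return 10 * math.log10(sum(s * s for s in samples) / len(samples))


def peak_db(samples):
    """Peak level in dB relative to full scale."""
    return 20 * math.log10(max(abs(s) for s in samples))


steady = [0.25] * 16          # a constant, even level
dynamic = [0.0] * 15 + [1.0]  # near silence with one full-scale burst

print(round(rms_db(steady), 2), round(rms_db(dynamic), 2))    # same average
print(round(peak_db(steady), 2), round(peak_db(dynamic), 2))  # very different peaks
```

Both signals average about -12 dB, but the dynamic one hits full scale, about 12 dB above the steady one's peak.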

So join me again in PART THREE as we discuss the damned-if-you-do, damned-if-you-don't world of dynamics in audio mixing and the tools that exist to allow the listeners to determine their own individual sonic destinies.