
Introduction to APTO Processing

But First, a Word on Television Loudness

The introduction of compliance regulations and overall increased awareness of television loudness control has generally been successful in taming the wild loudness shifts that startled and annoyed viewers in the earlier days of digital television.

Real-time processing is ubiquitous and effective, content creators now have a better understanding of how viewers watch (and listen to) their programs, and broadcasters have the necessary tools at their disposal to analyze and, if necessary, correct pre-recorded programs upon ingest in the file domain.

That being said, some misconceptions about loudness control and regulations are still repeated as fact, and when acted upon, they nearly always work to the detriment of the audio.

Speaking in the most general terms, most loudness regulations specify an average audio level for each program segment. Some are based on the loudness of the entire mix, while others are based on only an anchor element such as dialogue.

In either case, the regulations do not dictate that the audio be devoid of dynamic range nor do they mandate that levels constantly stay at the target value so that your LKFS/LUFS meter looks more like an FM modulation monitor at a major market radio station. Soft passages are allowed to stay soft and loud passages are permitted to pass through generally unaltered, so long as the average over the duration of the program segment is maintained and, where applicable, True Peak values are not exceeded.

Just like audio can be over-processed, it can also be under-processed. Trying to watch a sporting event in a noisy bar or restaurant (or even the average living room) would be frustrating if presented with the amount of dynamic range appropriate for a cinematic presentation in a high-end, multi-channel home theater.

The proper balance is found in what is commonly called the “comfort zone,” that is, the range in which the viewer never strains to hear lower-level programming yet is never annoyed by excessively loud levels. This is sometimes described as the situation in which the viewer never feels the need to reach for the volume control on their television, mobile phone, or tablet.

What Is APTO?

The audio processing within ARC is performed by Linear Acoustic® APTO™, our latest and most advanced adaptive loudness control algorithm to date.

APTO ensures that ARC can deliver audio that is compliant with various television broadcast regulatory loudness standards including EBU R128, ATSC A/85, FreeTV OP59, ARIB TR-B32, and AGCOM 219/09/CSP as well as the AES-TD1006 streaming standard.

It also includes profiles for other non-broadcast delivery platforms and listening environments such as gaming, movies, earphones, and in-flight entertainment.

APTO enhances the listening experience by providing consistent audio levels within a user-defined “comfort zone”, thus eliminating listener annoyance due to sudden loudness shifts. It also improves dialogue intelligibility, addressing one of the most common complaints with television audio. Best of all, it does so without audibly affecting the sound quality and artistic intent of the original content.

What Makes APTO Different?

Traditional real-time television processing normally employs a series of wideband and/or multiband compressors or AGCs that react to changing input levels by either increasing or decreasing gain in an effort to provide a more consistent output level. The threshold, ratio, attack, and release settings can be adjusted to help determine the amount of dynamic range present at the output, or, put another way, how close to the desired output level the audio remains at any given time. A final look-ahead limiter is typically employed for peak control.

One potential downside to this type of processing is that unless the audio falls below a specified gate threshold – the point at which low-level audio isn’t increased so as not to bring up background noise – the gain is always changing, whether it needs to or not. Furthermore, multiband processing by design re-balances the spectral mix of the program. This can be advantageous for helping to fix less-than-ideal mixes or if a spectrally consistent output is a priority, but it can also affect the artistic intent of well-mixed content.

In contrast, APTO focuses on achieving the following goals:

Ensuring that foreground sounds – particularly dialog – remain intelligible at all times
Allowing the user to define a comfort zone within which no additional audio processing is applied, providing a more natural sense of dynamics
Maintaining the spectral balance of the original program material to preserve the artistic intent
Achieving and maintaining an output target level that is in compliance with global loudness regulations and is optimized for distribution platforms (such as streaming and on-demand services) and for specific devices and listening environments (including mobile phones and tablets)

How Does APTO Work?

In very basic terms, APTO first measures and analyzes the loudness of the incoming audio. In its first processing stage, the “Dynamic Range” stage, it applies real-time loudness control to reduce the overall dynamic range and bring levels within the user-defined comfort zone. This processed audio is then scaled in the second processing stage, the “Compliance” stage, to achieve an average output level that matches the desired target loudness value.
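To make the two-stage structure concrete, here is a toy sketch in Python. It is not APTO’s algorithm, which relies on a psychoacoustic model and time-varying processing; the function names, the per-block dB levels, and the simple “pull toward the comfort zone” rule are all hypothetical. Stage 1 narrows the spread of levels that fall outside a comfort zone, and stage 2 applies one static offset so that the power-averaged level lands on the target.

```python
import math


def power_mean_db(levels_db):
    """Average dB levels in the power domain, the way loudness meters do."""
    mean_power = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


def dynamic_range_stage(levels_db, center_db, zone_width_db, amount):
    """Toy stage 1 (hypothetical): pull levels outside the comfort zone toward
    its nearest edge. `amount` (0..1) is the fraction of the excursion removed."""
    lower = center_db - zone_width_db / 2
    upper = center_db + zone_width_db / 2
    out = []
    for level in levels_db:
        if level > upper:
            level = upper + (level - upper) * (1 - amount)
        elif level < lower:
            level = lower + (level - lower) * (1 - amount)
        # Levels inside the zone pass through unchanged
        out.append(level)
    return out


def compliance_stage(levels_db, target_db):
    """Toy stage 2 (hypothetical): one static offset so the power mean hits the target."""
    offset = target_db - power_mean_db(levels_db)
    return [level + offset for level in levels_db]
```

Running the blocks -30, -18, -24, -40, and -20 through a zone of -26 to -22 with an amount of 0.5 narrows their spread from 22 dB to 13 dB before stage 2 shifts the whole set so that it averages -23.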

As mentioned previously, ARC includes a host of factory profiles for various loudness standards and deliverable platforms, but individual controls to fine-tune both processing stages are also brought out to the user interface and are described in greater detail later in the sections on Basic and Advanced APTO processing.

Basic APTO Processing

Choosing an APTO Profile

In most cases, ARC is truly a “set and forget” processor and simply choosing a factory profile is all that is required.

To select a profile, navigate to the Program 1 (or Program 2) menu (A), click the Load button in the APTO Loudness Control section (B), mouse over Factory Presets (C), then choose an appropriate profile from the Factory Preset List (D).

Basic APTO Controls

The individual controls within each APTO factory profile have been adjusted in such a way that they automatically deliver the required result associated with the profile name (such as “EBU R128 Adaptive”). However, there may be situations in which you want to adjust certain individual parameters to customize or fine-tune the audio.

The APTO Loudness Control sections of the Program 1 and Program 2 menus contain the basic controls for each program path. The two sections are identical in terms of controls and functionality, but they operate completely independently of one another.

Figure 8-2 – Basic APTO controls

Current Profile

The currently selected profile is displayed in the Profile window (A).

Note: In the current version of ARC, there is no indication that a factory profile has been modified as a result of changing the values or settings of individual controls. This feature will be included in an upcoming software upgrade. In the meantime, we strongly recommend that you save your changes to uniquely-named user profiles as you make adjustments, as described below in the paragraph on the Load and Save buttons.

Load and Save

The Load button (B) is used to recall factory profiles, user profiles, or to import a profile previously saved to your computer. Any changes or modifications to the current profile can be saved by using the cryptically-named Save button (C).

Bypass

Clicking the Bypass control (D) bypasses APTO, so that the input audio is passed through to the output without being processed. This is useful for a quick comparison between the unprocessed and processed audio. When bypass is engaged, the button turns red.

Reset

Clicking on the Reset button (E) resets the loudness measurements as well as the gain buffers APTO uses in the normalization stage. Resetting at the start of each individual program element provides accurate per-segment loudness measurements, aids in achieving overall compliance, and ensures that adaptive processing decisions are made based upon the current program dynamics. A GPIO input may be used to trigger the reset automatically.
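Why resetting matters for per-segment measurement can be shown with a hypothetical meter that accumulates block loudness until `reset()` is called (for example, from a GPIO trigger at a program boundary). The class and method names are illustrative only, and for brevity the meter averages in the power domain without the gating a real BS.1770 meter applies.

```python
import math


class SegmentLoudness:
    """Hypothetical per-segment meter: accumulates block energies until reset()."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Called at a program boundary so the next segment is measured on its own
        self._energy_sum = 0.0
        self._blocks = 0

    def add_block(self, loudness_lufs):
        # Convert to the power domain before accumulating
        self._energy_sum += 10 ** (loudness_lufs / 10)
        self._blocks += 1

    def integrated(self):
        if self._blocks == 0:
            return float("-inf")
        return 10 * math.log10(self._energy_sum / self._blocks)
```

Without the reset, a quiet -23 LUFS segment that follows a loud -15 LUFS segment would be measured as considerably louder than it is, and any decisions based on that measurement would be skewed by the previous program.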

Target Loudness

The Target Loudness control (H) sets the desired average loudness level of the output signal in either LUFS or LKFS, depending upon the profile. Some profiles, such as EBU R128, will measure loudness according to overall program levels while others, such as ATSC A/85, will do so based on dialog and gated speech measurements.
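For the overall-program case, the integrated measurement compared against the target follows ITU-R BS.1770-4, which both EBU R128 and ATSC A/85 reference. It uses a two-step gate: blocks quieter than -70 LUFS are discarded, then blocks more than 10 LU below the average of the remaining blocks are discarded. The sketch below implements only that gating step, assuming you already have the K-weighted, channel-weighted mean-square value of each 400 ms block. It is not ARC’s internal meter, and a dialog-gated measurement (as used by A/85 profiles) would additionally select speech blocks first.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0
RELATIVE_GATE_LU = -10.0


def block_loudness(z):
    """Loudness of one 400 ms block from its channel-weighted mean square z."""
    return -0.691 + 10 * math.log10(z)


def integrated_loudness(block_powers):
    """Gated integrated loudness per ITU-R BS.1770-4.

    block_powers: one value per 400 ms block (75% overlap), each the sum over
    channels of G_i times the mean square of the K-weighted channel signal.
    """
    # Absolute gate: drop blocks at or below -70 LUFS
    stage1 = [z for z in block_powers if z > 0 and block_loudness(z) > ABSOLUTE_GATE_LUFS]
    if not stage1:
        return float("-inf")
    # Relative gate: 10 LU below the loudness of the absolute-gated blocks
    relative_gate = block_loudness(sum(stage1) / len(stage1)) + RELATIVE_GATE_LU
    stage2 = [z for z in stage1 if block_loudness(z) > relative_gate]
    # stage2 is never empty: the loudest block always clears the relative gate
    return block_loudness(sum(stage2) / len(stage2))
```

With blocks at -23, -40, and -80 LUFS, the -80 LUFS blocks fail the absolute gate and the -40 LUFS blocks fail the relative gate, so the result is -23 LUFS.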

To change the target loudness value, click in the Target Loudness field, type in the desired value, and click on the green checkmark to save your change (or the red “X” to exit without saving).

Adaptation

The Adaptation control (G) determines how much processing is applied to the incoming signal in the Dynamic Range processing stage, and, in combination with various individual controls, determines the dynamic range of the output audio.

The ideal amount of Adaptation depends upon both the source content and the destination platform. Content that has been pre-analyzed for loudness, scaled, and normalized in the file domain will require less realtime processing than, say, live sports, which can have rather unpredictable audio levels. Programming streamed to a mobile device or expected to be heard on lower quality earbuds will benefit from more Adaptation than the same content destined for a home cinema presentation.

Ideally, you want enough Adaptation to achieve compliance and keep levels within the viewer’s comfort zone and at a stable average level, but not so much that the audio sounds unnatural or over-processed. APTO is based on a psychoacoustic model that takes into account human hearing and perceived loudness and remains very natural-sounding even when extensive processing is applied, but it is generally advisable to keep the amount of Adaptation under 50% when possible. If incoming content is so poorly controlled as to require higher values, it may be necessary to adjust some of the individual controls and create a custom profile to address this scenario.

Maximum True Peak Limiter and Limiter Threshold

The True Peak Limiter control (J) enables and disables the True Peak Limiter, which is the last processing stage before the output. The Maximum True Peak value (F) sets the level above which the True Peak Limiter engages and attenuates the processed audio so that it does not exceed the set level. These controls comply with the True Peak measurement described in ITU-R BS.1770-4 Annex 2.
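True Peak measurement exists because the highest sample value can underestimate the peak of the reconstructed waveform. BS.1770-4 Annex 2 estimates it by oversampling (4x at 48 kHz) before taking the maximum absolute value. The sketch below follows that idea but substitutes a generic Hann-windowed sinc interpolator for the filter coefficients specified in the Recommendation, so its readings are approximations; the function names are hypothetical.

```python
import math


def hann_windowed_sinc(offset, half_width):
    """Interpolation kernel: sinc(offset) tapered by a Hann window of +/- half_width samples."""
    if abs(offset) >= half_width:
        return 0.0
    sinc = 1.0 if offset == 0 else math.sin(math.pi * offset) / (math.pi * offset)
    window = 0.5 * (1.0 + math.cos(math.pi * offset / half_width))
    return sinc * window


def estimate_true_peak_dbtp(samples, oversample=4, half_width=12):
    """Estimate the true peak of one channel in dBTP by oversampling.

    Between consecutive input samples, oversample - 1 extra points are
    interpolated; samples outside the signal are treated as zero.
    """
    n = len(samples)
    peak = 0.0
    for i in range(n):
        for k in range(oversample):
            if k == 0:
                value = samples[i]  # original sample, no interpolation needed
            else:
                t = i + k / oversample
                first = max(0, math.ceil(t - half_width))
                last = min(n - 1, math.floor(t + half_width))
                value = sum(samples[m] * hann_windowed_sinc(t - m, half_width)
                            for m in range(first, last + 1))
            peak = max(peak, abs(value))
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")
```

A sine at one quarter of the sample rate, phased so that every sample lands at ±0.707, reads about -3 dBFS on a sample-peak meter, while the interpolated estimate comes out close to 0 dBTP.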

Dialog Normalization

The Dialog Normalization control (L) enables dialog detection and measurement. When enabled, APTO’s Compliance processing stage uses long-term speech-only measurements rather than the overall input program loudness to ensure the output target level is achieved.

The use of dialog-based measurement is normally determined by regional regulations. For example, ATSC A/85 relies upon anchor-based normalization, specifically dialog, while EBU R128 recommends the use of overall measurements. This is reflected in their respective factory profiles.

Average Hold

When the Average Hold control (K) is enabled and the output audio of the Dynamic Range processing stage falls below the level set in the Average Hold Threshold control (located in the Advanced menu), APTO’s Compliance stage of processing becomes inactive until the level once again rises above the threshold. This helps prevent noise and low-level background audio from being increased unnecessarily.

Adaptive Input Detection

When enabled, the Adaptive Input Detection control (I) dynamically adapts the amount of processing occurring in the Dynamic Range stage of processing depending upon the actual measured average level at the input. The degree to which the actual input levels influence the processing versus relying upon the value of the Average Input Level control (found in the Advanced menu) is determined by the Adaptive Input Percentage control (also located in the Advanced menu).

When Adaptive Input Detection is disabled, Dynamic Range processing decisions are made based strictly upon the value set in the Average Input Level and in accordance with the settings of other parameters and controls.

Adaptive Input Detection is especially useful when source audio levels are unknown or likely to vary widely, as it allows the Dynamic Range processing stage to respond to the actual incoming content rather than to a fixed assumption about its level.

If the incoming content has already been analyzed and loudness-corrected in the file domain, less overall processing is required and a more natural-sounding output can be achieved by setting the Average Input Level (located in the Advanced menu) to the same value as the target level used during file-based correction and reducing the value of the Adaptive Input Percentage control (also found in the Advanced menu) or disabling Adaptive Input Detection altogether.

Advanced APTO Processing

Advanced APTO Controls

As we’ve mentioned before, most users and applications will be well served by simply selecting the appropriate factory profile according to either compliance regulations or the delivery platform.

At most, some adjustments to the Basic controls may be necessary.

Unless there are extenuating circumstances or the need to solve a particular problem with highly difficult programming, adjusting the Advanced parameters described below is rarely necessary. Making arbitrary adjustments without a firm understanding of how these controls work, relate to the Basic APTO controls, and interact with the complex processing algorithms at work can cause more harm than good and result in audio that is either unpleasing, non-compliant, or both.

That said, they are available for those who need them.

Audio High Pass Filter Cutoff Frequency

The High Pass Filter control (A) sets the cutoff frequency of the high pass filter applied to the input signal. Audio below this frequency is attenuated before it reaches the APTO processing engine. In most cases, this control should be set to “Off”, but in some instances, it may be beneficial to filter out sub-audible frequencies (below 20 Hz).

Voice-based content with no low-frequency information can benefit from a higher setting (60 Hz, for example). Delivery platforms with limited headroom (such as in-flight entertainment or mobile streaming) and consumer devices with small speakers (mobile phones, tablets, or earbuds) can also benefit from a higher setting to prevent distortion.
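The manual does not document the filter type or slope ARC uses. As a generic illustration of what an input high pass filter does, here is a second-order Butterworth-style high-pass biquad built from the widely published RBJ Audio EQ Cookbook formulas. The function name and the 48 kHz default sample rate are assumptions.

```python
import math


def highpass_biquad(samples, cutoff_hz, sample_rate=48000.0, q=1 / math.sqrt(2)):
    """Second-order high-pass filter (RBJ Audio EQ Cookbook coefficients)."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b0 = (1 + cos_w0) / 2
    b1 = -(1 + cos_w0)
    b2 = (1 + cos_w0) / 2
    a0 = 1 + alpha
    a1 = -2 * cos_w0
    a2 = 1 - alpha
    # Normalize so that a0 == 1
    b0, b1, b2, a1, a2 = b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct Form I difference equation
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

With a 60 Hz cutoff, a DC offset decays to zero within a fraction of a second, while a 1 kHz tone passes at essentially unity gain.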

Average Hold Threshold

The Average Hold Threshold control sets the level below which the Compliance processing stage holds its gain when Average Hold is enabled. Its value is relative to the Target level. For example, if the Target level is set to -23 LUFS and the desired level at which the hold engages in the Compliance processing stage is -34 LUFS, then the Average Hold Threshold should be set to -11 dB.
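The relative setting is simply the desired hold level minus the Target level: -34 - (-23) = -11 dB. The sketch below shows that arithmetic together with the behavior described under Average Hold: while the output of the Dynamic Range stage is below Target plus threshold, the Compliance stage keeps its current gain instead of chasing the requested gain. The function names and the “freeze the gain” model are assumptions for illustration.

```python
def hold_threshold_relative(target_lufs, hold_level_lufs):
    """Relative Average Hold Threshold in dB (hypothetical helper)."""
    return hold_level_lufs - target_lufs


def compliance_gain_update(current_gain_db, requested_gain_db, stage1_level_lufs,
                           target_lufs, hold_threshold_db, hold_enabled=True):
    """Hypothetical hold logic: freeze the Compliance gain on low-level audio."""
    if hold_enabled and stage1_level_lufs < target_lufs + hold_threshold_db:
        # Below the hold level: keep the current gain so noise is not raised
        return current_gain_db
    return requested_gain_db
```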

Minimum Speech Duration

Proper dialog-based loudness processing requires the accurate detection and measurement of speech. When Dialog Normalization is enabled, the Minimum Speech Duration control (C) determines how many seconds of continuous speech are required for the measurements to be considered reliable and valid, and after which the Compliance processing stage adaptively switches to dialog normalization mode.

A minimum of 5 seconds is recommended. Shorter values may incorrectly factor in non-dialog material, and significantly longer values may cause unnecessary delays in engaging dialog normalization processing, especially in content with shorter segments of continuous dialog.
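A hypothetical sketch of the minimum-duration rule follows. Speech detections are assumed to arrive at a fixed interval (0.5 seconds here, matching the measurement interval given for Dialog Compliance Measurements), and dialog measurements are treated as valid only once an unbroken run of speech reaches the minimum duration. Any non-speech detection restarts the count. Whether ARC latches dialog mode after that point is not documented, so the sketch simply reports whether the current run qualifies.

```python
class SpeechDurationGate:
    """Hypothetical gate: dialog measurements count only after enough continuous speech."""

    def __init__(self, min_speech_s=5.0, interval_s=0.5):
        # Number of consecutive speech detections needed (5 s / 0.5 s = 10)
        self.required = round(min_speech_s / interval_s)
        self.run = 0

    def update(self, speech_detected):
        # Extend the run on speech; any non-speech detection restarts it
        self.run = self.run + 1 if speech_detected else 0
        return self.run >= self.required
```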

Dialog Compliance Measurements

The Dialog Compliance Measurements control (I) determines how many speech measurements are used to perform dialog normalization in the Compliance stage of processing. Because the algorithm takes a measurement every 0.5 seconds, this control should be set to twice the desired averaging time in seconds. For example, to base normalization on a 60-second average, the control should be set to a value of 120.
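The conversion from a desired averaging time to a measurement count, and a rolling average over that many measurements, can be sketched as follows. The 0.5-second interval comes from the text above; the names and the choice to average in the power domain (as loudness meters do) are assumptions.

```python
import math
from collections import deque

MEASUREMENT_INTERVAL_S = 0.5  # one speech measurement every 0.5 seconds


def measurements_for(average_seconds):
    """Control value needed for a given averaging time (60 s -> 120)."""
    return round(average_seconds / MEASUREMENT_INTERVAL_S)


class DialogAverage:
    """Hypothetical rolling average over the most recent N speech measurements."""

    def __init__(self, count):
        # Oldest measurements drop out automatically once `count` is reached
        self.values = deque(maxlen=count)

    def add(self, speech_lufs):
        self.values.append(10 ** (speech_lufs / 10))

    def loudness(self):
        if not self.values:
            return float("-inf")
        return 10 * math.log10(sum(self.values) / len(self.values))
```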

Note: The Dialog Compliance Measurements setting only applies to profiles that use Dialog Normalization. Profiles that do not use Dialog Normalization rely upon the setting of the Compliance Window control to determine their time window.

Adaptive Input Percentage

When the Adaptive Input Detection control (located in the Basic section) is enabled, the Dynamic Range processing section of APTO dynamically adapts the amount of processing depending upon the actual measured average level at the input. The degree to which the actual input levels influence the processing versus relying upon the value of the Average Input Level control (K) is determined by the Adaptive Input Percentage control (H).

A starting value of 50% is recommended, as it keeps the effect of adaptation more consistent.
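The manual does not specify how the measured level and the Average Input Level are combined. One plausible reading, shown here purely as a hypothetical sketch, is a linear blend in which the Adaptive Input Percentage is the weight given to the measured level. With Adaptive Input Detection disabled, the configured value is used unchanged.

```python
def effective_input_level(measured_lufs, configured_lufs, adaptive_percent, enabled=True):
    """Hypothetical blend of measured and configured Average Input Level."""
    if not enabled:
        # Detection off: decisions rest strictly on the configured value
        return configured_lufs
    weight = adaptive_percent / 100.0
    return weight * measured_lufs + (1.0 - weight) * configured_lufs
```

Under this assumption, a measured -15 LUFS and a configured -23 LUFS at 50% would yield an effective level of -19 LUFS.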

Foreground Sounds Coefficient

All programming has what is sometimes referred to as an “anchor element,” that is, the audio content to which the viewer will pay the most attention. This is typically (though not always) dialog. This is also referred to as “foreground” audio to differentiate it from background audio, or in some cases, noise.

The Foreground Sounds Coefficient control (G) sets, in the Dynamic Range processing stage, the level below which program audio is no longer considered a foreground sound. This level is expressed relative to the lower border of the null area (set by the Null Area Coefficient control, described in detail below), on a proportional scale that extends from that border down to minus infinity (full silence).

For example, if the Target level is set to -23 LUFS, the Null Area Coefficient is set to 4, and the Foreground Sounds Coefficient is set to 2, any audio between -27 and -25 LUFS will be deemed foreground audio and therefore be raised toward the target.

Audio at levels lower than -27 LUFS will still be raised toward the target, but the rate at which gain is increased slows considerably. The further the audio falls below -27 LUFS, the less its gain is increased.

Null Area Coefficient

The primary goal of the APTO processing within ARC is to deliver a consistent average output level as set by the Target Level control. This does not mean, however, that the actual output audio level must never deviate from this value. In fact, a certain amount of dynamic range helps preserve the artistic integrity of the original programming and makes for a more engaging audio experience for the viewer.

One of the things that makes APTO different from traditional processing is its ability to “do nothing” to audio levels when no action is required to maintain the correct average output level. This avoids the “busy” sound of traditional compressors and AGCs, which by their very nature are always operating either above or below a threshold and are therefore always increasing or decreasing gain, often for no good reason.

The Null Area Coefficient control (D) sets the lower and upper thresholds that together determine the size of the window in which APTO’s Dynamic Range processing stage neither increases nor decreases gain, with the user-determined Target level sitting in the middle of the range. Values are in dB, with larger values resulting in a larger “do nothing” window.
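The relationship between the null area and the foreground region can be sketched as a simple classifier. This assumes, consistent with the Foreground Sounds Coefficient example above, that the Null Area Coefficient is the total width of a window centered on the Target level (4 dB giving -25 to -21 LUFS for a -23 LUFS target) and that the Foreground Sounds Coefficient extends the foreground region that many dB below the lower border. The function and region names are hypothetical, and APTO’s actual gain curves are not modeled.

```python
def classify_level(level_lufs, target_lufs, null_area_db, foreground_db):
    """Hypothetical mapping of a short-term level to a Dynamic Range stage region."""
    half = null_area_db / 2
    lower = target_lufs - half
    upper = target_lufs + half
    if level_lufs > upper:
        return "above"       # louder than the null area: gain is reduced
    if level_lufs >= lower:
        return "null"        # inside the null area: no gain change
    if level_lufs >= lower - foreground_db:
        return "foreground"  # just below the null area: raised toward target
    return "background"      # further below: raised, but more slowly
```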

Compliance Speed

How quickly gain changes are made in the Compliance processing stage is largely program dependent. However, the maximum rate at which the gain can increase or decrease is set by the Compliance Speed control (E). The rate is calibrated in dB (or LU) per second.

Higher values (from 2 – 6 LU per second) allow compliance to be achieved more quickly but may introduce audible gain changes to the average program level in the process. These faster settings are best suited for situations when input levels are expected to be very inconsistent.

Lower values (from 0.2 to 2 LU per second) provide a subtler and more natural-sounding normalization but may result in gain changes that are too slow and allow the audio to remain outside of the comfort zone for too long. Slower settings work well for content that is more consistent and well-controlled. They are also recommended for long-form content such as feature films or classical music.
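The Compliance Speed limit behaves like a slew-rate limit on the gain: each update may move the gain toward its desired value by at most the speed multiplied by the elapsed time. A minimal sketch, with a hypothetical function name and a fixed update interval assumed:

```python
def slew_limited_gain(current_db, desired_db, speed_lu_per_s, dt_s):
    """Move the gain toward desired_db by at most speed * dt in one update."""
    max_step = speed_lu_per_s * dt_s
    delta = desired_db - current_db
    # Clamp the change to the allowed rate in either direction
    if delta > max_step:
        delta = max_step
    elif delta < -max_step:
        delta = -max_step
    return current_db + delta
```

At 2 LU per second, a request for 10 dB of attenuation is reached after 5 seconds; at 0.5 LU per second, the same change would take 20 seconds.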

Average Maximum Gain

The Average Maximum Gain control (F) sets the maximum amount of positive gain (gain increase) applied in the Compliance processing stage in order to reach the Target level.

Larger values will allow very soft program segments to be raised by a greater amount, but whether or not this is desirable must be considered. For instance, some material may have been purposefully kept at a lower level for dramatic effect, and of course, there is always the risk of increasing unwanted background noise.

Another unwanted byproduct of setting this control too high - especially with profiles that use a lower Adaptation value – is that it can result in APTO taking too long to lower levels when loud content immediately follows soft content, such as when a loud commercial follows a quiet passage from a TV drama, as it will take longer to reduce a greater amount of gain.

For these reasons, it is advisable to set the Average Maximum Gain to no more than 2 LU.

One way to make processing more effective across programs is to reset APTO at each transition point. This resets the amount of average gain or attenuation being applied and allows each individual program element to be optimally processed from the beginning.

Average Maximum Attenuation

The Average Maximum Attenuation control (L) sets the maximum amount of negative gain (gain reduction) applied in the Compliance processing stage in order to reach the Target level.

Larger values will allow loud program segments to be reduced by a greater amount. However, setting this control too high (again, especially with profiles that use a lower Adaptation value) can cause APTO to take too long to raise levels when soft content immediately follows something loud, such as when transitioning from a loud commercial back to a quiet TV drama, as it will take longer to recover from a greater amount of attenuation.

Just as with the Average Maximum Gain control, resetting APTO at the start of each program segment can help minimize such issues and optimize overall processing for each individual program segment.
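The reasoning behind keeping Average Maximum Gain and Average Maximum Attenuation modest can be worked through numerically. Assuming the Compliance stage moves no faster than the Compliance Speed, the time needed to swing from the current gain to a newly required gain is the distance divided by the speed. For example, at 1 LU per second, moving from +2 LU of gain to the -4 LU a loud commercial requires takes 6 seconds, while starting from +6 LU takes 10 seconds. The helper names below are hypothetical, and attenuation is expressed as a positive number of dB.

```python
def clamp_compliance_gain(gain_db, max_gain_db, max_attenuation_db):
    """Limit the Compliance gain to [-max_attenuation_db, +max_gain_db]."""
    return max(-max_attenuation_db, min(max_gain_db, gain_db))


def seconds_to_reach(current_db, required_db, speed_lu_per_s):
    """Time to move between two gains at a constant maximum rate."""
    return abs(required_db - current_db) / speed_lu_per_s
```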

Compliance Window

The Compliance Window control (J) adjusts the size of the sliding time window used in the Compliance processing stage to align the output program to the target level. It is similar to the rolling integration time on an LKFS/LUFS loudness meter.

Note: The Compliance Window setting only applies to profiles that do not use Dialog Normalization. Profiles that do use Dialog Normalization rely upon the setting of the Dialog Compliance Measurements control to determine their time window.

Many factors can influence whether or not content at the output of ARC is compliant. These include the level consistency of the incoming audio, program duration, the difference between the average input level and the Target level, the amount of Adaptation employed, and the settings of many of the controls listed in this section.

That said, a smaller compliance window value more easily delivers a compliant output for shorter program durations, but can produce more variations in long-term programming.

On the other hand, a larger compliance window value allows a smoother and more consistent overall average for longer program durations, but can reduce the effectiveness of short-term normalization, especially for program elements shorter in duration than the compliance window.

For short-term programs, a value in the range of 20-30 seconds is recommended. Larger values, from 60-120 seconds, are recommended for mid- to long-form content. For profiles that rely upon Dialog Normalization, a setting between 120-180 seconds is highly recommended.
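A sliding compliance window can be pictured like the rolling integration of a loudness meter: keep the most recent readings, average them in the power domain, and compare the result with the Target level. The sketch assumes one short-term loudness reading per second and uses hypothetical class and method names. It shows why a short window reacts quickly (it forgets older material) while a long window yields a steadier average.

```python
import math
from collections import deque


class ComplianceWindow:
    """Hypothetical sliding window over short-term loudness readings."""

    def __init__(self, window_s, interval_s=1.0):
        # Keep only the readings that fit inside the window
        self.powers = deque(maxlen=max(1, round(window_s / interval_s)))

    def add(self, loudness_lufs):
        self.powers.append(10 ** (loudness_lufs / 10))

    def average(self):
        if not self.powers:
            return float("-inf")
        return 10 * math.log10(sum(self.powers) / len(self.powers))

    def required_gain(self, target_lufs):
        # Positive result: raise the program; negative: lower it
        return target_lufs - self.average()
```

After 20 seconds at -30 LUFS followed by 20 seconds at -18 LUFS, a 20-second window has fully forgotten the quiet section and calls for 5 dB of reduction, whereas a 120-second window still averages both sections.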

Average Input Level

The value of the Average Input Level control (K) should be set to match the average level of the input audio as it serves as the mid-point value of the comfort zone defined by the Null Area Coefficient control.

For content that has been previously analyzed and loudness corrected in the file domain, this value will be easy to determine.

For live programming, unprocessed material, or programming that spans different genres and decades, this becomes more challenging. For example, cinematic content is typically mixed to between -31 and -27 LUFS, while music productions can average as loud as -12 to -8 LUFS.

In cases where the average input level is not easy to predict, it is best to enter a value that matches the output target loudness value, enable Adaptive Input Detection, and take advantage of APTO’s intelligent dynamics processing.