APTO® Processing

LA-5300 units equipped with the Dolby Digital Plus Encoder for a second program source offer a choice of loudness control options including Dolby’s own Real Time Loudness Leveler (RTLL) and Linear Acoustic® APTO processing.

There are no dedicated controls when using the RTLL option, but APTO is a bit more complex and requires some additional explanation.

How Does APTO Work?

In very basic terms, APTO first measures and analyzes the loudness of the incoming audio. In its first processing stage, the “Dynamic Range” stage, it applies realtime loudness control to reduce the overall dynamic range and bring levels within the user-defined comfort zone. The processed audio is then scaled in the second processing stage, the “Compliance” stage, to achieve an average output level that matches the desired target loudness value.

Basic APTO Controls

The individual controls within each APTO factory profile have been adjusted in such a way that they automatically deliver the required result associated with the profile name, such as “EBU R128 Adaptive” for use in most European regions and “ATSC A/85” for CALM-compliance in the U.S. However, there may be situations in which you want to adjust certain individual parameters to customize or fine tune the audio.

Current Profile

The currently selected profile is shown in the Profile window (1J), which also serves as the dropdown menu for selecting a profile.

Target Loudness

The Target Loudness control (1A) sets the desired average loudness level of the output signal in either LUFS or LKFS, depending upon the profile. Some profiles, such as EBU R128, will measure loudness according to overall program levels while others, such as ATSC A/85, will do so based on dialog and gated speech measurements.

Adaptation

The Adaptation control (1B) determines how much processing is applied to the incoming signal in the Dynamic Range processing stage and, in combination with various individual controls, determines the dynamic range of the output audio.

The ideal amount of Adaptation depends upon both the source content and on the destination platform. Content that has been pre-analyzed for loudness, scaled, and normalized in the file domain will require less realtime processing than, say, live sports, which can have rather unpredictable audio levels. Programming streamed to a mobile device or expected to be heard on lower quality earbuds will benefit from more Adaptation than the same content destined for a home cinema presentation.

Ideally, you want enough Adaptation to achieve compliance and keep levels within the viewer’s comfort zone and at a stable average level, but not so much that the audio sounds unnatural or over-processed. APTO is based on a psychoacoustic model that takes into account human hearing and perceived loudness and remains very natural-sounding even when extensive processing is applied, but it is generally advisable to keep the amount of Adaptation under 50% when possible. If incoming content is so poorly controlled as to require higher values, it may be necessary to adjust some of the individual controls and create a custom profile to address this scenario.

Bypass

Clicking on the Bypass control (1C) bypasses APTO so that the input audio is passed through to the output without being processed. This is useful for a quick comparison between the unprocessed and processed audio.

Reset

Clicking on the Reset button (1D) resets the loudness measurements as well as the gain buffers APTO uses in the normalization stage. Resetting at the start of each individual program element provides accurate per-segment loudness measurements, aids in achieving overall compliance, and ensures that adaptive processing decisions are made based upon the current program dynamics. A GPIO input may be used to trigger the reset automatically.

Maximum True Peak Limiter and Limiter Threshold

The True Peak Limiter control (1E) enables and disables the True Peak Limiter, which is the final processing stage just ahead of the output. The Maximum True Peak value (1F) sets the level beyond which the True Peak Limiter engages and attenuates the processed audio so as not to exceed the set level. These controls comply with the True Peak measurement as outlined in ITU-R BS.1770-4 Annex 2.

Adaptive Input Detection

When enabled, the Adaptive Input Detection control (1G) dynamically adapts the amount of processing occurring in the Dynamic Range stage of processing depending upon the actual measured average level at the input. The degree to which the actual input levels influence the processing versus relying upon the value of the Average Input Level control (found in the Advanced menu) is determined by the Adaptive Input Percentage control (also located in the Advanced menu).

When Adaptive Input Detection is disabled, Dynamic Range processing decisions are made based strictly upon the value set in the Average Input Level (found in the Advanced menu) and in accordance with the settings of other parameters and controls.

Adaptive Input Detection is especially useful when source audio levels are unknown or are likely to vary widely, as it allows the Dynamic Range processing stage to respond more predictively to the actual incoming content.

If the incoming content has already been analyzed and loudness-corrected in the file domain, less overall processing is required and a more natural-sounding output can be achieved by setting the Average Input Level to the same value as the target level used during file-based correction and reducing the value of the Adaptive Input Percentage control (also found in the Advanced menu) or disabling Adaptive Input Detection altogether.

Dialog Normalization

The Dialog Normalization control (1H) enables dialog detection and measurement. When enabled, APTO’s Compliance processing stage uses long-term speech-only measurements rather than the overall input program loudness to ensure the output target level is achieved.

The use of dialog-based measurement is normally determined by regional regulations. For example, ATSC A/85 relies upon anchor-based normalization, specifically dialog, while EBU R128 recommends the use of overall measurements. This is reflected in their respective factory profiles.

Average Hold

When the Average Hold control (1I) is enabled and the output audio of the Dynamic Range processing stage falls below the level set in the Average Hold Threshold control (located in the Advanced menu), APTO’s Compliance stage of processing becomes inactive until the level once again rises above the threshold. This helps prevent noise and low-level background audio from being increased unnecessarily.

Figure 1 - Basic APTO controls


Advanced APTO Controls

As we’ve mentioned before, most users and applications will be well served by simply selecting the appropriate factory profile according to the applicable compliance regulations. At most, some adjustments to the Basic controls may be necessary.

Unless there are extenuating circumstances or the need to solve a particular problem with highly difficult programming, adjusting the Advanced parameters described below is rarely necessary.

Important! Making arbitrary adjustments without a firm understanding of how these controls work, relate to the Basic APTO controls, and interact with the complex processing algorithms can cause more harm than good and result in audio that is either unpleasing, non-compliant, or both.

That said, they are available for those who need them.

Audio High Pass Filter Cutoff Frequency

The High Pass Filter control (2A) sets the cutoff frequency for the high pass filter applied to the input signal. No audio below this frequency will make it through to the APTO processing engine. In most cases, this control should be set to “Off”, but in some instances, it may be beneficial to filter out any sub-audible frequencies (below 20Hz).

Voice-based content with no low frequency information can benefit from a higher setting (60Hz, for example). Delivery platforms with limited headroom (such as in-flight entertainment or mobile streaming) and consumer devices with small speakers (mobile phones, tablets, or earbuds) can also benefit from a higher setting to prevent distortion.

Average Hold Threshold

When the Average Hold control (found in the APTO basic menu) is enabled, the Average Hold Threshold control (2B) determines the point at which the hold is triggered in the Compliance processing stage to prevent increasing noise or background audio.

The value at which the control is set is relative to the Target level. For example, if the Target level is set to -23dB LUFS and the desired level at which the hold engages in the Compliance processing stage is -33dB, then the Average Hold Threshold should be set to -10dB.
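The relationship between the Target level, the absolute hold level, and the relative threshold can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical and are not part of any LA-5300 interface, and the values used are arbitrary.

```python
def hold_threshold_db(target_lufs: float, hold_level_lufs: float) -> float:
    """Relative Average Hold Threshold for a desired absolute hold level."""
    return hold_level_lufs - target_lufs


def compliance_is_held(stage_level_lufs: float, target_lufs: float,
                       threshold_db: float) -> bool:
    """True when the Dynamic Range stage output is below the absolute hold level."""
    return stage_level_lufs < target_lufs + threshold_db


# Hypothetical values: a -24 target with the hold engaging at -36.
threshold = hold_threshold_db(target_lufs=-24.0, hold_level_lufs=-36.0)
print(threshold)                                     # -12.0
print(compliance_is_held(-40.0, -24.0, threshold))   # True  (below -36)
print(compliance_is_held(-30.0, -24.0, threshold))   # False (above -36)
```

Because the threshold is relative, changing the Target level moves the absolute hold level by the same amount unless the threshold is adjusted to compensate.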

Minimum Speech Duration

Proper dialog-based loudness processing requires the accurate detection and measurement of speech. When Dialog Normalization is enabled, the Minimum Speech Duration control (2C) determines how many seconds of continuous speech are required for the measurements to be considered reliable and valid, and after which the Compliance processing stage adaptively switches to dialog normalization mode.

A minimum of 5 seconds is recommended. Shorter values may incorrectly factor in non-dialog material, and significantly longer values may cause unnecessary delays in engaging dialog normalization processing, especially in content with shorter segments of continuous dialog.
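The idea of requiring a minimum run of continuous speech can be sketched as follows. This is not APTO's detection algorithm. It assumes a hypothetical speech detector that produces one true/false flag per measurement interval, and the 0.5 second interval is an assumption made for the example.

```python
import math


def dialog_mode_engaged(speech_flags, min_speech_s=5.0, interval_s=0.5):
    """Return True once an unbroken run of speech reaches min_speech_s.

    speech_flags: iterable of booleans, one per measurement interval
    (hypothetical detector output). interval_s is an assumed spacing.
    """
    needed = math.ceil(min_speech_s / interval_s)
    run = 0
    for is_speech in speech_flags:
        run = run + 1 if is_speech else 0  # any gap restarts the count
        if run >= needed:
            return True
    return False


# Ten consecutive half-second speech flags equal 5 seconds of speech.
print(dialog_mode_engaged([True] * 10))                    # True
# A single gap after 4.5 seconds means the run must start over.
print(dialog_mode_engaged([True] * 9 + [False] + [True]))  # False
```

The second case shows why longer settings can delay dialog normalization in content with short speech segments: every interruption restarts the count.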

Dialog Compliance Measurements

The Dialog Compliance Measurements control (2D) determines how many speech measurements are used to perform dialog normalization in the Compliance stage of processing. Because the algorithm samples the measurements at 0.5 second (one half of one second) intervals, this control should be set to a time value that is twice the desired average. For example, to base normalization on a 60 second average, the control should be set to a value of 120.

Note - The Dialog Compliance Measurements setting only applies to profiles that use Dialog Normalization. Profiles that do not use Dialog Normalization rely upon the setting of the Compliance Window control to determine their time window.
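The conversion from a desired averaging time to a Dialog Compliance Measurements value follows directly from the 0.5 second sampling interval described above. The helper below is a hypothetical convenience function, not part of the product:

```python
MEASUREMENT_INTERVAL_S = 0.5  # sampling interval stated in this section


def dialog_compliance_measurements(average_seconds: float) -> int:
    """Number of measurements needed to span the desired averaging time."""
    return round(average_seconds / MEASUREMENT_INTERVAL_S)


print(dialog_compliance_measurements(60))  # 120
print(dialog_compliance_measurements(90))  # 180
```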

Adaptive Input Percentage

When the Adaptive Input Detection control (located in the Basic section) is enabled, the Dynamic Range processing section of APTO dynamically adapts the amount of processing depending upon the actual measured average level at the input. The degree to which the actual input levels influence the processing versus relying upon the value of the Average Input Level control (2L) is determined by the Adaptive Input Percentage control (2E).

The recommended setting to start is 50% in order to keep the effect of adaptation more consistent.
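One way to picture the effect of this control is as a weighted blend between the measured input level and the fixed Average Input Level. The sketch below assumes a simple linear blend. This manual does not state the actual weighting APTO uses, so treat the formula and the function name as assumptions made purely for illustration.

```python
def effective_input_level(measured_lufs: float,
                          average_input_level_lufs: float,
                          adaptive_pct: float) -> float:
    """Blend measured and configured input levels (assumed linear weighting)."""
    if not 0.0 <= adaptive_pct <= 100.0:
        raise ValueError("adaptive_pct must be between 0 and 100")
    weight = adaptive_pct / 100.0
    return weight * measured_lufs + (1.0 - weight) * average_input_level_lufs


# Measured input at -18, Average Input Level set to -24.
print(effective_input_level(-18.0, -24.0, 50.0))   # -21.0 (halfway)
print(effective_input_level(-18.0, -24.0, 0.0))    # -24.0 (configured value only)
print(effective_input_level(-18.0, -24.0, 100.0))  # -18.0 (measured value only)
```

Under this assumption, 50% gives the measured and configured values equal influence.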

Foreground Sounds Coefficient

All programming has what is sometimes referred to as an “anchor element,” that is, the audio content to which the viewer will pay the most attention. This is typically (though not always) dialog. This is also referred to as “foreground” audio to differentiate it from background audio, or in some cases, noise.

The Foreground Sounds Coefficient control (2F) sets the level in the Dynamic Range processing stage at which program audio is no longer considered a foreground sound relative to both the lower border as set by the Null Area Coefficient control (2G, described in detail below) and a minus infinite level (full silence) on a proportional scale.

For example, if the Target level is set to -23dB LUFS, the Null Area Coefficient is set to 4, and the Foreground Sounds Coefficient is set to 2, any audio between -27 and -25dB will be deemed foreground audio and therefore be raised toward the target.

Audio at levels lower than -27dB will still be increased toward the target, but the degree to which gain is increased slows down considerably. The lower the audio is from -27dB, the less the gain will increase.
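The boundaries in the example above can be reproduced with a short Python sketch. It assumes, based only on that single worked example, that the Null Area Coefficient is the total width of a window centered on the Target and that the Foreground Sounds Coefficient extends the foreground zone below the window's lower border by that many dB. The proportional scaling toward silence is not modeled, and the function name is hypothetical.

```python
def classify_level(level_db: float, target_db: float = -23.0,
                   null_area_db: float = 4.0,
                   foreground_db: float = 2.0) -> str:
    """Place a level into the zones implied by the worked example."""
    null_low = target_db - null_area_db / 2.0    # -25 in the example
    null_high = target_db + null_area_db / 2.0   # -21 in the example
    foreground_low = null_low - foreground_db    # -27 in the example
    if level_db > null_high:
        return "above null area"
    if level_db >= null_low:
        return "null area"
    if level_db >= foreground_low:
        return "foreground"
    return "below foreground"


print(classify_level(-24.0))  # null area
print(classify_level(-26.0))  # foreground
print(classify_level(-30.0))  # below foreground
```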

Null Area Coefficient

The primary goal of APTO processing is to deliver a consistent average output level as set by the Target Level control. This does not mean, however, that the actual output audio level must never deviate from this value. In fact, a certain amount of dynamic range helps preserve the artistic integrity of the original programming and makes for a more engaging audio experience for the viewer.

One of the things that makes APTO different from traditional processing is its ability to “do nothing” to audio levels when no action is required to maintain the correct average output level. This avoids the “busy” sound of traditional compressors and AGCs, which by their very nature are always operating either over or under a threshold and are therefore always increasing or decreasing gain, often for no good reason.

The Null Area Coefficient control (2G) sets the lower and upper thresholds that together determine the size of the window in which APTO’s Dynamic Range processing stage neither increases nor decreases gain, with the user-determined Target level sitting in the middle of the range. Values are in dB, with larger values resulting in a larger “do nothing” window.

Compliance Speed

How quickly gain changes are made in the Compliance processing stage is largely program dependent. However, the maximum rate at which the gain can increase or decrease is set by the Compliance Speed control (2H). The rate is calibrated in dB (or LU) per second.

Higher values (from 2 – 6 LU per second) allow compliance to be achieved more quickly but may introduce audible gain changes to the average program level in the process. These faster settings are best suited for situations when input levels are expected to be very inconsistent.

Lower values (from 0.2 to 2 LU per second) provide a subtler and more natural-sounding normalization but may result in gain changes that are too slow and allow the audio to remain outside of the comfort zone for too long. Slower settings work well for content that is more consistent and well-controlled. They are also recommended for long-form content such as feature films or classical music.
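The trade-off between faster and slower settings can be illustrated with a simple rate limiter. This is a generic sketch of a maximum-rate constraint, not APTO's actual gain logic, and the function names are hypothetical.

```python
def step_gain(current_db: float, desired_db: float,
              speed_lu_per_s: float, dt_s: float) -> float:
    """Move gain toward desired_db, limited to speed_lu_per_s."""
    max_step = speed_lu_per_s * dt_s
    delta = desired_db - current_db
    delta = max(-max_step, min(max_step, delta))  # limit the change per step
    return current_db + delta


def seconds_to_reach(current_db: float, desired_db: float,
                     speed_lu_per_s: float) -> float:
    """Shortest time in which the full gain change can complete."""
    return abs(desired_db - current_db) / speed_lu_per_s


# A 6 LU reduction is needed; half a second passes at 2 LU per second.
print(step_gain(0.0, -6.0, 2.0, 0.5))    # -1.0
print(seconds_to_reach(0.0, -6.0, 2.0))  # 3.0
print(seconds_to_reach(0.0, -6.0, 0.5))  # 12.0
```

At 2 LU per second a 6 LU correction can complete in 3 seconds, while at 0.5 LU per second the same correction needs at least 12 seconds, during which the audio may sit outside the comfort zone.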

Average Maximum Gain

The Average Maximum Gain control (2I) sets the maximum amount of positive gain (gain increase) applied in the Compliance processing stage in order to reach the Target level.

Larger values will allow very soft program segments to be raised by a greater amount, but whether or not this is desirable must be considered. For instance, some material may have been purposefully kept at a lower level for dramatic effect, and of course there is always the risk of increasing unwanted background noise.

Another unwanted byproduct of setting this control too high, especially with profiles that use a lower Adaptation value, is that it can result in APTO taking too long to lower levels when loud content immediately follows soft content, such as when a loud commercial follows a quiet passage from a TV drama, as it will take longer to reduce a greater amount of gain. For these reasons, it is advisable to set the Average Maximum Gain to no more than 2 LU.

One way to enhance the efficiency of the processing across programs is to reset APTO at the transition point, which will in turn reset the amount of average gain or attenuation being applied and allow each individual program element to be optimally processed from the beginning.

Average Maximum Attenuation

The Average Maximum Attenuation control (2J) sets the maximum amount of negative gain (gain reduction) applied in the Compliance processing stage in order to reach the Target level.

Larger values will allow loud program segments to be reduced by a greater amount. However, setting this control too high (again, especially with profiles that use a lower Adaptation value) can cause APTO to take too long to raise levels when soft content immediately follows something loud, such as when transitioning from a loud commercial back to a quiet TV drama, as it will take longer to recover from a greater amount of gain reduction.

Just as with the Average Maximum Gain control, resetting APTO at the start of each program segment can help minimize such issues and optimize overall processing for each individual program segment.
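Taken together, the Average Maximum Gain and Average Maximum Attenuation controls bound the gain the Compliance stage may apply. A minimal sketch of that bounding follows. It assumes both limits are entered as positive values in LU, and it uses a hypothetical function name.

```python
def clamp_compliance_gain(requested_db: float, max_gain_db: float,
                          max_attenuation_db: float) -> float:
    """Limit a requested gain to [-max_attenuation_db, +max_gain_db]."""
    return max(-max_attenuation_db, min(max_gain_db, requested_db))


# Limits: at most 2 LU of boost and 6 LU of cut (illustrative values).
print(clamp_compliance_gain(5.0, 2.0, 6.0))   # 2.0  (boost capped)
print(clamp_compliance_gain(-9.0, 2.0, 6.0))  # -6.0 (cut capped)
print(clamp_compliance_gain(1.0, 2.0, 6.0))   # 1.0  (within limits)
```

Tighter limits also shorten the worst-case recovery time at a transition, since there is less accumulated gain or attenuation to unwind at the rate allowed by the Compliance Speed control.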

Compliance Window

The Compliance Window control (2K) adjusts the size of the sliding time window used in the Compliance processing stage to align the output program to the target level. It is similar to the rolling integration time on an LKFS/LUFS loudness meter.
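The behavior of a sliding measurement window can be sketched as below. This is a simplified model rather than APTO's implementation: it averages hypothetical short-interval loudness readings in the power domain, omits the gating used by standard loudness meters, and assumes a 0.5 second update interval for the example.

```python
import math
from collections import deque


class SlidingLoudness:
    """Power-domain average of the most recent loudness readings."""

    def __init__(self, window_s: float, interval_s: float = 0.5):
        # The deque discards the oldest reading once the window is full.
        self._values = deque(maxlen=max(1, round(window_s / interval_s)))

    def push(self, loudness_lufs: float) -> float:
        """Add one reading and return the current windowed average."""
        self._values.append(loudness_lufs)
        mean_power = sum(10 ** (v / 10) for v in self._values) / len(self._values)
        return 10 * math.log10(mean_power)


# A 1 second window holds two readings at the assumed interval.
meter = SlidingLoudness(window_s=1.0)
for reading in (-20.0, -20.0, -30.0, -30.0):
    average = meter.push(reading)
print(round(average, 2))  # -30.0, the louder readings have left the window
```

A short window forgets earlier material quickly, which suits short program elements. A long window changes more slowly and gives a steadier long-term average.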

Note - The Compliance Window setting only applies to profiles that do not use Dialog Normalization. Profiles that do use Dialog Normalization rely upon the setting of the Dialog Compliance Measurements control to determine their time window.

Many factors can influence whether or not content at the output is compliant. These include the level consistency of the incoming audio, program duration, the difference between the average input level and the Target level, the amount of Adaptation employed, and the settings of many of the controls listed in this chapter.

That said, a smaller compliance window value more easily delivers a compliant output for shorter program durations, but can produce more variations in long-term programming.

On the other hand, a larger compliance window value will allow for a smoother and more consistent average overall for longer program durations, but can reduce the effectiveness of short-term normalization, especially program elements shorter in duration than the compliance window.

For short-term programs, a value in the range of 20 to 30 seconds is recommended. Larger values, from 60 to 120 seconds, are recommended for mid- to long-form content. For profiles which rely upon Dialog Normalization, a setting between 120 and 180 seconds is highly recommended.

Average Input Level

The value of the Average Input Level control (2L) should be set to match the average level of the input audio as it serves as the mid-point value of the comfort zone defined by the Null Area Coefficient control.

For content that has been previously analyzed and loudness corrected in the file domain, this value will be easy to determine.

For live programming, unprocessed material, or in cases where programming is of different genres from different decades, this becomes more challenging. For example, cinematic content is typically mixed to -31/-27 LUFS, while music productions can be as loud on average as -12/-8 LUFS.

In these cases where the average input level is not easy to predict, it is best to enter a value that matches the output target loudness value, enable Adaptive Input Detection, and take advantage of APTO’s intelligent dynamics processing.

Figure 2 - Advanced APTO controls
