Advanced APTO Processing

As we’ve mentioned before, most users and applications will be well served by simply selecting the appropriate factory profile according to either compliance regulations or the delivery platform.

At most, some adjustments to the Basic controls may be necessary.

Unless there are extenuating circumstances or the need to solve a particular problem with highly difficult programming, adjusting the Advanced parameters described below is rarely necessary. Making arbitrary adjustments without a firm understanding of how these controls work, relate to the Basic APTO controls, and interact with the complex processing algorithms at work can cause more harm than good and result in audio that is either unpleasing, non-compliant, or both.

That said, they are available for those who need them.

Figure 1 - Advanced APTO controls

Audio High Pass Filter Cutoff Frequency

The High Pass Filter control (A) sets the cutoff frequency for the high pass filter applied to the input signal. No audio below this frequency will make it through to the APTO processing engine. In most cases, this control should be set to “Off”, but in some instances, it may be beneficial to filter out any sub-audible frequencies (below 20Hz).

Voice-based content with no low-frequency information can benefit from a higher setting (60Hz, for example). Delivery platforms with limited headroom (such as in-flight entertainment or mobile streaming) and consumer devices with small speakers (mobile phones, tablets, or earbuds) can also benefit from a higher setting to prevent distortion.
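To illustrate what a high pass filter does to sub-audible content, here is a minimal sketch of a generic first-order filter. It is not APTO's actual filter, whose design is not described here; the function name, the 48 kHz sample rate, and the first-order topology are all assumptions made for illustration.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz=48000.0):
    """Generic first-order high pass filter (illustrative only)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate_hz               # time between samples
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Only changes in the input pass through; steady offsets decay away.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


# One second of a constant (0Hz) input with the cutoff at 20Hz:
# the output starts near the input level and decays toward zero.
filtered = high_pass([1.0] * 48000, cutoff_hz=20.0)
```

A higher cutoff value shortens the time constant, so more low-frequency energy is removed before the signal reaches the processing stages.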

Average Hold Threshold

When the Average Hold control (found in the APTO basic menu) is enabled, the Average Hold Threshold control (B) determines the point at which the hold is triggered in the Compliance processing stage to prevent increasing noise or background audio.

The value of this control is relative to the Target level. For example, if the Target level is set to -23dB LUFS and the desired level at which the hold engages in the Compliance processing stage is -34dB, then the Average Hold Threshold should be set to -11dB.
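Because the threshold is relative to the Target level, the setting is the desired hold level minus the Target level. The sketch below shows that subtraction with a different set of values; the function name is hypothetical and is not part of any APTO interface.

```python
def average_hold_threshold(target_lufs: float, hold_level_lufs: float) -> float:
    """Return the Average Hold Threshold (dB, relative to Target)
    needed for the hold to engage at an absolute level."""
    return hold_level_lufs - target_lufs


# Target of -24 LUFS, hold to engage at -36dB.
print(average_hold_threshold(-24.0, -36.0))  # -12.0
```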

Minimum Speech Duration

Proper dialog-based loudness processing requires the accurate detection and measurement of speech. When Dialog Normalization is enabled, the Minimum Speech Duration control (C) determines how many seconds of continuous speech are required for the measurements to be considered reliable and valid, and after which the Compliance processing stage adaptively switches to dialog normalization mode.

A minimum of 5 seconds is recommended. Shorter values may incorrectly factor in non-dialog material, and significantly longer values may cause unnecessary delays in engaging dialog normalization processing, especially in content with shorter segments of continuous dialog.

Dialog Compliance Measurements

The Dialog Compliance Measurements control (I) determines how many speech measurements are used to perform dialog normalization in the Compliance stage of processing. Because the algorithm takes a measurement every 0.5 seconds, this control should be set to twice the desired averaging time in seconds. For example, to base normalization on a 60-second average, set the control to 120.

Note: The Dialog Compliance Measurements setting only applies to profiles that use Dialog Normalization. Profiles that do not use Dialog Normalization rely upon the setting of the Compliance Window control to determine their time window.
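The conversion between an averaging time and the Dialog Compliance Measurements value follows directly from the half-second measurement interval. A small sketch, with hypothetical function names:

```python
MEASUREMENT_INTERVAL_S = 0.5  # from the text: one measurement every half second


def dialog_compliance_measurements(average_seconds: float) -> int:
    """Control value needed for a given averaging time in seconds."""
    return round(average_seconds / MEASUREMENT_INTERVAL_S)


def averaging_time_seconds(measurements: int) -> float:
    """Averaging time in seconds represented by a control value."""
    return measurements * MEASUREMENT_INTERVAL_S


print(dialog_compliance_measurements(60))  # 120
print(averaging_time_seconds(120))         # 60.0
```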

Adaptive Input Percentage

When the Adaptive Input Detection control (located in the Basic section) is enabled, the Dynamic Range processing section of APTO dynamically adapts the amount of processing depending upon the actual measured average level at the input. The degree to which the actual input levels influence the processing versus relying upon the value of the Average Input Level control (K) is determined by the Adaptive Input Percentage control (H).

A starting value of 50% is recommended in order to keep the effect of adaptation more consistent.
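One way to picture the percentage is as a weighting between the fixed Average Input Level value and the measured input level. The sketch below assumes a simple linear blend; this is an illustration of the concept only, as the actual weighting used inside APTO is not documented here, and the function name is hypothetical.

```python
def effective_input_level(configured_lufs: float,
                          measured_lufs: float,
                          adaptive_percent: float) -> float:
    """Blend the configured and measured input levels.

    ASSUMPTION: a linear blend, where 0% uses only the configured
    value and 100% uses only the measured value.
    """
    if not 0.0 <= adaptive_percent <= 100.0:
        raise ValueError("adaptive_percent must be between 0 and 100")
    weight = adaptive_percent / 100.0
    return (1.0 - weight) * configured_lufs + weight * measured_lufs


# Configured for -23 LUFS, but the input actually averages -27 LUFS.
print(effective_input_level(-23.0, -27.0, 50.0))  # -25.0
```

Under this assumption, a 50% setting lets the measured level pull the reference halfway toward the actual input, rather than following it completely.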

Foreground Sounds Coefficient

All programming has what is sometimes referred to as an “anchor element,” that is, the audio content to which the viewer will pay the most attention. This is typically (though not always) dialog. This is also referred to as “foreground” audio to differentiate it from background audio, or in some cases, noise.

The Foreground Sounds Coefficient control (G) sets the level in the Dynamic Range processing stage at which program audio is no longer considered a foreground sound relative to both the lower border as set by the Null Area Coefficient control (described in detail below) and a minus infinite level (full silence) on a proportional scale.

For example, if the Target level is set to -23dB LUFS, the Null Area Coefficient is set to 4, and the Foreground Sounds Coefficient is set to 2, any audio between -27 and -25dB will be deemed foreground audio and therefore be raised toward the target.

Audio at levels lower than -27dB will still be increased toward the target, but the degree to which gain is increased slows down considerably. The lower the audio is from -27dB, the less the gain will increase.
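The boundaries in the example above can be worked out as follows. This sketch assumes that the Null Area Coefficient is the total width of the window centered on the Target level (so a value of 4 puts the lower border 2dB below Target), which is inferred from the example figures; the function names and zone labels are hypothetical.

```python
def zone_borders(target: float, null_area: float, foreground: float):
    """Return (null_upper, null_lower, foreground_lower) in dB.

    ASSUMPTION: the null area is centered on Target and the
    foreground zone extends 'foreground' dB below its lower border.
    """
    null_upper = target + null_area / 2.0
    null_lower = target - null_area / 2.0
    foreground_lower = null_lower - foreground
    return null_upper, null_lower, foreground_lower


def classify(level: float, target: float, null_area: float, foreground: float) -> str:
    """Name the zone in which a given level falls."""
    null_upper, null_lower, foreground_lower = zone_borders(target, null_area, foreground)
    if level > null_upper:
        return "above null area"
    if level >= null_lower:
        return "null area"
    if level >= foreground_lower:
        return "foreground"
    return "below foreground"


print(zone_borders(-23.0, 4.0, 2.0))     # (-21.0, -25.0, -27.0)
print(classify(-26.0, -23.0, 4.0, 2.0))  # foreground
```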

Null Area Coefficient

The primary goal of the APTO processing within ARC is to deliver a consistent average output level as set by the Target Level control. This does not mean, however, that the actual output audio level must never deviate from this value. In fact, a certain amount of dynamic range helps preserve the artistic integrity of the original programming and makes for a more engaging audio experience for the viewer.

One of the things that makes APTO different from traditional processing is its ability to “do nothing” to audio levels when no action is required to maintain the correct average output level. This avoids the “busy” sound of traditional compressors and AGCs, which by their very nature are always operating either over or under a threshold and therefore always increasing or decreasing gain, often for no good reason.

The Null Area Coefficient control (D) sets the lower and upper thresholds that together determine the size of the window in which APTO’s Dynamic Range processing stage neither increases nor decreases gain, with the user-determined Target level sitting in the middle of the range. Values are in dB, with larger values resulting in a larger “do nothing” window.

Compliance Speed

How quickly gain changes are made in the Compliance processing stage is largely program dependent. However, the maximum rate at which the gain can increase or decrease is set by the Compliance Speed control (E). The rate is calibrated in dB (or LU) per second.

Higher values (from 2 – 6 LU per second) allow compliance to be achieved more quickly but may introduce audible gain changes to the average program level in the process. These faster settings are best suited for situations when input levels are expected to be very inconsistent.

Lower values (from 0.2 to 2 LU per second) provide a subtler and more natural-sounding normalization but may result in gain changes that are too slow and allow the audio to remain outside of the comfort zone for too long. Slower settings work well for content that is more consistent and well-controlled. They are also recommended for long-form content such as feature films or classical music.
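The effect of a maximum rate can be sketched as a simple limit on how far the gain may move in each update. This is a generic rate limiter written for illustration, not APTO's implementation; the function names and the update interval parameter are assumptions.

```python
def step_gain(current_db: float, desired_db: float,
              speed_lu_per_s: float, dt_s: float) -> float:
    """Move the gain toward the desired value without exceeding
    the maximum rate of change (speed_lu_per_s)."""
    max_step = speed_lu_per_s * dt_s
    delta = desired_db - current_db
    delta = max(-max_step, min(max_step, delta))
    return current_db + delta


def seconds_to_reach(change_db: float, speed_lu_per_s: float) -> float:
    """Shortest time in which a gain change of change_db can be completed."""
    return abs(change_db) / speed_lu_per_s


# A 6 LU correction at a fast and a slow setting.
print(seconds_to_reach(6.0, 2.0))  # 3.0
print(seconds_to_reach(6.0, 0.5))  # 12.0
```

The same 6 LU correction that completes in a few seconds at a fast setting takes several times longer at a slow one, which is the trade-off between audible gain changes and time spent outside the comfort zone.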

Average Maximum Gain

The Average Maximum Gain control (F) sets the maximum amount of positive gain (gain increase) applied in the Compliance processing stage in order to reach the Target level.

Larger values will allow very soft program segments to be raised by a greater amount, but whether or not this is desirable must be considered. For instance, some material may have been purposefully kept at a lower level for dramatic effect, and of course, there is always the risk of increasing unwanted background noise.

Another unwanted byproduct of setting this control too high, especially with profiles that use a lower Adaptation value, is that APTO can take too long to lower levels when loud content immediately follows soft content, such as when a loud commercial follows a quiet passage from a TV drama, because it takes longer to remove a greater amount of gain.

For these reasons, it is advisable to set the Average Maximum Gain to no more than 2 LU.

One way to enhance the efficiency of the processing across programs is to reset APTO at the transition point, which will, in turn, reset the amount of average gain or attenuation being applied and allow each individual program element to be optimally processed from the beginning.

Average Maximum Attenuation

The Average Maximum Attenuation control (L) sets the maximum amount of negative gain (gain reduction) applied in the Compliance processing stage in order to reach the Target level.

Larger values will allow loud program segments to be reduced by a greater amount. However, setting this control too high, again especially with profiles that use a lower Adaptation value, can cause APTO to take too long to raise levels when soft content immediately follows something loud, such as when transitioning from a loud commercial back to a quiet TV drama, because it takes longer to recover from a greater amount of gain reduction.

Just as with the Average Maximum Gain control, resetting APTO at the start of each program segment can help minimize such issues and optimize overall processing for each individual program segment.
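The two limits can be thought of as a ceiling and a floor on the Compliance stage gain, and the time needed to undo a given amount of gain or attenuation grows with its size. The following is a minimal sketch under those assumptions; the function names are hypothetical, and both limits are treated as positive numbers of LU.

```python
def clamp_compliance_gain(requested_db: float,
                          max_gain_db: float,
                          max_attenuation_db: float) -> float:
    """Limit a requested gain change to the configured maximums.

    ASSUMPTION: both limits are given as positive values.
    """
    return max(-max_attenuation_db, min(max_gain_db, requested_db))


def unwind_seconds(applied_gain_db: float, speed_lu_per_s: float) -> float:
    """Shortest time to return to 0dB from the gain currently applied."""
    return abs(applied_gain_db) / speed_lu_per_s


# A soft passage asks for +5 LU, but the maximum gain is 2 LU.
print(clamp_compliance_gain(5.0, max_gain_db=2.0, max_attenuation_db=6.0))  # 2.0

# At 0.5 LU per second, 2 LU of gain is removed far sooner than 6 LU.
print(unwind_seconds(2.0, 0.5))  # 4.0
print(unwind_seconds(6.0, 0.5))  # 12.0
```

Resetting APTO at a program boundary avoids this wait entirely, since processing starts again with no accumulated gain or attenuation.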

Compliance Window

The Compliance Window control (J) adjusts the size of the sliding time window used in the Compliance processing stage to align the output program to the target level. It is similar to the rolling integration time on an LKFS/LUFS loudness meter.

Note: The Compliance Window setting only applies to profiles that do not use Dialog Normalization. Profiles that do use Dialog Normalization rely upon the setting of the Dialog Compliance Measurements control to determine their time window.

Many factors can influence whether or not content at the output of ARC is compliant. These include the level consistency of the incoming audio, program duration, the difference between the average input level and the Target level, the amount of Adaptation employed, and the settings of many of the controls listed in this section.

That said, a smaller compliance window value more easily delivers a compliant output for shorter program durations, but can produce more variations in long-term programming.

On the other hand, a larger compliance window value will allow for a smoother and more consistent average overall for longer program durations but can reduce the effectiveness of short-term normalization, especially program elements shorter in duration than the compliance window.

For short-term programs, a value in the range of 20-30 seconds is recommended. Larger values, from 60-120 seconds, are recommended for mid- to long-form content. For profiles that rely upon Dialog Normalization, a setting of 120-180 seconds is highly recommended.
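The idea of a sliding time window can be sketched as a rolling average of recent loudness readings. The example below averages in the power domain over a fixed number of readings; it is a simplified illustration, not the APTO algorithm and not a full gated loudness measurement, and the class name and the half-second reading interval are assumptions.

```python
import math
from collections import deque


class SlidingLoudness:
    """Rolling average of loudness readings over a time window."""

    def __init__(self, window_s: float, interval_s: float = 0.5):
        # Number of readings that fit in the window (at least one).
        self.size = max(1, round(window_s / interval_s))
        self._powers = deque(maxlen=self.size)

    def add(self, loudness_lufs: float) -> float:
        """Add one reading and return the current windowed average."""
        self._powers.append(10.0 ** (loudness_lufs / 10.0))
        mean_power = sum(self._powers) / len(self._powers)
        return 10.0 * math.log10(mean_power)


short_window = SlidingLoudness(window_s=20.0)
long_window = SlidingLoudness(window_s=120.0)

# Two minutes at -23 LUFS followed by ten seconds at -30 LUFS.
for level in [-23.0] * 240 + [-30.0] * 20:
    short_avg = short_window.add(level)
    long_avg = long_window.add(level)

# The short window has moved further toward the new level than the long one.
print(short_avg < long_avg)  # True
```

A short window follows changes in the program quickly, while a long window smooths them out, which mirrors the trade-off described above.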

Average Input Level

The value of the Average Input Level control (K) should be set to match the average level of the input audio as it serves as the mid-point value of the comfort zone defined by the Null Area Coefficient control.

For content that has been previously analyzed and loudness corrected in the file domain, this value will be easy to determine.

For live programming, unprocessed material, or in cases where programming is of different genres from different decades, this becomes more challenging. For example, cinematic content is typically mixed to -31/-27 LUFS, while music productions can be as loud on average as -12/-8 LUFS.

In these cases where the average input level is not easy to predict, it is best to enter a value that matches the output target loudness value, enable Adaptive Input Detection, and take advantage of APTO’s intelligent dynamics processing.