Question
Is it ok to have an IGMP querier on multiple switches? Isn't only one switch supposed to perform the querier function?
Answer
The answer to both questions is yes.
Yes, you can enable an IGMP querier on more than one switch, and yes, only one switch actively performs the querier function at any given time.
How does that work?
The switches elect a querier based on IP address: the switch with the lowest IP address is elected querier.
When a switch is connected to a network and its IGMP querier function is enabled, it will assume it is the querier until it hears a General Query from another router or switch with a lower IP address. At that point, it becomes a "non-querier."
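As a point of reference, on many Cisco Catalyst switches the querier function is enabled globally with commands along these lines (a minimal sketch; check your switch model's documentation for the exact syntax):
enable
configure terminal
ip igmp snooping querier
end
Once enabled, the switch begins sending General Queries and participates in the election described above.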

Which switch should be querier?
In most networks, your core switch performs this role, as it should be the most powerful and centrally located switch on your network. The IGMP querier function consumes CPU resources, and the more Livewire/multicast streams on your network, the harder your switch is taxed.
For this reason, it is essential to properly spec out a core switch that is capable of handling queries relative to your network size.

Should I enable querier on all switches?
That's up to you.
Some customers prefer to have only one switch be capable of the querier role, and other customers prefer to enable the querier function on all switches as a form of redundancy in case their core switch fails.
What are the deciding factors? A few things:
How large is your network?
Number of AoIP/Livewire+ streams?
Number of switches?
How powerful are your other edge switches?
What is the topology of your network?
Does a single core switch connect to all the edge switches?
Does a core switch connect to a string of edge switches, creating multiple “hops”?
Let’s take a look at a few different example cases:
Example Case 1
Small Network, 1 Core Switch, 1 Edge Switch
In this scenario, there is
1 C1000 core switch, and
1 C1000 edge switch
connected directly to one another using a point-to-point topology:
While one switch is designated as the core, and therefore IGMP querier, either of these switches would perform this role the same way.
Enabling IGMP querier on either would simply be a matter of preference, so enabling it on both might be a good idea. Let’s examine a few pros and cons.
Pro: Broken Link Mitigation
If the link between them is broken, each switch can at least provide the querier function for the devices directly connected to it:
While not an ideal situation, let’s assume that the core switch accounts for Studio A and the airchain xNodes, and the edge switch accounts for Studio B devices. If the link is broken between the two, then both studios could operate independently until the problem is resolved.
Pro: Failover from Core Switch
In the case that the core switch dies completely, the edge switch would be able to keep Studio B devices operating while devices are moved from the broken core to the operational edge.
Taking this a step further, an xNode has 2 NICs on it, and can be connected to a second switch in failover mode. Therefore, the airchain xNode could be connected to both the core and edge switches simultaneously. If the core switch fails, the xNode will failover to the edge switch, keeping the station on the air:
Con: Link Overload
NOTE
While this scenario does not directly affect the IGMP querier function on the edge switch, we must caution against overloading an edge switch with too many IGMP devices.
Recall how the IGMP querier works: all multicast traffic from the edge switch must traverse the link to the core switch.
In the illustration below, each switch is loaded with an equal amount of IGMP devices. As long as the devices connected to the edge switch do not exceed 70% of the link bandwidth, then this is acceptable:
However, if the edge switch holds most of the devices, the core switch may be ineffective in the IGMP querier role if the link’s available bandwidth is consumed sending all of that traffic to the core switch:
Therefore, it is important to either
(1) connect most devices directly to the core switch, or
(2) ensure the total IGMP bandwidth on devices connected to the edge switch does not exceed 70% of your network link’s bandwidth to the core switch
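As a quick worked example (using hypothetical numbers and the rough 5Mbps-per-stream estimate described in the next section): 70% of a 1Gbps uplink is 700Mbps, so an edge switch could carry roughly 140 typical streams (700 ÷ 5) before the link to the core becomes a concern.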
Conclusion: Enable IGMP Querier on Edge Switch
In this scenario, it would be advisable to activate IGMP querier on both switches as failover protection.
WAIT
How do I know how much bandwidth my devices are using?
There are several ways to figure this out.
(1) The first is to approximate an average stream size of 5Mbps. This is a rough average across Live Stereo, Standard Stereo, and AES67 bandwidth usage.
If you know how many audio streams you are sending on your devices connected to your edge switch, you can multiply that by 5 and get a rudimentary answer. For example, if I have 1 device that is sending 8 AoIP streams, then I can approximate that I am using 40Mbps of bandwidth (8 × 5Mbps). If I am using 2 devices each sending 8 AoIP streams, then I am using approximately 80Mbps ( (8 × 5Mbps) + (8 × 5Mbps) ). Again, this is an imprecise method to help you “ballpark” the total usage.
(2) A second, more reliable method is to login to your core switch and look at the actual link utilization when all devices are connected and operating on the edge switch. If you are using a Cisco Catalyst switch, you can type in the following commands after logging in:
enable
show interfaces summary
On the output page, locate the interface that your core switch is using to connect to the edge switch. For this example, let’s pretend that GigabitEthernet1/0/8 is my link:
This output shows that the core is receiving 2,316,000 bits per second (approximately 2.3Mbps) from my edge switch, well within 70% of a 1Gbps link.
We also see that there are no output queue drops (OQD). The presence of OQDs can signal an issue (persistent or intermittent) with link over-saturation, causing the switch's QoS function to drop packets it cannot send through.
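If you prefer to watch a single uplink, most IOS versions also let you filter the standard interface output down to the load counters. Using the hypothetical GigabitEthernet1/0/8 link from above:
enable
show interfaces GigabitEthernet1/0/8 | include rate
This prints the 5-minute input and output rates in bits per second, which you can compare against 70% of the link speed.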
Example Case 2
Medium Sized Network, 1 Core Switch, 4 Edge Switches, Standard Topologies
In this scenario, there is
1 C9300 core switch, and
4 C1000 edge switches
However, because there are more network devices, there are more ways to connect them together, so let’s examine a few topology options.
Hub/Spoke or “Star” Topology
This network topology connects all edge switches to a central core (the “hub”) and assumes the more powerful core switch (C9300) is the IGMP querier. Each edge switch has a direct connection to the core, but the only connection to one another is through the core switch.
Enabling IGMP querier on the edge switches might be a good idea in certain circumstances, depending on how each edge switch operates in the network.
Pro: Broken Link Mitigation OR Failed Core Switch
Similar to the layout in Case 1, let’s assume that each edge switch cares for a separate studio. Additionally, every switch has all critical devices plugged into it, such as local xNodes, playout systems, and consoles.
If a link to the core switch breaks on one (or more than one) of these switches, or if the core switch fails completely, each studio could operate autonomously until the problem is resolved.
While not ideal, enabling IGMP querier on all switches allows them to facilitate local studio usage temporarily. Any critical xNode plugged into the core switch (such as an airchain xNode) could be plugged into an edge switch.
Mesh Topology
A mesh topology creates a direct path between all switches. This is advantageous for resiliency within a network, creating multiple pathways between network devices, but it can add considerable overhead to wiring and management. Additionally, the switches must be configured to use Spanning Tree Protocol in order to avoid network loops.
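On Cisco Catalyst switches, for example, a common choice is Rapid PVST+, enabled globally with something like the following (a sketch; your environment may call for a different spanning-tree mode):
enable
configure terminal
spanning-tree mode rapid-pvst
end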
While certain network traffic might flow directly between edge switches, IGMP traffic will still need to make its way to the IGMP querier. Therefore, IGMP traffic might flow more like it would in a hub and spoke topology:
Pro: Multiple Pathways: Broken Link OR Core Switch Failure
As mentioned before, if the core switch fails, or a link to the core switch is broken, an edge switch has a way to find all the other edge switches.
In the case of a core switch failure, if all edge switches are configured as the IGMP querier, the one with the lowest IP address will be automatically elected to that role and able to service all other edge switches:
The above example shows the yellow edge switch as having the next-lowest IP address (192.168.1.2), and therefore being elected to the IGMP querier role. All other edge switches will now send their IGMP traffic there.
Additionally, because each switch has a direct connection to the other, there is no risk of bandwidth over-utilization (using more than 70% of a link’s available bandwidth, as described in Case 1).
Con: Switch Overload
We’ve discussed the possibility of link over-utilization, but a switch’s CPU can become overloaded as well.
In addition to being centrally located relative to the other edge switches, the core switch should also be powerful enough to handle all the extra CPU load from the IGMP query traffic. If the switch doesn’t have a powerful enough CPU, the result may be dropped traffic, or the other switches erroneously electing a second IGMP querier.
If two IGMP queriers are active on a LAN and unaware of one another, then AoIP traffic may become unpredictable and severely compromised.
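A simple way to keep an eye on this is to check the querier switch’s CPU load from time to time. On a Cisco Catalyst switch:
enable
show processes cpu sorted
Sustained high utilization on the querier, especially during peak multicast activity, is a sign the role should live on a more powerful switch.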
In this case, a facility with thousands of IGMP groups (AoIP streams) might want to weigh the pros and cons of having a mesh network with less powerful switches. If all switches are of the same make/model (C9300), then any one of them should be able to handle the IGMP querier role at any time, and a mesh network with switches all configured as querier would make sense.
However, if budget requires the purchase of less powerful edge switches, then one must consider if a mesh topology is actually of any benefit, and if an IGMP querier failure on an edge switch would be better than all switches flooding multicast traffic across all ports.
How much traffic is too much IGMP traffic?
The amount of IGMP traffic a switch can handle is limited. To determine if your network is small, medium, or large, Telos has put together a document that can help you find the correct switch to purchase.
Also, if in doubt, purchase the more powerful switch. The core switch will be one of the most critical parts of any AoIP network.
Conclusion: Maybe
In this case, there is not a simple “yes or no” answer to the question of “should I enable IGMP querier on all edge switches”. There are several variables that could determine your decision.
Connecting your network with a hub/spoke topology and enabling IGMP querier would at least allow your network to function with “independent Livewire networks” temporarily until a connection or core switch issue is resolved. On the other hand, using a mesh topology with the same network devices might create a host of other issues if your network is heavily utilized.
Weighing the pros and cons of each would be advised.
Example Case 3
Medium Sized Network, 1 Core Switch, 4 Edge Switches, Daisy Chain Issue
In this scenario, there is
1 C9300 core switch, and
4 C1000 edge switches
We’re going to examine a network that is connected in a daisy-chain series: one switch connected to another, to another, to another, and so on.
While some customers may not intentionally design a network this way, over time a network can evolve into this type of topology for a variety of reasons (e.g., “I just need a switch in a small studio,” or “we’re just trying to send Livewire to the transmitter site”).
Although not ideal, Livewire can work properly in a daisy chain topology in a smaller network. However, complications using IGMP querier often arise, depending on the setup.
Pro: Core Switch Failure - Recovery
In the event of a core switch failure, having IGMP querier activated on one or more edge switches might be a benefit. Instead of creating a multicast traffic storm, the edge switch can take on the IGMP querier role temporarily and “direct traffic.”
In the example above, the core switch (192.168.1.1) is inaccessible, so the edge switch with the IP address 192.168.1.4 is elected as IGMP querier because it has the lowest IP address of all the switches in this network. Because it is directly connected to the core, and all other edge switches are connected to it (directly or indirectly), we might assume that this is a good backup solution.
Any endpoint devices (such as xNodes, PCs, etc.) that were connected directly to the core switch would need to be moved to a functional switch. But because all other switches were already routing through the 192.168.1.4 edge switch to reach the core (the previous IGMP querier), routing to this edge switch as the new querier won’t be much different.
This also assumes that there weren’t any problems with the network before the core switch failed. As previously stated, this type of topology (daisy chain) has the possibility of creating problems, such as bandwidth bottlenecks and CPU overload (even when there is no core switch failure).
Con: Insufficient CPU
Building upon the last scenario where the core switch fails, let’s assume that the 192.168.1.4 switch has enough available bandwidth, but not enough available CPU resources to handle all the IGMP querier responsibilities. As you recall, the core switch in this case is a Cisco C9300, and the edge switch that is now assuming the querier role is a less-powerful Cisco C1000. This could lead to a situation where:
(1) The switch drops traffic, creating audio problems, or
(2) The switch fails due to extreme demand, causing it to become completely unresponsive.
If IGMP traffic is light, this may not be a concern, but it certainly is a possibility, and can become more of a probability as the network grows.
Con: Improper IP Addressing; Bandwidth Bottlenecks
So far we have assumed that the 192.168.1.4 edge switch would take on the IGMP querier role. It is directly connected to the core, and all other switches are connected to it, so it seems logical that this would be the best option in a disaster situation. However, the switches do not elect a querier based on proximity to the core switch, but rather on IP address. Therefore, ANY switch with the lowest IP address can become the IGMP querier.
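Because election is purely “lowest IP wins,” one safeguard is to deliberately control the source address each switch uses for its queries. On many Catalyst switches this can be set globally; the address below is hypothetical, chosen deliberately high so this edge switch loses any election:
enable
configure terminal
ip igmp snooping querier address 192.168.1.250
end
Assigning low addresses only to the switches you want to win the election (and high addresses to those you don’t) keeps the outcome deterministic.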
As previously stated, a daisy chain topology is often one that evolves with time and not with intention.
Let’s say that someone installs a smaller C1000 edge switch into a production studio. Instead of plugging this switch into the core switch, they opt to plug it into the switch that is closest in proximity (which happens to be 4 hops away from the core switch). And because 192.168.1.2 is a free IP address, that is what gets assigned.
Assuming the installer enabled IGMP querier on this switch during the configuration process, you can see below what would happen if the core switch ever were to fail:
Because this smaller switch is enabled as IGMP querier, and because it has the lowest IP address of all the switches (now that the core switch has failed), it assumes the IGMP querier role.
Previously, there were no more than 4 hops between the powerful C9300 core switch and the furthest edge switch. Now there are as many as 6 hops, with a less powerful C1000 as the IGMP querier.
Not only does the switch have fewer CPU resources to perform this role, but the bandwidth between the switches will also be a factor.
A significant amount of traffic from each switch will be sent to the IGMP querier. If that traffic traverses another switch in the process, then BOTH switches will be sending their traffic through the 2nd switch’s uplink to the querier. If the 2nd switch connects to a 3rd switch, and the 3rd switch connects to a 4th, bandwidth can quickly be consumed.
The example below illustrates how this effect can compound quickly and cause problems by the time traffic reaches the IGMP querier (or beforehand):
Assuming the connections between all of these switches are 1Gbps, and assuming there is a fair amount of Livewire traffic on this network, we will almost certainly experience dropped traffic at some point after the first few switch connections.
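To put rough, hypothetical numbers on the compounding effect: if each switch in the chain originates 300Mbps of multicast, the link out of the first switch carries 300Mbps, the next carries 600Mbps, the next 900Mbps, and by the fourth link the chain would need to pass 1.2Gbps, more than a 1Gbps connection can carry.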
Symptoms of this would include silence, distorted audio, pops/clicks, and other audio anomalies caused by an inability to send and receive traffic properly across this network.
Conclusion: Probably not a good idea.
While enabling IGMP querier on the edge switch closest to the core switch might work well in an emergency situation, enabling IGMP querier on ALL switches in this particular network might prove to be a secondary disaster.
The Importance of Planning
The above examples illustrate the importance of planning out a network infrastructure before connecting it together haphazardly.
Daisy chaining switches, selecting a less-powerful core switch, or setting a far-end edge switch as an IGMP querier can carry large consequences.
Simple changes, such as connecting devices with a large number of IGMP streams (e.g., a Quasar engine) directly to a core switch, stacking core switches together, or plugging every edge switch into the core switch directly, can be the difference between a well-run network and a multicast nightmare.
Whichever method or model you choose, be sure to understand the pros and cons in light of the network you currently have, and the one you expect to grow into.
How can I confirm which switch is the IGMP querier?
Open a PuTTY or terminal session to your Cisco switch and type the following commands:
enable
show ip igmp snooping querier
You should see something similar to the output below, which will tell you what switch is the querier on each vlan, and what port it is connected to:
Vlan      IP Address      IGMP Version    Port
-------------------------------------------------------------
1         172.16.0.1      v2              Gi1/0/23
If the switch you're logged into is the querier, then you will see "Switch" under the "Port" heading:
Vlan      IP Address      IGMP Version    Port
-------------------------------------------------------------
1         172.16.0.1      v2              Switch
Let us know how we can help
If you have further questions on this topic or have ideas about improving this document, please contact us.