Clusters

Pathfinder Core PRO clustering is designed to address concerns over hardware failures. Clustering allows you to connect multiple networked Pathfinder Core PRO systems, so that they automatically and dynamically share configuration data and states. They also monitor each other, taking over processing events if one system fails.

As you prepare to configure your cluster:

  1. Begin with a single Pathfinder Core PRO

  2. Create the routers and add other required events and components

  3. Take a backup of your configuration and download it to your local computer. See the section on Backup/Restore for more information. This will guarantee you can get back to your current operational state if you make a mistake when setting up your cluster.

  4. Click the Clusters link in the navigation bar

Four buttons are available at the bottom of the Clusters page: Create, Join, Leave, and Manual Sync.

Note: Leave and Manual Sync are disabled until the system is part of a cluster.

Creating a Cluster

First, click Create to create the cluster. The system will ask you to create a cluster administration password. This password is important because it is used throughout the cluster so that the Pathfinder Core PRO units can log in to each other and synchronize data. When you provide this password, the system creates a special ClusterAdmin user for this purpose. You can see this user under the Users navigation link. Because its password must be identical across all systems, a severe warning is issued if you try to change it. Once the cluster has been created, the Clusters link should look something like this:

The system is now part of a one-node cluster. This display shows the hostname and IP address of each node in the cluster. When there is more than one system in the cluster, the IsLocal field shows which of the systems you are currently logged into. The display also shows the state of each Pathfinder Core PRO system in the cluster. The minus button should only be used in the rare situation where you need to forcefully remove a non-functioning node from the cluster. The preferred method is to go to the node to be removed and click the Leave button.

Joining a Cluster

Now it's time to add the second system into the cluster. It is important to note that the second system’s configuration will be overwritten during this process. If the system is new, make sure you first assign an IP address using the front panel. The IP must have network access to the first unit in the cluster, so pay attention to the network configuration. You do not need to enable Livewire discovery on this second system.

On the second system, browse to the Clusters link on the navigation bar, and then click the Join button. The system will ask for the IP address of any Pathfinder Core PRO system that is already part of the cluster. It will also ask for the cluster administration password you provided when you created the cluster.

The system will now display a progress message box while it joins the cluster.

The join process begins by telling the first system that a new node is joining the cluster. The second system then asks the first system to generate a special backup. The second system downloads that backup, restores it, and then reboots. At that point, the cluster should be created, and you should be able to see both systems on the Clusters page.

A successfully synchronized cluster should show all node states on the web page of all systems as SyncNewerChangesComplete. If that is not the case in your cluster and it does not resolve to that state within a few minutes after a restart, then contact Axia support to help determine why the cluster is not synchronizing properly.
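As a rough illustration of the check described above, the following Python sketch takes node states as you might transcribe them by hand from a system's Clusters page and lists any node that has not reached SyncNewerChangesComplete. Only that state name comes from this manual. The function name, the dictionary layout, the hostnames, and the "PlaceholderState" value are all hypothetical.

```python
SYNCED_STATE = "SyncNewerChangesComplete"  # state name taken from this manual


def unsynced_nodes(node_states):
    """Return the sorted names of nodes whose state is not SYNCED_STATE.

    node_states is a hypothetical mapping of hostname -> state string,
    copied by hand from the Clusters page of one system.
    """
    return sorted(name for name, state in node_states.items() if state != SYNCED_STATE)


# Hostnames and "PlaceholderState" are illustrative only.
states = {"core-a": "SyncNewerChangesComplete", "core-b": "PlaceholderState"}
print(unsynced_nodes(states))
```

An empty list means every node shown on that page reports the synchronized state. The same check would need to be repeated for the page of each system in the cluster.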

Follow this procedure with subsequent Pathfinder Core PRO systems you wish to have join the cluster.

Once the cluster is created, changes to the configuration and state information are automatically synchronized across the Pathfinder Core PRO systems in the cluster. Additionally, in the case of a failure of one of the Pathfinder Core PRO systems, one of the other systems will begin processing the events. See the section on Events and Timers for more information.

Manual Sync

Sometimes it may be desirable to force the cluster back into sync if there is some doubt that it is properly synchronized. To do this, browse to the system that is out of sync, click the Clusters link, and click the Manual Sync button. The system will ask for confirmation, request a backup from the other (known good) system, download and restore that backup, and then reboot. This procedure is usually performed in the context of an Axia support session.

Leaving a Cluster

If you wish to leave a cluster, make sure that the system is connected to the other nodes in the cluster. Then click the Clusters link and click the Leave button. The system will revert to a stand-alone system. It is important to note that after leaving a cluster, the configuration will still be the same as it was before the system left the cluster. It is recommended that you quickly remove the system from the network and/or factory default the system. Otherwise, the system may start trying to process events that the rest of the cluster is also trying to process.

Events and Timers

Some elements of a Pathfinder Core PRO system should only execute on a single system at a time so that the cluster does not request duplicate changes from a device. Examples are Logic Flows and Timers. These always execute on the system that is currently active and has the lowest IP address. Heartbeats are constantly sent and monitored between the systems so that if a system stops responding, the system with the next lowest IP address takes over those responsibilities.
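The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation. The function name and the input mapping are hypothetical, and it assumes that "lowest IP address" means a numeric comparison of the addresses rather than a text comparison.

```python
import ipaddress


def active_event_node(nodes):
    """Pick the node that should run Logic Flows and Timers.

    nodes is a hypothetical mapping of IP address string -> bool, where the
    bool says whether that system is currently answering heartbeats.
    Returns the numerically lowest responding IP, or None if none respond.
    """
    alive = [ipaddress.ip_address(ip) for ip, responding in nodes.items() if responding]
    return str(min(alive)) if alive else None


cluster = {"172.16.1.241": True, "172.16.1.242": True}
print(active_event_node(cluster))   # both systems responding

cluster["172.16.1.241"] = False     # first system stops answering heartbeats
print(active_event_node(cluster))   # next lowest address takes over
```

Comparing parsed addresses instead of raw strings matters in this sketch because a text comparison would rank 172.16.1.241 below 172.16.1.9.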

Clustering and SapV2

SapV2 is the protocol used both for cluster communications and for internal communications. This protocol is available on port 9600 and is described in more detail in the appendix of this manual. If the cluster does not seem to be working properly, a great deal of information about the clustering process can be collected using this protocol to help solve the problem.
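Before collecting SapV2 information, it can help to confirm that port 9600 on each cluster node is reachable from your workstation. The following Python sketch does only that. It assumes the port accepts plain TCP connections, and it does not send any SapV2 commands; the function name is hypothetical.

```python
import socket

SAPV2_PORT = 9600  # port number taken from this manual


def port_reachable(host, port=SAPV2_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Assumption: the service accepts plain TCP connections. This only tests
    reachability; it does not speak the SapV2 protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, calling port_reachable("172.16.1.241") from a computer on the same network would report whether that node answers on port 9600. A result of False may also point to a firewall or routing problem between the two machines.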

GPIO Node Clustering

The Pathfinder Core PRO GPIO Node is also synchronized across a cluster. Port additions, removals, and configuration changes applied to either Pathfinder Core PRO in the cluster will be replicated to the other Pathfinder Core PRO in the cluster.

Pin closure states will also be replicated in most cases. If the port has been configured with a multicast GPIO channel or a snake routing assignment (IP/port), replication becomes a bit more nuanced. Closures that are directly tripped will still be replicated. However, closures that are sensed from the multicast or unicast snake (GPIs from the other snake port) will not be replicated. This is because it is assumed that both cluster nodes are monitoring those changes directly from the third piece of equipment and will pick up their states directly from the other end of the snake. Cluster replication of that state would therefore be redundant and could lead to race conditions.
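The replication rule described above can be summarized as a small decision function. The Python sketch below is only a restatement of that rule; the enum and function names are hypothetical and do not correspond to anything in the product.

```python
from enum import Enum


class ClosureOrigin(Enum):
    DIRECT = "direct"  # pin tripped directly on a Pathfinder Core PRO
    SNAKE = "snake"    # state sensed from the multicast or unicast snake


def replicated_to_cluster(origin):
    """Return True if a closure of this origin is replicated between nodes.

    Snake-sensed closures are skipped because each node is assumed to read
    them directly from the far end of the snake.
    """
    return origin is ClosureOrigin.DIRECT


for origin in ClosureOrigin:
    print(origin.value, replicated_to_cluster(origin))
```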

When dealing with unicast snake mode (IP/port assignments), there are some additional clustering nuances to be aware of. These are most easily discussed via an example.

When you route a GPIO node to the Pathfinder Core PRO GPIO port, the address field of the GPIO port on both cluster nodes is filled in as IP address/port. For example:

  • Pathfinder Core PRO A = 172.16.1.241

  • Pathfinder Core PRO B = 172.16.1.242

  • GPIO XNode = 172.16.1.85

If you route (using the Pathfinder Core PRO GPIO Router) the XNode GPIO source 1 to the Pathfinder Core PRO A or B destination 1, the destination port on both Pathfinder Core PROs will look like:

  • CFG GPO 1 SRCA:"172.16.1.85/1"

In this example both Pathfinder Core PROs are monitoring the GPI changes on port 1 of XNode at 172.16.1.85 and making matching changes on their GPO pins. This is straightforward.

However, if we route either of the Pathfinder Core PRO servers to the XNode GPIO, things get a bit more interesting. We can only apply one IP address to the SRCA field on the XNode destination. Therefore, Pathfinder Core PRO will use the Axia Livewire address of whichever Pathfinder Core PRO in the cluster currently has its event system active. In the example above, Pathfinder Core PRO A would typically be the active server in the cluster. When you make the route change on either Pathfinder Core PRO, the system will use the active server's IP address on the XNode destination. The XNode destination would look like:

  • CFG GPO 1 SRCA:"172.16.1.241/1"

If the Pathfinder Core PRO A server were to fail or be shut down, the B server will loop through its routes, find any destination that Pathfinder Core PRO A’s GPIO ports were routed to, and switch it to Pathfinder Core PRO B’s external Livewire IP address.

  • A Fails

  • XNode port gets reassigned:

    • CFG GPO 1 SRCA:"172.16.1.242/1"

If A comes back online, the port will switch back to the A server once A's event system is active again.
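The reassignment described above can be illustrated with a short Python sketch that rewrites the IP portion of an "IP/port" source value when one server takes over from another. It models only the bookkeeping, not how Pathfinder Core PRO actually talks to the XNode. The function name, the routes mapping, and the destination label are hypothetical.

```python
def reassign_sources(routes, old_ip, new_ip):
    """Return a copy of routes with sources on old_ip moved to new_ip.

    routes is a hypothetical mapping of destination label -> "ip/port"
    source string, in the same form as the SRCA value shown above.
    """
    updated = {}
    for destination, source in routes.items():
        ip, sep, port = source.partition("/")
        updated[destination] = f"{new_ip}{sep}{port}" if ip == old_ip else source
    return updated


routes = {"XNode 172.16.1.85 GPO 1": "172.16.1.241/1"}

# A fails: B takes over the route.
after_failover = reassign_sources(routes, "172.16.1.241", "172.16.1.242")
print(after_failover)

# A returns and its event system is active again: the route moves back.
print(reassign_sources(after_failover, "172.16.1.242", "172.16.1.241"))
```

Sources that point at any other address, such as the XNode's own 172.16.1.85, are left untouched by this sketch.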

One important note about this is that the XNode GPIO clears its pin states to be all high each time a route change happens and then reassigns them to the new values based on the new GPIs. This means that if a snake route is held low on the XNode based on a low GPI on Pathfinder Core PRO, during a cluster failover those pins might flicker to high and then back to low. This is usually a virtually instantaneous flip.

This should only cause a problem when all of the following are true: a Pathfinder Core PRO GPI is unicast snake routed to a GPO on an XNode (or other Axia GPIO device), that closure is normally held low, an activity (such as an automation advance) is triggered on the transition from high to low, and the cluster fails over to the secondary node. This is similar to what would happen if a device supplying that same closure were restarted. Note that this is not an issue for multicast snake routing.

In the future, we may address this issue by using floating IPs, where the IP route would not change; instead, the IP address used for the external routes would float (move) between the Pathfinder Core PRO devices.

We highly recommend that users do some testing of this functionality to fully understand how it works. For the vast majority of use cases it should be seamless.