A hot backup means redundancy; redundancy means more uptime; more uptime means a better SLA; a better SLA means happier customers and more money.
Building redundancy is a must when moving your service into production. And a typical approach for achieving redundancy is an active-backup setup with full realtime synchronization between the two nodes (basically a hot backup/stand-by server).
How can you achieve this with OpenSIPS 3.0? Well, thanks to the new unified clustering support, you are only 2 steps away from getting it done.
Let’s see what these two steps are, considering a typical system with user registrations, call handling, presence, dynamic routing, user keepalive and more – let’s not settle for an easy example ;).
Step one – adding clustering support
The clustering support is responsible for sharing data between the nodes in the cluster – in our case, between the active and backup servers which form a two nodes cluster.
Starting with OpenSIPS 2.4, this is very easy to do – defining the cluster and setting a few parameters is enough to get it done. OpenSIPS will automatically and transparently do all the magic for you.
First enable the clustering engine:
loadmodule "clusterer.so"

# node ID 1 is the active, node ID 2 is the backup
modparam("clusterer", "current_id", 1)

# load the definition of cluster ID 1 (with nodes 1 and 2)
modparam("clusterer", "db_url", "mysql://..........")
Enable realtime replication for the user registration data:
modparam("usrloc", "working_mode_preset", "full-sharing-cluster")
modparam("usrloc", "location_cluster", 1)
Enable realtime replication for the ongoing dialogs data:
# replicate ongoing dialogs inside cluster 1
modparam("dialog", "dialog_replication_cluster", 1)
Enable realtime replication for the presence data:
# full DB sharing
modparam("presence", "fallback2db", 1)
modparam("presence", "cluster_id", 1)
In the drouting module, enable the replication for the status of the gateways and carriers:
# replicate gw/carrier status inside the cluster
modparam("drouting", "cluster_id", 1)
What we have achieved at this point is to have the two OpenSIPS instances replicate to each other, in realtime, various runtime data such as user registrations, ongoing dialogs, presence data and gateway/carrier status. Keep in mind that the clustering engine also requires the definition of a BIN protocol listener in the config file.
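Such a BIN listener is declared like any other OpenSIPS listener; a minimal sketch, using a placeholder IP and port (adjust to the node's actual address):

# BIN listener used by the clusterer engine for inter-node replication traffic
# (IP and port below are placeholders)
socket=bin:10.0.0.1:5555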
Step two – managing the active role in the cluster
What we still have to do is to control the cluster, by deciding which OpenSIPS instance plays the active role and which one acts as backup.
Maybe from a data replication perspective this is not relevant, but OpenSIPS is an intelligent application that not only stores data, but also takes actions based on the data it has. For example, for the user registration data it is not enough to only share it; we also need to decide which node/instance is responsible for expiring registered contacts or for probing/pinging the registered users.
The case of dialog sharing is more complex. Again, we need to decide which node is responsible for expiring the dialogs (and generating CDRs), which one does the in-dialog pinging, and more. In a similar way, for drouting data, only the active instance should do the gateway probing (and replicate the resulting status to the backup).
In short, we need a way to “inform” the nodes which one is active and which is backup, so that only the active one will perform all the actions in the cluster. This is done via the clusterer sharing tags mechanism.
So we only need to tag the replicated data and to decide which OpenSIPS instance is active for that tag (and implicitly responsible to perform actions on the tagged data).
First, let’s define a tag:
# define the "service" tag in cluster 1, with the default
# status at startup of "active" (on the backup we set it as "backup")
modparam("clusterer", "sharing_tag", "service/1=active")
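On the backup node, the mirror configuration is the same cluster definition with a different node ID and the tag starting in “backup” state (per the comment above); a sketch:

# on the backup node: node ID 2 in the same cluster
modparam("clusterer", "current_id", 2)

# same "service" tag in cluster 1, but starting as backup
modparam("clusterer", "sharing_tag", "service/1=backup")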
And now let’s start tagging data. For user registration:
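The original snippet is missing here; assuming the registered-contact pinging is performed by the nathelper module, the tag would be attached via its clustering parameters (parameter names taken from the nathelper module docs – verify them against your OpenSIPS version):

# assumption: nathelper does the contact pinging; only the node where
# the "service" tag is active will actually send the pings
modparam("nathelper", "cluster_id", 1)
modparam("nathelper", "cluster_sharing_tag", "service")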
For dialog data:
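The original snippet is missing here as well; with the dialog module, a newly created dialog can be tagged from the routing script via the set_dlg_sharing_tag() function. A sketch:

if (is_method("INVITE")) {
    create_dialog();
    # mark this dialog as owned by the "service" sharing tag
    set_dlg_sharing_tag("service");
}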
For presence data:
handle_subscribe(, "service");
For Dynamic Routing data:
modparam("drouting", "cluster_sharing_tag", "service")
Now that the data is tagged, let’s see how we can dictate which OpenSIPS instance is “active” by changing the status of the sharing tag.
It is outside the scope of this post to talk about the tools that decide which server is to be active, tools like VRRPD or Keepalived. But each of them provides hooks to get notified when a node becomes active or inactive (e.g. dead or with an unreachable IP).
Such hooks may also be used to inform the OpenSIPS instances about the change in status. In our case we only need to inform the instance that takes over the active role, as the other instance will automatically step down. In such a hook, when notified of becoming active, we need to run:
opensipsctl fifo clusterer_shtag_set_active service/1
This command forces the “service” tag in cluster 1 to become active on the local server. There is no need for any action on the node that just became backup; it will be notified via the clustering support about its new status.
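As an illustration only (interface, router ID, IP address and script path are all hypothetical), keepalived can wire this command into its notify_master hook:

# /etc/keepalived/keepalived.conf (fragment)
vrrp_instance OPENSIPS_HA {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        203.0.113.10    # the shared service IP
    }
    # executed when this node takes over the active role
    notify_master "/usr/local/bin/opensips_promote.sh"
}

where the notify script simply promotes the local sharing tag:

#!/bin/sh
# /usr/local/bin/opensips_promote.sh
opensipsctl fifo clusterer_shtag_set_active service/1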
By activating this single tag on the new active server, we make the local OpenSIPS instance responsible for managing and pinging the registrations, for dialog expiration, pinging and CDR generation, for triggering presence notifications, for gateway probing, and so on and so forth.
All these services in OpenSIPS can be toggled active/inactive with a single MI command integrated into your High-Availability system (vrrpd, keepalived). This is what we call unified clustering: controlling the whole clustering support in OpenSIPS via a single tag and a single command.
What is left for you to do? Just sit back and relax… and make your customers happy!
How can you not love OpenSIPS 3.0?
More reasons? Join us in Amsterdam for the OpenSIPS Summit and learn more about OpenSIPS 3.0!