The new Clustering Engine of OpenSIPS 2.4

Before even thinking of building clustering support for high-level services like User Location, Dialog Tracking or SIP Presence, it is mandatory to have a powerful and flexible clustering engine in place. Such an engine becomes the reliable foundation for approaching more complex clustering scenarios.

And this is what we did in OpenSIPS 2.4 (the upcoming major release at the time of writing). We invested a lot of work to improve and consolidate the clusterer module, the under-the-hood clustering engine of OpenSIPS.

Topology layer

This layer is responsible for managing the cluster topology. Starting with OpenSIPS 2.4, we have two great additions at this layer.

The topology management was shifted to a more flexible approach. As an alternative to the more rigid cluster definition via database (where you need to define all the nodes forming the cluster), the new engine allows the cluster to be built dynamically, without any pre-configuration of the nodes. It is very simple: whenever you have a new node that needs to join a cluster, just point it to any of the existing nodes, and the new node will be “adopted” by the whole cluster.
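The join-by-pointing idea can be pictured with a small toy model. This is only a sketch in Python, not the clusterer implementation: the `Node` class and its `join` and `adopt` methods are hypothetical names, and the real engine exchanges this information over the network rather than through direct object references.

```python
class Node:
    """Toy cluster node that tracks the peers it knows about."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.peers = {}  # node_id -> Node

    def join(self, seed):
        """Join the cluster by contacting one existing node (the seed)."""
        # The seed shares its view of the cluster: itself plus its peers.
        known = {seed.node_id: seed, **seed.peers}
        for peer in known.values():
            peer.adopt(self)                  # every member learns about us
            self.peers[peer.node_id] = peer   # and we learn about every member

    def adopt(self, newcomer):
        """Record a newly announced node."""
        self.peers[newcomer.node_id] = newcomer


def build_cluster(ids):
    """Start the first node alone, then point each new node at the previous one."""
    nodes = [Node(i) for i in ids]
    for previous, new in zip(nodes, nodes[1:]):
        new.join(previous)
    return nodes
```

Calling `build_cluster([1, 2, 3, 4])` leaves every node knowing the other three, even though each newcomer was pointed at a single existing node only.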

The ability to modify the cluster topology in a simple and zero-penalty manner is a critical need. Dynamically adding nodes to, or removing nodes from, a production OpenSIPS cluster must be fully transparent for the services using the clustering topology.

Also, the topology of the cluster is now more consistent, thanks to bidirectional checks enforced over the links between the nodes. Such checks reduce the risk of race conditions when the topology of the cluster changes, like when nodes are joining or leaving the cluster. This mechanism also makes the start-up sequence of the cluster more robust, as all nodes are trying to get together and establish links between them.

Having a consistent topology is critical for application-layer data sharing via the cluster support. The application layer relies on this topology information to decide which node should be used as the data sync source, or which nodes can share the partitions of an activity.
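One way to read "bidirectional check" is that a link counts as usable only when both of its ends report seeing each other. The sketch below illustrates that reading in Python; the `usable_links` helper and the shape of its input are assumptions made for illustration, not the data structures used by the clusterer module.

```python
def usable_links(reported):
    """Return the links confirmed from both ends.

    `reported` maps a node ID to the set of node IDs that node claims to reach.
    """
    links = set()
    for a, seen in reported.items():
        for b in seen:
            # Keep the link only if b also reports reaching a.
            if a != b and a in reported.get(b, set()):
                links.add(frozenset((a, b)))
    return links
```

For instance, if node 1 claims to reach nodes 2 and 3, node 2 claims to reach node 1, and node 3 reports nothing, only the link between 1 and 2 is kept.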

Middle layer

OpenSIPS 2.4 brings this new middle layer to bridge, in a more powerful way, the low-level topology layer and the high-level application layer.

This middle layer, or the capabilities layer, is responsible for mapping onto each cluster node the list of capabilities it is able to offer, together with their status. A capability can be seen as a “service” implemented by the application layer with the help of the topology layer. For example, the “dialog” module offers two capabilities: (a) ongoing call replication and (b) call profiles replication. For each node in the cluster, the clusterer module knows the status of the available capabilities.

For example, for the “call replication” capability, the information held per node says whether the node is connected to the cluster, whether it has old data, or whether it is fully sync’ed with the rest of the cluster. Such information is critical when a new node needs to do a full data sync from the cluster it just joined: it needs to identify the nodes that hold valid (sync’ed) data and are therefore able to become data donors.
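The donor selection described above can be sketched as a lookup over per-node capability states. The Python below is illustrative only: the `SyncState` names mirror the three states mentioned in the text, while `pick_donor`, the capability string and the lowest-ID tie-break are assumptions, not the actual clusterer logic.

```python
from enum import Enum


class SyncState(Enum):
    NOT_CONNECTED = "not connected"
    OUT_OF_SYNC = "old data"
    SYNCED = "fully synced"


def pick_donor(capability, node_states, me):
    """Return the ID of a node able to donate data for `capability`, or None.

    `node_states` maps node ID -> {capability name -> SyncState}.
    """
    candidates = [
        node_id
        for node_id, caps in node_states.items()
        if node_id != me and caps.get(capability) is SyncState.SYNCED
    ]
    # Assumption: any synced node is acceptable; pick the lowest ID for determinism.
    return min(candidates) if candidates else None
```

A freshly joined node would call something like `pick_donor("call-replication", states, me=4)` and request the full data set from the returned node, or wait if no synced node is available yet.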

Application layer

Usually the top layers are the ones visible and appreciated by the users. Nevertheless, a top layer without all the underneath layers is useless.

So, what are the improvements that OpenSIPS 2.4 has to offer with regard to cluster-enabled, high-level applications? To be clear, at this layer we are talking about applications like User Location, Call Tracking or SIP Presence. These are the end users of all the clustering capabilities.

The most important addition here is the full data sync’ing between the nodes of the cluster. What does that mean? A node with no data, like a freshly started node or a disconnected node, can update itself with the application-related data by performing a full data transfer from a valid cluster node. For example, if we insert a new node into a User Location oriented cluster, the node will fetch the entire registration set from any of the valid nodes in the cluster, and it will be ready to operate in no time.

Benefits? Such on-demand data sync will update the node in no time, without the need for anything external (such as an external database). You plug in a new node and in a matter of milliseconds it will be fully sync’ed with the rest of the cluster and ready to process traffic. Do you need to restart a node? No worries, after coming back online it will sync and be ready to operate.

This is a powerful feature that allows bulk data replication between nodes (for sharing or backup purposes). It is simple to use (everything is automatic), but it is a complex mechanism. Just think about it: you need a way to identify the right donor (and the super-seeder) to grab the data from; if the data to be sync’ed is too large, some internal chunking is done; if there are too many chunks of data, there may be races between the data being sync’ed and the realtime replication data received from the cluster. But do not worry, we took care of all these for you :).
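To make the chunking and the race concrete, here is a small Python sketch of one possible way to handle them: the donor sends the data set in fixed-size chunks, and the receiving node buffers any realtime updates that arrive during the sync, then applies them once the last chunk is in. The `chunked` helper, the `Receiver` class and the buffering strategy are assumptions for illustration; the text does not say how OpenSIPS resolves the race internally.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class Receiver:
    """Toy node receiving a full sync while live replication keeps arriving."""

    def __init__(self):
        self.data = {}
        self.syncing = False
        self.pending = []  # live updates held back during the sync

    def start_sync(self):
        self.syncing = True

    def on_sync_chunk(self, chunk):
        # A chunk is a list of (key, value) pairs taken from the donor's snapshot.
        self.data.update(chunk)

    def on_live_update(self, key, value):
        if self.syncing:
            self.pending.append((key, value))  # newer than the snapshot, apply later
        else:
            self.data[key] = value

    def end_sync(self):
        # Replay buffered updates so they win over the older snapshot values.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()
        self.syncing = False
```

Without the buffer, a live update applied between two chunks could be overwritten by a later chunk carrying the older snapshot value of the same key.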

The second important addition is the action partitioning support. You cannot talk about clustering without some kind of partitioning. OpenSIPS 2.4 is able to partition actions over data: if a piece of data is shared across multiple nodes and a certain type of action needs to be done for that data, you want (1) to be sure at least one node does the action and (2) the entire set of actions for all the pieces of data to be distributed over all the nodes in the cluster.

A typical example here is the pinging of the registered contacts. Let’s say you have 100 thousand contacts and all of them need to be pinged. These contacts are shared between all the nodes in the cluster, so theoretically any node may ping any contact (please refer to this document describing the models for clustering registrations). So, you would like to have the pinging effort partitioned between all the nodes holding that data. More precisely, between all the active nodes, since the cluster topology may change at any time: nodes may join, nodes may drop.

Well, the partitioning support in the clusterer module provides the needed information to the User Location module, so that the pinging is evenly distributed among the active nodes and, of course, no ping is missed.
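A simple way to picture such partitioning is to hash each contact and map the hash onto the sorted list of active nodes, so that every node computes the same owner for a given contact on its own. The Python sketch below uses this hash-modulo scheme as an assumption; `owner_of` and `contacts_to_ping` are hypothetical helpers, and the actual algorithm used by OpenSIPS may differ.

```python
import zlib


def owner_of(contact, active_nodes):
    """Return the ID of the active node responsible for pinging `contact`."""
    nodes = sorted(active_nodes)  # same order on every node
    return nodes[zlib.crc32(contact.encode()) % len(nodes)]


def contacts_to_ping(me, contacts, active_nodes):
    """Return the subset of `contacts` that node `me` should ping."""
    return [c for c in contacts if owner_of(c, active_nodes) == me]
```

Because the owner is derived from the current set of active nodes, each contact has exactly one owner at any time, and recomputing with the updated set after a node joins or drops redistributes the work without leaving any contact unassigned.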

And last but not least, OpenSIPS 2.4 will offer you (as a script writer) the possibility to implement cluster-oriented services directly in the OpenSIPS script. Yes, from the script level you are able to send custom messages across the cluster, to a given node or to all nodes (broadcast). And again, at the script level you can receive such a message, handle it and reply back to the originating node. That is a cool way to implement your own custom logic and messages without touching the OpenSIPS code. See the cluster_send_req() and cluster_send_rpl() functions.

What’s next?

We will follow up with more posts on how the clustering support impacts applications like User Location, Call Tracking, Anycast and Presence. These will be the actual ways to benefit from the clustering support.

Also, keep in mind that we will hold presentations and run several interesting demos covering the clustering scenarios during the OpenSIPS Summit in Amsterdam. So, do not miss it 😉

