The “usrloc” module receives a significant revamp in OpenSIPS 2.4. In this article, we take a brief look at its newly introduced working modes and how to put them to the best use.
Clustering: Then
With a pre-2.4 version of OpenSIPS, there was no straightforward way to distribute the user location service across multiple data centers. The only mode that promised to solve this problem was the so-called “DB_ONLY” mode, which moves the entire user location data set (all reads and writes) into an SQL database. This approach was somewhat problematic, mainly due to two real-life issues:
- extraneous pinging: in DB_ONLY mode, all OpenSIPS instances hooked into the distributed database will unconditionally ping all contacts. Even to the untrained eye, the difference between sending N pings and sending 1 ping is noticeable. To a network administrator, or anyone with a taste for computer science problems in general, the difference is enormous.
- no support for “home registrar routing”: DB_ONLY mode does not properly handle IP-restricted scenarios where you look up a subscriber in one location, yet the call must only egress towards them from their actual “home” location. If we attempt to route calls to such subscribers from a different location (even if we know the “received” IP:port of the subscriber!), we will quickly find that they are unreachable. A classic and common example is an OpenSIPS edge registrar with a public IP, serving a UA that registers from behind a NAT device.
But, wait! usrloc also had a “binary replication” feature. What’s up with that?
Well, things were only beginning to take shape back when binary replication for usrloc was introduced in OpenSIPS 1.11. In a nutshell: binary replication was initially only meant to provide a “hot backup” box that could instantly take over in case the active one catches fire. Throw a Virtual IP (VIP) in front of them, and you’re golden. It was never meant to help you form clusters or distribute data across locations, even though its name may have suggested otherwise.
From there onward, no major user location change was introduced throughout OpenSIPS 2.1, 2.2 or 2.3, as we constantly consulted with our community (find the public discussions here and here) in order to figure out what the next big step should be.
Clustering: Now
I can say one thing for sure: all those discussions were put to excellent use for the 2.4 release! Both the community’s clustering requirements and some of our own have been condensed into something that we now call “working mode presets” for the OpenSIPS user location service.
We can break these presets down into three categories:
1. Classic Presets
These are the classic user location settings that we are all used to: NO_DB, WRITE_THROUGH, WRITE_BACK, and DB_ONLY (brush up your memory about them here). Although “db_mode” is deprecated in 2.4, we’ve kept it backwards-compatible. No script migration is required.
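For reference, here is a rough sketch of what a classic write-back setup could look like in both styles (the exact preset string and parameter name are best double-checked against the 2.4 usrloc module documentation):

```
loadmodule "usrloc.so"

# pre-2.4 style: still accepted in 2.4, no script migration required
modparam("usrloc", "db_url", "mysql://opensips:opensipsrw@localhost/opensips")
modparam("usrloc", "db_mode", 2)   # 0=NO_DB, 1=WRITE_THROUGH, 2=WRITE_BACK, 3=DB_ONLY

# ... or the equivalent 2.4 "working mode preset" (use one style or the other)
#modparam("usrloc", "working_mode_preset", "single-instance-sql-write-back")
```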
2. “Federation” Preset
In OpenSIPS 2.4 user location terminology, a “federated cluster” is formed by OpenSIPS nodes with independent data sets. The main factor driving this topology is the requirement that the registration service must directly face the SIP User Agents (UA). In such cases, all SIP traffic routing back to these UAs will always be forced through their “home registrar” (the server they registered with). This helps overcome a wide range of possible IP restrictions: NAT devices, firewalls, TCP connections, etc.
For the 2.4 release, user location federation will only work with an additional NoSQL database:
Federated Cluster + NoSQL “metadata” database
A user location clustering variant designed to provide an optimal mix between lookup speed and global platform scalability. The OpenSIPS nodes only hold their local user location dataset and publish local Address-of-Record (AoR) availability into the NoSQL database using trivial records such as {“id”: <idstring>, “aor”: “liviu@example.com”, “home_ip”: “10.0.0.10”}. Restart persistency is achieved through cluster sync, a node-to-node, TCP-based, binary-encoded data transfer process (you can read more about it in this blog post). A minimal configuration sketch follows the checklist below.
- required services: OpenSIPS + NoSQL database. (MongoDB or Cassandra in 2.4)
- restart persistency: sync from cluster (requires 2+ OpenSIPS cluster members).
- high availability: optional.
- home registrar routing: built-in solution.
- NAT pinging: built-in solution. Only 1 ping gets sent, originated by the home registrar.
- dataset mirroring: optional, to the HA node. Otherwise, each OpenSIPS node only keeps its own data.
- geo distributed: yes.
- horizontally scalable: yes*.
- lookup speed: 1 x in-memory lookup + 1 x NoSQL lookup.
* the more locations (Points of Presence) our platform has, the more this mode shines.
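To give a feel for the amount of scripting involved, here is a minimal sketch of a federated node. Treat the preset string, the “location_cluster” / “cachedb_url” parameters and the clusterer settings as illustrative, and confirm them against the 2.4 usrloc and clusterer module documentation:

```
listen = bin:10.0.0.10:5566      # BIN listener, used for cluster sync

loadmodule "proto_bin.so"
loadmodule "clusterer.so"
loadmodule "cachedb_mongodb.so"
loadmodule "usrloc.so"

# this node's id; the cluster topology itself is provisioned in the
# "clusterer" database table
modparam("clusterer", "my_node_id", 1)

# federated user location: local contacts stay local, while AoR "metadata"
# records get published into MongoDB
modparam("usrloc", "working_mode_preset", "federation-cachedb-cluster")
modparam("usrloc", "location_cluster", 1)
modparam("usrloc", "cachedb_url", "mongodb://10.0.0.100:27017/opensipsDB.userlocation")
```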
3. “Full Sharing” Presets
In OpenSIPS 2.4 user location terminology, a “full sharing cluster” is formed by OpenSIPS nodes which have the luxury of being front-ended by at least one SIP proxy residing on the same platform. All of a sudden, a whole world of IP-restriction-induced problems and implications simply vanishes! We can safely say that the problems a “full sharing user location cluster” is able to solve are a subset of the problems solved by a “federated user location cluster”. However, this does not mean that one is better than the other, as you will see below.
Let’s take a look at the two full sharing user location cluster presets of OpenSIPS 2.4:
Full Sharing Cluster
A user location clustering solution designed to be OpenSIPS-only. The cluster is front-ended by one or more SIP proxies, which removes any call routing constraints. No SQL or NoSQL database is required. Restart persistency is achieved through “cluster sync”. A configuration sketch follows the checklist below.
- required services: OpenSIPS.
- restart persistency: sync from cluster (requires 2+ OpenSIPS cluster members).
- high availability: not needed! (any cluster node can route out to any user)
- home registrar routing: not needed.
- NAT pinging: built-in solution. Only 1 ping gets sent. Nodes share ping workload.
- dataset mirroring: Full mesh. Each OpenSIPS node holds all SIP user location data.
- geo distributed: yes.
- horizontally scalable: yes, as long as the dataset fits in RAM (see (*1) below).
- lookup speed: 1 x in-memory lookup.
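A minimal sketch of such a node might look like the following (same caveat: verify the preset and parameter names against the 2.4 documentation):

```
listen = bin:10.0.0.11:5566      # BIN listener: contact replication + cluster sync

loadmodule "proto_bin.so"
loadmodule "clusterer.so"
loadmodule "usrloc.so"

modparam("clusterer", "my_node_id", 2)

# every node holds (and keeps in sync) the full user location dataset
modparam("usrloc", "working_mode_preset", "full-sharing-cluster")
modparam("usrloc", "location_cluster", 1)
```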
Full Sharing Cluster with NoSQL storage
A user location clustering variant designed for large-scale deployments. At this scale (tens or hundreds of millions of subscribers), we are no longer able to fit the entire dataset into OpenSIPS memory. Moreover, features such as “cluster sync” may reach their limits, so it is best to delegate all data storage and management responsibilities to a specialized, data-sharding-enabled NoSQL engine. A configuration sketch follows the checklist below.
- required services: OpenSIPS + NoSQL database. (MongoDB or Cassandra in 2.4).
- restart persistency: not needed! (all data resides in NoSQL)
- high availability: not needed! (any node can route out to any user)
- home registrar routing: not needed.
- NAT pinging: built-in solution. Only 1 ping gets sent. Nodes share ping workload.
- dataset mirroring: None. OpenSIPS nodes hold no data, only route traffic.
- geo distributed: yes.
- horizontally scalable: yes*.
- lookup speed: 1 x NoSQL lookup.
(*): solely driven by the horizontal scalability of the NoSQL database.
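As a sketch, and with the same caveat about the exact preset and parameter names, such a node could be configured along these lines:

```
listen = bin:10.0.0.12:5566      # BIN listener: the cluster is still used to share NAT pinging

loadmodule "proto_bin.so"
loadmodule "clusterer.so"
loadmodule "cachedb_mongodb.so"  # Cassandra is the other supported backend in 2.4
loadmodule "usrloc.so"

modparam("clusterer", "my_node_id", 3)

# no local dataset: all contacts live in the NoSQL engine
modparam("usrloc", "working_mode_preset", "full-sharing-cachedb-cluster")
modparam("usrloc", "location_cluster", 1)
modparam("usrloc", "cachedb_url", "mongodb://10.0.0.100:27017/opensipsDB.userlocation")
```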
Summary
A lot of time and effort has gone into the OpenSIPS 2.4 user location revamp. Feedback and ideas have been gathered from OpenSIPS users, developers and VoIP enthusiasts, as well as from platforms and businesses powered by OpenSIPS. All of this has led to what we now believe to be a solid direction for OpenSIPS user location, one that is tailored to fit the overwhelming majority of use cases.
If you want to find out more about this topic as well as discuss OpenSIPS/VoIP with like-minded fellow telephony enthusiasts, do not miss the Amsterdam 2018 OpenSIPS Summit, May 1-4! See you there!
(*1) holding the entire dataset on each box may seem like a limitation to scalability, but the OpenSIPS user location data structures are well-optimized and configurable, allowing hundreds of thousands of subscribers to be stored while maintaining close to O(1) lookup speeds. This keeps the mode quite performant until deployments reach truly large scales.
This article sums things up well and helps me sort out the differences between the various working mode presets. But I have some doubts. What is the difference between restart persistency and dataset mirroring? What is the purpose of restart persistency? And why is restart persistency needed in a federated cluster (in this mode, each OpenSIPS only needs to keep the data of the clients registered to it)?
Hey, @sxtang!
The purpose of restart persistency is to allow you to “systemctl restart opensips” without losing dialogs, online subscribers, presence subscriptions, etc.
The wording “dataset mirroring” refers to the fact that two cluster nodes have (and keep sharing changes to) the same data set. Examples of such data: live dialogs, online users.
Correct: in federated clustering, each OpenSIPS keeps its own clients’ data and does not waste bandwidth, CPU, etc. broadcasting this data to everyone. But what does this have to do with restart persistency, or even high availability? We may want to restart this node and still have the data. Or we may want to allow it to crash & burn while still keeping the service up!