Distributed User Location replication using OpenSIPS

When building a large, highly available, distributed VoIP platform, you definitely need to consider distributing your servers geographically across different locations. And if you want them to be redundant and to fail over when one of them goes down, they need to share a large amount of data. Users’ location is particularly sensitive information that needs to be shared: if one of the registrar servers crashes, a different one needs to take over all of its customers. This article discusses several aspects that need to be considered when distributing customers’ location to multiple nodes, and the way OpenSIPS deals with them.

User location indicates how customers can be reached from your platform, down to their SIP phones. It is important not to lose this information when a node goes down, otherwise you won’t be able to deliver inbound calls to them. Thus you need a mechanism that makes this data available to the other nodes, so that they can take over.

How to replicate user location

User location data is particularly sensitive because it is both large and highly dynamic: your platform might have hundreds of thousands of customers that can register or unregister at any time. This information therefore has to be replicated quickly to all the nodes in the platform, while keeping the data consistent. OpenSIPS offers two ways to achieve this:

  • store user location in an SQL database (MySQL, PostgreSQL) and use traditional database replication mechanisms to distribute the data to all the nodes in your platform. While this is a fairly common solution, it suffers when it comes to performance: an insert or update every time a user changes its registration status can put a heavy load on the database, and querying the database for every call is not really desirable either;
  • use the OpenSIPS binary interface, a native protocol used to replicate data between multiple OpenSIPS nodes. This is a more efficient approach because the data is exchanged directly between all the instances in the cluster and cached in memory by each of them. Configuration is simple as well: all you have to do is specify the cluster you want to replicate contacts to and enable accepting the received contacts:
    modparam("usrloc", "accept_replicated_contacts", 1)
    modparam("usrloc", "replicate_contacts_to", 1)
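To make the idea concrete, here is a toy Python model of the behaviour these two parameters enable: each node keeps contacts in memory and pushes every change to its peers, which apply it only if they accept replicated contacts. It is a hypothetical sketch: the class and method names are invented, each AoR holds a single contact for brevity, and it does not reproduce the wire format of the OpenSIPS binary interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegistrarNode:
    """Toy registrar: an in-memory location table plus a list of cluster peers.

    Hypothetical model of `replicate_contacts_to` / `accept_replicated_contacts`;
    it is not the OpenSIPS binary interface itself.
    """
    name: str
    accept_replicated: bool = True
    location: Dict[str, str] = field(default_factory=dict)  # AoR -> contact URI
    peers: List["RegistrarNode"] = field(default_factory=list, repr=False)

    def register(self, aor: str, contact: str) -> None:
        # A REGISTER handled locally: update our own cache, then replicate it.
        self.location[aor] = contact
        for peer in self.peers:
            peer.receive_replicated(aor, contact)

    def unregister(self, aor: str) -> None:
        # De-registrations are replicated too (contact=None means "delete").
        self.location.pop(aor, None)
        for peer in self.peers:
            peer.receive_replicated(aor, None)

    def receive_replicated(self, aor: str, contact: Optional[str]) -> None:
        # Nodes that do not accept replicated contacts simply drop the update.
        if not self.accept_replicated:
            return
        if contact is None:
            self.location.pop(aor, None)
        else:
            self.location[aor] = contact

    def lookup(self, aor: str) -> Optional[str]:
        # Served from memory: no database query when routing a call.
        return self.location.get(aor)


if __name__ == "__main__":
    a, b = RegistrarNode("a"), RegistrarNode("b")
    a.peers.append(b)
    b.peers.append(a)
    a.register("sip:alice@example.com", "sip:alice@198.51.100.7:5060")
    print(b.lookup("sip:alice@example.com"))  # node b can now serve alice's calls
```

Pushing every change to every peer keeps lookups local and fast; the price is that each node holds the full location table in memory.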

Network constraints

Customers can also have network topology constraints: for example, they might have rigid firewall configurations that only allow a certain server IP to talk to them. Therefore, you have to make sure that if the registrar server goes down, the new server that handles those customers can still reach them at the network layer. Ideally, the new server should use the same IP address and port.

Similar constraints apply when customers are located behind NAT. Since the customer’s router opens a pinhole only for the IP address (and, for many NATs, the port) the client has talked to, when a new registrar takes over after a fail-over, the router needs to see the requests coming from that same address; otherwise it will drop them.
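The pinhole behaviour can be illustrated with a toy Python model of a port-restricted filter: inbound packets pass only if they come from an address and port the phone has already sent to. This is a simplification (real routers also translate the phone’s own port and expire pinholes after a timeout), and the class name and addresses are illustrative only.

```python
from typing import Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class PortRestrictedPinhole:
    """Toy model of the filter a NAT router or stateful firewall applies."""

    def __init__(self) -> None:
        self.allowed: Set[Endpoint] = set()

    def outbound(self, dst: Endpoint) -> None:
        # The phone sends a packet (e.g. a REGISTER): open a pinhole for that peer.
        self.allowed.add(dst)

    def inbound_allowed(self, src: Endpoint) -> bool:
        # Replies and new requests pass only if they come from a known peer.
        return src in self.allowed


if __name__ == "__main__":
    router = PortRestrictedPinhole()
    router.outbound(("203.0.113.10", 5060))                # phone registers with node A
    print(router.inbound_allowed(("203.0.113.10", 5060)))  # True: same IP and port
    print(router.inbound_allowed(("203.0.113.20", 5060)))  # False: fail-over node B
```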

To address these constraints, when replicating the location of a user, besides its IP and port, we also replicate the socket (interface) the REGISTER message was received on, as well as all the hops the message has passed through (the “Path” header, defined in RFC 3327). Depending on your platform’s layout, one or the other comes in handy.
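A replicated entry therefore carries more than the contact URI. The Python sketch below shows one possible shape for such a record and how the Path values of a REGISTER could be collected into it. The field and function names are hypothetical and do not mirror OpenSIPS’s internal usrloc structures; Path values are kept in message order (topmost first), and the comma splitting is simplified to only protect commas inside angle brackets.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def split_header_values(raw: str) -> List[str]:
    """Split a comma-separated header value, ignoring commas inside <...>."""
    values: List[str] = []
    current = ""
    depth = 0
    for ch in raw:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            values.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        values.append(current.strip())
    return values


@dataclass(frozen=True)
class ReplicatedContact:
    aor: str                    # address of record, e.g. sip:alice@example.com
    contact: str                # Contact URI from the REGISTER
    received: Tuple[str, int]   # source IP and port the REGISTER came from
    socket: str                 # local socket it arrived on, e.g. "udp:203.0.113.10:5060"
    path: Tuple[str, ...] = ()  # Path values (RFC 3327), topmost first


def record_from_register(aor: str, contact: str, received: Tuple[str, int],
                         socket: str, path_headers: Sequence[str]) -> ReplicatedContact:
    """Build the record to replicate; path_headers lists Path fields in message order."""
    path = tuple(v for raw in path_headers for v in split_header_values(raw))
    return ReplicatedContact(aor, contact, received, socket, path)


if __name__ == "__main__":
    rec = record_from_register(
        "sip:alice@example.com",
        "sip:alice@192.168.1.20:5060",
        ("198.51.100.7", 41234),
        "udp:203.0.113.10:5060",
        ["<sip:edge1.example.net;lr>"],
    )
    print(rec.path)  # ('<sip:edge1.example.net;lr>',)
```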

You are using a SIP Edge Proxy between your SIP clients and your core registrars

When a client registers, it communicates only with a specific Edge Proxy (chosen through DNS or some other mechanism). The proxy adds itself to the “Path” header and forwards the REGISTER message to a specific registrar, which then replicates the information to all the other registrars.

When that registrar crashes, the server that takes its place (with an arbitrary IP and port) uses the replicated “Path” information to find the Edge Proxy that can reach that customer, and sends any inbound messages for the client through that Edge Proxy. Since the client always talks to the Edge Proxy, no matter which registrar sits behind it, it always receives messages from the same IP and port (the Edge Proxy’s), so the messages pass the client’s NAT or firewall.
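How a registrar uses a stored Path is defined by RFC 3327: the Path values become a pre-loaded route set (Route headers, in the same order) on requests sent to that contact, so the request first goes to the topmost one, which here is the Edge Proxy. OpenSIPS’s registrar handles this itself; the hypothetical Python function below only illustrates the rule, with invented names and no real SIP stack (no DNS resolution, and no NAT handling in the no-Path case).

```python
from typing import List, Sequence, Tuple


def route_to_contact(contact: str, path: Sequence[str]) -> Tuple[str, str, List[str]]:
    """Return (request_uri, next_hop, route_headers) for a request to a registered contact."""
    request_uri = contact  # the Request-URI is always the registered contact
    if not path:
        # No Path: send straight to the contact (ignoring NAT, where the
        # received source address would be used instead).
        return request_uri, contact, []
    # Path values become Route headers in the same order (topmost first).
    route_headers = [f"Route: {value}" for value in path]
    # The request is sent to the first Route: the proxy nearest the registrar.
    next_hop = path[0].strip().strip("<>")
    return request_uri, next_hop, route_headers


if __name__ == "__main__":
    ruri, hop, routes = route_to_contact(
        "sip:alice@192.168.1.20:5060", ["<sip:edge1.example.net;lr>"]
    )
    print(hop)     # sip:edge1.example.net;lr
    print(routes)  # ['Route: <sip:edge1.example.net;lr>']
```

Because the next hop depends only on the replicated Path, any registrar in the cluster computes the same route, whatever its own IP is.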

The “socket” information from the replicated data is not particularly useful in this scenario: the new registrar will probably not even have that IP assigned, so it cannot use it and can simply ignore it. Instead, it uses its own IP to forward the requests to the Edge Proxy. As long as the registrars and the Edge Proxies are part of the same platform and can reach each other, this works fine.

You are not using an Edge Proxy

If you are not using an Edge Proxy and your customers talk directly to the registrar servers, you have to take a different approach: when the initial registrar server for a customer crashes and a new one takes its place, the new one has to use the same IP and port as the initial one (the ones stored in the previously replicated “socket” information), otherwise its requests will not pass firewalls or NAT. This means that the old and the new registrar servers need to share the same IP. This can be done using high-availability tools such as vrrpd or keepalived, which move a virtual IP from a failed machine to a healthy one.
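The two scenarios lead to different rules for picking the socket a fail-over registrar sends from. The Python sketch below captures that decision; the function name and the "proto:ip:port" socket strings are hypothetical illustrations, not an OpenSIPS API.

```python
from typing import Iterable


def choose_outbound_socket(replicated_socket: str, has_path: bool,
                           local_sockets: Iterable[str], default_socket: str) -> str:
    """Pick the local socket to use when contacting a replicated contact."""
    local = set(local_sockets)
    if has_path:
        # Edge Proxy setup: the replicated socket can be ignored; any local
        # socket that reaches the Edge Proxy will do.
        return default_socket
    if replicated_socket in local:
        # No Edge Proxy: reuse exactly the IP:port the phone registered with,
        # so its NAT or firewall accepts the traffic.
        return replicated_socket
    # The shared IP is not bound here; the phone's NAT or firewall would
    # most likely drop anything sent from another address.
    raise LookupError(f"socket {replicated_socket} is not bound on this node")


if __name__ == "__main__":
    # Standby node bound to its own IP and to the shared (virtual) IP.
    mine = ["udp:203.0.113.21:5060", "udp:203.0.113.10:5060"]
    print(choose_outbound_socket("udp:203.0.113.10:5060", False, mine,
                                 "udp:203.0.113.21:5060"))  # udp:203.0.113.10:5060
```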

Since multiple OpenSIPS instances must be ready to use that IP in case of fail-over, they all need to bind to that IP and port. However, having the same IP assigned to two (or more) machines in the same network is not a good idea from a networking perspective. It can work if you make sure that only one machine actively uses the IP at any given time; both tools mentioned above can be configured to use iptables to ensure that only a single machine is active. Alternatively, you can let the standby machines bind to the shared IP without having it assigned, by adding the following line to the sysctl configuration file (for example /etc/sysctl.conf):

net.ipv4.ip_nonlocal_bind = 1

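This setting allows processes such as OpenSIPS to bind to IPv4 addresses that are not currently assigned to any local interface (apply it with sysctl -p, or it takes effect at the next boot). You still need to configure vrrpd or keepalived to move the shared IP to the new machine when the fail-over happens.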
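One way to check the effect of the setting on a standby machine is to try binding to the shared address before it has been moved there. The hypothetical Python helper below (IPv4 only) returns False when the kernel rejects the address as non-local with EADDRNOTAVAIL, which is what Linux does while net.ipv4.ip_nonlocal_bind is 0; with the setting enabled, the bind is expected to succeed.

```python
import errno
import socket


def can_bind_ipv4(ip: str, port: int = 0) -> bool:
    """Return True if a UDP socket can bind to ip:port on this machine.

    Returns False when the kernel rejects the address as non-local
    (EADDRNOTAVAIL); other errors, such as the port being in use, are raised.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
    return True


if __name__ == "__main__":
    # 203.0.113.10 stands in for the shared (virtual) IP; port 0 lets the
    # kernel pick a free port so the check does not clash with OpenSIPS.
    print(can_bind_ipv4("203.0.113.10"))
```

Port 0 is used only to test the address; OpenSIPS itself would bind its SIP port (typically 5060).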

Conclusions

This article presented some of the challenges of sharing user location information in a large, distributed, and redundant VoIP platform, and how you can use OpenSIPS, together with other tools, to solve them. It also showed how OpenSIPS can be deployed in different setups to accommodate these requirements.

However, these are not the only aspects to take into consideration when sharing dynamic data in distributed VoIP solutions. Sharing dialogs, profiles, and call limits also raises challenges, but these are the subject of a different post.
