How to Eliminate Zombie Registrations in OpenSIPS

The registration process is an important mechanism in SIP. It allows users to connect to the service, announce their location, and receive calls through the service.

The health and performance of your SIP service depend on how well the registration process works and how accurate and up to date the stored registration information is. The most harmful factor here is “zombie” registrations.

What is a zombie registration? It is a registration made by a SIP device that disappears before its registration expires. For example, a device registers a contact for the next 30 minutes, but disappears after 5 minutes; for the remaining 25 minutes, the server still holds a contact that nobody answers. How can a device disappear?

a crashing soft or hard phone
network unplugged (cable, interface)
power unplugged on a hard phone
network disconnection (for example, a lost NAT binding)
IP roaming (a mobile device moves into a different network)

So, there are plenty of reasons that may cause zombie registrations. But why are they so harmful to a SIP service?

A zombie registration means the SIP server tries to send calls to a destination that never answers. This means delays, a higher PDD (post-dial delay), and a worse user experience.
If TCP-based protocols are involved, sending a call to a zombie destination translates into an attempt to open a TCP connection to an unresponsive destination. This means potentially blocking I/O operations that hurt the performance of the SIP server.
The scale of registrations: systems may have tens or hundreds of thousands of registered users, so any small deviation in registration handling gets multiplied by the number of users.
Broken call routing logic: a false registration may change the way a call is routed, leading to undesired effects.

OpenSIPS and zombie registrations

Now, let’s see what OpenSIPS offers for dealing with zombie registrations. The nathelper module already provides SIP OPTIONS pinging for registered devices. By default, however, this pinging is passive: it does nothing more than generate the traffic needed to keep the NAT pinholes open.

Optionally, you can configure the module to wait for and interpret the replies to the SIP OPTIONS requests, in order to detect whether the pinged contact/device is still reachable and alive:

...
## ping all UDP contacts with OPTIONS
modparam("nathelper", "sipping_bflag", "SIPPING_ENABLE")
## the timeout (seconds) for a ping to fail
modparam("nathelper", "ping_threshold", 10)
## number of failed pings before removing contact 
modparam("nathelper", "max_pings_lost", 5)
## branch flag to activate ping based removal
modparam("nathelper", "remove_on_timeout_bflag", "SIPPING_RTO")
...
route {
   ...
   if (is_method("REGISTER")) {
      # enable OPTIONS pinging for this contact
      setbflag("SIPPING_ENABLE");
      # drop the contact if too many pings go unanswered
      setbflag("SIPPING_RTO");
      save("location");
   }
   ...
}

With the above settings, if a contact is marked with the “SIPPING_RTO” branch flag during registration, OpenSIPS automatically discards it after 5 consecutive failed pings, where a ping fails if no reply arrives within 10 seconds. This removal does not produce any kind of SIP signaling.
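How quickly a zombie gets detected also depends on how often nathelper sends the pings, which the snippet above does not set. The sketch below adds two further nathelper parameters; the values (30 seconds, and the placeholder URI sip:pinger@example.com) are illustrative assumptions, not recommendations. In the nathelper documentation we know of, natping_interval defaults to 0 (pinging disabled) and sipping_from must be set for OPTIONS pinging to happen at all; please check both against the docs for your OpenSIPS version.

```opensips
## seconds between two ping rounds; 30 is an illustrative value
modparam("nathelper", "natping_interval", 30)
## From URI used in the OPTIONS pings; the address is a placeholder
modparam("nathelper", "sipping_from", "sip:pinger@example.com")
```

As a rough estimate with these assumed values, a device that stops answering would be dropped after about max_pings_lost × natping_interval = 5 × 30 s = 150 s, rather than lingering for the rest of its 30-minute registration. The exact moment also depends on ping_threshold and on where in the ping cycle the device vanished.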

TCP specific handling

The above pinging approach is tuned for UDP. TCP is a bit different: there is no NAT pinging (it is not needed, as the TCP connection itself keeps the path through the NAT open), so we need a different approach.

The solution for TCP-based contacts exploits the fact that, in parallel with the SIP registration state, we also have state kept at the transport level: the TCP connection itself. Ideally, the TCP connection used by a registration should be kept up for the whole duration of that registration. Why? So the SIP server can reach the registered device again without having to open a new TCP connection. This matters because, in most cases, registering devices are behind NAT, and it is impossible to open a TCP connection to a target behind NAT. So, it is better to keep the initial TCP connection up and reuse it to reach the device:

modparam("registrar", "tcp_persistent_flag", "TCP_PERSIST_DURATION")
...
route {
   ...
   if (is_method("REGISTER")) {
      # keep the TCP connection of this REGISTER open while the registration lasts
      if ( $socket_in(proto)=="TCP")
         setflag("TCP_PERSIST_DURATION");
      save("location");
   }
   ...
}

This prevents OpenSIPS from closing the inbound TCP connection on which the REGISTER request was received.
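For context, TCP connections that are not flagged this way fall under the OpenSIPS core idle timeout, the global tcp_connection_lifetime parameter, which closes connections that stay inactive for longer than that many seconds. The value below is only an illustration:

```opensips
## core (global) parameter: close TCP connections idle for more than 120 seconds
tcp_connection_lifetime = 120
```

With a short idle timeout like this and no persistence flag, a 30-minute registration could easily outlive its connection, leaving the server with no way back to a device behind NAT.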

To handle the case where the connection is closed because the device disappeared, we can configure OpenSIPS to catch the event raised when a TCP connection closes and remove any contact matching the IP and port of that connection.

event_route[E_CORE_TCP_DISCONNECT] {
   # remove any contact matching the far end
   # (source) of the TCP connection 
   remove_ip_port( $param(src_ip), $param(src_port),"location");
}

This way, when the TCP connection to the device is lost, the matching registration (which has become unreachable) is removed by OpenSIPS as well.
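When troubleshooting, it can help to log each removal. Below is a sketch of the same event route with an xlog() line added. It is meant to replace the route above, not to sit next to it, since both handle the same event; the log level and message text are arbitrary choices.

```opensips
event_route[E_CORE_TCP_DISCONNECT] {
   # record which connection went away before dropping its contacts
   xlog("L_INFO", "TCP connection from $param(src_ip):$param(src_port) closed, removing its contacts\n");
   # remove any contact matching the far end (source) of the TCP connection
   remove_ip_port( $param(src_ip), $param(src_port), "location");
}
```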

Conclusion

To keep your OpenSIPS as efficient as possible, it is very good practice to get rid of zombie registrations. Moreover, doing so improves the user experience when it comes to dialing delays.

In a future post, we will expand a bit more on what you can do to improve your setup when it comes to registrations and delays. So stay tuned.

Published by bogdan1iancu