Quality-based PSTN Routing in OpenSIPS 3.1 LTS [part 1]

Introduction

Note: some information in this post may be outdated. Make sure to also read the follow-up post for the final view on qrouting’s behavior.

To date, the open-source SIP server garden has yielded quite a handful of ways to perform PSTN routing. Some VoIP operators prefer to pull the latest rate sheets and rebuild their LCR rules on a daily basis, then attempt the same (cost-optimal!) list of gateways for each PSTN prefix throughout the day. In OpenSIPS land, this behavior can be replicated using the drouting (Dynamic Routing) module.

Other VoIP operators prefer a more holistic approach, where they would rather balance their egress PSTN traffic to all gateways they are interconnected with. Moreover, since some of the gateways may have different incoming traffic limitations than others, an often acceptable solution to this problem is to use a weight-based approach when terminating traffic to these PSTN gateways. In OpenSIPS, this can also be achieved using drouting.

But is there an actual solution for operators who want to build a service or package with a strong and consistent quality guarantee? What if they want to prioritize PSTN gateways with fast and reliable SIP response times rather than the ones with the least cost? Also, can they easily implement the idea of routing to the gateway with the best media quality?

So far, at least in the open-source world, we have not seen any implementation attempts in this direction. Well, at least not until the qrouting module got merged into the OpenSIPS master branch this week!

How Does qrouting Work?

The new OpenSIPS qrouting (Quality-based Routing) module aims to solve exactly the “signaling/media quality guarantee” problems discussed above, opening up doors to new types of service offerings for VoIP operators.

During live PSTN routing, the module will continuously collect signaling statistics for each (prefix, destination) pair. If you are not familiar with the drouting (Dynamic Routing) module and terminology, a destination may be either a PSTN gateway or a PSTN carrier (fancy way of saying “N gateways”). The sampled statistics are periodically rotated, with the oldest ones being dropped with each new sampling_interval, such that at any point in time, only a maximum of history_span minutes worth of statistics are being kept for each pair.
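The rotation described above can be pictured with a small Python sketch. It is only an illustration, not the module's actual implementation: the PairStats class, its counters and the choice to aggregate only archived samples are assumptions made for the example, while sampling_interval and history_span mirror the module settings named in the text.

```python
from collections import deque


class PairStats:
    """Rolling signaling counters for one (prefix, destination) pair.

    Hypothetical helper: models sampling_interval / history_span rotation.
    """

    def __init__(self, sampling_interval_s: int, history_span_min: int):
        # Number of sampling intervals that fit into the history span.
        max_samples = max(1, (history_span_min * 60) // sampling_interval_s)
        # A bounded deque drops the oldest sample automatically when full.
        self.history = deque(maxlen=max_samples)
        self.current = {"calls": 0, "answered": 0}

    def on_call(self, answered: bool) -> None:
        """Account for one routed call in the interval being sampled."""
        self.current["calls"] += 1
        self.current["answered"] += int(answered)

    def rotate(self) -> None:
        """Run once per sampling interval: archive the sample and reset."""
        self.history.append(self.current)
        self.current = {"calls": 0, "answered": 0}

    def totals(self) -> dict:
        """Aggregate over the retained (archived) samples only."""
        return {
            key: sum(sample[key] for sample in self.history)
            for key in ("calls", "answered")
        }
```

With a 60-second interval and a 2-minute span, only the two most recent samples survive a third rotation, so older calls stop influencing the totals.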

Using these statistics, the module hooks into drouting and dynamically alters the behavior of the do_routing() function, such that the destinations may now also get sorted using the newly added “quality” algorithm, as an alternative to the classic “weight-based” or “failover” algorithms. The latter are by no means gone: they are still production-ready!

Signaling Statistics

Here is a summary of the signaling statistics that qrouting is collecting from the PSTN gateways:

  • ASR (Answer Seizure Ratio) – the percentage of telephone calls which are answered (200 reply status code)
  • CCR (Call Completion Ratio) – the percentage of telephone calls which are signaled back by the far-end client. Thus, 5xx, 6xx reply codes and internal 408 timeouts generated before reaching the client do not count here. The following is always true: CCR >= ASR
  • PDD (Post Dial Delay) – the duration, in milliseconds, between the receipt of the initial INVITE and the receipt of the first 180/183 provisional reply (the call state advances to “ringing”)
  • AST (Average Setup Time) – the duration, in milliseconds, between the receipt of the initial INVITE and the receipt of the first 200 OK reply (the call state advances to “answered”). The following is always true: AST >= PDD
  • ACD (Average Call Duration) – the duration, in seconds, between the receipt of the initial INVITE and the receipt of the first BYE request from either participant (the call state advances to “ended”)
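To make the definitions above concrete, here is a rough Python sketch that derives the five statistics from a list of per-call records. The CallRecord type and its field names are hypothetical and exist only for this example; qrouting gathers these values internally from SIP signaling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    """Hypothetical per-call data; times are seconds since the INVITE."""
    reached_far_end: bool                # False for 5xx, 6xx, internal 408
    ringing_at: Optional[float] = None   # first 180/183 reply
    answered_at: Optional[float] = None  # first 200 OK reply
    ended_at: Optional[float] = None     # first BYE request


def mean(values):
    """Average of an iterable, or None when it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else None


def summarize(calls):
    """Compute ASR/CCR (percent), PDD/AST (ms) and ACD (s) for a call list."""
    total = len(calls)
    answered = [c for c in calls if c.answered_at is not None]
    signaled = sum(c.reached_far_end for c in calls)
    return {
        "asr": 100.0 * len(answered) / total if total else None,
        "ccr": 100.0 * signaled / total if total else None,
        "pdd_ms": mean(1000 * c.ringing_at for c in calls
                       if c.ringing_at is not None),
        "ast_ms": mean(1000 * c.answered_at for c in answered),
        "acd_s": mean(c.ended_at for c in answered
                      if c.ended_at is not None),
    }


calls = [
    CallRecord(True, ringing_at=0.5, answered_at=4.0, ended_at=64.0),
    CallRecord(True, ringing_at=1.5),   # rang, but never answered
    CallRecord(False),                  # e.g. rejected with a 503
    CallRecord(True, ringing_at=1.0, answered_at=6.0, ended_at=36.0),
]
summary = summarize(calls)
```

In this sample, two of the four calls are answered and three are signaled back by the far end, so ASR is 50% and CCR is 75%, in line with CCR >= ASR.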

Quality Routing Profiles

We know our statistics and we have full monitoring for them on each (prefix, destination) pair. But how do we specify when a destination is performing well or badly? In qrouting, this is done using the qr_profiles table.

To keep things flexible, we defined two levels of “bad” for each monitored statistic: “warning” and “critical”. When a (prefix, destination) combination exceeds the “warning” threshold, it will receive a mild scoring penalty (1). Should it ever exceed the “critical” threshold, the scoring penalty will be severe. For good measure, we set it to 10x the warning level penalty (10). If you want to disable a threshold from computation, just set it to -1. Note that disabling the “warning” threshold will also cause the “critical” one to be disabled, regardless of the actual value placed for that field.
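The penalty rules for a single statistic could be sketched in Python as follows. This is an illustration under stated assumptions, not the module's code: the penalty() function and the higher_is_better flag are hypothetical, and the sketch assumes that ratio-like stats such as ASR breach a threshold by dropping below it, while delay stats such as PDD breach it by rising above it.

```python
WARN_PENALTY = 1    # mild penalty, as described in the text
CRIT_PENALTY = 10   # 10x the warning penalty
DISABLED = -1       # threshold value that turns a check off


def penalty(value, warn, crit, higher_is_better):
    """Return 0, 1 or 10 for one statistic of a (prefix, destination)."""
    if warn == DISABLED:
        # A disabled warning threshold also disables the critical one.
        return 0

    def breached(threshold):
        # Assumption: direction of "exceeding" depends on the stat type.
        return value < threshold if higher_is_better else value > threshold

    if crit != DISABLED and breached(crit):
        return CRIT_PENALTY
    if breached(warn):
        return WARN_PENALTY
    return 0
```

For example, with an ASR warning threshold of 40 and a critical one of 20, an ASR of 35 yields a penalty of 1 and an ASR of 15 yields 10; setting the warning threshold to -1 yields 0 in both cases.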

Lastly, we added the possibility to build your own scoring formula. By also allowing a weight (default: 1) for each statistic in the qr_profiles table, developers may prioritize one stat over the others or reduce the importance of a stat.
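As a rough illustration of how weights could shape the score, the Python sketch below assumes the simplest possible formula: each statistic's penalty is multiplied by its weight and the results are summed. The exact formula is not spelled out here, so treat the score() function, its dictionary layout and the weighted sum itself as assumptions.

```python
STATS = ("asr", "ccr", "pdd", "ast", "acd")


def score(penalties, weights=None):
    """Sum the per-statistic penalties, each scaled by its weight.

    Missing penalties count as 0 and missing weights default to 1.
    """
    weights = weights or {}
    return sum(penalties.get(s, 0) * weights.get(s, 1) for s in STATS)


# A destination with two warnings: one on PDD, one on ACD.
dst = {"pdd": 1, "acd": 1}
print(score(dst))               # default weights: 1 + 1 = 2
print(score(dst, {"pdd": 3}))   # PDD weighted 3x: 3 + 1 = 4
```

Raising the weight of PDD makes slow-to-ring destinations sink faster in the ordering, without touching the thresholds themselves.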

Scoring a Destination

The perfect score for a (prefix, destination) combo is… 0! A 0 score means: “no penalties, all stats are below all thresholds”. So in qrouting, a lower score is better. Here are 3 example destinations and scores for prefix +40:

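| Destination | ASR | CCR | PDD | AST | ACD | Score |
| --- | --- | --- | --- | --- | --- | --- |
| GW1 | crit (+10) | ok | ok | ok | ok | 10 |
| GW2 | ok | ok | warn (+1) | ok | warn (+1) | 2 |
| GW3 | warn (+1) | warn (+1) | warn (+1) | warn (+1) | warn (+1) | 5 |

Example qrouting assessment for routing to prefix +40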

At the given point in time, qrouting would sort the gateways as: GW2, GW3, GW1, so GW2 would be the first attempted gateway when routing to Romania, since it would have the lowest total penalty. I could have complicated this table and calculation by adding weights for each statistic, but I will leave that as an exercise to the reader! 🙂
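The ordering for the +40 example can be reproduced with a few lines of Python. The dictionary layout is made up for the illustration, and all weights are assumed to be the default of 1, so each score is simply the sum of the penalties.

```python
# Penalties per gateway for prefix +40 ("ok" entries are omitted).
penalties = {
    "GW1": {"asr": 10},
    "GW2": {"pdd": 1, "acd": 1},
    "GW3": {"asr": 1, "ccr": 1, "pdd": 1, "ast": 1, "acd": 1},
}

# With every weight at 1, the score is the plain sum of penalties.
scores = {gw: sum(p.values()) for gw, p in penalties.items()}

# Lower is better, so sort ascending by score.
order = sorted(scores, key=scores.get)
print(order)  # ['GW2', 'GW3', 'GW1']
```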

Managing Destinations

The qrouting module is mostly transparent at opensips.cfg level, as both the statistic collection and gateway re-ordering are performed behind the curtains. However, we did provide an event interface hook, giving you the option to take action whenever a destination is misbehaving:

event_route [E_QROUTING_BAD_DST]
{
    qr_disable_dst($param(rule_id), $param(dst_name));
}

The above is an example of handling the E_QROUTING_BAD_DST event, which is triggered when a destination’s score crosses event_bad_dst_threshold. From here, OpenSIPS script developers may invoke qr_disable_dst() in order to completely remove that destination from routing until qr_enable_dst() is called (if ever).

The above commands for managing destinations are also available via the MI interface, among other useful ones!

Credits

We would like to give a big shout-out to Mihai Țigănuș, the original author of the module, who wrote it as his summer internship project at OpenSIPS Solutions a few years ago. Starting from there, we brushed it up a bit, wrote some documentation and promoted it to alpha version, to be available in the upcoming OpenSIPS 3.1 LTS!

If you’d like to find out more about qrouting and other new and cool features of OpenSIPS 3.1 in a relaxed and information-rich experience among the best professionals in the VoIP and RTC industry, we would like to warmly invite you to book your seat at the Amsterdam OpenSIPS Summit 2020!
