Monitoring OpenSIPS using Prometheus and Grafana

Monitoring real-time statistics is a great way to assess the performance of your services, as well as to detect, and possibly prevent, unfortunate events. Visualizing the monitored statistics in graphs or charts can also improve your DevOps team’s experience and reduce troubleshooting time when failures do occur.

In a previous blog post we showed you how to produce interesting real-time statistics for your services using the new time series statistics in OpenSIPS 3.2. In this article we show you how to use Prometheus to monitor these statistics and plot them in a Grafana dashboard that you can easily watch from your web browser.

Let us see how we can achieve this.

Prometheus

Prometheus is a monitoring and alerting tool that gathers statistics from different sources and can execute actions (alerts) when certain conditions are met. It uses a pull model: it periodically scrapes statistics from the monitored targets using HTTP requests. The gathered statistics are stored in a highly efficient, multi-dimensional time series database that can be queried using Prometheus’ own query language, PromQL.
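To make the pull model concrete: on every scrape, Prometheus issues an HTTP GET and receives a plain-text body in the Prometheus text exposition format, where `# TYPE` lines declare a metric's type and every other non-comment line is a sample. The sketch below parses a simplified subset of that format (no escaping or spaces inside label values, no optional timestamps). The metric names in the sample body are hypothetical and not necessarily what OpenSIPS emits.

```python
from typing import Dict, Tuple

def parse_exposition(text: str) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Parse a simplified subset of the Prometheus text exposition format.

    Returns (types, samples): types maps metric name to its declared type,
    samples maps the full series identifier (name plus any {labels}) to its value.
    Assumes label values contain no spaces or escapes and samples carry no timestamp.
    """
    types: Dict[str, str] = {}
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split()
            # "# TYPE <name> <type>" declares the metric type; HELP and other comments are skipped
            if len(parts) == 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue
        # The sample value is the last whitespace-separated field
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return types, samples

# Hypothetical scrape body; real OpenSIPS metric names may differ
body = """
# TYPE opensips_active_dialogs gauge
opensips_active_dialogs 12
# TYPE opensips_load gauge
opensips_load{group="load"} 3
"""
types, samples = parse_exposition(body)
print(types["opensips_active_dialogs"], samples['opensips_load{group="load"}'])  # gauge 3.0
```

In practice you would let Prometheus do this parsing; the point is only to show how little structure a scrape response carries: a type declaration and a name/value pair per series.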

OpenSIPS and Prometheus

Prior to OpenSIPS 3.2, there was no native way of integrating OpenSIPS statistics in Prometheus. Nevertheless, you could use an external daemon to facilitate this integration:

  • OpenSIPS Exporter is a stand-alone server that proxies/translates Prometheus pull queries into OpenSIPS’ get_statistics command over the MI interface, and annotates the statistics with labels and types. All you need to do is install the daemon and configure it to use one of OpenSIPS’ MI connectors.
  • Pushgateway is a daemon provided by the Prometheus creators that stores statistics for short-lived jobs, which may run for less than a single pull period. The idea is that, before exiting, these jobs push their statistics to the Pushgateway daemon, which stores them until the Prometheus scrape job comes to fetch them. One could abuse this mechanism (as we did during this year’s ClueCon TGI2021) to periodically push certain statistics to the gateway using the rest_client module (example).
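The core of what an exporter of this kind does can be sketched in two steps: build an MI `get_statistics` request, then turn the reply into exposition-format text. This is a minimal sketch, not OpenSIPS Exporter's actual code. It assumes the JSON-RPC MI transport with a `statistics` array parameter and a reply keyed by `group:name`; the `opensips_` prefix, the metric naming scheme and the default `gauge` type are hypothetical choices.

```python
import json
import re
from typing import Dict, Iterable, Optional

def build_mi_request(stats: Iterable[str], req_id: int = 1) -> str:
    """JSON-RPC 2.0 body for an MI get_statistics call.

    Assumes a "statistics" array parameter; entries may be single statistics
    or whole groups such as "dialog:".
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "get_statistics",
        "params": {"statistics": list(stats)},
        "id": req_id,
    })

def to_exposition(result: Dict[str, float],
                  types: Optional[Dict[str, str]] = None,
                  prefix: str = "opensips") -> str:
    """Translate a {"group:name": value} MI reply into Prometheus text format.

    Each statistic becomes <prefix>_<group>_<name>; characters that are not
    valid in metric names (e.g. '-') are replaced with '_'. Statistics missing
    from `types` are exported as gauges (a hypothetical default).
    """
    types = types or {}
    lines = []
    for key in sorted(result):
        parts = [prefix] + [p for p in key.split(":", 1) if p]
        metric = re.sub(r"[^a-zA-Z0-9_]", "_", "_".join(parts))
        lines.append(f"# TYPE {metric} {types.get(key, 'gauge')}")
        lines.append(f"{metric} {result[key]}")
    return "\n".join(lines) + "\n"

# Hypothetical MI reply fragment
reply = {"dialog:active_dialogs": 12, "core:rcv_requests": 40}
print(to_exposition(reply, types={"core:rcv_requests": "counter"}), end="")
```

Note that MI does not report whether a statistic is a counter or a gauge, which is why the type has to come from configuration in a translator like this.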

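For the Pushgateway approach, each push is a plain HTTP request to `/metrics/job/<job_name>` whose body is exposition-format text. With POST only the metrics with the pushed names are replaced within that job's group, while PUT replaces the whole group. The sketch below merely builds such a request in Python to show its shape. The gateway address (9091 is the Pushgateway's default port), job name and metric name are hypothetical, and in the setup described above the equivalent request would be issued from the OpenSIPS script through rest_client instead.

```python
import urllib.request
from typing import Dict
from urllib.parse import quote

def build_push(gateway: str, job: str, gauges: Dict[str, float]) -> urllib.request.Request:
    """Build (without sending) a Pushgateway POST for a flat dict of gauges.

    POST /metrics/job/<job> replaces only the pushed metric names within that
    job's group; PUT would replace the whole group.
    """
    url = f"{gateway.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    # Text exposition format: every line, including the last, ends with '\n'
    body = "".join(f"# TYPE {name} gauge\n{name} {value}\n" for name, value in gauges.items())
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )

req = build_push("http://localhost:9091", "opensips", {"opensips_active_dialogs": 12})
# urllib.request.urlopen(req, timeout=5)  # uncomment to push to a running gateway
```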
OpenSIPS native Prometheus connector

Starting with OpenSIPS 3.2, we have developed a native prometheus module that provides statistics to the Prometheus monitoring tool directly, over the HTTP pull mechanism Prometheus requires. The module can also filter which statistics are exposed, and annotate them with the group they belong to and their type (counter or gauge).

Putting it all together is trivial: install OpenSIPS and Prometheus, and configure them as follows:

  • OpenSIPS: load the prometheus module and its dependency (the httpd module), and set up the statistics you want to publish. Note that the statistics parameter accepts individual statistics as well as whole groups (names ending in a colon):
loadmodule "httpd.so"
loadmodule "prometheus.so"

modparam("prometheus", "statistics", "active_dialogs load: stats:")

The snippet above exposes to the Prometheus server the individual active_dialogs statistic (provided by the dialog module), all statistics in the load group (provided by the OpenSIPS core), and all statistics in the stats group (the custom real-time statistics built in the previous blog post).
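The way the statistics parameter selects what gets exposed can be modelled in a few lines: tokens ending in `:` name whole groups, any other token names an individual statistic. This is a minimal Python model of that selection logic, assuming the `group:name` naming used by the OpenSIPS MI; it is not the prometheus module's C implementation, and the sample statistics and values are made up.

```python
from typing import Dict

def select_stats(param: str, available: Dict[str, float]) -> Dict[str, float]:
    """Return the subset of `available` ({"group:name": value}) matched by `param`.

    Tokens ending in ':' select every statistic of that group; any other token
    selects a single statistic by name, regardless of its group.
    """
    tokens = param.split()
    groups = {t[:-1] for t in tokens if t.endswith(":")}
    names = {t for t in tokens if not t.endswith(":")}
    selected = {}
    for key, value in available.items():
        group, _, name = key.partition(":")
        if group in groups or name in names:
            selected[key] = value
    return selected

stats = {
    "dialog:active_dialogs": 12,
    "dialog:early_dialogs": 2,
    "load:load": 3,
    "load:load1m": 4,
    "stats:gw1_asr": 87,          # hypothetical custom statistic
    "tm:inuse_transactions": 5,
}
print(sorted(select_stats("active_dialogs load: stats:", stats)))
# ['dialog:active_dialogs', 'load:load', 'load:load1m', 'stats:gw1_asr']
```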

  • Prometheus: set up a new scrape job that queries OpenSIPS on the port the httpd module listens on (8888 by default):
scrape_configs:
  - job_name: opensips
    static_configs:
    - targets: ['localhost:8888'] 
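Before relying on the scrape job, it can help to fetch the endpoint once by hand and see exactly what Prometheus will receive. This sketch assumes Prometheus' default `metrics_path` of `/metrics` and that the prometheus module answers on that path on the httpd port; check the module documentation for the exact path in your version.

```python
import urllib.request

def metrics_url(target: str, metrics_path: str = "/metrics", scheme: str = "http") -> str:
    """Compose the URL Prometheus scrapes for a static target (default metrics_path is /metrics)."""
    return f"{scheme}://{target}{metrics_path}"

def scrape(target: str, timeout: float = 5.0) -> str:
    """Fetch the exposition text once, as a single Prometheus scrape would."""
    with urllib.request.urlopen(metrics_url(target), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Requires a running OpenSIPS with the httpd and prometheus modules loaded:
# print(scrape("localhost:8888"))
```

If this returns the statistics you configured, the scrape job above should pick them up unchanged.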

Feel free to set up your alerts as well, but we do not cover that aspect in this post.

That is it: OpenSIPS’ statistics are now periodically queried by Prometheus, and the data is stored in its time series database. Let us see how we can visualize it more easily.

Grafana

Grafana is a tool used to create monitoring dashboards that plot real-time statistics in charts and graphs. It can gather data from various sources, including Prometheus, which is the one we will be using.

Setting this up is quite simple: install Grafana and add the Prometheus server as a data source (here is an extensive tutorial on how to do that). Once this is up, create your dashboard with the desired statistics from Prometheus.
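Under the hood, a Grafana time series panel backed by Prometheus asks the Prometheus HTTP API for a range of samples, roughly via `/api/v1/query_range` with a PromQL expression, a start, an end and a step. The sketch below builds such a URL and parses the JSON "matrix" reply, which is handy for checking data outside Grafana. The server address (9090 is Prometheus' default port) and the metric name are hypothetical.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode

def range_query_url(server: str, promql: str, start: float, end: float, step: str) -> str:
    """URL for the Prometheus /api/v1/query_range endpoint."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{server.rstrip('/')}/api/v1/query_range?{params}"

def parse_matrix(payload: str) -> Dict[str, List[Tuple[float, float]]]:
    """Turn a query_range JSON reply into {series_labels: [(timestamp, value), ...]}."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError(doc.get("error", "query failed"))
    series = {}
    for item in doc["data"]["result"]:
        # The label set identifies the series; Prometheus returns values as strings
        key = json.dumps(item["metric"], sort_keys=True)
        series[key] = [(float(ts), float(v)) for ts, v in item["values"]]
    return series

url = range_query_url("http://localhost:9090", "opensips_active_dialogs", 0, 60, "15s")
# With a running server: urllib.request.urlopen(url).read() yields the payload for parse_matrix
```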

Here you can find an example of a dashboard that plots per-gateway time series statistics (PDD, ASR, AST, CCR, etc.) gathered by OpenSIPS using a script similar to the one built in the previous post. Below you can find a snapshot of that dashboard, where we artificially vary different gateway parameters and watch how PDD, CCR, ASR and the number of calls fluctuate over time.
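If you export raw cumulative counters rather than OpenSIPS’ own windowed time series statistics, a ratio such as ASR (Answer Seizure Ratio: answered calls divided by call attempts) has to be computed over a time window so that the graph reflects recent gateway behavior. A minimal sketch with hypothetical per-gateway counters follows; in PromQL the same idea is `increase(answered[5m]) / increase(attempts[5m])` (metric names again hypothetical), where `increase()` also extrapolates to the window edges, which this sketch does not. CCR and AST depend on how the script from the previous post defines them, so they are not modelled here.

```python
from typing import Optional, Tuple

Sample = Tuple[float, float]  # (timestamp, cumulative counter value)

def counter_increase(earlier: Sample, later: Sample) -> float:
    """Increase of a monotonic counter between two samples.

    A drop in value means the counter was reset (e.g. OpenSIPS restarted);
    in that case the later value is taken as the increase since the reset.
    """
    delta = later[1] - earlier[1]
    return later[1] if delta < 0 else delta

def asr_percent(attempts: Tuple[Sample, Sample],
                answered: Tuple[Sample, Sample]) -> Optional[float]:
    """ASR over the window: answered calls / call attempts * 100, or None without attempts."""
    tried = counter_increase(*attempts)
    if tried == 0:
        return None
    return 100.0 * counter_increase(*answered) / tried

# Hypothetical gateway counters sampled 5 minutes apart
print(asr_percent(attempts=((0, 1000), (300, 1200)), answered=((0, 800), (300, 950))))  # 75.0
```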

Example

A few examples of what can be done with this setup were demonstrated during the OpenSIPS Workshop at the ClueCon TGI 2021 event.

You can find a complete demo of how to plot these statistics, and how they change over time, in our ClueCon TGI 2021 presentation:

Feedback on gateway behavior changes

Later in the presentation you can see how the reported statistics can be fed back into the routing algorithm, so that you obtain routing based on quality (using the Quality Based Routing module in OpenSIPS):

Feedback on Quality Based Routing
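The feedback loop itself is easy to picture: a quality statistic measured per gateway changes the order in which gateways are tried. The toy sketch below only illustrates that idea. It is not the Quality Based Routing module's actual algorithm or threshold semantics (see the module documentation for those), and the gateway names and threshold are hypothetical.

```python
from typing import Dict, List

def order_gateways(preferred: List[str], asr: Dict[str, float], min_asr: float) -> List[str]:
    """Keep the configured order, but move gateways whose ASR is below min_asr to the end.

    Gateways with no ASR sample yet are treated as healthy. sorted() is stable,
    so the original preference order is preserved within each bucket.
    """
    return sorted(preferred, key=lambda gw: asr.get(gw, 100.0) < min_asr)

print(order_gateways(["gw1", "gw2", "gw3"], {"gw1": 35.0, "gw2": 80.0}, min_asr=50.0))
# ['gw2', 'gw3', 'gw1']
```

Demoting rather than removing a degraded gateway keeps it available as a last resort, which is usually preferable to failing the call outright.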

Happy hacking!
