In order to provide secure SIP communication over TLS connections, OpenSIPS uses the OpenSSL library, probably the most widely used open-source TLS & SSL library on the Internet. Its popularity and wide adoption make it more robust, and therefore a great choice for enforcing security in a system! That was the reason it was chosen for OpenSIPS in the first place. However, since OpenSSL is designed as a multi-threaded library while OpenSIPS is a multi-process application, integrating the two was not an easy task. Maintaining the integration was not trivial either, as the major changes in the OpenSSL library over the last couple of years have proven. Once the library maintainers decided to adopt a more robust thread-safe approach, things started to break in OpenSIPS, hence the numerous SSL-related bugs and crashes reported within the last couple of years. The purpose of this post is to present the challenges we faced and how we dealt with them.
This article describes how OpenSSL, a library designed for multi-threaded use, was made to work in OpenSIPS, a multi-process application, and how we maintained that code as the OpenSSL library changed over the years.
The initial design and implementation of TLS support in OpenSIPS was done in 2003. Back then OpenSSL was at revision 0.9.6, which is the version we used for the original design and implementation.
OpenSIPS is a multi-process server that handles SIP requests and replies in multiple processes, in parallel. When a message is received, it is “assigned” to one of the free processes, which is responsible for the entire processing of that message. Based on the routing logic, the processing of any of these messages may decide that the request has to be forwarded to the next hop over TLS. This means that every OpenSIPS worker process needs to be able to forward a message over SSL/TLS connections. And naturally, since all these processes run simultaneously, multiple processes can decide to forward messages to the same TLS destination at the same time, raising various consistency concerns.
In terms of design, there were three possible ways of ensuring consistency in this multi-process environment:
- Each process has its own SSL/TLS connection towards each destination. This means that with N workers and M destinations, your OpenSIPS server has to maintain N×M connections, which is something we wanted to avoid.
- Map each SSL/TLS connection to a worker, and only allow that worker to communicate with that endpoint. When a different process has to forward a message to that endpoint, it first sends the message/job to the designated worker, which forwards it to the next hop. Although this looks OK, it involves an extra layer of inter-process communication for dispatching the jobs, and it is also prone to scalability issues (for example, when the destination is a busy TLS trunk, all of its traffic is funneled through a single worker).
- Keep a single SSL/TLS connection to each destination, shared by all the processes, and ensure mutually exclusive access to it. This seems to be the most elegant solution, as your SIP interconnections will always see a single TLS connection towards your server. However, ensuring mutually exclusive access to the connection is not trivial, as you will see throughout this article.
Since OpenSIPS needs to address both scalability and easy interconnection with other SIP endpoints, we decided to implement the third solution.
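To make the third option more concrete, here is a minimal sketch of a single connection object that lives in shared memory and is guarded by a process-shared lock. Any forked worker can write to it, but only one at a time. It does not use OpenSSL. The names `struct shared_conn`, `shared_conn_send()` and `run_workers()` are hypothetical, the message counter stands in for real `SSL_write()` traffic, and a POSIX semaphore stands in for OpenSIPS's own shared-memory locks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for one TLS connection shared by every worker.
 * A real server would keep the socket and the SSL object here. */
struct shared_conn {
    sem_t lock;      /* process-shared binary semaphore guarding the connection */
    long messages;   /* messages "written" on the connection */
};

/* Allocate the connection in anonymous shared memory, so that workers
 * forked afterwards all see the same object. */
static struct shared_conn *shared_conn_new(void)
{
    struct shared_conn *c = mmap(NULL, sizeof(*c), PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED)
        return NULL;
    if (sem_init(&c->lock, 1 /* pshared */, 1 /* initially free */) != 0) {
        munmap(c, sizeof(*c));
        return NULL;
    }
    c->messages = 0;
    return c;
}

/* Any worker may forward a message, but only one process writes at a time. */
static void shared_conn_send(struct shared_conn *c)
{
    assert(c != NULL);
    for (;;) {
        if (sem_wait(&c->lock) == 0)
            break;
        if (errno != EINTR)
            return;                /* lock unusable: drop the message */
    }
    c->messages++;                 /* stands in for SSL_write() */
    sem_post(&c->lock);
}

/* Fork `workers` processes that each forward `per_worker` messages over the
 * same connection; returns the total the parent sees afterwards, or -1. */
long run_workers(int workers, int per_worker)
{
    struct shared_conn *c = shared_conn_new();
    long total;

    if (c == NULL)
        return -1;
    for (int w = 0; w < workers; w++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                 /* fewer workers; the total will show it */
        if (pid == 0) {
            for (int i = 0; i < per_worker; i++)
                shared_conn_send(c);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;                          /* reap every worker */
    total = c->messages;
    sem_destroy(&c->lock);
    munmap(c, sizeof(*c));
    return total;
}
```

For example, `run_workers(4, 1000)` should report 4000 messages, because no increment is lost while the four processes contend for the same connection.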
Although even back then it was advertised as a multi-threaded library, OpenSSL exposed hooks that made it usable in a multi-process environment:
- The CRYPTO_set_mem_functions() hook could be used to make the library use a custom memory allocator. We set it to make sure OpenSSL allocates the SSL context in shared memory, so that it can be accessed by any process.
- CRYPTO_set_id_callback() was used to determine the thread in which OpenSSL was running. We used this callback to indicate that the “thread” was actually a process, each with its own id, namely the Process ID (PID).
- CRYPTO_set_locking_callback() exposed hooks to create, lock, unlock and delete locks using “user”-specified locking mechanisms. Using this function we were able to “guard” the shared SSL context (allocated in our shared memory) with OpenSIPS-specific multi-process shared locks.
That being said, we had all the ingredients to implement our chosen solution using OpenSSL; all we had to do was glue them together. This is how the first implementation of SSL/TLS communication appeared in OpenSIPS, and it worked great throughout the years, up to and including OpenSSL 1.0.2.
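A rough sketch of that glue for OpenSSL 1.0.2 and older is shown here. It compiles without OpenSSL. The two functions only mirror the shapes of the old id and locking callbacks, `SKETCH_CRYPTO_LOCK`/`SKETCH_CRYPTO_UNLOCK` mirror the library's `CRYPTO_LOCK`/`CRYPTO_UNLOCK` flags, and the lock table is an array of process-shared POSIX semaphores rather than OpenSIPS's actual lock implementation. The memory hook is left out, and the registration calls appear only as a comment.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Mirror the values of CRYPTO_LOCK and CRYPTO_UNLOCK in OpenSSL <= 1.0.2. */
#define SKETCH_CRYPTO_LOCK   1
#define SKETCH_CRYPTO_UNLOCK 2

sem_t *shared_locks;        /* one process-shared lock per OpenSSL lock index */
static int shared_locks_n;

/* Run once in the main process, before the workers are forked. */
int shared_locks_init(int n)
{
    void *mem = mmap(NULL, (size_t)n * sizeof(sem_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    shared_locks = mem;
    for (int i = 0; i < n; i++)
        if (sem_init(&shared_locks[i], 1 /* pshared */, 1) != 0)
            return -1;
    shared_locks_n = n;
    return 0;
}

/* Same shape as the old id callback: each "thread" is really a process. */
unsigned long sketch_id_cb(void)
{
    return (unsigned long)getpid();
}

/* Same shape as the old locking callback: lock or unlock lock number n. */
void sketch_locking_cb(int mode, int n, const char *file, int line)
{
    (void)file;
    (void)line;
    assert(n >= 0 && n < shared_locks_n);
    if (mode & SKETCH_CRYPTO_LOCK) {
        while (sem_wait(&shared_locks[n]) != 0 && errno == EINTR)
            ;                       /* retry if interrupted by a signal */
    } else {
        sem_post(&shared_locks[n]);
    }
}

/* Registration with OpenSSL <= 1.0.2 would look roughly like:
 *   shared_locks_init(CRYPTO_num_locks());
 *   CRYPTO_set_id_callback(sketch_id_cb);
 *   CRYPTO_set_locking_callback(sketch_locking_cb);
 */
```

The locks must be created before forking. Otherwise each worker would end up with a private copy instead of the shared table.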
OpenSSL 1.1.0 new threading API
The turning point
When OpenSSL 1.1.0 was released on 25th of August 2016, the OpenSSL team introduced a new threading API. In order to provide a nicer experience for multi-threaded applications using the OpenSSL libraries, they dropped the previous threading mechanism and replaced it with their own (hardcoded) implementation, based on pthreads on Linux. This meant that we could no longer use the CRYPTO_set_locking_callback() hooks, as they became obsolete.
Since we were still allocating SSL contexts in shared memory, the locks OpenSSL created for them (i.e. pthread mutex structures) were also allocated in shared memory. So when OpenSSL used them to guard the shared context, the locks were still visible to every process, and the other processes could see that a lock/pthread mutex was acquired. In theory, this resulted in successful mutual exclusion on the shared context.
In practice, however, this resulted in deadlocks (see tickets #1590, #1755, #1767). Although it generally worked fine, the problem appeared when two different processes tried to acquire the same pthread mutex at the same time. Imagine processes P1 and P2 trying to acquire mutex M in parallel: P1 gets there first and acquires M. P2 then tries to acquire it and, because M is in shared memory, detects that M is already held (by P1), so it blocks waiting for it to be released. When P1 finishes its processing, it releases M. However, since pthread mutexes are by default not meant to be shared between processes, P2 is never notified that M was released, and remains stuck.
This problem was very hard to debug. When a process gets stuck, the first thing to do is to run a trap (opensipsctl trap) and check which process is blocked. However, running trap executes gdb on each OpenSIPS process, so each process is “interrupted” to produce a GDB dump. As a result, our trap command would actually wake P2 up and make it re-evaluate the status of M, unblocking the process and “fixing” the deadlock.
Luckily, after a lot of tests and brainstorming, we managed to pinpoint the issue. The fix was quite simple: all we had to do was set the PTHREAD_PROCESS_SHARED attribute on the pthread mutexes. However, these mutexes are encapsulated in the OpenSSL library, and there are no hooks to tune them. After reaching out to the OpenSSL team, we realized that they were not interested in supporting this use case, so we had to take the issue into our own hands. That’s when we used a trick to override pthread_mutex_init() and pthread_rwlock_init() with our own implementations, which also set the shared attribute. And our SSL/TLS implementation started to work again.
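The core of such an override can be sketched as follows. It builds a fresh attribute object, keeps the mutex type the caller asked for, forces `PTHREAD_PROCESS_SHARED`, and only then initializes the lock. The sketch does not show how the override was placed in front of the C library's `pthread_mutex_init()` and `pthread_rwlock_init()`. Symbol interposition with `dlsym(RTLD_NEXT, ...)` is one common way to reach the real function, but that is an assumption here, not something taken from the OpenSIPS sources. `shared_mutex_init()`, `shared_rwlock_init()` and the `contend()` demo are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Body an override of pthread_mutex_init() could forward to: keep the
 * caller's mutex type, but always make the mutex process-shared. */
int shared_mutex_init(pthread_mutex_t *m, const pthread_mutexattr_t *orig)
{
    pthread_mutexattr_t attr;
    int type = PTHREAD_MUTEX_DEFAULT;
    int rc;

    if (orig != NULL && pthread_mutexattr_gettype(orig, &type) != 0)
        type = PTHREAD_MUTEX_DEFAULT;
    if ((rc = pthread_mutexattr_init(&attr)) != 0)
        return rc;
    if ((rc = pthread_mutexattr_settype(&attr, type)) == 0 &&
        (rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED)) == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Same idea for pthread_rwlock_init(); this sketch ignores the caller's
 * attributes and only sets the pshared flag. */
int shared_rwlock_init(pthread_rwlock_t *l, const pthread_rwlockattr_t *orig)
{
    pthread_rwlockattr_t attr;
    int rc;

    (void)orig;
    if ((rc = pthread_rwlockattr_init(&attr)) != 0)
        return rc;
    if ((rc = pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED)) == 0)
        rc = pthread_rwlock_init(l, &attr);
    pthread_rwlockattr_destroy(&attr);
    return rc;
}

/* Demo: forked workers contend on locks that live in shared memory. */
struct counters {
    pthread_mutex_t mutex;
    pthread_rwlock_t rwlock;
    long by_mutex;
    long by_rwlock;
};

/* Returns 0 and fills in both totals, or -1 if setup fails. */
int contend(int workers, int iters, long *by_mutex, long *by_rwlock)
{
    assert(workers >= 0 && iters >= 0);
    struct counters *c = mmap(NULL, sizeof(*c), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED)
        return -1;
    c->by_mutex = c->by_rwlock = 0;
    if (shared_mutex_init(&c->mutex, NULL) != 0 ||
        shared_rwlock_init(&c->rwlock, NULL) != 0) {
        munmap(c, sizeof(*c));
        return -1;
    }
    for (int w = 0; w < workers; w++) {
        if (fork() == 0) {          /* a failed fork() just means fewer workers */
            for (int i = 0; i < iters; i++) {
                pthread_mutex_lock(&c->mutex);
                c->by_mutex++;
                pthread_mutex_unlock(&c->mutex);
                pthread_rwlock_wrlock(&c->rwlock);
                c->by_rwlock++;
                pthread_rwlock_unlock(&c->rwlock);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;                           /* reap every worker */
    *by_mutex = c->by_mutex;
    *by_rwlock = c->by_rwlock;
    pthread_mutex_destroy(&c->mutex);
    pthread_rwlock_destroy(&c->rwlock);
    munmap(c, sizeof(*c));
    return 0;
}
```

The demo only shows the locks being used correctly across processes. A passing run does not by itself prove the lost-wakeup scenario described above, which depends on timing.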
OpenSSL 1.1.1 new challenges
With the OpenSSL 1.1.1 release on 11th of September 2018, new issues started to appear. As the OpenSSL team tried to make their code base even more thread-friendly (without considering the effects on multi-process applications), they started to move most of their internal objects into TLS (thread-local storage) memory zones. Although OpenSIPS was still allocating OpenSSL contexts in shared memory, many of the library's internal objects now lived in locations that only a single thread can access. Mixing the two memory management mechanisms resulted in several unexpected crashes in the SSL library (see ticket #1799).
After reading the OpenSSL library code and understanding the problem, our first idea was to implement a thread-local storage that was compatible with multiple processes. This was our first attempt to fix the issue: override the pthread_key_create(), pthread_getspecific() and pthread_setspecific() functions to make them multi-process aware, similarly to our solution for the OpenSSL 1.1.0 issues. Unfortunately, this solution failed for two reasons. First, although the library was no longer crashing (so the memory operations were now valid), most of the concurrent connections were rejected (only 2 out of 10 SSL accepts were passing through). This indicated that there were still issues with the internal data: although it was now accessible, concurrent access to it was most likely not properly synchronized, resulting in unexpected behavior. Second, overriding the thread-local storage implementation affected not only the OpenSSL library, but also all the other libraries used by OpenSIPS. Since those libraries most likely do not use OpenSIPS-managed memory, this might introduce bugs in them, so we had to drop this solution.
The second attempt to fix this issue came from inspecting the stack traces of the crashes, combined with vitalikvoip’s suggestion. Both indicated that the problem was within the pseudo-random number generator (RAND_DRBG_bytes()). Therefore, we used the RAND_set_rand_method() hook to guard random number generation with a lock. Although this stopped the crashes, connections were still not properly accepted (again, 8 out of 10 were rejected), so we were back to square one.
Since the problem was not sorted out, we started to dig deeper into OpenSSL thread-safety considerations and discussions (see OpenSSL ticket #2165), trying to understand how these translate to process safety. This made us wonder whether it is OK to have an SSL_CTX (the context that manages the certificates, ciphers and other settings to be used for new connections) shared among all processes. Therefore, our next attempt was to duplicate the context (not the per-connection context, but the global SSL context) in each process, and to use each process’ own context to create new connections. And voilà, OpenSIPS started to accept all the connections, without any issues!
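One way to picture the per-process contexts is as an array holding one context per worker. Each context is built from the same configuration and indexed by the worker's process number, so no context is ever touched by two processes. This sketch uses a hypothetical `struct tls_ctx` as a stand-in for `SSL_CTX`, so it runs without OpenSSL. With the real library, each copy would come from its own `SSL_CTX_new()` call plus certificate loading, and new connections would come from `SSL_new()` on the worker's own copy. The names `tls_domain`, `process_no` and `new_connection()` are illustrative, not the OpenSIPS API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for SSL_CTX: certificate, ciphers and other settings
 * used for new connections, plus state the library mutates as it works. */
struct tls_ctx {
    char cert_file[64];
    char ciphers[64];
    long conns_created;
};

/* One context copy per worker process, indexed by its process number. */
struct tls_domain {
    struct tls_ctx **ctx;
    int ctx_no;
};

void tls_domain_free(struct tls_domain *d)
{
    if (d->ctx != NULL)
        for (int i = 0; i < d->ctx_no; i++)
            free(d->ctx[i]);        /* free(NULL) is a no-op */
    free(d->ctx);
    d->ctx = NULL;
    d->ctx_no = 0;
}

/* Build ctx_no independent copies from the same configuration.
 * With OpenSSL, each copy would be its own SSL_CTX_new() + cert loading. */
int tls_domain_init(struct tls_domain *d, int ctx_no,
                    const char *cert_file, const char *ciphers)
{
    d->ctx_no = 0;
    d->ctx = calloc((size_t)ctx_no, sizeof(*d->ctx));
    if (d->ctx == NULL)
        return -1;
    d->ctx_no = ctx_no;
    for (int i = 0; i < ctx_no; i++) {
        struct tls_ctx *c = calloc(1, sizeof(*c));
        if (c == NULL) {
            tls_domain_free(d);
            return -1;
        }
        snprintf(c->cert_file, sizeof(c->cert_file), "%s", cert_file);
        snprintf(c->ciphers, sizeof(c->ciphers), "%s", ciphers);
        d->ctx[i] = c;
    }
    return 0;
}

/* A worker only ever uses its own copy, so no context is shared. */
struct tls_ctx *tls_domain_ctx(struct tls_domain *d, int process_no)
{
    assert(d != NULL);
    if (process_no < 0 || process_no >= d->ctx_no)
        return NULL;
    return d->ctx[process_no];
}

/* Stand-in for SSL_new() on the worker's own context. */
int new_connection(struct tls_domain *d, int process_no)
{
    struct tls_ctx *c = tls_domain_ctx(d, process_no);
    if (c == NULL)
        return -1;
    c->conns_created++;
    return 0;
}
```

The trade-off is memory: every worker now holds its own copy of the certificates and settings. In exchange, the library never sees one global context being used from several processes.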
After running a set of tests, both by us and by our community, we concluded that the issue was caused by sharing the global SSL context among OpenSIPS processes. Unfortunately, this was not a diagnosis we could have reached easily, since this setup had worked just fine in the versions before 1.1.1, and the OpenSSL documentation gave no indication that this behavior had changed. Hence the lengthy process of solving this issue.
As described throughout the article, running OpenSSL in a multi-process environment, with a context shared among multiple processes, is definitely doable. However, without support from the library itself (such as locking and memory allocation hooks, and exhaustive documentation), the current implementation becomes more and more complicated to maintain. That’s why in the future we are planning to look into alternative TLS implementations (e.g. libraries that are more multi-process friendly).
But until then, you can use OpenSIPS with the latest OpenSSL TLS implementation without any issues!
Many thanks to vitalikvoip and danpascu for their valuable input on the latest matters, as well as to the whole OpenSIPS core team for all the brainstorming sessions for these issues (and not only :)). Although they were not easy to solve, it was definitely a lot of fun dealing with them.
If you want to find out more information regarding this topic (and not only), make sure you do not miss this year’s OpenSIPS Summit on 5th-8th May 2020, in Amsterdam, Netherlands.