Audio Recording and Speech Detection Experiments with OpenSIPS


What are our users really talking about all the time? Let’s find out!

RTPEngine is a proxy for RTP traffic and other UDP-based media for VoIP and WebRTC, meant to be used with OpenSIPS and other proxies as a drop-in replacement for rtpproxy, with many advanced features, including:

  • WebRTC support, such as ICE and SRTP
  • Bridging between IPv4 and IPv6 user agents
  • RTP/RTCP statistics reporting
  • In-kernel packet forwarding for low-latency and low-CPU performance

When used in combination with the OpenSIPS rtpengine module, several additional features are provided, such as SDP parsing and rewriting, ICE support, SRTP support, HEP support for reporting RTP/RTCP statistics, and RTP call recording.

Today’s dish is about recording RTP media streams using OpenSIPS 2.x and extracting the call media for post-processing, such as Speech-to-Text conversion.



WARNING: This demo is NOT suitable for mass production as-is, and only demonstrates a feature!

Demo Requirements:

  • OpenSIPS 2.x + rtpengine module
  • RTPEngine 6.x + recording daemon
  • RTPEngine-Speech2Text nodejs demo w/ Bing Speech API

OpenSIPS Setup

OpenSIPS 2.2-2.3 should be installed according to the official documentation before attempting to run this demo configuration.

A full-blown, working OpenSIPS + RTPEngine configuration is available as a reference.

OpenSIPS Modules

The rtpengine module should be loaded and configured in our configuration:

#### RTPengine protocol
loadmodule ""
modparam("rtpengine", "rtpengine_sock", "udp:")

OpenSIPS Route

Call recording for specific sessions can be selectively triggered by using the dedicated rtpengine_start_recording() function within the selected route:

route[record] {
   if (!is_method("INVITE") || !has_body("application/sdp")) return;
   rtpengine_start_recording();   # the function named above; starts recording this session
}


RTPEngine Setup

RTPEngine and its kernel modules should be installed by following the official instructions for the target system before attempting to run this demo configuration.

First and foremost, do make sure the xt_RTPENGINE kernel module is installed and loaded – without this module, the recording features will fail to initialize:

depmod -a
modprobe xt_RTPENGINE
lsmod | grep RTPENGINE
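
The same check can be scripted, for example as a preflight step in a Node.js tool. The following is only a minimal sketch: it assumes the usual Linux /proc/modules layout (module name as the first whitespace-separated field on each line), and the helper name isModuleLoaded is hypothetical, not part of RTPEngine or the demo application.

```javascript
// Hypothetical helper: returns true when `name` is the first
// whitespace-separated field of any line in /proc/modules-style text.
function isModuleLoaded(procModulesText, name) {
  return procModulesText
    .split('\n')
    .some((line) => line.split(/\s+/)[0] === name);
}

// Usage sketch on a Linux host:
//   const fs = require('fs');
//   const text = fs.readFileSync('/proc/modules', 'utf8');
//   if (!isModuleLoaded(text, 'xt_RTPENGINE')) {
//     console.error('xt_RTPENGINE is not loaded, try: modprobe xt_RTPENGINE');
//   }
```

Matching on the first field only avoids false positives from lines where xt_RTPENGINE merely shows up in another module's dependency list.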

Next, create a dedicated writable folder required to store the recorded files:

mkdir /recording

In order for the recording component to engage with the RTPEngine kernel table, the following configuration should be placed in /etc/rtpengine/rtpengine-recording.conf:

table = 0
output-format = wav
resample-to = 16000
mp3-bitrate = 44100
output-mixed = 1
log-level = 1
spool-dir = /recording
output-dir = /recording
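
A post-processing script needs to know where the recorder writes its files, so it can be handy to read these settings back instead of hard-coding them. The sketch below is an assumption-laden illustration: parseKeyValueConfig is a hypothetical helper that only understands plain "key = value" lines (skipping blanks, comments and section headers), which is enough for the snippet above but is not a full parser for the recording daemon's config format.

```javascript
// Hypothetical helper: parse simple "key = value" text into an object.
// Blank lines, "#" / ";" comments and "[section]" headers are skipped.
function parseKeyValueConfig(text) {
  const config = {};
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#') || line.startsWith(';') || line.startsWith('[')) {
      continue;
    }
    const eq = line.indexOf('=');
    if (eq === -1) continue; // not a key/value line
    config[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return config;
}

// The settings from this article, inlined so the sketch is self-contained.
const sampleConfig = [
  'table = 0',
  'output-format = wav',
  'resample-to = 16000',
  'mp3-bitrate = 44100',
  'output-mixed = 1',
  'log-level = 1',
  'spool-dir = /recording',
  'output-dir = /recording',
].join('\n');

const recordingConfig = parseKeyValueConfig(sampleConfig);
console.log('recordings are written to', recordingConfig['output-dir']);
```

All values are kept as strings; convert them (for example with Number()) only where the script actually needs a number.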

We’re almost there – The next step is launching our daemons in tandem:

# Starting RTPEngine process w/ Kernel Table 0
rtpengine -p /var/run/ --interface=! -n -c -m 20000 -M 30000 -E -L 7 \
 --recording-method=proc \
 --recording-dir=/recording \

# Starting RTPEngine Recording process w/ Kernel Table 0
rtpengine-recording --config-file=/etc/rtpengine/rtpengine-recording.conf

If setup was successful, a new node will appear in /proc/rtpengine/0

Let’s Register a device and make a test call – “hello, hello, test one, two, three, goodbye!”

A decoded WAV recording for the completed call should be stored in /recording
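
Before handing a recording to any speech service, it can be worth confirming that the file really is a WAV at the expected sample rate (16000 Hz with the resample-to setting used here). The following is a minimal sketch, not part of the demo: readWavFormat is a hypothetical helper that walks the RIFF chunks of an in-memory buffer and reads the standard "fmt " chunk fields.

```javascript
// Hypothetical helper: read channel count, sample rate and bit depth
// from the "fmt " chunk of a RIFF/WAVE buffer.
function readWavFormat(buf) {
  if (buf.length < 12 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk starts right after the 12-byte RIFF header
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      if (offset + 8 + 16 > buf.length) break; // truncated fmt chunk
      return {
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even sizes
  }
  throw new Error('fmt chunk not found');
}

// Usage sketch (the file name is a placeholder):
//   const fs = require('fs');
//   const info = readWavFormat(fs.readFileSync('/recording/<id>-mix.wav'));
//   if (info.sampleRate !== 16000) console.warn('unexpected rate', info);
```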




This last bonus section leverages the RTPEngine recording functionality to attempt Speech-to-Text transcription of call audio, shipping any results as HEP logs to a collector for later retrieval and correlation, for fun and profit.


Requirements:

  • A working HOMER or HEPIC setup.
  • A free API key for the Bing Speech API will be required before proceeding!

In order to complete our demo, we’ll install our nodejs sample application using npm:

git clone
cd RTPEngine-Speech2Text
npm install

Then fill in the Bing API key and HEP server details in config.js.

Time to start our demo application:

nodejs speech2hep.js

Next, perform a short call – if possible, with two different speakers – and hang up.

As soon as RTPEngine detects a call termination, the .meta file will be removed and the Speech-to-Text conversion spooling process will be triggered automatically:

File /recording/6f9db20deb1a9871-ce2fa1345463393b-mix.wav has been added
File /recording/6f9db20deb1a9871-ce2fa1345463393b.meta has been removed
Meta Hit! Seeking Audio at: /recording/6f9db20deb1a9871-ce2fa1345463393b-mix.wav
Bing service started
Response { RecognitionStatus: 'Success',
 DisplayText: 'Hello' }
Sending HEP...
Response { RecognitionStatus: 'Success',
 DisplayText: 'Hey how are you' }
Sending HEP...
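
The trigger visible in this log (a .meta file disappears, then the matching -mix.wav is picked up) can be sketched as below. This only illustrates the idea and is not the code of speech2hep.js: the function names, the 'add' / 'remove' event labels and the transcribe callback are all hypothetical, and the "<id>.meta" to "<id>-mix.wav" mapping is inferred from the log lines above.

```javascript
// Map "<dir>/<id>.meta" to "<dir>/<id>-mix.wav"; null for any other file.
function mixedWavForMeta(metaPath) {
  const suffix = '.meta';
  if (!metaPath.endsWith(suffix)) return null;
  return metaPath.slice(0, -suffix.length) + '-mix.wav';
}

// Hypothetical handler for directory watcher events. `event` is assumed to
// be 'add' or 'remove'; `transcribe` receives the path of the mixed audio.
// Returns true when a transcription was triggered.
function onSpoolEvent(event, filePath, transcribe) {
  if (event !== 'remove') return false; // only removals signal call end
  const wavPath = mixedWavForMeta(filePath);
  if (wavPath === null) return false;   // some other file was removed
  transcribe(wavPath);
  return true;
}

// Usage sketch:
//   onSpoolEvent('remove', '/recording/abc-123.meta', (wav) => {
//     console.log('Meta Hit! Seeking Audio at:', wav);
//   });
```

Waiting for the .meta removal rather than reacting to the WAV being added means the audio is only processed once the call has ended and the file is complete.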

That’s all – if you configured your HEP settings correctly, the same log should appear correlated with the originating SIP session the media was attached to. Just like magic!
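
For the correlation to work, each transcript has to carry an identifier that the collector can match against the SIP session. The sketch below shows one way this could be derived, under a loudly stated assumption: that recording file names have the form "<call-id>-<16 hex characters>" followed by "-mix.wav" or ".meta", as the sample log suggests. Both function names and the record shape are hypothetical, and the object built here is not the HEP wire format.

```javascript
// ASSUMPTION: names look like "<call-id>-<16 hex chars>" plus "-mix.wav"
// or ".meta". Returns the call-id part, or null when the name differs.
function callIdFromRecording(filePath) {
  const name = filePath.split('/').pop();
  const match = /^(.+)-[0-9a-f]{16}(?:-mix\.wav|\.meta)$/.exec(name);
  return match ? match[1] : null;
}

// Hypothetical record shape pairing a transcript with its correlation id.
// This is NOT the HEP wire format, only the data such a log would carry.
function buildTranscriptRecord(filePath, displayText) {
  return {
    correlationId: callIdFromRecording(filePath),
    text: displayText,
  };
}

// Usage sketch:
//   buildTranscriptRecord('/recording/<call-id>-<random>-mix.wav', 'Hello');
```

If the identifier does not match the Call-ID seen in the SIP capture, the log will still be stored but will not show up next to the session, so it is worth checking this mapping first when transcripts appear to be missing.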


Enjoyed this Post? Please consider joining us and many others at the OpenSIPS Summit in Amsterdam to discuss, invent and learn!


NOTE: OpenSIPS and QXIP/SIPCAPTURE are strong supporters of communication privacy and do not endorse any mass interception techniques using our software, or other tools. The techniques described in this article are generic and for educational purposes only.

