What are our users really talking about all the time? Let’s find out!
RTPEngine is a proxy for RTP traffic and other UDP-based media for VoIP and WebRTC, meant to be used with OpenSIPS and other proxies as a drop-in replacement for rtpproxy, with many advanced features, including:
- WebRTC support, including ICE and SRTP
- Bridging between IPv4 and IPv6 user agents
- RTP/RTCP statistics reporting
- In-kernel packet forwarding for low-latency and low-CPU performance
When used in combination with the OpenSIPS rtpengine module, several additional features are provided, such as SDP parsing and rewriting, ICE support, SRTP support, HEP reporting of RTP/RTCP statistics, and RTP call recording.
Today’s Dish is about Recording of RTP Media Streams using OpenSIPS 2.x and extracting the call media for post processing, such as Speech-to-Text conversion.
WARNING: This demo is NOT suitable for mass production as-is, and only demonstrates a feature!
Demo Requirements:
- OpenSIPS 2.x + rtpengine module
- RTPEngine 6.x + recording daemon
- RTPEngine-Speech2Text nodejs demo w/ Bing Speech API
OpenSIPS Setup
OpenSIPS 2.2-2.3 should be installed according to the official documentation before attempting to run this demo configuration.
A complete working OpenSIPS + RTPEngine configuration is available as a reference.
OpenSIPS Modules
The rtpengine module should be loaded and set up in our OpenSIPS configuration:
    #### RTPengine protocol
    loadmodule "rtpengine.so"
    modparam("rtpengine", "rtpengine_sock", "udp:127.0.0.1:60000")
OpenSIPS Route
Call Recording for specific sessions can be selectively triggered by using the dedicated rtpengine_start_recording() function within the selected route:
    route[record] {
        if (!is_method("INVITE") || !has_body("application/sdp"))
            return;
        rtpengine_offer("$var(rtpengine_flags)");
        rtpengine_start_recording();
    }
RTPEngine Setup
RTPEngine and its kernel modules should be installed by following the official instructions for the target system before attempting to run this demo configuration.
First and foremost, do make sure the xt_RTPENGINE kernel module is installed and loaded – without this module, the recording features will fail to initialize:
    depmod -a
    modprobe xt_RTPENGINE
    lsmod | grep RTPENGINE
Next, create a dedicated writable folder required to store the recorded files:
mkdir /recording
In order for the recording component to engage with the RTPEngine kernel table, the following configuration should be placed in /etc/rtpengine/rtpengine-recording.conf:
    [rtpengine-recording]
    table = 0
    output-format = wav
    resample-to = 16000
    mp3-bitrate = 44100
    output-mixed = 1
    log-level = 1
    spool-dir = /recording
    output-dir = /recording
We’re almost there – The next step is launching our daemons in tandem:
    # Starting RTPEngine process w/ Kernel Table 0
    rtpengine -p /var/run/rtpengine.pid --interface=10.0.0.10!192.168.0.10 -n 127.0.0.1:60000 -c 127.0.0.1:60001 -m 20000 -M 30000 -E -L 7 \
      --recording-method=proc \
      --recording-dir=/recording \
      --table=0

    # Starting RTPEngine Recording process w/ Kernel Table 0
    rtpengine-recording --config-file=/etc/rtpengine/rtpengine-recording.conf
If setup was successful, a new node will appear in /proc/rtpengine/0
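When the daemons are started from a script, this check can be automated before the recording process is launched. A minimal Node.js sketch, assuming only that the kernel table shows up as an entry under /proc/rtpengine as described above; the helper name kernelTableReady is hypothetical and not part of RTPEngine:

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns true when an entry for the given kernel
// table id exists under procRoot (default /proc/rtpengine).
function kernelTableReady(tableId, procRoot = '/proc/rtpengine') {
  return fs.existsSync(path.join(procRoot, String(tableId)));
}
```

With the setup from this post, kernelTableReady(0) would be expected to return true once rtpengine is running with --table=0; the second argument exists only so the helper can be pointed at another directory for testing.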
Let’s Register a device and make a test call – “hello, hello, test one, two, three, goodbye!”
A decoded WAV recording of the completed call should now be stored in /recording
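Since the recording daemon is configured with resample-to = 16000, a quick sanity check is to read the WAV header and confirm the sample rate. A minimal Node.js sketch, assuming a standard RIFF/WAVE layout; readWavFormat is a hypothetical helper, not part of RTPEngine or the demo application:

```javascript
'use strict';

// Parse the "fmt " chunk of a RIFF/WAVE buffer and return its basic fields.
function readWavFormat(buf) {
  if (buf.length < 12 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let offset = 12;
  // Each chunk is a 4-byte id, a 4-byte little-endian size, then the data.
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      if (offset + 24 > buf.length) throw new Error('truncated fmt chunk');
      return {
        audioFormat: buf.readUInt16LE(offset + 8),    // 1 = uncompressed PCM
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitsPerSample: buf.readUInt16LE(offset + 22),
      };
    }
    // Chunk data is padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }
  throw new Error('fmt chunk not found');
}

// Usage (the file name is a placeholder):
// const fs = require('fs');
// console.log(readWavFormat(fs.readFileSync('/recording/<call-id>-mix.wav')));
```

If the reported sampleRate differs from the configured value, the resample-to setting in rtpengine-recording.conf is the first thing to re-check.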
RTPEngine-Speech2Text
This last bonus section will do some magic and leverage the RTPEngine recording functionality to attempt Speech-to-Text transcription of call audio, shipping any result as HEP Logs to a collector for later retrieval and correlation for fun and profit.
Requirements:
- A working HOMER or HEPIC setup.
- A Free API key for the Bing Speech API will be required before proceeding!
In order to complete our demo, we’ll install our nodejs sample application using npm:
    git clone https://github.com/lmangani/RTPEngine-Speech2Text
    cd RTPEngine-Speech2Text
    npm install
- Fill in the Bing API key and HEP server details in config.js
Time to start our demo application:
nodejs speech2hep.js
Next, perform a short call (if possible, with two different speakers) and hang up.
As soon as RTPEngine detects a call termination, the .meta file will be removed and the Speech-to-Text conversion spooling process will be triggered automatically:
    File /recording/6f9db20deb1a9871-ce2fa1345463393b-mix.wav has been added
    File /recording/6f9db20deb1a9871-ce2fa1345463393b.meta has been removed
    Meta Hit! Seeking Audio at: /recording/6f9db20deb1a9871-ce2fa1345463393b-mix.wav
    Bing service started
    Response { RecognitionStatus: 'Success', DisplayText: 'Hello' }
    Sending HEP...
    Response { RecognitionStatus: 'Success', DisplayText: 'Hey how are you' }
    Sending HEP...
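The trigger visible in this log (a .meta file disappears, then the matching -mix.wav is picked up) can be sketched in a few lines of Node.js. This is an illustration only: the helper names mixedWavForMeta and watchSpool are hypothetical, the file naming is inferred from the log above, and the actual speech2hep.js may be implemented differently.

```javascript
'use strict';
const fs = require('fs');
const path = require('path').posix; // RTPEngine runs on Linux, so POSIX path rules apply

// Map a ".meta" spool file to the mixed WAV named after the same call,
// following the naming seen in the log. Returns null for any other file.
function mixedWavForMeta(metaPath) {
  if (path.extname(metaPath) !== '.meta') return null;
  const callId = path.basename(metaPath, '.meta');
  return path.join(path.dirname(metaPath), `${callId}-mix.wav`);
}

// Watch a spool directory and call onCallEnded(wavPath) when a .meta file
// has disappeared and the matching mixed WAV is present on disk.
function watchSpool(dir, onCallEnded) {
  return fs.watch(dir, (eventType, filename) => {
    if (!filename) return; // some platforms do not report the file name
    const metaPath = path.join(dir, filename.toString());
    const wavPath = mixedWavForMeta(metaPath);
    // fs.watch reports creation and removal alike as 'rename', so check the disk.
    if (eventType === 'rename' && wavPath &&
        !fs.existsSync(metaPath) && fs.existsSync(wavPath)) {
      onCallEnded(wavPath);
    }
  });
}

// Usage:
// watchSpool('/recording', (wav) => console.log('Ready for transcription:', wav));
```

Keeping the path mapping in its own pure function makes the naming assumption easy to adjust if a different output mode (for example, unmixed per-stream files) is used.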
That’s all – If you correctly configured your HEP Settings, the same log should appear in correlation with the originating SIP session the media was attached to. Just like Magic!
Enjoyed this Post? Please consider joining us and many others at the OpenSIPS Summit in Amsterdam to discuss, invent and learn!
NOTE: OpenSIPS and QXIP/SIPCAPTURE are strong supporters of communication privacy and do not endorse any mass interception techniques using our software, or other tools. The techniques described in this article are generic and for educational purposes only.