Alberto Gonzalez Trastoy was among the speakers at Agora’s Real-Time Engagement 2020 Conference. His presentation covered what makes building a live video application more complicated than a regular web app. Isn’t WebRTC supposed to handle everything for you? Alberto describes some of the unexpected nuances and challenges a web developer may encounter when building real-time engagement and communications applications, including networking, interoperability, scalability, and security. He also discusses other complexities in building WebRTC applications and offers tools and alternatives to solve them.
3. Open source project
Real-Time Communications framework
Secure
Updated frequently
Used in many major platforms and applications
Available on all modern browsers and native clients
WebRTC Basics
4. “This app doesn’t work on iPhone”
“Hi, hello! Can you hear me? I can hear you, but you can’t hear me”
“Yes I can see you, but video looks blurry”
“My microphone is not working, wait a second, I will restart my computer”
“I can’t connect, I think it is because I have very slow internet”
But mistakes can cause user responses like…
9. Some open source alternatives:
Some commercial alternatives:
Scalability
10. Networking issues: Restrictive networks
Proxy and firewall rules
- Proxy authentication from clients required?
- Proxy blocking access to IP addresses
- Firewall rules
- NAT
Solution: review your checklist, monitor, and handle NAT traversal
11. Networking issues: Congested networks
Network congestion
- Too many hosts in a local network
- Low bandwidth
- Interference from outside sources or faulty cabling
Solution: WebRTC has error-resilience mechanisms, but there is a limit. Beyond that, optimize, monitor, and keep track of the logs.
12. WebRTC Video Bandwidth Requirements

Video Room Type | Minimal available bandwidth required (at the client side)
8-participant video room with lo-res video (240x180) + HD audio | ~2 Mbps
8-participant video room with SD video (640x480) + HD audio | ~8 Mbps
8-participant video room with HD video (1280x720) + HD audio | ~22 Mbps
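The figures in the table above can be roughly reproduced with a small calculation. This is a sketch under the talk's own assumptions (SFU topology, 24 fps, per-stream bitrate estimates matching the table); the bitrate values are approximations, not codec-exact numbers.

```javascript
// Rough per-stream video bitrate estimates at 24 fps, in Mbps.
// These values are assumptions chosen to match the table above.
const PER_STREAM_MBPS = { "240x180": 0.25, "640x480": 1.0, "1280x720": 2.75 };

// With an SFU, each client sends 1 uplink stream and receives
// (participants - 1) downlink streams: participants streams in total.
function requiredClientBandwidthMbps(participants, resolution) {
  const perStream = PER_STREAM_MBPS[resolution];
  if (perStream === undefined) throw new Error("unknown resolution: " + resolution);
  const uplink = perStream;                        // the one stream this client sends
  const downlink = (participants - 1) * perStream; // streams received from the others
  return uplink + downlink;
}
```

For an 8-participant HD room this gives roughly the ~22 Mbps from the table; switching codecs (VP9, AV1) shifts the per-stream estimates but not the shape of the calculation.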
13. How To Overcome Those Limits?
• Minimize the number of videos the client subscribes to
• Use VP8 Simulcast for large conferences or broadcasting*
• Minimize video resolution and frame-rate
• Optimize based on device type
• Keep audio as a first-class citizen
*With codecs we always need to compromise. If most users use Safari and have a good internet connection, the H264 codec might be the way to go.
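The simulcast bullet above can be sketched in code. This is a minimal illustration of declaring simulcast layers via `sendEncodings`; the `rid` names, bitrate caps, and scaling factors are illustrative assumptions, not required values.

```javascript
// Three simulcast layers for one camera track: the SFU can forward the
// layer that fits each subscriber's bandwidth. Values below are examples only.
const simulcastEncodings = [
  { rid: "f", maxBitrate: 900_000 },                            // full resolution
  { rid: "h", maxBitrate: 300_000, scaleResolutionDownBy: 2 },  // half resolution
  { rid: "q", maxBitrate: 100_000, scaleResolutionDownBy: 4 },  // quarter resolution
];

// In the browser this would be attached when publishing the track:
//   pc.addTransceiver(videoTrack, {
//     direction: "sendonly",
//     sendEncodings: simulcastEncodings,
//   });
```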
14. App Optimization Example
Layout example where the main speaker appears in the pink/red square and the other participants appear on the right
17. WebRTC E2EME Solution
Using the Insertable Streams API to implement the secure frames mechanism
Demo available here: https://webrtc.github.io/samples/src/content/peerconnection/endtoend-encryption/ (experimental)
18. Web Testing vs WebRTC Testing
Both need functional testing
Compatibility testing ≠ Interoperability testing
N x Performance testing
19. WebRTC Testing and Debugging Tools
And other proprietary testing and debugging applications…
Testing / Debugging
Chrome WebRTC Internals
20. “This app doesn’t work on iPhone”, said someone using Chrome on iPhone
“Hi, hello! Can you hear me? I can hear you, but you can’t hear me”, said someone failing to accept or blocking access to the microphone
“Yes I can see you, but video looks blurry”, said someone using a video app that doesn’t use SVC or simulcast
“My microphone is not working, wait a second, I will restart my computer”, said someone with faulty headphones
“I can’t connect, I think it is because I have very slow internet”, said someone about an app that doesn’t prioritize audio and optimize available bandwidth
Back To The User Issues
I wanted to introduce this talk with an accurate representation of working on RTC apps:
From a naïve version of me 5 years ago to a more experienced one.
During that process I discovered that WebRTC is not just another browser API; it has its own community of experts.
Today, after dozens of apps built for different use cases, and even more now to help people interact during a global pandemic.
So why do we use WebRTC? Well, it is the open-source standard to go to for low-latency streaming!
It does sound familiar. And I left out the “I can’t hear you. You are muted” because it is more of a UI/UX thing
But those are user problems that can happen due to mistakes in the implementation. An implementation that doesn’t lack challenges…
WebRTC is built to be easy to use, but it is also different from any other browser API.
This is because it converges hardware, telephony, and software. What makes building a live video application more complicated than a regular web app?
Interoperability, which simply refers to how well devices interact with each other, is a common challenge.
Basic one to one communication using WebRTC works in most desktop and mobile scenarios.
But some more advanced features, like screen sharing or managing multiple peer connections, are also supported by most. But… not all.
In some situations the hardware and OS also play a role, limiting some functionality.
Cameras and microphones are not equal in each device and are a common source of user problems.
Handling those errors properly will be key for a good user experience.
(Debugging some of these types of issues might require debugging tools like WebRTC internals or Wireshark.)
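Handling device errors, as mentioned above, mostly comes down to turning `getUserMedia` failures into messages the user can act on. A minimal sketch, assuming nothing beyond the standard DOMException names; the message wording and the `showBanner` helper are my own placeholders.

```javascript
// Map standard getUserMedia DOMException names to user-facing messages.
function describeMediaError(errorName) {
  switch (errorName) {
    case "NotAllowedError":      return "Permission to use the camera/microphone was denied.";
    case "NotFoundError":        return "No camera or microphone was found on this device.";
    case "NotReadableError":     return "The device is in use by another application.";
    case "OverconstrainedError": return "No device satisfies the requested constraints.";
    default:                     return "Could not access your camera or microphone.";
  }
}

// In the browser (showBanner is a hypothetical UI helper):
//   navigator.mediaDevices.getUserMedia({ audio: true, video: true })
//     .catch(err => showBanner(describeMediaError(err.name)));
```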
Browser or OS! The main functions work on all major browsers, Safari and Edge being the latest to support WebRTC.
But there are still some general interoperability issues:
Different codec preferences for each browser
Older browser versions with specific bugs; major browser upgrades can change WebRTC behavior. Edge being rebuilt on top of Chromium makes things easier now.
Others: Safari
-Screenshare
-Safari WebRTC on mac* using h264
-For a 1-1 audio/video call, the integration with these major browsers is quite easy; the problems start to appear in more complex scenarios…
iOS implementation has some bugs/restrictions:
Forget about using browsers other than Safari
Some restrictions on autoplay rules (the guide to Safari WebRTC on webrtcHacks has some very useful info)
Safari iOS is not ready for WebRTC screen sharing
In a recent project for many-to-many video proctoring with additional one-to-one calls we encountered all those issues. For example, if you want to send more than one media stream, the previous video/audio is muted.
Since Edge was rebuilt on top of Chromium, getting MS Edge to work consistently with WebRTC is not a struggle anymore, for example in a multiparty WebRTC app.
Firefox is also in sync with the WebRTC implementation, and there aren’t any major differences between Chromium and Firefox that I am aware of today.
Scalability doesn’t lack its challenges. Mesh video calls don’t work well beyond 4-5 participants (CPU/BW). We need media servers for:
1) Scalability: multiple participants in a video call (helps reduce the number of streams a client needs to send, usually to one)
2) Integration with Other Communication Technologies (PSTN via SIP trunking or streaming through RTMP to services)
3) Processing of Media Streams (processing of video and audio streams at a very low level, like being able to run computer vision models)
A server can handle hundreds of media streams, but vertical scaling has a limit. Horizontal scaling with geolocation is a common approach for production RTC apps. To scale media servers horizontally, one common approach is to build a dispatcher that distributes requests from participants to different media servers. Slightly more advanced is geographical cascading, which can reduce latency between participants in different regions by letting each participant send and receive video from the closest media server.
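The dispatcher idea above can be sketched as a simple routing function. This is a toy illustration: the region names, server hostnames, and fallback choice are all made up, and a real dispatcher would also weigh live server load and measured latency.

```javascript
// Hypothetical media-server pool, one per region. The servers would
// cascade streams between regions for cross-region calls.
const MEDIA_SERVERS = {
  eu:   "media-eu.example.com",
  us:   "media-us.example.com",
  apac: "media-apac.example.com",
};

// Route each participant to the media server closest to them.
// "us" is an assumed default for unknown regions.
function pickMediaServer(participantRegion) {
  return MEDIA_SERVERS[participantRegion] ?? MEDIA_SERVERS.us;
}
```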
New codecs and standards like SVC (Scalable Video Coding) are helping to scale from the client side, sending better quality at lower bitrates and the right quality for each participant.
But of course there are some limitations if we compare with VOD…
OSS:
Jitsi: an SFU that implements its own signaling using Jingle (XMPP)
Janus: a general-purpose WebRTC server that can be set up as an SFU. Plugin architecture: SIP gateway, VP9-SVC video room, live streaming…
Kurento: can also be configured to function as an SFU or MCU, or both, in a single instance. OpenVidu is a newer platform that facilitates the use of Kurento functionality from a higher-level client in your web or mobile applications.
We have worked with all of them for production projects or, at least, demos.
Also, there are other popular platforms that weren’t originally developed to be WebRTC media servers but have WebRTC media server capabilities:
Asterisk, FreeSWITCH: mostly used in telephony applications; they also support WebRTC and are frequently used in conjunction with JsSIP or SIP.js
Pion: a new stack for Web Real-Time Communications. Pion is built in Go and lets developers use the WebRTC stack as small Lego pieces. It can be used to build an SFU
CPaaS: will probably scale to millions of connections without you having to handle distribution between servers, maintenance, or geolocation. You just need to use their SDK and you are good to focus on the client solution.
Checklist of proxy and firewall rules:
-TCP ports like 443 should be allowed
-UDP ports used for RTP connections (1025-65535) should be open too; if not, at least UDP 3478 for TURN
-Persistent WSS should be allowed for the signaling
-NAT essentially hides a home or office's internal network from the public internet
(Tech note) NAT Traversal:
NAT traversal (reaching a client IP address hidden by NAT) is achieved using WebRTC’s built-in ICE gathering (the protocols are STUN and, as a last resort, TURN; fewer than 1/3 of calls need TURN, but chances are you will need it on a restrictive network).
But you will still need a TURN server to get past network limitations. You can deploy one yourself using coturn or use a 3rd-party provider (a CPaaS will handle this for you).
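Wiring STUN and TURN into an app is just a configuration object. A minimal sketch: the Google STUN server is a real public endpoint, while the TURN URL and credentials are placeholders for your own coturn deployment or provider.

```javascript
// RTCPeerConnection config with a public STUN server and a TURN fallback
// for restrictive networks. TURN host and credentials are placeholders.
const rtcConfig = {
  iceServers: [
    { urls: "stun:stun.l.google.com:19302" },
    {
      urls: "turn:turn.example.com:3478?transport=udp", // placeholder host
      username: "webrtc-user",                          // placeholder credentials
      credential: "secret",
    },
  ],
};

// In the browser: const pc = new RTCPeerConnection(rtcConfig);
```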
Monitoring is usually built into some CPaaS offerings, but there are also 3rd-party platforms like callstats that handle it.
(Or you can build it yourself, storing WebRTC errors in a logging database.)
----
More on NAT traversal:
Clients are typically situated on networks designed to protect them from public requests and may not have a public IP address, which often introduces complicated hurdles.
Connecting to a simple web server is as easy as making an HTTP request. WebRTC, in contrast, needs to use ICE, which provides a multitude of connection types, each of which may be tried in order to establish a successful connection.
Congestion in your network can be caused by faulty cabling, interference from outside sources, or collisions.
Also, too many hosts in a domain or not enough bandwidth (internet pipe size) can generate congestion and overload the network.
Network congestion => high error rate/packet loss, and might cause low-quality media.
How to optimize?
First, you need to know your use case and architecture. Is it a webinar, video chat, or panel? From there, measure the minimal available bandwidth required.
As an example, here I calculated an 8-party video room using video bitrate estimations based on resolution (at 24 fps), assuming an SFU media server (one video uplink and seven downlinks).
And although it will change depending on the codec used (VP9 is better, AV1 even better), that’s the idea…
Since not everyone has 22 Mbps available, how can we handle HD quality? How do we overcome these limits?
Collaboration/presentation use cases might not need to display all the participants in a grid. We can show the dominant speaker and the rest as thumbnails
Also, mobile phones have less CPU, so if you want to keep the best experience on mobile, keep the number of displayed videos small.
Encrypted end-to-end.
The core protocols defined by the IETF for providing WebRTC security are SRTP for media traffic and DTLS-SRTP for key negotiation.
This is an ideal scenario that gets more complicated if we need to support multiparty with media servers in between
We have an intermediate participant, the media server, which decrypts and re-encrypts the media. Obviously, that’s not great if you don’t trust the media server.
Media streams are temporarily decrypted within the cloud servers and then immediately re-encrypted before being sent through the internet to the subscribing client. This decryption is necessary for managing group calls, other types of media exchange, intelligent quality control, and session recording
This is the E2EE with insertable streams demo from webrtc-samples, where “Middlebox” represents what the media server would see. Insertable Streams is not supported by default in Chrome yet, so you might need to enable it in chrome://flags in Canary.
Kudos to CoSMo Software, Google, and the rest of the open-source community for building this encryption mechanism, called Secure Frames.
Using Insertable Streams API
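The Insertable Streams mechanism boils down to a per-frame hook between the encoder and the packetizer. A toy sketch: the XOR transform below is NOT real encryption (Secure Frames uses proper per-frame cryptography), it only shows where such a transform sits; the `key` variable is an assumed input.

```javascript
// Toy per-frame transform: XOR the encoded bytes with a repeating key.
// Applying it twice with the same key restores the original bytes.
function xorTransform(bytes, key) {
  return bytes.map((b, i) => b ^ key[i % key.length]);
}

// In the browser, the transform plugs into the sender's encoded streams
// (experimental API at the time of the talk):
//   const { readable, writable } = sender.createEncodedStreams();
//   readable
//     .pipeThrough(new TransformStream({
//       transform(frame, controller) {
//         frame.data = xorTransform(new Uint8Array(frame.data), key).buffer;
//         controller.enqueue(frame);
//       },
//     }))
//     .pipeTo(writable);
```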
Compatibility testing on a basic web app would be mainly focused on display in different devices and resolutions. Different operating systems display certain app elements differently.
VS
Interoperability testing needs, in addition to testing for compatibility between different browsers and OSes, to check how the RTC communication behaves, which codecs are used, etc.
N times what a basic web app would need…
For performance, while basic web apps focus on single-page load, CPU usage of the server, and so on, for RTC, and WebRTC in particular, you will need to test the bandwidth limitations when sending and receiving media with different numbers of participants.
Also, stress testing is easy for basic web apps: just open new tabs. But with WebRTC you can reach the client’s bandwidth and CPU limits quickly, so you will need multiple devices or VMs to properly stress test; a single device isn’t enough.
For testing and debugging the network or interoperability challenges mentioned before, KITE, a WebRTC-specific Selenium-based framework, will help identify problems in your app.
Some proprietary apps that we have used are BrowserStack or testRTC for testing, and callstats for monitoring/debugging.
For debugging, WebRTC internals is a quick way to identify WebRTC problems: it can be used to debug the flow of WebRTC sessions to determine issues during development.
Wireshark is a more advanced alternative with more granularity, down to seeing the packets one by one.
KITE, for interoperability testing, uses Selenium to launch browsers and check whether video is sent or received; it also goes into other details, such as whether the ICE gathering was successful.
In this image we are testing with 4 browsers, for testing with Safari you need to have a Safari device or VM.
WebRTC internals can be used to debug the flow of WebRTC sessions to determine issues during development.
For example, we can see here the outbound video and audio streams. Video stopped being sent after a few seconds (could it be the user, or PLI packets stopping due to hardware?).
Back to the user issues: now, based on what I explained, we can guess what the problem was for each user…
Applications today are held to a high standard, and things are supposed to work, always. I hope you learned something and won’t make the same mistakes I did in the past.