The presentation will show how to integrate Artificial Intelligence world with WebRTC using the Janus WebRTC Server. Taking advantage of a modular architecture, a Janus instance was configured to provide the handled media to a *Stream Elaborator Network* as a low-latency RTP stream, and to receive back information on the processed media through a data channel, everything in a seamless way.
The presentation will explore how we built a tool that can be specialized to execute different Computer Vision tasks, from the architecture till the code.
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
Talk@JanusCon2019: Janus, WebRTC and ML - Fantastic technologies and how to mix them!
1. Janus, WebRTC & AI
Fantastic Technologies and How to Mix Them!
Paolo Saviano
Full Stack developer @Meetecho
psaviano@meetecho.com
2. WebRTC and Artificial Intelligence?
● Absolutely fascinating matter
Real Time Computer Vision & Sound Recognition
Data collection and live elaboration
Broadcasting experience customized by users
● A lot of scenarios with different requirements
Massive broadcasting, small videorooms, device networks
Device constraints, Time constraints…
Must choose!
● Make it simple!
Remove constraints
Explore the results
2
3. What we wanted from this project
● Handle media in a comfortable way
Receive and elaborate the Janus RTP stream without (too much) manipulation
● Minimize the client side effort
Avoid client-side elaboration
Do not overload the client’s bandwidth
● Something generic
Make an API/SDK/WHATEVER intended to fit as many scenarios as possible
● Play in the home pitch using known languages and environments
3
4. ● Handle media in a comfortable way
Receive and elaborate the Janus RTP stream without (too much) manipulation
● Minimize the client side effort
Avoid client-side elaboration
Do not overload the client’s bandwidth
● Something generic
Make an API/SDK/WHATEVER intended to fit as many scenarios as possible
● Play in the home pitch using known languages and environments
What we wanted from this project
3
ffmpeg
5. ● Handle media in a comfortable way
Receive and elaborate the Janus RTP stream without (too much) manipulation
● Minimize the client side effort
Avoid client-side elaboration
Do not overload the client’s bandwidth
● Something generic
Make an API/SDK/WHATEVER intended to fit as many scenarios as possible
● Play in the home pitch using known languages and environments
What we wanted from this project
3
server-side
ffmpeg
6. ● Handle media in a comfortable way
Receive and elaborate the Janus RTP stream without (too much) manipulation
● Minimize the client side effort
Avoid client-side elaboration
Do not overload the client’s bandwidth
● Something generic
Make an API/SDK/WHATEVER intended to fit as many scenarios as possible
● Play in the home pitch using known languages and environments
What we wanted from this project
3
server-side
modularity :)
ffmpeg
7. ● Handle media in a comfortable way
Receive and elaborate the Janus RTP stream without (too much) manipulation
● Minimize the client side effort
Avoid client-side elaboration
Do not overload the client’s bandwidth
● Something generic
Make an API/SDK/WHATEVER intended to fit as many scenarios as possible
● Play in the home pitch using known languages and environments
What we wanted from this project
JavaScript-ish
modularity :)
3
server-side
ffmpeg
8. Technologies Recap
● Media Capture (WebRTC)
Janus WebRTC Server (http://github.com/meetecho/janus-gateway)
● Language
TypeScript (https://www.typescriptlang.org)
● Server
NodeJS (https://nodejs.org)
● Media elaboration and Sampling
fluent-ffmpeg (https://github.com/fluent-ffmpeg/node-fluent-ffmpeg)
4
9. Architecture
● The Janus Videoroom Plugin
WebRTC entrypoint for media producers
Forwards received stream as RTP stream WebRTC
Client
Janus VideoRoom
5
10. Architecture
● The Janus Videoroom Plugin
WebRTC entrypoint for media producers
Forwards received stream as RTP stream
● The Stream Elaborator
Elaborates received RTP streams
Returns elaboration results as UDP messages
5
RTP
WebRTC
Client
Janus VideoRoom
11. Architecture
● The Janus Videoroom Plugin
WebRTC entrypoint for media producers
Forwards received stream as RTP stream
● The Stream Elaborator
Elaborates received RTP streams
Returns elaboration results as UDP messages
● The Janus Streaming Plugin
WebRTC entrypoint for media receivers
Forwards received streams (RTP & UDP)
to WebRTC clients through media stream
or datachannels
RTP
RTP
UDP
Janus streaming
WebRTC
Client
WebRTC
Client
5
Janus VideoRoom
12. Breaking up the flow
Receive
media
elaboration
Handle
Results
6
13. elaboration
Breaking up the flow
Receive
media
Handle
Results
● Deal with different media
Handle both audio and video RTP streams
● Divergent options for each kind of media
Customizable listening port, video or audio codecs, resolution or sampling…
● Prepare and push data to the elaboration step
6
14. elaboration
Breaking up the flow
Receive
media
Handle
Results
● Execute the elaboration
Use a local library to achieve the results and propagate them
Use an external cognitive service
● Transform received data
Fit the machine learning model expected input
Optionally preprocess the data for features extraction and normalization
6
15. elaboration
Breaking up the flow
Receive
media
Handle
Results
● Define what we need to do
Send results back to the Janus Streaming plugin or save them locally
● Postprocess the results
Marshall the results in the format required for the usage, if needed
6
16. Breaking up the flow
Receive
media
elaboration
Handle
Results
● Common tasks
Bootstrapping, Configuration and Tearing down tasks.
● Common ability
Controllability, observability, isolation, replicability.
6
17. Breaking up the flow
Receive
media
elaboration
Handle
Results
Input Output
I.a.
Module
6
18. Breaking up the flow
Receive
media
elaboration
Handle
Results
Input Output
Channels
I.a.
Module
6
19. Breaking up the flow
Receive
media
elaboration
Handle
Results
Input Output
I.a.
Module
Channels
Producer Consumer/Producer Consumer
6
20. Module
Input
Breaking up the flow
Receive
media
elaboration
Handle
Results
Output
Janus
Janus
Receiver Sender
I.a.
Module
6
23. The Producer/Consumer Interfaces
9
export interface Producer {
outputChannel: Channel;
propagate(result: Detection): void;
}
export interface Consumer {
inputChannel: Channel;
elaborate(chunk: Buffer): void;
}
● Nodes relationship and runtime
No references to setup, teardown and internal state
The elaborate method will be triggered each time a new data is available
Two nodes are connected when their input/output channels are pipelined
24. Channels
10
● Transform class extended
It’s a type of Node Stream object
Transfers a Buffer in a One-Way flow
Could register a transformation function and save a state between transformation
● Provides utility methods
type Transformation =
(data: Buffer, ref?: any) => Promise<Buffer|Buffer[]>;
25. The Sender/Receiver/Module Interfaces
11
● Lifecycle of the node
Invoked automatically
Have to be implemented by developers, if needed
Should handle how the node will use external libraries
export interface Module {
moduleStart(): void;
modulePrepare(...options: any[]): void;
moduleStop(): void;
}
27. ● Three kind of implemented supernodes
Shown interfaces are already implemented in Input, Elaboration and Output supernodes
Supernodes
13
abstract class Input<T extends Configuration>
implements Node, Receiver, Producer
abstract class Output<T extends Configuration>
implements Node, Sender, Consumer
abstract class Elaborator<T extends Configuration>
implements Node, Module, Producer, Consumer
28. Configuration
14
● An abstract Configuration class
An abstract class with utility methods only
● Inherited classes requirements
Translate the json in a simple class that contains properties we are looking for
Optionally add some extra get/set methods or additional utility functions
● The LoadConfiguration function
A static function capable of loading configuration properties from a json file
LoadConfiguration<T extends Configuration>
(source: string, constructor: new () => T): T
30. Extending the Input Node
16
● Bootstrap the ffmpeg object
Use the sourcePrepare method to initialize a ffmpeg instance with fluent-ffmpeg
Add listeners for all event
● Configure the node
Specialize the ffmpeg pipe with the options declared into the Configuration class
Compose the right SDP in order to receive RTP stream correctly
● Forward the data
Invoking the sourceStart the ffmepg output is streamed to the outputChannel directly
The audio node will forward chunks of 640Kb (20ms)
The video node will decode and serve video as pictures
32. Extending the Elaborator Node
18
● No boundaries
This kind of supernode could execute a very large variety of tasks
Just pay attention to the time required by the execution
● Use an external service
I.E. Azure, Google, AWS cognitive services
Put in place a third party SDK in the modulePrepare function
The elaborate method will implement data dispatch to the remote service
Then wait for the async results and push them downstream
● Use higher-level library
Demand the interaction with machine learning models to the library internal code
Wrap the API into the above methods
33. Extending the Elaborator Node
19
elaborate = (chunk: Buffer): void => {
var data = tf.tensor3d(new Int32Array(chunk.buffer), [...]);
this.model.estimateSinglePose(data).then((estimation) => {
const res = new PosenetDetection(estimation);
this.propagate(res);
});
};
● TensorflowJS
Load the model into modulePrepare
Apply a data transformation into the inputChannel to serve data as a Tensor
Do the prediction in the elaborate function
Push results using the propagate method
34. Extending the Output Node
20
● Stream results back to the Janus Streaming Plugin
When sinkPrepare is invoked, a new UDP socket is created
When data is received, the elaborate function will simply send data to Janus
● Too large data?
Define a transformation function in inputChannel capable of splitting data into chunks
35. Public API - Build the Network
21
● An object to rule them all
The NetAPI object keeps track of allocated nodes in a Map
A Controller object capable to send message on the controlChannel
● A factory pattern
Allocate a new node by passing its class and the configuration file path
Inject into the constructor a list of options, like parents list
Automatically wire the nodes to its parents and call its prepare method
factory = <T extends Node>
(layer: new (ID: string, configurationPath: string, ...extra: any[]) => T,
configurationPath: string,
constructorOptions: NodeConstructorOptions = { parents: [] }) : T
37. Public API - Control the Network
23
● Controlling a node
A status message is sent by the controller over the shared controlChannel
The recipient of this message is written into the message
Only the recipient will change its status, the message is dropped by the others
The state of each node is retained both from the controller and the node itself
● Start and stop a node
When a node change its status, one among its start or stop methods is invoked
NetAPI.activate("input_video_1",
"module_posenet_1",
"output_udp_1");
42. What’s Next?
● Measurements
Provide the capability to evaluate delays introduced by elaboration
● Work directly on Stream
Allow video and audio elaboration live
● Online Training
Use the controlChannel in order to dispatch rewards
● Other modules, other languages
Increase the numbers of nodes available out-of-the-box
● Improve API
Create a network dynamically by feeding a file to API, or by using a graphic tool 28