Transforming media workloads into Visual Cloud services, while retaining the flexibility to migrate those services to new hardware platforms, presents real challenges. Learn how Intel and its partners are solving these challenges with highly optimized, cloud-native media processing, media analytics, and graphics/rendering components that quickly and easily deliver end-to-end visual cloud services with scalable open source software. Two visual cloud services, around media delivery and media analytics, will be demonstrated to showcase how to enable faster time to market for innovative “new media” services.
6. Accelerating Services Innovation for Visual Cloud: Open Visual Cloud Project
FOR MORE INFORMATION VISIT: https://01.org/openvisualcloud
* Targeted for open source in 2H’2019
FFmpeg, GStreamer, TensorFlow*, MXNet*, Caffe*, OpenCL*
Intel® OpenVINO™ Toolkit, Intel® Rendering Framework
7. SOFTWARE: CONVERGE THE WORKLOADS
*Other names and brands may be claimed as the property of others.
1. Proven data plane acceleration technologies in network platforms
2. Integration of analytics, media, and networking software technologies to ease developer adoption and programmability
3. Leveraging and contributing to industry-standard interfaces and open source software
USE RICH AND FLEXIBLE SOFTWARE FRAMEWORKS FOR FASTER CUSTOMER SOLUTION READINESS & DEPLOYMENTS
NETWORK PLATFORMS: APPLICATION WORKLOAD CONVERGENCE (diagram)
- Services (IoT verticals, comms, cloud, enterprises)
- Developer edge frameworks (e.g. AWS*, Azure*, Baidu*, Alibaba*)
- Application and service orchestration/virtualization
- Workloads: Open Visual Cloud, Network Edge SW, RAN SW (e.g. ADK, FlexRAN), Network Functions (e.g. CDN, EPC)
- Industry-standard interfaces for an efficient, programmable, scalable data plane (e.g. DPDK, Open vSwitch)
- Hardware: Intel® Xeon® processors, Intel® Core™ processors, Intel® Atom™ processors, Intel® Movidius™ VPU, Intel® FPGA, Intel® Ethernet controllers, Intel® Optane™ DC persistent memory
10. Prepare a Use-Case-Focused Software Stack
Build in Cloud/Local
git clone https://github.com/OpenVisualCloud/Dockerfiles
cd Dockerfiles
mkdir build
cd build
cmake ..
cd Xeon/centos-7.6/ffmpeg
make
Use Case   Image Name   Platform          OS
Media      ffmpeg       Intel® Xeon       Ubuntu* 16.04
           gst          Intel® Xeon E3    Ubuntu 18.04
           nginx        Intel® VCA2       CentOS* 7.4
Analytics  ffmpeg                         CentOS 7.5
           gst                            CentOS 7.6
Graphics   ospray
           ospray-mpi
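After `make` completes, the built image can be exercised directly. A minimal sketch, assuming an image tag that follows the platform/OS/use-case naming in the table (check `docker images` for the tag your build actually produced):

```shell
# Assumed tag; the real one is listed by `docker images` after the build.
IMAGE=xeon-centos76-media-ffmpeg
# Assemble a command that lists the encoders compiled into the image;
# run the printed line on a host with Docker to confirm libsvt_hevc is present.
CHECK="docker run --rm $IMAGE ffmpeg -hide_banner -encoders"
echo "$CHECK"
```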
11. Integrate Powerful Ingredients: Scalable Video Technology (SVT)
• SVT introduces novel, codec-agnostic architectural features and algorithms to develop optimized encoders.
• Developed to increase the scalability of the core encoder and improve its tradeoffs between performance and visual quality.
• Main architectural features:
o Human Visual System (HVS)-optimized classification
o Resource-adaptive scalability
o Three-dimensional parallelism
https://01.org/svt
13. Encode with SVT-HEVC
Exercise 2: Encode With SVT-HEVC
Use the SVT encoder app:
cd home
SvtHevcEncApp -i travel6.yuv -w 1920 -h 1080 -b travel_hevc.ivf
Use FFmpeg:
cd home
ffmpeg -i travel6.mp4 -c:v libsvt_hevc -y travel6_hevc.mp4
ffprobe -v error -show_streams travel6_hevc.mp4
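The libsvt_hevc plugin also exposes the encoder's speed/quality presets through FFmpeg. A hedged sketch: the `-preset` option and its integer range follow the SVT-HEVC FFmpeg plugin and may differ between releases, so the lines are assembled and printed rather than executed here:

```shell
# Lower preset numbers trade speed for quality; higher numbers do the opposite.
HQ="ffmpeg -i travel6.mp4 -c:v libsvt_hevc -preset 3 -y travel6_hq.mp4"
FAST="ffmpeg -i travel6.mp4 -c:v libsvt_hevc -preset 11 -y travel6_fast.mp4"
echo "$HQ"
echo "$FAST"
```

Paste the printed command lines into a shell where the OVC FFmpeg build is installed.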
14. Encode with SVT-AV1
Exercise 3: Encode With SVT-AV1
Use the SVT encoder app:
cd home
SvtAv1EncApp -i travel6.yuv -w 1920 -h 1080 -b travel6_av1.ivf
Use FFmpeg:
cd home
ffmpeg -i travel6.mp4 -c:v libsvt_av1 -y travel6_av1.mp4
ffprobe -v error -show_streams travel6_av1.mp4
15. Review Question 1
What are the 5 major services in Open Visual Cloud?
• Media Creation and Delivery
• Media Analytics
• Immersive Media
• Cloud Gaming
• Cloud Graphics
16. Review Question 2
What are the 4 core building blocks in the Open Visual Cloud?
• Encode
• Inference
• Decode
• Render
17. Review Question 3
Which codecs does Intel support today under the Scalable Video Technology (SVT) architecture?
• AV1
• HEVC
• VP9
24. Call to Action
Try Open Visual Cloud at https://github.com/OpenVisualCloud.
Post your Open Visual Cloud demos and projects to Developer Mesh and apply to be an Intel Innovator! https://devmesh.intel.com
Participate by submitting feedback, bugs, and feature requests.
Contribute to Open Visual Cloud development.
Learn more at https://01.org/openvisualcloud
Media is undergoing a rapid evolution. It's no longer about experiencing streamed content on the television in your living room. The content is becoming richer and much more interactive. It's delivered globally, often with an increasing amount of intelligence for personalization and relevance. We are moving from passive consumption of media to highly immersive and intelligent visual experiences. The visual experiences of tomorrow are no longer constrained by the definition of media as we understand it today. So let's look at what these visual experiences are.
We typically think of media as 'media processing and delivery', where content is streamed, whether video on demand or live streaming. While this is still a large part of the opportunity, media is rapidly becoming much more. It encompasses Media Analytics, where content is analyzed to deliver experiences that are much more intelligent, localized, personalized, and relevant (e.g. ad insertion). Immersive Media, where content is highly immersive and augmented, as if you are not just viewing the content but part of the experience (e.g. live 360-degree VR streaming of a sporting event). Cloud Graphics, where compute- and graphically-rich content is made available remotely, whether within an enterprise for increased productivity (e.g. training, diagnostics, 3D modeling and simulations) or for delivery of lifelike ray-traced images (e.g. rendering of movies). Cloud Gaming, where end users can experience rich, interactive, and highly immersive games anytime, anywhere, and on any connected device; they are no longer bound to their PlayStations and desktops to play rich, highly interactive games. While media will remain the underpinning of Visual Cloud, the term 'media' is no longer descriptive of what the industry needs to deliver. The industry needs a new term for these new experiences so that we have a common understanding of what this new era of media is. Intel, in collaboration with key partners, has been using the term Visual Cloud to define this media of the future, and we are putting it out for the industry to adopt.
Now let's look at the building blocks that will deliver these services. Decode and encode have always been the foundational building blocks of media delivery, but going forward we need additional building blocks like render and inference. Hence we have four main building blocks here: encode, decode, render, and inference.
Deployment of the visual experiences of tomorrow requires four core building blocks to enable these five major services and unleash innovation via countless use cases. Each use case is defined by which of these four core building blocks are selected and how they are sequenced. For streaming, all you require is to decode the content and then encode it for the target device. But content is not just delivered to you; it is increasingly personalized and localized to uniquely address the user. Whether the provider is recommending content based on your viewing habits or your user profile, that requires intelligence and analytics, which drives the need for inference as another core building block. Hence, for an analytics pipeline you would decode the content, perform inference, take the necessary action, and then encode before sending the data along.
Studios are increasingly relying on graphically rendered movies that break through the bounds of imagination while remaining realistic and immersive, which requires us to render the content and then encode it for delivery. The intent of Visual Cloud is to provide the right technology ingredients and interoperability across these four core building blocks.
It's not just about aggregating Intel assets to address this new Visual Cloud market opportunity. We are aggressively making investments to enable a platform that is targeted and optimized for visual workloads. We are investing in technology leadership by establishing a common, scalable reference architecture for Visual Cloud.
Not only are we defining the HW/SW elements of the platform, we are also identifying gaps and aggressively working to bridge them. For example, we are ensuring a rich portfolio of software for the four core building blocks (encode, decode, inference, and render) that is interoperable, scales across the hardware offerings, and supports standard industry frameworks for scalability. For time to market, we are enabling Intel Select Solutions for Visual Cloud, which offer best-known configurations (BKCs) optimized for target use cases and delivered via ODMs and OEMs.
Finally, we are launching Open Visual Cloud, which builds on the industry-standards-based framework and on optimized, interoperable software ingredients to release reference pipelines for key target use cases. Intel is driving technology to deliver a scalable and optimized reference architecture for Visual Cloud, ensuring it is standards based, runs best on Intel architecture, and is easy to commercialize. This ensures cost-effective solutions for service providers and developers to innovate and deploy visual services. The investments we make will let you focus your investments on rapid service deployment rather than platform development.
Let's zoom in on the software workloads here…
Intel is seeding the industry with a project we call Open Visual Cloud: a set of pre-defined reference pipelines for various target visual cloud services. Open Visual Cloud reference pipelines are based on existing Intel-optimized open source ingredients across the four core building blocks. Under encode we have interoperability with industry standards like x264 and x265, along with the introduction of AV1 under the SVT architecture, and we are continuously adding more codecs under Scalable Video Technology for improved video quality. Inference is supported through OpenVINO (Open Visual Inferencing and Neural Network Optimization). Ultimately all these ingredients will support open source industry frameworks like FFmpeg, TensorFlow, etc. Finally, in Open Visual Cloud, Intel is offering reference pipelines that show how all these blocks interoperate and can be scaled for future requirements. Today we are offering two reference pipelines, CDN transcode (under media processing and delivery at the edge) and smart ad insertion (under media analytics services), and we intend to publish quarterly updates with new pipelines while continuing to optimize existing ones. The aim of Open Visual Cloud is to enable the ecosystem, including ISVs, next-wave service providers, and communications service providers, to accelerate the pace of their visual cloud services innovation.
Let's take it to the next level and look at the optimized software ingredients…
Open Visual Cloud is Intel's approach to supporting these powerful building blocks. SVT, a set of codecs, hardware acceleration through the Intel Media SDK, and OpenVINO are the powerful engines within the OVC software stack that drive the features, so let's look closely at each of them. SVT is a codec-agnostic architecture; it's a foundational building block for encode and decode, designed to achieve the highest performance at better or equal visual quality. The objective of the open source Scalable Video Technology (SVT) project is to provide flexible, high-performance software encoder core libraries for media and visual cloud developers. These libraries serve as a starting point for developers to build faster, higher-quality, full-featured encoder products. SVT is designed for cloud-native scalability, and it provides outstanding tradeoffs between visual quality and performance for both VOD and live use cases. There are also hardware acceleration libraries for decode: the Intel Media SDK and Quick Sync Video have traditionally been available on all the integrated graphics platforms, and we have continued to evolve this product.
The OpenVINO toolkit offers software developers a single toolkit for applications that want human-like vision capabilities. It does this by supporting deep learning through the Deep Learning Deployment Toolkit (an inference toolkit), computer vision through optimized functions for OpenCV and OpenVX, and hardware acceleration with heterogeneous support, all in a single toolkit. The aim of OpenVINO is to offer open source software that helps developers and data scientists speed up computer vision and deep learning workloads and enables easy, heterogeneous execution by supporting hardware plugins across Intel® platforms from edge to cloud. The Intel® Rendering Framework is a software-defined visualization (SDVis) approach for supporting big-data use on platforms of all sizes, including cloud and high-performance computing (HPC) clusters. The framework provides software-optimized ray tracing and rasterization.
All the software building-block ingredients we saw on the last slide are interoperable and well integrated with existing industry frameworks, ultimately leveraging and benefiting the ecosystem. FFmpeg and GStreamer are the high-level interfaces that OVC promotes to speed up development.
Under media, we are leveraging our existing media investment; we have upstreamed SVT-HEVC and hardware-accelerated codecs into FFmpeg and continue to invest more. SVT architecture improvements have been upstreamed to x265 as well.
Under inference, our deep learning framework interoperability allows for the reuse of neural network models from TensorFlow, MXNet, Caffe, etc. We are upstreaming deep learning support to the FFmpeg and GStreamer interfaces, and also investing in making the underlying hardware and software libraries and plugins interoperate directly with neural networks.
The intent is for developers to benefit from Intel's upstreamed, optimized software and to work at the interface level when needed, which enables quicker time to market. Companies can build on this with confidence, as Intel will contribute both software and hardware plugins.
DCG instructor presents slide (3 minutes)
Open source initiatives are a key component of how Intel helps both our partners and our end customers benefit from our broad hardware roadmap. We are active code contributors to multiple community projects, providing a platform foundation that hosts a broad set of edge computing applications in a performant, secure, and orchestrated environment.
The right side shows open source projects related to the lower layers of the stack: OSes and networking stacks. We contribute code to many projects, including base OS enhancements to accommodate our new platforms, plus DPDK, OvS, FD.io, and Hyperscan for a high-performing data plane. DPDK is a set of optimized software libraries and drivers that Intel invented back in 2010 to accelerate packet processing on general-purpose CPUs. We also contribute to virtual infrastructure managers such as OpenStack and Kubernetes, lifecycle management of services at ONAP, network controllers at OpenDaylight and Tungsten Fabric, and the emerging Akraino project for edge stack solutions.
In support of deployments at the edge, we have investments across different OS variants, including Yocto and Clear Linux. This, combined with Xeon at the edge, enables many of the same tools and innovations from the datacenter to make their way to the edge.
The left side shows open source projects we are focused on and/or contributing to in the upper layers of the stack. Some are AI/CV-related, some are virtualization/containerization-related (e.g. Kubernetes), and others are networking-related.
Above that are the workloads we are converging.
This could be in an industrial setting where PLCs are being consolidated: for example, the Schneider Electric/Advantech solar plant pilot, where thousands of heliostat controllers (which direct the solar panels toward the sun) were previously controlled by PLCs (100 heliostats per PLC). In the pilot, we virtualized and consolidated 200 PLCs (all individual hardware failure points with no failover) into 6 Xeon servers.
Or it could be in a network/NGCO setting: a central office server can run networking functions, but given its proximity to the on-prem edge it could conceivably also be used with OpenVINO for deep learning applications someday.
What role do you think our software offerings play in getting us the deal win?
Open Visual Cloud:
• Will launch in Q2 '19
• Portability across CPUs, GPUs, and accelerators
• End-to-end reference pipelines for easy commercialization
So all of this software is available as open source in a GitHub repository. Under Open Visual Cloud we are releasing a set of building blocks and reference pipelines in this repository, along with Dockerfile support. You can use the Dockerfile(s) in the project directly, or as a reference point for a bare-metal installation. One thing I would like to emphasize here: with a growing set of software and hardware, it gets more complex to maintain, install, and use all the software. Intel is doing the hard work required to make this easy for developers through Dockerfiles.
Let's look inside the repository to see what we support…
Building the OVC software stack is easy if your app is Docker-based; however, Docker is not required. The Docker instructions provide the exact steps if you want to install on bare metal.
We support multiple images for different software stacks, covering various OSes, including multiple versions of Ubuntu and CentOS:
FFMPEG: software stack optimized for media creation and delivery, based on FFmpeg. Included codecs: x264, x265, VP8/9, AV1, and SVT-HEVC. The GPU images are accelerated with VAAPI and QSV.
GSTREAMER: software stack optimized for media creation and delivery, based on GStreamer.
DLDT+FFMPEG: software stack optimized for media analytics, based on the FFmpeg framework. Includes the inference engine and tracking plugins.
DLDT+GSTREAMER: software stack optimized for media analytics, based on GStreamer.
FFMPEG+GSTREAMER+DEV: the development image, which can be used to compile C++ applications for all of the above usages.
NGINX+RTMP: software stack optimized for web hosting and caching, developed for microservices and CDN. Based on FFmpeg; includes the NGINX web server and the RTMP module for RTMP, DASH, and HLS streaming.
OSPRAY: software stack optimized for ray tracing development. Based on Embree; includes the OSPRay ray tracing engine and examples.
Here is what the abbreviations mean: V stands for tested and verified by Intel; T means tested, but some tests didn't pass; and 'compiled' means the image exists but we haven't tested it yet. The intention in publishing this is to be transparent. With that, let's jump to our first exercise, where I would like to show how simple it is to clone these Dockerfiles.
As we briefly touched on SVT previously, let's look at it more closely here. SVT is a codec-agnostic architecture that provides higher or equivalent visual quality.
Scalable Video Technology (SVT) is a software-based video coding technology that allows encoders to make the best possible tradeoffs to scale their performance given the quality and latency requirements of the target applications, through the multiple presets available (M0 to M12). The efficiency and scalability of SVT are enabled mainly through architectural and algorithmic features: three-dimensional parallelism, HVS-optimized classification, and resource-adaptive scalability.
SVT supports process-based parallelism, which involves splitting the encoding operation into a set of independent encoding processes, where partitioning/mode decisions and normative encoding/decoding are decoupled.
SVT also supports picture-based parallelism through the use of hierarchical GOP structures.
Most important, however, is SVT's segment-based parallelism, which involves splitting each picture into segments and processing multiple segments of a picture in parallel, to achieve better utilization of the computational resources with no loss in video quality.
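These parallelism and preset knobs surface on the command line of the standalone app. A hedged sketch: the `-lp` (logical processors) and `-encMode` (preset) option names are taken from the SVT-HEVC sample application and may vary between releases, so the command is assembled and printed rather than executed:

```shell
# Constrain the encoder to 8 logical processors with a mid-range preset;
# paste the printed line into a shell where SVT-HEVC is installed.
CMD="SvtHevcEncApp -i travel6.yuv -w 1920 -h 1080 -lp 8 -encMode 7 -b out.ivf"
echo "$CMD"
```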
SVT is one of the most powerful architectures under OVC, providing remarkably fast encoding on Intel Xeon platforms. This architectural solution is available for HEVC, VP9, and AV1. It is fully open sourced and plugs into both the FFmpeg and GStreamer frameworks.
Performance of SVT.
Under OVC we have created reference pipelines, so let's look at the very first solution. This reference pipeline is architected to demonstrate FFmpeg RTMP streaming, FFmpeg 1:1 and 1:N transcoding, and a CDN NGINX caching service. Common benefits of the CDN pipeline are reduced latency and a fast end-user experience.
We will focus on the highlighted section here, where live video from a streaming server is pushed into the Transcode Server over the RTMP protocol. The Transcode Server receives the video stream over RTMP, decapsulates and demuxes the video, and transcodes it to other codecs/bitrates/resolutions in a 1:N manner, meaning one input and N outputs: the first channel is transcoded to 1080p at 60 fps, the second channel to 1280p at 60 fps, and so on, for N outputs. The transcoded video streams are muxed over RTMP and distributed to the CDN Edge Server over the CDN network, according to decisions from the CDN Manager, whose role is to schedule jobs and manage parallelized task execution. The CDN Edge Server then receives the video streams from the Transcode Server, caches them, and pushes them to various clients over RTMP. We have used FFmpeg for decode, transcode, and RTMP streaming; the transcoding can use software or hardware codecs like SVT/x264/x265 or QSV, all available through FFmpeg. Doing 1:N transcoding drastically improves performance by running pipelines in parallel with FFmpeg, and it eliminates the need to program at a lower level and write code whenever possible.
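The 1:N fan-out maps onto a single FFmpeg invocation with one input and several output legs. A sketch with assumed RTMP endpoints and an assumed three-rung ladder; libx264 is used here because the FLV/RTMP container carries H.264, though the transcode could equally target the other codecs mentioned above:

```shell
IN=rtmp://streaming-server/live/in   # assumed ingest endpoint
OUT=rtmp://cdn-edge/live             # assumed edge endpoint
# One decode feeds three parallel encode legs (1:3 here; extend for 1:N).
# The command is assembled and printed; run it where FFmpeg is installed.
CMD="ffmpeg -i $IN \
 -map 0 -c:v libx264 -s 1920x1080 -r 60 -c:a aac -f flv $OUT/ch_1080p60 \
 -map 0 -c:v libx264 -s 1280x720 -r 60 -c:a aac -f flv $OUT/ch_720p60 \
 -map 0 -c:v libx264 -s 854x480 -r 30 -c:a aac -f flv $OUT/ch_480p30"
echo "$CMD"
```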
NGINX is the web server; the RTMP module inside it serves HLS/DASH segments to the browser, with the CDN Manager coordinating.
The ad-insertion sample demonstrates the ad insertion use case. A server-side ad insertion solution provides multiple benefits:
Improved ad viewing experience: studies have shown that when the ad matches the content being watched, there is higher viewer engagement, measured by clicking the ad or not skipping it.
Regional ad customization: content providers operate nationwide; server-side ad insertion enables replacing default ads with localized ads.
The Content Provider service serves the original content, with on-demand transcoding, through the DASH or HLS streaming protocol. The Ad Insertion service analyzes the video stream and inserts ads, with transcoding if needed, at each ad break slot. The client player is based on dash.js and hls.js. Three main blocks add analytics to the existing framework:
The Ad Content service archives the ad videos and serves them upon request.
The Ad Insertion service implements the logic of inserting ads during video playback.
The Ad Decision service decides which ad to show in the next ad break and returns the ad URL. The decision is based on a combination of the user's ad preferences and available cues, i.e. the results of analyzing the video content.
Let's look in depth at how the flow works…
The under-the-hood design shows where the OVC pipelines reside:
The client player starts video playback by requesting the video manifest file, which describes the DASH/HLS segments.
The Ad Insertion service intercepts the request and retrieves the manifest. It also schedules two pipelines: the first analyzes the video segments and the second constructs ad segments. The analysis results are saved to the database for later use, and the constructed ad segments are sent to the client player upon request.
The Ad Insertion service keeps track of how many ad segments are served to the client player and reports the statistics to the Ad Content service. If the user clicks on any portion of the playback screen, the Ad Insertion service interprets the click as either an ad click or a question/answer click, and reports it to the Ad Content service or the Ad Decision service for further action.
The analytics pipeline is used to analyze the video content. We provide two equivalent implementations, based on FFmpeg and GStreamer; they can run side by side or standalone, as you prefer.
The transcoding pipeline is used to transcode ads to match the video quality (bitrate and resolution). The transcoding pipeline is based on FFmpeg.
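That matching step is a plain FFmpeg transcode. A sketch with assumed file names and target parameters; in practice the target bitrate and resolution would be read from the content stream (for example with ffprobe) before being applied to the ad:

```shell
TARGET_RES=1920x1080   # assumed values; derive them from the content stream
TARGET_RATE=4M
# Assembled and printed so it can be pasted where FFmpeg is installed.
CMD="ffmpeg -i ad_source.mp4 -c:v libx264 -s $TARGET_RES -b:v $TARGET_RATE -c:a aac -y ad_matched.mp4"
echo "$CMD"
```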
Analytics is powered by OpenVINO.
Now let's look at how simply we can run this through the FFmpeg/GStreamer frameworks.
OVC provides FFmpeg inference plugins that let you perform end-to-end tasks like object detection, face detection, and emotion detection from the command line; you simply call the corresponding FFmpeg filters. The slide shows some examples. The first example shows face detection followed by emotion recognition, which is relevant for understanding the emotional tone of the video in order to insert appropriate ad content.
The message here is that you don't have to write a program to run complex analytics jobs; command-line options are available to run the workload. This reduces the number of hours spent developing a solution and optimizing performance, weights, and models: simply use one of the pretrained models that are part of the industry-standard frameworks.
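A command of roughly this shape drives the detection-then-classification chain. This is a hypothetical sketch: the `detect` and `classify` filter names and their `model=` option mirror the pattern of the Open Visual Cloud FFmpeg analytics plugins, but exact filter syntax and model file names differ by release, so verify against the repository before use:

```shell
MODELS=/opt/models   # assumed model directory
# Face detection feeding emotion classification; output is discarded (-f null)
# because only the inference metadata matters for the ad decision.
CMD="ffmpeg -i travel6.mp4 -vf detect=model=$MODELS/face-detection.xml,classify=model=$MODELS/emotions-recognition.xml -f null -"
echo "$CMD"
```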
OVC provides similar features with GStreamer as well. Application developers can use the plugins to construct complex pipelines for media analytics use cases.
With support for two different frameworks, FFmpeg and GStreamer, we have made sure the underlying model support is the same and the metadata format is identical, so switching between the two frameworks should require only minimal changes at the command-line level.
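The GStreamer equivalent composes the analytics elements like any other pipeline stages. A sketch using the GStreamer Video Analytics element names (`gvadetect`, `gvaclassify`, `gvametaconvert`); the model paths are assumptions:

```shell
MODELS=/opt/models   # assumed model directory
# Decode, detect faces, classify emotions, convert metadata, discard frames.
# Assembled and printed; run the line where the GVA plugins are installed.
PIPE="gst-launch-1.0 filesrc location=travel6.mp4 ! decodebin ! \
gvadetect model=$MODELS/face-detection.xml ! \
gvaclassify model=$MODELS/emotions-recognition.xml ! \
gvametaconvert ! fakesink"
echo "$PIPE"
```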
In the end I would like to conclude with a few takeaways. Open Visual Cloud is a project that goes beyond traditional media delivery to offer reference solutions and pipelines for interesting use cases, with the intention of making the software interoperate well with standard industry frameworks, which in turn reduces development time by months. All of this software is open source today at github.com/OpenVisualCloud, with simple Docker images.
Also, participate by submitting feedback, bugs, and feature requests, and contribute to enhancing Open Visual Cloud.