Copyright © 2014 Criteo
Criteo Labs Infrastructure Tech Talk
November 7, 2017
By Dailymotion, Criteo & Leboncoin
Copyright © 2017 Criteo
© 2017
Hardware assisted transcoding
Tuesday, November 7th 2017
SOME BACKGROUND
• uploaded videos come in various containers and codecs
• you need to feed the native web player with a specific format
• various qualities are available to our users
What is transcoding and why it is so important to us
[Diagram: www.dailymotion.com/upload → TRANSCODE → www.dailymotion.com/video/12345]
Qualities shown: 1080p, 576p, 4k, 720p
Output formats shown: M3U8, H264/AAC TS, HLS, ABR, …
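To make the HLS/ABR output concrete, here is a minimal sketch of how a master M3U8 playlist could list several qualities for the player to choose from. The rendition names follow the slide; the resolutions, bitrates and playlist paths are hypothetical values chosen for illustration, not Dailymotion's actual encoding ladder.

```python
# Hypothetical rendition ladder: (name, width, height, bandwidth in bits/s).
# Only the names come from the slide; every number here is an assumption.
RENDITIONS = [
    ("576p", 1024, 576, 1_500_000),
    ("720p", 1280, 720, 3_000_000),
    ("1080p", 1920, 1080, 6_000_000),
]


def master_playlist(renditions):
    """Return an HLS master playlist with one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for name, width, height, bandwidth in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        # Hypothetical layout: one media playlist per quality.
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist(RENDITIONS), end="")
```

The player reads this playlist and switches between the listed variants as bandwidth changes, which is what ABR refers to.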
• more than 150k videos uploaded every day
• 4 to 8 qualities per video (HLS)
• from 144p to 2160p
• 20M transcoding tasks per month
• fast publication time constraint (3x to 10x)
• getting the best possible video quality
Dailymotion facts
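As a rough consistency check on these figures, the sketch below multiplies the upload rate by the number of qualities. It assumes a 30-day month and one transcoding task per quality, neither of which is stated on the slide.

```python
VIDEOS_PER_DAY = 150_000   # "more than 150k videos uploaded every day"
DAYS_PER_MONTH = 30        # assumption


def monthly_tasks(qualities_per_video):
    """Tasks per month, assuming one transcoding task per quality."""
    return VIDEOS_PER_DAY * DAYS_PER_MONTH * qualities_per_video


low, high = monthly_tasks(4), monthly_tasks(8)
print(f"{low:,} to {high:,} tasks per month")  # 18,000,000 to 36,000,000 tasks per month
```

The quoted 20M tasks per month falls inside this range, near its lower end.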
LEGACY
• 160 blade servers
• Xeon E5-2683 CPUs, up to 56 logical threads
• 240W TDP per blade
• FFmpeg 3.x with libx264 for video encoding
• pure software transcoding
Legacy encoding farm
WHAT WE WANT
• reduce OPEX: power consumption
• reduce CAPEX: unit price
• better performance means faster publication
• use the existing video workflow
Can we improve our existing transcoding workflow?
SOMETHING NEW
• choosing the right solution:
  • Nvidia NVENC
  • Intel Media SDK (Quick Sync)
• how can it fit in our workflow?
• what is the gain in terms of:
  • performance
  • cost
  • power consumption
GPU accelerated solution
HARDWARE
• HPE Moonshot 1500 chassis
• up to 45 Intel Xeon cartridges per chassis:
  • Xeon E3-1585L V5 (4 cores, Hyper-Threaded) with Iris Pro Graphics P580 (72 processing units)
  • TDP: 45W
  • 500 GB SSD
  • 64 GB RAM
What we use now
SOFTWARE
• Intel Media SDK 2017R3
• Kernel 4.4.83 with Intel patches
• FFmpeg 3.3.3 with Intel patches
• unchanged in-house scheduling solution
• monitoring through Datadog
What we use now
WORKFLOW
Transcoding workflow
[Diagram legend: Software (FFmpeg), Hardware (Intel GPU)]
Video path: Input file → Demux (MP4/MKV/…) → Video frames → Decode frames → Filters (Deint/Scale/…) → Encode frames → Remux (MP4) → Output file
Audio path: Demux → Audio frames → Transcode audio → Remux (MP4)
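Look-ahead algorithm
Video path: Input file → Demux (MP4/MKV/…) → Video frames → Decode frames → Next decoded frames buffer → Encode frames → Remux (MP4) → Output file
Look-ahead path: Next decoded frames buffer → Look-ahead bitrate analyser → Set bitrate → Encode frames
Audio path: Demux → Audio frames → Transcode audio → Remux (MP4)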
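The look-ahead idea can be illustrated with a toy model: buffer the next few decoded frames, estimate how complex they are, and set the bit budget for the current frame accordingly. This is only a sketch of the general principle, not the Intel Media SDK algorithm; the complexity scores, the window depth and the proportional rule are all assumptions made for illustration.

```python
def lookahead_bit_budgets(complexities, target_bits_per_frame, depth):
    """Toy look-ahead rate control.

    For each frame, look at a window of the next `depth` frames (current one
    included) and scale the frame's budget by its complexity relative to the
    window average. Frames that precede a complex scene get fewer bits, so
    the complex scene can get more.
    """
    budgets = []
    for i, complexity in enumerate(complexities):
        window = complexities[i:i + depth]          # next decoded frames buffer
        mean = sum(window) / len(window)            # look-ahead analysis
        scale = complexity / mean if mean > 0 else 1.0
        budgets.append(target_bits_per_frame * scale)  # set bitrate
    return budgets


if __name__ == "__main__":
    # Hypothetical complexity scores: a calm scene followed by a busy one.
    scores = [1, 1, 1, 4, 4, 1]
    for score, bits in zip(scores, lookahead_bit_budgets(scores, 1000, depth=3)):
        print(score, round(bits))
```

In this example the frames just before the busy scene receive a smaller budget than the calm frame at the start, while the busy frames receive more than the target.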
RESULTS
The following FFmpeg versions were used during the tests:
• QSV version: FFmpeg 3.3.3 + Intel patches
• SW only: FFmpeg 3.3.3 (16 threads)
We also tried to use the same parameters, when possible:
• AVC profile and level
• Keyframes interval (forced every 3 seconds for HLS)
• Frame-rate left untouched when possible
• AAC audio
• MP4 container
We enable Variable Bit-Rate Look-ahead for QSV transcoding
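For illustration, the shared parameters could translate into FFmpeg arguments roughly as follows. This is a sketch using options from upstream FFmpeg (`h264_qsv`, `-look_ahead`, `-force_key_frames`); the exact command lines used in the tests are not given in the slides, and the Intel-patched build may expose different options. Profile and level are left out because their values are not stated.

```python
def ffmpeg_args(src, dst, hardware):
    """Build an FFmpeg argument list for the QSV or software-only variant."""
    args = ["ffmpeg", "-i", src]
    if hardware:
        # Quick Sync H.264 encoder with VBR look-ahead enabled.
        args += ["-c:v", "h264_qsv", "-look_ahead", "1"]
    else:
        # Software encoder, 16 threads as in the test description.
        args += ["-c:v", "libx264", "-threads", "16"]
    args += [
        # Force a keyframe every 3 seconds (segment alignment for HLS).
        "-force_key_frames", "expr:gte(t,n_forced*3)",
        "-c:a", "aac",
        "-f", "mp4",
        dst,
    ]
    return args


if __name__ == "__main__":
    # Hypothetical file names.
    print(" ".join(ffmpeg_args("input.mkv", "output.mp4", hardware=True)))
```

The list can be passed to `subprocess.run` as is, which avoids shell quoting issues with the keyframe expression.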
Performance test 1: single 1080p transcoding
Performance test 2: concurrent 10x1080p transcodings
Performance test 3: concurrent 20x480p transcodings
The following graph shows the power consumption of a full chassis (45 cartridges) over the last 2 weeks
~3700 W / 45 = ~82 W per cartridge
(SW only solution has a theoretical TDP of 240W)
Power consumption
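The per-cartridge figure can be reproduced with a few lines of arithmetic. Note that the comparison below sets a measured average against a theoretical TDP, and compares per unit rather than per transcoded video, so the ratio is only indicative.

```python
CHASSIS_WATTS = 3700            # measured for a full chassis
CARTRIDGES = 45
LEGACY_BLADE_TDP_WATTS = 240    # theoretical TDP of a software-only blade

per_cartridge = CHASSIS_WATTS / CARTRIDGES
ratio = LEGACY_BLADE_TDP_WATTS / per_cartridge

print(f"{per_cartridge:.0f} W per cartridge")       # 82 W per cartridge
print(f"{ratio:.1f}x lower than the legacy TDP")    # 2.9x lower than the legacy TDP
```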
Quality
Test                                    | SW transcoding (2 pass) | HW transcoding | Average gain, SW vs HW
4k → mp4_h264_aac_uhd                   | 33                      | 32             | ~=
720p → mp4_h264_aac_hd                  | 44                      | 44             | =
1080p → mp4_h264_aac_fhd                | 47                      | 40             | x1.2
movie_sample (1080p) → mp4_h264_aac_fhd | 44                      | 44             | =
• Results are in PSNR units, higher is better
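For readers unfamiliar with the metric, PSNR compares a reference signal with its encoded version through the mean squared error. The sketch below works on flat lists of 8-bit sample values; how the scores in the table were actually measured (which planes, how frames were averaged, which tool) is not stated in the slides.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical samples: every value is off by one, so the MSE is 1.
    print(round(psnr([100, 100, 100, 100], [101, 99, 101, 99]), 2))
```

A higher value means the encoded signal is closer to the reference, which is why the table reads "higher is better".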
BW graph with Look-ahead enabled
BW graph without Look-ahead enabled
MONITORING
CONCLUSION
• Pros
  • much, much faster for single transcoding (up to 12x faster)
  • power consumption is much lower
  • cheaper (more than 2.5 times cheaper per unit)
• Cons
  • slower with multiple low-res tasks
  • quality is not 100% as good, though Look-ahead helps
We built our new transcoding farm!
• Blog post on Medium:
  • http://medium.com/dailymotion-engineering/hardware-assisted-video-transcoding-at-dailymotion-66cd2db448ae
• SFFmpeg (static FFmpeg build):
  • https://github.com/pyke369/sffmpeg
• Intel Media SDK FFmpeg patches:
  • https://github.com/Intel-FFmpeg-Plugin/Intel_FFmpeg_plugins
Some references
DEMO AND Q&A
Thank you
Gilles Vieira
gilles.vieira@dailymotion.com
RFP Challenge
The quest to find our next DCs
Mohamed Benazza
Nicolas Pérez
RFP: Request For Proposal
An RFP is a set of specifications that describe the sought-after solution, and evaluation criteria that disclose how proposals will be graded.
(Margaret Rouse - https://goo.gl/uVHKqM)
Criteo Global Footprint 2017-Q4
Data Centers World Wide
Traffic
• 90+ Gbps Internet Traffic
• 1+ Tbps inter-DCs capacity
Servers
• 26 000+ Servers
• 28 PB Storage
Growth in 2017
• +2 Data centers
• +6 500 Servers
• +4 x 100G inter-DCs links
Power
• 8+ megawatts (+/- 7000 homes for 1 year)
[Chart: TY5, 2017 to 2021, y-axis 0 to 8000; series: #SRV to Add, #SRV Site Capacity, #SRV Total; data labels (series not recoverable): 760, 1019, 1366, 1831, 2454, 2080, 1320, 301]
Capacity Planning:
- Organic Growth
- New Projects
- Corporate Strategy
- Resilience
- Hadoop
Timeline
Phases: RFP Process (Q1 2017 to Q2 2017, February to June, 4 months), then BUILD Phase (Q3 2017 to Q4 2017, July to December, 6 months)
Dated milestones:
- RFP Launch: February 7th
- RFP Answers: March 1st
- Shortlist Selection: April 3rd
- Vendor Award: June 9th
Other steps shown on the chart:
- Q&A
- Offers Review
- Site visit
- Vendors Negotiations
- Contract review
- ETA for PO approval
- PO release & Contract signed
- Project Launch
- Hardware Procurement
- Data Center Infrastructure Ready
- Data Center Commissioning
- Cabling & Setup
- IP Transit & Leased Line
- Hardware Setup + SRE validation
- First Billing
Who?
R&D
Infrastructure
Procurement
Team
Legal Counsel
Qapla Team
CTO
CFO
MBG
RFP: Documentation Package
Master document with background and planning
Appendix 1: Technical Requirement and Answering Grid
Mandatory Requirements
All questions will be shared on this file
Administrative documentation
RFP: Administrative documentation
RFP: Planning and Master Document
RFP: Technical Requirement and Answering Grid
RFP: Pricing Grid
RFP: Results: Proposal Summary
Vendor #7 is not Shortlisted
Vendor #3 is Shortlisted
Data Centers Visits
Nothing left but to do it …
Questions?
ACDC - AutomatiC DataCenter
Felix Cantournet & Xavier Krantz
2017-11-07
Agenda
1. Leboncoin
2. History
3. Rethinking
4. ACDC
5. Next
6. REX (lessons learned)
Leboncoin
Some figures
1.2 - Technical stack
● 2 datacenters
● 600 physical servers (more than 1000 including virtual ones)
● 12 Gbit/s of outbound traffic
● 6 TB of databases
● 300M images
● 15k req/s on leboncoin.fr
History & evolution
2.1 - Initial situation
● 1 - Operator
  ○ Find a free IP (Welcome ping!)
● 2 - Puppet
  ○ Reserve IP / DNS name in DNS
  ○ Commit + push
  ○ Run the agent on every DHCP node
● 3 - Foreman
  ○ Go in Foreman and select a node
  ○ Get the MAC address
  ○ Create the node + put it in build mode
● 4 - Puppet
  ○ Reserve MAC address / DNS name in DHCP
  ○ Commit + push
  ○ Run the agent on every DHCP node
● 5 - Foreman
  ○ Reboot the node via BMC plugin
● 6 - Node installs
  ○ Boot on network (PXE)
  ○ DHCP redirects to TFTP
  ○ TFTP serves the custom PXE config
  ○ Preseed is rendered by Foreman
● 7 - Operator
  ○ Follows with Java console
6 manual steps
Error prone
Human conflicts
Time consuming
2.2 - Problem statement
● Simplify bare-metal provisioning
  ○ Unattended provisioning / installation
  ○ 1 manual step
2.3 - Attempt 1 - Foreman + SmartProxies
Observation: Foreman is underused.
Solution: Smart proxies to automate:
- IPAM + DHCP
- DNS
Pain points: DNS
● We
  ○ 1 big zone file
● Foreman Smart-proxy
  ○ Dynamic updates = nsupdate
  ○ Binary journal file + serial conflicts
Pain points: DHCP
● We
  ○ Do NIC bonding
  ○ Need to register n MAC addresses <> 1 IP
● Foreman Smart-proxy
  ○ Not supported
Pain points: Foreman
● We
  ○ Do not master Ruby
  ○ Are not “a Tech company”
  ○ Are not that big
● Foreman & Smart-proxy
  ○ Very complex code base
  ○ Very complex UI
  ○ Generic, with a lot of (too many) features
Rethinking
3.1 - Interface with the contractor
Celeris: contractor for on-site interventions in the DCs
● Spreadsheet
● DCIM: Netbox
  ○ Open source
  ○ DigitalOcean
  ○ Python + PostgreSQL
Integration with Foreman?
3.2 - Overlapping solutions
IPAM
DCIM
CMDB
???
Problem statement 2
● Automate lifecycle management of physical machines
  ○ Discovery / intake
  ○ Unattended provisioning / installation
  ○ Maintenance, decommission
Collins
● Open source project: https://github.com/tumblr/collins
● Enforced state machine
● Arbitrary hook / callback system on state transitions
● Arbitrary key / value metadata attached to each asset
● Web UI + HTTP API + firehose
Collins: Tooling
API Clients
● Go-collins
● pycollins
● Ruby libs
  ○ collins-auth
  ○ collins-client
  ○ collins-notify
  ○ collins-state
  ○ ...
CLI
● collins-shell
Collins: Web UI
Collins: Lifecycle
Specified workflows:
- Intake
- Commissioning
- Maintenance
- Decommissioning
Collins: Callback registry
ACDC
4.1 - Overview
4.2 - Lorie
4.3 - IPXE Router
4.4 - Collins callbacks
● nowProvisioned
○ on = "asset_update"
○ When
■ previous.state = "isProvisioning"
■ && current.state = "isProvisioned"
● provisionEvent
○ on = "asset_update"
○ When
■ current.state = "isNew"
● unallocated
○ on = "asset_update"
○ When
■ current.state = "isUnallocated"
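The matching logic of these three callbacks can be sketched in a few lines. This is an illustration of the conditions listed above, not Collins' configuration syntax or API; the `CALLBACKS` table and the `matching_callbacks` function are hypothetical names, and the state strings are copied from the slide.

```python
# Hypothetical registry: callback name -> predicate on (previous, current) state.
CALLBACKS = {
    "nowProvisioned": lambda previous, current: (
        previous == "isProvisioning" and current == "isProvisioned"
    ),
    "provisionEvent": lambda previous, current: current == "isNew",
    "unallocated": lambda previous, current: current == "isUnallocated",
}


def matching_callbacks(event, previous_state, current_state):
    """Return the names of the callbacks that should fire for this event."""
    if event != "asset_update":  # all three callbacks listen on asset_update
        return []
    return [
        name
        for name, when in CALLBACKS.items()
        if when(previous_state, current_state)
    ]


if __name__ == "__main__":
    print(matching_callbacks("asset_update", "isProvisioning", "isProvisioned"))
    # ['nowProvisioned']
```

In a real setup each matching callback would trigger an action, such as starting the provisioning of a newly discovered asset.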
4.5 - Provisioning
4.6 - Tooling
$ collins-shell
INFO - ENV Variable COLLINS_CONFIG=/home/xkrantz/Sources/github.schibsted.io/leboncoin/acdc/conf/collins.yaml
Tasks:
collins-shell asset <command> # Asset related commands
collins-shell asset_type <command> # Asset Type related commands
collins-shell console # drop into the interactive collins shell
collins-shell help [TASK] # Describe available tasks or one specific task
collins-shell ip_address <command> # IP address related commands
collins-shell ipmi <command> # IPMI related commands
collins-shell latest # check if there is a newer version of collins-shell
collins-shell log MESSAGE # log a message on an asset
collins-shell logs TAG # fetch logs for an asset specified by its tag. Use "all" for a...
collins-shell power ACTION --reason=REASON --tag=TAG # perform power action (off, on, rebootSoft, rebootHard, etc) o...
collins-shell power_status # check power status on an asset
collins-shell provision <command> # Provisioning related commands
collins-shell search_logs QUERY # search for asset logs
collins-shell state <command> # State management related commands - use with care
collins-shell tag <command> # Tag related commands
collins-shell version # current version of collins-shell
Next
5 - Next
ACDC v2
Rework
● Discovery
● OS bootstrapping
Add
● Disk management
● Firmware updates
● Any maintenance tasks
Discovery
● Currently:
  ○ Genesis (Tumblr)
  ○ Ruby DSL (Chef like)
● Next:
  ○ CoreOS in Memory + Ansible
OS Bootstrapping
● Currently:
  ○ Preseed / Kickstart
  ○ Shell scripts
● Next:
  ○ CoreOS in Memory + Ansible
5.1 - Ansible jobs runner
5.2 - Visualization & federation
5.3 - Integration
SPECS
REX (lessons learned)
● 20% projects are not enough
● Services & ownership transition (for Ops)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 

Criteo Labs Infrastructure Tech Talk Meetup Nov. 7

  • 1. Criteo Labs Infrastructure Tech Talk, November 7, 2017. By Dailymotion, Criteo & Leboncoin. Copyright © 2017 Criteo
  • 2. © 2017 Hardware assisted transcoding Tuesday, November 7th 2017
  • 4. SOME BACKGROUND: What is transcoding and why it is so important to us • uploaded videos come in various containers and codecs • you need to feed the native web player with a specific format • various qualities are available to our users. Diagram: www.dailymotion.com/upload → TRANSCODE → www.dailymotion.com/video/12345 (4k, 1080p, 720p, 576p, …; M3U8, H264/AAC TS, HLS, ABR)
  • 5. SOME BACKGROUND: Dailymotion facts • more than 150k videos uploaded every day • 4 to 8 qualities per video (HLS) • from 144p to 2160p • 20M transcoding tasks per month • fast publication time constraint (3x to 10x) • getting the best possible video quality
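As a rough consistency check on the slide-5 figures, the daily upload count and the number of qualities per video can be multiplied out to a monthly task count. The sketch below assumes a 30-day month and one transcoding task per quality; neither assumption is stated in the talk.

```python
# Back-of-the-envelope check of the slide-5 figures.
# Assumptions (not from the talk): 30-day month, one task per quality.
UPLOADS_PER_DAY = 150_000
DAYS_PER_MONTH = 30


def monthly_tasks(qualities_per_video: int) -> int:
    """Transcoding tasks per month for a given number of qualities."""
    return UPLOADS_PER_DAY * qualities_per_video * DAYS_PER_MONTH


low = monthly_tasks(4)   # 4 qualities per video
high = monthly_tasks(8)  # 8 qualities per video
print(f"{low:,} to {high:,} tasks per month")
```

Under these assumptions the range is 18M to 36M tasks per month, so the quoted 20M sits inside it, towards the lower end.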
  • 7. SOME BACKGROUND: Legacy encoding farm • 160 blade servers • Xeon E5–2683 CPU, up to 56 logical threads • 240W TDP per blade • FFmpeg 3.x with libx264 for video encoding • pure software transcoding
  • 9. WHAT WE WANT: Can we improve our existing transcoding workflow? • reduce OPEX: power consumption • reduce CAPEX: unit price • better performance means faster publication • use the existing video workflow
  • 11. SOMETHING NEW: GPU accelerated solution • choosing the right solution: ○ Nvidia NVENC ○ Intel Media SDK (Quick Sync) • how can it fit in our workflow? • what is the gain in terms of: ○ performance ○ cost ○ power consumption
  • 13. HARDWARE: What we use now • HPE Moonshot 1500 chassis • up to 45 Intel Xeon cartridges per chassis: ○ Xeon E3–1585L V5 (4 cores, Hyper-Threaded) with Iris Pro Graphics P580 (72 processing units) ○ TDP: 45W ○ 500 GB SSD ○ 64 GB RAM
  • 15. SOFTWARE: What we use now • Intel Media SDK 2017R3 • Kernel 4.4.83 with Intel patches • FFmpeg 3.3.3 with Intel patches • unchanged in-house scheduling solution • monitoring through Datadog
  • 17. WORKFLOW: Transcoding workflow. Input file → Demux (MP4/MKV/…) → video frames → Decode frames → Filters (Deint/Scale/…) → Encode frames → Remux (MP4) → Output file; audio frames → Transcode audio → Remux (MP4). Legend: Software (FFmpeg), Hardware (Intel GPU)
  • 18. WORKFLOW: Look-ahead algorithm. Input file → Demux (MP4/MKV/…) → video frames → Decode frames → Next decoded frames buffer → Encode frames → Remux (MP4) → Output file; Look-ahead bitrate analyser (on the buffered frames) → Set bitrate (on the encoder); audio frames → Transcode audio → Remux (MP4)
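The look-ahead idea on slide 18 (buffer the next decoded frames, analyse them, then set the bitrate for the frame being encoded) can be illustrated with a toy model. This is not Intel's algorithm: the per-frame "complexity" score, the window depth and the proportional scaling rule are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def lookahead_bitrates(complexities: Iterable[float],
                       target_kbps: float,
                       depth: int = 4) -> Iterator[Tuple[float, float]]:
    """Yield (complexity, bitrate_kbps) for each frame.

    Toy rule: scale the target bitrate by the frame's complexity relative
    to the mean complexity of the look-ahead window (the frame itself plus
    up to depth - 1 upcoming frames).
    """
    frames = iter(complexities)
    window = deque()
    # Pre-fill the buffer of "next decoded frames".
    for c in frames:
        window.append(c)
        if len(window) == depth:
            break
    while window:
        current = window[0]
        mean = sum(window) / len(window)
        scale = current / mean if mean > 0 else 1.0
        yield current, target_kbps * scale
        window.popleft()
        upcoming = next(frames, None)
        if upcoming is not None:
            window.append(upcoming)


for complexity, kbps in lookahead_bitrates([1, 1, 5, 1, 1], 4000, depth=3):
    print(complexity, round(kbps))
```

A frame that is simpler than what follows it receives fewer bits, leaving headroom for the complex frames the analyser has already seen coming.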
  • 20. RESULTS: Test setup. The following FFmpeg versions were used during the tests: • QSV version: FFmpeg 3.3.3 + Intel patches • SW only: FFmpeg 3.3.3 (16 threads). We also tried to use the same parameters whenever possible: • AVC profile and level • Keyframe interval (forced every 3 seconds for HLS) • Frame rate left untouched when possible • AAC audio • MP4 container. Variable bit-rate look-ahead was enabled for QSV transcoding.
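To make the shared parameters concrete, here is a sketch that builds two comparable command lines, one for libx264 and one for the QSV encoder. It uses options from stock FFmpeg (`-force_key_frames`, `-profile:v`, `h264_qsv` with `-look_ahead`); the Intel-patched build used in the talk may expose different options, and the exact commands run by Dailymotion are not given in the slides. File names and the `high` profile are placeholders, and the level, hardware decoding and filters are left out.

```python
from typing import List

KEYFRAME_INTERVAL_S = 3  # from the slide: keyframes forced every 3 seconds


def common_args(profile: str) -> List[str]:
    """Options shared by both runs: profile, keyframes, AAC audio, MP4."""
    return [
        "-profile:v", profile,
        "-force_key_frames", f"expr:gte(t,n_forced*{KEYFRAME_INTERVAL_S})",
        "-c:a", "aac",
        "-f", "mp4",
    ]


def sw_command(src: str, dst: str, profile: str = "high") -> List[str]:
    """Software-only run: libx264 with 16 threads, as on the slide."""
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-threads", "16",
            *common_args(profile), dst]


def qsv_command(src: str, dst: str, profile: str = "high") -> List[str]:
    """QSV run: hardware H.264 encoder with look-ahead enabled."""
    return ["ffmpeg", "-i", src, "-c:v", "h264_qsv", "-look_ahead", "1",
            *common_args(profile), dst]


print(" ".join(sw_command("input.mkv", "sw.mp4")))
print(" ".join(qsv_command("input.mkv", "qsv.mp4")))
```

Keeping the shared options in one function makes it harder for the two runs to drift apart, which matters when the goal is a like-for-like comparison.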
  • 21. RESULTS: Performance test 1: single 1080p transcoding
  • 22. RESULTS: Performance test 2: concurrent 10x1080p transcodings
  • 23. RESULTS: Performance test 3: concurrent 20x480p transcodings
  • 24. RESULTS: Power consumption. The following graph shows the power consumption of a full chassis (45 cartridges) over the last 2 weeks: ~3700 W / 45 = 82 W per cartridge (the SW-only solution has a theoretical TDP of 240 W)
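The per-cartridge figure on slide 24 is simple division, reproduced below. One caveat applies when reading the ratio: the 82 W value is derived from measured chassis power, while the 240 W value is a theoretical TDP, so the comparison is indicative rather than like-for-like.

```python
# Figures taken from the slide.
CHASSIS_WATTS = 3700        # measured, full chassis
CARTRIDGES = 45
LEGACY_BLADE_TDP_W = 240    # theoretical TDP of a legacy blade


def per_cartridge_watts(chassis_watts: float, cartridges: int) -> float:
    """Average power drawn per cartridge."""
    return chassis_watts / cartridges


per_unit = per_cartridge_watts(CHASSIS_WATTS, CARTRIDGES)
ratio = LEGACY_BLADE_TDP_W / per_unit
print(f"{per_unit:.0f} W per cartridge, about {ratio:.1f}x below 240 W")
```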
  • 25. RESULTS: Quality (results are in PSNR units, higher is better)
      Test | SW transcoding (2-pass) | HW transcoding | Average gain SW vs HW
      4k → mp4_h264_aac_uhd | 33 | 32 | ~=
      720p → mp4_h264_aac_hd | 44 | 44 | =
      1080p → mp4_h264_aac_fhd | 47 | 40 | x1.2
      movie_sample (1080p) → mp4_h264_aac_fhd | 44 | 44 | =
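For reference, PSNR is derived from the mean squared error between a reference frame and its transcoded version: PSNR = 10 · log10(MAX² / MSE). The sketch below computes it over flat sequences of 8-bit sample values; how the talk's scores were measured (tool, colour plane, averaging across frames) is not stated, so this only illustrates the metric itself.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int],
         max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Every sample off by one level gives an MSE of 1.
print(round(psnr([100, 100, 100, 100], [101, 99, 101, 99]), 2))
```

Because the scale is logarithmic, a 7-point gap such as 47 versus 40 corresponds to roughly five times the mean squared error.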
  • 26. RESULTS: Quality. Figures: BW graph with look-ahead enabled; BW graph without look-ahead enabled
  • 30. CONCLUSION: We built our new transcoding farm! • Pros ○ much… much faster for single transcoding (up to 12x faster) ○ power consumption is much lower ○ cheaper (more than 2.5 times cheaper per unit) • Cons ○ slower with multiple low-res tasks ○ quality is not 100% as good, though look-ahead helps
  • 31. CONCLUSION: Some references • Blog post on Medium: http://medium.com/dailymotion-engineering/hardware-assisted-video-transcoding-at-dailymotion-66cd2db448ae • SFFmpeg (static FFmpeg build): https://github.com/pyke369/sffmpeg • Intel Media SDK FFmpeg patches: https://github.com/Intel-FFmpeg-Plugin/Intel_FFmpeg_plugins
  • 33. Thank you. Gilles Vieira, gilles.vieira@dailymotion.com
  • 34. RFP Challenge: The quest to find our next DCs. Mohamed Benazza, Nicolas Pérez
  • 35. RFP: Request For Proposal. "An RFP is a set of specifications that describe the sought-after solution, and evaluation criteria that disclose how proposals will be graded." (Margaret Rouse - https://goo.gl/uVHKqM)
  • 36. Criteo Global Footprint 2017-Q4 (data centers worldwide) • Traffic: 90+ Gbps Internet traffic; 1+ Tbps inter-DC capacity • Servers: 26 000+ servers; 28 Pb storage • Growth in 2017: +2 data centers; +6 500 servers; +4 x 100G inter-DC links • Power: 8+ megawatts (+/- 7000 homes for 1 year)
  • 37. Capacity planning chart, 2017 to 2021. Series: #SRV to Add, #SRV Site Capacity, #SRV Total. Data labels as extracted: 760, 1019, 1366, 1831, 2454; 2080, 1320, 301. Capacity planning drivers: - Organic growth - New projects - Corporate strategy - Resilience - Hadoop
  • 38. RFP Process: Timeline (Q1 to Q4 2017, February to December). Dated milestones: RFP Launch, February 7th; RFP Answers, March 1st; Shortlist Selection, April 3rd; Vendor Award, June 9th. Other steps shown: Q&A; Offers Review; Site visit; Vendors Negotiations; Contract review; ETA for PO approval; PO release & Contract signed; Project Launch; BUILD Phase; Data Center Infrastructure Ready; Data Center Commissioning; Hardware Procurement; Cabling & Setup; IP Transit & Leased Line; Hardware Setup + SRE validation; First Billing. Durations shown: 4 months, 6 months
  • 40. RFP: Documentation Package • Master document with background and planning • Appendix 1: Technical Requirement and Answering Grid • Mandatory Requirements • All questions will be shared on this file • Administrative documentation
  • 42. RFP: Planning and Master Document
  • 43. RFP: Technical Requirement and Answering Grid
  • 46. RFP: Results: Proposal Summary • Vendor #7 is not shortlisted • Vendor #3 is shortlisted
  • 50. All that's left is to …
  • 53. ACDC - AutomatiC DataCenter. Felix Cantournet & Xavier Krantz, 2017-11-07
  • 54. Agenda 1. Leboncoin 2. History 3. Rethinking 4. ACDC 5. Next 6. REX (lessons learned)
  • 59. 1.2 - Technical stack • 2 datacenters • 600 physical servers (more than 1000 including virtual machines) • 12 Gbit/s of outbound traffic • 6 TB of databases • 300M images • 15k req/s on leboncoin.fr
  • 62. 2.1 - Initial situation
  • 63. 2.1 - Initial situation ● 1 - Operator ○ Find a free IP (Welcome ping!) ● 2 - Puppet ○ Reserve IP / DNS name in DNS ○ Commit + push ○ Run the agent on every DHCP node ● 3 - Foreman ○ Go into Foreman and select a node ○ Get the MAC address ○ Create the node + put it in build mode ● 4 - Puppet ○ Reserve MAC address / DNS name in DHCP ○ Commit + push ○ Run the agent on every DHCP node
  • 64. 2.1 - Initial situation ● 5 - Foreman ○ Reboot the node via the BMC plugin ● 6 - Node installs ○ Boot on network (PXE) ○ DHCP redirects to TFTP ○ TFTP serves the custom PXE config ○ Preseed is rendered by Foreman ● 7 - Operator ○ Follows with the Java console
  • 65. 2.1 - Initial situation (summary) ● 6 manual steps ● Error-prone ● Human conflicts ● Time-consuming
  • 67. 2.2 - Problem statement ● Simplify bare-metal provisioning ○ Unattended provisioning / installation ○ 1 manual step
  • 68. 2.3 - Attempt 1 - Foreman + Smart Proxies. Observation: Foreman is underused. Solution: Smart Proxies to automate: - IPAM + DHCP - DNS
  • 69. 2.3 - Attempt 1 - Foreman + Smart Proxies. Pain points: DNS ● We ○ 1 big zone file ● Foreman Smart-proxy ○ Dynamic updates = nsupdate ○ Binary journal file + serial conflicts. Pain points: DHCP ● We ○ Do NIC bonding ○ Need to register n MAC addresses <> 1 IP ● Foreman Smart-proxy ○ Not supported
  • 70. 2.3 - Attempt 1 - Foreman + Smart Proxies. Pain points: Foreman ● We ○ Do not master Ruby ○ Are not "a Tech company" ○ Are not that big ● Foreman & Smart-proxy ○ Very complex code base ○ Very complex UI ○ Generic, with a lot of (too many) features
  • 72. 3.1 - Interface with the contractor. Celeris: contractor for on-site DC interventions ● Spreadsheet ● DCIM: NetBox ○ Open source ○ DigitalOcean ○ Python + PostgreSQL. Integration with Foreman?
  • 73. 3.2 - Overlapping solutions: IPAM, DCIM, CMDB ???
  • 74. Problem statement 2 ● Automate lifecycle management of physical machines ○ Discovery / intake ○ Unattended provisioning / installation ○ Maintenance, decommissioning
  • 75. Collins ● Open source project: https://github.com/tumblr/collins ● Enforced state machine ● Arbitrary hook / callback system on state transitions ● Arbitrary key / value metadata attached to each asset ● Web UI + HTTP API + firehose
  • 76. Collins: Tooling. API clients ● Go-collins ● pycollins ● Ruby libs ○ collins-auth ○ collins-client ○ collins-notify ○ collins-state ○ ... CLI ● collins-shell
  • 79. Collins: Lifecycle. Specified workflows: - Intake - Commissioning - Maintenance - Decommissioning
  • 85. 4.3 - iPXE Router
  • 86. 4.4 - Collins callbacks
      ● nowProvisioned ○ on = "asset_update" ○ when: previous.state = "isProvisioning" && current.state = "isProvisioned"
      ● provisionEvent ○ on = "asset_update" ○ when: current.state = "isNew"
      ● unallocated ○ on = "asset_update" ○ when: current.state = "isUnallocated"
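The three callbacks on slide 86 all follow the same pattern: an event name plus a condition on the previous and/or current asset state. The sketch below models that matching logic in plain Python. It is not Collins code and does not use the Collins API; the `Callback` class and `triggered` function are hypothetical, and the state strings are treated as opaque labels compared for equality, which simplifies however Collins itself evaluates them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Callback:
    """Hypothetical model of one callback entry from the slide."""
    name: str
    on: str
    current_state: str
    previous_state: Optional[str] = None  # None means "any previous state"

    def matches(self, event: str, previous: Optional[str], current: str) -> bool:
        if event != self.on:
            return False
        if self.previous_state is not None and previous != self.previous_state:
            return False
        return current == self.current_state


# The three callbacks listed on the slide.
CALLBACKS = [
    Callback("nowProvisioned", "asset_update", "isProvisioned", "isProvisioning"),
    Callback("provisionEvent", "asset_update", "isNew"),
    Callback("unallocated", "asset_update", "isUnallocated"),
]


def triggered(event: str, previous: Optional[str], current: str) -> List[str]:
    """Names of the callbacks that fire for a given state transition."""
    return [cb.name for cb in CALLBACKS if cb.matches(event, previous, current)]


print(triggered("asset_update", "isProvisioning", "isProvisioned"))
```

Only `nowProvisioned` constrains the previous state, so it fires on one specific transition, whereas the other two fire whenever an asset update leaves the asset in the named state.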
  • 88. 4.6 - Tooling
      $ collins-shell
      INFO - ENV Variable COLLINS_CONFIG=/home/xkrantz/Sources/github.schibsted.io/leboncoin/acdc/conf/collins.yaml
      Tasks:
        collins-shell asset <command>                         # Asset related commands
        collins-shell asset_type <command>                    # Asset Type related commands
        collins-shell console                                 # drop into the interactive collins shell
        collins-shell help [TASK]                             # Describe available tasks or one specific task
        collins-shell ip_address <command>                    # IP address related commands
        collins-shell ipmi <command>                          # IPMI related commands
        collins-shell latest                                  # check if there is a newer version of collins-shell
        collins-shell log MESSAGE                             # log a message on an asset
        collins-shell logs TAG                                # fetch logs for an asset specified by its tag. Use "all" for a...
        collins-shell power ACTION --reason=REASON --tag=TAG  # perform power action (off, on, rebootSoft, rebootHard, etc) o...
        collins-shell power_status                            # check power status on an asset
        collins-shell provision <command>                     # Provisioning related commands
        collins-shell search_logs QUERY                       # search for asset logs
        collins-shell state <command>                         # State management related commands - use with care
        collins-shell tag <command>                           # Tag related commands
        collins-shell version                                 # current version of collins-shell
  • 90. 5 - Next: ACDC v2. Rework ● Discovery ● OS bootstrapping. Add ● Disk management ● Firmware updates ● Any maintenance tasks
  • 91. 5 - Next: ACDC v2. Discovery ● Currently: ○ Genesis (Tumblr) ○ Ruby DSL (Chef-like) ● Next: ○ CoreOS in memory + Ansible
  • 92. 5 - Next: ACDC v2. OS bootstrapping ● Currently: ○ Preseed / Kickstart ○ Shell scripts ● Next: ○ CoreOS in memory + Ansible
  • 93. 5.1 - Ansible jobs runner
  • 95. 5.2 - Visualization & federation
  • 99. REX: 20% projects are not enough
  • 100. REX: Services & ownership transition (for Ops)