Provisioning Linux Servers Made Easy
Greg Bruno, PhD
VP Engineering, StackIQ
Open Source Stack Installer
Stacki is a very fast and ultra reliable Linux server provisioning tool … at scale.
With zero prerequisites for taking systems from bare metal to a ping and prompt.
Why is this hard and important?
Datacenter Architecture
[Diagram: one Frontend and four Backend hosts, each attached to the Network through interface em1]
Datacenter Host Software Stack
DevOps / Configuration Tool
DHCP / DNS / TFTP
Network
Disk
OS
In-house developed deployment tools
- Disk Array Controller Configuration
- Disk Partitioning Configuration
The “Step 0” Problem
Check namenodes are empty
Format/start HDFS
Create all directories
Create all metastores
Start services (HBase, Hive, Oozie, Sqoop, Impala, etc.)
Deploy client configuration
Configure database
Setup/assign monitors (activity, services, and host)
Test database connections
Validate/resolve hostnames
Consistent host timezones
No bad kernel versions running
(CDH) version consistency
Java version consistency
Daemons versions consistency
Mgmt Agents versions consistency
Host specification/SSH ports
MUCH MORE …
DHCP Server/Client setup
TFTP/PXE configuration
Server OS installation
Node OS Install
RAID configuration
Boot configuration
System/data disk partitioning
Monitoring system setup and config
Lights Out/IPMI setup
User accounts added and synced
SSH keys on all hosts
Network node configuration
Config Mgmt install and configuration
Route configuration
OS upgrades/updates
Site specific software and configuration
Host specification/SSH ports
Security
Firewall setup
Cluster Mgmt utility
Database install and config
Multiple network config
Package installation
MUCH MORE …
Clusters are Different
Adding new servers does require coordination
Newly added servers must:
•  Have the same software stack as the original servers
•  Have the same configuration as the original servers
•  Know about the original servers
And, original servers must:
•  Know about the new servers
Result: The management complexity added to the Operations staff is “exponential”
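One way to make the coordination cost concrete is to count the "knows about" relationships listed above. The sketch below assumes every server must hold an entry for every other server (a full mesh); that assumption and the function names are illustrative and not from the talk. Under it the count grows quadratically, which is not literally exponential but still far outpaces the server count.

```python
def known_peer_entries(num_servers: int) -> int:
    """Count directed 'server A knows about server B' entries in a full mesh."""
    return num_servers * (num_servers - 1)


def entries_added_by(new_servers: int, existing_servers: int) -> int:
    """Entries that must be created when new servers join an existing cluster."""
    total_after = known_peer_entries(existing_servers + new_servers)
    return total_after - known_peer_entries(existing_servers)


if __name__ == "__main__":
    for size in (4, 48, 288):
        print(f"{size} servers -> {known_peer_entries(size)} entries")
    # One server joining a 48-node rack: it learns 48 peers, and 48 peers learn it.
    print("added by 1 new server:", entries_added_by(1, 48))
```

Under this assumption, adding a single server to a 48-node rack touches every existing host, which is why hand-run coordination stops scaling.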
Exponential Complexity
[Plot: Management Complexity vs. Number of Servers; curves for General Data Center and Clusters]
The Pain Curve
[Plot: Management Complexity vs. Number of Servers; curves for General Data Center and Clusters, with a region labeled PAIN]
The Pain Threshold
The pain threshold differs for every organization
Function of:
•  cluster(s) size
•  number of people in Operations
•  Operations staff cluster expertise
Moore’s Law
[Plot: Density vs. Time (Years, 0 to 5); density doubles every 18 months]
Moore’s Law and Infrastructure Value
What it Means for You
[Plot: Value (%) vs. Time (Years, 0 to 5); 90% value at 3 months, 50% value at 18 months]
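The two points on the value curve are consistent with a simple halving model. As a sketch, assume hardware value halves every 18 months, mirroring the 18-month doubling of density; the model and the function name are assumptions for illustration, not something the slide states.

```python
def value_remaining(months_idle: float, halving_period_months: float = 18.0) -> float:
    """Fraction of original value left after the hardware sits idle.

    Assumes value halves once per halving period (an illustrative model).
    """
    return 0.5 ** (months_idle / halving_period_months)


if __name__ == "__main__":
    for months in (0, 3, 18):
        print(f"{months:>2} months idle: {value_remaining(months):.0%} of value left")
```

With these assumptions, 3 idle months leaves roughly 89% of the value, close to the 90% on the slide, and 18 months leaves 50%.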
Time is Money
The clock starts ticking when hosts land on your loading dock
Without your applications online, you have a paperweight that consumes power, cooling, and management’s attention
How We Solve the Problem
History
• San Diego Supercomputer Center
•  1986 - National Science Foundation
•  Along with NCSA, one of only two non-classified centers
•  Mission: serve computational scientists
• Rocks
•  2000 - First cluster group inside SDSC
•  Version 1.0 released that November as open source
•  10k+ clusters world-wide
• StackIQ
•  2006 - Commercial support for Rocks
•  2011 - Venture Backed
•  Focus on next generation clustered systems (Data, Cloud)
• Stacki - 2015
•  June – released as open source
•  July – first hyper-scale user
Philosophy
•  Make it – Automatic
◦  Think about it, test it, deploy it.
◦  People don’t scale, software does. Free your people: let ops staff do both ops and analysis, and move them from a single-machine view to a global-machine view.
•  Make it – Repeatable
◦  The state of the environment is guaranteed. This does not require homogeneous hardware or functionality: make compute environments homogeneous on heterogeneous hardware and software.
◦  Really, nothing is homogeneous. The environment may be, but its behavior, while predictable, will not be the same across all hardware. Stacki gets you flexibility and predictability.
•  Make it – Reliable
◦  You always get what you want when you want it. You can make reasonable estimates of need because you’ve made the environment predictable and repeatable. Just like science!
•  Make it – Comprehensive
◦  Manage application layer(s) down to kernels and device configuration with one tool. Never hit the network unconfigured.
◦  Provide turn-key deployment with reasonable default settings and the ability to customize / re-wire as desired.
Stacki Positioning
DevOps / Configuration Tool
DHCP / DNS / TFTP
Network
Disk
OS
In-house developed deployment tools
- Disk Array Controller Configuration
- Disk Partitioning Configuration
Download and Boot the ISO
Go to www.stacki.com and download the ISO
◦  It’s 1.8 GB
◦  “stacki” pallet plus stripped down CentOS 6.6

Boot the ISO on the host that will be your frontend
Frontend Services
Services to build backend nodes
◦  DHCP
◦  TFTP
◦  Named (optional)
Services to access backend nodes
◦  SSH key management
◦  Parallel execution shell
Host Configuration Spreadsheet
[Diagram: one Frontend and four Backend hosts, each attached to the Network through interface em1]
Backend Installation
Save your Host Configuration spreadsheet as a CSV

Import CSV on frontend
◦  “stack load hostfile file=hosts.csv”

Tell backend nodes to install on their next PXE boot
◦  “stack set host boot backend action=install”

PXE boot all backend nodes

Done!
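The "save as CSV" step above can also be scripted. The sketch below builds a host file with Python's csv module. The column names (NAME, APPLIANCE, RACK, RANK, IP, MAC, INTERFACE, NETWORK), the host naming scheme, and every address are assumptions for illustration; check the Stacki documentation for the exact header your version expects.

```python
import csv
import io

# Hypothetical column set; the real Stacki spreadsheet header may differ.
COLUMNS = ["NAME", "APPLIANCE", "RACK", "RANK", "IP", "MAC", "INTERFACE", "NETWORK"]


def backend_rows(count, rack=0, subnet="10.1.1", first_host=10):
    """Build one row per backend host, using placeholder IPs and MACs."""
    rows = []
    for rank in range(count):
        rows.append({
            "NAME": f"backend-{rack}-{rank}",
            "APPLIANCE": "backend",
            "RACK": rack,
            "RANK": rank,
            "IP": f"{subnet}.{first_host + rank}",
            "MAC": f"52:54:00:00:{rack:02x}:{rank:02x}",  # placeholder, not a real NIC
            "INTERFACE": "em1",
            "NETWORK": "private",  # assumed network name
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Redirect this output to hosts.csv, then import it on the frontend.
    print(to_csv(backend_rows(4)), end="")
```

Redirecting the output to hosts.csv gives a file that can be passed to the "stack load hostfile file=hosts.csv" command quoted above, once the real MAC addresses are filled in.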
BitTorrent-Inspired Package Installation
Stacki
Customizing Your Hosts
Advanced Networking
Via Host Configuration spreadsheet, you can configure:
◦  Bonded interfaces
◦  VLANs
◦  Bridging
◦  Any combo of the above
Manage hosts in multiple subnets
◦  Build a single cluster from hosts in multiple subnets
◦  Manage hosts in multiple datacenters
Host Configuration Spreadsheet
Disk Controller Configuration Spreadsheet
Disk Partition Configuration Spreadsheet
Multiple Distributions
A frontend houses a default distribution
◦  Based on stripped down CentOS 6.6 or 7.1
◦  Used to build backend nodes

Can add any number of new distributions to a frontend
◦  E.g., RHEL 6.x based distro, CentOS 6.5, etc.
Assign any backend node to any distro
PayPal
Hadoop @ PayPal
Data Science Infrastructure Footprint
•  3,000 nodes and growing
•  60+ initial server racks
•  Heterogeneous HW across multiple DCs
DATACENTERS
•  Bay Area
•  Salt Lake City
•  Las Vegas
Node configurations
•  12 x 2TB SATA data drives; 48 nodes each rack; 1GBE-10GBE NICs
•  24 x 900GB 6G SAS 10K data drives; 24 nodes each rack; 10GBE NIC
•  8 x 4TB NR-SAS data drives; 48 nodes each rack; 10GBE NIC
Automation Challenge
Spinout creates some datacenter automation challenges …
•  Smaller team but even more to do
•  Rethink automation
•  Distributed systems have tons of local drives which require time-consuming disk formatting and partitioning, and hardware RAID config on master nodes
•  New provisioning solution needs to easily, flexibly integrate w/ other commercial, open source, and homegrown management tools
•  Can 100s or 1000s of nodes be (re)provisioned as quickly as one or a few? (e.g., drive failures mean replacing entire host from O/S to disk to network to firmware to … etc)
Stacki @ PayPal
Ambari HDP
Health Detection
Integration
IPMI/iLO
OS
Disk
Network
DHCP / DNS / TFTP
Ansible
- Disk Array Controller Configuration
- Disk Partitioning Configuration
“Stacki + Ansible = Happiness. :D” – Stacki mailing list 8/11/15
Quick, Early Success
14 Minutes* To Fully Provision 6 Racks of Bare Metal (288 Servers)
Includes wiping all disks then fully partitioning & formatting ~3500 drives
And Now…
•  Upgrades all firmware automatically
•  Executes Ansible scripts on all hosts
•  Hadoop packages installed
* Versus hours with other hyperscale management tools, or days to weeks with traditional tools and processes
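The headline figures can be cross-checked against the rack layout given earlier (48 nodes each rack, 12 data drives per node). The sketch below is derived arithmetic only, not a measurement from the talk, and the variable names are illustrative.

```python
RACKS = 6
NODES_PER_RACK = 48          # from the 12 x 2TB SATA rack layout
DATA_DRIVES_PER_NODE = 12
PROVISION_MINUTES = 14

servers = RACKS * NODES_PER_RACK
data_drives = servers * DATA_DRIVES_PER_NODE
# Average throughput; hosts install in parallel, so this is not a per-host install time.
seconds_per_server = PROVISION_MINUTES * 60 / servers

print(f"servers: {servers}")
print(f"data drives: {data_drives}")
print(f"average seconds per server: {seconds_per_server:.1f}")
```

This gives 288 servers and 3,456 data drives, in line with the "~3500 drives" on the slide, at an average of about 2.9 seconds per server.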
Try It Out
stacki.com
Download - www.stacki.com
Source & Docs - github.com/StackIQ/stacki/wiki
Discuss - groups.google.com/forum/#!forum/stacki
PayPal’s Options
Bring what we used at former parent company eBay with us. (Not Possible)
Build our own soup-to-nuts bespoke bare metal provisioning tool. (Not Optimal)
Find the perfect open source tool that we can use and grow with. (Not Likely)
Quick, Early Success
2 Weeks Instead of 2 Years To Build a Scale-out Management Solution
PayPal development with Stacki includes:
1.  Installed Stacki Frontend (base management server) and ran test installations of backend servers
    1.  Single Server test
    2.  Full Rack test (48 nodes)
2.  Updated distribution (CentOS 6.6) to install additional packages
3.  Integrated IPMI information into Stacki
    1.  Can now ssh into all IPMI consoles from the Stacki frontend host using <hostname>.ipmi
4.  Re-ran with PayPal kickstart changes/additions and was able to image 6 racks in 14 minutes, including:
    1.  Nuking disks/partitions and running a full format of all data drives
5.  Updated the Stacki post-boot piece to do the following:
    1.  Upgrade firmware if host needs it
    2.  Run PayPal Ansible playbook, which:
        1.  Installs additional packages
        2.  Creates user accounts
        3.  Disables unused services
        4.  Sets up resolver/ntp/syslog-ng/sudoers/limits.d/sysctl/etc.
        5.  Installs/configures Ambari agents
        6.  Checks data drive mounts, fstab
        7.  Prepares the rack to be added to a Hadoop cluster
DevOps Agnostic
DevOps / Configuration Tool
DHCP / DNS / TFTP
Network
Disk
OS
In-house developed deployment tools
- Disk Array Controller Configuration
- Disk Partitioning Configuration
The “Step 0” Problem
App Config
Site Config
HW Install
System Performance
Validation
Bare Metal Installers
Hadoop Mgmt Tool
Upgrades/Patching
Disk Configuration
Monitoring Tool
Configuration Tool
Network/Site Config Tools
Systems Mgmt Tool
Others …
MANUAL
SEMI-AUTOMATED TOOLCHAIN (w/o StackIQ)
FULLY AUTOMATED w/StackIQ
StackIQ Boss
Configuration Database
•  Server appliance types (e.g. data, namenode, tomcat, …)
•  Number of CPUs
•  Disk partitioning
•  Hardware RAID config
•  PCI bus information
•  …
•  And other System Attributes
Attributes
•  Global
◦  stack set attr
•  Appliance
◦  stack set appliance attr
•  OS
◦  stack set os attr
•  Host
◦  stack set host attr
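The four scopes suggest a layered lookup. The sketch below assumes the most specific scope wins (host over appliance over OS over global). That precedence order, the attribute names, and the resolve_attrs helper are all assumptions for illustration, not confirmed Stacki behavior.

```python
from collections import ChainMap


def resolve_attrs(global_attrs, os_attrs, appliance_attrs, host_attrs):
    """Merge attribute scopes; earlier mappings in the ChainMap take priority.

    Assumed precedence (hypothetical): host > appliance > OS > global.
    """
    return dict(ChainMap(host_attrs, appliance_attrs, os_attrs, global_attrs))


if __name__ == "__main__":
    resolved = resolve_attrs(
        global_attrs={"timezone": "UTC", "wipe_disks": "false"},  # made-up attribute names
        os_attrs={"package_tool": "yum"},
        appliance_attrs={"wipe_disks": "true"},
        host_attrs={"timezone": "America/Los_Angeles"},
    )
    for name in sorted(resolved):
        print(f"{name} = {resolved[name]}")
```

The appeal of a layered scheme like this is that a site-wide default is set once and only the exceptions are recorded per appliance or per host.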
Kickstart Profiles
Zoom In
Starting from the Empty Set
  { }
{ os }
© 2009 UC Regents
{ os, core }
{ os, core, kernel }
{ os, core, kernel, mapr }
Manage the Deltas
{os, core, kernel, mapr} {os, core, kernel, horton}
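Treating each stack as a set makes the delta explicit. This sketch applies Python set operations to the two sets shown above; the variable names are illustrative.

```python
mapr_stack = {"os", "core", "kernel", "mapr"}
horton_stack = {"os", "core", "kernel", "horton"}

shared = mapr_stack & horton_stack       # components common to both stacks
only_mapr = mapr_stack - horton_stack    # what the first stack adds
only_horton = horton_stack - mapr_stack  # what the second stack adds

print("shared:", sorted(shared))
print("only in first:", sorted(only_mapr))
print("only in second:", sorted(only_horton))
```

Only one element differs between the two stacks, so only that element needs to be managed separately.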
stacki.com
 @masonkatz

Weitere ähnliche Inhalte

Was ist angesagt?

Implementing Secure Docker Environments At Scale by Ben Bernstein, Twistlock
Implementing Secure Docker Environments At Scale by Ben Bernstein, TwistlockImplementing Secure Docker Environments At Scale by Ben Bernstein, Twistlock
Implementing Secure Docker Environments At Scale by Ben Bernstein, Twistlock
Docker, Inc.
 

Was ist angesagt? (20)

Take an Analytics-driven Approach to Container Performance with Splunk for Co...
Take an Analytics-driven Approach to Container Performance with Splunk for Co...Take an Analytics-driven Approach to Container Performance with Splunk for Co...
Take an Analytics-driven Approach to Container Performance with Splunk for Co...
 
DockerCon EU 2015: Docker Universal Control Plane (Gordon's Special Session)
DockerCon EU 2015: Docker Universal Control Plane (Gordon's Special Session)DockerCon EU 2015: Docker Universal Control Plane (Gordon's Special Session)
DockerCon EU 2015: Docker Universal Control Plane (Gordon's Special Session)
 
DockerCon EU 2015: What's New with Docker Trusted Registry
DockerCon EU 2015: What's New with Docker Trusted RegistryDockerCon EU 2015: What's New with Docker Trusted Registry
DockerCon EU 2015: What's New with Docker Trusted Registry
 
Nginx conference 2015
Nginx conference 2015Nginx conference 2015
Nginx conference 2015
 
Moving Legacy Applications to Docker by Josh Ellithorpe, Apcera
Moving Legacy Applications to Docker by Josh Ellithorpe, Apcera Moving Legacy Applications to Docker by Josh Ellithorpe, Apcera
Moving Legacy Applications to Docker by Josh Ellithorpe, Apcera
 
Docker for Ops - Scott Coulton, Puppet
Docker for Ops - Scott Coulton, PuppetDocker for Ops - Scott Coulton, Puppet
Docker for Ops - Scott Coulton, Puppet
 
Taking Docker from Local to Production at Intuit JanJaap Lahpor, Intuit and H...
Taking Docker from Local to Production at Intuit JanJaap Lahpor, Intuit and H...Taking Docker from Local to Production at Intuit JanJaap Lahpor, Intuit and H...
Taking Docker from Local to Production at Intuit JanJaap Lahpor, Intuit and H...
 
DockerCon EU 2015: It's in the game: the path to micro-services at Electronic...
DockerCon EU 2015: It's in the game: the path to micro-services at Electronic...DockerCon EU 2015: It's in the game: the path to micro-services at Electronic...
DockerCon EU 2015: It's in the game: the path to micro-services at Electronic...
 
Windows container security
Windows container securityWindows container security
Windows container security
 
DockerCon EU 2015: Deploying and Managing Containers for Developers
DockerCon EU 2015: Deploying and Managing Containers for DevelopersDockerCon EU 2015: Deploying and Managing Containers for Developers
DockerCon EU 2015: Deploying and Managing Containers for Developers
 
A vision of persistence
A vision of persistenceA vision of persistence
A vision of persistence
 
Automated hardware testing using docker for space
Automated hardware testing using docker for spaceAutomated hardware testing using docker for space
Automated hardware testing using docker for space
 
Experiences with AWS immutable deploys and job processing
Experiences with AWS immutable deploys and job processingExperiences with AWS immutable deploys and job processing
Experiences with AWS immutable deploys and job processing
 
DockerCon EU 2015: Cultural Revolution - How to Mange the Change Docker Brings
DockerCon EU 2015: Cultural Revolution - How to Mange the Change Docker BringsDockerCon EU 2015: Cultural Revolution - How to Mange the Change Docker Brings
DockerCon EU 2015: Cultural Revolution - How to Mange the Change Docker Brings
 
The Next Generation Cloud: Unleashing the Power of the Unikernal
The Next Generation Cloud: Unleashing the Power of the UnikernalThe Next Generation Cloud: Unleashing the Power of the Unikernal
The Next Generation Cloud: Unleashing the Power of the Unikernal
 
Docker Enterprise Edition: Building a Secure Supply Chain for the Enterprise ...
Docker Enterprise Edition: Building a Secure Supply Chain for the Enterprise ...Docker Enterprise Edition: Building a Secure Supply Chain for the Enterprise ...
Docker Enterprise Edition: Building a Secure Supply Chain for the Enterprise ...
 
Oded Coster - Stack Overflow behind the scenes - how it's made - Codemotion M...
Oded Coster - Stack Overflow behind the scenes - how it's made - Codemotion M...Oded Coster - Stack Overflow behind the scenes - how it's made - Codemotion M...
Oded Coster - Stack Overflow behind the scenes - how it's made - Codemotion M...
 
DCSF 19 Data Center Networking with Containers
DCSF 19 Data Center Networking with ContainersDCSF 19 Data Center Networking with Containers
DCSF 19 Data Center Networking with Containers
 
Implementing Secure Docker Environments At Scale by Ben Bernstein, Twistlock
Implementing Secure Docker Environments At Scale by Ben Bernstein, TwistlockImplementing Secure Docker Environments At Scale by Ben Bernstein, Twistlock
Implementing Secure Docker Environments At Scale by Ben Bernstein, Twistlock
 
DCSF19 Deploying Istio as an Ingress Controller
DCSF19 Deploying Istio as an Ingress Controller DCSF19 Deploying Istio as an Ingress Controller
DCSF19 Deploying Istio as an Ingress Controller
 

Andere mochten auch

Pengantar Bisnis - Materi perkuliahan oleh ARS
Pengantar Bisnis - Materi perkuliahan oleh ARSPengantar Bisnis - Materi perkuliahan oleh ARS
Pengantar Bisnis - Materi perkuliahan oleh ARS
Andry R Sukma
 

Andere mochten auch (12)

Alan's Report2
Alan's Report2Alan's Report2
Alan's Report2
 
Climate
ClimateClimate
Climate
 
ІННОВАЦІЙНІ ПЕРЕТВОРЕННЯ ВИЩОЇ ОСВІТИ В УКРАЇНІ
ІННОВАЦІЙНІ ПЕРЕТВОРЕННЯ ВИЩОЇ ОСВІТИ В УКРАЇНІІННОВАЦІЙНІ ПЕРЕТВОРЕННЯ ВИЩОЇ ОСВІТИ В УКРАЇНІ
ІННОВАЦІЙНІ ПЕРЕТВОРЕННЯ ВИЩОЇ ОСВІТИ В УКРАЇНІ
 
Performance Profiling Tools & Tricks
Performance Profiling Tools & TricksPerformance Profiling Tools & Tricks
Performance Profiling Tools & Tricks
 
Factores de riesgo ergonomicos
Factores de riesgo ergonomicosFactores de riesgo ergonomicos
Factores de riesgo ergonomicos
 
CUNY Commons Tour
CUNY Commons TourCUNY Commons Tour
CUNY Commons Tour
 
Continuos integration with Jenkins for iOS | SuperSpeakers@CodeCamp Iasi, 2014
Continuos integration with Jenkins for iOS | SuperSpeakers@CodeCamp Iasi, 2014Continuos integration with Jenkins for iOS | SuperSpeakers@CodeCamp Iasi, 2014
Continuos integration with Jenkins for iOS | SuperSpeakers@CodeCamp Iasi, 2014
 
Security From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveSecurity From The Big Data and Analytics Perspective
Security From The Big Data and Analytics Perspective
 
How Companies can Effectively Work with Open Source Communities
How Companies can Effectively Work with Open Source CommunitiesHow Companies can Effectively Work with Open Source Communities
How Companies can Effectively Work with Open Source Communities
 
行政院會簡報:經濟部水利署:2017水情分析及因應作為
行政院會簡報:經濟部水利署:2017水情分析及因應作為行政院會簡報:經濟部水利署:2017水情分析及因應作為
行政院會簡報:經濟部水利署:2017水情分析及因應作為
 
Kashmir Conflict
Kashmir  ConflictKashmir  Conflict
Kashmir Conflict
 
Pengantar Bisnis - Materi perkuliahan oleh ARS
Pengantar Bisnis - Materi perkuliahan oleh ARSPengantar Bisnis - Materi perkuliahan oleh ARS
Pengantar Bisnis - Materi perkuliahan oleh ARS
 

Ähnlich wie Provisioning Servers Made Easy

Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
DataWorks Summit
 

Ähnlich wie Provisioning Servers Made Easy (20)

Introduction to Stacki - World's fastest Linux server provisioning Tool
Introduction to Stacki - World's fastest Linux server provisioning ToolIntroduction to Stacki - World's fastest Linux server provisioning Tool
Introduction to Stacki - World's fastest Linux server provisioning Tool
 
Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016
 
Stacki at the Seattle Scalability Meetup
Stacki at the Seattle Scalability MeetupStacki at the Seattle Scalability Meetup
Stacki at the Seattle Scalability Meetup
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
DR_PRESENT 1
DR_PRESENT 1DR_PRESENT 1
DR_PRESENT 1
 
TryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and AdminsTryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and Admins
 
StackiFest16: How PayPal got a 300 Nodes up in 14 minutes - Greg Bruno
StackiFest16: How PayPal got a 300 Nodes up in 14 minutes - Greg BrunoStackiFest16: How PayPal got a 300 Nodes up in 14 minutes - Greg Bruno
StackiFest16: How PayPal got a 300 Nodes up in 14 minutes - Greg Bruno
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Why OpenStack on UCS? An Introduction to Red Hat and Cisco OpenStack Solution
Why OpenStack on UCS? An Introduction to Red Hat and Cisco OpenStack SolutionWhy OpenStack on UCS? An Introduction to Red Hat and Cisco OpenStack Solution
Why OpenStack on UCS? An Introduction to Red Hat and Cisco OpenStack Solution
 
Public vs. Private Cloud Performance by Flex
Public vs. Private Cloud Performance by FlexPublic vs. Private Cloud Performance by Flex
Public vs. Private Cloud Performance by Flex
 
Java ee7 with apache spark for the world's largest credit card core systems, ...
Java ee7 with apache spark for the world's largest credit card core systems, ...Java ee7 with apache spark for the world's largest credit card core systems, ...
Java ee7 with apache spark for the world's largest credit card core systems, ...
 
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red HatAccelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
Accelerate Your OpenStack Deployment Presented by SolidFire and Red Hat
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Containers for grownups migrating traditional &amp; existing applications[1...
Containers for grownups   migrating traditional &amp; existing applications[1...Containers for grownups   migrating traditional &amp; existing applications[1...
Containers for grownups migrating traditional &amp; existing applications[1...
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and docker
 
Critical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency DatabaseCritical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency Database
 

Mehr von All Things Open

Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
All Things Open
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
All Things Open
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
All Things Open
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
All Things Open
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
All Things Open
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
All Things Open
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
All Things Open
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
All Things Open
 

Mehr von All Things Open (20)

Building Reliability - The Realities of Observability
Building Reliability - The Realities of ObservabilityBuilding Reliability - The Realities of Observability
Building Reliability - The Realities of Observability
 
Modern Database Best Practices
Modern Database Best PracticesModern Database Best Practices
Modern Database Best Practices
 
Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
 
The State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil NashThe State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil Nash
 
Total ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScriptTotal ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScript
 
What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 
DEI Challenges and Success
DEI Challenges and SuccessDEI Challenges and Success
DEI Challenges and Success
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
 
Supercharging tutorials with WebAssembly
Supercharging tutorials with WebAssemblySupercharging tutorials with WebAssembly
Supercharging tutorials with WebAssembly
 
Using SQL to Find Needles in Haystacks
Using SQL to Find Needles in HaystacksUsing SQL to Find Needles in Haystacks
Using SQL to Find Needles in Haystacks
 
Configuration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit InterceptConfiguration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit Intercept
 
Scaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship ProgramScaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship Program
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
 
Deploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache BeamDeploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache Beam
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 


Provisioning Servers Made Easy

  • 1. Provisioning Linux Servers Made Easy Greg Bruno, PhD VP Engineering, StackIQ
  • 4. Open Source Stack Installer: Stacki is a very fast and ultra-reliable Linux server provisioning tool … at scale, with zero prerequisites for taking systems from bare metal to a ping and a prompt.
  • 5. Why is this hard and important?
  • 6. Datacenter Architecture (diagram: one frontend and four backend nodes, each connected to the network through its em1 interface)
  • 7. Datacenter Host Software Stack (diagram layers: DevOps / Configuration Tool; DHCP / DNS / TFTP; Network, Disk, OS; In-house developed deployment tools: Disk Array Controller Configuration, Disk Partitioning Configuration)
  • 8. The “Step 0” Problem: Check namenodes are empty; Format/start HDFS; Create all directories; Create all metastores; Start services (Hbase, Hive, Oozie, Sqoop, Impala, etc); Deploy client configuration; Configure database; Setup/assign monitors (activity, services, and host); Test database connections; Validate/resolve hostnames; Consistent host timezones; No bad kernel versions running; (CDH) version consistency; Java version consistency; Daemons versions consistency; Mgmt Agents versions consistency; Host specification/SSH ports; MUCH MORE … DHCP Server/Client setup; TFTP/PXE configuration; Server OS installation; Node OS Install; RAID configuration; Boot configuration; System/data disk partitioning; Monitoring system setup and config; Lights Out/IPMI setup; User accounts added and synced; SSH keys on all hosts; Network node configuration; Config Mgmt install and configuration; Route configuration; OS upgrades/updates; Site specific software and configuration; Host specification/SSH ports; Security; Firewall setup; Cluster Mgmt utility; Database install and config; Multiple network config; Package installation; MUCH MORE …
  • 9. Clusters are Different: Adding new servers requires coordination. Newly added servers must: have the same software stack as the original servers; have the same configuration as the original servers; know about the original servers. And the original servers must know about the new servers. Result: the management complexity added to the Operations staff is “exponential”.
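One rough way to picture the coordination burden on this slide is to count "knows about" relationships between servers. The sketch below is only an illustrative model, and its function names are made up. It assumes every server must know about every other server, which gives quadratic growth; the slide's "exponential" is in quotes and is not a claim this model reproduces.

```python
def knows_about_links(n_servers):
    """Directed 'server A knows about server B' relationships among n servers."""
    return n_servers * (n_servers - 1)


def new_links_when_adding(existing, added=1):
    """Extra relationships created when `added` servers join `existing` ones."""
    return knows_about_links(existing + added) - knows_about_links(existing)


# Cluster sizes are examples: a four-node lab, one 48-node rack, six racks.
for n in (4, 48, 288):
    print(f"{n} servers: {knows_about_links(n)} links, "
          f"{new_links_when_adding(n)} more when one server is added")
```

Under this model, adding one server to an existing group of n creates 2n new relationships, so each addition costs more than the last.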
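  • 10. Exponential Complexity (chart: management complexity vs. number of servers, with curves for a general data center and for clusters)
  • 11. The Pain Curve (chart: management complexity vs. number of servers, with curves for a general data center and for clusters; the PAIN region is marked)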
  • 12. The Pain Threshold The pain threshold differs for every organization Function of: •  cluster(s) size •  number of people in Operations •  Operations staff cluster expertise
  • 13. Moore’s Law (chart: density vs. time in years; 18-month doubling)
  • 14. Moore’s Law and Infrastructure Value
  • 15. What it Means for You (chart: value in % vs. time in years) 3 months: 90% value. 18 months: 50% value.
  • 16. Time is Money: The clock starts ticking when hosts land on your loading dock. Without your applications online, you have a paperweight that consumes power, cooling, and management’s attention.
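The two data points on this slide are consistent with a simple halving model. A minimal sketch, assuming infrastructure value falls by half every 18 months, in step with the doubling period on the Moore's Law slide. The function name and the exponential form are illustrative assumptions, not taken from the slides.

```python
def remaining_value(months, halving_period_months=18.0):
    """Fraction of original value left after `months`, assuming it halves every period."""
    return 0.5 ** (months / halving_period_months)


for months in (3, 18):
    print(f"{months:>2} months: {remaining_value(months):.0%} of original value")
```

With these assumptions, three months of delay leaves roughly 89% of the value, close to the 90% on the slide, and eighteen months leaves 50%.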
  • 17. How We Solve the Problem
  • 18. History. San Diego Supercomputer Center: 1986, National Science Foundation; along with NCSA, only two non-classified centers; mission: serve computational scientists. Rocks: 2000, first cluster group inside SDSC; version 1.0 released that November as open source; 10k+ clusters world-wide. StackIQ: 2006, commercial support for Rocks; 2011, venture backed; focus on next generation clustered systems (Data, Cloud). Stacki (2015): June, released as open source; July, first hyper-scale user.
  • 19. Philosophy. Make it Automatic: Think about it, test it, deploy it. People don’t scale, software does. Free your people: allow ops guys to be ops/analysis guys, and move them from a single-machine view to a global machine view. Make it Repeatable: The state of the environment is guaranteed. This does not require homogeneity of hardware or functionality; make compute environments homogeneous on heterogeneous hardware and software. Really, nothing is homogeneous: the environment may be, but its behavior on different machines, while predictable, will not be the same across all hardware. Stacki gets you flexibility and predictability. Make it Reliable: You always get what you want when you want it. You can make reasonable estimates of need because you’ve made the environment predictable and repeatable. Just like science! Make it Comprehensive: Manage application layer(s) down to kernels and device configuration with one tool. Never hit the network unconfigured. Provide turn-key deployment with reasonable default settings and the ability to customize / re-wire as desired.
  • 20. Stacki Positioning (diagram layers: DevOps / Configuration Tool; DHCP / DNS / TFTP; Network, Disk, OS; In-house developed deployment tools: Disk Array Controller Configuration, Disk Partitioning Configuration)
  • 21. Datacenter Architecture (diagram: one frontend and four backend nodes, each connected to the network through its em1 interface)
  • 22. Download and Boot the ISO: Go to www.stacki.com and download the ISO. It’s 1.8 GB and contains the “stacki” pallet plus a stripped-down CentOS 6.6. Boot the ISO on the host that will be your frontend.
  • 29. Frontend Services. Services to build backend nodes: DHCP, TFTP, Named (optional). Services to access backend nodes: SSH key management, parallel execution shell.
  • 31. Backend Installation (diagram: one frontend and four backend nodes, each connected to the network through em1). Save your Host Configuration spreadsheet as a CSV. Import the CSV on the frontend: “stack load hostfile file=hosts.csv”. Tell backend nodes to install on their next PXE boot: “stack set host boot backend action=install”. PXE boot all backend nodes. Done!
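The spreadsheet-to-CSV step above can also be scripted. The sketch below generates a small hosts.csv-style file and then prints the two `stack` commands quoted on the slide. The column names, host names, IP addresses and MAC addresses are all hypothetical placeholders; the real hostfile header should be taken from the Stacki documentation, and the script only prints the commands rather than running them.

```python
import csv
import io

# Hypothetical column names; check the Stacki documentation for the real header row.
COLUMNS = ["name", "appliance", "rack", "rank", "ip", "mac", "interface"]


def build_hostfile(hosts):
    """Render a list of host dictionaries as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(hosts)
    return buf.getvalue()


# Four made-up backend nodes; the MACs come from the range reserved for documentation.
hosts = [
    {"name": f"backend-0-{rank}", "appliance": "backend", "rack": 0, "rank": rank,
     "ip": f"10.1.1.{10 + rank}", "mac": f"00:00:5e:00:53:{rank:02x}",
     "interface": "em1"}
    for rank in range(4)
]

hostfile_text = build_hostfile(hosts)
print(hostfile_text)

# Commands quoted on the slide, to be run on the frontend after saving the CSV.
FRONTEND_COMMANDS = [
    "stack load hostfile file=hosts.csv",
    "stack set host boot backend action=install",
]
for command in FRONTEND_COMMANDS:
    print(command)
```

Generating the file from code rather than by hand keeps rack and rank numbering consistent as the host list grows.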
  • 34. Advanced Networking. Via the Host Configuration spreadsheet, you can configure: bonded interfaces, VLANs, bridging, and any combination of the above. Manage hosts in multiple subnets: build a single cluster from hosts in multiple subnets; manage hosts in multiple datacenters.
  • 38. Multiple Distributions. A frontend houses a default distribution: based on a stripped-down CentOS 6.6 or 7.1; used to build backend nodes. You can add any number of new distributions to a frontend, e.g., a RHEL 6.x based distro, CentOS 6.5, etc. Assign any backend node to any distro.
  • 40. Hadoop @ PayPal. Hardware configurations: 12 x 2TB SATA data drives, 48 nodes each rack, 1GBE-10GBE NICs; 24 x 900GB 6G SAS 10K data drives, 24 nodes each rack, 10GBE NIC; 8 x 4TB NR-SAS data drives, 10GBE NIC. Datacenters: Bay Area, Salt Lake City, Las Vegas. Data Science Infrastructure Footprint: 3,000 nodes and growing; 60+ initial server racks; heterogeneous HW across multiple DCs; 48 nodes each rack.
  • 41. Automation Challenge. Spinout creates some datacenter automation challenges … Smaller team but even more to do. Rethink automation. Distributed systems have tons of local drives, which require time-consuming disk formatting and partitioning, and hardware RAID config on master nodes. New provisioning solution needs to integrate easily and flexibly with other commercial, open source, and homegrown management tools. Can 100s or 1000s of nodes be (re)provisioned as quickly as one or a few? (e.g., drive failures mean replacing the entire host from O/S to disk to network to firmware to … etc)
  • 42. Stacki @ PayPal (diagram layers: Ambari; HDP; Health Detection Integration; IPMI/iLO, OS, Disk, Network; DHCP / DNS / TFTP; Ansible; Disk Array Controller Configuration; Disk Partitioning Configuration). “Stacki + Ansible = Happiness. :D” (Stacki mailing list, 8/11/15)
  • 43. Quick, Early Success: 14 Minutes* To Fully Provision 6 Racks of Bare Metal (288 Servers). Includes wiping all disks, then fully partitioning & formatting ~3500 drives. And now: upgrades all firmware automatically; executes Ansible scripts on all hosts; Hadoop packages installed. * Versus hours with other hyperscale management tools, or days to weeks with traditional tools and processes.
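The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes these six racks used the 48-node, 12-data-drive configuration listed on the Hadoop @ PayPal slide; the slides do not say which hardware configuration was tested, so treat the per-node drive count as an assumption.

```python
racks = 6
nodes_per_rack = 48
data_drives_per_node = 12  # assumption: the 12 x 2TB SATA configuration
minutes = 14

servers = racks * nodes_per_rack          # total hosts provisioned
drives = servers * data_drives_per_node   # data drives wiped and formatted
servers_per_minute = servers / minutes    # average throughput over the whole run

print(f"{servers} servers, {drives} data drives, "
      f"about {servers_per_minute:.1f} servers per minute")
```

Under that assumption the server count matches the slide's 288 exactly, and the drive count (3456) is in line with the quoted "~3500 drives".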
  • 45. stacki.com Download - www.stacki.com Source & Docs - github.com/StackIQ/stacki/wiki Discuss - groups.google.com/forum/#!forum/stacki
  • 46. PayPal’s Options. Bring what we used at former parent company eBay with us: Not Possible. Build our own soup-to-nuts bespoke bare metal provisioning tool: Not Optimal. Find the perfect open source tool that we can use and grow with: Not Likely.
  • 47. Quick, Early Success: 2 Weeks Instead of 2 Years To Build a Scale-out Management Solution. PayPal development with Stacki includes: (1) Installed the Stacki frontend (base management server) and ran test installations of backend servers: a single-server test, then a full-rack test (48 nodes). (2) Updated the distribution (CentOS 6.6) to install additional packages. (3) Integrated IPMI information into Stacki; can now ssh into all IPMI consoles from the Stacki frontend host using <hostname>.ipmi. (4) Re-ran with PayPal kickstart changes/additions and was able to image 6 racks in 14 minutes, including nuking disks/partitions and running a full format of all data drives. (5) Updated the Stacki post-boot piece to upgrade firmware if the host needs it and to run the PayPal Ansible playbook, which: installs additional packages; creates user accounts; disables unused services; sets up resolver/ntp/syslog-ng/sudoers/limits.d/sysctl/etc.; installs/configures Ambari agents; checks data drive mounts and fstab; prepares the rack to be added to a Hadoop cluster.
  • 48. DevOps Agnostic (diagram layers: DevOps / Configuration Tool; DHCP / DNS / TFTP; Network, Disk, OS; In-house developed deployment tools: Disk Array Controller Configuration, Disk Partitioning Configuration)
  • 49. The “Step 0” Problem: Check namenodes are empty; Format/start HDFS; Create all directories; Create all metastores; Start services (Hbase, Hive, Oozie, Sqoop, Impala, etc); Deploy client configuration; Configure database; Setup/assign monitors (activity, services, and host); Test database connections; Validate/resolve hostnames; Consistent host timezones; No bad kernel versions running; (CDH) version consistency; Java version consistency; Daemons versions consistency; Mgmt Agents versions consistency; Host specification/SSH ports; MUCH MORE … DHCP Server/Client setup; TFTP/PXE configuration; Server OS installation; Node OS Install; RAID configuration; Boot configuration; System/data disk partitioning; Monitoring system setup and config; Lights Out/IPMI setup; User accounts added and synced; SSH keys on all hosts; Network node configuration; Config Mgmt install and configuration; Route configuration; OS upgrades/updates; Site specific software and configuration; Host specification/SSH ports; Security; Firewall setup; Cluster Mgmt utility; Database install and config; Multiple network config; Package installation; MUCH MORE … Categories: App Config; Site Config; HW Install; System Performance Validation. Tools: Bare Metal Installers; Hadoop Mgmt Tool; Upgrades/Patching; Disk Configuration; Monitoring Tool; Configuration Tool; Network/Site Config Tools; Systems Mgmt Tool; Others … MANUAL SEMI-AUTOMATED TOOLCHAIN (w/o StackIQ); w/StackIQ: FULLY AUTOMATED
  • 51. Configuration Database: server appliance types (e.g. data, namenode, tomcat, …); number of CPUs; disk partitioning; hardware RAID config; PCI bus information; …; and other System Attributes
  • 52. Attributes. Global: stack set attr. Appliance: stack set appliance attr. OS: stack set os attr. Host: stack set host attr.
  • 55. Starting from the Empty Set: { }
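The four attribute scopes on this slide suggest a layered lookup. The sketch below shows one way such layering could resolve a host's effective attributes. The precedence order (host over appliance over OS over global) is an assumption made for illustration, and every attribute name and value here is made up; none of it is taken from Stacki itself.

```python
from collections import ChainMap

# All attribute names and values below are hypothetical examples.
global_attrs = {"timezone": "UTC", "partitioning": "default"}
os_attrs = {"redhat": {"kernel_args": "quiet"}}
appliance_attrs = {"namenode": {"partitioning": "raid1"}}
host_attrs = {"backend-0-0": {"timezone": "America/Los_Angeles"}}


def resolve_attrs(host, appliance, os_name):
    """Merge the four scopes; maps listed earlier in the ChainMap win on lookup.

    Assumed precedence: host > appliance > OS > global.
    """
    return dict(ChainMap(
        host_attrs.get(host, {}),
        appliance_attrs.get(appliance, {}),
        os_attrs.get(os_name, {}),
        global_attrs,
    ))


print(resolve_attrs("backend-0-0", "namenode", "redhat"))
```

A layered lookup like this lets a site set defaults once at the global scope and override them only where a particular appliance type or host differs.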
  • 56. { os } © 2009 UC Regents
  • 57. { os, core }
  • 58. { os, core, kernel }
  • 59. { os, core, kernel, mapr }
  • 60. Manage the Deltas {os, core, kernel, mapr} {os, core, kernel, horton}
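The "manage the deltas" idea can be shown directly with set operations on the two stacks from the slide. This is a minimal sketch; the variable names are illustrative.

```python
# The two stacks exactly as listed on the slide.
mapr_stack = {"os", "core", "kernel", "mapr"}
horton_stack = {"os", "core", "kernel", "horton"}

shared = mapr_stack & horton_stack   # components common to both stacks
delta = mapr_stack ^ horton_stack    # components that differ between them

print("shared:", sorted(shared))
print("delta:", sorted(delta))
```

Only the delta (mapr vs. horton) needs separate handling; the shared base of os, core and kernel is built once and reused.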