This crash course is designed to give an overview of cloud computing architecture and the open source software that can be used to deploy and manage a cloud computing environment.
Topics to be discussed in this session will include virtualization (KVM, LXC, and Xen Project), orchestration (Apache CloudStack, Eucalyptus, OpenNebula, and OpenStack), and storage (GlusterFS, Ceph, and others). The talk will also provide insight into how to deliver Platform-as-a-Service (PaaS) and what technologies can be used to complement this evolving cloud computing paradigm.
Systems administrators and IT generalists will leave the discussion with a general overview of the options at their disposal to effectively build and manage their own cloud computing environments using free and open source software and understand the capabilities and benefits of a host of technologies.
[Updated with new Docker projects]
Fossetcon: Crash Course on Open Source Cloud Computing
1. Crash Course In Open Source
Cloud Computing
Mark Hinkle
Senior Director, Open Source Solutions
Citrix Inc.
mark.hinkle@citrix.com
mrhinkle@gmail.com
@mrhinkle
Last updated: 9/11/2014
2. By Mark R. Hinkle
@mrhinkle
mrhinkle@gmail.com
ABOUT ME
I Help Build Open Source Ecosystems
Open Source Experience
• Manage Citrix Open Source Business Office
• Apache CloudStack Committer and PMC Member
• Advisory boards Gluster and Xen Project
• Joined Citrix via Cloud.com acquisition July 2011
• Zenoss Core open source project to 100,000 users,
1.5 million downloads
• Former LinuxWorld Magazine Editor-in-Chief
• Open Management Consortium organizer
• Author - “Windows to Linux Business Desktop
Migration” – Thomson
• NetDirector Project - Open Source Configuration
Management
FOSSETCON 2014 - Crash Course in Open Source Cloud Computing
3. Slides Available on Slideshare:
http://www.slideshare.net/socializedsoftware
Creative Commons Attribution-ShareAlike 4.0 International
Share: copy and redistribute the material in any medium or format
Adapt: remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
4. AGENDA
• Vetting Open Source Cloud Projects
• “What is Cloud” in 60 Seconds
• Virtualization
• Infrastructure-as-a-Service
• Platform-as-a-Service
• SDN
• Open Source for the Amazon Web Services
5. VETTING OPEN SOURCE PROJECTS
How can you tell if they’re Legit
• Code Velocity
• Committers
• Committer Reputation
• User-driven or Vendor-Driven Innovation
• User Activity
• Corporate Support*
• Reputation of Foundation*
7. 60 SECOND CLOUD DEFINITION
Just because Software Marketing Guys Think it’s the Internet
5 CHARACTERISTICS OF CLOUD
1. On-Demand Self-Service
2. Broad Network Access
3. Resource Pooling
4. Rapid Elasticity
5. Measured Service
User Cloud a.k.a. SOFTWARE-AS-A-SERVICE
Developer Cloud a.k.a. PLATFORM-AS-A-SERVICE
Systems Cloud a.k.a. INFRASTRUCTURE-AS-A-SERVICE
8. SCALE-UP, SCALE-OUT
Elasticity and the cloud
Vertical Scaling (Scale-Up)
Allocate additional resources to VMs; requires a reboot, no need for distributed app logic, single point of OS failure
Horizontal Scaling (Scale-Out)
Application needs logic to work in a distributed fashion (e.g. HAProxy and Apache Hadoop)
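To make the scale-out idea concrete, here is a minimal sketch of the kind of request-distribution logic a load balancer provides. It is not how HAProxy is implemented; the class name, the backend names (web-1, web-2, web-3) and the round-robin policy are all assumptions chosen for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Spread requests evenly across a pool of identical backends."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._ring = cycle(self._backends)

    def add_backend(self, backend):
        # Scaling out: a new node joins the pool and the ring is rebuilt.
        self._backends.append(backend)
        self._ring = cycle(self._backends)

    def route(self, request):
        # Each call hands the request to the next backend in turn.
        return next(self._ring), request


# Hypothetical backend names; any hashable identifiers would do.
balancer = RoundRobinBalancer(["web-1", "web-2"])
first_four = [balancer.route(f"req-{i}")[0] for i in range(4)]

balancer.add_backend("web-3")
after_scale_out = [balancer.route(f"req-{i}")[0] for i in range(3)]
```

Adding capacity here means adding another entry to the pool rather than rebooting a larger VM, which is the trade-off the slide contrasts with scale-up.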
9. HYPERVISORS AND CONTAINERS
Differences in virtualization
Type 1 Hypervisors
VMware, Xen Project, Hyper-V
Type 2 Hypervisors
KVM, VirtualBox
Containers
LXC
10. VIRTUALIZATION
Carving up compute resources
OPEN SOURCE
• Xen Project
• Citrix XenServer
• KVM
• VirtualBox
• OpenVZ
• LXC
PROPRIETARY
• VMware
• Microsoft Hyper-V
• OracleVM (Based on Xen Project)
11. OPEN VIRTUALIZATION FORMATS
Virtualization Payloads
Formats for hypervisors/cloud technologies:
• Amazon – AMI
• KVM – QCOW2
• VMware – VMDK
• Xen Project – IMG
• Hyper-V – VHD (Virtual Hard Disk)
• LXC – local file system/mount point – Docker*
Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or, more generally, software to be run in virtual machines.
12. LINUX CONTAINERS (LXC)
“Lightweight” Linux Virtualization
• Lets you run a Linux system within another Linux system
• A container is a group of processes on a Linux box, put together to provide an isolated environment
• From the inside, it looks like a VM
• Externally it looks like normal processes
• “chroot on steroids”
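One way to see that a container is "just processes" from the outside is to look at the namespace identifiers the kernel exposes for every process. The sketch below assumes a Linux host with /proc mounted; the helper names are illustrative, and on other systems the functions simply return empty results.

```python
import os


def namespace_ids(pid="self"):
    """Return {namespace name: kernel identifier} for a process.

    Reads the symlinks under /proc/<pid>/ns, which exist only on Linux.
    If the directory is missing or unreadable, an empty dict is returned.
    """
    ns_dir = f"/proc/{pid}/ns"
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return {}
    ids = {}
    for name in names:
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries that cannot be read
    return ids


def shares_all_namespaces(pid_a, pid_b):
    """True when both processes report identical, non-empty namespace ids."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return bool(a) and a == b


own = namespace_ids()
```

A process running inside a container shows up in the host's process list like any other, but its entries here would typically differ from those of a host process in one or more namespaces (for example pid, mnt or net).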
13. THE PORTABILITY PROBLEM
Containers compared to Hardware Virtualization
• Different file formats for virtual machines
• VMware uses vmdk file format, Xen and Hyper-V use VHD, KVM uses Raw or QCOW2
• Guest images may be “processor architecture” bound
• VMware and Xen can manage SCSI devices, but KVM cannot
• KVM and Xen can use virtio drivers but not VMware
• VMware uses a proprietary agent inside the guest OS (VMware tools) which does not work with Xen or KVM
• Xen uses VirtIo and ParaVirtualized drivers
14. CONTINUOUS INTEGRATION
Rebuild Applications on any Cloud and/or Virtualized Infrastructure
• Code – Application is stored in a repository (Subversion, Git)
• Build – Code is built (Jenkins)
• Test – Unit tests are automated (Jenkins)
• Deploy – Deploy code to servers in various ways
Code → Build → Test → Deploy
Thoughtworks Go – Open Source Continuous Delivery System
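The Code, Build, Test, Deploy sequence can be sketched as an ordered list of stages where a failure stops everything downstream. This is only an illustration of the flow, not how Jenkins or Go are implemented; every function below is a hypothetical stand-in for a real tool, and the repository URL is a placeholder.

```python
def run_pipeline(stages, artifact):
    """Run stages in order; stop at the first one that raises."""
    log = []
    for name, step in stages:
        try:
            artifact = step(artifact)
        except Exception as exc:
            log.append((name, f"failed: {exc}"))
            return False, log
        log.append((name, "ok"))
    return True, log


# Hypothetical stand-ins for a repository checkout, a build job,
# a unit-test run and a deployment script.
def checkout(repo):
    return {"repo": repo, "source": "print('hello')"}


def build(state):
    compile(state["source"], "<app>", "exec")  # syntax check as a fake build
    return {**state, "built": True}


def run_tests(state):
    if not state.get("built"):
        raise RuntimeError("nothing to test")
    return {**state, "tested": True}


def deploy(state):
    return {**state, "deployed_to": "staging"}


def broken_build(state):
    raise RuntimeError("compile error")


STAGES = [("Code", checkout), ("Build", build),
          ("Test", run_tests), ("Deploy", deploy)]
ok, log = run_pipeline(STAGES, "git://example.invalid/app.git")

# A failing build means Test and Deploy never run.
failed_ok, failed_log = run_pipeline(
    [("Code", checkout), ("Build", broken_build), ("Test", run_tests)],
    "git://example.invalid/app.git",
)
```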
15. PACKER MULTIPLATFORM VM CREATION
Open source Automation for VMs
Packer is easy to use and automates the creation of any type of machine image. It embraces modern configuration management by encouraging you to use automated scripts to install and configure the software within your Packer-made images.
To learn more please visit:
www.packer.io
16. DOCKER CONTAINER PACKAGING
Open source LXC Packaging Engine
Docker is an open-source project to easily
create lightweight, portable, self-sufficient
containers from any application. The same
container that a developer builds and tests
on a laptop can run at scale, in production,
on VMs, bare metal, public clouds and
more.
To learn more please visit:
www.docker.io
17. WHAT IS DOCKER
System for Managing and Deploying LXC Containers
• Complement to LXC, not a replacement
• Manages daemonized processes on Linux using LXC
• Creates the ability to re-use and manage similar applications
• Content agnostic
• Hardware agnostic
• Easy to automate
• Integrated with other tools: Chef, OpenShift, Puppet, VMware, etc.
18. DOCKER’S GROWING
ECOSYSTEM
19. KUBERNETES
Container Cluster Management – Scheduler
Kubernetes builds on top of Docker to construct a clustered container scheduling service. Kubernetes enables users to ask a cluster to run a set of containers. The system will automatically pick worker nodes to run those containers on, which we think of more as "scheduling" than "orchestration".
Greek for Shipmaster
To learn more please visit:
https://github.com/GoogleCloudPlatform/kubernetes
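The "pick worker nodes automatically" idea can be illustrated with a toy placement function. This is not Kubernetes' actual scheduling algorithm; the Node class, the "most free CPU that fits" rule, and the node and container names are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: float
    free_mem_mb: int
    containers: list = field(default_factory=list)


def schedule(containers, nodes):
    """Place each (name, cpu, mem_mb) request on the fitting node with most free CPU."""
    placement = {}
    for name, cpu, mem in containers:
        candidates = [n for n in nodes
                      if n.free_cpu >= cpu and n.free_mem_mb >= mem]
        if not candidates:
            placement[name] = None  # left pending: no node has room
            continue
        target = max(candidates, key=lambda n: (n.free_cpu, n.free_mem_mb))
        # Reserve the resources so later requests see the reduced capacity.
        target.free_cpu -= cpu
        target.free_mem_mb -= mem
        target.containers.append(name)
        placement[name] = target.name
    return placement


# Hypothetical cluster and workload.
nodes = [Node("worker-1", 2.0, 4096), Node("worker-2", 4.0, 8192)]
requests = [("web", 1.0, 512), ("db", 2.0, 4096), ("cache", 3.0, 1024)]
placement = schedule(requests, nodes)
```

The user only states what should run and how much it needs; which machine it lands on is the scheduler's decision, and a request that fits nowhere stays unplaced.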
20. DOCKER RELATED
PROJECTS
• Fig – Fast, isolated development environments
• Flynn - Next-generation application platform
• Panamax – Drag-and-Drop Docker Containerization
• Project Atomic – JEOS designed to run Docker
containers
• Weave – The Docker Network
• 13,000+ Docker-related repos on Github
21. APACHE MESOS
One to many tools for managing large numbers of devices
Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers. Largely supported by Twitter; used by LinkedIn and Airbnb too.
Features
• Fault-tolerant replicated master using ZooKeeper
• Scalability to 10,000s of nodes
• Isolation between tasks with Linux Containers
• Multi-resource scheduling (memory and CPU aware)
• Java, Python and C++ APIs for developing new parallel applications
• Web UI for viewing cluster state
To learn more please visit:
http://mesos.apache.org/
22. APACHE ZOOKEEPER
Centralized Server to Service Distributed Apps
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
To learn more please visit:
http://zookeeper.apache.org/
23. INFRASTRUCTURE-AS-A-SERVICE
Compute Orchestration
Project | Year Started | License | Virtualization Technologies
Apache CloudStack | 2008 | Apache | (Bare Metal), XenServer, KVM, LXC, VMware, Hyper-V
Eucalyptus | 2006 | GPL | Xen, KVM, VMware (commercial version)
OpenNebula | 2005 | Apache | Xen, KVM, VMware
OpenStack | 2010 (previously developed at NASA by Anso Labs) | Apache | VMware ESX and ESXi, Xen, XenServer, KVM, LXC, QEMU and VirtualBox
24. OPENSTACK
The Boy Band of the Open Source Cloud
25. OPENSTACK SHARED SERVICES
Span Compute, Storage and Networking
• IDENTITY SERVICE
• IMAGE SERVICE
• TELEMETRY SERVICE
• ORCHESTRATION SERVICE
26. EVEN MORE OPENSTACK PROJECTS
Span Compute, Storage and Networking
• Trove
Database Service
• Ironic
Bare Metal (Ironic)
• Marconi
Queue Service
• Cinder
Block Storage Service
• Ceilometer
Metering/Monitoring
• Heat
Orchestration
27. OPENSTACK SOLUTION PROVIDERS
If you can’t do it yourself
“OpenStack is not a product. If you are building a large infrastructure, it’s
more like a tool kit. It gives you a lot of technologies that do take a lot of
effort to integrate.”
Chris Kemp, OpenStack Board Member and Co-Founder
CEO of Piston Computing
28. CLOUD APIS
Everything (should) have an API in the Cloud
• Deltacloud (Ruby)
• Dasein (Java)
• jclouds (Java)
• Libcloud (Python)
• Fog (Ruby)
29. CLOUD STORAGE
Virtualized, Distributed usually on Commodity Hardware
Project | Description
Ceph | Distributed file storage system developed by DreamHost -> InkTank -> Red Hat (block, object, file)
GlusterFS | Scale Out NAS system aggregating storage over Ethernet or Infiniband (file)
OpenStack Storage | Long-term object storage system (object)
Riak CS | Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases. (object)
Sheepdog | Distributed storage for KVM hypervisors, distributed iSCSI
30. PLATFORM-AS-A-SERVICE
Abstracted Cloud-Scale Run-Time Environments
Project | Sponsors | Languages/Frameworks
CloudFoundry | VMware -> Pivotal -> CloudFoundry Foundation | Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
Cloudify | Gigaspaces | [Groovy for deployment recipes]
OpenShift Origin | Red Hat | Java, Ruby, PHP, Perl and Python
Apache Stratos | WSO2 -> Apache Stratos | PHP, Tomcat, MySQL “cartridges”
31. SOFTWARE DEFINED NETWORKING (SDN)
Virtualization meets the network
Decoupling of the control and data planes of the network to improve efficiency. Communication from an SDN controller via a protocol to network devices both physical and virtual.
Automation: Abstractions allow for programmable networks.
Dynamic Networks: Network can be changed quickly via a controller.
Security: Network offerings can match virtualization offerings for finer grained security in a highly volatile compute landscape.
Heterogeneous Management: Single control point for various devices.
32. SDN OVERVIEW
[Diagram: three layers. Application Layer: Business Applications. Control Layer: SDN Control Software with Network Services, reached from the Application Layer through APIs. Infrastructure Layer: Network Devices, reached from the Control Layer through the Control Data Plane Interface (e.g. OpenFlow).]
33. BENEFITS OF SDN
Network Virtualization is the final frontier of Software Defined Datacenter
• Dynamically update networks
• Automate network
functionality
• “Program” security into the
network
• Centrally apply policies to
network and services
• Optimize networks
34. OPENFLOW
Virtualization meets the network
OpenFlow enables networks to evolve, by giving a remote controller the power to modify the behavior of network devices, through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
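A rough sketch of what "a remote controller modifying the behavior of network devices" means in practice: switches hold a table of match/action rules, and a controller installs those rules centrally. This is not the OpenFlow wire protocol or any real controller's API; the class names, match fields (src_ip, dst_port) and action strings are assumptions for illustration.

```python
class Switch:
    """A data-plane device: it only matches packets against installed rules."""

    def __init__(self, name):
        self.name = name
        self.flow_table = []  # list of (priority, match dict, action)

    def install(self, priority, match, action):
        self.flow_table.append((priority, match, action))
        # Highest priority first, so the first matching rule wins.
        self.flow_table.sort(key=lambda rule: rule[0], reverse=True)

    def forward(self, packet):
        for _, match, action in self.flow_table:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return "send_to_controller"  # table miss


class Controller:
    """The control plane: one place that programs every switch."""

    def __init__(self, switches):
        self.switches = switches

    def push_policy(self, priority, match, action):
        for switch in self.switches:
            switch.install(priority, match, action)


edge, core = Switch("edge"), Switch("core")
controller = Controller([edge, core])
controller.push_policy(10, {"dst_port": 80}, "output:1")
controller.push_policy(100, {"src_ip": "10.0.0.66"}, "drop")

web = edge.forward({"src_ip": "10.0.0.5", "dst_port": 80})
blocked = core.forward({"src_ip": "10.0.0.66", "dst_port": 80})
unknown = edge.forward({"src_ip": "10.0.0.5", "dst_port": 22})
```

The security rule is written once at the controller and takes effect on every switch, which is the "centrally apply policies" benefit in miniature.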
35. OPEN SOURCE SDN
Software Defined Network Controllers and more
Project | Description
Floodlight | The Floodlight Open SDN Controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller. It is supported by a community of developers including a number of engineers from Big Switch Networks. See http://www.projectfloodlight.org/floodlight/
Indigo | Indigo is an open source project aimed at enabling support for OpenFlow on physical and hypervisor switches. Big Switch has helped numerous companies OpenFlow-enable their equipment, and provides firmware for a number of popular switches. Indigo is the basis of Switch Light by Big Switch Networks. See http://www.projectfloodlight.org/indigo/
Lincx | LINCX is a pure OpenFlow software switch written in Erlang. It runs within a separate domain under Xen hypervisor using LING (erlangonxen.org).
Nox | NOX is the original OpenFlow controller, and facilitates development of fast C++ controllers on Linux.
Open Daylight | Linux Foundation Collaborative Project based on Cisco One Controller and plugins from numerous vendors in development, e.g. IBM DOVE.
Open vSwitch | Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
36. OPEN VSWITCH
Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
To learn more please visit our website:
http://openvswitch.org/
37. CONFIGURATION MANAGEMENT TOOLS
Tools with features for configuring cloud infrastructure
Project | Year Started | Language | License | Client/Server
Chef | 2009 | Ruby | Apache | Chef Solo – No; Chef Server – Yes
CFengine | 1993 | C | Apache | Yes
Puppet | 2004 | Ruby | GPL | Yes & standalone
Salt | 2011 | Python | Apache | Yes
Hitchhiker’s Guide to the Open Cloud by @mrhinkle
38. CLOUD AUTOMATION TOOLS
One to many tools for managing large numbers of devices
Project | Description
Ansible | Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. (Originally authored Func)
Capistrano | Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
RunDeck | Rundeck is an open-source process automation and command orchestration tool with a web console.
Func | Func provides a two-way authenticated system for generically executing tasks, integrations with puppet and cobbler.
MCollective | The Marionette Collective AKA MCollective is a framework to build server orchestration or parallel job execution systems.
Salt | Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
Scalr | Provide scaling across multiple cloud computing platforms, integrates with Chef.
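The "one to many" pattern these tools share, running the same task against many machines in parallel and collecting per-host results, can be sketched with the standard library. The sketch does not use SSH or any of the tools listed; fake_uptime is a hypothetical stand-in for a remote command and the host names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_hosts(hosts, task, max_workers=8):
    """Run task(host) for every host in parallel; collect results or errors."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(task, host) for host in hosts}
        for host, future in futures.items():
            try:
                results[host] = ("ok", future.result())
            except Exception as exc:
                # One unreachable host must not abort the whole run.
                results[host] = ("error", str(exc))
    return results


def fake_uptime(host):
    # Hypothetical stand-in for a remote call; a real tool would connect to the host.
    if host.startswith("down-"):
        raise ConnectionError(f"{host} unreachable")
    return f"{host}: up"


report = run_on_hosts(["web-1", "web-2", "down-db-1"], fake_uptime)
```

Collecting failures per host instead of raising on the first one mirrors how such tools typically report a run: a summary of which machines succeeded and which did not.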
39. NETFLIX AWS TOOLBAG
Tools developed by a super Amazon Web Services Power User
EUREKA, PRIAM, SIMIAN ARMY, ASGARD, ASTYANAX, EDDA
http://netflix.github.com
40. CONTACT ME
Happy to Chat about Open Source, Cloud or Pittsburgh Sports
Professional: mark.hinkle@citrix.com
Personal: mrhinkle@gmail.com
Phone: 919.228.8049
Professional: http://open.citrix.com
Personal: http://www.socializedsoftware.com
Twitter: @mrhinkle
41. APPENDIX A
Additional Links to related stuff
42. ADDITIONAL LINKS
• Devops Toolchains Group
• Software Defined Networking: The New Norm for Networks
(Whitepaper)
• DevOps Wikipedia Page
• NoSQL-Database.org – Ultimate Guide to the Non-Relational Universe
• Open Cloud Initiative
• NIST Cloud Computing Platform
• Open Virtualization Format Specs
• Clouderati Twitter Account
• Planet DevOps
• Nicira Whitepaper – It’s Time to Virtualize the Network
• Why Open vSwitch FAQ
• Stanford Seminar - Software-Defined Networking at the Crossroads
43. ADDITIONAL LINKS (CONT’D)
• SDN, NFV, and open source: The Operator’s View
• Puppet Labs: Build a Toolbox for Continuous Delivery
44. APPENDIX B
Stuff I’d have liked to talk about but didn’t have time
45. SOURCING CLOUD APPLIANCES
Packaging Engines for VMs
Tool/Project | What you can do with them
Bitnami | BitNami provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, Wordpress, PHP, Rails, Django and many more.
Boxgrinder | BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and Cloud providers.
Oz | Command-line tool that has the ability to create images for common Linux distributions to run on KVM.
SUSE Studio | SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
46. CLOUD MONITORING TOOLS
Tools with features for monitoring cloud infrastructure
Project | Type of Monitoring | Collection Methods
Cacti / RRDTool | Performance | SNMP, syslog
Graphite | Performance | Agent
Nagios | Availability | SNMP, TCP, ICMP, IPMI, syslog
Sensu | Availability | Agent
Zabbix | Availability/Performance and more | SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss | Availability, Performance, Event Management | SNMP, ICMP, SSH, syslog, WMI
47. CLOUD PROVISIONING TOOLS
Packaging Engines for VMs
Project | Installation Targets
Apache Provisionr (incubating) | Can provision 10s to 1000s of machines on various clouds.
Cobbler | Distributed virtual infrastructure using koan (kickstart over a network to PXE boot VMs) for Red Hat, OpenSUSE, Fedora, Debian, Ubuntu VMs
Crowbar | (Bare metal provisioning)
JuJu | Public Clouds: Amazon Web Services, HP Cloud; Private OpenStack clouds; Bare Metal via MAAS.
Salt Cloud | Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
48. BIG DATA
49. NOSQL DATABASES
Horizontally scalable unstructured data retrieval
Name | Type | Description
Apache Cassandra | Wide Column Store/Families | API: many, Query Method: MapReduce, Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
CouchDB | Document Store | API: Memcached API+protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
HBase | Wide Column Store/Families | API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java
Hypertable | Wide Column Store/Families | API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable.
MongoDB | Document Store | API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++
Redis | Key Value / Tuple Store | API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronously to disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues.
Riak | Key Value / Tuple Store | API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters, Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
50. MAP REDUCE
Algorithm for Parallelized Data Set Processing
[Diagram: Problem Data enters a Master Node, which distributes the work across Worker Node 1, Worker Node 2 and Worker Node 3 (Map); the workers' results are then combined (Reduce) into the Solution Data.]
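A word count is the customary small example of the map and reduce steps. The sketch below runs on one machine, with a thread pool standing in for the worker nodes; it illustrates the algorithm's shape and is not Hadoop's API. The function names and sample input are assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key so each key is reduced once."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=3):
    # The pool plays the role of the worker nodes; this function is the master.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, chunks))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["open source cloud", "cloud computing", "Open cloud"])
```

Because each chunk is mapped independently, adding workers (or machines) speeds up the map step without changing the program, which is what makes the pattern a natural fit for scale-out infrastructure.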
51. APACHE HADOOP
Apache Project for Parallelized Data Set Processing
Overview and Features
• Handles large amounts of data
• Stores data in native format
• Delivers linear scalability at low cost
• Resilient in case of infrastructure failures
• Transparent application scalability
52. APACHE HADOOP ECOSYSTEM
Hadoop Common
• HDFS – Distributes & replicates data across machines
• MapReduce – Distributes & monitors tasks. Parallel programming; handles large data blocks.
Non-Relational DB
• HBase – Column-oriented schema-less distributed DB modeled after Google’s BigTable. Random real time read/write.
Scripting
• Pig – Platform for manipulating and analyzing large data sets. Scripting language for analysts.
Machine Learning
• Mahout – Machine learning libraries for recommendations, clustering, classifications and item sets.
Other projects
• Hive – Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured data.
• Chukwa
• ZooKeeper