IT Infrastructure Through the Public Network: Challenges & Solutions
$ whoami
  Martin Jackson – Uncommon Sense Consulting
  Working in the IT field since 1993
  Linux and virtualization consultant specialising in automated build and deployment of virtual infrastructures
  Infrastructure as Code hacker
  DevOps advocate
  Keen judoka
  @actionjack on Twitter
  martin@uncommonsense-uk.com
$ cat /infrastructure/info Source: http://en.wikipedia.org/wiki/Cloud_computing
$ whatis iaas
  Outsourced hardware
  Outsourced operating system
  Outsourced network
  Self managed
  Typically available in minutes
  Pay per play
#1  Challenge: Security
$ info security How do you protect your data in an infrastructure that you do not own or control?
$ cat security/access
  Protect your API keys and use complex passwords
    Cyber-Ark Enterprise Vault
    ManageEngine Password Manager Pro
    KeePass
    APG and GPG
$ cat security/access
  Keep your systems patched (religiously)
    Yum
    Red Hat Network
    Microsoft Update Network
    Shavlik NetChk Protect
    Apt
$ cat security/access
  Limit access to least privilege
    Only create accounts for those who “need” them
    Create separate accounts per device
    Do not allow direct access via privileged user accounts, e.g. Administrator or root
    Use audited privilege elevation, e.g. sudo, rootsh, sudosh, runas, shellrunas
    Only use encrypted login mechanisms, e.g. SSH, SSL certificates
$ cat security/access
  Aggregate and monitor all login attempts
    Splunk
    Logstash
    Graylog2
    GFI EventsManager
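As a rough illustration of what "monitor all login attempts" means before a tool such as Splunk or Logstash takes over, the sketch below counts failed SSH logins per source address. The log lines are made-up samples, and the pattern assumes the usual OpenSSH "Failed password" wording, which can vary between versions and distributions.

```python
import re
from collections import Counter

# Assumed message format: the common OpenSSH "Failed password" syslog line.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+)"
)


def count_failed_logins(lines):
    """Return a Counter of failed SSH login attempts keyed by source IP."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Hypothetical sample lines; real input would come from the aggregated logs.
sample = [
    "Aug  8 10:01:02 web1 sshd[311]: Failed password for root from 203.0.113.9 port 4242 ssh2",
    "Aug  8 10:01:05 web1 sshd[311]: Failed password for invalid user admin from 203.0.113.9 port 4243 ssh2",
    "Aug  8 10:02:00 web1 sshd[320]: Accepted publickey for deploy from 198.51.100.7 port 5151 ssh2",
]
failed = count_failed_logins(sample)
```

A log aggregator does the same thing across every host at once, which is the point of the slide: one place to see who is knocking on which door.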
$ cat security/data
  Encrypt your sensitive data before you place it into the cloud
    PGP, GPG
  Keep it encrypted while in the cloud
    TrueCrypt, LUKS
  Ensure encryption is maintained if data needs to be transmitted elsewhere
    SCP, SSL, VPN, SSH
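One possible shape for "encrypt before you place it into the cloud" is to wrap GPG in a small helper so nothing is uploaded in plain text. This is only a sketch: the file name and recipient are placeholders, and it assumes `gpg` is installed and the recipient's public key is already in the local keyring.

```python
import subprocess


def gpg_encrypt_command(source_path, recipient):
    """Build the gpg argv that encrypts a file to a recipient's public key."""
    return [
        "gpg", "--encrypt",
        "--recipient", recipient,
        "--output", source_path + ".gpg",
        source_path,
    ]


def encrypt_before_upload(source_path, recipient):
    """Encrypt locally and return the path of the file that may be uploaded."""
    # check=True raises CalledProcessError if gpg exits with an error.
    subprocess.run(gpg_encrypt_command(source_path, recipient), check=True)
    return source_path + ".gpg"


# Placeholder values for illustration only.
command = gpg_encrypt_command("db-backup.sql", "ops@example.com")
```

Only the `.gpg` file would then be handed to whatever upload tool is in use; the plain-text original never leaves the machine.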
$ cat security/network
  If you need secure intra-IaaS communication
    SSL Auth
    CohesiveFT’s VPN-Cubed
    OpenVPN
    Amazon Virtual Private Cloud
#2  Challenge: Outages
$ whatis outage
  Unplanned unavailability of a service
  "...in the cloud, you control your SLA..."
    George Reese, founder, enStratus Networks LLC
$ whatis outage “large-scale, essentially self-managed and commoditised infrastructure-as-a-service (IaaS) has price benefits but, if things go wrong, they do so in a big way” Dr Aydin Kurt-Elli, Lumison
$ whatis outage
  Vendor: Terremark
  Outage Date: March 17, 2010
  Outage Duration: 7 hours
  Reason for Outage: Terremark's vCloud Express services suffered an outage after a bout of connectivity loss in its Miami data center. The outage resulted in intermittent periods of connectivity with high data packet loss starting at 11:54 a.m. eastern and lasting more than seven hours, ending at 7:05 p.m. eastern time. According to Apparent Networks' Cloud Performance Center, during the outage access to systems in Terremark's Miami data center was severely degraded and often unavailable, affecting many businesses using Terremark's vCloud Express services.
  Severity: Medium
  http://www.crn.com/slide-shows/applications-os/225701829/10-biggest-cloud-outages-of-2010-so-far.htm;jsessionid=o+AywGYF+Mv5w3ZoWChIbQ**.ecappj01?pgno=5
$ whatis outage
  Vendor: Rackspace
  Outage Date: 2011-02-01
  Outage Duration: 30 minutes
  Reason for Outage: DNS issue causes MySQL server outage. An unspecified DNS issue prevented users from connecting to MySQL and making external API calls. Rackspace resolved the issue and advised their users to refresh their browsers to view the site properly.
  Severity: Low
  http://outagecenter.com/rackspace-cloud-reports/cloud-sites-dfw1-wc2-degraded-2/
$ whatis outage
  Vendor: Rackspace
  Outage Date: April 28, 2011
  Outage Duration: 6 hours
  Reason for Outage: At approximately 4:00 PM (CDT) customers began to experience connectivity issues related to Domain Name System (DNS) on Jungle Disk/Cloud Drive. The issue was identified to be an error with hostname translations on a single DNS server. This server was returning erroneous DNS information. An emergency maintenance to change the DNS configuration was performed in order to mitigate the issue.
  Severity: Medium
$ whatis outage
  Vendor: Amazon Web Services
  Outage Date: April 21, 2011
  Outage Duration: Unknown
  Reason for Outage: Amazon began reporting trouble on its Service Health Dashboard about 5 a.m. Eastern today. At 5:16 a.m., the site reported connectivity issues that were affecting its Relational Database Service, which is used to manage a relational database in the cloud, across multiple zones in the eastern U.S. A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. The re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Amazon also reported problems with its EC2, or Elastic Compute Cloud, a service that provides pay-as-you-go compute capacity in the cloud. The company also reported issues with its EBS, or Elastic Block Store, which is storage related to the EC2 service.
  Severity: High
  http://www.computerworld.com/s/article/9216064/Amazon_gets_black_eye_from_cloud_outage
$ whatis outage
  Vendor: Amazon Web Services
  Outage Date: August 08, 2011
  Outage Duration: 30 minutes
  Reason for Outage: The issue happened in the networks that connect the Availability Zones to the internet. The event began when a southern router inside one of the Availability Zones briefly stopped exchanging route information with all adjacent devices, going into an incommunicative state. Upon re-establishing its health, the router began advertising an unusable route to other southern routers in other Availability Zones, deviating from its configuration and bypassing the standard protocol restriction on how routes are allowed to flow. The bad default internet route was picked up and used by the routers in other Availability Zones. Internet traffic from multiple Availability Zones in US East was immediately not routable out to the internet through the border. The issue was resolved by removing the router from service.
  Severity: Medium
  http://outagecenter.com/category/amazon-web-services-reports/amazon-elastic-compute-cloud-ec2-north-virginia/
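To put the outage durations above in SLA terms, the arithmetic below converts hours of downtime into a monthly availability percentage. It assumes a 30-day month and that each outage is the only downtime in that month; neither assumption comes from the slides.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def availability_percent(outage_hours, period_hours=HOURS_PER_MONTH):
    """Availability over the period, as a percentage."""
    return 100.0 * (1 - outage_hours / period_hours)


# Durations as reported on the outage slides.
outages = {
    "Terremark, March 2010": 7.0,
    "Rackspace, April 2011": 6.0,
    "Amazon Web Services, August 2011": 0.5,
}
for name, hours in outages.items():
    print(f"{name}: {availability_percent(hours):.2f}% for the month")
```

Under these assumptions a single seven-hour outage already drops the month to roughly 99.0%, while a thirty-minute one stays above 99.9%.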
$ whatis outage
  Failure is the new black: expect it and embrace it
  Design for failure and build your infrastructures to be redundant on 5 different levels
    Physical
    Virtual resource
    Availability zone
    Region
    Cloud
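A minimal sketch of what redundancy across zones, regions and clouds buys you: given candidates ordered from nearest to furthest fallback, pick the first one that passes a health check. The host names and the health check are hypothetical stand-ins.

```python
def first_healthy(candidates, is_healthy):
    """Return the first candidate that passes the health check, else None."""
    for candidate in candidates:
        try:
            if is_healthy(candidate):
                return candidate
        except Exception:
            # A health check that blows up counts as an unhealthy candidate.
            continue
    return None


# Hypothetical endpoints, ordered from the closest fallback to the furthest.
endpoints = [
    "app.zone-a.region-1.cloud-x.example",
    "app.zone-b.region-1.cloud-x.example",  # another availability zone
    "app.zone-a.region-2.cloud-x.example",  # another region
    "app.zone-a.region-1.cloud-y.example",  # another cloud
]

# Simulate a whole-region failure: both zones in region-1 of cloud-x are down.
down = {
    "app.zone-a.region-1.cloud-x.example",
    "app.zone-b.region-1.cloud-x.example",
}
chosen = first_healthy(endpoints, lambda host: host not in down)
```

With only the first two entries available, the simulated region failure would leave nothing to fail over to, which is the argument for going beyond a single region.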
#3  Challenge: Standards
$ find standard Cloud standards and Interoperability To be honest they don’t exist yet… http://www.infoq.com/articles/problem-with-cloud-computing-standardization
$ cat standard/api
  Many different clouds…
  Many ways to interact with them…
  All do the same sort of thing…
  Let's abstract them
    Deltacloud
    Libcloud
    jclouds
$ cat standard/api/deltacloud
  http://incubator.apache.org/deltacloud/
  Ruby client

    require 'deltacloud'

    api_url      = 'http://localhost:3001/api'
    api_name     = 'mockuser'
    api_password = 'mockpassword'

    # Connect to the Deltacloud API server with the mock credentials
    client = DeltaCloud.new(api_name, api_password, api_url)
$ cat standard/api/libcloud
  http://libcloud.apache.org/
  Python client

    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    EC2_ACCESS_ID = 'your access id'
    EC2_SECRET_KEY = 'your secret key'

    # Look up the EC2 driver class, then open a connection with your credentials
    Driver = get_driver(Provider.EC2)
    conn = Driver(EC2_ACCESS_ID, EC2_SECRET_KEY)
$ cat standard/api/jclouds
  http://www.jclouds.org/
  Java client

    // Build a compute context for EC2 with Log4J logging and JSch-based SSH
    ComputeServiceContext context =
        new ComputeServiceContextFactory().createContext(
            "aws-ec2", accesskeyid, secretkey,
            ImmutableSet.<Module> of(new Log4JLoggingModule(),
                                     new JschSshClientModule()));
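The three clients above share one idea: callers talk to a common interface and a provider-specific driver does the translation. The sketch below shows that pattern in isolation. Every name in it is hypothetical, and the in-memory driver merely stands in for a real provider; it is not how Deltacloud, Libcloud or jclouds are implemented.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Common interface; each cloud would get its own implementation."""

    @abstractmethod
    def list_nodes(self):
        ...

    @abstractmethod
    def create_node(self, name):
        ...


class InMemoryDriver(ComputeDriver):
    """Stand-in for a real provider driver; keeps its nodes in a list."""

    def __init__(self, provider):
        self.provider = provider
        self._nodes = []

    def list_nodes(self):
        return list(self._nodes)

    def create_node(self, name):
        node = f"{self.provider}:{name}"
        self._nodes.append(node)
        return node


def get_driver(provider):
    """Hypothetical factory; a real one would map names to provider classes."""
    return InMemoryDriver(provider)


# The calling code is identical whichever provider sits behind the driver.
created = [get_driver(provider).create_node("web1")
           for provider in ("cloud-a", "cloud-b")]
```

Swapping providers then means changing the name passed to the factory, not rewriting the provisioning code.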
#4  Challenge: Monitoring and Management
$ service monitor status
  Pay per play monitoring or fixed instance
  On premise or off
  Ramping up and tearing down of instances
  Focus on service monitoring vs host monitoring
  Monitoring tool must have an API
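"Service monitoring vs host monitoring" can be made concrete with a check that asks the service itself rather than the machine it runs on. The sketch below treats any HTTP 2xx answer within the timeout as "up"; the URL in the usage line is a placeholder.

```python
import urllib.error
import urllib.request


def service_is_up(url, timeout=5.0):
    """Return True if the URL answers with an HTTP 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Placeholder URL; point this at the service's own health endpoint.
    print(service_is_up("http://localhost:8080/health", timeout=2.0))
```

Because instances come and go, the check follows the service address, so a replaced host does not need to be re-registered with the monitor.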
$ service monitor status
  Next Generation Cloud Monitoring Services
    Cloudkick - https://www.cloudkick.com
    Pingdom - http://www.pingdom.com
    Watchmouse - http://www.watchmouse.com
    Monitis - http://www.monitis.com
$ service management status
  Provision within minutes – ready in days???
  If it takes 5 minutes to get a virtual machine, how long are you willing to wait to use it?
  Data center automation tools can help
    Puppet
    Chef
    CFEngine
$ cat management/puppet
  http://puppetlabs.com/

    package { 'openssh-server':
      ensure => installed,
    }
$ cat management/chef
  http://www.opscode.com/

    package "openssh-server" do
      action :install
    end
$ cat management/cfengine
  http://cfengine.com/

    control:
      any::
        actionsequence = ( packages )
        DefaultPkgMgr = ( rpm )
        RPMcommand = ( /bin/rpm )
        RPMInstallCommand = ( "/usr/bin/yum -y install %s" )

    packages:
      any::
        openssh-server action=install
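All three tools above declare a desired state ("openssh-server is installed") rather than a command to run. A toy version of that idea, with made-up package sets and no real package manager involved, might look like this:

```python
def plan_package_actions(desired, installed):
    """Return the install actions needed to reach the desired state."""
    return [("install", name) for name in sorted(desired - installed)]


# Hypothetical state: one package is already present, one is missing.
desired = {"openssh-server", "ntp"}
installed = {"openssh-server"}

first_run = plan_package_actions(desired, installed)

# Pretend the planned actions were carried out, then plan again.
installed |= {name for _, name in first_run}
second_run = plan_package_actions(desired, installed)
```

The second run plans nothing because the system already matches the declaration. That idempotence is what makes it safe to apply the same configuration to every freshly provisioned machine.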
#5  Challenge: Governance
$ make governance
  The game has changed and you’ll need to change with it
  Conway's law applies:
  “...organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.”
Challenge: Questions
$ cat links
  http://www.accenture.com/us-en/outlook/Pages/outlook-online-2011-challenges-cloud-computing.aspx
  http://www.infoq.com/articles/problem-with-cloud-computing-standardization
  http://www.computerworld.com/s/article/9217158/Cloud_interoperability_Problems_and_best_practices
  http://www.theaccidentalsuccessfulcio.com/cloud-computing/cio-cloud-computing-101-problems-with-clouds
  http://nylawblog.typepad.com/suigeneris/2009/11/does-cloudcomputing-compromise-clients.html
  http://horicky.blogspot.com/2009/08/multi-tenancy-in-cloud-computing.html
  http://www.cio.com/article/488478/The_Trouble_with_Cloud_Vendor_Lock_in
  http://www.agathongroup.com/blog/2010/04/cloud-computing-and-latency/


Editor's notes

  1. Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software and information are provided to computers and other devices as a utility (like the electricity grid) over a network (typically the Internet). Application or "Software as a Service (SaaS)" delivers software as a service over the Internet, eliminating the need to install and run the application on the customer's own computers and simplifying maintenance and support. Platform as a service (PaaS) is the delivery of a computing platform and solution stack as a service. PaaS offerings may include facilities for application design, application development, testing, deployment and hosting. Infrastructure as a service (IaaS) is the delivery of computer infrastructure – typically a platform virtualization environment – as a service, along with raw (block) storage and networking.
  2. Many are in draft – currently centered around the Amazon EC2 API and ... In September 2007 Dell, HP, IBM, Microsoft, VMware and XenSource submitted to the Distributed Management Task Force (DMTF) a proposal for OVF, then named "Open Virtual Machine Format".
  3. Many are in draft
  4. Many are in draft
  5. Many are in draft
  6. Many are in draft