Netcetera consultants Ronnie Brunner and Jason Brazile present the results of a year-long study of existing and potential uses of cloud computing at the European Space Agency. Some unpublished internal material was removed. Queries can be directed to the contract's Technical Officer at ESA ESRIN.
1. ESA and the Cloud. Jason Brazile and Ronnie Brunner. Final presentation of Study on Cloud Computing, ESRIN/Contract Nr. 22700/09/I-SB, 2011-01-28
5. The Cloud Computing Stack: Platforms as a Service (PaaS), Infrastructure as a Service (IaaS), Software as a Service (SaaS), Cloud Enablers / Cross-platform solutions
13. ESA Cloud Computing success stories ESA and the Cloud Contact ESRIN/Contract Nr. 22700/09/I-SB Technical Officer
14. Portal Edge Caching, Media Distribution
16. G-POD Framework
17. Collaboration Tools
18. Virtual Archive
19. Learnings so far
25. Upcoming projects at ESA
27. Contact: Jason Brazile, [email_address], +41-44-247 79 25; Ronnie Brunner, [email_address], +41-44-247 79 79; Netcetera, Zypressenstrasse 71, CH-8040 Zürich, +41-44-247 70 70
Editor's notes
ESA and the Cloud Quality Software Engineering
ESA and the Cloud
This presentation covers the "Study on Cloud Computing" of the project "Study on Cloud Computing and Management of EOP Content Distribution Network (CDN) Services", ESRIN/Contract No. 22700/09/I-SB. The study was delivered as a document covering the following content:
Part I: Cloud Computing Today
1 A Definition of Cloud Computing
2 Technical Features of Cloud Computing
3 The Economics of Cloud Computing
4 The Cloud Ecosystem
5 Current Limitations of Cloud Computing
6 Market Study
7 Research and Governmental Projects Leveraging Cloud Computing
Part II: Recommendations For The Adoption Of Cloud Computing Services By ESA
8 Introduction
9 Identification Of Candidate ESA IT Activities
10 Key Aspects For Evaluation Of Cloud Computing Providers
11 Methodology For The Economic Evaluation Of Cloud Computing Services
12 Proposal For A Program-level Governance For Cloud Computing Initiatives
"Cloud first"
There is a concerted US government effort to go "Cloud first" in many government agencies: Office of Management and Budget, Department of Interior, US Geological Survey, NASA, and many others.
Main goal
The aim is to reduce the existing thousands of data centers and their associated costs, both in money and otherwise:
- The Department of Agriculture is consolidating 21 different messaging and collaboration systems into one system that will support 120,000 users.
- GSA is contracting with Unisys Corp. to move 17,000 government e-mail accounts to Google Apps, lowering costs by 50 percent and saving $15 million over five years.
- 800 of the government's 2,094 data centers are to be consolidated by 2015.
Big players (Microsoft, Google) see this as significant, enough to sue the US government over its bidding process in some cases.
Background
YouTube changed the game with respect to what counts as an ISP, a CDN, and a content provider. So many network transit providers did not want to end up on the wrong side of the bandwidth equation that they were willing to peer with Google "for free". So although on paper it long appeared that YouTube was operating at a huge loss, the savings in aggregate network transit costs and the peering desirability compensated and allowed Google to catch up with competitors such as Microsoft.
Now Netflix is changing the game with respect to TV and film content. They got in cheaply:
- Originally they were primarily a distributor of DVDs by physical post, and only later got into Internet distribution.
- Hollywood studios sold Internet viewing rights cheaply because it was free money on top of the more substantial DVD distribution licenses.
- The same (cheap) license agreements remained in place even when Internet viewing vastly overtook DVD postal distribution.
This seems to be repeating the YouTube-style event; see Level 3 vs Comcast, "more than a peering spat".
In any case, Netflix's huge and quick growth included the migration of several core business functions (not just the website, data distribution, and mastering/recoding) and put a large amount of trust in Amazon's ability to provide scale and geographic coverage.
Cloud Computing has many definitions
The aspects most often cited are:
- Self-service: No need to wait for anybody to set something up for me
- On-demand: I can get it now, if I need it now
- Pay-as-you-go: It only costs as long as I actually use it
This already implies elasticity: I can get as many resources as I want/need.
Economy of scale
A driving factor is economies of scale, which make it possible to reduce the cost per user to almost nil.
Additional attributes
Cloud Computing is more than just these points, but they catch its essence pretty well. Other important aspects that make Cloud Computing work are: broad network access, resource pooling, measured service, no commitment, pay by credit card, programmatic/"mashup-able", SLA-driven, retail-model discipline.
The Stack
SaaS builds on PaaS, PaaS builds on IaaS, and IaaS builds on virtualisation.
Does one size fit all? No!
- Content Delivery Networks (CDN)
- Storage & backup providers (maybe IaaS?)
- DB providers (Can this be called SaaS? Or is it part of IaaS?)
Deployment models
The Cloud is the Cloud, right? No!
- Public Cloud: The infrastructure is "publicly" available for anyone to use (this does not imply that everybody can read everybody's data)
- Virtual Private Cloud: A provider sets up some part of its Cloud to be accessible by you only (using encryption and VPN technology)
- Private Cloud: You own the iron, but set it up like a Cloud to share the workload of different internal users
Disclaimer: The logos of providers and services on this and all following slides do not imply actual recommendations; they are for illustration purposes only, except where explicitly mentioned.
IaaS
This is probably the cloud service model best known from media reports. Cloud compute-on-demand is also an easy-to-understand next step after in-house virtualization and co-location. When the resource need is known in advance but generally very irregular over time, IaaS is a viable approach to cost savings. In many applications, however, the resources needed over time are not known, especially if they depend on external demand (such as a web site that becomes very popular because of a specific event).
After the initial "low-hanging fruit" of just moving a virtualized service into the "Cloud", the next steps get tougher:
- Scaling a service needs manual effort or additional tools and measures.
- There is likely a need to re-design (maybe even from scratch) to move successfully to an inherently more distributed platform.
- Only on PaaS (migrating to which may also need heavy changes to the application architecture) are the scaling details taken off your hands automatically, and even then only in some cases.
PaaS
Platforms simplify the architecture of applications by providing a more abstract model of their service, by providing simpler deployment, and by hiding the complexity of scalability behind their interface to the application developer. In the best case (i.e. when there is a good fit to the service's model), a user does not have to worry about scalability or availability, simply by adhering to the platform's APIs.
Lock-in
The biggest drawback to consider, however, is "lock-in". For IaaS, there are already many abstraction (cross-platform) APIs around that help mitigate the risk of locking the user to a specific provider. For platforms, this will be more difficult, although there are already first open source implementations of existing platforms (e.g. AppScale, an open source implementation of Google App Engine).
SaaS
A SaaS provider is basically an ASP (application service provider), as it was called a couple of years ago. The Cloud lets providers leverage existing infrastructure, whereas the traditional ASP based its offering mostly on its own infrastructure, but the difference for the actual user is insignificant. With SaaS too, the main characteristics are self-service, pay-as-you-go, and on-demand. Moving from local installations to SaaS enables capabilities that are not possible locally, e.g. simultaneous editing of documents.
Variety
Whenever a piece of software can potentially be accessed with a browser, it is a candidate for SaaS. This implies that the number of offerings and services will explode even more in the near future. The difficulty when exploring possibilities is that this market is changing even faster than the IaaS and PaaS markets.
Basic principles
- Economies of scale: it is cheaper for the provider to produce the service
- Pay a higher price, but for a (much) shorter time: this accommodates variability in demand and avoids idling resources (both for the buyer)
- Utilization and variability of demand are key factors: the better utilized a system is, the lower its TCO
- Responsibility is transferred to the Cloud provider: the seller needs to provide elasticity, and the higher in the (Cloud) stack, the more responsibility is transferred to the seller
- Moving from CapEx to OpEx: this makes it easier for the buyer to budget (no investment), and opting out is easier with OpEx (although it usually requires an alternative)
The Cloud stack builds the economic basis
It is very easy for a provider to base its services on top of existing Cloud services: write a self-service application for an existing platform and you can provide a SaaS offering that scales easily and whose operations cost is strictly based on actual usage. The higher "up" a service, the more value it adds for the customer. It leverages the underlying services and allows every provider in the value chain to earn money.
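The utilization argument can be made concrete with a small sketch. It is only an illustration: the function names and both hourly prices are hypothetical assumptions, not figures from the study. Owned hardware costs money whether or not it is busy, so its cost per used hour rises as utilization falls, and renting at a higher hourly price wins below the break-even utilization.

```python
def owned_cost_per_used_hour(owned_hourly_cost: float, utilization: float) -> float:
    """Cost of one hour of useful work on owned hardware.

    Owned hardware is paid for around the clock, so idle hours are
    effectively charged to the hours in which it does useful work.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return owned_hourly_cost / utilization


def break_even_utilization(owned_hourly_cost: float, cloud_hourly_price: float) -> float:
    """Utilization below which pay-as-you-go is cheaper than owning."""
    return owned_hourly_cost / cloud_hourly_price


# Hypothetical prices, chosen only for illustration.
OWNED_HOURLY_COST = 0.10   # all-in cost of an owned server per clock hour
CLOUD_HOURLY_PRICE = 0.40  # on-demand price of a comparable cloud instance

if __name__ == "__main__":
    threshold = break_even_utilization(OWNED_HOURLY_COST, CLOUD_HOURLY_PRICE)
    print(f"break-even utilization: {threshold:.0%}")
    for utilization in (0.05, 0.50, 0.80):
        owned = owned_cost_per_used_hour(OWNED_HOURLY_COST, utilization)
        cheaper = "cloud" if CLOUD_HOURLY_PRICE < owned else "owned"
        print(f"utilization {utilization:.0%}: owned costs {owned:.2f} "
              f"per used hour, cheaper option: {cheaper}")
```

With these assumed prices the cloud is four times more expensive per hour, yet it is still the cheaper option for any workload that keeps a dedicated machine busy less than a quarter of the time.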
Initial goal
The initial goal of the market study was to get a clear picture of the status of Enterprise Cloud providers as well as "pure" Cloud providers. While Enterprise providers gave information generously, pure cloud providers did not respond to our inquiries.
Concentration on IaaS providers and analysis
The offerings of Enterprise Cloud providers are very customer specific, because these providers traditionally do not only specific consulting but also very specific implementation. This made it difficult to create a generalized view of their services. "Tell us what you need and we'll implement it for you" is the basic summary of the services offered by Enterprise cloud providers. For the study that was not helpful, so we concentrated on pure IaaS players and used publicly available information to enhance our study results.
The big players
The current big players (specifically in the infrastructure services area) are monitored by various sources such as Guy Rosen's "State of the Cloud", which runs queries on the top 500k sites identified by quantcast.com and assigns them to infrastructure providers. Top players currently: Amazon EC2, Rackspace (closing the gap to Amazon), and Linode.
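How such a survey assigns sites to providers is not described in these notes. One plausible approach, sketched here purely as an assumption, is to resolve each site to an IP address and check which provider's published address ranges contain it. The provider names, the site names, and the address ranges below are hypothetical (the ranges are reserved documentation networks, not real provider ranges).

```python
import ipaddress
from collections import Counter
from typing import Dict

# Hypothetical address ranges. These are RFC 5737 documentation networks,
# NOT the real ranges of any provider.
PROVIDER_NETWORKS = {
    "provider-a": [ipaddress.ip_network("192.0.2.0/24")],
    "provider-b": [ipaddress.ip_network("198.51.100.0/24")],
}


def classify_address(address: str) -> str:
    """Return the provider whose range contains the address, else 'other'."""
    ip = ipaddress.ip_address(address)
    for provider, networks in PROVIDER_NETWORKS.items():
        if any(ip in network for network in networks):
            return provider
    return "other"


def count_sites_per_provider(site_addresses: Dict[str, str]) -> Counter:
    """Count how many sites fall into each provider's address ranges."""
    return Counter(classify_address(addr) for addr in site_addresses.values())


# Hypothetical, already-resolved sites (a real survey would do DNS lookups).
SAMPLE_SITES = {
    "site1.example": "192.0.2.10",
    "site2.example": "198.51.100.7",
    "site3.example": "203.0.113.5",
    "site4.example": "192.0.2.99",
}

if __name__ == "__main__":
    for provider, count in count_sites_per_provider(SAMPLE_SITES).most_common():
        print(f"{provider}: {count} site(s)")
```

The sketch keeps resolution and classification separate, so the same counting logic would work on any list of pre-resolved addresses.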
How big is the Cloud?
In 2010:
- SaaS: ~$11.7 billion (Forrester)
- PaaS: ~$311 million (Forrester)
- IaaS: ~$1 billion (Forrester, Economist, UBS)
- Total: ~$13 billion
In 2020:
- IaaS: ~$4 billion (Forrester)
- Total: ~$56 billion
Source: Economist, "Tanks in the cloud", 29 Dec 2010 (http://www.economist.com/node/17797794)
Consolidation is happening now, and even more will follow in the near future
- Rackspace acquired Cloudkick in December 2010
- Salesforce acquired Heroku in December 2010
- Teradata to acquire Aprimo, December 2010
The "big ones" also heavily acquire Cloud companies
- IBM acquired Unica, Cast Iron Systems, Netezza, Coremetrics, …
- HP acquired Stratavia, ArcSight, Melodeo, …
- Dell acquired Everdream, SilverBack, MessageOne, Perot Systems, InSite One, Boomi, …
Current developments
The market is still changing at an enormous pace. New SaaS offerings pop up on a daily basis, and even the current IaaS market leader Amazon regularly innovates with new services:
- In November 2010, Amazon Web Services (AWS) achieved Level 1 compliance with the Payment Card Industry (PCI) Data Security Standard, which means its cloud can process credit card transactions and store credit card data.
- In January 2011, Amazon launched a new PaaS, "Elastic Beanstalk", an auto-scaling application platform to operate (primarily) Java applications.
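Taking the market figures in these notes at face value, the implied average yearly growth can be worked out with the standard compound annual growth rate formula. This is a back-of-the-envelope sketch, not a figure from the cited sources; the function name is our own.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Average yearly growth rate that turns start_value into end_value."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of US dollars, as quoted in the notes above.
TOTAL_2010, TOTAL_2020 = 13.0, 56.0
IAAS_2010, IAAS_2020 = 1.0, 4.0

if __name__ == "__main__":
    total_growth = compound_annual_growth_rate(TOTAL_2010, TOTAL_2020, 10)
    iaas_growth = compound_annual_growth_rate(IAAS_2010, IAAS_2020, 10)
    print(f"total market: {total_growth:.1%} per year")
    print(f"IaaS only:    {iaas_growth:.1%} per year")
```

With these inputs, both the total market and the IaaS segment come out at roughly 15 to 16 percent growth per year.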
Status
Potential and existing Cloud projects at ESA
Projects have already started (since 2001, even before the term Cloud Computing was coined ;-) in the following areas: collaboration, web distribution, prototyping, software development, … and others.
Main adoption areas
Archive / Storage, Collaboration, Distribution, Processing
Current challenges
Long term archive, large data transfer, Cloud management, centralized vs. distributed?, public/private Cloud?, lock-in / interoperability, governance, … and others
Office: Projects / Topics involved
ESTEC: Gaia Operations
EOP-GU: EO Catalogue, UI, Web-Portal
CEOS: Cal/Val, Portal
EOP-GG: Data Reprocessing, Large Volume Data Processing, Processing on-demand with toolboxes
Corp IT: SaaS (SharePoint, WebEx)
LEX: Corporate Communications
EOP-GTR: G-Pod, Processing on-demand with toolboxes
EOP-GQ: Data Reprocessing, Large Volume Data Processing, EOP Data Distribution, L0 Cal/Val, QC, ESL
OPS-L: Space Situational Awareness Processing, Security
EOP-GS: EOP Data Distribution, L0 Cal/Val, QC, ESL
TEC: Simulation/Collaboration
EOP-S: Grid, Infrastructure, EO Catalogue, UI, Web-Portal, Processing on-demand
LEX-CCW: Corporate Communications Web and Distribution Unit (Akamai, Flickr, YouTube)
EOP-GTF: Data Reprocessing, Large Volume Data Processing, EOP Data Distribution, L0 Cal/Val, QC, ESL
OPS-C: Operations directorate and Infrastructure
EOP-GM: Data Distribution, L0 Cal/Val, QC, ESL, Supersites Virtual Archive
EOP-GT: Data Reprocessing, Large Volume Data Processing
OPS-GI: EC2 for development
ESAC: Gaia Development and Testing
LEX-CCW
ESA's Communication & Knowledge Department is responsible for keeping the world's media, decision makers, and the public up to date with what is happening at ESA and for providing news on all its latest activities. Among this department's main activities are organizing large media events and press conferences and preparing written and audiovisual material for specific target groups. The Communication & Knowledge Department also organizes exhibits and maintains the main photo and video archives and the corporate web portal.
Content distribution
Since 2001, ESA has been using Akamai to cache and deliver esa.int portal content, such as HTML, images, and video. (See: www.esa.int)
Digital asset management
The new "Digital Asset Management" (DAM) project plans to use the eZ Publish platform to offer video software as a service internally to scientific users within the Agency. The ESA Video Archive itself uses the Highwinds CDN for content distribution. (See: multimedia.esa.int)
Social media accounts
- YouTube: ESA (as a partner) is allowed movies longer than 10 minutes
- Flickr Pro: unlimited number and size of images
Why interesting for the Cloud?
There is no real alternative to using a CDN to distribute data globally very fast.
ESAC
The European Space Astronomy Centre (ESAC) is ESA's centre for space science. It is located in Villanueva de la Cañada, close to Madrid in Spain.
GAIA / AGIS
Gaia is an astrometry mission with the primary goal of making the largest, most precise three-dimensional map of the Milky Way Galaxy by surveying more than one billion stars. One of the largest data processing components of that mission is AGIS (Astrometric Global Iterative Solution), which is tasked with computing key motion and positional attributes of the stars observed by the mission's satellite, due to be launched in 2012.
Amazon AWS prototype
- Only 20 person days (with UK-based The Server Labs) were needed to get the software running in the Amazon cloud
- Oracle ASM Image based on Oracle Database 11g Release 1 Enterprise Edition - 64 Bit (Large instance)
- The prototype revealed and helped solve a scalability problem in the code (the team had never had 100 nodes to test on before); only 4 lines of code needed to be changed
- It ran at similar performance to the existing cheap in-house cluster; EC2 is indeed no supercomputer
- The availability of a large number of nodes is very interesting; this is not affordable in-house today
Why interesting for the Cloud?
Gaia needs to process data only twice per year, but then as massively as possible. Exclusive hardware would be very badly utilized this way. This means that the more nodes can be used, the shorter the run cycle (faster delivery) and the cheaper AWS becomes in comparison to in-house hardware (if that hardware cannot be used for other projects in the time between the runs).
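The reasoning about node count can be sketched in a few lines. All numbers below are hypothetical and chosen only for illustration, and the sketch assumes perfect linear scaling, which a real workload such as AGIS may not achieve. Under that assumption the rented cost of a run depends only on the total work, while a dedicated cluster costs more the larger it is, regardless of how rarely it runs.

```python
def cloud_run(work_node_hours: float, nodes: int, price_per_node_hour: float):
    """Return (wall-clock hours, cost) of one run on rented nodes.

    Assumes perfect linear scaling: doubling the nodes halves the run time.
    """
    hours = work_node_hours / nodes
    cost = nodes * hours * price_per_node_hour
    return hours, cost


def owned_cluster_yearly_cost(nodes: int, yearly_cost_per_node: float) -> float:
    """Yearly cost of a dedicated cluster, paid whether it is busy or idle."""
    return nodes * yearly_cost_per_node


# Hypothetical figures, for illustration only.
WORK_NODE_HOURS = 20_000      # compute needed per processing run
RUNS_PER_YEAR = 2             # the notes say data is processed twice per year
PRICE_PER_NODE_HOUR = 0.50    # price of one rented node per hour
YEARLY_COST_PER_NODE = 1_500  # owned node, including power and operations

if __name__ == "__main__":
    for nodes in (20, 100, 500):
        hours, cost = cloud_run(WORK_NODE_HOURS, nodes, PRICE_PER_NODE_HOUR)
        rented = RUNS_PER_YEAR * cost
        owned = owned_cluster_yearly_cost(nodes, YEARLY_COST_PER_NODE)
        print(f"{nodes:4d} nodes: {hours:7.1f} h per run, "
              f"rented {rented:9.0f} per year, owned {owned:9.0f} per year")
```

In this model, going from 20 to 500 rented nodes shortens each run by a factor of 25 at the same yearly cost, whereas the owned cluster becomes 25 times more expensive.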
Benefits
There is a multitude of benefits in different categories. For IaaS, the benefits centre on price (the Cloud can be cheaper) and convenience. Delegation of responsibility for hardware and elasticity works. It is clear, however, that each case must be calculated properly, as using the Cloud can also be more expensive than using an in-house alternative.
For CDN the case is different: the service a CDN provides simply cannot be provided internally. In that sense, the only alternative would be to set up your own private global CDN, which by definition is then not really an in-house alternative. For content and media distribution the argument is similar.
The benefits of SaaS in that context cannot easily be generalized. While there is no alternative to using external social media platforms to reach out globally, there are certainly alternatives for other types of services, such as collaboration.
IaaS
While computation as the base Cloud service works pretty well, many of the main issues concern data input and output: getting the data to where the compute capacity is remains a big problem. The fact that shipping physical devices is actually considered a valid alternative is already something of an admission of defeat in the face of insufficient bandwidth for transferring data online. Even with transfer rates in the Gigabit range, data volumes of 100 TB and more are infeasible for online transfer.
Another issue with IaaS services is that scaling the resources is up to the user. This is one argument why a further abstraction (towards PaaS) is a good idea.
CDN
The main issue with CDNs is that they do not follow the Cloud paradigm. Whether this is really a problem is up to the reader. Obviously, as long as a CDN fits the purpose, we should not care too much. However, a more flexible model with respect to on-demand, self-service, and pay-as-you-go would likely help to leverage the benefits even more.
SaaS
The user base of almost any software service is generally much larger than that of underlying services such as PaaS and IaaS, because SaaS provides services to the end user. This implies that acceptance and the learning curve are very important factors to consider when evaluating a service. This is also the reason why switching providers is much more expensive than for lower level services: retraining a large user base and getting its buy-in for a new system incurs significant cost.
General issues and (still) missing services
- Manageability and real-time tracking of cost are often insufficient
- Long-term archive solutions are unclear and storage integrity is rarely guaranteed
- Bandwidth / data transfer vs. data processing location is problematic (disk shipping, although promising, is not really a solution that will scale)
- Data deluge (e.g. Sentinel) requires even more capacity
- Security issues are often not solved yet, and governance is often not clear
- Lock-in is likely (very few interoperability commitments between services)
- User expectations are difficult to meet (iTunes-like convenience expected)
- Managing SLAs is difficult
Scale
Although economies of scale are one of the main driving factors for the economics of Cloud Computing, some aspects are just not ready now, and they will become even greater bottlenecks, specifically with respect to the future development of data volumes and bandwidth needs.
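The bandwidth concern for volumes of 100 TB and more can be checked with simple arithmetic. The sketch below is our own illustration, not a calculation from the study; the function name and the 30 percent efficiency figure are assumptions, the latter standing in for protocol overhead and shared links.

```python
def transfer_days(volume_terabytes: float, link_gbit_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move a data volume over a network link.

    efficiency is the fraction of the nominal line rate that is actually
    sustained (1.0 means the full line rate, which is rarely reached).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = volume_terabytes * 1e12 * 8  # decimal terabytes to bits
    usable_bits_per_second = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_second / 86_400  # seconds per day


if __name__ == "__main__":
    for efficiency in (1.0, 0.3):
        days = transfer_days(100, 1.0, efficiency)
        print(f"100 TB over 1 Gbit/s at {efficiency:.0%} of line rate: "
              f"{days:.1f} days")
```

At full line rate, 100 TB over a 1 Gbit/s link takes a little over nine days; at a sustained 30 percent of line rate (an assumed figure) it takes about a month, which makes the appeal of shipping disks easier to see.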
Typical risks
The slide collects some critical risks to which one might become more exposed through the use of Cloud Computing. Being aware of what can go wrong, and how bad it can be, is very important, especially for first movers. It only takes one catastrophic early experience to trigger potentially over-compensating measures, which could limit the ability to profit from Cloud benefits for a long period of time, until trust is rebuilt or the technology becomes so mainstream that it overcomes the negative bias.
Devising measures to mitigate the risks
After such risks have been identified, it is in everyone's interest to define reasonable measures to prevent such cases from occurring. Measures that are easy to take include encryption of data or the introduction of checklists. Such measures are what many people think of as governance. Perhaps the easiest way to get buy-in is to elicit recommendations on such measures from the first movers. Such users are likely to recognize the danger of negative events and are more likely to participate actively in governance measures that they have helped to devise.
Worst case scenarios also happen outside of the cloud context
Summary of Learnings
- Cloud Computing works
- There are many opportunities
- Can be cheaper than in-house alternatives
- There are some challenges yet to be solved
- Some technological gaps have yet to be closed
- There is no need to wait because of that
A sample from what others have learned: conclusions from Netflix's large move to Amazon
Goals of the migration: be faster, scalable, available, productive
Main lesson: You only really learn by committing to it
Development:
- Use latency-tolerant protocols
- Can be faster to re-code than to fix up datacenter-based apps
- Services (not jar files) as components
- Instrumented "service patterns" (rather than code)
Operations:
- Datacenter-oriented tools don't work in the cloud
- Cloud tools don't scale for the enterprise (yet)
- Lots of cloud tools available, but demand "deep-linking" ability
- Try to hide keys
Source: Adrian Cockcroft, Netflix in the Cloud, Nov 2010
Cloud Vision
Obviously, the dashboard on this slide is just an idea, a teaser for the following discussion of where ESA wants to go and how to get there, but it shows some fundamental goals that should be high on the list:
- Leverage the first movers' experience
- Be able to calculate the "true cost" of all IT services
- Have clear governance
- Know where the risks are and what measures to take to mitigate them
From Virtualization to "Hypercloud"
According to Jake Sorofman, it is possible to take a staged approach to adopting cloud computing, where at each stage you improve upon the previous stage in terms of capabilities. You also realize ROI with each step, which justifies the effort of handling the whole move in smaller steps instead of one giant leap. This of course also mitigates risks, assuming you don't get stuck somewhere along the way. The ultimate goal is what he calls the "Hypercloud", which is defined by dynamic sharing of application workload, capacity arbitrage, and self-service application provisioning.
Interpreting this pyramid in light of the current market and ESA's Cloud experience and learnings:
- Virtualization
- Cloud experimentation
- Cloud foundation (the missing PaaS experience)
- Cloud exploitation
- Hypercloud
Information
For more information about the "Study on Cloud Computing", the actual study document, and other inquiries, please contact:
ESA/ESRIN, Via Galileo Galilei, Casella Postale 64, 00044 Frascati (Roma), Italy