This presentation accompanied a practical demonstration of Amazon's Elastic Computing services to CNET students at the University of Plymouth on 16/03/2010.
The practical demonstration involved an embarrassingly parallel problem split across 5 Medium-size AMIs. The problem was the calculation of the Clustering Coefficient and the Mean Path Length (based on the original work of Watts and Strogatz) for large networks. The code was written in Python, taking advantage of the scipy, pyparallel and networkx toolkits.
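The core computation of the demonstration can be sketched in a few lines of networkx. The graph size and parameters below are illustrative, not the exact ones used in the session:

```python
import networkx as nx

# Watts-Strogatz "small world" network: 1000 nodes, each initially joined
# to its 10 nearest neighbours on a ring, edges rewired with probability 0.1.
# connected_watts_strogatz_graph retries until the result is connected,
# which the mean path length computation requires.
G = nx.connected_watts_strogatz_graph(1000, 10, 0.1, seed=1)

C = nx.average_clustering(G)              # clustering coefficient
L = nx.average_shortest_path_length(G)    # mean path length
print(C, L)
```

The parallel part of the demonstration simply repeats this computation for many rewiring probabilities and network sizes.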
Amazon Elastic Computing 2
1. Amazon Elastic Computing 2 Athanasios Anastasiou, Signal Processing and Multimedia Communications Research Group, University of Plymouth, UK
2. Topics: How Did We Get Here? (Enabling Technologies); Amazon Elastic Computing: Why? What? How?; A Quick Demonstration: Exploring Complex Networks; Further Reading & Resources
3. How Did We Get Here? (Enabling Technologies) 1939: The (Modern) Computer Is Born. Almost instantly, people start thinking about connecting many units (CPUs) together… 1960: The (Modern) Network Is Born. 1964: The ‘Virtual Machine’ Is Born. 1967: Paper on Amdahl’s Law. 1970: The Internet Is Born (ARPANET); (Modern) Distributed Computing Is Born. 1975: The Personal Computer Is Born; Mass production of CPUs!!! 1988: SoftPC Is Released.
4. How Did We Get Here? (Enabling Technologies) 1990: The World Wide Web Is Born. A worldwide network of computers… Hmmm. Computer Clusters (local or over the internet). 1991: Linux Is Born. 1998: VMWare patents its virtualisation techniques. 2002: GRID Computing, bringing together a variety of technologies into ONE system. 2005 - Today: Cloud Computing. Resources (Virtual Computers and Storage Devices) are remotely accessible on demand by some other system over a network (the internet).
5. Amazon Elastic Computing - Why? On Demand Remote Access To Resources: Computational (rent access to computers) and Storage (rent storage space). Easy, Cheap, Available. Loose Restrictions: server instances, databases, bandwidth etc. By Itself An Enabling Technology To: Commercial Projects, Scientific Projects
6. Amazon Elastic Computing - What? (1/3) Amazon: Online Enterprise. Elastic: Claiming Resources According To Your Needs. Computing: CPUs, Computational Time. What About Storage? Amazon Cloud Storage (S3): Create Disks, Mount them on your filesystem, Treat them like any other disk space. Amazon Elastic Computing Offers Just The Infrastructure
8. Amazon Elastic Computing - What? (3/3) Amazon Elastic Computing Offers Just The Infrastructure. User: Registration, Billing. User Manages: AMIs, IPs, Storage; Stores AMIs. Services: CloudWatch, Auto Scaling, Load Balancing
16. OK, Let’s Do Something With It!!! Time Consuming Tasks 3D Rendering Computational Fluid Dynamics Simulation Search Through A Large / Huge Domain
17. About The Demonstration Search Through A Large Domain Networks Duncan Watts, Steven Strogatz, 1998, Collective Dynamics of ‘Small World’ Networks Networks Abstract construction with many practical applications Nodes Edges Structure Lattice Random Function Structure affects the emergent functionality What if a network is just a little bit random?
18. Exploring Complex Networks Lattice Random Small World Rewiring Probability (p) Different p values lead to networks with varying structures. How can we characterise these networks?
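The rewiring probability p is the single knob of the Watts-Strogatz model: p = 0 gives a ring lattice, p = 1 an essentially random graph. A minimal sketch with networkx (the sizes here are illustrative):

```python
import networkx as nx

n, k = 20, 4   # 20 nodes, each joined to its 4 nearest neighbours on a ring
# One graph per rewiring probability: lattice, "small world", random.
graphs = {p: nx.watts_strogatz_graph(n, k, p, seed=42) for p in (0.0, 0.1, 1.0)}

for p, G in graphs.items():
    # Rewiring moves edges around but does not add or remove them,
    # so the edge count stays n*k/2 = 40 for every p.
    print(p, G.number_of_edges())
```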
27. Exploring Complex Networks Networks with the ‘Small World’ property are everywhere… Friendships The Internet The Brain
28. Exploring Complex Networks Let’s try and replicate Watts & Strogatz’s results! Based on Python Scipy PyParallel Networkx Amazon Elastic Computing A custom AMI based on Fedora All necessary software already installed 1 Small Instance (Acting as a “Coordinator”) 5 Medium Instances (Acting as “Workers”)
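The coordinator/worker split can be sketched as follows. The actual demonstration distributed the work over EC2 instances with pyparallel; here the standard-library multiprocessing module stands in, so this is an illustration of the work division rather than the deployed code:

```python
from multiprocessing import Pool

import networkx as nx

def metrics(p):
    """Worker task: build one Watts-Strogatz network at rewiring
    probability p and measure its two characteristic quantities."""
    G = nx.connected_watts_strogatz_graph(200, 6, p, seed=1)
    return p, nx.average_clustering(G), nx.average_shortest_path_length(G)

if __name__ == "__main__":
    ps = [0.0, 0.01, 0.1, 0.5, 1.0]   # one p value per worker task
    with Pool(5) as pool:             # the "5 Medium Instances"
        for p, C, L in pool.map(metrics, ps):
            print(f"p={p:<5} C={C:.3f} L={L:.2f}")
```

In the real setup, the "Coordinator" instance dispatches p values to the "Worker" instances and collects the results over the network.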
29. Further Reading & Resources The timeline was created with material from the following sources: Computer History, Network History, Virtualisation Technology History, Linux Development Timeline, Super Computers Timeline. Some Noteworthy Parallel Processing Projects (where YOU can take part!): SETI@home, Folding@home. Some Noteworthy Virtualisation Software: The Xen Hypervisor (and cloud computing infrastructure), Oracle’s VirtualBox
30. Further Reading & Resources Amazon Web Services: a huge resource about Amazon’s cloud computing infrastructure. Google App Engine: specifically targeted at web applications. Or, build your own cloud with Ubuntu Linux! Python: the official web page. Scipy. PiCloud: an example of a service that “integrates” with Amazon Cloud Computing; perhaps the natural evolution of Cloud Computing
31. Further Reading & Resources Mapping The Internet (for some HUGE graph datasets!): The Internet Mapping Project, The Opte Project. Books: Selim Akl, Parallel Computation: Models And Methods; Behrooz Parhami, Introduction To Parallel Processing; J. Rittinghouse & J. Ransome, Cloud Computing: Implementation, Management, and Security
Editor’s notes
This is a brief introduction to Amazon’s Cloud computing service. But before we get into this, it would be useful to see how we got here: what were the technologies that enabled Amazon to create its elastic computing service? We will then look into the three key questions about Amazon Elastic Computing. Why? Or, in other words, what was the need or driving force behind its inception? What is it? How can we benefit from it? We will then move on to a quick demo. And finally, for those of you who are more interested, I have put together a brief list of further references you can look into.
The reason why this list extends so far into the past is to illustrate the point that this idea of decentralised and distributed processing is in fact almost as old as computers themselves. WWII (unfortunately) and events associated with it accelerated the invention of the modern electronic computer. Alan Turing and the rest of the code breakers at Bletchley Park had access to Colossus, the “first programmable digital electronic computing device”… They split their tasks between two of these computers to speed up their code-breaking work (!) The developments that led to the birth of the internet start in the early 1960s. In the meantime, IBM manufactures a series of mainframe computers that run an operating system which abstracts the hardware of a complete computer and uses the term “Virtual Machine”. By 1970, ARPANET is started, again (unfortunately) as a military project. By 1975, a key step is taken: the personal computer is born (!), which inevitably leads to the mass production of CPUs, which in turn means that computational power becomes accessible and affordable to everyone. In 1988 we see the development of SoftPC, a software emulator (you are probably familiar with game machine emulators? This one was a software emulator for the x86 platform).
In 1990, with the inception of HTML and other technologies, the internet acquires a “face” (HTML pages) and starts taking the shape we know today. Around the same time, various software toolkits (PVM, MPI) are developed that enable parallel processing on “common”, cheap personal computers. Computer clusters and projects that distribute tasks over a very large pool of computers start taking shape. Perhaps the most popular of these projects was (and still is) SETI. 1991: Another key step is taken. Linus Torvalds starts working on Linux, initially as a university project. He uses the internet to reach out to other talented people, who start putting Linux together. The biggest advantage that Linux provided was that its code was open and available, and therefore modifiable. If a “bottleneck” was discovered, it was easy for a knowledgeable person to rectify it. As the operating system matures, people start writing software for it. There are no barriers to development, no additional costs to purchase costly development tools and licences, and no pressure to generate revenue. Consequently, software is offered for free… Linux gradually conquers the server market. 1998: VMware is granted a patent for its virtualisation techniques. Eventually, this will lead to VMware as we know it today; although back then it did not create big waves, you can see that VMware was in the making for a long time. Also around the same time, some free, open-source tools start to develop (for example, Bochs). 2002: Various communication, computational and storage technologies have now matured enough to enable GRID computing. GRID computing attempts to abstract various underlying technologies to make a network of computers appear to operate as one.
However, this computer might be composed of heterogeneous hardware connected over a heterogeneous network, storing and exchanging data over a number of different technologies, without the end user having to mind the details of each system separately. 2005: Cloud computing takes its first steps. Cloud computing is where computers, operating systems, parallel computing and virtualisation software come together to offer remotely accessible resources on demand. Although major players such as Google and Amazon seem to be the main driving forces behind this technology, cloud computing concepts and capabilities continue to develop and grow quickly. And this brings us to today!
Through the rest of this talk, we are going to be looking at one Cloud Computing platform called Amazon Elastic Computing 2. Why should you (or anyone) care about it? Because it offers cheap remote access to resources in an easy way. This basically means that you can rent some computational time or storage capacity and pay for what you use (we will cover pricing later on). So, to make it more relevant to you, imagine that you are working on some project that requires a network of computers, or that you would like to have a go at setting up a server with specific capabilities (web server, database server, LDAP server, anything you can think of)… Renting your own server (collocated or stand-alone) would mean something like tens of pounds for a few months (or per month for a ‘stand-alone’), or hundreds of pounds for a year. You would still be restricted in terms of software, bandwidth, number of databases, number of email accounts, etc. With this technology you could rent 10-20 ‘virtual computers’ at a fraction of the equivalent ‘real server’ cost. And of course, let us not forget that Cloud Computing is itself an enabling technology for a number of commercial and scientific projects. People do find value in this technology, employing it in their businesses or projects (and we will see a few that do later on).
What is Amazon Elastic Computing? The name does a good job of explaining this. Beyond that, it would be good to keep in mind that Amazon Elastic Computing offers just the infrastructure. In other words, unless your need is simply 20 networked computers available from the internet… you still have a bit of work to do. This means that you would still need to write the software that runs over this system. We will see what this means in a minute. First of all, we need to take a look at a rough sketch of Amazon Elastic Computing and introduce some terminology.
Here is a rough sketch of the key entities in Amazon’s Cloud Computing. If you think about it, given the availability of enabling technologies, the structure of the whole system seems to follow “common sense”. If you pose yourself the question “How would I do this?” and start outlining your answers, you would pretty much end up with something like this… Come to think of it, you could end up with something better! So give it a try anyway!!! Users access the service over the internet. Obviously, the service resides in a set of “real machines”, or servers, that are already networked. Through these servers you can launch ‘virtual servers’. We need a name for these: they are called AMIs, from Amazon Machine Image. These are networked with each other on a “virtual network” but, through the use of software switches, are also networked with the real servers (the outside world), the “real network” within Amazon, and eventually the internet. We must also point out two more servers that live in this network: the Amazon S3 storage server and a DNS server. You can think of the S3 server as virtualised disk space that belongs to a user and is accessible from the virtual machines. The DNS server makes it possible for the virtual servers to be accessible from ‘the outside world’, or anyone over the internet. Each virtual machine gets an internal name and an external name. If you are trying to access the machine from another computer within the network, you use the internal name, while if you are trying to access the machine from the internet, you use the external name. As you would expect, machine names and IP addresses are not the same each time an AMI is launched. If you want to uniquely identify a ‘virtual computer’ within this network, you can (purchase and) use a static IP that can be bound uniquely to a machine.
Already from this brief description, you can see that there are a few tasks that need to be carried out at the infrastructure level. We need a framework to register users, bill them for what they use, provide them with tools that make this infrastructure available to them, and also provide services that add value, such as CloudWatch to monitor the ‘health’ of each server, Auto Scaling with which you can launch more instances as the server load increases, and finally Load Balancing for large installations. This is what Amazon Elastic Computing is about… You might be wondering what all of this costs. Let’s take a look at this issue.
So now, let’s take a look at HOW it works. PLEASE NOTE: Prices depicted in this slide are as of 15/03/2010.
The first step in using Amazon’s services is to register for them. I am not going to go into full detail about this step because it is already covered extensively by Amazon’s documentation at the provided link. Once registration is complete, a user gains access to the Amazon Web Services management console, which can be used to manage all available products. If you want access to a machine, you can get it through SSH or SCP, for a secure console or secure transfer of files respectively. Let’s see what this looks like.
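For example, connecting to a running instance and moving files typically looks like this. The key file name and the public DNS name below are placeholders, not real values; substitute the ones shown in your own management console:

```shell
# Open a secure console on the instance (key pair downloaded at launch time)
ssh -i my-keypair.pem root@ec2-203-0-113-25.compute-1.amazonaws.com

# Copy a script to the instance, then fetch results back with SCP
scp -i my-keypair.pem worker.py root@ec2-203-0-113-25.compute-1.amazonaws.com:~/
scp -i my-keypair.pem root@ec2-203-0-113-25.compute-1.amazonaws.com:~/results.csv .
```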
General Overview
AMIs that are already shared by others. One thing to notice here is the variety of distributions.
An overview of the available AMIs and the underlying architectures to launch them in.
Spot instances. Variable pricing according to demand (!)….A computational stock market (!) :-D
OK, let’s take a small pause here. We are about to see what we can do with all this infrastructure, but before we go there, are there any questions about the infrastructure so far?
OK, so let’s do something with all these nice little toys! We most commonly turn to parallel computation when we are faced with something that cannot be done through: Clever mathematics! Optimising code!! (or clever programming). Here are a few problems that remain hard even after investing a lot of clever mathematics and programming!!!! 3D Rendering: How does light propagate through space and objects? Movies like Up, Toy Story, Wall-E, Shrek, Final Fantasy, etc. Computational Fluid Dynamics: How does a fluid flow around an object? Used for vehicle design (car, train, aircraft, ship, spaceship etc). Simulation: How would something behave in a given condition (before we build it, or while it is still alive)? ‘Small’: How would an airplane fly? ‘Extra Large’: What if the polar ice caps melt? High-resolution weather simulation (and prediction) on Earth (or any other planet); high-resolution full brain simulation. Search Through a Large / Huge Domain: This doesn’t necessarily mean literal search, such as finding a name in a list of names. It could also mean: find one image inside Flickr’s huge dataset, or what is the average distance between the codewords of a given code? Or, what is the output of a model for different parameters? And other applications. We will actually look at one of these exploration applications.
We are now going to talk about networks, and in particular Complex Networks, focusing on the brilliant work of Watts & Strogatz. This was published in Nature in 1998.
Up until Watts & Strogatz’s paper, graph-theory-related work employed models of networks. These models provided constructions that either had some well-defined structure or were completely random!... But no one had ever looked at the characteristics of networks that lie in between these two extremes. Watts & Strogatz came to this while working on sociology: the nodes in their networks are individuals and the edges represent friendships. They created a model that could return networks that were somewhere in between lattices and random networks, and they also studied many real-life networks. They found that these networks, in between order and disorder, had some very interesting properties, and they also found that this structure is very common in nature. They called these networks the Small World networks (!)
What they did in order to characterise them was to use two metrics: the clustering coefficient and the mean path length… Here is how they are calculated. You can probably do this mentally for these networks over here, but what about…
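The two metrics can be checked by hand on a tiny graph. The sketch below (assuming networkx) computes a node’s clustering coefficient straight from its definition, the fraction of the node’s neighbour pairs that are themselves connected, and compares it with the library result:

```python
from itertools import combinations

import networkx as nx

# A tiny hand-checkable graph: triangle 0-1-2 plus a pendant node 3 on node 2.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

def clustering(G, v):
    """Fraction of v's neighbour pairs that are directly connected."""
    pairs = list(combinations(G[v], 2))
    if not pairs:
        return 0.0
    return sum(G.has_edge(a, b) for a, b in pairs) / len(pairs)

print(clustering(G, 0))   # 1.0: the only neighbour pair (1, 2) is an edge
print(clustering(G, 2))   # 1/3: of pairs (0,1), (0,3), (1,3) only (0,1) is an edge
# Mean path length: average shortest-path distance over all node pairs.
print(nx.average_shortest_path_length(G))   # 8/6: six pairs, total distance 8
```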
…this network? With just 128 nodes…
….Or this network, which is actually a rendering of the connected parts of the internet (millions of nodes, gazillions of edges)?
You might be thinking….Why do we have to study these networks…..Here is why.