Keynote talk at the International Conference on Supercomputing 2009, at IBM Yorktown in New York. This is a major update of a talk first given in New Zealand last January. The abstract follows.
The past decade has seen increasingly ambitious and successful methods for outsourcing computing. Approaches such as utility computing, on-demand computing, grid computing, software as a service, and cloud computing all seek to free computer applications from the limiting confines of a single computer. Software that thus runs "outside the box" can be more powerful (think Google, TeraGrid), dynamic (think Animoto, caBIG), and collaborative (think Facebook, myExperiment). It can also be cheaper, due to economies of scale in hardware and software. The combination of new functionality and new economics inspires new applications, reduces barriers to entry for application providers, and in general disrupts the computing ecosystem. I discuss the new applications that outside-the-box computing enables, in both business and science, and the hardware and software architectures that make these new applications possible.
7. “Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry.” John McCarthy (1961)
9. “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001) [Figure: connectivity (on log scale) versus time. Science: Grid]
11. Layered grid architecture (“The Anatomy of the Grid,” 2001)
Grid layers, lowest to highest:
Fabric, “Controlling things locally”: access to, and control of, resources
Connectivity, “Talking to things”: communication (Internet protocols) and security
Resource, “Sharing single resources”: negotiating access, controlling use
Collective, “Managing multiple resources”: ubiquitous infrastructure services
User, “Specialized services”: user- or application-specific distributed services
Application
Internet protocol architecture, shown alongside: Application, Transport, Internet, Link
25. “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001) [Figure: connectivity (on log scale) versus time. Science: Grid. Enterprise: Cloud]
37. Technologies used in Dynamo
Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates
Handling temporary failures | Sloppy quorum and hinted handoff | Provides high availability and durability guarantee when some of the replicas are not available
Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background
Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information
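The first row of the Dynamo table (partitioning by consistent hashing, giving incremental scalability) can be illustrated with a small sketch. This is not Dynamo's implementation: the class name HashRing, the use of MD5, the node names S1 to S4, and the choice of 64 virtual nodes per server are all assumptions made for illustration. The point it tries to show is that adding a node moves only the keys that the new node takes over.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes    # virtual nodes per physical node (assumed value)
        self._points = []       # sorted ring positions
        self._owner = {}        # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            point = ring_position(f"{node}#{i}")
            if point in self._owner:
                continue        # skip the (extremely unlikely) collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        """Drop every point owned by this node; its keys fall to successors."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[i % len(self._points)]]


if __name__ == "__main__":
    ring = HashRing(["S1", "S2", "S3"])
    keys = [f"object-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add_node("S4")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved, all of them to S4")
```

With a plain modulo scheme (hash of the key modulo the number of servers), adding a fourth server would typically reassign most keys; on the ring, a key either keeps its owner or moves to the new node.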
41. Specializing further …
Service provider: “Provide access to data D at S1, S2, S3 with performance P”
Resource provider: “Provide storage with performance P1, network with P2, …”
Mechanisms: replica catalog, user-level multicast, …
[Diagram: user; data D replicated at S1, S2, S3]
42. Using IaaS in biomedical informatics [Diagram labels: My servers, IaaS provider, Chicago, handle.net, BIRN]
43. Clouds and supercomputers: Conventional wisdom?
Loosely coupled applications: clouds/clusters ✔; supercomputers too expensive
Tightly coupled applications: clouds/clusters too slow; supercomputers ✔
44-47. Ed Walker, Benchmarking Amazon EC2 for high-performance scientific computing, ;login:, October 2008.
48-51. D. Nurmi, J. Brevik, R. Wolski: QBETS: queue bounds estimation from time series. SIGMETRICS 2007: 379-380
52. Clouds and supercomputers: Conventional wisdom?
Loosely coupled applications: clouds/clusters ✔; supercomputers too expensive
Tightly coupled applications: clouds/clusters good for rapid response; supercomputers ✔
54. Many many tasks: Identifying potential drug targets. 2M+ ligands x protein target(s) (Mike Kubal, Benoit Roux, and others)
55. [Workflow diagram, from start to report]
Inputs: PDB protein descriptions (1 protein, 1 MB); ZINC 3-D structures (2M structures, 6 GB)
Receptor preparation (1 per protein: defines pocket to bind to): manually prep DOCK6 rec file; manually prep FRED rec file
DOCK6 / FRED: ~4M x 60s x 1 cpu, ~60K cpu-hrs; select best ~5K
Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs; select best ~500
Amber prep and score steps: 1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. perl: gen nabscript (BuildNABScript, from the NAB script template and NAB script parameters, which define flexible residues and #MDsteps), 5. RunNABScript
GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
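The per-stage costs quoted on this slide follow from tasks x time per task x CPUs per task. The sketch below redoes that arithmetic using only the task counts and durations given on the slide. The names STAGES, cpu_hours, and summarize are made up for illustration, and a CPU-year is assumed to be 24 x 365 hours.

```python
# Stage name -> (number of tasks, seconds per task, CPUs per task),
# taken from the approximate figures on the slide.
STAGES = {
    "DOCK6/FRED": (4_000_000, 60, 1),     # ~4M x 60s x 1 cpu
    "Amber": (10_000, 20 * 60, 1),        # ~10K x 20m x 1 cpu
    "GCMC": (500, 10 * 3600, 100),        # ~500 x 10hr x 100 cpu
}

HOURS_PER_CPU_YEAR = 24 * 365  # assumption: one CPU busy all year


def cpu_hours(tasks: int, seconds_per_task: int, cpus_per_task: int) -> float:
    """CPU-hours consumed by one stage of the workflow."""
    return tasks * seconds_per_task * cpus_per_task / 3600


def summarize(stages=STAGES):
    """Return (per-stage CPU-hours, total CPU-hours, total CPU-years)."""
    per_stage = {name: cpu_hours(*spec) for name, spec in stages.items()}
    total = sum(per_stage.values())
    return per_stage, total, total / HOURS_PER_CPU_YEAR


if __name__ == "__main__":
    per_stage, total, years = summarize()
    for name, hours in per_stage.items():
        print(f"{name:12s} {hours:12,.0f} cpu-hrs")
    print(f"{'total':12s} {total:12,.0f} cpu-hrs ({years:.0f} cpu-years)")
```

The unrounded results come out somewhat above the slide's rounded figures (~60K, ~3K, and ~500K cpu-hrs, about 50 cpu-years in total), but the ranking is the same: the GCMC stage dominates the cost even though it has by far the fewest tasks.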
64. Clouds and supercomputers: Conventional wisdom?
Loosely coupled applications: clouds/clusters ✔; supercomputers excellent
Tightly coupled applications: clouds/clusters good for rapid response; supercomputers ✔
65. “The computer revolution hasn’t happened yet.” Alan Kay, 1997
66. “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001) [Figure: connectivity (on log scale) versus time. Science: Grid. Enterprise: Cloud. Consumer: ????]