SlideShare ist ein Scribd-Unternehmen logo
1 von 44
Cloud Computing Lecture #1
What is Cloud Computing?
(and an intro to parallel/distributed processing)




                             Jimmy Lin
                             The iSchool
                             University of Maryland

                             Wednesday, September 3, 2008


       Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet,
       Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License)
       This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States
       See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
Source: http://www.free-pictures-photos.com/
What is Cloud Computing?
1.   Web-scale problems
2.   Large data centers
3.   Different models of computing
4.   Highly-interactive Web applications




                                                       The iSchool
                                           University of Maryland
1. Web-Scale Problems
   Characteristics:
       Definitely data-intensive
       May also be processing intensive
   Examples:
       Crawling, indexing, searching, mining the Web
       “Post-genomics” life sciences research
       Other scientific data (physics, astronomers, etc.)
       Sensor networks
       Web 2.0 applications
       …




                                                                     The iSchool
                                                         University of Maryland
How much data?
   Wayback Machine has 2 PB + 20 TB/month (2006)
   Google processes 20 PB a day (2008)
   “all words ever spoken by human beings” ~ 5 EB
   NOAA has ~1 PB climate data (2007)
   CERN’s LHC will generate 15 PB a year (2008)

                           640K ought to be
                           enough for anybody.




                                                             The iSchool
                                                 University of Maryland
Maximilien Brice, © CERN
Maximilien Brice, © CERN
There’s nothing like more data!
      s/inspiration/data/g;




                                          The iSchool
(Banko and Brill, ACL 2001)
                              University of Maryland
(Brants et al., EMNLP 2007)
What to do with more data?
          Answering factoid questions
                 Pattern matching on the Web
                 Works amazingly well
                            Who shot Abraham Lincoln? → X shot Abraham Lincoln

          Learning relations
                 Start with seed instances
                 Search for patterns on the Web
                 Using patterns to find more instances
                                                            Wolfgang Amadeus Mozart (1756 - 1791)
                                                            Einstein was born in 1879

               Birthday-of(Mozart, 1756)
               Birthday-of(Einstein, 1879)
                                                                    PERSON (DATE –
                                                                    PERSON was born in DATE

                                                                                                    The iSchool
(Brill et al., TREC 2001; Lin, ACM TOIS 2007)
(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )                   University of Maryland
2. Large Data Centers
   Web-scale problems? Throw more machines at it!
   Clear trend: centralization of computing resources in large
    data centers
       Necessary ingredients: fiber, juice, and space
       What do Oregon, Iceland, and abandoned mines have in
        common?
   Important Issues:
       Redundancy
       Efficiency
       Utilization
       Management



                                                               The iSchool
                                                   University of Maryland
Source: Harper’s (Feb, 2008)
Maximilien Brice, © CERN
Key Technology: Virtualization


                           App      App          App

    App     App      App   OS       OS           OS

      Operating System           Hypervisor

          Hardware               Hardware

      Traditional Stack      Virtualized Stack




                                                   The iSchool
                                       University of Maryland
3. Different Computing Models
     “Why do it yourself if you can pay someone to do it for you?”

   Utility computing
       Why buy machines when you can rent cycles?
       Examples: Amazon’s EC2, GoGrid, AppNexus
   Platform as a Service (PaaS)
       Give me nice API and take care of the implementation
       Example: Google App Engine
   Software as a Service (SaaS)
       Just run it for me!
       Example: Gmail



                                                                         The iSchool
                                                             University of Maryland
4. Web Applications
   A mistake on top of a hack built on sand held together by
    duct tape?
   What is the nature of software applications?
       From the desktop to the browser
       SaaS == Web-based applications
       Examples: Google Maps, Facebook
   How do we deliver highly-interactive Web-based
    applications?
       AJAX (asynchronous JavaScript and XML)
       For better, or for worse…




                                                             The iSchool
                                                 University of Maryland
What is the course about?
   MapReduce: the “back-end” of cloud computing
       Batch-oriented processing of large datasets
   Ajax: the “front-end” of cloud computing
       Highly-interactive Web-based applications
   Computing “in the clouds”
       Amazon’s EC2/S3 as an example of utility computing




                                                                  The iSchool
                                                      University of Maryland
Amazon Web Services
   Elastic Compute Cloud (EC2)
       Rent computing resources by the hour
       Basic unit of accounting = instance-hour
       Additional costs for bandwidth
   Simple Storage Service (S3)
       Persistent storage
       Charge by the GB/month
       Additional costs for bandwidth
   You’ll be using EC2/S3 for course assignments!




                                                               The iSchool
                                                   University of Maryland
This course is not for you…
    If you’re not genuinely interested in the topic
    If you’re not ready to do a lot of programming
    If you’re not open to thinking about computing in new ways
    If you can’t cope with uncertainly, unpredictability, poor
     documentation, and immature software
    If you can’t put in the time



             Otherwise, this will be a richly rewarding course!



                                                                    The iSchool
                                                        University of Maryland
Source: http://davidzinger.wordpress.com/2007/05/page/2/
Cloud Computing Zen
   Don’t get frustrated (take a deep breath)…
       This is bleeding edge technology
       Those W$*#T@F! moments
   Be patient…
       This is the second first time I’ve taught this course
   Be flexible…
       There will be unanticipated issues along the way
   Be constructive…
       Tell me how I can make everyone’s experience better




                                                                      The iSchool
                                                          University of Maryland
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Source: Wikipedia
Things to go over…
   Course schedule
   Assignments and deliverables
   Amazon EC2/S3




                                               The iSchool
                                   University of Maryland
Web-Scale Problems?
   Don’t hold your breath:
       Biocomputing
       Nanocomputing
       Quantum computing
       …
   It all boils down to…
       Divide-and-conquer
       Throwing more hardware at the problem




                       Simple to understand… a lifetime to master…

                                                                       The iSchool
                                                           University of Maryland
Divide and Conquer

                 “Work”
                                       Partition


        w1         w2         w3


      “worker”   “worker”   “worker”


         r1         r2         r3




                 “Result”              Combine


                                                     The iSchool
                                         University of Maryland
Different Workers
   Different threads in the same core
   Different cores in the same CPU
   Different CPUs in a multi-processor system
   Different machines in a distributed system




                                                             The iSchool
                                                 University of Maryland
Choices, Choices, Choices
   Commodity vs. “exotic” hardware
   Number of machines vs. processor vs. cores
   Bandwidth of memory vs. disk vs. network
   Different programming models




                                                           The iSchool
                                               University of Maryland
Flynn’s Taxonomy

                                        Instructions

                              Single (SI)         Multiple (MI)

                                 SISD                  MISD
           Single (SD)


                            Single-threaded         Pipeline
                                process           architecture
    Data




                                SIMD                   MIMD
           Multiple (MD)




                           Vector Processing     Multi-threaded
                                                 Programming




                                                                          The iSchool
                                                              University of Maryland
SISD



                    Processor


       D   D   D        D         D   D   D




                   Instructions




                                                          The iSchool
                                              University of Maryland
SIMD

                       Processor


       D0   D0   D0       D0         D0   D0   D0
       D1   D1   D1       D1         D1   D1   D1
       D2   D2   D2       D2         D2   D2   D2
       D3   D3   D3       D3         D3   D3   D3
       D4   D4   D4       D4         D4   D4   D4
       …    …    …        …          …    …    …
       Dn   Dn   Dn       Dn         Dn   Dn   Dn



                      Instructions



                                                            The iSchool
                                                University of Maryland
MIMD
                    Processor


       D   D   D        D         D   D   D




                   Instructions

                    Processor


       D   D   D        D         D   D   D




                   Instructions


                                                          The iSchool
                                              University of Maryland
Memory Typology: Shared




           Processor            Processor

                       Memory

           Processor            Processor




                                                        The iSchool
                                            University of Maryland
Memory Typology: Distributed



       Processor   Memory             Processor   Memory


                            Network



       Processor   Memory             Processor   Memory




                                                              The iSchool
                                                  University of Maryland
Memory Typology: Hybrid


       Processor                      Processor
                   Memory                         Memory
       Processor                      Processor


                            Network



       Processor                      Processor
                   Memory                         Memory
       Processor                      Processor




                                                              The iSchool
                                                  University of Maryland
Parallelization Problems
    How do we assign work units to workers?
    What if we have more work units than workers?
    What if workers need to share partial results?
    How do we aggregate partial results?
    How do we know all the workers have finished?
    What if workers die?


                What is the common theme of all of these problems?



                                                                The iSchool
                                                    University of Maryland
General Theme?
   Parallelization problems arise from:
       Communication between workers
       Access to shared resources (e.g., data)
   Thus, we need a synchronization system!
   This is tricky:
       Finding bugs is hard
       Solving bugs is even harder




                                                              The iSchool
                                                  University of Maryland
Managing Multiple Workers
   Difficult because
       (Often) don’t know the order in which workers run
       (Often) don’t know where the workers are running
       (Often) don’t know when workers interrupt each other
   Thus, we need:
       Semaphores (lock, unlock)
       Conditional variables (wait, notify, broadcast)
       Barriers
   Still, lots of problems:
       Deadlock, livelock, race conditions, ...
   Moral of the story: be careful!
       Even trickier if the workers are on different machines
                                                                      The iSchool
                                                          University of Maryland
Patterns for Parallelism
    Parallel computing has been around for decades
    Here are some “design patterns” …




                                                         The iSchool
                                             University of Maryland
Master/Slaves



                master




                slaves




                                     The iSchool
                         University of Maryland
Producer/Consumer Flow




         P     C   P     C


         P     C   P     C


         P     C   P     C




                                         The iSchool
                             University of Maryland
Work Queues




        P                    C
              shared queue

        P     W W W W W      C

        P                    C




                                             The iSchool
                                 University of Maryland
Rubber Meets Road
   From patterns to implementation:
       pthreads, OpenMP for multi-threaded programming
       MPI for clustering computing
       …
   The reality:
       Lots of one-off solutions, custom code
       Write you own dedicated library, then program with it
       Burden on the programmer to explicitly manage everything
   MapReduce to the rescue!
       (for next time)




                                                                 The iSchool
                                                     University of Maryland

Weitere ähnliche Inhalte

Andere mochten auch

Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)embur
 
Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012confesercentivenezia
 
Bairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOBairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOPascale Gaudet
 
BioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsBioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsPascale Gaudet
 
José Cruz Toledo - Aptamer basebc2012
José Cruz  Toledo - Aptamer basebc2012José Cruz  Toledo - Aptamer basebc2012
José Cruz Toledo - Aptamer basebc2012Pascale Gaudet
 

Andere mochten auch (7)

Rinaldi - ODIN
Rinaldi - ODINRinaldi - ODIN
Rinaldi - ODIN
 
Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)Parc national de pingualuit (evan burman)
Parc national de pingualuit (evan burman)
 
Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012Confesercenti Venezia - newsletter giugno 2012
Confesercenti Venezia - newsletter giugno 2012
 
Masson - ViralZone
Masson - ViralZoneMasson - ViralZone
Masson - ViralZone
 
Bairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHOBairoch ISB closing-talk: CALIPHO
Bairoch ISB closing-talk: CALIPHO
 
BioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next DevelopmentsBioDBCore: Current Status and Next Developments
BioDBCore: Current Status and Next Developments
 
José Cruz Toledo - Aptamer basebc2012
José Cruz  Toledo - Aptamer basebc2012José Cruz  Toledo - Aptamer basebc2012
José Cruz Toledo - Aptamer basebc2012
 

Ähnlich wie Cloud computing

ccna course 2
ccna course 2ccna course 2
ccna course 2S Sridhar
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionDarian Frajberg
 
Conceptual Structures in STEM education
Conceptual Structures in STEM educationConceptual Structures in STEM education
Conceptual Structures in STEM educationSu White
 
100% training accuracy without overfitting
100% training accuracy without overfitting100% training accuracy without overfitting
100% training accuracy without overfittingLibgirlTeam
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing  with MapReduce Data-Intensive Text Processing  with MapReduce
Data-Intensive Text Processing with MapReduce George Ang
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceData-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceGeorge Ang
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunamiinside-BigData.com
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7CS, NcState
 
Cyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCameron Kiddle
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchUniversity of Washington
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
Code for science (rev 2)
Code for science (rev 2)Code for science (rev 2)
Code for science (rev 2)Andy Lenards
 
Learning in the Microcosmos
Learning in the MicrocosmosLearning in the Microcosmos
Learning in the Microcosmosopenforum
 
Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Martin
 
ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning Mark S. Steed
 

Ähnlich wie Cloud computing (20)

ccna course 2
ccna course 2ccna course 2
ccna course 2
 
Data Research Vision
Data Research VisionData Research Vision
Data Research Vision
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolution
 
Conceptual Structures in STEM education
Conceptual Structures in STEM educationConceptual Structures in STEM education
Conceptual Structures in STEM education
 
100% training accuracy without overfitting
100% training accuracy without overfitting100% training accuracy without overfitting
100% training accuracy without overfitting
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing  with MapReduce Data-Intensive Text Processing  with MapReduce
Data-Intensive Text Processing with MapReduce
 
Data-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduceData-Intensive Text Processing with MapReduce
Data-Intensive Text Processing with MapReduce
 
Empirical AI Research
Empirical AI Research Empirical AI Research
Empirical AI Research
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
MeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data TsunamiMeDiCI - How to Withstand a Research Data Tsunami
MeDiCI - How to Withstand a Research Data Tsunami
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7
 
Cyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in ScienceCyberinfrastructure and its Role in Science
Cyberinfrastructure and its Role in Science
 
Environmental Science, Big Data and the Cloud
Environmental Science, Big Data and the CloudEnvironmental Science, Big Data and the Cloud
Environmental Science, Big Data and the Cloud
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible Research
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Code for science (rev 2)
Code for science (rev 2)Code for science (rev 2)
Code for science (rev 2)
 
Cloud hpc-bigdata-challenges
Cloud hpc-bigdata-challengesCloud hpc-bigdata-challenges
Cloud hpc-bigdata-challenges
 
Learning in the Microcosmos
Learning in the MicrocosmosLearning in the Microcosmos
Learning in the Microcosmos
 
Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008Lindner Microcontent Standards 2008
Lindner Microcontent Standards 2008
 
ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning ICT and its impact on schools’ infrastructure, teaching and learning
ICT and its impact on schools’ infrastructure, teaching and learning
 

Kürzlich hochgeladen

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Kürzlich hochgeladen (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Cloud computing

  • 1. Cloud Computing Lecture #1 What is Cloud Computing? (and an intro to parallel/distributed processing) Jimmy Lin The iSchool University of Maryland Wednesday, September 3, 2008 Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creation Commons Attribution 3.0 License) This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
  • 3. What is Cloud Computing? 1. Web-scale problems 2. Large data centers 3. Different models of computing 4. Highly-interactive Web applications The iSchool University of Maryland
  • 4. 1. Web-Scale Problems  Characteristics:  Definitely data-intensive  May also be processing intensive  Examples:  Crawling, indexing, searching, mining the Web  “Post-genomics” life sciences research  Other scientific data (physics, astronomers, etc.)  Sensor networks  Web 2.0 applications  … The iSchool University of Maryland
  • 5. How much data?  Wayback Machine has 2 PB + 20 TB/month (2006)  Google processes 20 PB a day (2008)  “all words ever spoken by human beings” ~ 5 EB  NOAA has ~1 PB climate data (2007)  CERN’s LHC will generate 15 PB a year (2008) 640K ought to be enough for anybody. The iSchool University of Maryland
  • 8. There’s nothing like more data! s/inspiration/data/g; The iSchool (Banko and Brill, ACL 2001) University of Maryland (Brants et al., EMNLP 2007)
  • 9. What to do with more data?  Answering factoid questions  Pattern matching on the Web  Works amazingly well Who shot Abraham Lincoln? → X shot Abraham Lincoln  Learning relations  Start with seed instances  Search for patterns on the Web  Using patterns to find more instances Wolfgang Amadeus Mozart (1756 - 1791) Einstein was born in 1879 Birthday-of(Mozart, 1756) Birthday-of(Einstein, 1879) PERSON (DATE – PERSON was born in DATE The iSchool (Brill et al., TREC 2001; Lin, ACM TOIS 2007) (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … ) University of Maryland
  • 10. 2. Large Data Centers  Web-scale problems? Throw more machines at it!  Clear trend: centralization of computing resources in large data centers  Necessary ingredients: fiber, juice, and space  What do Oregon, Iceland, and abandoned mines have in common?  Important Issues:  Redundancy  Efficiency  Utilization  Management The iSchool University of Maryland
  • 13. Key Technology: Virtualization App App App App App App OS OS OS Operating System Hypervisor Hardware Hardware Traditional Stack Virtualized Stack The iSchool University of Maryland
  • 14. 3. Different Computing Models “Why do it yourself if you can pay someone to do it for you?”  Utility computing  Why buy machines when you can rent cycles?  Examples: Amazon’s EC2, GoGrid, AppNexus  Platform as a Service (PaaS)  Give me nice API and take care of the implementation  Example: Google App Engine  Software as a Service (SaaS)  Just run it for me!  Example: Gmail The iSchool University of Maryland
  • 15. 4. Web Applications  A mistake on top of a hack built on sand held together by duct tape?  What is the nature of software applications?  From the desktop to the browser  SaaS == Web-based applications  Examples: Google Maps, Facebook  How do we deliver highly-interactive Web-based applications?  AJAX (asynchronous JavaScript and XML)  For better, or for worse… The iSchool University of Maryland
  • 16. What is the course about?  MapReduce: the “back-end” of cloud computing  Batch-oriented processing of large datasets  Ajax: the “front-end” of cloud computing  Highly-interactive Web-based applications  Computing “in the clouds”  Amazon’s EC2/S3 as an example of utility computing The iSchool University of Maryland
  • 17. Amazon Web Services  Elastic Compute Cloud (EC2)  Rent computing resources by the hour  Basic unit of accounting = instance-hour  Additional costs for bandwidth  Simple Storage Service (S3)  Persistent storage  Charge by the GB/month  Additional costs for bandwidth  You’ll be using EC2/S3 for course assignments! The iSchool University of Maryland
  • 18. This course is not for you…  If you’re not genuinely interested in the topic  If you’re not ready to do a lot of programming  If you’re not open to thinking about computing in new ways  If you can’t cope with uncertainly, unpredictability, poor documentation, and immature software  If you can’t put in the time Otherwise, this will be a richly rewarding course! The iSchool University of Maryland
  • 20. Cloud Computing Zen  Don’t get frustrated (take a deep breath)…  This is bleeding edge technology  Those W$*#T@F! moments  Be patient…  This is the second first time I’ve taught this course  Be flexible…  There will be unanticipated issues along the way  Be constructive…  Tell me how I can make everyone’s experience better The iSchool University of Maryland
  • 25. Things to go over…  Course schedule  Assignments and deliverables  Amazon EC2/S3 The iSchool University of Maryland
  • 26. Web-Scale Problems?  Don’t hold your breath:  Biocomputing  Nanocomputing  Quantum computing  …  It all boils down to…  Divide-and-conquer  Throwing more hardware at the problem Simple to understand… a lifetime to master… The iSchool University of Maryland
  • 27. Divide and Conquer “Work” Partition w1 w2 w3 “worker” “worker” “worker” r1 r2 r3 “Result” Combine The iSchool University of Maryland
  • 28. Different Workers  Different threads in the same core  Different cores in the same CPU  Different CPUs in a multi-processor system  Different machines in a distributed system The iSchool University of Maryland
  • 29. Choices, Choices, Choices  Commodity vs. “exotic” hardware  Number of machines vs. processor vs. cores  Bandwidth of memory vs. disk vs. network  Different programming models The iSchool University of Maryland
  • 30. Flynn’s Taxonomy Instructions Single (SI) Multiple (MI) SISD MISD Single (SD) Single-threaded Pipeline process architecture Data SIMD MIMD Multiple (MD) Vector Processing Multi-threaded Programming The iSchool University of Maryland
  • 31. SISD Processor D D D D D D D Instructions The iSchool University of Maryland
  • 32. SIMD Processor D0 D0 D0 D0 D0 D0 D0 D1 D1 D1 D1 D1 D1 D1 D2 D2 D2 D2 D2 D2 D2 D3 D3 D3 D3 D3 D3 D3 D4 D4 D4 D4 D4 D4 D4 … … … … … … … Dn Dn Dn Dn Dn Dn Dn Instructions The iSchool University of Maryland
  • 33. MIMD Processor D D D D D D D Instructions Processor D D D D D D D Instructions The iSchool University of Maryland
  • 34. Memory Typology: Shared Processor Processor Memory Processor Processor The iSchool University of Maryland
  • 35. Memory Typology: Distributed Processor Memory Processor Memory Network Processor Memory Processor Memory The iSchool University of Maryland
  • 36. Memory Typology: Hybrid Processor Processor Memory Memory Processor Processor Network Processor Processor Memory Memory Processor Processor The iSchool University of Maryland
  • 37. Parallelization Problems  How do we assign work units to workers?  What if we have more work units than workers?  What if workers need to share partial results?  How do we aggregate partial results?  How do we know all the workers have finished?  What if workers die? What is the common theme of all of these problems? The iSchool University of Maryland
  • 38. General Theme?  Parallelization problems arise from:  Communication between workers  Access to shared resources (e.g., data)  Thus, we need a synchronization system!  This is tricky:  Finding bugs is hard  Solving bugs is even harder The iSchool University of Maryland
  • 39. Managing Multiple Workers  Difficult because  (Often) don’t know the order in which workers run  (Often) don’t know where the workers are running  (Often) don’t know when workers interrupt each other  Thus, we need:  Semaphores (lock, unlock)  Conditional variables (wait, notify, broadcast)  Barriers  Still, lots of problems:  Deadlock, livelock, race conditions, ...  Moral of the story: be careful!  Even trickier if the workers are on different machines The iSchool University of Maryland
  • 40. Patterns for Parallelism  Parallel computing has been around for decades  Here are some “design patterns” … The iSchool University of Maryland
  • 41. Master/Slaves master slaves The iSchool University of Maryland
  • 42. Producer/Consumer Flow P C P C P C P C P C P C The iSchool University of Maryland
  • 43. Work Queues P C shared queue P W W W W W C P C The iSchool University of Maryland
  • 44. Rubber Meets Road  From patterns to implementation:  pthreads, OpenMP for multi-threaded programming  MPI for clustering computing  …  The reality:  Lots of one-off solutions, custom code  Write you own dedicated library, then program with it  Burden on the programmer to explicitly manage everything  MapReduce to the rescue!  (for next time) The iSchool University of Maryland