Infrastructure for Cloud Computing
Dahai Li

2008/06/12
Agenda

    • About Cloud Computing
    • Tools for Cloud Computing in Google
    • Google’s partnerships with universities




What’s new?




Advantages

• Data safety and reliability
• Data synchronization between different devices
• Low requirements for end devices
• Unlimited potential of the cloud
Cloud for end user




                     Google Cloud
Cloud for web developer


             Google Cloud



                            APIs
Example: Earthquake map based on Map API




Agenda

    • About Cloud Computing
    • Tools for Cloud Computing in Google
    • Google’s partnerships with universities




google.stanford.edu (circa 1997)
google.com (1999)
Google Data Center (circa 2000)
Google File System (GFS)




Why GFS?

     • Google has unusual requirements
     • Unfair advantage
     • Fun and challenging to build large-scale
      systems




GFS Architecture




     [Figure: GFS architecture. A GFS master, backed by master replicas,
      coordinates chunkservers 1 to N. Each chunkserver stores a subset of the
      chunks (C0, C1, C2, C3, C5, …), and several chunks (for example C0, C1,
      C5) appear on more than one chunkserver. Many clients access the cluster.]




Master

     • Maintain Metadata:
       – File namespace
       – Access control info
       – Maps files to chunks
     • Control system activities:
       – Monitor state of chunkservers
       – Chunk allocation and placement
       – Initiate chunk recovery and rebalancing
       – Garbage collect dead chunks
       – Collect and display stats, admin functions
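
The metadata duties listed above can be pictured with a small in-memory sketch. It is only an illustration: the class names, fields, and methods (`Master`, `ChunkInfo`, `allocate_chunk`, `lookup`, `under_replicated`) are hypothetical and are not taken from GFS itself.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkInfo:
    """Per-chunk state the master tracks (hypothetical layout)."""
    handle: int
    locations: set[str] = field(default_factory=set)  # chunkserver names


@dataclass
class Master:
    """Toy, in-memory stand-in for the master's metadata tables."""
    files: dict[str, list[int]] = field(default_factory=dict)   # path -> chunk handles
    chunks: dict[int, ChunkInfo] = field(default_factory=dict)  # handle -> chunk info
    next_handle: int = 0

    def create(self, path: str) -> None:
        # File namespace: register an empty file if it does not exist yet.
        self.files.setdefault(path, [])

    def allocate_chunk(self, path: str, servers: list[str]) -> int:
        # Chunk allocation and placement: append a new chunk to the file.
        handle = self.next_handle
        self.next_handle += 1
        self.chunks[handle] = ChunkInfo(handle, set(servers))
        self.files[path].append(handle)
        return handle

    def lookup(self, path: str, chunk_index: int) -> ChunkInfo:
        # Maps files to chunks: (file, chunk index) -> handle and replica locations.
        return self.chunks[self.files[path][chunk_index]]

    def under_replicated(self, live: set[str], want: int) -> list[int]:
        # Monitoring: chunks with fewer than `want` replicas on live
        # chunkservers are candidates for recovery.
        return [h for h, c in self.chunks.items() if len(c.locations & live) < want]


master = Master()
master.create("/logs/web.0")
master.allocate_chunk("/logs/web.0", ["cs1", "cs2", "cs3"])
print(master.lookup("/logs/web.0", 0).locations)
print(master.under_replicated(live={"cs1"}, want=2))
```

Keeping the file-to-chunk map separate from the chunk-to-location map mirrors the split in the bullet list: the first is namespace metadata, the second changes as chunkservers come and go.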
Client

     • Protocol implemented by client library
     • Read protocol




GFS Usage in Google Cloud

     • 50+ clusters
     • Filesystem clusters of up to 1000+
      machines
     • Pools of 1000+ clients
     • 10+ GB/s read/write load
       – in the presence of frequent hardware failures




MapReduce




What’s MapReduce

     • A simple programming model that applies to
      many large-scale computing problems
     • Hide messy details in MapReduce runtime
      library




Typical problem solved by MapReduce

     • Read a lot of data
     • Map: extract something you care about from
      each record
     • Shuffle and Sort
     • Reduce: aggregate, summarize, filter, or
      transform
     • Write the results




More specifically…

     • Programmer specifies two primary methods:
       – map(k, v) → <k', v'>*
       – reduce(k', <v'>*) → <k', v'>*
     • All v' with same k' are reduced together, in
      order.
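
The two signatures above can be exercised with a single-process sketch. This is not the MapReduce library: `run_mapreduce` and the "longest line per file" job are hypothetical names chosen for illustration, and a dictionary plus `sorted` stands in for the distributed shuffle and sort.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process sketch: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in map_fn(key, value):          # map(k, v) -> <k', v'>*
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):                      # shuffle and sort by k'
        output.extend(reduce_fn(k2, groups[k2]))   # reduce(k', <v'>*) -> <k', v'>*
    return output


# Hypothetical job: length of the longest line seen per file.
def map_longest(filename, line):
    yield filename, len(line)


def reduce_longest(filename, lengths):
    yield filename, max(lengths)


records = [("a.txt", "hello"), ("b.txt", "hi"), ("a.txt", "hello world")]
print(run_mapreduce(records, map_longest, reduce_longest))
# [('a.txt', 11), ('b.txt', 2)]
```

Both user functions are generators, so each may emit zero, one, or many pairs, which matches the `*` in the signatures.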




Example: Word Frequencies in Web Pages

     • Input is files with one document per record
     • Specify a map function that takes a key/value pair
       – key = document URL
       – value = document contents
     • Output of map function is (potentially many) key/value
      pairs.
       – In our case, output (word, “1”) once per word in the
          document
                        <“网页1”, “是也不是”>

                              <“是”, “1”>
                              <“也”, “1”>
                              <“不”, “1”>
                              …
Continued: word frequencies in web pages

     • MapReduce library gathers together all pairs with the
      same key (shuffle/sort)
     • The reduce function combines the values for a key
       – In our case, compute the sum
     key = “是”               key = “也”         key = “不”
     values = “1”, “1”       values = “1”      values = “1”

           “2”                   “1”               “1”

     • Output of reduce (usually 0 or 1 value) paired with key
      and saved
                             “是”, “2”
                             “也”, “1”
                             “不”, “1”


Example: Pseudo-code


     Map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

     Reduce(String key, Iterator intermediate_values):
      // key: a word, same for input and output
      // intermediate_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));
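
For readers who want to run the example, here is one possible Python rendering of the pseudo-code. It is a sketch under assumptions: `map_words`, `reduce_counts`, `word_frequencies`, and the `tokenize` parameter are names invented here, and sorting plus `itertools.groupby` stands in for the library's shuffle/sort. Unlike the pseudo-code's `Emit`, the reduce step returns the key together with the result.

```python
from itertools import groupby
from operator import itemgetter


def map_words(input_key, input_value, tokenize=str.split):
    # input_key: document name; input_value: document contents
    for word in tokenize(input_value):
        yield word, "1"


def reduce_counts(key, intermediate_values):
    # key: a word; intermediate_values: an iterable of counts as strings
    return key, str(sum(int(v) for v in intermediate_values))


def word_frequencies(documents, tokenize=str.split):
    pairs = [kv for name, text in documents
             for kv in map_words(name, text, tokenize)]
    pairs.sort(key=itemgetter(0))  # stand-in for the library's shuffle/sort
    return [reduce_counts(word, (v for _, v in group))
            for word, group in groupby(pairs, key=itemgetter(0))]


print(word_frequencies([("page1", "to be or not to be")]))
# [('be', '2'), ('not', '1'), ('or', '1'), ('to', '2')]

# The slide's example treats every character as a word, so tokenize with list().
print(dict(word_frequencies([("网页1", "是也不是")], tokenize=list)))
```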



Conclusion to MapReduce

     • MapReduce has proven to be a remarkably useful
      abstraction
     • Greatly simplifies large-scale computations at Google
     • Fun to use: focus on the problem, let the library deal with
      messy details
     • Many thousands of parallel programs written by
      hundreds of different programmers in the last few years
       – Many had no prior parallel or distributed programming
          experience




BigTable




Overview

     • Structured data storage, not a database
     • Wide applicability
     • Scalability
     • High performance
     • High availability




Basic Data Model

     • Distributed multi-dimensional sparse map
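           (row, column, timestamp) → cell contents

           [Figure: rows (for example “www.cnn.com”) against columns (for
            example “contents”); each cell holds versions at timestamps
            t1, t2, t3, with cell contents such as “<html>…”.]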




     • Good match for most of our applications
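
The data model can be mimicked with a plain dictionary. This is a toy sketch, not Bigtable code: the `put` and `get` helpers, the integer timestamps, and the sample cell contents are all assumptions made for illustration.

```python
from collections import defaultdict

# (row, column) -> {timestamp: cell contents}; absent cells take no space (sparse).
table = defaultdict(dict)


def put(row, column, timestamp, contents):
    table[(row, column)][timestamp] = contents


def get(row, column, timestamp=None):
    """Return the newest version at or before `timestamp` (newest overall if None)."""
    versions = table.get((row, column), {})
    eligible = [t for t in versions if timestamp is None or t <= timestamp]
    return versions[max(eligible)] if eligible else None


put("www.cnn.com", "contents", 1, "<html>v1")
put("www.cnn.com", "contents", 3, "<html>v3")
print(get("www.cnn.com", "contents"))       # <html>v3
print(get("www.cnn.com", "contents", 2))    # <html>v1
print(get("www.cnn.com", "anchor"))         # None
```

Only cells that were written occupy an entry, which is what "sparse" means here, and the timestamp dimension lets a reader ask for an older version of the same cell.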


BigTable API

     • Metadata operations
       – Create/delete tables, column families, change metadata
     • Writes (atomic)
       – Set(): write cells in a row
       – DeleteCells(): delete cells in a row
       – DeleteRow(): delete all cells in a row
     • Reads
       – Scanner: read arbitrary cells in a bigtable
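
To make the shape of these operations concrete, here is a toy in-memory table. The method names echo the slide, but the signatures, the `ToyTable` class, and the single-lock approach to atomic writes are assumptions for illustration, not the real BigTable client API.

```python
import threading


class ToyTable:
    """In-memory stand-in; method names follow the slide, signatures are guesses."""

    def __init__(self):
        self._rows = {}                    # row key -> {column: value}
        self._lock = threading.Lock()      # one lock, so each write is atomic

    def set(self, row, cells):
        # Set(): write several cells in one row as a single atomic step.
        with self._lock:
            self._rows.setdefault(row, {}).update(cells)

    def delete_cells(self, row, columns):
        # DeleteCells(): remove the named cells from one row.
        with self._lock:
            cells = self._rows.get(row, {})
            for column in columns:
                cells.pop(column, None)
            if not cells:
                self._rows.pop(row, None)  # drop rows that became empty

    def delete_row(self, row):
        # DeleteRow(): remove every cell in the row.
        with self._lock:
            self._rows.pop(row, None)

    def scan(self, start_row="", end_row=None, columns=None):
        # Scanner: yield (row, column, value) for rows in [start_row, end_row).
        with self._lock:
            snapshot = {r: dict(c) for r, c in self._rows.items()}
        for row in sorted(snapshot):
            if row < start_row or (end_row is not None and row >= end_row):
                continue
            for column in sorted(snapshot[row]):
                if columns is None or column in columns:
                    yield row, column, snapshot[row][column]


t = ToyTable()
t.set("www.cnn.com", {"contents": "<html>…", "language": "EN"})
t.set("www.example.com", {"contents": "<html>"})
t.delete_cells("www.cnn.com", ["language"])
t.delete_row("www.example.com")
print(list(t.scan()))  # [('www.cnn.com', 'contents', '<html>…')]
```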




System Structure

     • Bigtable cell
       – Bigtable master: performs metadata ops, load balancing
       – Bigtable tablet servers: serve data
       – Bigtable client, using the Bigtable client library: Open()
     • Services the cell relies on
       – Cluster scheduling master: handles failover, monitoring
       – GFS: holds tablet data, logs
       – Lock service: holds metadata, handles master-election

Current status of BigTable

     • Design/initial implementation started at the beginning of 2004
     • Currently ~100 BigTable cells
     • Production use or active development for many projects:
        – Google Print
        – My Search History
        – Orkut
        – Crawling/indexing pipeline
        – Google Maps/Google Earth
        – Blogger
        – …
     • Largest bigtable cell manages ~200TB of data spread
       over several thousand machines (larger cells planned)



Typical Cluster



     • Cluster-wide services: lock service, GFS master, scheduling masters
     • Machines 1 to N each run Linux, a scheduler slave, and a GFS
      chunkserver
     • User applications (app1, app2, app3) are spread across the machines




Agenda

     • About Cloud Computing
     • Tools for Cloud Computing in Google
     • Google’s partnerships with universities




ACCI in Oct. 2007

     • Stands for Academic Cloud Computing
      Initiative
     • IBM and Google partnership
     • Helps universities teach distributed-systems
      programming skills
     • Started at the University of Washington and
      is scaling to many others




Google’s ACCI activities in Greater China

• Google Greater China has helped create a
 cloud computing course at Tsinghua in
 summer 2007
• Now scaling to other universities in mainland
 China and Taiwan
Example: THU MR Course, Fall 2007

• “Massive Data Processing” course based
 on Google Cloud technology
• Google employees gave lectures during
 the course
• Students produced interesting results



• http://hpc.cs.tsinghua.edu.cn/dpcourse/
Continued: THU MR Course, Fall 2007




• Students presenting the course project “simulating the operation
 of solar system based on MapReduce technology” at the Google office
• Massive data processing to simulate the operation of
 the solar system
THANK YOU



More info on
        http://code.google.com/intl/zh-CN/

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Infrastructure for Cloud Computing

  • 2. Agenda • About Cloud Computing • Tools for Cloud Computing in Google • Google’s partnerships with universities
  • 4. Advantages • Data safety and reliability • Data synchronization across devices • Low requirements for the end-user device • Unlimited potential of the cloud
  • 5. Cloud for end user Google Cloud
  • 6. Cloud for web developer Google Cloud APIs
  • 7. Example: Earthquake map based on the Maps API
  • 8. Agenda • About Cloud Computing • Tools for Cloud Computing in Google • Google’s partnerships with universities
  • 11. Google Data Center (circa 2000)
  • 12. Google File System (GFS)
  • 13. Why GFS? • Google has unusual requirements • Unfair advantage • Fun and challenging to build large-scale systems
  • 14. GFS Architecture [Diagram: a GFS master with replica masters; many clients; Chunkserver 1, Chunkserver 2, …, Chunkserver N, each holding a subset of the chunks (C0, C1, C2, C3, C5, …), with every chunk stored on more than one chunkserver]
  • 15. Master • Maintains metadata: – File namespace – Access control info – Mapping from files to chunks • Controls system activities: – Monitors the state of chunkservers – Chunk allocation and placement – Initiates chunk recovery and rebalancing – Garbage-collects dead chunks – Collects and displays stats, admin functions
  • 16. Client • Protocol implemented by client library • Read protocol
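The slide names the read protocol without spelling it out. As a rough illustration only, the sketch below assumes the usual split implied by the Master slide: the client library asks the master which chunk holds a byte range (metadata only), then fetches the bytes from a chunkserver replica. All class and function names, the file path, and the tiny chunk size are hypothetical, chosen for the demo.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # bytes; deliberately tiny, a hypothetical value for this demo


@dataclass
class Chunkserver:
    chunks: dict = field(default_factory=dict)  # chunk handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


@dataclass
class Master:
    # Metadata only: file name -> list of (chunk handle, replica chunkservers).
    files: dict = field(default_factory=dict)

    def lookup(self, name, chunk_index):
        return self.files[name][chunk_index]


def client_read(master, name, offset, length):
    """Ask the master where each chunk lives, then read from a chunkserver."""
    out = b""
    while length > 0:
        index, within = divmod(offset, CHUNK_SIZE)
        handle, replicas = master.lookup(name, index)  # metadata request
        n = min(length, CHUNK_SIZE - within)
        out += replicas[0].read(handle, within, n)     # data request
        offset += n
        length -= n
    return out


cs1, cs2 = Chunkserver(), Chunkserver()
cs1.chunks = {"c0": b"abcd", "c1": b"efgh"}
cs2.chunks = {"c0": b"abcd", "c1": b"efgh"}
master = Master(files={"/logs/a": [("c0", [cs1, cs2]), ("c1", [cs2, cs1])]})
data = client_read(master, "/logs/a", 2, 4)  # spans the c0/c1 boundary
```

In this toy version file data never passes through the master; it only answers lookups, which is the point of keeping metadata and data paths separate.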
  • 17. GFS Usage in Google Cloud • 50+ clusters • Filesystem clusters of up to 1000+ machines • Pools of 1000+ clients • 10+ GB/s read/write load – in the presence of frequent hardware failures
  • 19. What is MapReduce? • A simple programming model that applies to many large-scale computing problems • Hides messy details in the MapReduce runtime library
  • 20. Typical problem solved by MapReduce • Read a lot of data • Map: extract something you care about from each record • Shuffle and Sort • Reduce: aggregate, summarize, filter, or transform • Write the results
  • 21. More specifically… • Programmer specifies two primary methods: – map(k, v) → <k', v'>* – reduce(k', <v'>*) → <k', v'>* • All v' with same k' are reduced together, in order.
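To make the two signatures concrete, here is a minimal single-process sketch of the model, assuming everything fits in memory. It is not Google's runtime library: `run_mapreduce`, the example job, and the `example` host names are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, shuffle/sort, and reduce in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in map_fn(key, value):         # map(k, v) -> <k', v'>*
            groups[k2].append(v2)                 # shuffle: group by k'
    output = []
    for k2 in sorted(groups):                     # sort: keys in order
        output.extend(reduce_fn(k2, groups[k2]))  # reduce(k', <v'>*) -> <k', v'>*
    return output


# Hypothetical job: total bytes per host from (url, size) records.
def map_fn(url, size):
    host = url.split("/")[2]
    yield host, size


def reduce_fn(host, sizes):
    yield host, sum(sizes)


records = [
    ("http://a.example/x", 10),
    ("http://b.example/y", 5),
    ("http://a.example/z", 7),
]
result = run_mapreduce(records, map_fn, reduce_fn)
```

The programmer writes only `map_fn` and `reduce_fn`; grouping, ordering, and (in a real system) distribution and fault handling belong to the library.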
  • 22. Example: Word Frequencies in Web Pages • Input is files with one document per record • Specify a map function that takes a key/value pair – key = document URL – value = document contents • Output of map function is (potentially many) key/value pairs – In our case, output (word, “1”) once per word in the document • Example: input <“网页1” (“web page 1”), “是也不是”> yields <“是”, “1”>, <“也”, “1”>, <“不”, “1”>, …
  • 23. Continued: Word Frequencies in Web Pages • MapReduce library gathers together all pairs with the same key (shuffle/sort) • The reduce function combines the values for a key – In our case, compute the sum: key = “是”, values = “1”, “1” → “2”; key = “也”, values = “1” → “1”; key = “不”, values = “1” → “1” • Output of reduce (usually 0 or 1 value) is paired with the key and saved: <“是”, “2”>, <“也”, “1”>, <“不”, “1”>
  • 24. Example: Pseudo-code

      Map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
          EmitIntermediate(w, "1");

      Reduce(String key, Iterator intermediate_values):
        // key: a word, same for input and output
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
          result += ParseInt(v);
        Emit(AsString(result));
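The pseudo-code can be tried out in ordinary Python. The sketch below is a loose translation, not the MapReduce library itself: it runs in one process, uses a sort plus `itertools.groupby` as a stand-in for shuffle/sort, and treats each character as one word so that it matches the Chinese example document from the earlier slide. The function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_words(input_key, input_value):
    # input_key: document name; input_value: document contents.
    # Each character counts as one word (an assumption that fits the example).
    for w in input_value:
        yield w, "1"


def reduce_counts(key, intermediate_values):
    # key: a word; intermediate_values: counts encoded as strings.
    result = 0
    for v in intermediate_values:
        result += int(v)
    yield key, str(result)


documents = [("网页1", "是也不是")]

# Map phase: one (word, "1") pair per word occurrence.
pairs = [kv for name, text in documents for kv in map_words(name, text)]
# Shuffle/sort: bring pairs with equal keys next to each other.
pairs.sort(key=itemgetter(0))
# Reduce phase: one call per distinct key.
counts = {}
for word, group in groupby(pairs, key=itemgetter(0)):
    for k, total in reduce_counts(word, (v for _, v in group)):
        counts[k] = total
```

As on the slides, counts travel as strings between the phases, so the reducer parses them before summing and converts the sum back to a string.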
  • 25. Conclusion to MapReduce • MapReduce has proven to be a remarkably useful abstraction • Greatly simplifies large-scale computations at Google • Fun to use: focus on the problem, let the library deal with messy details • Many thousands of parallel programs written by hundreds of different programmers in the last few years – Many had no prior parallel or distributed programming experience
  • 27. Overview • Structured data storage, not a database • Wide applicability • Scalability • High performance • High availability
  • 28. Basic Data Model • Distributed multi-dimensional sparse map: (row, column, timestamp) → cell contents [Diagram: row “www.cnn.com”, column “contents”, with versions at timestamps t1, t2, t3 holding “<html>…”] • Good match for most of our applications
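The mapping (row, column, timestamp) → cell contents can be mimicked with a plain dictionary. This is only an in-memory sketch of the data model, with nothing distributed about it; the `SparseMap` class, its method names, and the stored byte strings are hypothetical.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[str, str, int]  # (row, column, timestamp)


class SparseMap:
    """Sparse: only cells that were written take up space."""

    def __init__(self) -> None:
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before timestamp (newest if None)."""
        versions = [
            (ts, v) for (r, c, ts), v in self._cells.items()
            if r == row and c == column
            and (timestamp is None or ts <= timestamp)
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]


table = SparseMap()
table.put("www.cnn.com", "contents", 1, b"<html>v1")
table.put("www.cnn.com", "contents", 2, b"<html>v2")
table.put("www.cnn.com", "contents", 3, b"<html>v3")

latest = table.get("www.cnn.com", "contents")
as_of_t2 = table.get("www.cnn.com", "contents", timestamp=2)
missing = table.get("www.example.org", "contents")
```

The linear scan in `get` keeps the sketch short; a real store would keep cells sorted by key so that versions of one cell sit together.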
  • 29. BigTable API • Metadata operations – Create/delete tables, column families, change metadata • Writes (atomic) – Set(): write cells in a row – DeleteCells(): delete cells in a row – DeleteRow(): delete all cells in a row • Reads – Scanner: read arbitrary cells in a bigtable
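To show how the listed operations fit together, here is a toy in-memory table whose methods are named after the slide's Set(), DeleteCells(), DeleteRow(), and Scanner. It is not the BigTable client API: the signatures, the `ToyTable` class, the `anchor` column, and the use of a single lock to make each write atomic are all assumptions, and timestamps are left out for brevity.

```python
import threading
from collections import defaultdict


class ToyTable:
    def __init__(self):
        self._rows = defaultdict(dict)  # row key -> {column: value}
        self._lock = threading.Lock()

    def set(self, row, cells):
        """Write several cells of one row as a single atomic step."""
        with self._lock:
            self._rows[row].update(cells)

    def delete_cells(self, row, columns):
        """Delete the named cells of one row."""
        with self._lock:
            for column in columns:
                self._rows[row].pop(column, None)

    def delete_row(self, row):
        """Delete all cells of one row."""
        with self._lock:
            self._rows.pop(row, None)

    def scan(self, start_row="", end_row=None):
        """Yield (row, column, value) for rows in [start_row, end_row)."""
        with self._lock:
            snapshot = {r: dict(c) for r, c in self._rows.items() if c}
        for row in sorted(snapshot):
            if row >= start_row and (end_row is None or row < end_row):
                for column in sorted(snapshot[row]):
                    yield row, column, snapshot[row][column]


table = ToyTable()
table.set("www.cnn.com", {"contents": "<html>…", "anchor": "CNN"})
table.set("www.example.org", {"contents": "<html>…"})
table.delete_cells("www.cnn.com", ["anchor"])
table.delete_row("www.example.org")
rows = list(table.scan())
```

Each write touches exactly one row, which mirrors the slide's grouping of Set(), DeleteCells(), and DeleteRow() under atomic writes.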
  • 30. System Structure [Diagram of a Bigtable cell] • Bigtable client, with the Bigtable client library: Open() • Bigtable master: performs metadata ops, load balancing • Bigtable tablet servers: serve data • Cluster scheduling master: handles failover, monitoring • GFS: holds tablet data, logs • Lock service: holds metadata, handles master-election
  • 31. Current status of BigTable • Design/initial implementation started at the beginning of 2004 • Currently ~100 BigTable cells • Production use or active development for many projects: – Google Print – My Search History – Orkut – Crawling/indexing pipeline – Google Maps/Google Earth – Blogger – … • Largest bigtable cell manages ~200 TB of data spread over several thousand machines (larger cells planned)
  • 32. Typical Cluster [Diagram] • Cluster-wide services: lock service, GFS master, scheduling masters • Machine 1, Machine 2, …, Machine N: each runs a mix of user apps (app1, app2, app3), a scheduler slave, and a GFS chunkserver, all on Linux
  • 33. Agenda • About Cloud Computing • Tools for Cloud Computing in Google • Google’s partnerships with universities
  • 34. ACCI in Oct. 2007 • Stands for Academic Cloud Computing Initiative • IBM and Google partnership • Helps universities teach distributed systems programming skills • Started at the University of Washington and is scaling to many others
  • 35. Google’s ACCI activities in Greater China • Google Greater China helped create a cloud computing course at Tsinghua in summer 2007 • Now scaling to other universities in mainland China and Taiwan
  • 36. Example: THU MR Course, Fall 2007 • “Massive Data Processing” course based on Google Cloud technology • Google employees gave lectures during the course • Students produced interesting results • http://hpc.cs.tsinghua.edu.cn/dpcourse/
  • 37. Continued: THU MR Course, Fall 2007 • Massive data processing to simulate the operation of the solar system • Students presenting the course project “Simulating the operation of the solar system based on MapReduce technology” at the Google office
  • 38. Thank you • More info at http://code.google.com/intl/zh-CN/