Google Base Infrastructure: GFS and MapReduce
   Johannes Passing
Agenda
     Context
     Google File System
       Aims
       Design
       Walk through
     MapReduce
       Aims
       Design
       Walk through
     Conclusion


13.04.2008                Johannes Passing   2
Context
     Google's scalability strategy
       Cheap hardware
       Lots of it
       More than 450,000 servers (NYT, 2006)
     Basic Problems
       Faults are the norm
       Software must be massively parallelized
       Writing distributed software is hard




GFS: Aims
     Aims
        Must scale to thousands of servers
        Fault tolerance/Data redundancy
        Optimize for large WORM files
        Optimize for large reads
        Favor throughput over latency
     Non-Aims
        Space efficiency
        Client side caching futile
        POSIX compliance

      Significant deviation from standard DFS (NFS, AFS, DFS, …)

Design Choices
     File system cluster
         Spread data over thousands of machines
         Provide safety by redundant storage
         Master/Slave architecture




Design Choices
     Files split into chunks
         Unit of management and distribution
         Replicated across servers
         Subdivided into blocks for integrity checks

             Accept internal fragmentation for better throughput
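A minimal sketch of how a byte offset could map onto chunks and checksum blocks. The sizes (64 MB chunks, 64 KB blocks) are taken from the GFS paper, not from these slides, and the function names `locate` and `block_checksums` are hypothetical.

```python
import zlib

CHUNK_SIZE = 64 * 1024 * 1024   # assumption: 64 MB chunks, as in the GFS paper
BLOCK_SIZE = 64 * 1024          # assumption: 64 KB checksum blocks


def locate(offset: int) -> tuple[int, int, int]:
    """Map a file byte offset to (chunk index, block index in chunk, offset in block)."""
    chunk_index, offset_in_chunk = divmod(offset, CHUNK_SIZE)
    block_index, offset_in_block = divmod(offset_in_chunk, BLOCK_SIZE)
    return chunk_index, block_index, offset_in_block


def block_checksums(chunk_data: bytes) -> list[int]:
    """Compute one CRC32 per block so corruption can be pinned to a single block."""
    return [
        zlib.crc32(chunk_data[start:start + BLOCK_SIZE])
        for start in range(0, len(chunk_data), BLOCK_SIZE)
    ]
```

Per-block checksums mean a reader only has to verify the blocks it actually touches, not the whole chunk.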



     [Figure with labels: GFS, File System]


Architecture: Chunkserver
     Host chunks
       Store chunks as files in Linux FS
       Verify data integrity
       Implemented entirely in user mode

     Fault tolerance
        Fail requests on integrity violation detection
        All chunks stored on at least one other server
        Re-replication restores server after downtime




Architecture: Master
     Namespace and metadata management

     Cluster management
        Chunk location registry (transient)
        Chunk placement decisions
        Re-replication, Garbage collection
        Health monitoring

     Fault tolerance
        Backed by shadow masters
        Mirrored operations log
        Periodic snapshots

Reading a chunk
     Send filename and offset
     Determine file and chunk
     Return chunk locations and version
     Choose closest server and request chunk
     Verify chunk version and block checksums
     Return data if valid, else fail request
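The read steps can be sketched with a toy in-memory master and chunkserver. This is an illustration under assumptions, not the GFS API: the class and method names, the tiny chunk size and the single whole-chunk checksum are all hypothetical simplifications.

```python
import zlib
from dataclasses import dataclass, field

CHUNK_SIZE = 8  # assumption: tiny chunks so the example is easy to follow


@dataclass
class Chunkserver:
    # handle -> (version, data, checksum)
    chunks: dict = field(default_factory=dict)

    def store(self, handle, version, data):
        self.chunks[handle] = (version, data, zlib.crc32(data))

    def read(self, handle, expected_version):
        version, data, checksum = self.chunks[handle]
        # Verify chunk version and checksum; fail the request otherwise.
        if version != expected_version or zlib.crc32(data) != checksum:
            raise IOError("stale or corrupt replica")
        return data


@dataclass
class Master:
    # (filename, chunk index) -> (handle, version, replicas ordered by distance)
    table: dict = field(default_factory=dict)

    def lookup(self, filename, offset):
        return self.table[(filename, offset // CHUNK_SIZE)]


def read_chunk(master, filename, offset):
    handle, version, replicas = master.lookup(filename, offset)
    for server in replicas:          # closest replica first
        try:
            return server.read(handle, version)
        except IOError:
            continue                 # failed request: try the next replica
    raise IOError("no valid replica")
```

The master only hands out locations and versions; the data itself flows directly between client and chunkserver.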




Writing a chunk
     Send filename and offset
     Determine lease owner/grant lease
     Return chunk locations and lease owner
     Push data to any chunkserver
     Forward along replica chain
     Send write request to primary
     Create mutation order, apply
     Forward write request
     Validate blocks, apply modifications, increment version, acknowledge
     Reply to client
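A minimal sketch of why the primary (the lease owner) assigns the mutation order: data is pushed to all replicas first, and the write request then fixes a single serial order that every replica applies. The classes `Replica` and `Primary` and their methods are hypothetical; leases, versions and checksums are left out.

```python
class Replica:
    def __init__(self, size=4):
        self.buffer = {}               # data id -> bytes pushed but not yet applied
        self.chunk = bytearray(size)   # chunk contents
        self.applied = []              # serial numbers in application order

    def push(self, data_id, data):
        self.buffer[data_id] = data

    def apply(self, serial, data_id, offset):
        data = self.buffer.pop(data_id)
        self.chunk[offset:offset + len(data)] = data
        self.applied.append(serial)


class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id, offset):
        # The primary picks the serial number and applies the mutation locally.
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id, offset)
        # Secondaries apply the same mutation in the same order.
        for replica in self.secondaries:
            replica.apply(serial, data_id, offset)
        return serial
```

Because overlapping writes are applied in one agreed order, all replicas end up with identical bytes even when clients write concurrently.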
Appending a record (1/2)
     Send append request
     Choose offset
     Return chunk locations and lease owner
     Push data to any chunkserver
     Forward along replica chain
     Send append request
     Check for remaining space in chunk
     Pad chunk and request retry
     Request chunk allocation
     Return chunk locations and lease owner
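The pad-and-retry rule can be sketched as follows. This is a single-replica illustration under assumptions: the `Chunk` class, the `record_append` function and the 8-byte chunk size are hypothetical, and the retry is folded into one call instead of a round trip to the client.

```python
CHUNK_SIZE = 8  # assumption: tiny chunks for illustration


class Chunk:
    def __init__(self):
        self.data = bytearray()

    def free(self):
        return CHUNK_SIZE - len(self.data)


def record_append(chunks, record):
    """Append a record atomically; return the (chunk index, offset) that was chosen."""
    if len(record) > CHUNK_SIZE:
        raise ValueError("record larger than a chunk")
    last = chunks[-1]
    if last.free() < len(record):
        # Not enough room: pad the chunk, then retry on a freshly allocated one.
        last.data += b"\0" * last.free()
        chunks.append(Chunk())
        last = chunks[-1]
    offset = len(last.data)
    last.data += record
    return len(chunks) - 1, offset
```

A record never straddles a chunk boundary, at the price of padding that readers have to skip.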
Appending a record (2/2)
     Send append request
     Allocate chunk, write data
     Forward request to replicas
     Write data, failure occurs
     Fail request
     Client retries
     Write data
     Forward request to replicas
     Write data
     Success

     Resulting chunk contents on the three replicas:
         A          A          A
         Padding    Padding    Padding
         B          B          Garbage
         B          B          B
         Free       Free       Free
         Free       Free       Free

File Region Consistency
     File region states:
         Consistent: all clients see the same data
         Defined: consistent, and clients see the change in its entirety
         Inconsistent
     Consequences
       Unique record IDs to identify duplicates
       Records should be self-identifying and self-validating
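     [Figure: regions on three replicas of a chunk. Region B, B, Garbage: inconsistent. Region B, B, B: defined. Region with mixed fragments of C and D on every replica: consistent. Region C, C, C: defined.]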
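The two consequences (unique record IDs, self-validating records) can be sketched as a reader that skips padding and garbage and drops duplicates. The record layout used here (a magic marker, a 64-bit ID, a length and a CRC32) is a hypothetical format chosen for illustration, not one prescribed by GFS.

```python
import struct
import zlib

MAGIC = b"REC!"                      # hypothetical marker that starts every record
HEADER = struct.Struct(">4sQII")     # magic, record id, payload length, CRC32


def encode(record_id: int, payload: bytes) -> bytes:
    return HEADER.pack(MAGIC, record_id, len(payload), zlib.crc32(payload)) + payload


def read_records(region: bytes):
    """Yield each valid payload once, skipping padding, garbage and duplicates."""
    seen = set()
    pos = 0
    while pos + HEADER.size <= len(region):
        magic, record_id, length, checksum = HEADER.unpack_from(region, pos)
        end = pos + HEADER.size + length
        payload = region[pos + HEADER.size:end]
        valid = (
            magic == MAGIC
            and len(payload) == length
            and zlib.crc32(payload) == checksum
        )
        if not valid:
            pos += 1                 # no valid record here: resynchronise
            continue
        if record_id not in seen:    # a retried append shows up as a duplicate
            seen.add(record_id)
            yield payload
        pos = end
```

With such a reader, at-least-once appends plus inconsistent regions still give the application exactly-once records.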

Fault tolerance GFS
     Automatic re-replication
     Replicas are spread across racks

     Fault mitigations
        Data corruption: choose a different replica
        Machine crash: choose a replica on a different machine
        Network outage: choose a replica in a different network
        Master crash: shadow masters




Wrap-up GFS
     Proven scalability, fault tolerance and performance
     Highly specialized
     Distribution transparent to clients
     Only partial abstraction – clients must cooperate
     Network is the bottleneck




MapReduce: Aims
     Aims
        Unified model for large scale distributed data processing
        Massively parallelizable
        Distribution transparent to developer
        Fault tolerance
        Allow moving of computation close to data




Higher order functions
     Functional: map(f, [a, b, c, …])
         [a, b, c, …]  →  [f(a), f(b), f(c), …]

     Google: map(k, v)
         (k, v)  →  map  →  (f(k1), f(v1)) (f(k1), f(v1)) (f(k1), f(v1))
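A short sketch contrasting the two meanings of map. The word-count job and the names `functional_map` and `word_count_map` are hypothetical examples, not part of the slides.

```python
def functional_map(f, items):
    """Classic map: apply f to every element independently."""
    return [f(x) for x in items]


def word_count_map(key, value):
    """MapReduce-style map: one input pair in, any number of intermediate pairs out.

    key is a document name and value its text (hypothetical word-count job).
    """
    for word in value.split():
        yield word, 1
```

The functional version returns exactly one result per element, while the MapReduce version may emit zero, one or many key/value pairs per input pair.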



Higher order functions
     Functional: foldl/reduce(f, [a, b, c, …], i)
         f(f(f(i, a), b), c)

     Google: reduce(k, [a, b, c, d, …])
         [a, b, c, d, …]  →  f(a), –, f(c, d)
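A short sketch contrasting a left fold with a MapReduce-style reduce. `functional_fold` and `word_count_reduce` are hypothetical names, and the word-count job is only an example.

```python
from functools import reduce as foldl


def functional_fold(f, items, initial):
    """Left fold: f(f(f(initial, a), b), c) for items [a, b, c]."""
    return foldl(f, items, initial)


def word_count_reduce(key, values):
    """MapReduce-style reduce: all values of one key in, zero or more outputs out."""
    yield key, sum(values)
```

The fold threads one accumulator through the whole list, whereas the MapReduce reduce is called once per key and is free to emit nothing at all.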


Idea
     Observation
       Map can be easily parallelized
       Reduce might be parallelized

     Idea
        Design application around map and reduce scheme
        Infrastructure manages scheduling and distribution
        User implements map and reduce

     Constraints
       All data coerced into key/value pairs

Walkthrough
     Spawn master
     Provide data
         e.g. from GFS or BigTable
     Create M splits
     Spawn M mappers
     Map
         Once per key/value pair
         Partition into R buckets
     Spawn up to R*M combiners
     Combine
     Barrier
     Spawn up to R reducers
     Reduce

     [Figure: data flow from mappers through combiners to reducers]
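The walkthrough can be sketched as a single-process simulation. Everything here is an assumption made for illustration: `run_mapreduce` is a hypothetical function, the phases run sequentially instead of on separate machines, and hash partitioning is just one possible way to fill the R buckets.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn, combine_fn=None, M=2, R=2):
    """Run the walkthrough steps sequentially in one process (no real parallelism)."""
    inputs = list(inputs)
    splits = [inputs[i::M] for i in range(M)]             # create M splits

    # Map: each "mapper" partitions its output into R buckets.
    buckets = [[defaultdict(list) for _ in range(R)] for _ in range(M)]
    for m, split in enumerate(splits):
        for key, value in split:
            for k2, v2 in map_fn(key, value):
                buckets[m][hash(k2) % R][k2].append(v2)

    # Combine: optional local pre-aggregation, one combiner per (mapper, bucket).
    if combine_fn is not None:
        for m in range(M):
            for r in range(R):
                for k2, values in buckets[m][r].items():
                    buckets[m][r][k2] = [combine_fn(k2, values)]

    # Barrier: all map output exists before any reducer starts.
    output = {}
    for r in range(R):                                    # up to R reducers
        grouped = defaultdict(list)
        for m in range(M):
            for k2, values in buckets[m][r].items():
                grouped[k2].extend(values)
        for k2, values in grouped.items():
            output[k2] = reduce_fn(k2, values)
    return output


docs = [("d1", "a b a"), ("d2", "b c")]
counts = run_mapreduce(
    docs,
    map_fn=lambda name, text: [(word, 1) for word in text.split()],
    reduce_fn=lambda word, partial_counts: sum(partial_counts),
    combine_fn=lambda word, ones: sum(ones),
)
```

The combiner is only safe here because addition is associative and commutative; a job without that property would have to run without one.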

Scheduling
     Locality
        Mappers scheduled close to data
        Chunk replication improves locality
        Reducers run on same machine as mappers

     Choosing M and R
       Load balancing & fast recovery vs. number of output files
          M >> R, R small multiple of #machines

     Schedule backup tasks to avoid stragglers


Fault tolerance MapReduce
     Fault mitigations
        Map crash
            All intermediate data in questionable state
            Repeat all map tasks of machine
            Rationale of barrier
        Reduce crash
            Data is global, completed stages marked
            Repeat crashed reduce task only
        Master crash
            Start over
        Repeated crashes on certain records
            Skip records
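A minimal sketch of record skipping, assuming a record is skipped once it has crashed the task twice. The function `run_task_with_skipping` and its `max_failures` parameter are hypothetical, and a Python exception stands in for a worker crash.

```python
def run_task_with_skipping(records, process, max_failures=2):
    """Re-run a task after each crash; skip records that crashed max_failures times."""
    failures = {}   # record index -> crashes seen so far (kept by the master)
    while True:
        results = []
        current = None
        try:
            for index, record in enumerate(records):
                if failures.get(index, 0) >= max_failures:
                    continue                      # known bad record: skip it
                current = index
                results.append(process(record))
            return results
        except Exception:
            # The worker reports the offending record, then the task restarts.
            failures[current] = failures.get(current, 0) + 1
```

Skipping trades completeness for progress, which is acceptable when a few bad records do not matter to the overall result.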
Wrap-up MapReduce
     Proven scalability
     Restrict programming model for better runtime support
     Tailored to Google’s needs
     Programming model is fairly low-level




Conclusion
     Rethinking infrastructure
        Consistent design for scalability and performance
        Highly specialized solutions
        Not generally applicable
        Solutions more low-level than usual
     Maintenance efforts may be significant


             "We believe we get tremendous competitive advantage by
             essentially building our own infrastructures"
             --Eric Schmidt




References
     Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", 2004
     Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System", 2003
     Ralf Lämmel, "Google's MapReduce Programming Model – Revisited", SCP journal, 2006
     Google, "Cluster Computing and MapReduce", http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html, 2007
     David F. Carr, "How Google Works", http://www.baselinemag.com/c/a/Projects-Networks-and-Storage/How-Google-Works/, 2006
     Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis, "Evaluating MapReduce for Multi-core and Multiprocessor Systems", 2007

Weitere ähnliche Inhalte

Was ist angesagt?

Simple layouts for ECKD and zfcp disk configurations on Linux on System z
Simple layouts for ECKD and zfcp disk configurations on Linux on System zSimple layouts for ECKD and zfcp disk configurations on Linux on System z
Simple layouts for ECKD and zfcp disk configurations on Linux on System zIBM India Smarter Computing
 
Beyond Disaster Recovery: Restoring Production Workloads with PlateSpin Forge
Beyond Disaster Recovery: Restoring Production Workloads with PlateSpin ForgeBeyond Disaster Recovery: Restoring Production Workloads with PlateSpin Forge
Beyond Disaster Recovery: Restoring Production Workloads with PlateSpin ForgeNovell
 
Large customers want postgresql too !!
Large customers want postgresql too !!Large customers want postgresql too !!
Large customers want postgresql too !!rosensteel
 
ttec infortrend ds
ttec infortrend dsttec infortrend ds
ttec infortrend dsTTEC
 

Was ist angesagt? (8)

Simple layouts for ECKD and zfcp disk configurations on Linux on System z
Simple layouts for ECKD and zfcp disk configurations on Linux on System zSimple layouts for ECKD and zfcp disk configurations on Linux on System z
Simple layouts for ECKD and zfcp disk configurations on Linux on System z
 
Workload Optimization
Workload OptimizationWorkload Optimization
Workload Optimization
 
Beyond Disaster Recovery: Restoring Production Workloads with PlateSpin Forge
Beyond Disaster Recovery: Restoring Production Workloads with PlateSpin ForgeBeyond Disaster Recovery: Restoring Production Workloads with PlateSpin Forge
Beyond Disaster Recovery: Restoring Production Workloads with PlateSpin Forge
 
Large customers want postgresql too !!
Large customers want postgresql too !!Large customers want postgresql too !!
Large customers want postgresql too !!
 
SPC BENCHMARK 2/ENERGY™ EXECUTIVE SUMMARY
SPC BENCHMARK 2/ENERGY™  EXECUTIVE SUMMARYSPC BENCHMARK 2/ENERGY™  EXECUTIVE SUMMARY
SPC BENCHMARK 2/ENERGY™ EXECUTIVE SUMMARY
 
ttec infortrend ds
ttec infortrend dsttec infortrend ds
ttec infortrend ds
 
Xen community update
Xen community updateXen community update
Xen community update
 
10
1010
10
 

Andere mochten auch

Bmw x6 marketing
Bmw x6 marketingBmw x6 marketing
Bmw x6 marketingHarsh Shah
 
BMW
BMWBMW
BMWA B
 
Analysis of BMWs Global Supply Chain Network - its production - distribution ...
Analysis of BMWs Global Supply Chain Network - its production - distribution ...Analysis of BMWs Global Supply Chain Network - its production - distribution ...
Analysis of BMWs Global Supply Chain Network - its production - distribution ...Sachin Mathews
 
Supply Chain Management in the Motor Vehicle Industry, the Example of Mini.
Supply Chain Management in the Motor Vehicle Industry, the Example of Mini.Supply Chain Management in the Motor Vehicle Industry, the Example of Mini.
Supply Chain Management in the Motor Vehicle Industry, the Example of Mini.aguesdon
 

Andere mochten auch (10)

2012 Lexus LS Hybrid Detroit Michigan
2012 Lexus LS Hybrid Detroit Michigan2012 Lexus LS Hybrid Detroit Michigan
2012 Lexus LS Hybrid Detroit Michigan
 
Bmw x6 marketing
Bmw x6 marketingBmw x6 marketing
Bmw x6 marketing
 
BMW Case Study
BMW Case StudyBMW Case Study
BMW Case Study
 
BMW
BMWBMW
BMW
 
Analysis of BMWs Global Supply Chain Network - its production - distribution ...
Analysis of BMWs Global Supply Chain Network - its production - distribution ...Analysis of BMWs Global Supply Chain Network - its production - distribution ...
Analysis of BMWs Global Supply Chain Network - its production - distribution ...
 
Bmw marketing
Bmw marketingBmw marketing
Bmw marketing
 
BMW Case Study Analysis
BMW Case Study AnalysisBMW Case Study Analysis
BMW Case Study Analysis
 
PRESENTATION ON BMW
PRESENTATION ON BMWPRESENTATION ON BMW
PRESENTATION ON BMW
 
BMW Market Analysis
BMW Market AnalysisBMW Market Analysis
BMW Market Analysis
 
Supply Chain Management in the Motor Vehicle Industry, the Example of Mini.
Supply Chain Management in the Motor Vehicle Industry, the Example of Mini.Supply Chain Management in the Motor Vehicle Industry, the Example of Mini.
Supply Chain Management in the Motor Vehicle Industry, the Example of Mini.
 

Ähnlich wie Gfs andmapreduce

Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFSAlexander Alten
 
Virtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireVirtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireCarter Shanklin
 
Mark Marsiglio - Autoscaling with eZ in the Cloud - A Case Study
Mark Marsiglio - Autoscaling with eZ in the Cloud - A Case StudyMark Marsiglio - Autoscaling with eZ in the Cloud - A Case Study
Mark Marsiglio - Autoscaling with eZ in the Cloud - A Case StudyeZ Publish Community
 
Providing fault tolerance in extreme scale parallel applications
Providing fault tolerance in extreme scale parallel applicationsProviding fault tolerance in extreme scale parallel applications
Providing fault tolerance in extreme scale parallel applicationshjjvandam
 
Multi site Clustering with Windows Server 2008 Enterprise
Multi site Clustering with Windows Server 2008 EnterpriseMulti site Clustering with Windows Server 2008 Enterprise
Multi site Clustering with Windows Server 2008 EnterprisePaulo Freitas
 
Pithos - Architecture and .NET Technologies
Pithos - Architecture and .NET TechnologiesPithos - Architecture and .NET Technologies
Pithos - Architecture and .NET TechnologiesPanagiotis Kanavos
 
NFSv4 Replication for Grid Computing
NFSv4 Replication for Grid ComputingNFSv4 Replication for Grid Computing
NFSv4 Replication for Grid Computingpeterhoneyman
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffyAnuradha
 
AWS Summit 2011: High Availability Database Architectures in AWS Cloud
AWS Summit 2011: High Availability Database Architectures in AWS CloudAWS Summit 2011: High Availability Database Architectures in AWS Cloud
AWS Summit 2011: High Availability Database Architectures in AWS CloudAmazon Web Services
 
Openstorage with OpenStack, by Bradley
Openstorage with OpenStack, by BradleyOpenstorage with OpenStack, by Bradley
Openstorage with OpenStack, by BradleyHui Cheng
 
Pm 01 bradley stone_openstorage_openstack
Pm 01 bradley stone_openstorage_openstackPm 01 bradley stone_openstorage_openstack
Pm 01 bradley stone_openstorage_openstackOpenCity Community
 
eBay From Ground Level to the Clouds
eBay From Ground Level to the CloudseBay From Ground Level to the Clouds
eBay From Ground Level to the CloudsX.commerce
 

Ähnlich wie Gfs andmapreduce (20)

Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFS
 
Virtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireVirtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFire
 
AFS introduction
AFS introductionAFS introduction
AFS introduction
 
Mark Marsiglio - Autoscaling with eZ in the Cloud - A Case Study
Mark Marsiglio - Autoscaling with eZ in the Cloud - A Case StudyMark Marsiglio - Autoscaling with eZ in the Cloud - A Case Study
Mark Marsiglio - Autoscaling with eZ in the Cloud - A Case Study
 
Hdfs
HdfsHdfs
Hdfs
 
Pnfs
PnfsPnfs
Pnfs
 
Providing fault tolerance in extreme scale parallel applications
Providing fault tolerance in extreme scale parallel applicationsProviding fault tolerance in extreme scale parallel applications
Providing fault tolerance in extreme scale parallel applications
 
Server 2008 R2 Yeniliklər
Server 2008 R2 YeniliklərServer 2008 R2 Yeniliklər
Server 2008 R2 Yeniliklər
 
Multi site Clustering with Windows Server 2008 Enterprise
Multi site Clustering with Windows Server 2008 EnterpriseMulti site Clustering with Windows Server 2008 Enterprise
Multi site Clustering with Windows Server 2008 Enterprise
 
tittle
tittletittle
tittle
 
Pithos - Architecture and .NET Technologies
Pithos - Architecture and .NET TechnologiesPithos - Architecture and .NET Technologies
Pithos - Architecture and .NET Technologies
 
Libra Library OS
Libra Library OSLibra Library OS
Libra Library OS
 
NFSv4 Replication for Grid Computing
NFSv4 Replication for Grid ComputingNFSv4 Replication for Grid Computing
NFSv4 Replication for Grid Computing
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffy
 
AWS Summit 2011: High Availability Database Architectures in AWS Cloud
AWS Summit 2011: High Availability Database Architectures in AWS CloudAWS Summit 2011: High Availability Database Architectures in AWS Cloud
AWS Summit 2011: High Availability Database Architectures in AWS Cloud
 
Openstorage with OpenStack, by Bradley
Openstorage with OpenStack, by BradleyOpenstorage with OpenStack, by Bradley
Openstorage with OpenStack, by Bradley
 
Pm 01 bradley stone_openstorage_openstack
Pm 01 bradley stone_openstorage_openstackPm 01 bradley stone_openstorage_openstack
Pm 01 bradley stone_openstorage_openstack
 
High Availability Solutions in SQL 2012
High Availability Solutions in SQL 2012High Availability Solutions in SQL 2012
High Availability Solutions in SQL 2012
 
eBay From Ground Level to the Clouds
eBay From Ground Level to the CloudseBay From Ground Level to the Clouds
eBay From Ground Level to the Clouds
 
Google
GoogleGoogle
Google
 

Mehr von Institute of Computing Technology, Chinese Academy of Sciences (9)

C make tutorial
C make tutorialC make tutorial
C make tutorial
 
Distributedsystems 100912185813-phpapp01
Distributedsystems 100912185813-phpapp01Distributedsystems 100912185813-phpapp01
Distributedsystems 100912185813-phpapp01
 
二叉查找树范围查询算法
二叉查找树范围查询算法二叉查找树范围查询算法
二叉查找树范围查询算法
 
leveldb Log format
leveldb Log formatleveldb Log format
leveldb Log format
 
随即存储引擎Bitcask
随即存储引擎Bitcask随即存储引擎Bitcask
随即存储引擎Bitcask
 
Skip list
Skip listSkip list
Skip list
 
Five minuterule
Five minuteruleFive minuterule
Five minuterule
 
Lsm
LsmLsm
Lsm
 
Hfile格式详细介绍
Hfile格式详细介绍Hfile格式详细介绍
Hfile格式详细介绍
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 

Kürzlich hochgeladen (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

GFS and MapReduce

  • 1. Google Base Infrastructure: GFS and MapReduce Johannes Passing
  • 2. Agenda: Context; Google File System (Aims, Design, Walk through); MapReduce (Aims, Design, Walk through); Conclusion. 13.04.2008, Johannes Passing
  • 3. Context. Google's scalability strategy: cheap hardware, and lots of it; more than 450,000 servers (NYT, 2006). Basic problems: faults are the norm; software must be massively parallelized; writing distributed software is hard.
  • 4. GFS: Aims. Aims: must scale to thousands of servers; fault tolerance and data redundancy; optimize for large WORM (write once, read many) files; optimize for large reads; favor throughput over latency. Non-aims: space efficiency; client-side caching (considered futile); POSIX compliance. Significant deviation from standard distributed file systems (NFS, AFS, DFS, …).
  • 5. Design Choices. File system cluster: spread data over thousands of machines; provide safety by redundant storage; master/slave architecture.
  • 6. Design Choices. Files are split into chunks: the unit of management and distribution; replicated across servers; subdivided into blocks for integrity checks. Internal fragmentation is accepted for better throughput. [Diagram labels: GFS, File System]
  • 7. Architecture: Chunkserver. Hosts chunks: stores chunks as files in the Linux file system; verifies data integrity; implemented entirely in user mode. Fault tolerance: fails requests when an integrity violation is detected; all chunks are stored on at least one other server; re-replication restores a server after downtime.
  • 8. Architecture: Master. Namespace and metadata management. Cluster management: chunk location registry (transient); chunk placement decisions; re-replication and garbage collection; health monitoring. Fault tolerance: backed by shadow masters; mirrored operations log; periodic snapshots.
  • 9. Reading a chunk. (1) Send filename and offset. (2) Determine file and chunk. (3) Return chunk locations and version. (4) Choose closest server and request chunk. (5) Verify chunk version and block checksums. (6) Return data if valid, else fail request.
  • 10. Writing a chunk. (1) Send filename and offset. (2) Determine lease owner or grant lease. (3) Return chunk locations and lease owner. (4) Push data to any chunkserver. (5) Forward along replica chain. (6) Send write request to primary. (7) Create mutation order, apply. (8) Forward write request. (9) Validate blocks, apply modifications, increment version, acknowledge. (10) Reply to client.
  • 11. Appending a record (1/2). (1) Send append request. (2) Choose offset. (3) Return chunk locations and lease owner. (4) Push data to any chunkserver. (5) Forward along replica chain. (6) Send append request. (7) Check for remaining space in chunk. (8) Pad chunk and request retry. (9) Request chunk allocation. (10) Return chunk locations and lease owner.
  • 12. Appending a record (2/2). (1) Send append request. (2) Allocate chunk, write data. (3) Forward request to replicas. (4) Write data, failure occurs. (5) Fail request. (6) Client retries. (7) Write data. (8) Forward request to replicas. (9) Write data. (10) Success. [Diagram: three replicas each hold record A followed by padding; after the failed attempt two replicas hold B and one holds garbage; after the retry all three hold B, followed by free space.]
  • 13. File Region Consistency. File region states: Consistent (all clients see the same data); Defined (consistent, and clients see the change in its entirety); Inconsistent. Consequences: unique record IDs to identify duplicates; records should be self-identifying and self-validating. [Diagram: replica regions labelled inconsistent (B, B, Garbage), defined (B, B, B), and consistent/defined for records C and D, with free space.]
  • 14. Fault tolerance GFS. Automatic re-replication; replicas are spread across racks. Fault mitigations: data corruption: choose a different replica; machine crash: choose a replica on a different machine; network outage: choose a replica in a different network; master crash: shadow masters.
  • 15. Wrap-up GFS. Proven scalability, fault tolerance and performance. Highly specialized. Distribution is transparent to clients. Only a partial abstraction: clients must cooperate. The network is the bottleneck.
  • 16. MapReduce: Aims. Unified model for large-scale distributed data processing; massively parallelizable; distribution transparent to the developer; fault tolerance; allow moving computation close to the data.
  • 17. Higher order functions. Functional: map(f, [a, b, c, …]) applies f to each element, giving [f(a), f(b), f(c), …]. Google: map(k, v) takes one pair (k, v) and emits several output pairs. [Diagram labels each output pair (f(k1), f(v1)).]
  • 18. Higher order functions. Functional: foldl/reduce(f, [a, b, c, …], i) computes f(f(f(i, a), b), c). Google: reduce(k, [a, b, c, d, …]). [Diagram: inputs a, b, c, d; outputs f(a), –, f(c, d).]
  • 19. Idea. Observation: map can be easily parallelized; reduce might be parallelized. Idea: design the application around the map and reduce scheme; the infrastructure manages scheduling and distribution; the user implements map and reduce. Constraints: all data is coerced into key/value pairs.
  • 20. Walkthrough. (1) Spawn master. (2) Provide data, e.g. from GFS or BigTable; create M splits. (3) Spawn M mappers. (4) Map: once per key/value pair; partition into R buckets. (5) Spawn up to R*M combiners. (6) Combine. (7) Barrier. (8) Spawn up to R reducers. (9) Reduce. [Diagram stages: Mappers, Combiners, Reducers]
  • 21. Scheduling. Locality: mappers are scheduled close to the data; chunk replication improves locality; reducers run on the same machines as mappers. Choosing M and R: load balancing and fast recovery vs. number of output files; M >> R, with R a small multiple of the number of machines. Schedule backup tasks to avoid stragglers.
  • 22. Fault tolerance MapReduce. Fault mitigations: Map crash: all intermediate data is in a questionable state; repeat all map tasks of the machine (rationale of the barrier). Reduce crash: data is global and completed stages are marked; repeat only the crashed reduce task. Master crash: start over. Repeated crashes on certain records: skip records.
  • 23. Wrap-up MapReduce. Proven scalability. Restricts the programming model for better runtime support. Tailored to Google's needs. The programming model is fairly low-level.
  • 24. Conclusion. Rethinking infrastructure: consistent design for scalability and performance; highly specialized solutions. Not generally applicable: solutions are more low-level than usual; maintenance efforts may be significant. "We believe we get tremendous competitive advantage by essentially building our own infrastructures" (Eric Schmidt)
  • 25. References. Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", 2004. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System", 2003. Ralf Lämmel, "Google's MapReduce Programming Model – Revisited", SCP journal, 2006. Google, "Cluster Computing and MapReduce", http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html, 2007. David F. Carr, "How Google Works", http://www.baselinemag.com/c/a/Projects-Networks-and-Storage/How-Google-Works/, 2006. Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis, "Evaluating MapReduce for Multi-core and Multiprocessor Systems", 2007.
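
Slides 6, 7 and 9 say that chunks are subdivided into blocks for integrity checks, that a chunkserver verifies block checksums before returning data, and that the client falls back to another replica on failure. The following is a minimal sketch of that idea, not GFS code. The function names, the use of CRC32, and the 4-byte block size are assumptions chosen to keep the example small (the GFS paper describes 64 KB blocks with 32-bit checksums).

```python
import zlib

BLOCK_SIZE = 4  # bytes; deliberately tiny for the example (assumption)


class IntegrityError(Exception):
    """Raised when a block does not match its stored checksum."""


def block_checksums(chunk: bytes) -> list[int]:
    """Compute one CRC32 per fixed-size block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def read_range(chunk: bytes, checksums: list[int], offset: int, length: int) -> bytes:
    """Chunkserver side: verify every block the range touches, then return the data."""
    if length <= 0:
        return b""
    if offset < 0 or offset + length > len(chunk):
        raise ValueError("range outside chunk")
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            # Fail the request instead of returning corrupt data.
            raise IntegrityError(f"block {b} failed verification")
    return chunk[offset:offset + length]


def read_from_replicas(replicas, offset: int, length: int) -> bytes:
    """Client side: try replicas in order, skipping any that fail verification."""
    for chunk, checksums in replicas:
        try:
            return read_range(chunk, checksums, offset, length)
        except IntegrityError:
            continue
    raise IntegrityError("no valid replica")
```

Only the blocks overlapping the requested range are verified, so corruption elsewhere in the chunk does not fail an unrelated read.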
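
Slides 12 and 13 imply that a reader of an append-only GFS file may encounter padding, garbage from failed attempts, and duplicate records from client retries, which is why records should carry unique IDs and be self-validating. The sketch below shows one way a reader could cope with that. The record framing (a `REC1` marker, length, CRC32, and 64-bit ID) and the names `encode_record` and `read_records` are hypothetical and not taken from the slides or from GFS.

```python
import struct
import zlib

MAGIC = b"REC1"                   # hypothetical record marker
HEADER = struct.Struct(">4sIIQ")  # marker, payload length, CRC32, record ID


def encode_record(record_id: int, payload: bytes) -> bytes:
    """Frame a payload so that it is self-identifying and self-validating."""
    body = struct.pack(">Q", record_id) + payload
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(body), record_id) + payload


def read_records(region: bytes):
    """Yield (record_id, payload) once per ID, skipping padding and garbage."""
    seen = set()
    pos = 0
    while pos + HEADER.size <= len(region):
        magic, length, crc, record_id = HEADER.unpack_from(region, pos)
        start = pos + HEADER.size
        payload = region[start:start + length]
        body = struct.pack(">Q", record_id) + payload
        valid = (magic == MAGIC
                 and len(payload) == length
                 and zlib.crc32(body) == crc)
        if not valid:
            pos += 1  # not a record boundary: resynchronise byte by byte
            continue
        pos = start + length
        if record_id not in seen:  # drop duplicates created by retries
            seen.add(record_id)
            yield record_id, payload
```

A region containing a truncated record, a retried duplicate, and zero padding can then be read as follows:

```python
region = (encode_record(1, b"alpha")
          + encode_record(2, b"beta")[:-3]   # failed append leaves garbage
          + encode_record(2, b"beta")        # successful retry
          + encode_record(2, b"beta")        # duplicate from another retry
          + b"\x00" * 16                     # padding
          + encode_record(3, b"gamma"))
records = list(read_records(region))
```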
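
Slides 17 to 20 contrast the functional `map` and `foldl` with Google's key/value variants and walk through the phases: M splits, map, partitioning into R buckets, combine, barrier, reduce. The following single-process sketch runs those phases sequentially for a word count. It illustrates the data flow only and is not the MapReduce library's API; `map_fn`, `reduce_fn`, `partition` and `run_mapreduce` are hypothetical names.

```python
from collections import defaultdict
from functools import reduce
import zlib

# Functional flavour: map applies f element-wise, reduce folds from the left.
squares = list(map(lambda x: x * x, [1, 2, 3]))
total = reduce(lambda acc, x: acc + x, [1, 2, 3], 0)


def map_fn(key, value):
    """User map: (document name, text) -> intermediate (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """User reduce: (word, [counts]) -> total count."""
    return sum(values)


def partition(key, r):
    """Pick a bucket deterministically (str hash() is randomised per process)."""
    return zlib.crc32(key.encode("utf-8")) % r


def run_mapreduce(splits, r):
    """Run map, combine and reduce over M = len(splits) splits and r buckets."""
    mapper_outputs = []
    for split in splits:  # one "mapper" per split
        buckets = [defaultdict(list) for _ in range(r)]
        for key, value in split:
            for k2, v2 in map_fn(key, value):
                buckets[partition(k2, r)][k2].append(v2)
        # Combine: pre-aggregate per mapper and bucket. Reusing reduce_fn is
        # only valid because summation is associative and commutative.
        mapper_outputs.append(
            [{k: reduce_fn(k, vs) for k, vs in b.items()} for b in buckets])
    # Barrier: reducers start only after every mapper has finished.
    result = {}
    for bucket in range(r):  # one "reducer" per bucket
        grouped = defaultdict(list)
        for combined in mapper_outputs:
            for k, v in combined[bucket].items():
                grouped[k].append(v)
        for k in sorted(grouped):
            result[k] = reduce_fn(k, grouped[k])
    return result
```

For example, `run_mapreduce([[("d1", "a b a")], [("d2", "b c")]], r=2)` counts the words across both splits. Because every occurrence of a key lands in the same bucket, each reducer can work independently of the others.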