Shark: Scaling File Servers via Cooperative Caching

    Siddhartha Annapureddy, Michael J. Freedman, David Mazières

                          New York University

    Networked Systems Design and Implementation, 2005
Motivation

    Scenario – Want to test a new application on a hundred nodes

    [Diagram: ten workstations, all of which need the software]

    Problem – Need to push software to all nodes and collect results
Current Approaches

                    Distribute from a central server

    [Diagram: a server on Network A copies data to clients on
    Networks A and B via rsync, unison, scp]

    – Server’s network uplink saturates
           Operation takes a long time to finish
    – Wastes bandwidth along bottleneck links
Current Approaches

                      File distribution mechanisms
                         (BitTorrent, Bullet)

    + Scales by offloading the burden from the server
    – Client may download from half-way across the world
Inherent Problems with Copying Files

   Users have to decide a priori what to ship
       Ship too much – Wastes bandwidth, takes a long time
       Ship too little – Hassle to work in a poor environment
   Idle environments consume disk space
       Users are loath to clean up ⇒ Redistribution
       Need a low-cost way to refetch files
   Manual management of software versions

              Goal – Illusion of having a full development environment
              Programs fetched transparently on demand
Networked File Systems

    + Know how to deploy these systems
    + Know how to administer such systems
    + Simple accountability mechanisms
    E.g., NFS, AFS, SFS
Problem: Scalability

    [Plot: time to finish a 40 MB read (sec) vs. number of unique
    nodes, for SFS and Shark]

        SFS – Very slow, ≈ 775s at 100 nodes
        Shark – Much better !!! ≈ 88s (9x faster)
Peer-to-Peer Filesystems



   P2P Filesystems (Pond, Ivy)
    + Scalability
     – Non-standard administrative models
     – New filesystem semantics
     – Haven’t been widely deployed yet
                        Goal – Best of Both Worlds
       Convenience of central servers
       Scalability of peer-to-peer




Let’s Design this New Filesystem

    [Diagram: a single client talking to a central server]

    Vanilla SFS
        Central server model
Scalability – Client-side Caching

    [Diagram: a client with a local cache talking to the server]

    Large client cache à la AFS
        Scales by avoiding redundant data transfers
        Leases to ensure NFS-style cache consistency
        With multiple clients, must address bandwidth concerns
Scalability – Cooperative Caching

    Clients fetch data from each other and offload burden from the server

    [Diagram: five clients exchanging chunks among themselves, with
    only occasional requests reaching the server]

       Shark clients maintain a distributed index
       Fetch a file from multiple other clients in parallel
           LBFS-style chunks – Variable-sized blocks
           Chunks are better – Exploit commonalities across files
           Chunks preserved across file versions, concatenations
           (a chunking sketch follows below)
       Locality-awareness – Preferentially fetch chunks from nearby
       clients
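To make the chunking idea concrete, here is a minimal sketch of LBFS-style content-defined chunking. Shark derives boundaries from Rabin fingerprints over a sliding window; the toy rolling hash, window size, and cut mask below are illustrative assumptions, not parameters from the paper.

    # Sketch of LBFS-style content-defined chunking. A toy polynomial
    # rolling hash stands in for Rabin fingerprints; boundaries depend
    # only on nearby content, so they survive insertions, edits, and
    # concatenations elsewhere in the file.
    WINDOW = 48                 # sliding-window width in bytes (assumed)
    PRIME = 1000003             # rolling-hash multiplier (assumed)
    MOD = 1 << 32
    MASK = (1 << 13) - 1        # 13 low bits => expected chunk ~8 KB
    MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024

    def chunk_offsets(data: bytes):
        """Yield (start, end) byte offsets of content-defined chunks."""
        out_factor = pow(PRIME, WINDOW, MOD)  # coefficient of the byte leaving the window
        start, h = 0, 0
        window = bytearray()
        for i, b in enumerate(data):
            window.append(b)
            h = (h * PRIME + b) % MOD
            if len(window) > WINDOW:
                h = (h - window.pop(0) * out_factor) % MOD
            size = i - start + 1
            # Cut where the hash's low bits hit zero, within size bounds.
            if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
                yield start, i + 1
                start, h = i + 1, 0
                window.clear()
        if start < len(data):
            yield start, len(data)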
Cross-filesystem Sharing

             Global cooperative cache regardless of origin servers

    [Diagram: two groups of clients, one per server A and B, linked
    into a single shared cache mesh]

       Two groups of clients accessing servers A and B
       Client groups share a large amount of software
       Such clients automatically form a global cache
Application

                   Linux distribution with LiveCDs

    [Diagram: Slax and Knoppix users running sharkcd, all drawing on
    a shared cache and a Knoppix mirror]

      LiveCD – Run an entire OS without using the hard disk
      But all your programs must fit on a CD-ROM
      Downloading dynamically from a server raises scalability problems
      Knoppix and Slax users form a global cache – Relieves the servers
Cooperative Caching – Roadmap

    [Diagram: a client fetches metadata from the server, looks up
    peers in the overlay, then downloads chunks from them]

      Fetch metadata from the server
      Look up clients caching needed chunks in the overlay
      Connect to multiple clients and download chunks in parallel
      Check integrity of fetched chunks
          Clients are mutually distrustful
File Metadata

    [Diagram: client sends GET_TOKEN(fh, offset, count); the server
    replies with metadata containing the chunk tokens]

        Possession of a token implies server permission to read
        Tokens are a shared secret between authorized clients
        Tokens can be used to check integrity of fetched data

    Given a chunk B...
        Chunk token T_B = H(B)
        H is a collision-resistant hash function
        (a token sketch follows below)
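A minimal sketch of the token computation and of the integrity check it enables. SHA-1 stands in for the collision-resistant hash H; a plausible choice for 2005, but an assumption here.

    import hashlib
    import hmac

    def chunk_token(chunk: bytes) -> bytes:
        """T_B = H(B): doubles as a read capability and an integrity check."""
        return hashlib.sha1(chunk).digest()

    def chunk_is_authentic(chunk: bytes, token: bytes) -> bool:
        """Accept a chunk fetched from an untrusted peer only if H(B) == T_B."""
        return hmac.compare_digest(hashlib.sha1(chunk).digest(), token)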
Discovering Clients Caching Chunks

    [Diagram: a client queries the distributed index to discover
    other clients caching a given chunk]

      For every chunk B, there is an indexing key I_B
      I_B is used to index the clients caching B
      Cannot set I_B = T_B, as T_B is a secret
              I_B = MAC(T_B, “Indexing Key”)
      (a derivation sketch follows below)
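A sketch of the indexing-key derivation, assuming HMAC-SHA-1 as the MAC. Because the MAC is one-way in its key, publishing I_B in the index reveals nothing about the secret token T_B.

    import hashlib
    import hmac

    def indexing_key(token: bytes) -> bytes:
        """I_B = MAC(T_B, "Indexing Key"): safe to publish in the
        distributed index, since it cannot be inverted to recover T_B."""
        return hmac.new(token, b"Indexing Key", hashlib.sha1).digest()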
Locality Awareness

    [Diagram: clients clustered by network, A and B]

      Overlay organized into clusters based on latency
      Indexing infrastructure preferentially returns sources in the
      same cluster as the client
      Hence, chunks are usually transferred from nearby clients
Final Steps

   Download Chunk
       Security issues discussed later
   Register as a source
       Client now becomes a source for the downloaded chunk
       Client registers in the distributed index – PUT(I_B, Addr)
       (a register/lookup sketch follows below)
   Chunk Reconciliation
       Reuse the connection to download more chunks
       Exchange mutually needed chunks without indexing overhead
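A hypothetical sketch of the register-and-lookup steps. The dht object with put/get methods stands in for the Coral-based distributed index; neither name comes from the paper, and indexing_key is the derivation sketched above.

    # Hypothetical sketch: dht.put / dht.get stand in for the Coral-based
    # distributed index; my_addr is this client's own address.

    def register_as_source(dht, token: bytes, my_addr: str) -> None:
        """After downloading chunk B, advertise ourselves: PUT(I_B, Addr)."""
        dht.put(indexing_key(token), my_addr)

    def find_sources(dht, token: bytes) -> list:
        """Look up clients caching B; nearby clusters are returned first."""
        return dht.get(indexing_key(token))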
Security Issues – Client Authentication

    [Diagram: the receiver authenticates to a source client with an
    authenticator derived from the token; the server still performs
    traditional authentication]

       Traditionally, the server authenticated read requests using uids
       Challenge – How does a client know when to send chunks?
       The chunk token allows a client to identify authorized clients
Security Issues – Client Communication




      Client should be able to check integrity of downloaded chunk
      Client should not send chunks to other unauthorized clients
      An eavesdropper shouldn’t be able to obtain chunk contents




Security Protocol

    [Diagram: the receiver and source clients exchange nonces R_C and
    R_P; the receiver sends I_B and Auth_C; the source replies with
    E_K(B)]

       R_C, R_P – Random nonces to ensure freshness

       Auth_C – Authenticator to prove the receiver holds the token
           Auth_C = MAC(T_B, “Auth C”, C, P, R_C, R_P)

       K – Key to encrypt the chunk contents
           K = MAC(T_B, “Encryption”, C, P, R_C, R_P)

       (a derivation sketch follows below)
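A sketch of the two derivations, again assuming HMAC-SHA-1 as the MAC and a simple length-prefixed encoding of the MAC'd fields; the encoding is an assumption, the slide only fixes the field list.

    import hashlib
    import hmac

    def _mac(token: bytes, *fields: bytes) -> bytes:
        """MAC keyed by T_B over length-prefixed fields (encoding assumed)."""
        m = hmac.new(token, digestmod=hashlib.sha1)
        for f in fields:
            m.update(len(f).to_bytes(4, "big") + f)
        return m.digest()

    def auth_c(token: bytes, c: bytes, p: bytes, r_c: bytes, r_p: bytes) -> bytes:
        """Auth_C = MAC(T_B, "Auth C", C, P, R_C, R_P): proves knowledge
        of T_B to the source without revealing the token itself."""
        return _mac(token, b"Auth C", c, p, r_c, r_p)

    def session_key(token: bytes, c: bytes, p: bytes, r_c: bytes, r_p: bytes) -> bytes:
        """K = MAC(T_B, "Encryption", C, P, R_C, R_P): keys E_K(B)."""
        return _mac(token, b"Encryption", c, p, r_c, r_p)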
Security Properties

    [Diagram: the same exchange – nonces R_C, R_P; then I_B and
    Auth_C; then E_K(B)]

    Client can check integrity of a downloaded chunk
        Client checks whether H(downloaded chunk) = T_B
    Source should not send chunks to unauthorized clients
        Malicious clients cannot send a correct Auth_C
    Eavesdropper shouldn’t get chunk contents
        All communication encrypted with K
    (both checks are sketched below)
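Putting the two checks together: a short sketch of the source-side admission test, reusing auth_c and chunk_is_authentic from the sketches above (assumed in scope).

    import hmac

    def source_admits(token, c, p, r_c, r_p, presented_auth) -> bool:
        """The source recomputes Auth_C and serves E_K(B) only on a match;
        without T_B, a malicious client cannot forge the authenticator."""
        return hmac.compare_digest(auth_c(token, c, p, r_c, r_p), presented_auth)

    # Receiver side, after decrypting E_K(B):
    #     assert chunk_is_authentic(chunk, token)   # H(chunk) == T_B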
Security Properties

    Privacy limitations for world-readable files
        An eavesdropper can track clients’ lookups
        An eavesdropper can hash known data and learn exactly what a
        client downloads
    For private files, a solution is described in the paper
        Sacrifices cross-FS sharing for better privacy
    Forward secrecy is not guaranteed
Implementation

    [Diagram: on the server host, sharksd backed by an NFSv3 server;
    on each client host, applications’ syscalls go to the kernel NFSv3
    client, which talks to the local sharkcd, which uses corald]

      sharkcd – Incorporates source/receiver client functionality
      sharksd – Incorporates the chunking mechanisms
      corald – A node in the indexing infrastructure
Evaluation

      How does Shark compare with SFS? With NFS?
      How scalable is the server?
      How fair is Shark across clients?
      Which fetch order is better – random or sequential?
      What are the benefits of set reconciliation?
Emulab – 100 Nodes on LAN

    [CDF: percentage of clients that completed a 40 MB read vs. time
    since initialization (sec), for Shark (rand, negotiation), NFS,
    and SFS]

       Shark – 88s
       SFS – 775s, NFS – 350s (Shark ≈ 9x and ≈ 4x faster, respectively)
       SFS less fair because of TCP backoffs
PlanetLab – 185 Nodes

    [CDF: percentage of clients that completed a 40 MB read vs. time
    since initialization (sec), for Shark and SFS]

       Shark ≈ 7 min – 95th percentile
       SFS ≈ 39 min – 95th percentile (Shark ≈ 5x faster)
       NFS – Triggered kernel panics on the server
Data pushed by Server

    [Bar chart: bandwidth pushed by the server, normalized to SFS –
    SFS ≈ 7400 MB vs. Shark ≈ 954 MB]

      Shark vs SFS – 23 copies vs 185 copies (8x less data pushed)
Data served by Clients

    [Plot: bandwidth transmitted (MB) by each unique Shark proxy,
    40 MB read across PlanetLab hosts]

       Maximum contribution ≈ 3.5 copies
       Median contribution ≈ 1.5 copies
       Minimum contribution ≈ 0.75 copies
Fetching Chunks – Order Matters

      In what order should we fetch chunks of a file?
      Natural choices – Random or Sequential

    Intuitively, when many clients start simultaneously:
       Random
           All clients fetch independent chunks
           More chunks become available in the cooperative cache
       Sequential
           Better disk I/O scheduling on the server
           Only the client furthest ahead adds new chunks to the cache
Emulab – 100 Nodes on LAN

    [CDF: percentage of clients that completed a 40 MB read vs. time
    since initialization (sec), for Shark rand and Shark seq]

       Random – 133s
       Sequential – 203s
       Random wins !!! – 35% better
Emulab – 100 Nodes on LAN

    [CDF: percentage of clients that completed a 40 MB read vs. time
    since initialization (sec), for Shark rand and Shark rand +
    negotiation]

       Random + Reconciliation – 88s
       Random – 133s
       Reconciliation is crucial – 34% improvement
Conclusions

      Networked filesystems offer a convenient interface
      Current networked filesystems like NFS are not scalable
          This forces users to resort to inconvenient tools like rsync
      Shark offers a filesystem that scales to hundreds of clients
          Locality-aware cooperative cache
          Supports cross-FS sharing, enabling novel applications
Questions




            http://www.scs.cs.nyu.edu/shark




