Causality-Based Versioning
    Kiran-Kumar Muniswamy-Reddy and David A. Holland
    Slides by the authors and Aleatha Parker-Wood




Tuesday, June 1, 2010
Versioning


    •    Already popular

    •    Saves backup “versions” of files as they change

    •    Two flavors: versioning (event based) and snapshotting (time based)

    •    Snapshots: WAFL, Venti...

    •    Versioning: Elephant, VersionFS...



Why Version/Snapshot?

    •    Disaster recovery is baked into the file system

    •    “Oops, I needed that...”

    •    “Oops, I didn’t mean to click that virus...”

    •    “Oops, that new driver patch broke everything...”

    •    Maintains backup files to which you can recover (without going
         offsite)


Causality

    •    Depends on time (to cause Y, X must come before it)

    •    Unidirectional (if X causes Y, Y cannot cause X)

    •    Defined in terms of data flow

          •    A reads B ⇒ B causes A

          •    A writes B ⇒ A causes B

    •    Used by PASS and by intrusion detection systems (BackTracker, Taser...)
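The two data-flow rules can be read as a mechanical translation from events to directed "causes" edges. A minimal sketch, assuming a hypothetical trace of `(actor, operation, target)` tuples; the tuple format and the function name are illustrative only and are not PASS's record format.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str, str]  # (actor, operation, target); hypothetical trace format


def causal_edges(events: Iterable[Event]) -> List[Tuple[str, str]]:
    """Turn read/write events into (cause, effect) pairs."""
    edges = []
    for actor, op, target in events:
        if op == "read":
            # A reads B => B causes A: data flows from B into A
            edges.append((target, actor))
        elif op == "write":
            # A writes B => A causes B: data flows from A into B
            edges.append((actor, target))
        else:
            raise ValueError(f"unknown operation: {op}")
    return edges


trace = [("apache", "read", "request"), ("apache", "write", "access.log")]
print(causal_edges(trace))  # [('request', 'apache'), ('apache', 'access.log')]
```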


Why Causality?



    •    Track propagation of data

    •    Find out what files were modified by what processes

    •    Reconstruct the scene of the crime




Causality-Based Versioning

    •    Decide when to version using causal relationships between two files

    •    Has the advantages of versioning file systems or snapshots

    •    Eases recovery from corruption, viruses, and user mistakes

    •    In addition, creates causal links between files

    •    Makes it easier to decide what to restore

    •    Sort of like transactions on steroids


Applications


    •    Intrusion Recovery

    •    System configuration management

    •    IP compliance

    •    Reproduction of research results




A Scenario...


          •    Apache split-logfile vulnerability

          •    Vulnerability in Apache 1.3

          •    Allows an attacker to overwrite any file with a .log extension

          •    Let’s look at the current versioning options...




[Diagram slides: the current versioning options applied to this scenario (not recoverable from the extracted text)]

The Goal



    •    One of these has too much information

    •    The other not enough

    •    Can we leverage causality to create just enough versions?




Creating Just Enough Versions


    •    Building on top of the Provenance-Aware Storage System (PASS)

    •    Two options

          •    Cycle Avoidance

          •    Graph Finesse




How PASS works


    •    Translates system calls to provenance records (read/write become
         edges in a dependency graph)

    •    Maintains provenance for transient objects such as pipes and
         processes, and creates virtual objects as needed

    •    Analyzes the graph to ensure there are no cyclic dependencies between objects

    •    Causality-based versioning extends this analysis phase
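The analysis step in the bullets above amounts to cycle detection on the dependency graph. A minimal depth-first sketch, assuming the graph is held in memory as a dict from each object to the set of objects it depends on; this is the textbook algorithm, not PASS's actual implementation, and `find_cycle` is a hypothetical name.

```python
from typing import Dict, List, Set

# Hypothetical "depends-on" graph: each node maps to the nodes it depends on.
Graph = Dict[str, Set[str]]


def find_cycle(graph: Graph) -> List[str]:
    """Return one dependency cycle as a list of nodes, or [] if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def visit(node: str) -> List[str]:
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: dep is still on the stack, so we found a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return []

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return []


# A process P that reads file F and then writes F back, with no versions:
print(find_cycle({"P": {"F"}, "F": {"P"}}))  # ['P', 'F', 'P']
# With versions, F2 depends on P1, which depends on F1: no cycle.
print(find_cycle({"P1": {"F1"}, "F2": {"P1", "F1"}, "F1": set()}))  # []
```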



The big idea



    •    Cycles are violations of causality

    •    A cycle that is about to form signals an interesting event

    •    We can prevent cycles by creating a new version every time a cycle is
         about to occur




[Diagram slides: a step-by-step example of a dependency cycle forming between processes and files (not recoverable from the extracted text)]

Version-On-Write?

    •    We could remove cycles using Version-On-Write

    •    Every read creates a new version of the process

    •    Every write creates a new version of the file

    •    But this results in 8 versions in the example

    •    Huge management overhead
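The Version-On-Write policy is simple enough to state in a few lines, which also makes its cost visible: every operation adds a version. A sketch, assuming a hypothetical trace of `(process, op, file)` events; it only counts versions and does not attempt to reproduce the deck's example.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, str, str]  # (process, "read" | "write", file); hypothetical format


def version_on_write(events: Iterable[Event]) -> Counter:
    """Count the new versions Version-On-Write creates for each object."""
    created: Counter = Counter()
    for proc, op, path in events:
        if op == "read":
            created[proc] += 1  # every read creates a new version of the process
        elif op == "write":
            created[path] += 1  # every write creates a new version of the file
        else:
            raise ValueError(f"unknown operation: {op}")
    return created


# A process that reads a file and writes it back, twice.
trace = [("P", "read", "F"), ("P", "write", "F")] * 2
counts = version_on_write(trace)
print(sum(counts.values()))  # 4: one new version per operation
```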

Cycle Avoidance Algorithm

    •    Uses local information about the object

    •    Creates a new version of an object whenever a new ancestor is added

    •    Different versions are considered to be “new” ancestors

    •    Not every write causes a new version

The Algorithm

    •    Assume new data: A1 depends on B2

    •    If B is not in A’s dependencies, create a new version of A

    •    Else, if B is already in A’s dependencies:

          •    If B2 is in the dependencies, discard (no new information)

          •    If B3 is in the dependencies, discard (no new causality)

          •    If B1 is in the dependencies, create a new version of A
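These rules translate almost directly into code because they need only local information. A minimal sketch, assuming each object keeps its current version number plus a map from each ancestor to the newest ancestor version it has recorded; `CycleAvoidance`, `add_dependency`, and carrying the ancestor map forward into the new version are assumptions of this sketch, not details stated in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Obj:
    version: int = 1
    # Ancestors of the current version: ancestor name -> newest version recorded.
    deps: Dict[str, int] = field(default_factory=dict)


class CycleAvoidance:
    """Hypothetical in-memory version of the Cycle Avoidance rules."""

    def __init__(self) -> None:
        self.objects: Dict[str, Obj] = {}

    def _get(self, name: str) -> Obj:
        return self.objects.setdefault(name, Obj())

    def add_dependency(self, a: str, b: str) -> bool:
        """Record 'current a depends on current b'; True if a new version of a was made."""
        obj_a, b_ver = self._get(a), self._get(b).version
        seen = obj_a.deps.get(b)
        if seen is not None and seen >= b_ver:
            # Same version already recorded (no new information) or a newer
            # version already recorded (no new causality): discard.
            return False
        # B is a new ancestor, or only an older version of B was recorded:
        # freeze the current version of a and start a new one.
        obj_a.version += 1
        obj_a.deps[b] = b_ver
        return True


ca = CycleAvoidance()
ca.add_dependency("P", "F")  # P reads F: P2 depends on F1
ca.add_dependency("F", "P")  # P writes F: F2 depends on P2
ca.add_dependency("F", "P")  # P writes F again: discarded
print(ca.objects["P"].version, ca.objects["F"].version)  # 2 2
```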

[Diagram slides: Cycle Avoidance applied to the example (not recoverable from the extracted text)]

Graph Finesse

    •    As before: A1 depends on B2

    •    If B2 is already in A’s history, discard

    •    Otherwise, check for a path from B2 to A1

          •    If there is one, we have a cycle: create a new version of A

          •    Otherwise, add A1 → B2 to the dependency graph
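Graph Finesse trades local bookkeeping for a reachability check on the whole graph. A minimal sketch, assuming nodes are hypothetical `(name, version)` pairs held in memory and that "A’s history" means everything reachable from A’s current version along depends-on edges; the class and method names are illustrative, not the authors' code.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, int]  # (object name, version), e.g. ("A", 1) stands for A1


class GraphFinesse:
    """Toy in-memory dependency graph that versions objects only to break cycles."""

    def __init__(self) -> None:
        self.current: Dict[str, int] = {}      # name -> current version number
        self.deps: Dict[Node, Set[Node]] = {}  # node -> nodes it depends on

    def _node(self, name: str) -> Node:
        node = (name, self.current.setdefault(name, 1))
        self.deps.setdefault(node, set())
        return node

    def _reaches(self, src: Node, dst: Node) -> bool:
        """True if dst is reachable from src by following depends-on edges."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.deps.get(node, ()))
        return False

    def add_dependency(self, a_name: str, b_name: str) -> Node:
        """Record 'current a depends on current b'; return the node of a that holds it."""
        a, b = self._node(a_name), self._node(b_name)
        if self._reaches(a, b):
            return a  # b is already in a's history: discard
        if self._reaches(b, a):
            # The edge a -> b would close a cycle: start a new version of a instead.
            new_a = (a_name, a[1] + 1)
            self.current[a_name] = new_a[1]
            self.deps[new_a] = {a, b}  # the new version depends on the old one and on b
            return new_a
        self.deps[a].add(b)  # no cycle: add the edge a -> b
        return a


gf = GraphFinesse()
print(gf.add_dependency("P", "F"))  # P reads F: ('P', 1), plain edge P1 -> F1
print(gf.add_dependency("F", "P"))  # P writes F: ('F', 2), since F1 -> P1 would close a cycle
print(gf.add_dependency("F", "P"))  # P writes F again: ('F', 2), already in F2's history
```

On this toy trace only F receives a new version; the process keeps its first version because adding its edge never closes a cycle.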

[Diagram slides: Graph Finesse applied to the example (not recoverable from the extracted text)]

Evaluation

    •    Run-time overhead

    •    Space overhead

    •    Recovery costs

    •    All results are the average of 5 runs

    •    Standard deviation below 5%
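The slides do not say whether the 5% figure is a sample or population standard deviation, or whether it is relative to the mean. This sketch assumes the sample standard deviation expressed as a percentage of the mean, computed with Python's `statistics` module over hypothetical timings; `summarize` is an illustrative name.

```python
import statistics
from typing import List, Tuple


def summarize(runs: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation as a percentage of the mean)."""
    mean = statistics.mean(runs)
    rel_sd = statistics.stdev(runs) / mean * 100
    return mean, rel_sd


# Hypothetical elapsed times (seconds) for five runs of one workload.
runs = [100.0, 102.0, 98.0, 101.0, 99.0]
mean, rel_sd = summarize(runs)
print(f"mean={mean:.1f}s, sd={rel_sd:.2f}% of mean, below 5%: {rel_sd < 5}")
```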

Workloads used

    •    Linux compile (CPU-intensive)

    •    Postmark (I/O-intensive)

    •    Applying patches with Mercurial (developer workload)

    •    BLAST protein-sequence search (scientific workload)

Algorithms used

    •    Without causal data:

          •    Ext2: Baseline (Lasagna, Harvard’s versioning FS, on top of ext2)

          •    VER: Plain open-close versioning

    •    With causal data:

          •    OC: Open-close

          •    CA: Cycle Avoidance

          •    GF: Graph Finesse

          •    ALL: Version on every write
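For contrast with versioning on every write, open-close versioning groups all writes made between an open and the matching close into one version. A sketch of that decision only, assuming a hypothetical `(operation, file)` trace and one open session per file at a time; it does not record the causal data that distinguishes OC from VER.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, str]  # (operation, file); hypothetical trace format


def open_close_versions(events: Iterable[Event]) -> Counter:
    """Count versions per file: one per open ... close session that wrote to it."""
    dirty = set()  # files written since they were opened
    versions: Counter = Counter()
    for op, path in events:
        if op == "open":
            dirty.discard(path)
        elif op == "write":
            dirty.add(path)
        elif op == "close":
            if path in dirty:
                versions[path] += 1  # all of the session's writes become one version
                dirty.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return versions


trace = [("open", "F"), ("write", "F"), ("write", "F"), ("write", "F"), ("close", "F"),
         ("open", "F"), ("close", "F")]
print(open_close_versions(trace)["F"])  # 1: three writes, one version; read-only session adds none
```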

[Evaluation result charts (not recoverable from the extracted text)]

Conclusions

    •    Both algorithms require less time and space than Version-On-Write

    •    Both algorithms offer finer-grained control than Open-Close

    •    Graph Finesse creates fewer unnecessary versions

    •    Cycle Avoidance has overhead comparable to Open-Close

Expanding on it

    •    Not just good for disaster recovery

          •    Search

          •    Social network analysis