Hadoop:
  Code Injection
  Distributed Fault Injection

    Konstantin Boudnik
       Hadoop committer, Pig contributor
       cos@apache.org
Few assumptions

   The following work was done using AspectJ, an AOP injection technology
   Similar results could be achieved with
       direct code implementation
       MOP (monkey-patching)
       direct byte-code manipulation
   The approaches offered aren't limited to the Hadoop platform ;)

   This talk isn't about AspectJ or AOP/MOP technology itself

Code Injection




What for?

    Some APIs are as dangerous as they are useful if made public
       stop/blacklist a node or daemon
       change a node configuration

    certain functionality is experimental and needn't be in production

    a component's source code is unavailable

    a build re-spin isn't practical

    many changes of the same nature need to be applied

    your application doesn't have enough bugs yet

Use cases
   producing a build for developers' testing

   simulating faults and testing error recovery before deployment

   sneaking into production something your boss doesn't
    need to know about

Injecting away
// The pointcut below selects the execution of FSDataset.getBlockFile(..),
// so faults are injected inside the method in question
pointcut execGetBlockFile() :
  execution (* FSDataset.getBlockFile(..)) && !within(FSDatasetAspects +);

// This advice specifies the logic of our fault point.
// In this case it simply throws DiskErrorException before executing
// the method selected by the execGetBlockFile() pointcut
before() throws DiskErrorException : execGetBlockFile() {
  if (ProbabilityModel.injectCriteria(FSDataset.class.getSimpleName())) {
    LOG.info("Before the injection point");
    Thread.dumpStack();
    throw new DiskErrorException("FI: injected fault point at "
        + thisJoinPoint.getStaticPart().getSourceLocation());
  }
}
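
The advice above fires only when `ProbabilityModel.injectCriteria(...)` says so. As a rough illustration of that idea, here is a minimal plain-Java sketch of a probability gate. The class `FaultGate`, its method names, and the default level are hypothetical; this is not Hadoop's actual `ProbabilityModel`.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical probability gate for fault points; not Hadoop's ProbabilityModel. */
final class FaultGate {
  private final Map<String, Double> levels = new ConcurrentHashMap<>();
  private final Random random;
  private final double defaultLevel;

  /** A fixed seed makes the sequence of decisions repeatable between runs. */
  FaultGate(long seed, double defaultLevel) {
    this.random = new Random(seed);
    this.defaultLevel = defaultLevel;
  }

  /** Sets the injection probability (0.0 = never, 1.0 = always) for one fault point. */
  void setLevel(String faultPoint, double probability) {
    if (probability < 0.0 || probability > 1.0) {
      throw new IllegalArgumentException("probability must be in [0, 1]");
    }
    levels.put(faultPoint, probability);
  }

  /** Returns true when a fault should be injected at the named point. */
  boolean shouldInject(String faultPoint) {
    double level = levels.getOrDefault(faultPoint, defaultLevel);
    // nextDouble() is in [0, 1), so level 0.0 never fires and 1.0 always fires.
    return random.nextDouble() < level;
  }
}
```

A caller would guard the fault with `if (gate.shouldInject("FSDataset")) { throw ... }`, much as the advice does.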




Injecting away (intercept & mock)
 pointcut callCreateUri() : call (URI FileDataServlet.createUri(
     String, HdfsFileStatus, UserGroupInformation, ClientProtocol,
     HttpServletRequest, String));

 /** Replace host name with "localhost" for unit test environment. */
 URI around () throws URISyntaxException : callCreateUri() {
   final URI original = proceed();
   LOG.info("FI: original uri = " + original);
   final URI replaced = new URI(original.getScheme(),
       original.getUserInfo(),
       "localhost", original.getPort(), original.getPath(),
       original.getQuery(),
       original.getFragment()) ;
   LOG.info("FI: replaced uri = " + replaced);
   return replaced;
 }
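
For readers unfamiliar with `around` advice, the transformation it applies to the returned value can be written as an ordinary helper. The sketch below shows only the URI rewrite, outside of any aspect; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper mirroring the host replacement done by the advice. */
final class UriRewriter {
  private UriRewriter() {}

  /** Returns a copy of the given URI with only the host replaced. */
  static URI withHost(URI original, String host) {
    try {
      return new URI(original.getScheme(), original.getUserInfo(), host,
          original.getPort(), original.getPath(), original.getQuery(),
          original.getFragment());
    } catch (URISyntaxException e) {
      // Rethrown unchecked so callers in tests need no throws clause.
      throw new IllegalArgumentException("cannot rewrite " + original, e);
    }
  }
}
```

The advantage of the aspect over such a helper is that no call site has to change: every call to `createUri` is intercepted where it already is.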




Distributed Fault Injection




Why Fault Injection

   Hadoop deals with many kinds of faults
           Block corruption

           Failures of disk, Datanode, Namenode, Clients, Jobtracker, Tasktrackers and Tasks

           Varying rates of bandwidth and latency

   These are hard to test
           Unit tests mostly deal with specific single faults or patterns

           Faults do not occur frequently and are hard to reproduce

   Need to inject faults into the real system (as opposed to a simulated system)
   More info
       http://wiki.apache.org/hadoop/HowToUseInjectionFramework




Usage models


    An actor configures a Hadoop cluster, “dials in” the desired faults, then runs a
    set of applications on the cluster.

        Test the behavior of a particular feature under faults

        Test the time and consistency of recovery at a high rate of faults

        Observe loss of data under a certain pattern and frequency of faults

        Observe performance/utilization

            Note: faults can be injected into jobs running on the real system
            (as opposed to a simulated system)

    An actor writes or reuses a unit/functional test that uses the fault injection
    framework to introduce faults during the test

    Recovery procedures testing (!)
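
The second usage model, a test that introduces faults and checks recovery, could look roughly like the sketch below. It uses a plain decorator instead of AspectJ so that it stays self-contained; `BlockReader`, `FaultyReader`, `RetryingClient`, and `RecoveryCheck` are hypothetical names, not part of Hadoop or its fault injection framework.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical storage interface standing in for a real component. */
interface BlockReader {
  String read(String blockId) throws IOException;
}

/** Fault point: fails the first N reads, then delegates normally. */
final class FaultyReader implements BlockReader {
  private final BlockReader delegate;
  private final AtomicInteger remainingFailures;

  FaultyReader(BlockReader delegate, int failures) {
    this.delegate = delegate;
    this.remainingFailures = new AtomicInteger(failures);
  }

  @Override
  public String read(String blockId) throws IOException {
    if (remainingFailures.getAndDecrement() > 0) {
      throw new IOException("FI: injected fault for " + blockId);
    }
    return delegate.read(blockId);
  }
}

/** Code under test: retries a read a bounded number of times. */
final class RetryingClient {
  static String readWithRetry(BlockReader reader, String blockId, int maxAttempts)
      throws IOException {
    IOException last = new IOException("no attempts made");
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return reader.read(blockId);
      } catch (IOException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last;
  }
}

/** Test helper: does the client recover from the given number of faults? */
final class RecoveryCheck {
  static boolean recovers(int injectedFailures, int maxAttempts) {
    BlockReader healthy = blockId -> "data-of-" + blockId;
    BlockReader faulty = new FaultyReader(healthy, injectedFailures);
    try {
      return "data-of-blk_1".equals(
          RetryingClient.readWithRetry(faulty, "blk_1", maxAttempts));
    } catch (IOException e) {
      return false;
    }
  }
}
```

With two injected failures and three attempts the client recovers; with three failures and three attempts it does not, which is exactly the boundary such a test is meant to pin down.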


Fault examples (HDFS)

   Link/communication failure and communication corruption
       Namenode to Datanode communication
       Client to Datanode communications
       Client to Namenode communications
   Namenode related failures
       General slow downs
       Edit logs slow downs
       NFS-mounted volume is slow or not responding
   Datanode related failures
       Hardware corruption and data failures
   Storage latencies and bandwidth anomalies
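
Of the fault classes listed above, storage latency is among the simplest to emulate. The following sketch wraps an `InputStream` and pauses before every read; `SlowInputStream` and its methods are hypothetical and only illustrate the idea, which an aspect could apply to real I/O paths without touching their source.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;

/** Hypothetical latency fault: delays every read by a fixed amount. */
final class SlowInputStream extends FilterInputStream {
  private final long delayMillis;
  private int delayedReads;

  SlowInputStream(InputStream in, long delayMillis) {
    super(in);
    this.delayMillis = delayMillis;
  }

  /** Sleeps for the configured delay and counts the delayed call. */
  private void pause() throws IOException {
    delayedReads++;
    try {
      Thread.sleep(delayMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("FI: interrupted during injected delay");
    }
  }

  @Override
  public int read() throws IOException {
    pause();
    return super.read();
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    pause();
    return super.read(buf, off, len);
  }

  /** Number of read calls that were delayed so far. */
  int delayedReadCount() {
    return delayedReads;
  }

  /** Reads the stream to the end in 4-byte chunks and returns the byte count. */
  static int drain(InputStream in) {
    byte[] buf = new byte[4];
    int total = 0;
    try {
      for (int n = in.read(buf); n != -1; n = in.read(buf)) {
        total += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total;
  }
}
```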


Fault examples (MapReduce)


    Task tracker
    
        Lost task trackers

    Tasks
    
        Timeouts
    
        Slow downs
    
        Shuffle failures
    
        Sort/merge failures

    Local storage issues

    JobTracker failures

    Link communication failures and corruptions

Scale


    Multi-hundred-node clusters

    Heterogeneous environment

        OS, switches, secure/non-secure configurations


    Multi-node fault scenarios (e.g. pipeline recovery)


    Requires fault manager/dispensary


        Support for multi-node, multi-condition faults

    
        Fault identification, reproducibility, repeatability

    
        Infrastructure auto-discovery to avoid configuration complexities
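
One common way to get the reproducibility and repeatability asked for above is to derive the whole fault schedule from a single seed. The sketch below is an assumption about how that might be done, not a description of any existing fault manager; `FaultPlan` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical fault plan: which node fails at which step. */
final class FaultPlan {
  private FaultPlan() {}

  /**
   * Generates a schedule of node failures. The same seed, node list, step
   * count, and rate always produce the same schedule, so a failing run can
   * be replayed by recording only the seed.
   */
  static List<String> generate(long seed, List<String> nodes, int steps, double rate) {
    Random rng = new Random(seed);
    List<String> plan = new ArrayList<>();
    for (int step = 0; step < steps; step++) {
      for (String node : nodes) {
        if (rng.nextDouble() < rate) {
          plan.add("step " + step + ": fail " + node);
        }
      }
    }
    return plan;
  }
}
```

Logging the seed alongside the test results is then enough to identify and rerun a particular multi-node fault scenario.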


Coming soon...




Client side
// The pointcut below selects the execution of FSDataset.getBlockFile(..),
// so faults are injected inside the method in question
pointcut execGetBlockFile() :
  execution (* FSDataset.getBlockFile(..)) && !within(FSDatasetAspects +);

// Ask the dispenser which faults apply here, then run each of them
before() throws DiskErrorException : execGetBlockFile() {
  ArrayList<GenericFault> pipelineFault =
       FiDispenser.getFaultsFor(FSDataset.class,
       FaultID.PipelineRecovery(),
       RANDOM);

  for (GenericFault fault : pipelineFault) {
    fault.execute();
  }
}

Fault dispenser
MachineGroup Rack1DataNodes = new MachineGroup(rack1, TYPE.DN)

Rack1DataNodes.each {
  if (it.type == RANDOM) {
    it.setTimeout(random.nextInt(2000))
    it.setType(DiskErrorException.class)
    it.setReport('logcollector.domain.com', SYSLOG)
  }
}
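
The slides leave `FiDispenser` and `GenericFault` unspecified. Purely as an illustration of the lookup-then-execute pattern on the client side, a local registry might be sketched as follows; `Fault`, `FaultRegistry`, and the string fault IDs are hypothetical stand-ins, and a real dispenser would presumably be fed by a remote fault manager.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical fault abstraction; the slide's GenericFault is not specified. */
interface Fault {
  void execute();
}

/** Hypothetical client-side registry mapping (class, fault ID) to faults. */
final class FaultRegistry {
  private final Map<String, List<Fault>> faults = new ConcurrentHashMap<>();

  private static String key(Class<?> target, String faultId) {
    return target.getName() + "#" + faultId;
  }

  /** Registers a fault for a target class under the given fault ID. */
  void register(Class<?> target, String faultId, Fault fault) {
    faults.computeIfAbsent(key(target, faultId), k -> new CopyOnWriteArrayList<>())
        .add(fault);
  }

  /** Returns the faults registered for the target and ID, or an empty list. */
  List<Fault> getFaultsFor(Class<?> target, String faultId) {
    List<Fault> found = faults.getOrDefault(key(target, faultId), Collections.emptyList());
    return Collections.unmodifiableList(found);
  }
}
```

An advice body would then iterate over `getFaultsFor(...)` and call `execute()` on each entry, as in the client-side snippet above.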
Q&A




Attic slides




White-box system testing: Herriot




Goals


    Write cluster-based tests using a Java object model


    Automate many types of tests on real clusters:

    
        Functional

    
        System

    
        Load

    
        Recovery


    More information

    
        http://wiki.apache.org/hadoop/HowToUseSystemTestFramework



Main Features

 
     Remote daemon Observability and Controllability APIs

 
     Enables large cluster-based tests written in Java using the JUnit
     (TestNG) framework


     Herriot comprises a library of utility APIs and code
     injections into Hadoop binaries

 
     Assumes a deployed and instrumented cluster

 
     Production build contains NO Herriot instrumentation

 
     Supports fault injection



Major design considerations


 
     Common

 
     RPC-based utilities to control remote daemons

 
     Daemons belong to different roles

 
     Remote process management from Java: start/stop, change/push
     configuration, etc.

 
     HDFS and MR specific APIs on top of Common




Common Features

 
     Get the current configuration of a daemon (a remote Hadoop process)


     Get a daemon's process info: thread count, heap, environment…


     Ping a daemon (make sure it’s up and ready)


     Get/list FileStatus of a path from a remote daemon


     Daemon log inspection: grep remote logs, count exceptions…


     Cluster setup/tear down; restart


     Change a daemon's configuration, push new configs…
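
To give a feel for how a test might use features like these, here is a small sketch of a "wait until the daemon is up" helper built on a ping call. The interface `DaemonHandle` and the classes `DaemonWaiter` and `FakeDaemon` are hypothetical and are not the actual Herriot API; a real handle would talk to the daemon over RPC.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical remote-daemon handle; not the actual Herriot API. */
interface DaemonHandle {
  boolean ping();
  Map<String, String> getConfiguration();
}

/** Polls a daemon until it answers a ping or the attempts run out. */
final class DaemonWaiter {
  private DaemonWaiter() {}

  static boolean waitUntilUp(DaemonHandle daemon, int maxAttempts, long pauseMillis) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (daemon.ping()) {
        return true;
      }
      try {
        Thread.sleep(pauseMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}

/** In-memory stand-in: the first N pings fail, later ones succeed. */
final class FakeDaemon implements DaemonHandle {
  private int failedPingsLeft;

  FakeDaemon(int failedPings) {
    this.failedPingsLeft = failedPings;
  }

  @Override
  public boolean ping() {
    if (failedPingsLeft > 0) {
      failedPingsLeft--;
      return false;
    }
    return true;
  }

  @Override
  public Map<String, String> getConfiguration() {
    return Collections.singletonMap("example.key", "example-value");
  }
}
```

A cluster test would typically call such a helper after a restart and before asserting anything about the daemon's state.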



Deployment Diagram



 [Diagram: a test host VM runs the test and the Herriot library; the NN, JT,
 and DN/TT host VMs each carry injected Herriot code.]
