Apache HAMA: An Introduction to
Bulk Synchronization Parallel on Hadoop           (Ver 0.1)



         Hyunsik Choi <hyunsik.choi@gmail.com>,
        Edward J. Yoon <edwardyoon@apache.org>,


            Apache Hama Team
Table Of Contents

•   An Overview of the BSP Model
•   Why the BSP on Hadoop?
•   A Comparison to M/R Programming Model
•   What Applications are Appropriate for BSP?
•   The BSP Implementation on Hadoop
•   Some Examples on BSP
•   What's Next?
An Overview of the BSP Model

BSP (Bulk Synchronous Parallel) is a parallel
computational model that performs a series of supersteps.

•   A superstep consists of three ordered stages:
    • Concurrent Computation: Computation on locally stored data
    • Communication: Send and receive messages in a point-to-point manner
    • Barrier Synchronization: Wait for and synchronize all processors at the
      end of the superstep

A BSP system is composed of a number of networked computers with
both local memory and disk.

More detailed information
    • http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
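
The three stages of a superstep can be sketched in plain Java. This is only an illustration under stated assumptions, not Hama code: peers are simulated as threads in one JVM, `java.util.concurrent.CyclicBarrier` stands in for the barrier, and the class name `SuperstepSketch`, the method `runSuperstep`, and the per-peer inbox queues are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch: simulates BSP peers as threads inside one JVM.
final class SuperstepSketch {

    /** Runs one superstep on numPeers threads and returns the total each peer computed. */
    static long[] runSuperstep(int numPeers) throws InterruptedException {
        // One inbox per peer; messages are read only after the barrier.
        List<ConcurrentLinkedQueue<Long>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ConcurrentLinkedQueue<>());
        }
        CyclicBarrier barrier = new CyclicBarrier(numPeers);
        long[] totals = new long[numPeers];
        Thread[] peers = new Thread[numPeers];

        for (int id = 0; id < numPeers; id++) {
            final int myId = id;
            peers[id] = new Thread(() -> {
                try {
                    // Stage 1, concurrent computation on locally stored data.
                    long localValue = (myId + 1) * 10L;

                    // Stage 2, point-to-point communication: send to every other peer.
                    for (int other = 0; other < numPeers; other++) {
                        if (other != myId) {
                            inboxes.get(other).add(localValue);
                        }
                    }

                    // Stage 3, barrier synchronization: wait until all peers have sent.
                    barrier.await();

                    // Past the barrier, every message for this peer has been delivered.
                    long total = localValue;
                    for (long received : inboxes.get(myId)) {
                        total += received;
                    }
                    totals[myId] = total;
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            peers[id].start();
        }
        for (Thread peer : peers) {
            peer.join();
        }
        return totals;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = runSuperstep(4);
        for (int i = 0; i < totals.length; i++) {
            System.out.println("peer " + i + " total = " + totals[i]);
        }
    }
}
```

With four peers holding 10, 20, 30 and 40, every peer ends up with the same total of 100, because no peer reads its inbox before all peers have passed the barrier.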
A Vertical Structure of a Superstep

[Figure: diagram of the vertical structure of a superstep]
Why the BSP on Hadoop?

We want to use Hadoop clusters for more complicated
computations.
 • But complex mathematical and graph-based operations are
   not easy to program in MapReduce
   o   e.g., Matrix Multiply, PageRank, Breadth-First Search, etc.
 • Preserving data locality is an important factor in
   distributed/parallel environments.
   o   Data locality is also important in MapReduce for efficient processing.
 • However, MapReduce frameworks do not preserve data
   locality across consecutive operations, due to their inherent nature.
A Comparison to the M/R Programming Model

MapReduce performs filtering and/or transformations while
each mapper reads the input data split by the InputFormat.
 • Simple API
   o   Map and Reduce
 • Automatic parallelization and distribution
 • To derive final results, a computation generally passes the input
   data through either a chain of several MapReduce jobs or repeated
   MapReduce iterations.
 • Partitioner is too simple
   o MR does not provide a way to preserve data locality, either in the
     transition from Map to Reduce or between MapReduce iterations.
   o This incurs not only intensive communication cost but also unnecessary
     processing cost.
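
To illustrate the point about the partitioner, here is a minimal standalone sketch of the hash rule used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The class `HashPartitionSketch` and the method `partitionFor` are hypothetical names and do not depend on Hadoop. The reducer index depends only on the key and the number of reducers, never on the node where the record was stored.

```java
// Hypothetical standalone sketch of hash partitioning; not Hadoop code.
final class HashPartitionSketch {

    /** Returns the reducer index for a key, using the same arithmetic as a hash partitioner. */
    static int partitionFor(String key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c"};
        for (String key : keys) {
            // The node that produced the record plays no role in where it is sent.
            System.out.println(key + " -> reducer " + partitionFor(key, 3));
        }
    }
}
```

Because the mapping ignores where a record came from, data that was local to one node after one MapReduce pass may be shuffled to a different node in the next pass.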
A Comparison to M/R Programming model

BSP is tailored towards processing data with locality.
 • BSP enables each peer to communicate only the necessary
   data to other peers while the peers preserve data locality.
 • Simple API and easy synchronization
 • Like MapReduce, it automatically parallelizes programs and
   distributes them across cluster nodes.
What Applications are Appropriate for BSP?

• BSP shows its strengths in applications with the
  following characteristics:
   o Data locality is important
   o The data have complicated relations
   o Many iterations and recursions are used
The BSP Implementation on Hadoop

• Written in Java
• The BSP package is now available in the Hama repository.
  o   Implementation available for Hadoop versions greater than 0.20.x
  o   Allows new applications to be developed
• Hadoop RPC is used for BSP peers to communicate with each
  other.
• The barrier synchronization mechanism relies on ZooKeeper.
Serialized Printing of Hello World (shows synchronization)

// BSP starts with 10 threads.
// Each thread has an ID number from 0 to 9, assigned in shuffled order.

for (int i = 0; i < numberOfPeer; i++) {
    // Only the peer whose turn it is prints in this superstep.
    if (myId == i) {
      System.out.println("Hello BSP world from " + i + " of " + numberOfPeer);
    }

    // All peers wait here, so the messages appear in ID order.
    peer.sync();
}

// BSP end
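
The loop above can be tried out without a cluster. The following is a minimal sketch, assuming peers are simulated as threads and `peer.sync()` is replaced by a reusable `java.util.concurrent.CyclicBarrier`; the class name `SerializedHelloSketch` and the method `run` are hypothetical and not part of Hama.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch: threads play the role of BSP peers.
final class SerializedHelloSketch {

    /** Runs numberOfPeer simulated peers and returns their messages in the order they were emitted. */
    static List<String> run(int numberOfPeer) throws InterruptedException {
        List<String> output = Collections.synchronizedList(new ArrayList<>());
        CyclicBarrier barrier = new CyclicBarrier(numberOfPeer);
        Thread[] peers = new Thread[numberOfPeer];

        for (int id = 0; id < numberOfPeer; id++) {
            final int myId = id;
            peers[id] = new Thread(() -> {
                try {
                    for (int i = 0; i < numberOfPeer; i++) {
                        // Only the peer whose ID matches the superstep number speaks.
                        if (myId == i) {
                            output.add("Hello BSP world from " + i + " of " + numberOfPeer);
                        }
                        // Stand-in for peer.sync(): nobody proceeds until everyone arrives.
                        barrier.await();
                    }
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            peers[id].start();
        }
        for (Thread peer : peers) {
            peer.join();
        }
        return output;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : run(10)) {
            System.out.println(line);
        }
    }
}
```

Although the threads start in arbitrary order, the barrier after each iteration forces the messages to be emitted in ascending ID order.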
Breadth-first Search on Graph

public void expand(Map<Vertex, Integer> input, Map<Vertex, Integer> nextQueue) {
   for (Map.Entry<Vertex, Integer> e : input.entrySet()) {
     Vertex vertex = e.getKey();
     if (vertex.isVisited()) { // Skip vertices that have already been visited.
       continue;
     } else {
       vertex.visit();
     }

     // Put the vertices adjacent to the current vertex into nextQueue with an incremented distance.
     for (Integer i : vertex.getAdjacent()) {
       if (needToVisit(i, vertex, e.getValue())) {
         nextQueue.put(getVertexByKey(i), e.getValue() + 1);
       }
     }
   }
}
Breadth-first Search on Graph

public void process() {
   currentQueue = new HashMap<Vertex, Integer>();
   nextQueue = new HashMap<Vertex, Integer>();
   currentQueue.put(startVertex, 0); // Initially, put the start vertex into the current queue.

   while (true) {
     expand(currentQueue, nextQueue);

     // The peer.sync method can determine from a vertex ID whether
     // that vertex resides on local disk.
     peer.sync(nextQueue); // Synchronize nextQueue with other peers.
                           // At this step, the peer communicates the IDs of
                           // vertices that reside on remote peers.
     if (peer.state == TERMINATE)
       break;
     // Convert the synchronized nextQueue (exchanged as vertex IDs) into a currentQueue that contains vertices.
     currentQueue = convertQueue(nextQueue);
     nextQueue = new HashMap<Vertex, Integer>();
   }
}
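
The queue-swapping pattern in `process()` can be checked on a single machine. The sketch below is a simplified, single-process version under stated assumptions: vertices are plain integers, the graph is an adjacency map, and the point where a BSP program would call `peer.sync(nextQueue)` is marked only by a comment. The class `LevelSynchronousBfsSketch` and the method `distances` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process sketch of level-synchronous BFS.
final class LevelSynchronousBfsSketch {

    /** Returns the hop distance from start to every reachable vertex. */
    static Map<Integer, Integer> distances(Map<Integer, List<Integer>> adjacency, int start) {
        Map<Integer, Integer> distance = new HashMap<>();
        List<Integer> currentQueue = new ArrayList<>();
        currentQueue.add(start);
        distance.put(start, 0);

        // Each loop iteration corresponds to one superstep (one BFS level).
        while (!currentQueue.isEmpty()) {
            List<Integer> nextQueue = new ArrayList<>();
            for (int vertex : currentQueue) {
                List<Integer> neighbors =
                        adjacency.getOrDefault(vertex, Collections.emptyList());
                for (int neighbor : neighbors) {
                    // Only enqueue vertices that have never been reached before.
                    if (!distance.containsKey(neighbor)) {
                        distance.put(neighbor, distance.get(vertex) + 1);
                        nextQueue.add(neighbor);
                    }
                }
            }
            // In a distributed BSP program, the barrier and the exchange of
            // remote vertex IDs would happen here, before the queues are swapped.
            currentQueue = nextQueue;
        }
        return distance;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(0, Arrays.asList(1, 2));
        graph.put(1, Arrays.asList(3));
        graph.put(2, Arrays.asList(3));
        graph.put(3, Arrays.asList(4));
        System.out.println(distances(graph, 0));
    }
}
```

Here the loop ends when the next queue is empty; in the distributed version, the `TERMINATE` state plays that role once no peer has any vertices left to expand.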
What's Next?

• Novel matrix computation algorithms using BSP
• Angrapa
  o   A graph computation framework based on BSP
  o   An API oriented around graph features
• Improved fault tolerance for BSP
  o   Currently, BSP has poor fault tolerance mechanisms.
• Support for collective and compressed communication
  mechanisms for high-performance computing
Who We Are
• Apache Hama Project Team
  o (http://incubator.apache.org/hama)
• Mailing List
  o   hama-user@incubator.apache.org
  o   hama-dev@incubator.apache.org
