SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Downloaden Sie, um offline zu lesen
1 
Community 1.3.0 
(Optimize both Yarn & Non Yarn Hadoop clusters)
2 
Agenda 
• Big Data Trends 
• What is Jumbune? 
• Description of Components
3 
Big Data Trends 
Resource sharing/isolation 
frameworks: Yarn, Mesos, 
Shared cluster workers etc. 
(resources) 
Multiple Execution engines: 
MapReduce, Spark, Hama, 
Storm, Giraph, etc. 
Data ETLing from all 
possible sources to Data 
Lake
4 
Hadoop based solution life stages 
(as on ground) – Cyclic execution 
xxx 
xxx 
Business User Data Analyst MapReduce Dev 
Logic & Data Test 
Staging Data Devops 
Production 
Bad 
Logic? 
Resource 
Utilization ? 
Bad 
Data? 
Monitoring 
Needs
55 
Challenges in Analytical Solutions 
1. No common 
platform across 
actors to detect root 
causes 
2. Incremental 
imports may ingest 
bad data 
3. Cluster 
resources are 
shared and optimal 
utilization is key 
4. Implementing 
models in custom 
MR in initial 
attempts is like 
hitting bull’s eye 
5. Bad Logic or Bad 
data
6 
Intersecting solution Lifecycle Stages 
xxx 
xxx 
Solution 
Development Quality Test 
Bulk & Incremental Devops 
Data
7 
Jumbune 
“A catalyst to accelerate realization of analytical solutions” 
Data Validation Flow Analyzer Cluster Monitor Job Profiler
8 
Niche offerings 
• In depth code level analysis of cluster wide flow 
• Record level data violation reports. 
• No deployment on Workers - Ultra light agent installation on Hadoop master 
only 
• Ability to turn on/off cluster monitoring at will – lessens resource load 
• Customizable rack aware monitoring 
• Correlated profiling analysis of phases, throughput and resource consumption 
• Ability to work across all Hadoop Distributions
9 
Components - Recommended Environments 
Dev 
• Flow 
Debugger 
• Data 
Validation 
• MR Job 
Profiler 
QA 
• Data 
Validation 
Stage + Perf 
• MR Job 
Profiler 
Prod 
• Cluster 
Monitoring 
• Data 
Validation
10 
Supported Deployments 
Jumbune 
Azure, EC2 
All major distributions 
On Premise
11 
MapReduce Flow Debugger 
• Verifies the flow of input records in user’s map reduce implementation 
• Drill down visualization helps developer to quickly identify the problem. 
• Only tool to assist developers to figure out MapReduce implementation 
faults without any extra coding
12 
Data Validator 
• Validates inconsistencies in data in the form of : 
– Null checks 
– Data type checks 
– Regular expression checks 
• Generic way of specifying validation rules 
• Provides record level report for found anomalies 
• Currently supports HDFS as the lake file system
13 
MR Job Profiling 
• Per Job Phase wise 
– performance for each JVM 
– data flow rate 
– Resource usage 
• Per Job Heap sites for Mapper & Reducer 
• Per Job CPU cycles for Mapper & Reducer
14 
Hadoop Cluster Monitoring 
• Data Centre & Rack aware nodes view of Yarn and Non Yarn Daemons 
• Dynamic Interval based monitoring 
• Hadoop JMX, Node Resource Statistics 
• Per file, node wise replica Placement (which nodes have replicas of a given 
file ?) 
• HDFS data placement view (HDFS balanced ?)
15 
How we are building Jumbune?
16 
Let’s Collaborate  
Website 
• http://jumbune.org 
Contribute 
• http://github.com/impetus-opensource/jumbune 
• http://jumbune.org/jira/JUM 
Social 
• Follow @jumbune Use #jumbune 
• Jumbune Group: http://linkd.in/1mUmcYm 
Forums 
• Users: users-subscribe@collaborate.jumbune.org 
• Dev: dev-subscribe@collaborate.jumbune.org 
• Issues: issues-subscribe@collaborate.jumbune.org 
Downloads 
• http://jumbune.org 
• https://bintray.com/jumbune/downloads/jumbune
17 
Thanks

Weitere ähnliche Inhalte

Was ist angesagt?

Flurry Analytic Backend - Processing Terabytes of Data in Real-time
Flurry Analytic Backend - Processing Terabytes of Data in Real-timeFlurry Analytic Backend - Processing Terabytes of Data in Real-time
Flurry Analytic Backend - Processing Terabytes of Data in Real-timeTrieu Nguyen
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...viirya
 
Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Camuel Gilyadov
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experienceVitaliy Bashun
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big DataSeval Çapraz
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseMatt Fuller
 
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...Databricks
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningSpark Summit
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleFlink Forward
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingGuozhang Wang
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Uri Laserson
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsGuozhang Wang
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msIlya Ganelin
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinFlink Forward
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsMichael Stack
 

Was ist angesagt? (20)

Flurry Analytic Backend - Processing Terabytes of Data in Real-time
Flurry Analytic Backend - Processing Terabytes of Data in Real-timeFlurry Analytic Backend - Processing Terabytes of Data in Real-time
Flurry Analytic Backend - Processing Terabytes of Data in Real-time
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big Data
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the Enterprise
 
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Presto - SQL on anything
Presto  - SQL on anythingPresto  - SQL on anything
Presto - SQL on anything
 
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative Computations
 
Spark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni SchieferSpark Summit EU talk by Berni Schiefer
Spark Summit EU talk by Berni Schiefer
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2ms
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 

Andere mochten auch

November 2013 HUG: Compute Capacity Calculator
November 2013 HUG: Compute Capacity CalculatorNovember 2013 HUG: Compute Capacity Calculator
November 2013 HUG: Compute Capacity CalculatorYahoo Developer Network
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceCloudera, Inc.
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationYahoo Developer Network
 
Profile hadoop apps
Profile hadoop appsProfile hadoop apps
Profile hadoop appsBasant Verma
 
Hadoop Monitoring best Practices
Hadoop Monitoring best PracticesHadoop Monitoring best Practices
Hadoop Monitoring best PracticesEdward Capriolo
 

Andere mochten auch (6)

November 2013 HUG: Compute Capacity Calculator
November 2013 HUG: Compute Capacity CalculatorNovember 2013 HUG: Compute Capacity Calculator
November 2013 HUG: Compute Capacity Calculator
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
 
Profile hadoop apps
Profile hadoop appsProfile hadoop apps
Profile hadoop apps
 
HW09 Hadoop Vaidya
HW09 Hadoop VaidyaHW09 Hadoop Vaidya
HW09 Hadoop Vaidya
 
Hadoop Monitoring best Practices
Hadoop Monitoring best PracticesHadoop Monitoring best Practices
Hadoop Monitoring best Practices
 

Ähnlich wie Jumbune optimize hadoop-solutions

Teradata Loom Introductory Presentation
Teradata Loom Introductory PresentationTeradata Loom Introductory Presentation
Teradata Loom Introductory Presentationmlang222
 
OpenSource Big Data Platform - Flamingo Project
OpenSource Big Data Platform - Flamingo ProjectOpenSource Big Data Platform - Flamingo Project
OpenSource Big Data Platform - Flamingo ProjectBYOUNG GON KIM
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impalamarkgrover
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Thomas W. Dinsmore
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSpraveen bhat
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoopch adnan
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdataTom Rogers
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpandeIndicThreads
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovBig Data Spain
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 

Ähnlich wie Jumbune optimize hadoop-solutions (20)

Teradata Loom Introductory Presentation
Teradata Loom Introductory PresentationTeradata Loom Introductory Presentation
Teradata Loom Introductory Presentation
 
OpenSource Big Data Platform - Flamingo Project
OpenSource Big Data Platform - Flamingo ProjectOpenSource Big Data Platform - Flamingo Project
OpenSource Big Data Platform - Flamingo Project
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impala
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFS
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdata
 
Spark 1.0
Spark 1.0Spark 1.0
Spark 1.0
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpande
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
 
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 

Kürzlich hochgeladen

Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 

Kürzlich hochgeladen (20)

Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 

Jumbune optimize hadoop-solutions

  • 1. 1 Community 1.3.0 (Optimize both Yarn & Non Yarn Hadoop clusters)
  • 2. 2 Agenda • Big Data Trends • What is Jumbune? • Description of Components
  • 3. 3 Big Data Trends Resource sharing/isolation frameworks: Yarn, Mesos, Shared cluster workers etc. (resources) Multiple Execution engines: MapReduce, Spark, Hama, Storm, Giraph, etc. Data ETLing from all possible sources to Data Lake
  • 4. 4 Hadoop based solution life stages (as on ground) – Cyclic execution xxx xxx Business User Data Analyst MapReduce Dev Logic & Data Test Staging Data Devops Production Bad Logic? Resource Utilization ? Bad Data? Monitoring Needs
  • 5. 55 Challenges in Analytical Solutions 1. No common platform across actors to detect root causes 2. Incremental imports may ingest bad data 3. Cluster resources are shared and optimal utilization is key 4. Implementing models in custom MR in initial attempts is like hitting bull’s eye 5. Bad Logic or Bad data
  • 6. 6 Intersecting solution Lifecycle Stages xxx xxx Solution Development Quality Test Bulk & Incremental Devops Data
  • 7. 7 Jumbune “A catalyst to accelerate realization of analytical solutions” Data Validation Flow Analyzer Cluster Monitor Job Profiler
  • 8. 8 Niche offerings • In depth code level analysis of cluster wide flow • Record level data violation reports. • No deployment on Workers - Ultra light agent installation on Hadoop master only • Ability to turn on/off cluster monitoring at will – lessens resource load • Customizable rack aware monitoring • Correlated profiling analysis of phases, throughput and resource consumption • Ability to work across all Hadoop Distributions
  • 9. 9 Components - Recommended Environments Dev • Flow Debugger • Data Validation • MR Job Profiler QA • Data Validation Stage + Perf • MR Job Profiler Prod • Cluster Monitoring • Data Validation
  • 10. 10 Supported Deployments Jumbune Azure, EC2 All major distributions On Premise
  • 11. 11 MapReduce Flow Debugger • Verifies the flow of input records in user’s map reduce implementation • Drill down visualization helps developer to quickly identify the problem. • Only tool to assist developers to figure out MapReduce implementation faults without any extra coding
  • 12. 12 Data Validator • Validates inconsistencies in data in the form of : – Null checks – Data type checks – Regular expression checks • Generic way of specifying validation rules • Provides record level report for found anomalies • Currently supports HDFS as the lake file system
  • 13. 13 MR Job Profiling • Per Job Phase wise – performance for each JVM – data flow rate – Resource usage • Per Job Heap sites for Mapper & Reducer • Per Job CPU cycles for Mapper & Reducer
  • 14. 14 Hadoop Cluster Monitoring • Data Centre & Rack aware nodes view of Yarn and Non Yarn Daemons • Dynamic Interval based monitoring • Hadoop JMX, Node Resource Statistics • Per file, node wise replica Placement (which nodes have replicas of a given file ?) • HDFS data placement view (HDFS balanced ?)
  • 15. 15 How we are building Jumbune?
  • 16. 16 Let’s Collaborate  Website • http://jumbune.org Contribute • http://github.com/impetus-opensource/jumbune • http://jumbune.org/jira/JUM Social • Follow @jumbune Use #jumbune • Jumbune Group: http://linkd.in/1mUmcYm Forums • Users: users-subscribe@collaborate.jumbune.org • Dev: dev-subscribe@collaborate.jumbune.org • Issues: issues-subscribe@collaborate.jumbune.org Downloads • http://jumbune.org • https://bintray.com/jumbune/downloads/jumbune