SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Diomin Aliaksey
R&D
2014, Minsk
OpenSource

Monitoring

Target Group

Apache Hadoop

Yes

X

Developers

Cloudera

Yes

Good

All

Hortonworks

Yes

Good

All

MapR

No

Bad

Enterprise

PivotalHD

No

Bad

Enterprise

3
How to find the bottleneck?

4
5
6
8
9
10
11
12
1. Increase size of cluster
2. Increase input block size
3. Increase buffer size

13
1. Increase size of cluster
2. Increase input block size
3. Increase buffer size

14
15
16
17
1. Increase size of cluster
2. Increase input block size
3. Increase buffer size

18
19
1. Increase size of cluster
2. Increase input block size
3. Increase buffer size

20
1. Compression

21
1. Compression
2. Combiner

22
Wordcount

Reduce function as Combine
combine 1:

<a, 1> <b, 1> <a, 1>

=> <a, 2> <b, 1>

combine 2:

<a, 1> <b, 1>

=> <a, 1> <b, 1>

Reduce:

<a, {1, 2}> <b, {1, 1}> => <a, 3> <b, 2>

23
Mean

combine 1: <k,40> <k,30> <k,20> =>

<k, 30>

combine 2: <k,2> <k,8>

=>

<k, 5>

Reduce:

=>

<k, 17.5>

<k, {30, 5}>

24
Mean

combine 1: <k,40> <k,30> <k,20> =>

<k, 30>

combine 2: <k,2> <k,8>

=>

<k, 5>

Reduce:

=>

<k, 17.5>

<k, {30, 5}>

(40 + 30 + 20 + 2 + 8)/5 = 17.5

25
Mean

combine 1:

<k,<40,1>> <k,<30,1>>, <k,<20,1>>

=>

<k, <90,3> >

<k,<2,1>> <k, <8,1>>

=>

<k, <10, 2> >

Reduce:

=>

<k, 20>

combine 2:

<k, {<90,3>, <10,2>} >

26
27

Weitere ähnliche Inhalte

Andere mochten auch

BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg
 

Andere mochten auch (11)

Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Powered by the Sun
Powered by the SunPowered by the Sun
Powered by the Sun
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 

Ähnlich wie Hadoop Distributions: Bottlenecks and Tuning

Vcs slides on or 2014
Vcs slides on or 2014Vcs slides on or 2014
Vcs slides on or 2014
Shakti Ranjan
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentation
Noha Elprince
 

Ähnlich wie Hadoop Distributions: Bottlenecks and Tuning (11)

Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» Hadoop
Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» HadoopОбзор Hadoop-дистрибутивов. Тюнинг «узких мест» Hadoop
Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» Hadoop
 
PostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tablesPostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tables
 
How we switched to columnar at SpendHQ
How we switched to columnar at SpendHQHow we switched to columnar at SpendHQ
How we switched to columnar at SpendHQ
 
Vcs slides on or 2014
Vcs slides on or 2014Vcs slides on or 2014
Vcs slides on or 2014
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentation
 
Big data meetup 2012 01-18 - stripped
Big data meetup 2012 01-18 - strippedBig data meetup 2012 01-18 - stripped
Big data meetup 2012 01-18 - stripped
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
SEMLA_logging_infra
SEMLA_logging_infraSEMLA_logging_infra
SEMLA_logging_infra
 
Big query - Command line tools and Tips - (MOSG)
Big query - Command line tools and Tips - (MOSG)Big query - Command line tools and Tips - (MOSG)
Big query - Command line tools and Tips - (MOSG)
 
COCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate AscentCOCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate Ascent
 
LalitBDA2015V3
LalitBDA2015V3LalitBDA2015V3
LalitBDA2015V3
 

Mehr von Altoros

Mehr von Altoros (20)

Maturing with Kubernetes
Maturing with KubernetesMaturing with Kubernetes
Maturing with Kubernetes
 
Kubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity AssessmentKubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity Assessment
 
Journey Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment MaturityJourney Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment Maturity
 
SGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain NetworksSGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain Networks
 
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
 
A Zero-Knowledge Proof: Improving Privacy on a Blockchain
A Zero-Knowledge Proof:  Improving Privacy on a BlockchainA Zero-Knowledge Proof:  Improving Privacy on a Blockchain
A Zero-Knowledge Proof: Improving Privacy on a Blockchain
 
Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.
 
Containers and Kubernetes
Containers and KubernetesContainers and Kubernetes
Containers and Kubernetes
 
Distributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter TradingDistributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter Trading
 
5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple Nodes5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple Nodes
 
Deploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with KubesprayDeploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with Kubespray
 
UAA for Kubernetes
UAA for KubernetesUAA for Kubernetes
UAA for Kubernetes
 
Troubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud FoundryTroubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud Foundry
 
Continuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCFContinuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCF
 
How to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment UnattendedHow to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment Unattended
 
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsCloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
 
Smart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based SolutionSmart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based Solution
 
Navigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry TilesNavigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry Tiles
 
AI as a Catalyst for IoT
AI as a Catalyst for IoTAI as a Catalyst for IoT
AI as a Catalyst for IoT
 
Over-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and TreatmentOver-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and Treatment
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Hadoop Distributions: Bottlenecks and Tuning

Hinweis der Redaktion

  1. вывод map, если в буфер не влазит то сброс на диск, потом merge-sort.в определенный момент 2х кратное превышение использования диска относительно вывода map
  2. данные гоняются по сети, нагрузка на io – disk read &amp; network
  3. вывод map, если в буфер не влазит то сброс на диск, потом merge-sort.в определенный момент 2х кратное превышение использования диска относительно вывода map
  4. Задачка: сколько записей и чтений на диск можно получить имея вывод X.идеально: X записали из map, X считали на этапе fetchсуровая реальность: write: X(spill) + X (merge-sort) + X (fetch/spill) = 3 Xread: X (merge-sort) + X (fetch) + X (toreducer) = 3 X
  5. Задачка: сколько записей и чтений на диск можно получить имея вывод X.идеально: X записали из map, X считали на этапе fetchсуровая реальность: write: X(spill) + X (merge-sort) + X (fetch/spill) = 3 Xread: X (merge-sort) + X (fetch) + X (toreducer) = 3 X
  6. увеличим количество машин в 2 раза, а заодно и в параметрах проставим в 2 раза больше map и reducemap и reduce =&gt; eachother =&gt; в 4 раза больше коннектов на получение данных =&gt; лимиты на обработку handlers, на самой датанодеВЫВОД: количество одновременно запущенных map/reduceинстансов должно определяться в первую очередь задачей, линейное масштабирование это сказка
  7. увеличим количество машин в 2 раза, а заодно и в параметрах проставим в 2 раза больше map и reducemap и reduce =&gt; eachother =&gt; в 4 раза больше коннектов на получение данных =&gt; лимиты на обработку handlers, на самой датанодеВЫВОД: количество одновременно запущенных map/reduceинстансов должно определяться в первую очередь задачей, линейное масштабирование это сказка
  8. 2) увеличим блок данных для map =&gt; выскочили за размеры буфера =&gt; лишний spill на диск =&gt; больше дискового io =&gt; все медленней. ВЫВОД: размер блока для обработки на вход map должен быть достаточно большим чтобы заполнить буфер, но не больше, иначе лишняя активность на диске
  9. 2) увеличим блок данных для map =&gt; выскочили за размеры буфера =&gt; лишний spill на диск =&gt; больше дискового io =&gt; все медленней. ВЫВОД: размер блока для обработки на вход map должен быть достаточно большим чтобы заполнить буфер, но не больше, иначе лишняя активность на диске
  10. 3) увеличим размер кеша на map/reduce =&gt; ограничения размера для буфера в jvm (больше 2х гб на массив не выделить)Тут уже ничего не поделать, нужно учитывать что у map/reduce функций есть свои лимиты и они легко достижимы
  11. компрессия =&gt; размен cpu на diskio =&gt; snappy, достаточно шустрое решение для потокового сжатия
  12. Combiner - не всегда возможно использовать в лоб (например мы считаем с помощью hive/pig) или у нас веселая функция
  13. incorrect
  14. incorrect
  15. правильное решение, но требует дополнительных манипуляций на всех уровнях: 1) меняем MapOutputFormat (в значении не просто число, а сумма свернутых чисел и количество чисел для получения текущей суммы)2) отдельная функция для Combine