Benchmarking Hadoop
with ALOJA
Oct 6, 2015
by Nicolas Poggi @ni_po
sudoers Barcelona:
About Nicolas Poggi @ni_po
Work:
Education:
Community:
Agenda
 Intro on Hadoop
 Current scenario and problems
 ALOJA project
 Open source tools
 Benchmarking DEMO
 Results
 DEMO results online
 Open questions and comments
Intro: Hadoop design and ecosystem
Hadoop design
 Hadoop is designed to solve complex data problems
 Structured and non structured
 With [close to] linear scalability
 Simplifying the programming model
 From MPI, OpenMP, CUDA, …
 Operates as a blackbox for data analysts
Image source: Hadoop, the definitive guide
Hadoop parameters
 100+ tunable parameters
 mapred.map/reduce.tasks.speculative.execution
 obscure and interrelated
 io.sort.mb 100 (300)
 io.sort.record.percent 5% (15%)
 io.sort.spill.percent 80% (95 – 100%)
 Number of Mappers and Reducers
 Rule of thumb 0.5 - 2 per CPU core
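The rule of thumb above can be turned into a quick starting range per node. This is only a sketch: the function name is hypothetical, and rounding 0.5 tasks per core down (with a floor of one task) is an assumption, not something the slide specifies.

```shell
#!/usr/bin/env bash
# Suggest a starting range of concurrent map/reduce tasks per node
# from the 0.5 - 2 tasks-per-core rule of thumb.
# suggest_task_range is a hypothetical helper, not a Hadoop or ALOJA command.
suggest_task_range() {
  local cores="$1"
  local min=$(( cores / 2 ))   # 0.5 tasks per core, rounded down
  local max=$(( cores * 2 ))   # 2 tasks per core
  [ "$min" -lt 1 ] && min=1    # always allow at least one task
  echo "$min $max"
}

suggest_task_range 4   # prints "2 8"
```

The range is only a starting point for the iterative benchmarking described in the following slides.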
Hadoop stack for tuning
Image source: Intel® Distribution for Apache Hadoop
Hadoop highly-scalable but…
 Not a high-performance solution!
 Requires
 Design,
 Clusters, cluster topology
 Setup,
 OS, Hadoop config
 and tuning
 Iterative approach
 Time consuming
 And extensive benchmarking!
Hadoop ecosystem
 Large and spread
 Dominated by big players
 Custom patches
 Default values not ideal
 Product claims
 Cloud vs. On-premise
 IaaS
 PaaS
 EMR, HDInsight
 Needs standardization
and auditing!
DATA
Product claims
 Needs auditing!
Too many choices?
[Figure: deployment options (remote volumes, rotational HDDs, JBODs, RAID, large VMs, small VMs, Gb Ethernet, InfiniBand, high availability, replication) placed along the axes Cost, Performance, On-Premise, and Cloud]
And where is my system configuration positioned on each of these axes?
Project ALOJA
 Open initiative to produce mechanisms for an
 automated characterization of cost-effectiveness
 of Big Data deployments
 Results from a growing need of the community to
understand job execution details and create transparency
 Explore different configuration deployment options and
their tradeoffs
 Both software and hardware
 Cloud services and on-premise
 Seeks to provide knowledge, tools, and an online service
 with which users can make better-informed decisions
 and reduce the TCO of their Big Data infrastructures
 Guide the future development and deployment of Big Data clusters
and applications
Challenges, options, and implementation
Challenges (circa end 2013)
 Test different cluster architectures
 On-premise
 Commodity, high-end, appliance, low-power
 Cloud IaaS
 32 different VMs in Azure, similar in other
providers
 Cloud PaaS
 HDInsight, EMR, CloudBigData
 Different access levels
 Full admin, user-only, request-to-install,
everything ready, queuing systems (SGE)
 Different versions
 Hadoop, JVM, Spark, Hive, etc…
 Dev environments and testing
 Big Data usually requires a cluster to
develop and test
Benchmarking vs. Production envs
 Need to compare different executions
 Not how the systems are doing now
 This is the main difference from production products
 Data does not change (non-OLTP)
 Temporary data for benchmarks vs. Important data
 Fast iteration vs. Reliability
 Iterates configurations vs. fixed config
 Many fast, experimental changes
 Security can be relaxed
 Management for Hadoop
 Vendor lock-in
 Lack of systems support (Azure, on-premise, low-power)
 Hadoop is our use case, not the only one
 Leave no traces on the benchmarked system
Available options: (circa end 2013)
 Deployment
 jclouds
 foreman
 Puppet
 Ambari
 Config and deploy
 Ambari (hadoop only)
 Use Configuration
Management (CM)
 Puppet, chef, ansible…
 Monitoring
 Ganglia, Zabbix
 Ambari
 Cloudera Manager
 Kibana, GraphD…
 Problems
 All systems designed for PROD
 Not for comparison
 No Azure support
 Many different packages
 No one-fits-all solution
 Solution
 Custom implementation
 Based on simple components
 Wrapping commands
ALOJA Platform main components
1 Big Data Benchmarking
•Deploy & Provision
•Conf Management
•Parameter selection & Queuing
•Perf counters
•Low-level instrumentation
•App logs
2 Online Repository
•Explore results
•Execution details
•Cluster details
•Costs
•Data sharing
3 Web Analytics
•Data views and evaluations
•Aggregates
•Abstracted Metrics
•Job characterization
•Machine Learning
•Predictions and clustering
Technologies: BASH, Unix tools, CLIs; NGINX, PHP, MySQL; R, SQL, JS
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• # nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Exec Benchmarks
• Gather results
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://hadoop.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic Repo
(in progress)
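The execution plan step of the workflow can be pictured as a small wrapper script. The sketch below is illustrative only: every function is a hypothetical stub that just echoes what it would do, and none of the names come from the ALOJA code base.

```shell
#!/usr/bin/env bash
# Hypothetical stubs standing in for the real provisioning and
# benchmarking commands; each one only reports its step.
start_cluster()   { echo "start cluster $1"; }
exec_benchmarks() { echo "exec benchmarks on $1"; }
gather_results()  { echo "gather results from $1"; }
cleanup_cluster() { echo "cleanup $1"; }

run_execution_plan() {
  local cluster="$1"
  start_cluster "$cluster" || return 1
  # Gather results only when the benchmarks succeeded.
  if exec_benchmarks "$cluster"; then
    gather_results "$cluster"
  fi
  # Cleanup runs whether or not the benchmarks succeeded,
  # so no traces are left on the benchmarked system.
  cleanup_cluster "$cluster"
}

run_execution_plan "al-08"
```

Keeping cleanup outside the success branch reflects the earlier requirement to leave no traces on the benchmarked system.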
Cluster and node definitions
Clusters (Azure example)
#load AZURE defaults
source "$CONF_DIR/azure_defaults.conf"
clusterName="al-08"
numberOfNodes="8"
vmSize="Large"
#details
vmCores="4"
vmRAM="7" #in GB
#costs
clusterCostHour="1.584" #0.176 * 9
clusterType="IaaS"
clusterDescription="A3 type VMs"
Node (Web in Rackspace)
#load node defaults
source "$CONF_DIR/node_defaults.conf"
defaultProvider="rackspace"
vm_name="aloja-web"
vmSize='io1-30'
attachedVolumes="2"
diskSize="1023"
# Node roles (install functions)
extraLocalCommands="
vm_install_webserver;
vm_install_repo 'provider/rackspace';
install_ganglia_gmond;
config_ganglia_gmond 'aloja-web-rackspace' 'aloja-web';
install_percona /scratch/attached/2/mysql;"
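The clusterCostHour value in the cluster definition is what lets an execution time be converted into a cost. A minimal sketch of that arithmetic follows; the helper name and the 1800-second run are assumptions made for illustration, not values from the deck.

```shell
#!/usr/bin/env bash
# Hourly cluster cost as in the Azure example above (0.176 * 9).
clusterCostHour="1.584"

# Cost of one execution = hourly cluster cost * (seconds / 3600).
# execution_cost is a hypothetical helper, not part of ALOJA.
execution_cost() {
  local cost_hour="$1" exe_seconds="$2"
  LC_ALL=C awk -v c="$cost_hour" -v t="$exe_seconds" \
    'BEGIN { printf "%.4f\n", c * t / 3600 }'
}

execution_cost "$clusterCostHour" 1800   # prints 0.7920
```

awk is used because shell arithmetic is integer only; LC_ALL=C keeps the decimal separator a dot regardless of locale.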
Commands and providers
Provisioning commands
 Connect
 Node and Cluster
 Uses SSH proxies automatically
 Deploy
 Start, Stop
 Delete
 Nodes and clusters
Providers
 On-premise
 Custom settings for clusters
 Multiple disk types
 Different architectures
 Cloud IaaS
 Azure, OpenStack, Rackspace, AWS (testing)
 Cloud PaaS
 HDInsight, CloudBigData, EMR soon
Code at: https://github.com/Aloja/aloja/tree/master/aloja-deploy
Running benchmarks in ALOJA
 Example of submitting a job to run:
 https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
 To queue jobs and control results:
 https://github.com/Aloja/aloja/blob/master/shell/exeq.sh
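To give an idea of what queuing a parameter sweep involves, the sketch below enumerates one job description per combination of mapper count and compression codec, using the values shown on the software-configuration speedup slide. The function name and the output format are invented for illustration; they are not the interface of run_benchs.sh or exeq.sh.

```shell
#!/usr/bin/env bash
# Parameter values taken from the speedup slides of this deck.
mappers="4 6 8 10"
codecs="none zlib bzip2 snappy"

# Print one hypothetical job description per parameter combination.
enqueue_jobs() {
  local bench="$1" m c
  for m in $mappers; do
    for c in $codecs; do
      echo "bench=$bench maps=$m comp=$c"
    done
  done
}

enqueue_jobs terasort   # prints 16 job descriptions
```

In practice each line would be handed to a queue that runs the jobs one at a time on the cluster.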
Benchmarking results
ALOJA Online Benchmark Repository
 Entry point to explore the results collected from the executions
 Index of executions
 Quick glance of executions
 Searchable, sortable
 Execution details
 Performance charts and histograms
 Hadoop counters
 Job and task details
 Data management of benchmark executions
 Data importing from different clusters
 Execution validation
 Data management and backup
 Cluster definitions
 Cluster capabilities (resources)
 Cluster costs
 Sharing results
 Download executions
 Add external executions
 Documentation and References
 Papers, links, and feature documentation
Available at: http://aloja.bsc.es
Impact of SW configurations on Speedup
(4 node clusters)
[Figure: speedup (higher is better) in two panels. Number of mappers: 4m, 6m, 8m, 10m. Compression algorithm: No comp., ZLIB, BZIP2, snappy]
Results using: http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
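The charts report speedup, where higher is better. The sketch below assumes the usual definition, baseline execution time divided by the execution time of the configuration under test; the deck does not state which baseline the tool uses, and the helper name and times are made up.

```shell
#!/usr/bin/env bash
# speedup BASELINE_SECONDS EXE_SECONDS
# Assumed definition: baseline time / configuration time.
# Values above 1 mean the configuration is faster than the baseline.
# EXE_SECONDS must be greater than zero.
speedup() {
  local baseline="$1" exe_time="$2"
  LC_ALL=C awk -v b="$baseline" -v t="$exe_time" \
    'BEGIN { printf "%.2f\n", b / t }'
}

speedup 1200 800   # prints 1.50
```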
Impact of HW configurations on Speedup
[Figure: speedup (higher is better) in two panels. Disks and Network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB. Cloud remote volumes: Local only; 1 Remote; 2 Remotes; 3 Remotes; 3 Remotes, /tmp local; 2 Remotes, /tmp local; 1 Remote, /tmp local]
Results using: http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
Speedup: all disk configurations, SSD vs JBOD
 For DFSIOE read, DFSIOE write, and Terasort
URL: http://hadoop.bsc.es/configimprovement?datefrom=&dateto=&benchs%5B%5D=dfsioe_read&benchs%5B%5D=dfsioe_write&benchs%5B%5D=terasort&id_clusters%5B%5D=21&nets%5B%5D=None&disks%5B%5D=HD2&disks%5B%5D=HD3&disks%5B%5D=HD4&disks%5B%5D=HD5&disks%5B%5D=HDD&disks%5B%5D=HS5&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RL4&disks%5B%5D=RL5&disks%5B%5D=RL6&disks%5B%5D=RR1&disks%5B%5D=SS2&disks%5B%5D=SSD&mapss%5B%5D=None&comps%5B%5D=None&replications%5B%5D=None&blk_sizes%5B%5D=None&iosfs%5B%5D=None&iofilebufs%5B%5D=None&datanodess%5B%5D=None&bench_types%5B%5D=HDI&bench_types%5B%5D=HiBench&vm_sizes%5B%5D=None&vm_coress%5B%5D=None&vm_RAMs%5B%5D=None&hadoop_versions%5B%5D=None&types%5B%5D=None&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=
[Figure: speedup by disk configuration (higher is better). Bar labels: 2 SSDs; 5 SATA; 1 SSD /tmp; 1 SSD; 1 SATA; 2 SATA; 3 SATA; 4 SATA; 5 SATA. Annotations: Fastest config; High capacity and fast; High capacity but slow]
Speedup by disk configuration in the Cloud
(higher is better)
URL: http://104.130.159.92/configimprovement?benchs%5B%5D=terasort&disks%5B%5D=HDD&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RR1&disks%5B%5D=RR2&disks%5B%5D=RR3&disks%5B%5D=RR4&disks%5B%5D=RR5&disks%5B%5D=RR6&disks%5B%5D=RS1&disks%5B%5D=RS6&disks%5B%5D=SSD&bench_types%5B%5D=HiBench&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=&selected-groups=disk&datefrom=&dateto=&minexetime=150&maxexetime=1500
[Figure: speedup by disk configuration (higher is better). Groups: 1-6 remotes; 1 and 6 remotes with /tmp on SSD; SSD only]
VM Size comparison (Azure)
[Figure: comparison by VM size (lower is better)]
Preview: Cost/Performance Scalability
 This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
 X axis: number of datanodes (cluster size)
 Left Y axis: execution time (lower is better)
 Right Y axis: execution cost
[Figure: execution time and execution cost by cluster size, with the recommended size marked]
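The idea behind the screen can be sketched with a few lines of shell: compute the cost of each execution from the cluster size and execution time, then pick the cheapest. The sample timings are made up, the per-node price reuses the 0.176 figure from the Azure cluster example, and the cost model (nodes * price * hours) is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Made-up sample data, one line per run: "<datanodes> <execution seconds>".
samples="4 3600
8 1700
16 1000
32 700"

price_per_node_hour="0.176"   # per-VM hourly price from the Azure example

# Read samples on stdin and print "<datanodes> <cost>" for the cheapest run.
# cheapest_size is a hypothetical helper, not part of ALOJA.
cheapest_size() {
  LC_ALL=C awk -v p="$price_per_node_hour" '
    { cost = $1 * p * $2 / 3600
      if (NR == 1 || cost < best) { best = cost; n = $1 } }
    END { printf "%d %.4f\n", n, best }'
}

printf '%s\n' "$samples" | cheapest_size   # prints "8 0.6649"
```

With this sample, 8 datanodes is cheaper than 4 because the run finishes in less than half the time, while 16 and 32 nodes add cost faster than they cut execution time.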
Cost-effectiveness (On-premise vs. Cloud)
[Figure: configurations placed by Price and Performance: InfiniBand + SSD (LOCAL); GbE + SSD (LOCAL); CLOUD (local disk /tmp and HDFS); CLOUD (/tmp in local disk, HDFS in Blob storage, 1-3 devices); CLOUD (/tmp and HDFS in Blob storage, 1-3 devices); InfiniBand + SATA disks (LOCAL); GbE + SATA disks (LOCAL)]
Details at: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
Open questions:
is BASH good enough?
PROs
 Simple and Fast
 Well known
 (basics at least)
 Easy to hack
 Most of the work requires running sys commands
CONs and Alternatives
 Custom implementation problems
 Missing some systems
 Too simple, missing:
 objects, inheritance, types, data structures, testing
 Python? Perl?
 Puppet? Ansible?
 We’ll stick to bash for now...
 What’s missing for incubating in Apache?
More info:
 ALOJA Benchmarking platform and online repository
 http://aloja.bsc.es
 Benchmarking Big Data by Nicolas Poggi
 http://www.slideshare.net/ni_po/benchmarking-hadoop
 Big Data Benchmarking Community (BDBC) mailing list
 (~200 members from ~80 organizations)
 http://clds.sdsc.edu/bdbc/community
 Workshop Big Data Benchmarking (WBDB)
 Next: http://clds.sdsc.edu/wbdb2015.ca
 SPEC Research Big Data working group
 http://research.spec.org/working-groups/big-data-working-group.html
 Slides and video:
 Michael Frank on Big Data benchmarking
 http://www.tele-task.de/archive/podcast/20430/
 Tilmann Rabl Big Data Benchmarking Tutorial
 http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
@BDOOP_BCN
More info: http://aloja.bsc.es
or join BDOOP group
http://www.meetup.com/Barcelona-BigData-Perfomance-and-Operations
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 

Kürzlich hochgeladen

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Kürzlich hochgeladen (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

sudoers: Benchmarking Hadoop with ALOJA

  • 1. Benchmarking Hadoop with ALOJA Oct 6, 2015 by Nicolas Poggi @ni_po sudoers Barcelona:
  • 2. About Nicolas Poggi @ni_po Work: Education: Community:
  • 3. Agenda
      • Intro on Hadoop
      • Current scenario and problems
      • ALOJA project
      • Open source tools
      • Benchmarking DEMO
      • Results
      • DEMO results online
      • Open questions and comments
  • 4. Intro: Hadoop design and ecosystem
  • 5. Hadoop design
      • Hadoop is designed to handle complex data
          • Structured and unstructured
          • With [close to] linear scalability
      • Simplifies the programming model
          • Compared to MPI, OpenMP, CUDA, …
      • Operates as a black box for data analysts
      Image source: Hadoop: The Definitive Guide
  • 6. Hadoop parameters
      • 100+ tunable parameters
          • mapred.map/reduce.tasks.speculative.execution
          • Obscure and interrelated
      • io.sort.mb 100 (300)
      • io.sort.record.percent 5% (15%)
      • io.sort.spill.percent 80% (95 – 100%)
      • Number of mappers and reducers
          • Rule of thumb: 0.5 - 2 per CPU core
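
The rule of thumb above (0.5 to 2 mappers or reducers per CPU core) can be turned into a quick starting range for a node. This is only a minimal sketch: `task_range` is a hypothetical helper, not a Hadoop or ALOJA command, and the lower bound is clamped to 1 as an assumption for single-core machines.

```shell
# Hypothetical helper: prints "LOW HIGH", the range of concurrent tasks
# suggested by the 0.5 - 2 per CPU core rule of thumb.
task_range() {
  local cores="$1"
  awk -v c="$cores" 'BEGIN {
    low = int(c * 0.5)        # 0.5 tasks per core, rounded down
    if (low < 1) low = 1      # assumption: always allow at least one task
    high = c * 2              # 2 tasks per core
    printf "%d %d\n", low, high
  }'
}

task_range 4   # prints: 2 8
```

The range is a starting point for benchmarking, not a recommendation; the deck's point is that the best value has to be measured.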
  • 7. Hadoop stack for tuning Image source: Intel® Distribution for Apache Hadoop
  • 8. Hadoop highly-scalable but…
      • Not a high-performance solution!
      • Requires
          • Design: clusters, cluster topology
          • Setup: OS, Hadoop config
          • and tuning
              • Iterative approach
              • Time consuming
      • And extensive benchmarking!
  • 9. Hadoop ecosystem
      • Large and spread out
      • Dominated by big players
          • Custom patches
          • Default values not ideal
          • Product claims
      • Cloud vs. on-premise
          • IaaS
          • PaaS: EMR, HDInsight
      • Needs standardization and auditing!
  • 11. Too many choices?
      [Figure: deployment options placed on cost and performance axes (- to +): remote volumes, rotational HDDs, JBODs, RAID, large VMs, small VMs, Gb Ethernet, InfiniBand, on-premise, cloud, high availability, replication]
      And where is my system configuration positioned on each of these axes?
  • 12. Project ALOJA
      • Open initiative to produce mechanisms for an automated characterization of the cost-effectiveness of Big Data deployments
      • Results from a growing need of the community to understand job execution details and create transparency
      • Explores different configuration and deployment options and their tradeoffs
          • Both software and hardware
          • Cloud services and on-premise
      • Seeks to provide knowledge, tools, and an online service
          • with which users can make better informed decisions
          • and reduce the TCO of their Big Data infrastructures
      • Guides the future development and deployment of Big Data clusters and applications
  • 13. Challenges, options, and implementation
  • 14. Challenges (circa end 2013)
      • Test different cluster architectures
          • On-premise: commodity, high-end, appliance, low-power
          • Cloud IaaS: 32 different VMs in Azure, similar in other providers
          • Cloud PaaS: HDInsight, EMR, CloudBigData
      • Different access levels
          • Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)
      • Different versions
          • Hadoop, JVM, Spark, Hive, etc.
      • Dev environments and testing
          • Big Data usually requires a cluster to develop and test
  • 15. Benchmarking vs. production environments
      • Need to compare different executions
          • Not how the systems are doing now
          • This is the main difference from production products
      • Data does not change (non-OLTP)
      • Temporary data for benchmarks vs. important data
      • Fast iteration vs. reliability
          • Iterates configurations vs. fixed config
          • Many fast, experimental changes
      • Security can be relaxed
      • Management for Hadoop
          • Vendor lock-in
          • Lack of systems support (Azure, on-prem, low-power)
          • Hadoop is our use case, not the only one
      • Leave no traces on the benchmarked system
  • 16. Available options (circa end 2013)
      • Deployment
          • jclouds
          • Foreman
          • Puppet
          • Ambari
      • Config and deploy
          • Ambari (Hadoop only)
          • Configuration Management (CM): Puppet, Chef, Ansible…
      • Monitoring
          • Ganglia, Zabbix
          • Ambari
          • Cloudera Manager
          • Kibana, GraphD…
      • Problems
          • All systems are designed for PROD
          • Not for comparison
          • No Azure support
          • Many different packages
          • No one-fits-all solution
      • Solution
          • Custom implementation
          • Based on simple components
          • Wrapping commands
  • 17. ALOJA Platform main components
      • 1 Big Data Benchmarking
          • Deploy & provision
          • Conf management
          • Parameter selection & queuing
          • Perf counters
          • Low-level instrumentation
          • App logs
      • 2 Online Repository
          • Explore results
          • Execution details
          • Cluster details
          • Costs
          • Data sharing
      • 3 Web Analytics
          • Data views and evaluations
          • Aggregates
          • Abstracted metrics
          • Job characterization
          • Machine learning
          • Predictions and clustering
      • Technologies: NGINX, PHP, MySQL; BASH, Unix tools, CLIs; R, SQL, JS
  • 18. Workflow in ALOJA
      • Cluster(s) definition: VM sizes, # nodes, OS, disks, capabilities
      • Execution plan: start cluster, exec benchmarks, gather results, cleanup
      • Import data: convert perf metrics, parse logs, import into DB
      • Evaluate data: data views in Vagrant VM, or http://hadoop.bsc.es
      • PA and KD: Predictive Analytics, Knowledge Discovery
      • Historic Repo (in progress)
  • 19. Cluster and node definitions
      Clusters (Azure example):
          #load AZURE defaults
          source "$CONF_DIR/azure_defaults.conf"
          clusterName="al-08"
          numberOfNodes="8"
          vmSize="Large"
          #details
          vmCores="4"
          vmRAM="7" #in GB
          #costs
          clusterCostHour="1.584" #0.176 * 9
          clusterType="IaaS"
          clusterDescription="A3 type VMs"
      Node (Web in Rackspace):
          #load node defaults
          source "$CONF_DIR/node_defaults.conf"
          defaultProvider="rackspace"
          vm_name="aloja-web"
          vmSize='io1-30'
          attachedVolumes="2"
          diskSize="1023"
          # Node roles (install functions)
          extraLocalCommands="
          vm_install_webserver;
          vm_install_repo 'provider/rackspace';
          install_ganglia_gmond;
          config_ganglia_gmond 'aloja-web-rackspace' 'aloja-web';
          install_percona /scratch/attached/2/mysql;"
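
The `clusterCostHour` value in the Azure example carries the comment `#0.176 * 9`. A minimal sketch of that arithmetic follows. `cluster_cost_hour` is a hypothetical helper, not part of ALOJA, and reading the 9 as eight worker nodes plus one extra billed VM (for example a master) is an assumption that the slide does not state.

```shell
# Hypothetical helper: hourly cluster cost from the per-VM hourly price
# and the number of billed VMs.
cluster_cost_hour() {
  local vm_price="$1" billed_vms="$2"
  awk -v p="$vm_price" -v n="$billed_vms" 'BEGIN { printf "%.3f\n", p * n }'
}

numberOfNodes=8
# Assumption: one extra VM is billed on top of the worker nodes
clusterCostHour="$(cluster_cost_hour 0.176 $((numberOfNodes + 1)))"
echo "clusterCostHour=$clusterCostHour"   # prints: clusterCostHour=1.584
```

Deriving the figure instead of hard-coding it keeps the cost in step with the node count when a definition is copied for a different cluster size.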
  • 20. Commands and providers
      • Provisioning commands
          • Connect
              • Node and cluster
              • Uses SSH proxies automatically
          • Deploy
          • Start, Stop
          • Delete
              • Nodes and clusters
      • Providers
          • On-premise
              • Custom settings for clusters
              • Multiple disk types
              • Different architectures
          • Cloud IaaS: Azure, OpenStack, Rackspace, AWS (testing)
          • Cloud PaaS: HDInsight, CloudBigData, EMR soon
      Code at: https://github.com/Aloja/aloja/tree/master/aloja-deploy
  • 21. Running benchmarks in ALOJA
      • Example of submitting a job to run: https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
      • To queue jobs and control results: https://github.com/Aloja/aloja/blob/master/shell/exeq.sh
  • 23. ALOJA Online Benchmark Repository
      • Entry point to explore the results collected from the executions
          • Index of executions
              • Quick glance of executions
              • Searchable, sortable
          • Execution details
              • Performance charts and histograms
              • Hadoop counters
              • Jobs and task details
      • Data management of benchmark executions
          • Data importing from different clusters
          • Execution validation
          • Data management and backup
      • Cluster definitions
          • Cluster capabilities (resources)
          • Cluster costs
      • Sharing results
          • Download executions
          • Add external executions
      • Documentation and references
          • Papers, links, and feature documentation
      Available at: http://aloja.bsc.es
  • 24. Impact of SW configurations on speedup (4-node clusters)
      [Charts: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, snappy)]
      Results using: http://hadoop.bsc.es/configimprovement
      Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
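
The slides do not spell out how speedup is computed. The sketch below assumes the usual definition: the execution time of a baseline configuration divided by the execution time of the configuration under test, so values above 1 mean the tested configuration is faster. `speedup` is a hypothetical helper and the times are made-up examples, not ALOJA results.

```shell
# speedup BASELINE_SECONDS CONFIG_SECONDS
# Hypothetical helper: ratio of baseline time to tested time.
speedup() {
  awk -v b="$1" -v t="$2" 'BEGIN { printf "%.2f\n", b / t }'
}

# Made-up execution times in seconds
speedup 1200 800   # prints: 1.50
```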
  • 25. Impact of HW configurations on speedup
      [Charts: speedup (higher is better) by disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and by cloud remote volumes (local only; 1, 2, and 3 remotes; 1, 2, and 3 remotes with /tmp local)]
      Results using: http://hadoop.bsc.es/configimprovement
      Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
  • 26. Speedup: all disk configurations, SSD vs JBOD
      • For DFSIOE read, DFSIOE write, and Terasort
      [Chart: speedup (higher is better) for 2 SSDs, 5 SATA 1 SSD /tmp, 1 SSD, 1 SATA, 2 SATA, 3 SATA, 4 SATA, 5 SATA; annotations: fastest config, high capacity and fast, high capacity but slow]
      URL: http://hadoop.bsc.es/configimprovement?datefrom=&dateto=&benchs%5B%5D=dfsioe_read&benchs%5B%5D=dfsioe_write&benchs%5B%5D=terasort&id_clusters%5B%5D=21&nets%5B%5D=None&disks%5B%5D=HD2&disks%5B%5D=HD3&disks%5B%5D=HD4&disks%5B%5D=HD5&disks%5B%5D=HDD&disks%5B%5D=HS5&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RL4&disks%5B%5D=RL5&disks%5B%5D=RL6&disks%5B%5D=RR1&disks%5B%5D=SS2&disks%5B%5D=SSD&mapss%5B%5D=None&comps%5B%5D=None&replications%5B%5D=None&blk_sizes%5B%5D=None&iosfs%5B%5D=None&iofilebufs%5B%5D=None&datanodess%5B%5D=None&bench_types%5B%5D=HDI&bench_types%5B%5D=HiBench&vm_sizes%5B%5D=None&vm_coress%5B%5D=None&vm_RAMs%5B%5D=None&hadoop_versions%5B%5D=None&types%5B%5D=None&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=
  • 27. Speedup by disk configuration in the Cloud (higher is better)
      [Chart: speedup (higher is better) for 1-6 remotes, 1 and 6 remotes with /tmp on SSD, and SSD only]
      URL: http://104.130.159.92/configimprovement?benchs%5B%5D=terasort&disks%5B%5D=HDD&disks%5B%5D=RL1&disks%5B%5D=RL2&disks%5B%5D=RL3&disks%5B%5D=RR1&disks%5B%5D=RR2&disks%5B%5D=RR3&disks%5B%5D=RR4&disks%5B%5D=RR5&disks%5B%5D=RR6&disks%5B%5D=RS1&disks%5B%5D=RS6&disks%5B%5D=SSD&bench_types%5B%5D=HiBench&filters%5B%5D=valid&filters%5B%5D=filters&allunchecked=&selected-groups=disk&datefrom=&dateto=&minexetime=150&maxexetime=1500
  • 28. VM size comparison (Azure)
      [Chart: lower is better]
  • 29. Preview: Cost/Performance Scalability
      • This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
          • X axis: number of datanodes (cluster size)
          • Left Y axis: execution time (lower is better)
          • Right Y axis: execution cost
      [Chart: execution time and execution cost, with the recommended size marked]
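
The trade-off shown on this screen can be sketched with two small helpers. This assumes execution cost is simply execution time (in hours) multiplied by the cluster's hourly cost; the slide does not give the formula. `execution_cost` and `cheapest_size` are hypothetical names, and the sample rows are invented (the hourly costs assume 0.176 per VM-hour with one extra billed VM), not ALOJA measurements.

```shell
# execution_cost EXEC_SECONDS CLUSTER_COST_PER_HOUR
# Hypothetical helper: cost of one execution.
execution_cost() {
  awk -v s="$1" -v c="$2" 'BEGIN { printf "%.4f\n", s / 3600 * c }'
}

# cheapest_size reads "datanodes exec_seconds cost_per_hour" rows on stdin
# and prints the datanode count with the lowest execution cost.
cheapest_size() {
  awk '{ cost = $2 / 3600 * $3
         if (NR == 1 || cost < best) { best = cost; size = $1 } }
       END { print size }'
}

# Invented sample data: datanodes, execution time (s), cluster cost per hour
printf '%s\n' "4 3600 0.88" "8 1500 1.584" "16 1200 2.992" | cheapest_size   # prints: 8
```

In this sample the 16-node cluster is the fastest but not the cheapest, because the extra nodes cost more than the time they save.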
  • 30. Cost-effectiveness (On-premise vs. Cloud)
      [Chart: configurations placed on price and performance axes]
      • InfiniBand + SSD (LOCAL)
      • GbE + SSD (LOCAL)
      • CLOUD (local disk: /tmp and HDFS)
      • CLOUD (/tmp in local disk, HDFS in Blob storage, 1-3 devices)
      • CLOUD (/tmp and HDFS in Blob storage, 1-3 devices)
      • InfiniBand + SATA disks (LOCAL)
      • GbE + SATA disks (LOCAL)
      Details at: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
  • 31. Open questions: is BASH good enough?
      • PROs
          • Simple and fast
          • Well known (basics at least)
          • Easy to hack
          • Most of the work requires running sys commands
      • CONs and alternatives
          • Custom implementation problems
          • Missing some systems
          • Too simple, missing: objects, inheritance, types, data structures, testing
          • Python? Perl?
          • Puppet? Ansible?
          • We'll stick to bash for now…
      • What's missing for incubating in Apache?
  • 32. More info:
      • ALOJA benchmarking platform and online repository: http://aloja.bsc.es
      • Benchmarking Big Data by Nicolas Poggi: http://www.slideshare.net/ni_po/benchmarking-hadoop
      • Big Data Benchmarking Community (BDBC) mailing list (~200 members from ~80 organizations): http://clds.sdsc.edu/bdbc/community
      • Workshop on Big Data Benchmarking (WBDB), next: http://clds.sdsc.edu/wbdb2015.ca
      • SPEC Research Big Data working group: http://research.spec.org/working-groups/big-data-working-group.html
      • Slides and video:
          • Michael Frank on Big Data benchmarking: http://www.tele-task.de/archive/podcast/20430/
          • Tilmann Rabl, Big Data Benchmarking Tutorial: http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
  • 33. @BDOOP_BCN
      More info: http://aloja.bsc.es or join the BDOOP group: http://www.meetup.com/Barcelona-BigData-Perfomance-and-Operations
      Oct 06, 2015