SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Hadoop Query Performance
Smackdown
Michael Fagan & Dushyanth Vaddi
June 14 2017
2
Background on Big Data at Comcast
 1K+ users in enterprise data lake
 Spectrum of use cases
 Multi-tenant PAS, SAS, DAS
 24x7 Environment
 Speed & Stability are King & Queen!
 Petabytes of enterprise data available via Hive tables
3
Focus & Outcomes
 Focus on SQL query engines that can be connected to traditional BI &
Reporting Tools
 Independent performance results from our lab.
 Which engine(s) are the fastest running the TPC – DS dataset?
 Are there significant performance differences in how the data is stored and
compressed?
 How do these engines perform in a memory limited environment?
 Which engine(s) would you offer to your executive staff?
4
Test Environment
 Physical Masters (5)
 32 cores
 90Gb Ram
 12x4Tb Hard Drives
 10 Gb Ethernet
 Physical Workers (11)
 32 cores
 128Gb Ram
 12x4Tb Hard Drives
 10 Gb Ethernet
 40Gb Rack top switches
CreativeCommons
5
Software
 Host OS - CentOS 6.8
 Hadoop - HDP 2.6.0.3
 MapReduce2 – 2.7.3.2.6
 Hive/LLAP - 1.2.1.26
 Tez - 0.7.0
 Spark – 2.1.0
 Presto – 0.175
6
Test Data & Queries
 Started with Hortonworks hive test benchmark repository
https://github.com/hortonworks/hive-testbench
 Generated base 1TB TPC-DS partitioned tables
 Used base data to build 14 test databases
 Table and partition stats were collected on all schemas
7
14 Test Data Sets
8
Test Methodology
 Utilized all 66 TPC-DS Queries defined in Hive Benchmark
 Same SQL executed in all engines*
 Presto had issues with casting dates
 Presto currently does not support “use db;”
 Care was taken to tune engine & configurations
 We expect everyone can find additional optimizations
 Queries are run against one engine at a time
 Allowing each engine to utilize all environment resources
 Each query is was run consecutively 3 times
9
Performance Measurement
 Time is measured in wall clock execution time on server
 Queries Invoked via Beeline
 Utilize !sh command to write out query timings
 Implemented simple SQL client for Presto
 Failed queries where assigned a penalty time of 10 minutes
10
TPC-DS Queries – 66 total
Q3 Q24 Q24 Q56 Q72 Q88
Q7 Q25 Q25 Q58 Q73 Q89
Q12 Q26 Q26 Q60 Q75 Q90
Q13 Q27 Q27 Q63 Q76 Q91
Q15 Q28 Q28 Q64 Q79 Q92
Q17 Q29 Q29 Q65 Q80 Q93
Q18 Q31 Q31 Q66 Q82 Q94
Q19 Q32 Q32 Q67 Q83 Q95
Q20 Q34 Q34 Q68 Q84 Q96
Q21 Q39 Q39 Q70 Q85 Q97
Q22 Q40 Q40 Q71 Q87 Q98
11
Results
CreativeCommons
Results
12
Query Failures & Penalties
 Map Reduce: Q13, Q24, Q31, Q46, Q64, Q68,
Q83, Q85– (80 minutes)
 LLAP: Q29 - (10 minutes)
 Spark: Q70, Q72 - (20 minutes)
 Presto: Q70, Q72, Q80, Q94, Q95 - (50) minutes
 Tez: Q24, Q83 - (20 minutes)
CreativeCommons
13
Results: Map-Reduce
USAirForce
Required 36 hours to run the 66 queries from one ORC schema!
14
Settings – Tez
hive.optimize.bucketmapjoin=true;
hive.optimize.index.filter=true;
hive.optimize.reducededuplication.min.reducer=4;
hive.optimize.reducededuplication=true;
hive.orc.splits.include.file.footer=false;
hive.security.authorization.enabled=false;
hive.security.metastore.authorization.manager=
org.apache.hadoop.hive.ql.security.authorization.
StorageBasedAuthorizationProvider;
hive.server2.tez.default.queues=default;
hive.server2.tez.initialize.default.sessions=false;
hive.server2.tez.sessions.per.default.queue=1;
hive.stats.autogather=true;
hive.tez.input.format=
org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive.txn.manager=
org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
hive.txn.max.open.batch=1000;
hive.txn.timeout=300;
hive.vectorized.execution.enabled=true;
hive.vectorized.groupby.checkinterval=1024;
hive.vectorized.groupby.flush.percent=1;
hive.vectorized.groupby.maxentries=1024;
hive.tez.container.size=3072;
hive.auto.convert.join.noconditionaltask.size=572662306;
hive.execution.engine=tez;
hive.cbo.enable=true;
hive.stats.fetch.column.stats=true;
hive.exec.dynamic.partition.mode=nonstrict;
hive.tez.auto.reducer.parallelism=true;
hive.exec.reducers.bytes.per.reducer=100000000;
hive.txn.manager=
org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
hive.support.concurrency=false;
15
Results – Tez
ORC Zlib
6312.45
ORC Snappy
6343.11
ORC None
6499.72
Parquet None
7147.16
Parquet Snappy
7168.43
Parquet Gzip
7186.57
Text Gzip
7692.53
Seq Gzip
7734.06
Seq Snappy
7808.23
Text None
8013.46
Text Snappy
8118.11
Seq None
8997.16
Seq Bzip
11710.05
Text Bzip
12871.95
1
TEZ CUMULATIVE QUERY TIMES 1TB TPC-DS
(IN SECONDS)
Smaller is better
0.4 % 13.2% 21.9% 103.9%
16
Settings – Presto
 config.properties
 query.max.memory = 1000 GB
 query.max-memory-per-node = 29 GB
 query.initial-hash-partitions=11
 task.concurrency=32
 jvm.conf
 -Server
 -Xmx50G
 -XX:+UseG1GC
 -XX:+UseGCOverheadLimit
 -XX:+ExplicitGCInvokesConcurrent
 -XX:+HeapDumpOnOutOfMemoryError
 -XX:OnOutOfMemoryError=kill -9 %p
17
Results – Presto
ORC Zlib
6216.20
ORC Snappy
6267.88
ORC None
6480.80
Parquet Gzip
6528.14
Parquet Snappy
6529.59
Parquet None
6538.24
Text Snappy
8292.60
Seq Snappy
8695.51
Text Gzip
8735.74
Text None
8853.33
Seq Gzip
8978.22
Seq None
11150.02
Seq Bzip
22124.66
Text Bzip
23681.03
1
PRESTO CUMULATIVE QUERY TIMES 1TB TPC-DS
(IN SECONDS)
Smaller is better
0.8 % 5 % 33.4 %
281.0 %
18
Results – Spark SQL with Spark Thrift Server (STS)
 The Spark Thrift Server proved to be
inconsistent in our test environment
 Required monitoring and restart to address long
garbage collection pauses
 Achieving repeatable results through the STS
proved to be very problematic
 We were forced to scratch the results
 This is a relatively new technology that is under
construction – stay tuned
CreativeCommons:102orion
19
Settings – LLAP
 Ambari
 Memory per Daemon = 113664 MB
 In-Memory Cache per Daemon = 39116 MB (30%)
 LLAP Daemon Heap Size = 62260 Mb (70%)
 Enable Reduce Vectorization = true
 Number of Nodes for running Hive LLAP daemon = 11
 hive.auto.convert.join.noconditionaltask.size = 4294967296
 Hive.llap.io.threadpool.size = 28
20
Results – LLAP
Smaller is better
ORC Zlib
4718.67
ORC Snappy
4783.33
ORC None
4854.79
Parquet Gzip
5487.08
Parquet None
5592.51
Parquet Snappy
5597.81
Seq Snappy
6835.25
Text None
7015.09
Text Snappy
7076.00
Seq Gzip
7086.77
Text Gzip
7121.59
Seq None
7545.16
Seq Bzip
10916.10
Text Bzip
12364.79
1
LLAP CUMULATIVE AVG. QUERY TIMES 1TB TPC-DS
(IN SECONDS)
16.3 %1.4 % 44.9 %
159.9%
21
Fastest Execution*
LLAP
78.64
Presto
103.60
Tez
105.21
1
BEST CUMULATIVE AVG. QUERY TIMES 1TB TPC-DS
(IN MINUTES)
24 % 25 %
Smaller is better
22
Wins on Query Execution*
LLAP
44
Presto
16
Tez
6
1
TOTAL QUERY WINS BY ENGINE ON 1TB TPC-DS
Larger is better
23
Query Results*
Smaller is better
0.00
100.00
200.00
300.00
400.00
500.00
600.00
700.00
800.00
900.00
1000.00
Q3
Q7
Q12
Q13
Q15
Q17
Q18
Q19
Q20
Q21
Q22
Q24
Q25
Q26
Q27
Q28
Q29
Q31
Q32
Q34
Q39
Q40
Q27
Q28
Q29
Q31
Q32
Q34
Q39
Q40
Q42
Q43
Q45
Q56
Q58
Q60
Q63
Q64
Q65
Q66
Q67
Q68
Q70
Q71
Q72
Q73
Q75
Q76
Q79
Q80
Q82
Q83
Q84
Q85
Q87
Q88
Q89
Q90
Q91
Q92
Q93
Q94
Q95
Q96
Q97
Q98
QUERY EXECUTION TIMES ON 1TB TPC-DS
(IN SECONDS)
LLAP Presto
24
Observations
 Performance tuning engines is still more art than science
 High level of confidence more performance gains out there
 Performance gains are still achievable for memory intensive
engines running lower memory environments
 LLAP and Presto are solid engines and with zero issues
encountered over the several months of testing
 All the engines played well together in our test environment
25
Losers
 Map Reduce SQL workloads
 BZip compressed TEXT and SEQUENCE files
 Spark Thrift Server (temporary)
26
Winners
 Solution for the C-Suite - LLAP
 Runner up is Presto
 Columnar Storage – ORC Zlib
 Runner up is Parquet (no clear compression winner)
27
Questions
Dushyanth Vaddi
dushyanth_vaddi@comcast.com
Michael Fagan
michael_fagan@comcast.com
We are hiring in Colorado!
28
Backup Slides
29
Settings – Spark Thrift Server (Spark)
driver-memory 30g
master yarn
executor-memory 3g
executor-cores 2
num-executors 280
spark.cleaner.ttl=1800s
spark.hadoop.yarn.timeline-service.enabled=false
spark.yarn.executor.memoryOverhead=1024
spark.rpc.askTimeout=2000
spark.sql.broadcastTimeout=60000
spark.kryoserializer.buffer.max=1000m
spark.sql.inMemoryColumnarStorage.compressed=
true
spark.scheduler.mode=FAIR
spark.io.compression.codec=lzf
spark.serializer =
org.apache.spark.serializer.KryoSerializer
spark.sql.crossJoin.enabled=true
spark.yarn.scheduler.heartbeat.interval-ms =
30000000
spark.executor.heartbeatInterval=600s
spark.network.timeout=1200s
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.initialExecutors=280
spark.dynamicAllocation.maxExecutors=308
spark.dynamicAllocation.minExecutors=280
spark.shuffle.service.enabled=true
spark.kryo.referenceTracking=false
spark.sql.orc.filterPushdown=true
30
Settings – Spark Thrift Server (Hive)
hive.vectorized.execution.enabled=true
hive.cbo.enable=true
hive.merge.mapredfiles=false
hive.merge.mapfiles=true
hive.merge.sparkfiles=true
hive.merge.smallfiles.avgsize=16000000hive.merge.
size.per.task=256000000
hive.auto.convert.join=true
hive.auto.convert.join.noconditionaltask=true
hive.auto.convert.join.noconditionaltask.size=
200000000
hive.optimize.bucketmapjoin.sortedmerge=false
hive.map.aggr.hash.percentmemory=0.5
hive.map.aggr=true
hive.optimize.sort.dynamic.partition=false
hive.stats.fetch.column.stats=true
hive.vectorized.execution.reduce.enabled=false
hive.vectorized.groupby.checkinterval=4096
hive.vectorized.groupby.flush.percent=0.1
hive.compute.query.using.stats=true
hive.limit.pushdown.memory.usage=0.4
hive.exec.reducers.bytes.per.reducer=67108864
hive.smbjoin.cache.rows=10000
hive.fetch.task.conversion=more
hive.fetch.task.conversion.threshold=1073741824
hive.optimize.index.filter=true
hive.optimize.ppd=true

Weitere ähnliche Inhalte

Was ist angesagt?

HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersDataWorks Summit
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides Altinity Ltd
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsAlluxio, Inc.
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in SparkDatabricks
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariDataWorks Summit
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PGConf APAC
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoAlluxio, Inc.
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 

Was ist angesagt? (20)

HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size Matters
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Hadoopエコシステムのデータストア振り返り
Hadoopエコシステムのデータストア振り返りHadoopエコシステムのデータストア振り返り
Hadoopエコシステムのデータストア振り返り
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 

Ähnlich wie Hadoop Query Performance Smackdown

Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Nicolas Poggi
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and ArchitectureSidney Chen
 
Beyond Best Practice: Grid Computing in the Modern World
Beyond Best Practice: Grid Computing in the Modern World Beyond Best Practice: Grid Computing in the Modern World
Beyond Best Practice: Grid Computing in the Modern World ThotWave
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownScyllaDB
 
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyPilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyStuart Pook
 
USE_OF_PACKET_CAPTURE.pptx
USE_OF_PACKET_CAPTURE.pptxUSE_OF_PACKET_CAPTURE.pptx
USE_OF_PACKET_CAPTURE.pptxrajaguru91
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsBrendan Gregg
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantSwiss Data Forum Swiss Data Forum
 
GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame
GTC Tel Aviv: Accelerate Analytics with a GPU Data FrameGTC Tel Aviv: Accelerate Analytics with a GPU Data Frame
GTC Tel Aviv: Accelerate Analytics with a GPU Data FrameAaron Williams
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...DataWorks Summit
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsDatabricks
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDatabricks
 
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Fwdays
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffTimescale
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance BenchmarkingSantanu Dey
 

Ähnlich wie Hadoop Query Performance Smackdown (20)

Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)Using BigBench to compare Hive and Spark (short version)
Using BigBench to compare Hive and Spark (short version)
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
 
Beyond Best Practice: Grid Computing in the Modern World
Beyond Best Practice: Grid Computing in the Modern World Beyond Best Practice: Grid Computing in the Modern World
Beyond Best Practice: Grid Computing in the Modern World
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
 
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyPilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
 
USE_OF_PACKET_CAPTURE.pptx
USE_OF_PACKET_CAPTURE.pptxUSE_OF_PACKET_CAPTURE.pptx
USE_OF_PACKET_CAPTURE.pptx
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenant
 
GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame
GTC Tel Aviv: Accelerate Analytics with a GPU Data FrameGTC Tel Aviv: Accelerate Analytics with a GPU Data Frame
GTC Tel Aviv: Accelerate Analytics with a GPU Data Frame
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
 
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
Алексей Петров "PHP at Scale: Knowing enough to be dangerous!"
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Kürzlich hochgeladen (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Hadoop Query Performance Smackdown

Hinweis der Redaktion

  1. Map reduce would have had turned in a faster performance number if it has failed all the tests
  2. For the non columar formats Gzip wins the compression
  3. Non columnar formats snappy is the winner
  4. Total Caching size is 420GB
  5. Improvement on the high end of almost 1600 seconds low end of 600 seconds Improvement of over 10 – 26 minutes over the entire run or 9 -24 seconds on average per query