Evaluation of Cloudera Impala 1.1
Aug 7, 2013
CELLANT Corp. R&D Strategy Division
Yukinori SUDA
@sudabon
Copyright © CELLANT Corp. All Rights Reserved. http://www.cellant.jp/

Cloudera Impala 1.1 was released!
• Sentry support:
– Fine-grained authorization
– Role-based authorization
• Support for views
• Performance improvements
– Parquet columnar performance
– More efficient metadata refresh for larger installations
• Additional SQL
– SQL-89 joins (in addition to existing SQL-92)
– LOAD function
– REFRESH command for JDBC/ODBC
• Improved HBase support:
– Binary types
– Caching configuration
• Fixed many bugs

Check compatibility of "VIEW"
• Hive ⇒ Impala
– Can the Impala shell read data through a view that was created with a Hive command?
• Impala ⇒ Hive
– Can the Hive shell read data through a view that was created with an Impala command?
• Result
– Views are compatible in both directions

Check performance (Hive on Cluster1)
Avg. Job Latency [sec]
Storage format   Compression   Latency
TextFile         No Comp.      222.039
SequenceFile     Gzip          244.67
SequenceFile     Snappy        239.182
RCFile           Gzip          228.801
RCFile           Snappy        230.327
This result is not valid as a performance evaluation because some data may have been read remotely.
See the slide "Check performance (Hive on Cluster2)".

Check performance (Impala on Cluster1)
Avg. Job Latency [sec]
Storage format   Compression   Latency
TextFile         No Comp.      23.518
SequenceFile     Gzip          32.155
SequenceFile     Snappy        28.617
RCFile           Gzip          20.774
RCFile           Snappy        12.654
ParquetFile      Snappy        13.146
This result is not valid as a performance evaluation because some data may have been read remotely.
See the slide "Check performance (Impala on Cluster2)".

Check performance (Hive on Cluster2)
Avg. Job Latency [sec]
Storage format   Compression   Latency
TextFile         No Comp.      272.176
SequenceFile     Gzip          249.531
SequenceFile     Snappy        245.009
RCFile           Gzip          230.034
RCFile           Snappy        216.802

Check performance (Impala on Cluster2)
Avg. Job Latency [sec]
Storage format   Compression   Latency
TextFile         No Comp.      32.528
SequenceFile     Gzip          28.73
SequenceFile     Snappy        21.173
RCFile           Gzip          24.794
RCFile           Snappy        14.308
ParquetFile      Snappy        19.814
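
The Cluster2 figures for Hive and Impala can be put side by side as a per-format speedup. The following is a minimal sketch, assuming that the latency values on the two Cluster2 slides are listed in the same order as their format and compression labels; the dictionary and function names are illustrative and not part of the original benchmark.

```python
# Average job latency [sec] on Cluster2, copied from the two Cluster2 slides.
# Assumption: each value is paired with the (format, compression) label
# that appears at the same position on the slide.
hive_c2 = {
    ("TextFile", "No Comp."): 272.176,
    ("SequenceFile", "Gzip"): 249.531,
    ("SequenceFile", "Snappy"): 245.009,
    ("RCFile", "Gzip"): 230.034,
    ("RCFile", "Snappy"): 216.802,
}
impala_c2 = {
    ("TextFile", "No Comp."): 32.528,
    ("SequenceFile", "Gzip"): 28.73,
    ("SequenceFile", "Snappy"): 21.173,
    ("RCFile", "Gzip"): 24.794,
    ("RCFile", "Snappy"): 14.308,
    ("ParquetFile", "Snappy"): 19.814,
}


def speedups(hive, impala):
    """Return Hive latency divided by Impala latency for every
    combination that was measured on both engines."""
    return {key: hive[key] / impala[key] for key in hive if key in impala}


if __name__ == "__main__":
    for (fmt, comp), ratio in speedups(hive_c2, impala_c2).items():
        print(f"{fmt:<12} {comp:<8} Impala is about {ratio:.1f}x faster than Hive")
```

ParquetFile has no Hive measurement in these slides, so it is left out of the ratio.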

Check fixed bug
• IMPALA-357
– Insert into Parquet exceeds mem-limit
• Problem
– Even with the mem_limit setting, memory consumption was not limited when creating a ParquetFile table with partitions.
– Eventually, impalad crashed due to memory shortage.
• Result
– The CREATE command failed due to the memory limit, so the limit is now enforced

Summary
• Thanks to the dev team, Impala is also going from "Good to Great"
• Both "VIEW" and "Parquet" are ready to use
• Performance
– RCFile+Snappy is the fastest on both Cluster1 and Cluster2
– With a larger table, Parquet+Snappy may be the fastest
• Hopes for future extensions
– Support for structure types
– Support for UDF/UDTF, etc.
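
The performance claim in the summary can be checked against the Impala latencies from the two clusters. This is a minimal sketch, assuming the latency values are paired with the format and compression labels in the order they appear on the Impala slides; the names used here are illustrative only.

```python
# Impala average job latency [sec] per cluster, copied from the slides.
# Assumption: values and (format, compression) labels share the same order.
impala = {
    "Cluster1": {
        ("TextFile", "No Comp."): 23.518,
        ("SequenceFile", "Gzip"): 32.155,
        ("SequenceFile", "Snappy"): 28.617,
        ("RCFile", "Gzip"): 20.774,
        ("RCFile", "Snappy"): 12.654,
        ("ParquetFile", "Snappy"): 13.146,
    },
    "Cluster2": {
        ("TextFile", "No Comp."): 32.528,
        ("SequenceFile", "Gzip"): 28.73,
        ("SequenceFile", "Snappy"): 21.173,
        ("RCFile", "Gzip"): 24.794,
        ("RCFile", "Snappy"): 14.308,
        ("ParquetFile", "Snappy"): 19.814,
    },
}


def fastest(latencies):
    """Return the (format, compression) pair with the lowest latency."""
    return min(latencies, key=latencies.get)


def relative_to_fastest(latencies, key):
    """Return how many times slower `key` is than the fastest combination."""
    return latencies[key] / latencies[fastest(latencies)]


if __name__ == "__main__":
    for cluster, latencies in impala.items():
        fmt, comp = fastest(latencies)
        ratio = relative_to_fastest(latencies, ("ParquetFile", "Snappy"))
        print(f"{cluster}: fastest is {fmt}+{comp}; "
              f"ParquetFile+Snappy takes {ratio:.2f}x as long")
```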

Appendix. Benchmark Details

Our System Environment (Cluster1)
• Installed using Cloudera Manager Free Edition 4.6.0
• All servers are connected with 1Gbps Ethernet through an L2 switch
• Master: 3 servers
– Active NameNode, Stand-by NameNode, JobTracker, statestored
• Slave: 14 servers
– 10 servers running DataNode, TaskTracker, and Impalad
– 4 servers running DataNode only

Our System Environment (Cluster2)
• Installed using Cloudera Manager Free Edition 4.6.0
• All servers are connected with 1Gbps Ethernet through an L2 switch
• Master: 3 servers
– Active NameNode, Stand-by NameNode, JobTracker, statestored
• Slave: 10 servers
– 10 servers running DataNode, TaskTracker, and Impalad
– The 4 DataNode-only servers are decommissioned

Our Server Specification
• CPU
– Intel Core 2 Duo 2.13 GHz with Hyper Threading
• Memory
– 8 GB: NameNodes only
– 4 GB: Others
• Disk
– 7,200 rpm SATA mechanical hard disk drive × 1
• OS
– CentOS 6.3

Benchmark
• Used CDH 4.3.0 + Impala 1.1
• Used hivebench from the open-source benchmark tool "HiBench"
– https://github.com/hibench
• Scaled the datasets down to 1/10
– The default configuration generates a table with 1 billion rows
• Modified the query
– Deleted "INSERT INTO TABLE …" to evaluate read-only performance
• Combined several storage formats with several compression methods
– TextFile, SequenceFile, RCFile, ParquetFile
– No compression, Gzip, Snappy
• Compared by job query latency
• Averaged the job latency over 5 measurements
• Ran the benchmark on both Cluster1 and Cluster2
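
Averaging the job latency over 5 measurements can be sketched as below. The slides do not say how latency was captured, so this assumes wall-clock time measured on the client; `average_latency` and `run_job` are hypothetical names, and the sleeping stand-in job exists only so the sketch runs without a cluster.

```python
import statistics
import time


def average_latency(run_job, runs=5):
    """Call `run_job` `runs` times and return the mean wall-clock
    latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_job()  # e.g. submit the query and wait for it to finish
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies)


if __name__ == "__main__":
    # Stand-in job (assumption): sleeps instead of running a real query.
    avg = average_latency(lambda: time.sleep(0.05))
    print(f"Avg. job latency: {avg:.3f} sec")
```

On a real cluster, `run_job` would be replaced by a callable that submits the benchmark query through the Hive or Impala client and blocks until the result returns.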

Modified Datasets
• Uservisits table
– 100 million rows
– 16,895 MB as TextFile
– Table definition
  • sourceIP string
  • destURL string
  • visitDate string
  • adRevenue double
  • userAgent string
  • countryCode string
  • languageCode string
  • searchWord string
  • duration int
• Rankings table
– 12 million rows
– 744 MB as TextFile
– Table definition
  • pageURL string
  • pageRank int
  • avgDuration int

Modified Query
-- Read-only query: the leading "INSERT INTO TABLE …" was removed (see the Benchmark slide).
-- [BROADCAST] is an Impala join hint that sends the right-hand side (NUV) to every node.
SELECT
  sourceIP,
  sum(adRevenue) AS totalRevenue,
  avg(pageRank)
FROM
  rankings_t R
JOIN [BROADCAST] (
  SELECT
    sourceIP,
    destURL,
    adRevenue
  FROM
    uservisits_t UV
  WHERE
    (datediff(UV.visitDate, '1999-01-01') >= 0
    AND
    datediff(UV.visitDate, '2000-01-01') <= 0)
  ) NUV
ON
  (R.pageURL = NUV.destURL)
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 1;

Thanks!
I want to use TPC in the next evaluation…

 

Kürzlich hochgeladen (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Evaluation of Cloudera Impala 1.1

  • 1. Evaluation of Cloudera Impala 1.1
      Aug 7, 2013
      CELLANT Corp. R&D Strategy Division
      Yukinori SUDA (@sudabon)
      Copyright © CELLANT Corp. All Rights Reserved. http://www.cellant.jp/
  • 2. Cloudera Impala 1.1 was released!
      - Sentry support:
          - Fine-grained authorization
          - Role-based authorization
      - Support for views
      - Performance improvements:
          - Parquet columnar performance
          - More efficient metadata refresh for larger installations
      - Additional SQL:
          - SQL-89 joins (in addition to existing SQL-92)
          - LOAD function
          - REFRESH command for JDBC/ODBC
      - Improved HBase support:
          - Binary types
          - Caching configuration
      - Many bug fixes
  • 3. Check compatibility of “VIEW”
      - Hive ⇒ Impala: can the Impala shell read data through a view that was created with a Hive command?
      - Impala ⇒ Hive: can the Hive shell read data through a view that was created with an Impala command?
      - Result: the views are compatible in both directions.
  • 4. Check performance (Hive on Cluster1)
      Avg. Job Latency [sec]:
      - TextFile, No Comp.: 222.039
      - SequenceFile, Gzip: 244.67
      - SequenceFile, Snappy: 239.182
      - RCFile, Gzip: 228.801
      - RCFile, Snappy: 230.327
      Note: this result is not valid as a performance evaluation because some data may be read remotely. See the slide “Check performance (Hive on Cluster2)”.
  • 5. Check performance (Impala on Cluster1)
      Avg. Job Latency [sec]:
      - TextFile, No Comp.: 23.518
      - SequenceFile, Gzip: 32.155
      - SequenceFile, Snappy: 28.617
      - RCFile, Gzip: 20.774
      - RCFile, Snappy: 12.654
      - ParquetFile, Snappy: 13.146
      Note: this result is not valid as a performance evaluation because some data may be read remotely. See the slide “Check performance (Impala on Cluster2)”.
  • 6. Check performance (Hive on Cluster2)
      Avg. Job Latency [sec]:
      - TextFile, No Comp.: 272.176
      - SequenceFile, Gzip: 249.531
      - SequenceFile, Snappy: 245.009
      - RCFile, Gzip: 230.034
      - RCFile, Snappy: 216.802
  • 7. Check performance (Impala on Cluster2)
      Avg. Job Latency [sec]:
      - TextFile, No Comp.: 32.528
      - SequenceFile, Gzip: 28.73
      - SequenceFile, Snappy: 21.173
      - RCFile, Gzip: 24.794
      - RCFile, Snappy: 14.308
      - ParquetFile, Snappy: 19.814
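
The Cluster2 numbers from the Hive and Impala slides can be turned into a per-format speedup with a few lines of Python. This is only a sketch: it assumes the bars map to (storage format, compression) pairs in the order the labels appear on the slides, and the names `hive`, `impala` and `speedups` are introduced here for illustration.

```python
# Avg. job latency [sec] on Cluster2, copied from the slides.
# Assumption: bars map to (format, compression) in the order the labels appear.
hive = {
    ("TextFile", "No Comp."): 272.176,
    ("SequenceFile", "Gzip"): 249.531,
    ("SequenceFile", "Snappy"): 245.009,
    ("RCFile", "Gzip"): 230.034,
    ("RCFile", "Snappy"): 216.802,
}
impala = {
    ("TextFile", "No Comp."): 32.528,
    ("SequenceFile", "Gzip"): 28.73,
    ("SequenceFile", "Snappy"): 21.173,
    ("RCFile", "Gzip"): 24.794,
    ("RCFile", "Snappy"): 14.308,
    ("ParquetFile", "Snappy"): 19.814,  # measured with Impala only
}


def speedups(hive, impala):
    """Return Hive latency divided by Impala latency for each shared combination."""
    return {key: hive[key] / impala[key] for key in hive if key in impala}


ratios = speedups(hive, impala)
fastest_impala = min(impala, key=impala.get)

for (fmt, comp), ratio in ratios.items():
    print(f"{fmt} + {comp}: Impala is {ratio:.1f}x faster than Hive")
print("Fastest combination on Impala:", " + ".join(fastest_impala))
```

Under that mapping, Impala comes out roughly 8x to 15x faster than Hive on this query, and RCFile + Snappy is the fastest combination on Impala.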
  • 8. Check fixed bug
      - IMPALA-357: Insert into Parquet exceeds mem-limit
      - Problem:
          - Even with the mem_limit setting in place, memory consumption is not limited when creating a partitioned ParquetFile table.
          - Eventually impalad crashes due to memory shortage.
      - Result: the CREATE command failed due to the memory limit.
  • 9. Summary
      - Thanks to the dev team, Impala is also going from “Good to Great”.
      - Both “VIEW” and “Parquet” are ready to use.
      - Performance:
          - RCFile+Snappy is the fastest on both Cluster1 and Cluster2.
          - With larger tables, Parquet+Snappy may be the fastest.
      - Hopes for future extensions:
          - Support for structure types
          - Support for UDF/UDTF, etc.
  • 10. Appendix. Benchmark Details
  • 11. Our System Environment (Cluster1)
      - Installed using Cloudera Manager Free Edition 4.6.0
      - Master (3 servers): Active NameNode, Stand-by NameNode, JobTracker, statestored
      - Slave (14 servers):
          - 10 servers running DataNode, TaskTracker and Impalad
          - 4 servers running DataNode only
      - All servers are connected with 1Gbps Ethernet through an L2 switch
  • 12. Our System Environment (Cluster2)
      - Installed using Cloudera Manager Free Edition 4.6.0
      - Master (3 servers): Active NameNode, Stand-by NameNode, JobTracker, statestored
      - Slave (10 servers): each running DataNode, TaskTracker and Impalad
      - The 4 DataNode-only servers are decommissioned
      - All servers are connected with 1Gbps Ethernet through an L2 switch
  • 13. Our Server Specification
      - CPU: Intel Core 2 Duo 2.13 GHz with Hyper Threading
      - Memory:
          - 8GB: NameNodes only
          - 4GB: others
      - Disk: 7,200 rpm SATA mechanical hard disk drive * 1
      - OS: CentOS 6.3
  • 14. Benchmark
      - Used CDH4.3.0 + Impala 1.1
      - Used hivebench from the open-source benchmark tool “HiBench”
          - https://github.com/hibench
      - Scaled the datasets down to 1/10
          - The default configuration generates a table with 1 billion rows
      - Modified the query
          - Deleted “INSERT INTO TABLE …” to evaluate read-only performance
      - Combined several storage formats with several compression methods
          - TextFile, SequenceFile, RCFile, ParquetFile
          - No compression, Gzip, Snappy
      - Compared by job (query) latency
      - Averaged the job latency over 5 measurements
      - Ran the benchmark on both Cluster1 and Cluster2
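
The "average job latency over 5 measurements" step can be sketched as a small timing helper. This is a minimal illustration, not the harness used for the slides: `average_latency` and `run_query` are hypothetical names, and `run_query` stands in for whatever call submits the query to Hive or Impala and waits for it to finish.

```python
import time
from statistics import mean


def average_latency(run_query, repeats=5):
    """Call `run_query` `repeats` times and return the mean wall-clock latency in seconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical callable that blocks until the query finishes
        latencies.append(time.perf_counter() - start)
    return mean(latencies)


# Placeholder workload so the sketch runs on its own; replace it with a real query call.
print(f"{average_latency(lambda: sum(range(100_000))):.6f} sec")
```

Timing the whole blocking call measures end-to-end latency, including job submission and result fetching, which is what a "job latency" comparison between engines needs.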
  • 15. Modified Datasets
      - Uservisits table
          - 100 million rows
          - 16,895 MB as TextFile
          - Table definition: sourceIP string, destURL string, visitDate string, adRevenue double, userAgent string, countryCode string, languageCode string, searchWord string, duration int
      - Rankings table
          - 12 million rows
          - 744 MB as TextFile
          - Table definition: pageURL string, pageRank int, avgDuration int
  • 16. Modified Query
      SELECT
        sourceIP,
        sum(adRevenue) as totalRevenue,
        avg(pageRank)
      FROM
        rankings_t R
      JOIN [BROADCAST] (
        SELECT
          sourceIP,
          destURL,
          adRevenue
        FROM
          uservisits_t UV
        WHERE
          (datediff(UV.visitDate, '1999-01-01') >= 0
          AND
          datediff(UV.visitDate, '2000-01-01') <= 0)
      ) NUV
      ON
        (R.pageURL = NUV.destURL)
      group by sourceIP
      order by totalRevenue DESC
      limit 1;
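
To make the logic of the modified query easier to follow, here is a plain-Python sketch of the same steps: filter visits to the date range, join them to rankings on the URL, aggregate per source IP, and keep the row with the highest total revenue. The toy rows and the function name `top_source_ip` are invented for illustration, and the sketch assumes each pageURL appears only once in the rankings table.

```python
from collections import defaultdict
from datetime import date

# Toy rows, invented for illustration only (not benchmark data).
rankings = [  # (pageURL, pageRank, avgDuration)
    ("url_a", 10, 5),
    ("url_b", 30, 7),
]
uservisits = [  # (sourceIP, destURL, visitDate, adRevenue)
    ("1.1.1.1", "url_a", "1999-03-01", 2.0),
    ("1.1.1.1", "url_b", "1999-06-15", 3.0),
    ("2.2.2.2", "url_a", "1999-07-04", 4.0),
    ("2.2.2.2", "url_b", "2001-01-01", 9.0),  # outside the date range
]


def top_source_ip(rankings, uservisits,
                  start=date(1999, 1, 1), end=date(2000, 1, 1)):
    """Return (sourceIP, totalRevenue, avgPageRank) for the top-revenue source IP."""
    # Assumption: pageURL is unique in rankings, so a dict lookup acts as the join.
    page_rank = {url: rank for url, rank, _ in rankings}
    totals = defaultdict(lambda: [0.0, 0, 0])  # revenue sum, rank sum, row count
    for ip, url, visit, revenue in uservisits:
        visit_date = date.fromisoformat(visit)
        # WHERE clause (both bounds inclusive) plus the inner join on the URL.
        if start <= visit_date <= end and url in page_rank:
            acc = totals[ip]
            acc[0] += revenue
            acc[1] += page_rank[url]
            acc[2] += 1
    if not totals:
        return None
    # ORDER BY totalRevenue DESC LIMIT 1
    ip, (revenue_sum, rank_sum, count) = max(totals.items(), key=lambda kv: kv[1][0])
    return ip, revenue_sum, rank_sum / count


print(top_source_ip(rankings, uservisits))
```

Building the lookup from the smaller rankings table and streaming the larger uservisits table past it loosely mirrors what the [BROADCAST] hint in the query asks for: the smaller side of the join is made available to every node that scans the larger side.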
  • 17. Thanks! I want to use TPC in the next evaluation…