SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Impala 2.0 
The Leading Analytic Database for Hadoop 
Justin Erickson | Director, Product Management
© 2014 Cloudera, Inc. All rights reserved. 2 
Notification 
• The information in this document is proprietary to Cloudera. No part of this document may be 
reproduced, copied or transmitted in any form for any purpose without the express prior written 
permission of Cloudera. 
• This document is a preliminary version and not subject to your license agreement or any other 
agreement with Cloudera. This document contains only intended strategies, developments and 
functionalities of Cloudera products and is not intended to be binding upon Cloudera to any 
particular course of business, product strategy and/or development. Please note that this 
document is subject to change and may be changed by Cloudera at any time without notice. 
• Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not 
warrant the accuracy or completeness of the information, text, graphics, links or other items 
contained within this material. This document is provided without a warranty of any kind, either 
express or implied, including but not limited to the implied warranties of merchantability, fitness 
for a particular purpose or non-infringement. 
• Cloudera shall have no liability for damages of any kind including without limitation direct, 
special, indirect or consequential damages that may result from the use of these materials. The 
limitation shall not apply in cases of gross negligence.
© 2014 Cloudera, Inc. All rights reserved. 3 
Agenda 
• Impala Overview 
• Milestones and 2.0 Features 
• SQL-on-Hadoop Performance Update 
• What’s Next
The Right SQL Engine for the Use Case 
SQL 
©2014 Cloudera, Inc. All rights reserved. © 2014 Cloudera, Inc. All rights reserved. 4 
BI and SQL 
Analytics 
Batch 
Processing 
Spark 
Developers
Analytic Database for Hadoop Requires 
© 2014 Cloudera, Inc. All rights reserved. 5 
Multi-User Interactive 
Performance 
Interaction at the speed of thought 
Compatibility Familiar BI tools/SQL interfaces 
Usability Accessible to broad range of applications 
Flexibility Use SQL along with other Hadoop frameworks 
across all data 
Native in Hadoop Unified resource management, metadata, security, 
and management across frameworks
© 2014 Cloudera, Inc. All rights reserved. 6 
Impala’s Benefits 
Multi-User Interactive 
Performance ✔ • 10x vs alternatives with latest benchmarks 
• Performance advantage increases with multi-user 
Compatibility ✔ • Provides both ANSI SQL and vendor-specific extensions 
• Compatibility with the leading BI partners 
Usability ✔ • Cost-based optimization allows for more users and tools to run a 
broader range of queries 
Flexibility ✔ • Supports the common native Hadoop file formats 
• Parquet provides best-of-breed columnar performance across 
Hadoop frameworks 
Native in Hadoop ✔ • Unified with Hadoop’s resource management, metadata, security, 
and management
Engines 
Resource Management 
© 2014 Cloudera, Inc. All rights reserved. 7 
Most Common Scenarios 
Single Platform for Data Processing and Analytics 
• Interactive BI/analytics on “big data” 
• Data discovery 
• Exploratory analytics 
• Queryable operational data store 
Storage 
Integration 
Metadata 
Batch 
Processing 
MAPREDUCE, 
HIVE & PIG 
… 
Interactive 
SQL 
IMPALA 
Interactive 
Search 
Solr 
HDFS HBase 
TEXT, RCFILE, PARQUET, AVRO… RECORDS 
Management | Support 
Interactive 
Analytics 
SAS, R, …
© 2014 Cloudera, Inc. All rights reserved. 8 
Most Common Use Cases 
Operational Dashboards 
Example: Healthcare Insurance Company 
Goal: 
• Visualizations of current hospital spending 
and comparison to peers and historical data 
• Integrate 1000s of client hospital purchasing 
systems 
Key benefits of Impala: 
• Simplification via unification 
• Saved license $ over traditional DBMS 
• Enabled finer-grain details in source data 
vs. planned summarized extracts 
• 3 nodes of Impala outperformed a rack of 
the traditional RDBMS on their workload 
Data Discovery 
Example: Major Financial Institution 
Goal: 
• Fraud group looking at internal / external fraud 
• Captured internal systems and external 
application/website logs 
Key benefits of Impala: 
• Flexibility to have more data readily 
available without upfront modeling 
• Ability to use existing BI visualization tools 
• Better TCO
Previous Key Milestones and Features 
© 2014 Cloudera, Inc. All rights reserved. 9 
• Impala 1.0 
• ~SQL-92 (minus correlated sub-queries) 
• Native Hadoop file formats (Parquet, Avro, text, Sequence, …) 
• Enterprise-readiness (authentication, ODBC/JDBC drivers, etc) 
• Service-level resource isolation with other Hadoop frameworks 
• Impala 1.1 
• Fine-grained, role-based authorization via Apache Sentry 
• Auditing (Impala 1.1.1 and CM 4.7+) 
• Impala 1.2 
• Custom language extensibility (UDFs, UDAFs) 
• Cost-based join-order optimization 
• On-par performance compared to traditional MPP query engines while maintaining native Hadoop data flexibility 
• Impala 1.3 / CDH 5.0 (also has version for CDH 4.x) 
• Resource management 
• Impala 1.4 / CDH 5.1 (also has version for CDH 4.x) 
• More SQL compatibility (DECIMAL, vendor-specific extensions, ORDER BY without LIMIT, etc) 
• HDFS caching 
• Faster performance (selective queries and compute stats in particular
© 2014 Cloudera, Inc. All rights reserved. 10 
Impala 2.0 Key Updates 
• Same great multi-user interactive performance 
• Removed limits on SQL compatibility 
• SQL:2003 analytic/window functions 
• Subqueries in WHERE clause, EXISTS, and IN 
• Additional data types (CHAR and VARCHAR) 
• GRANT/REVOKE functions via Sentry 
• Additional vendor-specific SQL extensions 
• Removed limits on query size 
• Disk-based query processing
September SQL-on-Hadoop Benchmark: 
Impala, Presto, Stinger, Spark SQL 
© 2014 Cloudera, Inc. All rights reserved. 11 
• Benchmarks on: 
• Impala (1.4.0) 
• Presto (0.74) 
• Stinger (final) phase 3 => aka Hive 0.13.0 
• Spark SQL (1.1) 
• As always, our public benchmarks are: 
• Based on industry standards (TPC) 
• Repeatable (https://github.com/cloudera/impala-tpcds-kit) 
• Methodical testing with multiple runs on same hardware 
• Help competing software put its best foot forward 
• SQL-92 join style for engines without CBO 
• JVM tuning for Presto 
• Run on optimal file formats for each 
• Full details on our blog: http://blog.cloudera.com/blog/2014/09/new-benchmarks-for-sql-on-hadoop- 
impala-1-4-widens-the-performance-gap/
© 2014 Cloudera, Inc. All rights reserved. 12 
Impala’s Multi-User Over 10x Faster: 
Gap widening compared to May’s update
© 2014 Cloudera, Inc. All rights reserved. 13 
Faster = More Work in Less Time: 
Impala enables over 8.7x throughput
© 2014 Cloudera, Inc. All rights reserved. 14 
Performance Takeaways 
• Impala’s advantage expands from 5x single-user to >10x with just 10 user 
• Performance gap is widening since May 
• Single user Presto went from 5x before to 7.5x now 
• Single user Hive/Tez went from 5x before to 9x now 
• Mid-term trends will further favor Impala’s design approach 
• More data sets move to memory (HDFS caching, in-memory joins, Intel joint roadmap) 
• CPU efficiency will increase in importance 
• Native code enables easy optimizations for CPU instruction sets (e.g. floating point operations, math 
operations, encrypt/decrypt) 
• The Intel joint roadmap helps support these opportunities
© 2014 Cloudera, Inc. All rights reserved. 15 
IBM Research Validation 
• New VLDB academic paper comparing Impala and Hive-based (both MR and Tez) for SQL-on-Hadoop 
• http://www.vldb.org/pvldb/vol7/p1295-floratou.pdf 
• Impala’s significantly more efficient than Hive/Tez or Hive/MR 
• “Impala’s database-like architecture provides significant performance gains, compared to Hive’s MapReduce or Tez based 
runtime” 
• Correctly attributes Impala’s lead to it’s CPU efficiency, IO manager, and overall architecture that resembles a shared-nothing 
parallel database 
• Parquet more efficient than ORC 
• “The Parquet format skips data more efficiently than ORC which tends to pre-fetch unnecessary data especially when a table 
contains a large number of columns” 
• Note: Paper is single-user only. Multi-user would make the gap even wider 
• Our published results show ~5x single-user Impala lead goes to ~10x with just 10 users in our blog: 
http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/ 
• Same CPU efficiency, IO manager, and overall architectural reasons 
• Additional Notes: 
• Impala 2.0 will have disk-based joins and aggregations 
• Impala 1.4 is significantly faster on selective joins than Impala 1.2.2 used in the paper
Impala’s Analytic Database Leadership 
1 > 1 MM downloads since GA 
2 Majority adoption across Cloudera EDH customers 
3 Certification across key application partners: 
© 2014 Cloudera, Inc. All rights reserved. 16 
4 De facto standard with multi-vendor support: 
5 Full Apache Open Source License 
and others
© 2014 Cloudera, Inc. All rights reserved. 17 
What’s Next? 
• Usability 
• Nested data structures for greater data flexibility and expressiveness than 
traditional RDBMS systems 
• Ability to run on data natively stored in Amazon S3 
• Advanced security with lineage tracking and query redaction from logs 
• Built-in abilities for data maintenance and updates 
• Compatibility 
• Continued additions of commonly used vendor-specific built-ins 
• Continued joint-development with BI partners 
• More advanced SQL:2010 set features 
• Multi-User Performance 
• Focus on even better multi-user concurrency 
• Continued performance increases and leadership
Engines 
Resource Management 
© 2014 Cloudera, Inc. All rights reserved. 18 
It’s Not Just About SQL-on-Hadoop 
The Platform for Big Data 
• Single platform for processing & 
analytics 
• Scales to ‘000s of servers 
• No upfront schema 
• 10% the cost per TB 
• Open source platform 
Storage 
Integration 
Metadata 
Batch 
Processing 
MAPREDUCE, 
HIVE & PIG 
… 
Interactive 
SQL 
IMPALA 
Interactive 
Search 
Solr 
HDFS HBase 
TEXT, RCFILE, PARQUET, AVRO… RECORDS 
Management | Support 
Interactive 
Analytics 
SAS, R, …
© 2014 Cloudera, Inc. All rights reserved. 19 
Try Impala Out! 
• 100% Apache-licensed open source 
• Downloads on http://impala.io/: 
• Live online 
• VM 
• Installation 
• Questions/comments? 
• Community: http://impala.io/community 
• Email: impala-user@cloudera.org 
19
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Data Con LA
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)Todd Lipcon
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Cloudera, Inc.
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera, Inc.
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARNWangda Tan
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Impala presentation
Impala presentationImpala presentation
Impala presentationtrihug
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/KuduChris George
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 

Was ist angesagt? (20)

Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Node labels in YARN
Node labels in YARNNode labels in YARN
Node labels in YARN
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Impala presentation
Impala presentationImpala presentation
Impala presentation
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 

Andere mochten auch

Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep divehuguk
 
How Impala Works
How Impala WorksHow Impala Works
How Impala WorksYue Chen
 
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLliuknag
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in ImpalaCloudera, Inc.
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 
Impala 퍼포먼스 개선을 위한 Research
Impala 퍼포먼스 개선을 위한 ResearchImpala 퍼포먼스 개선을 위한 Research
Impala 퍼포먼스 개선을 위한 ResearchKichul Jung
 
Parquet and impala overview external
Parquet and impala overview externalParquet and impala overview external
Parquet and impala overview externalmattlieber
 
Impala SQL Support
Impala SQL SupportImpala SQL Support
Impala SQL SupportYue Chen
 
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...1Strategy
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine kiran palaka
 
Protecting Your IP with Perforce Helix and Interset
Protecting Your IP with Perforce Helix and IntersetProtecting Your IP with Perforce Helix and Interset
Protecting Your IP with Perforce Helix and IntersetPerforce
 
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014StampedeCon
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentationCyanny LIANG
 
The moroccan ethnic groups of Morocco
The moroccan ethnic groups of MoroccoThe moroccan ethnic groups of Morocco
The moroccan ethnic groups of MoroccoMohsine Mahraj
 
Elephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forksElephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forksCommand Prompt., Inc
 

Andere mochten auch (20)

The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep dive
 
How Impala Works
How Impala WorksHow Impala Works
How Impala Works
 
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQL
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in Impala
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
Social Media Dataset
Social Media DatasetSocial Media Dataset
Social Media Dataset
 
Impala 퍼포먼스 개선을 위한 Research
Impala 퍼포먼스 개선을 위한 ResearchImpala 퍼포먼스 개선을 위한 Research
Impala 퍼포먼스 개선을 위한 Research
 
Parquet and impala overview external
Parquet and impala overview externalParquet and impala overview external
Parquet and impala overview external
 
Impala SQL Support
Impala SQL SupportImpala SQL Support
Impala SQL Support
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
2016 Utah Cloud Summit: Big Data Architectural Patterns and Best Practices on...
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
Protecting Your IP with Perforce Helix and Interset
Protecting Your IP with Perforce Helix and IntersetProtecting Your IP with Perforce Helix and Interset
Protecting Your IP with Perforce Helix and Interset
 
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
The moroccan ethnic groups of Morocco
The moroccan ethnic groups of MoroccoThe moroccan ethnic groups of Morocco
The moroccan ethnic groups of Morocco
 
Elephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forksElephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forks
 

Ähnlich wie Impala 2.0 - The Best Analytic Database for Hadoop

Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...Cloudera, Inc.
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Stefan Lipp
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeNicolas Morales
 
Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edgeRam Kedem
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全Jianwei Li
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Cloudera, Inc.
 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsWes McKinney
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18Cloudera, Inc.
 
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark Summit
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014Wilfried Hoge
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Cloudera, Inc.
 
Tame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationTame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationMichael Rainey
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahTurning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahData Con LA
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 

Ähnlich wie Impala 2.0 - The Best Analytic Database for Hadoop (20)

Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
Introducing Cloudera Navigator Optimizer: Offload Assessments and Active Data...
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edge
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Cloudera 5.3 Update
Cloudera 5.3 UpdateCloudera 5.3 Update
Cloudera 5.3 Update
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
 
Data Science Languages and Industry Analytics
Data Science Languages and Industry AnalyticsData Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18
 
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan Saldich
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Tame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationTame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data Integration
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahTurning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...kalichargn70th171
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 

Kürzlich hochgeladen (20)

Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 

Impala 2.0 - The Best Analytic Database for Hadoop

  • 1. Impala 2.0 The Leading Analytic Database for Hadoop Justin Erickson | Director, Product Management
  • 2. © 2014 Cloudera, Inc. All rights reserved. 2 Notification • The information in this document is proprietary to Cloudera. No part of this document may be reproduced, copied or transmitted in any form for any purpose without the express prior written permission of Cloudera. • This document is a preliminary version and not subject to your license agreement or any other agreement with Cloudera. This document contains only intended strategies, developments and functionalities of Cloudera products and is not intended to be binding upon Cloudera to any particular course of business, product strategy and/or development. Please note that this document is subject to change and may be changed by Cloudera at any time without notice. • Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement. • Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials. The limitation shall not apply in cases of gross negligence.
  • 3. © 2014 Cloudera, Inc. All rights reserved. 3 Agenda • Impala Overview • Milestones and 2.0 Features • SQL-on-Hadoop Performance Update • What’s Next
  • 4. The Right SQL Engine for the Use Case SQL ©2014 Cloudera, Inc. All rights reserved. © 2014 Cloudera, Inc. All rights reserved. 4 BI and SQL Analytics Batch Processing Spark Developers
  • 5. Analytic Database for Hadoop Requires © 2014 Cloudera, Inc. All rights reserved. 5 Multi-User Interactive Performance Interaction at the speed of thought Compatibility Familiar BI tools/SQL interfaces Usability Accessible to broad range of applications Flexibility Use SQL along with other Hadoop frameworks across all data Native in Hadoop Unified resource management, metadata, security, and management across frameworks
  • 6. © 2014 Cloudera, Inc. All rights reserved. 6 Impala’s Benefits Multi-User Interactive Performance ✔ • 10x vs alternatives with latest benchmarks • Performance advantage increases with multi-user Compatibility ✔ • Provides both ANSI SQL and vendor-specific extensions • Compatibility with the leading BI partners Usability ✔ • Cost-based optimization allows for more users and tools to run a broader range of queries Flexibility ✔ • Supports the common native Hadoop file formats • Parquet provides best-of-breed columnar performance across Hadoop frameworks Native in Hadoop ✔ • Unified with Hadoop’s resource management, metadata, security, and management
  • 7. Engines Resource Management © 2014 Cloudera, Inc. All rights reserved. 7 Most Common Scenarios Single Platform for Data Processing and Analytics • Interactive BI/analytics on “big data” • Data discovery • Exploratory analytics • Queryable operational data store Storage Integration Metadata Batch Processing MAPREDUCE, HIVE & PIG … Interactive SQL IMPALA Interactive Search Solr HDFS HBase TEXT, RCFILE, PARQUET, AVRO… RECORDS Management | Support Interactive Analytics SAS, R, …
  • 8. © 2014 Cloudera, Inc. All rights reserved. 8 Most Common Use Cases Operational Dashboards Example: Healthcare Insurance Company Goal: • Visualizations of current hospital spending and comparison to peers and historical data • Integrate 1000s of client hospital purchasing systems Key benefits of Impala: • Simplification via unification • Saved license $ over traditional DBMS • Enabled finer-grain details in source data vs. planned summarized extracts • 3 nodes of Impala outperformed a rack of the traditional RDBMS on their workload Data Discovery Example: Major Financial Institution Goal: • Fraud group looking at internal / external fraud • Captured internal systems and external application/website logs Key benefits of Impala: • Flexibility to have more data readily available without upfront modeling • Ability to use existing BI visualization tools • Better TCO
  • 9. Previous Key Milestones and Features © 2014 Cloudera, Inc. All rights reserved. 9 • Impala 1.0 • ~SQL-92 (minus correlated sub-queries) • Native Hadoop file formats (Parquet, Avro, text, Sequence, …) • Enterprise-readiness (authentication, ODBC/JDBC drivers, etc) • Service-level resource isolation with other Hadoop frameworks • Impala 1.1 • Fine-grained, role-based authorization via Apache Sentry • Auditing (Impala 1.1.1 and CM 4.7+) • Impala 1.2 • Custom language extensibility (UDFs, UDAFs) • Cost-based join-order optimization • On-par performance compared to traditional MPP query engines while maintaining native Hadoop data flexibility • Impala 1.3 / CDH 5.0 (also has version for CDH 4.x) • Resource management • Impala 1.4 / CDH 5.1 (also has version for CDH 4.x) • More SQL compatibility (DECIMAL, vendor-specific extensions, ORDER BY without LIMIT, etc) • HDFS caching • Faster performance (selective queries and compute stats in particular
  • 10. © 2014 Cloudera, Inc. All rights reserved. 10 Impala 2.0 Key Updates • Same great multi-user interactive performance • Removed limits on SQL compatibility • SQL:2003 analytic/window functions • Subqueries in WHERE clause, EXISTS, and IN • Additional data types (CHAR and VARCHAR) • GRANT/REVOKE functions via Sentry • Additional vendor-specific SQL extensions • Removed limits on query size • Disk-based query processing
  • 11. September SQL-on-Hadoop Benchmark: Impala, Presto, Stinger, Spark SQL © 2014 Cloudera, Inc. All rights reserved. 11 • Benchmarks on: • Impala (1.4.0) • Presto (0.74) • Stinger (final) phase 3 => aka Hive 0.13.0 • Spark SQL (1.1) • As always, our public benchmarks are: • Based on industry standards (TPC) • Repeatable (https://github.com/cloudera/impala-tpcds-kit) • Methodical testing with multiple runs on same hardware • Help competing software put its best foot forward • SQL-92 join style for engines without CBO • JVM tuning for Presto • Run on optimal file formats for each • Full details on our blog: http://blog.cloudera.com/blog/2014/09/new-benchmarks-for-sql-on-hadoop- impala-1-4-widens-the-performance-gap/
  • 12. © 2014 Cloudera, Inc. All rights reserved. 12 Impala’s Multi-User Over 10x Faster: Gap widening compared to May’s update
  • 13. © 2014 Cloudera, Inc. All rights reserved. 13 Faster = More Work in Less Time: Impala enables over 8.7x throughput
  • 14. © 2014 Cloudera, Inc. All rights reserved. 14 Performance Takeaways • Impala’s advantage expands from 5x single-user to >10x with just 10 user • Performance gap is widening since May • Single user Presto went from 5x before to 7.5x now • Single user Hive/Tez went from 5x before to 9x now • Mid-term trends will further favor Impala’s design approach • More data sets move to memory (HDFS caching, in-memory joins, Intel joint roadmap) • CPU efficiency will increase in importance • Native code enables easy optimizations for CPU instruction sets (e.g. floating point operations, math operations, encrypt/decrypt) • The Intel joint roadmap helps support these opportunities
  • 15. © 2014 Cloudera, Inc. All rights reserved. 15 IBM Research Validation • New VLDB academic paper comparing Impala and Hive-based (both MR and Tez) for SQL-on-Hadoop • http://www.vldb.org/pvldb/vol7/p1295-floratou.pdf • Impala’s significantly more efficient than Hive/Tez or Hive/MR • “Impala’s database-like architecture provides significant performance gains, compared to Hive’s MapReduce or Tez based runtime” • Correctly attributes Impala’s lead to it’s CPU efficiency, IO manager, and overall architecture that resembles a shared-nothing parallel database • Parquet more efficient than ORC • “The Parquet format skips data more efficiently than ORC which tends to pre-fetch unnecessary data especially when a table contains a large number of columns” • Note: Paper is single-user only. Multi-user would make the gap even wider • Our published results show ~5x single-user Impala lead goes to ~10x with just 10 users in our blog: http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/ • Same CPU efficiency, IO manager, and overall architectural reasons • Additional Notes: • Impala 2.0 will have disk-based joins and aggregations • Impala 1.4 is significantly faster on selective joins than Impala 1.2.2 used in the paper
  • 16. Impala’s Analytic Database Leadership 1 > 1 MM downloads since GA 2 Majority adoption across Cloudera EDH customers 3 Certification across key application partners: © 2014 Cloudera, Inc. All rights reserved. 16 4 De facto standard with multi-vendor support: 5 Full Apache Open Source License and others
  • 17. © 2014 Cloudera, Inc. All rights reserved. 17 What’s Next? • Usability • Nested data structures for greater data flexibility and expressiveness than traditional RDBMS systems • Ability to run on data natively stored in Amazon S3 • Advanced security with lineage tracking and query redaction from logs • Built-in abilities for data maintenance and updates • Compatibility • Continued additions of commonly used vendor-specific built-ins • Continued joint-development with BI partners • More advanced SQL:2010 set features • Multi-User Performance • Focus on even better multi-user concurrency • Continued performance increases and leadership
  • 18. Engines Resource Management © 2014 Cloudera, Inc. All rights reserved. 18 It’s Not Just About SQL-on-Hadoop The Platform for Big Data • Single platform for processing & analytics • Scales to ‘000s of servers • No upfront schema • 10% the cost per TB • Open source platform Storage Integration Metadata Batch Processing MAPREDUCE, HIVE & PIG … Interactive SQL IMPALA Interactive Search Solr HDFS HBase TEXT, RCFILE, PARQUET, AVRO… RECORDS Management | Support Interactive Analytics SAS, R, …
  • 19. © 2014 Cloudera, Inc. All rights reserved. 19 Try Impala Out! • 100% Apache-licensed open source • Downloads on http://impala.io/: • Live online • VM • Installation • Questions/comments? • Community: http://impala.io/community • Email: impala-user@cloudera.org 19

Hinweis der Redaktion

  1. Our goal is to provide the best tools for a particular job * Hive is the best for batch, and of course we want to make that experience better. * Impala is purpose built for interactive BI on Hadoop. Latency, concurrency, vendor ecosystem, and partner certification. * Spark SQL is built for supporting an advanced analyst’s direct interactions with data, where you’re mixing Spark and SQL
  2. Multi-user performance – enables BI users and analysts to interact with Hadoop data at the speed of thought Compatibility - provides familiar BI tools/applications and SQL interfaces Usability - Accessible to the broad range of business users, analysts, and partner applications Flexibility – Enables users access to more data and the ability to use SQL along with the rest of the Hadoop frameworks across all their data Native in Hadoop - Easier and integrated administration with unified resource management, metadata, security, and management across frameworks
  3. Multi-user interactive performance 10x vs alternatives with latest benchmarks Broad SQL compatibility Provides both ANSI SQL and vendor-specific extensions Compatibility with the leading BI partners Usability Cost-based optimization allows for more users and tools to run a broader range of queries Flexibility Supports the common native Hadoop file formats Parquet provides best-of-breed columnar performance across Hadoop frameworks Native in Hadoop Unified with Hadoop’s resource management, metadata, security, and management