Cloudera Showcase: SQL-on-Hadoop
© Cloudera, Inc. All rights reserved.
§ The information in this document is proprietary to Cloudera. No part of this document may be reproduced, copied or transmitted in any form for
any purpose without the express prior written permission of Cloudera.
§ This document is a preliminary version and not subject to your license agreement or any other agreement with Cloudera. This document contains
only intended strategies, developments and functionalities of Cloudera products and is not intended to be binding upon Cloudera to any particular
course of business, product strategy and/or development. Please note that this document is subject to change and may be changed by Cloudera at
any time without notice.
§ Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the
information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either
express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement.
§ Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may
result from the use of these materials. The limitation shall not apply in cases of gross negligence.
One Platform, Many Workloads
Batch, Interactive, and Real-Time.
Leading performance and usability in one platform.
• End-to-end analytic workflows
• Access more data
• Work with data in new ways
• Enable new users
Platform components (from the architecture diagram):
• Process: Ingest (Sqoop, Flume, Kafka); Transform (MapReduce, Hive, Pig, Spark)
• Discover: Analytic Database (Impala); Search (Solr)
• Model: Machine Learning (SAS, R, Spark, Mahout)
• Serve: NoSQL Database (HBase); Streaming (Spark Streaming)
• Unlimited Storage: HDFS, HBase
• Security and Administration: YARN, Cloudera Manager, Cloudera Navigator
Performance Benchmark Takeaways
• Impala unlocks BI usage directly on Hadoop
  • Meets BI low-latency and multi-user requirements
  • Performance advantage grows from 5x with a single user to >10x with just 10 users
• Hive is designed (and still great) for batch processing
  • Most Impala customers use Hive for data preparation
  • Hive is the most commonly used ETL framework
• Spark SQL makes Spark application development easier
  • Enables jobs that mix procedural Spark code (Java/Scala) with SQL
• Mid-term trends will further favor Impala's design approach for latency and concurrency
  • More data sets are moving to memory (HDFS caching, in-memory joins, Intel joint roadmap)
  • CPU efficiency will increase in importance
  • Native code enables easy optimizations for CPU instruction sets
  • The Intel joint roadmap supports these opportunities
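The Spark SQL point above, mixing procedural Spark code with SQL in one job, can be illustrated with a minimal sketch. It assumes the Spark 1.5-era API (HiveContext, registerTempTable) and a launch through spark-submit with Hive support available. The object name, table name, column names, and sample rows are hypothetical and do not come from the slides.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MixedSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 1.5-era API; master and deploy settings come from spark-submit.
    val conf = new SparkConf().setAppName("MixedSparkSqlSketch")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Procedural step: build and clean a data set with ordinary Scala/RDD code.
    // The rows below are made-up sample values.
    val raw = sc.parallelize(Seq(("alice", 120), ("bob", 950), ("alice", 80), ("carol", -1)))
    val cleaned = raw.filter { case (_, latencyMs) => latencyMs >= 0 }

    // Hand the cleaned data to SQL by registering it as a temporary table.
    cleaned.toDF("user_name", "latency_ms").registerTempTable("query_log")

    // Declarative step: aggregate with SQL.
    val perUser = hiveContext.sql(
      "SELECT user_name, AVG(latency_ms) AS avg_latency_ms FROM query_log GROUP BY user_name")

    // Back to procedural code: filter the SQL result through the DataFrame API and print it.
    perUser.filter($"avg_latency_ms" > 100).collect().foreach(println)

    sc.stop()
  }
}
```

The same job moves from RDD code to SQL and back without leaving the application. That is the kind of mixed development the bullet describes, as opposed to issuing SQL from a separate client.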
Major new SQL features in Cloudera 5.5
• Impala
  • Reliability (particularly with concurrency and scale)
  • Nested types
  • Column-level security
  • Additional functions
• Hive
  • Quality
  • S3 support
  • CM monitoring
  • Navigator lineage
• Spark SQL (with DataFrames)
  • Now supported in CDH 5.5 (HiveContext recommended)
  • Thrift server and JDBC are not yet ready for support
• Navigator Optimizer (beta)
  • Helps assess and offload workloads onto Hadoop