A look at why SQL access in Hadoop is critical and what a native Hadoop analytic database offers; what’s new in Impala 2.0, including recent performance benchmarks; common Impala use cases and production customer stories; and what’s next for Impala.
Our goal is to provide the best tool for each job:
* Hive is best for batch processing, and we continue to improve that experience.
* Impala is purpose-built for interactive BI on Hadoop, with a focus on latency, concurrency, vendor ecosystem, and partner certification.
* Spark SQL is built for advanced analysts who work directly with data and mix Spark and SQL.
Multi-user performance - enables BI users and analysts to interact with Hadoop data at the speed of thought
Compatibility - works with familiar BI tools, applications, and SQL interfaces
Usability - accessible to a broad range of business users, analysts, and partner applications
Flexibility - gives users access to more data and lets them use SQL alongside the rest of the Hadoop frameworks across all their data
Native in Hadoop - easier, integrated administration with unified resource management, metadata, security, and management across frameworks
Multi-user interactive performance
* 10x vs. alternatives in the latest benchmarks
Broad SQL compatibility
* Provides both ANSI SQL and vendor-specific extensions
* Compatible with the leading BI partners
Usability
* Cost-based optimization lets more users and tools run a broader range of queries
Flexibility
* Supports the common native Hadoop file formats
* Parquet provides best-of-breed columnar performance across Hadoop frameworks
Native in Hadoop
* Unified with Hadoop’s resource management, metadata, security, and management