Apache Kudu is an open source data storage engine that makes fast analytics on fast and changing data easy. In this presentation, Grant Henke from Cloudera will provide an overview of what Kudu is, how it works, and how it makes building an active data warehouse for real-time analytics easy. Drawing on experience from some of our largest deployments, this talk will also include an overview of common Kudu use cases and patterns. Additionally, it will cover some of the newest Kudu features and what is coming next.
15. Open source & open data standards are especially important
when storing your data.
Apache Kudu is a top-level Apache Software Foundation project released under the
Apache License 2.0, and it values community participation.
We believe that Kudu's long-term success depends on building a vibrant community
of developers and users from diverse organizations and backgrounds.
An Open Source Data Storage Engine
That Makes
Fast Analytics on Fast And Changing Data Easy
16. Allows users to focus on the use case and not the storage details.
Manages the storage of your data, including schema, layout, encoding,
compression, and compaction, to allow for efficient disk usage and to minimize IO.
Separates storage management from computation. Though Kudu utilizes
pushdown projections, predicates/filters, and more to optimize data access, it
leverages tools like Impala, Hive, and Spark for complex computation.
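To make the pushdown idea above concrete, here is a minimal, self-contained sketch in Python. It is a toy in-memory column store, not the Kudu API: the class `ToyColumnTable`, its `scan` method, and the `cells_read` counter are all hypothetical names invented for illustration. It shows why evaluating predicates and projections inside the storage layer reduces the data that has to be read and handed to the compute engine.

```python
import operator
from typing import Any, Dict, List, Tuple

# Comparison operators a predicate may use (illustrative subset).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


class ToyColumnTable:
    """Toy columnar table. Hypothetical sketch, NOT the Kudu API."""

    def __init__(self, columns: Dict[str, List[Any]]):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.num_rows = lengths.pop() if lengths else 0
        self.cells_read = 0  # crude stand-in for IO cost

    def _read(self, column: str, row: int) -> Any:
        # Every cell access is counted so the saving is visible.
        self.cells_read += 1
        return self.columns[column][row]

    def scan(
        self,
        projection: List[str],
        predicates: List[Tuple[str, str, Any]],
    ) -> List[tuple]:
        # Predicate pushdown: filter inside the storage layer, reading only
        # the predicate columns, and only for rows that are still candidates.
        matching = list(range(self.num_rows))
        for column, op, value in predicates:
            matching = [
                row for row in matching if OPS[op](self._read(column, row), value)
            ]
        # Projection pushdown: materialize only the requested columns.
        return [tuple(self._read(c, row) for c in projection) for row in matching]


table = ToyColumnTable(
    {
        "host": ["a", "b", "a", "c"],
        "metric": ["cpu", "cpu", "mem", "cpu"],
        "value": [90, 40, 70, 60],
        "ts": [1, 2, 3, 4],
    }
)
rows = table.scan(["host", "value"], [("metric", "==", "cpu"), ("value", ">", 50)])
print(rows)              # [('a', 90), ('c', 60)]
print(table.cells_read)  # 11, versus 16 cells in the whole table
```

In this toy the `ts` column is never touched and the second predicate only inspects rows that survived the first. A real engine works on encoded, compressed column blocks rather than Python lists, so the sketch illustrates the principle only.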
17. Provides a combination of fast ingest and efficient columnar scans to enable
multiple real-time analytic workloads across a single storage layer.
Designed to strike a balance between full scan performance and low-latency random
access allowing it to address a wide array of analytical use cases.
Scales up and out to use all of the resources given to it, both across the cluster
and on each node.
Designed for next-generation hardware.
19. Data is immediately available to be analyzed as soon as it lands in Kudu.
Supports updates and deletes in order to address a wide variety of use cases without
exotic workarounds.
Supports sustained high throughput ingest to capture all of your data,
streaming or batch.
20. Kudu was built to be simple to deploy, monitor, operate and use.
Uses familiar concepts such as tables, partitions, and insert/update/delete operations to
minimize the expertise required to use it effectively.
A simple data model and mutability make it a breeze to port legacy analytical
applications or build new ones.
Integrates with the big data ecosystem, and integrating it with other data processing
frameworks is simple.
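The familiar concepts named on this slide (tables, partitions, and insert/update/delete operations on rows) can be sketched in a few lines of Python. This is a toy model, not the Kudu client API: `ToyPartitionedTable` and its methods are hypothetical, and `zlib.crc32` is only a stand-in hash chosen because it is deterministic across runs. The error behavior (rejecting a duplicate insert, rejecting an update or delete of a missing key) is an assumption made for the sketch.

```python
import zlib
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


class ToyPartitionedTable:
    """Toy primary-keyed table split into hash partitions (NOT the Kudu API)."""

    def __init__(self, num_partitions: int):
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        # Each partition maps primary key -> row.
        self.partitions: List[Dict[str, Row]] = [{} for _ in range(num_partitions)]

    def _partition_for(self, key: str) -> Dict[str, Row]:
        # Hash partitioning: the key alone decides which partition owns the row.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def insert(self, key: str, row: Row) -> None:
        part = self._partition_for(key)
        if key in part:
            raise KeyError(f"key already present: {key}")
        part[key] = dict(row)

    def update(self, key: str, changes: Row) -> None:
        part = self._partition_for(key)
        if key not in part:
            raise KeyError(f"key not found: {key}")
        part[key].update(changes)  # mutate in place, no rewrite of other rows

    def delete(self, key: str) -> None:
        part = self._partition_for(key)
        if key not in part:
            raise KeyError(f"key not found: {key}")
        del part[key]

    def get(self, key: str) -> Optional[Row]:
        row = self._partition_for(key).get(key)
        return dict(row) if row is not None else None

    def scan(self) -> List[Tuple[str, Row]]:
        # Full scan across all partitions, ordered by primary key.
        rows = [(k, dict(v)) for part in self.partitions for k, v in part.items()]
        return sorted(rows, key=lambda kv: kv[0])


table = ToyPartitionedTable(num_partitions=4)
table.insert("sensor-1", {"temp": 20})
table.insert("sensor-2", {"temp": 31})
table.update("sensor-1", {"temp": 25})  # change is visible to the next read
table.delete("sensor-2")
print(table.scan())  # [('sensor-1', {'temp': 25})]
```

Because every row is addressed by its primary key, an update or delete touches a single partition, and the change is visible to the very next read. That is the property that lets applications avoid append-and-rewrite workarounds.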
37. Active Data Warehouse in Cloudera Ecosystem
• On CDH 6.3 with Sentry
• On CDP Data Center 7.0
• On CDP Public Cloud
• Available in the Cloudera Data Hub
• In the future, Kudu will be available in Cloudera Data Warehouse too
How can you deploy an Active Data Warehouse today?