SlideShare ist ein Scribd-Unternehmen logo
1 von 22
1© Cloudera, Inc. All rights reserved.
Introducing Kudu
Jeremy Beard | Senior Solutions Architect
December 2015
2© Cloudera, Inc. All rights reserved.
Presenter
• Jeremy Beard
• Senior Solutions Architect at Cloudera
• Three years in big data
• Six years in data warehousing
• jeremy@cloudera.com
3© Cloudera, Inc. All rights reserved.
Current storage landscape in Hadoop
HDFS excels at:
• Efficiently scanning large amounts
of data
• Accumulating data with high
throughput
HBase excels at:
• Efficiently finding and writing
individual rows
• Making data mutable
Gaps exist when these properties
are needed simultaneously
4© Cloudera, Inc. All rights reserved.
Changing hardware landscape
• Spinning disk -> solid state storage
• NAND flash: Up to 450k read 250k write iops, about 2GB/sec read and
1.5GB/sec write throughput, at a price of less than $3/GB and dropping
• 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
• 64->128->256GB over last few years
• Takeaway 1: The next bottleneck is CPU, and current storage systems weren’t
designed with CPU efficiency in mind
• Takeaway 2: Column stores are feasible for random access
5© Cloudera, Inc. All rights reserved.
Kudu
Storage for Fast Analytics on Fast Data
• New updating column store for Hadoop
• Simplifies the architecture for building analytic
applications on changing data
• Designed for fast analytic performance
• Natively integrated with Hadoop
• Donated as incubating project at
Apache Software Foundation
• Beta now available
STRUCTURED
Sqoop
UNSTRUCTURED
Kafka, Flume
PROCESS, ANALYZE, SERVE
UNIFIED SERVICES
RESOURCE MANAGEMENT
YARN
SECURITY
Sentry, RecordService
FILESYSTEM
HDFS
RELATIONAL
Kudu
NoSQL
HBase
STORE
INTEGRATE
BATCH
Spark, Hive, Pig
MapReduce
STREAM
Spark
SQL
Impala
SEARCH
Solr
SDK
Kite
6© Cloudera, Inc. All rights reserved.
• High throughput for big scans (columnar
storage and replication)
Goal: Within 2x of Parquet
• Low-latency for short accesses (primary key
indexes and quorum design)
Goal: 1ms read/write on SSD
• Database-like semantics (initially single-row
ACID)
• Relational data model
• SQL query
• “NoSQL” style scan/insert/update (Java client)
Kudu design goals
7© Cloudera, Inc. All rights reserved.
Kudu basic design
• Apache-licensed open source software
• Structured data model
• Basic construct: tables
• Tables broken down into tablets (roughly equivalent to partitions)
• Architecture supports geographically disparate, active/active systems
• Not the initial design goal
8© Cloudera, Inc. All rights reserved.
What Kudu is not
• Not a SQL interface
• Just the storage layer
• “BYO SQL”
• Not a file system
• Data must have tabular structure
• Not an application that runs on HDFS
• An alternative, native Hadoop storage engine
• Not a replacement for HDFS or HBase
• Select the right storage for the right use case
• Cloudera will continue to support and invest in all three
9© Cloudera, Inc. All rights reserved.
Kudu data model
• Tables have a RDBMS-like schema
• Finite number of columns (unlike HBase/Cassandra)
• Types: BOOL, INT8/16/32/64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
• Some subset of columns makes up a primary key
• Fast random reads/writes by primary key
• No secondary indexes (yet)
• Columnar layout on disk
• Lazy materialization
• Encoding and compression options
9
10© Cloudera, Inc. All rights reserved.
Table partitioning
• Hash bucketing
• Distribute records by hash of partition column(s)
• N buckets leads to N tablets
• Range partitioning
• Distribute records by ranges of the partition column(s)
• N split keys leads to N tablets
• Can be a mix for different columns of the primary key
11© Cloudera, Inc. All rights reserved.
Consistency model
• Consistency and replication enforced by Raft consensus (similar to Paxos)
• Replication by operation not data
• Single-row transactions now
• Multi-row transactions later
• Geo-distributed replicas will be possible under strict time synchronization
• Techniques drawn from Google Spanner and others
12© Cloudera, Inc. All rights reserved.
Kudu interfaces
• NoSQL-style APIs
• Insert(), Update(), Delete(), Scan()
• Java and C++ now
• Python soon
• Integrations with MapReduce, Spark, and Impala
• No direct access to underlying Kudu tablet files
• Beta does not have authentication, authorization, encryption
13© Cloudera, Inc. All rights reserved.
Impala integration
• Opens up Kudu to JDBC/ODBC clients
• Intuitive way to get data into Kudu
• INSERT INTO kudu SELECT * FROM csv;
• Additional commands
• UPDATE
• DELETE
• Efficient INSERT VALUES
• Runs on the Kudu C++ client
14© Cloudera, Inc. All rights reserved.
Performance characteristics
• Very CPU efficient
• Written in modern C++, uses specialized CPU instructions, JIT compilation
• Latency mostly driven by storage hardware capabilities
• Expect sub-millisecond response on SSDs and upcoming technologies
• No garbage collection allows very large memory footprint with no pauses
• Bloom filters reduce the need for many disk accesses
15© Cloudera, Inc. All rights reserved.
Operating Kudu
• Easiest through Cloudera Manager integration
• Separate parcel for now
• Kudu is always compacting
• No minor vs. major compaction
• No compaction latency spikes
• Web UI is full of metrics and logs
16© Cloudera, Inc. All rights reserved.
Cluster layout
• One or multiple masters
• Only one in current beta
• Low CPU and memory impact
• One tablet server per worker node
• Can share disks with HDFS
• One SSD per worker node just for Kudu WAL can speed up writes
• No dependencies on other Hadoop ecosystem components
• But interfacing components like Impala or Spark do
17© Cloudera, Inc. All rights reserved.
Real-time analytics in Hadoop today
Merging in new data = storage complexity
Downsides:
● Multiple storage layers
● Latest data is hidden
● Files are messy
● Complex to do updates
without breaking running
queriesNew Partition
Most Recent Partition
Historic Data
HBase
Parquet
File
Have we
accumulated
enough data?
Reorganize
HBase file
into Parquet
• Wait for running operations to complete
• Define new Impala partition referencing
the newly written Parquet file
Incoming Data
(Messaging
System)
Reporting
Request
HDFS + Impala
18© Cloudera, Inc. All rights reserved.
Real-time analytics in Hadoop with Kudu
Improvements:
● One system to operate
● No schedules or
background processes
● Handle late arrivals or data
corrections with ease
● New data available
immediately for analytics
or operations
Historical and Real-time
Data
Incoming Data
(Messaging
System)
Reporting
Request
Kudu + Impala
19© Cloudera, Inc. All rights reserved.
Kudu for data warehousing
• Near real time data visibility
• BI tools can display events that happened seconds earlier
• Excellent for star schemas
• Fast scans of deep fact tables
• Efficient wide fact tables
• Simplified updates of slowly changing dimensions
20© Cloudera, Inc. All rights reserved.
Near real time data warehousing on Kudu
Files
RDBMS
Streams
K
A
F
K
A
K
U
D
U
IMPALA
HUE
BI tools
User
FLUME
SPARK
STREAMING
Simple
Complex
21© Cloudera, Inc. All rights reserved.
Resources
Join the community
http://getkudu.io
kudu-user@googlegroups.com
Download the beta
cloudera.com/downloads
Read the whitepaper
getkudu.io/kudu.pdf
22© Cloudera, Inc. All rights reserved.
Thank you
jeremy@cloudera.com

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 
Kudu demo
Kudu demoKudu demo
Kudu demo
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architecture
 
Hive vs. Impala
Hive vs. ImpalaHive vs. Impala
Hive vs. Impala
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in HadoopOctober 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
 

Andere mochten auch

Walmart strategy team victoria
Walmart strategy   team victoriaWalmart strategy   team victoria
Walmart strategy team victoria
Clara Olivier
 
Entrepreneurial Marketing; Starbucks Business Model
Entrepreneurial Marketing; Starbucks Business ModelEntrepreneurial Marketing; Starbucks Business Model
Entrepreneurial Marketing; Starbucks Business Model
clsmith652
 
Resource based view of firm
Resource based view of firmResource based view of firm
Resource based view of firm
Maira Moazum
 

Andere mochten auch (8)

Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
 
20 Business model examples #2
20 Business model examples #220 Business model examples #2
20 Business model examples #2
 
Walmart strategy team victoria
Walmart strategy   team victoriaWalmart strategy   team victoria
Walmart strategy team victoria
 
Applying the vrio framework (1)
Applying the vrio framework (1)Applying the vrio framework (1)
Applying the vrio framework (1)
 
30 Business model examples + 120 brainstorm cards #1
30 Business model examples + 120 brainstorm cards #130 Business model examples + 120 brainstorm cards #1
30 Business model examples + 120 brainstorm cards #1
 
Entrepreneurial Marketing; Starbucks Business Model
Entrepreneurial Marketing; Starbucks Business ModelEntrepreneurial Marketing; Starbucks Business Model
Entrepreneurial Marketing; Starbucks Business Model
 
Starbucks Coffee Strategy
Starbucks Coffee StrategyStarbucks Coffee Strategy
Starbucks Coffee Strategy
 
Resource based view of firm
Resource based view of firmResource based view of firm
Resource based view of firm
 

Ähnlich wie Introducing Kudu

Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 Improving Apache Spark by Taking Advantage of Disaggregated Architecture Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Databricks
 

Ähnlich wie Introducing Kudu (20)

Kudu Cloudera Meetup Paris
Kudu Cloudera Meetup ParisKudu Cloudera Meetup Paris
Kudu Cloudera Meetup Paris
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
 
SFHUG Kudu Talk
SFHUG Kudu TalkSFHUG Kudu Talk
SFHUG Kudu Talk
 
Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for Hadoop
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 Improving Apache Spark by Taking Advantage of Disaggregated Architecture Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 

Kürzlich hochgeladen

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Kürzlich hochgeladen (20)

Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

Introducing Kudu

  • 1. 1© Cloudera, Inc. All rights reserved. Introducing Kudu Jeremy Beard | Senior Solutions Architect December 2015
  • 2. 2© Cloudera, Inc. All rights reserved. Presenter • Jeremy Beard • Senior Solutions Architect at Cloudera • Three years in big data • Six years in data warehousing • jeremy@cloudera.com
  • 3. 3© Cloudera, Inc. All rights reserved. Current storage landscape in Hadoop HDFS excels at: • Efficiently scanning large amounts of data • Accumulating data with high throughput HBase excels at: • Efficiently finding and writing individual rows • Making data mutable Gaps exist when these properties are needed simultaneously
  • 4. 4© Cloudera, Inc. All rights reserved. Changing hardware landscape • Spinning disk -> solid state storage • NAND flash: Up to 450k read 250k write iops, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping • 3D XPoint memory (1000x faster than NAND, cheaper than RAM) • RAM is cheaper and more abundant: • 64->128->256GB over last few years • Takeaway 1: The next bottleneck is CPU, and current storage systems weren’t designed with CPU efficiency in mind • Takeaway 2: Column stores are feasible for random access
  • 5. 5© Cloudera, Inc. All rights reserved. Kudu Storage for Fast Analytics on Fast Data • New updating column store for Hadoop • Simplifies the architecture for building analytic applications on changing data • Designed for fast analytic performance • Natively integrated with Hadoop • Donated as incubating project at Apache Software Foundation • Beta now available STRUCTURED Sqoop UNSTRUCTURED Kafka, Flume PROCESS, ANALYZE, SERVE UNIFIED SERVICES RESOURCE MANAGEMENT YARN SECURITY Sentry, RecordService FILESYSTEM HDFS RELATIONAL Kudu NoSQL HBase STORE INTEGRATE BATCH Spark, Hive, Pig MapReduce STREAM Spark SQL Impala SEARCH Solr SDK Kite
  • 6. 6© Cloudera, Inc. All rights reserved. • High throughput for big scans (columnar storage and replication) Goal: Within 2x of Parquet • Low-latency for short accesses (primary key indexes and quorum design) Goal: 1ms read/write on SSD • Database-like semantics (initially single-row ACID) • Relational data model • SQL query • “NoSQL” style scan/insert/update (Java client) Kudu design goals
  • 7. 7© Cloudera, Inc. All rights reserved. Kudu basic design • Apache-licensed open source software • Structured data model • Basic construct: tables • Tables broken down into tablets (roughly equivalent to partitions) • Architecture supports geographically disparate, active/active systems • Not the initial design goal
  • 8. 8© Cloudera, Inc. All rights reserved. What Kudu is not • Not a SQL interface • Just the storage layer • “BYO SQL” • Not a file system • Data must have tabular structure • Not an application that runs on HDFS • An alternative, native Hadoop storage engine • Not a replacement for HDFS or HBase • Select the right storage for the right use case • Cloudera will continue to support and invest in all three
  • 9. 9© Cloudera, Inc. All rights reserved. Kudu data model • Tables have a RDBMS-like schema • Finite number of columns (unlike HBase/Cassandra) • Types: BOOL, INT8/16/32/64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP • Some subset of columns makes up a primary key • Fast random reads/writes by primary key • No secondary indexes (yet) • Columnar layout on disk • Lazy materialization • Encoding and compression options 9
  • 10. 10© Cloudera, Inc. All rights reserved. Table partitioning • Hash bucketing • Distribute records by hash of partition column(s) • N buckets leads to N tablets • Range partitioning • Distribute records by ranges of the partition column(s) • N split keys leads to N tablets • Can be a mix for different columns of the primary key
  • 11. 11© Cloudera, Inc. All rights reserved. Consistency model • Consistency and replication enforced by Raft consensus (similar to Paxos) • Replication by operation not data • Single-row transactions now • Multi-row transactions later • Geo-distributed replicas will be possible under strict time synchronization • Techniques drawn from Google Spanner and others
  • 12. 12© Cloudera, Inc. All rights reserved. Kudu interfaces • NoSQL-style APIs • Insert(), Update(), Delete(), Scan() • Java and C++ now • Python soon • Integrations with MapReduce, Spark, and Impala • No direct access to underlying Kudu tablet files • Beta does not have authentication, authorization, encryption
  • 13. 13© Cloudera, Inc. All rights reserved. Impala integration • Opens up Kudu to JDBC/ODBC clients • Intuitive way to get data into Kudu • INSERT INTO kudu SELECT * FROM csv; • Additional commands • UPDATE • DELETE • Efficient INSERT VALUES • Runs on the Kudu C++ client
  • 14. 14© Cloudera, Inc. All rights reserved. Performance characteristics • Very CPU efficient • Written in modern C++, uses specialized CPU instructions, JIT compilation • Latency mostly driven by storage hardware capabilities • Expect sub-millisecond response on SSDs and upcoming technologies • No garbage collection allows very large memory footprint with no pauses • Bloom filters reduce the need for many disk accesses
  • 15. 15© Cloudera, Inc. All rights reserved. Operating Kudu • Easiest through Cloudera Manager integration • Separate parcel for now • Kudu is always compacting • No minor vs. major compaction • No compaction latency spikes • Web UI is full of metrics and logs
  • 16. 16© Cloudera, Inc. All rights reserved. Cluster layout • One or multiple masters • Only one in current beta • Low CPU and memory impact • One tablet server per worker node • Can share disks with HDFS • One SSD per worker node just for Kudu WAL can speed up writes • No dependencies on other Hadoop ecosystem components • But interfacing components like Impala or Spark do
  • 17. 17© Cloudera, Inc. All rights reserved. Real-time analytics in Hadoop today Merging in new data = storage complexity Downsides: ● Multiple storage layers ● Latest data is hidden ● Files are messy ● Complex to do updates without breaking running queriesNew Partition Most Recent Partition Historic Data HBase Parquet File Have we accumulated enough data? Reorganize HBase file into Parquet • Wait for running operations to complete • Define new Impala partition referencing the newly written Parquet file Incoming Data (Messaging System) Reporting Request HDFS + Impala
  • 18. 18© Cloudera, Inc. All rights reserved. Real-time analytics in Hadoop with Kudu Improvements: ● One system to operate ● No schedules or background processes ● Handle late arrivals or data corrections with ease ● New data available immediately for analytics or operations Historical and Real-time Data Incoming Data (Messaging System) Reporting Request Kudu + Impala
  • 19. 19© Cloudera, Inc. All rights reserved. Kudu for data warehousing • Near real time data visibility • BI tools can display events that happened seconds earlier • Excellent for star schemas • Fast scans of deep fact tables • Efficient wide fact tables • Simplified updates of slowly changing dimensions
  • 20. 20© Cloudera, Inc. All rights reserved. Near real time data warehousing on Kudu Files RDBMS Streams K A F K A K U D U IMPALA HUE BI tools User FLUME SPARK STREAMING Simple Complex
  • 21. 21© Cloudera, Inc. All rights reserved. Resources Join the community http://getkudu.io kudu-user@googlegroups.com Download the beta cloudera.com/downloads Read the whitepaper getkudu.io/kudu.pdf
  • 22. 22© Cloudera, Inc. All rights reserved. Thank you jeremy@cloudera.com