Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

快速数据快速分析引擎-Kudu

349 Aufrufe

Veröffentlicht am

快速数据快速分析引擎-Kudu

Veröffentlicht in: Technologie
  • Loggen Sie sich ein, um Kommentare anzuzeigen.

  • Gehören Sie zu den Ersten, denen das gefällt!

快速数据快速分析引擎-Kudu

  1. 1. 1© Cloudera, Inc. All rights reserved. Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data Jianwei Li Jarred@cloudera.com
  2. 2. 2© Cloudera, Inc. All rights reserved. Agenda • What is Kudu? (Motivations & Design Goals) • Use Cases • Overview of Design & Internals • Simple Benchmark • Getting Started
  3. 3. 3© Cloudera, Inc. All rights reserved. Connected Cars • Analytical • R&D wants to know optimal charge performance. • How has software update affected vehicle performance over time? • Real-time • Consumer wants to know if teen is driving car right now. How fast are they accelerating? What is their speed? Where are they? • Vehicle Service – e.g. grab an up- to-date “diagnosis bundle” before or during service.
  4. 4. 4© Cloudera, Inc. All rights reserved. Connected Cars • Analytical • R&D wants to know optimal charge performance. • How has software update affected vehicle performance over time? • Real-time • Consumer wants to know if teen is driving car right now. How fast are they accelerating? What is their speed? Where are they? • Vehicle Service – e.g. grab an up- to-date “diagnosis bundle” before or during service. fast, efficient scans = HDFS fast inserts/lookups = HBase
  5. 5. 5© Cloudera, Inc. All rights reserved. How would we build the Real-time System Today? Hadoop 12:01:05 AM: Where is my car? How fast? 12:01:05 AM: Here I am! 12:01:05 AM: Your car is located at…. • Cars are constantly producing data, being ingested in real-time or micro-batches • A consumer wants to know what is the location and speed of their car in real-time. • Requires: – potentially millions of low latency writes / second – low latency, random access Consumer
  6. 6. 6© Cloudera, Inc. All rights reserved. How would we build the Real-time System Today? • Hbase Provides: • Fast, Random Read & Write Access • “Mini-scans” • Scale-out architecture capable of serving Big Data to many users Connected Cars Kafka / Pub-sub Events HBase Consumer
  7. 7. 7© Cloudera, Inc. All rights reserved. How would we build the Analytics System Today? Hadoop For the last 3 days, give me engine temp vs. radiator data. 12:01:05 PM: Here I am! Here you go…. • Cars are constantly producing data, being ingested in real-time or micro-batches • GM just made a software update to temp control module. How’s it working? • Requires fast, efficient scan performance Analyst
  8. 8. 8© Cloudera, Inc. All rights reserved. How would we build the Analytics System Today? • HDFS Excels at: • Full table scans • Ad-hoc analytics Connected Cars Kafka / Pub-sub Events Today’s Partition Yesterday’s Partition Historic Data Analyst Hbase/Flu me 1. Have we accumulated enough data? 2. Flush into HDFS
  9. 9. 9© Cloudera, Inc. All rights reserved. Hybrid big data analytics pipeline Before Kudu Connected Cars Kafka / Pub-sub Events HBase Consumer HDFS (Storage) Random Reads Analyst Analytics Snapshot & Convert to Parquet Compact late arriving data
  10. 10. 10© Cloudera, Inc. All rights reserved. Hybrid big data analytics pipeline After Kudu Connected Cars Kafka / Pub-sub Events Kudu Consume r Random Reads Analyst Analytics Kudu supports simultaneous combination of sequential and random reads and writes
  11. 11. 11© Cloudera, Inc. All rights reserved. What is Kudu?
  12. 12. 12© Cloudera, Inc. All rights reserved. Current Storage Landscape in Hadoop HDFS excels at: • Efficiently scanning large amounts of data • Accumulating data with high throughput HBase excels at: • Efficiently finding and writing individual rows • Making data mutable Gaps exist when these properties are needed simultaneously The Hadoop Storage “Gap”
  13. 13. 13© Cloudera, Inc. All rights reserved. • High throughput for big scans • Low-latency for random accesses • High CPU performance to better take advantage of RAM and Flash • Single-column scan rate 10-100x faster than HBase • High IO efficiency • True column store with type-specific encodings • Efficient analytics when only certain columns are accessed • Expressive and evolvable data model • Architecture that supports multi-data center operation Kudu Design Goals
  14. 14. 14© Cloudera, Inc. All rights reserved. Changing Hardware Landscape • Spinning disk -> solid state storage • NAND flash: Up to 450k read 250k write iops, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping • 3D XPoint memory (1000x faster than NAND, cheaper than RAM) • RAM is cheaper and more abundant • 64->128->256GB over last few years Takeaway 1: The next bottleneck is CPU, and current storage systems weren’t designed with CPU efficiency in mind. .
  15. 15. 15© Cloudera, Inc. All rights reserved. The Kudu Elevator Pitch Storage for Fast Analytics on Fast Data • New updating column store for Hadoop • Simplifies the architecture for building analytic applications on changing data • Designed for fast analytic performance • Natively integrated with Hadoop • Apache-licensed open source (with pending ASF Incubator proposal) • Beta now available FILESYSTEM HDFS NoSQL HBASE INGEST – SQOOP, FLUME, KAFKA DATA INTEGRATION & STORAGE SECURITY – SENTRY RESOURCE MANAGEMENT – YARN UNIFIED DATA SERVICES BATCH STREAM SQL SEARCH MODEL ONLINE DATA ENGINEERING DATA DISCOVERY & ANALYTICS DATA APPS SPARK, HIVE, PIG SPARK IMPALA SOLR SPARK HBASE RELATIONAL KUDU
  16. 16. 16© Cloudera, Inc. All rights reserved. Using Kudu • Table has a SQL-like schema • Finite number of columns (unlike HBase/Cassandra) • Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP • Some subset of columns makes up a possibly-composite primary key • Fast ALTER TABLE • Java and C++ “NoSQL” style APIs • Insert(), Update(), Delete(), Scan() • Integrations with MapReduce, Spark, and Impala • more to come! 16
  17. 17. 17© Cloudera, Inc. All rights reserved. What Kudu is *NOT* • Not a SQL interface itself • It’s just the storage layer – “Bring Your Own SQL” (eg Impala or Spark) • Not an application that runs on HDFS • It’s an alternative, native Hadoop storage engine • Colocation with HDFS is expected • Not a replacement for HDFS or HBase • Select the right storage for the right use case • Cloudera will support and invest in all three
  18. 18. 18© Cloudera, Inc. All rights reserved. Use Cases for Kudu
  19. 19. 19© Cloudera, Inc. All rights reserved. Kudu Use Cases Kudu is best for use cases requiring a simultaneous combination of sequential and random reads and writes, e.g.: ● Time Series ○ Examples: Stream market data; fraud detection & prevention; risk monitoring ○ Workload: Insert, updates, scans, lookups ● Machine Data Analytics ○ Examples: Network threat detection ○ Workload: Inserts, scans, lookups ● Online Reporting ○ Examples: ODS ○ Workload: Inserts, updates, scans, lookups
  20. 20. 20© Cloudera, Inc. All rights reserved. Real-Time Analytics in Hadoop Today Fraud Detection in the Real World = Storage Complexity Considerations: ● How do I handle failure during this process? ● How often do I reorganize data streaming in into a format appropriate for reporting? ● When reporting, how do I see data that has not yet been reorganized? ● How do I ensure that important jobs aren’t interrupted by maintenance? New Partition Most Recent Partition Historic Data HBase Parquet File Have we accumulated enough data? Reorganize HBase file into Parquet • Wait for running operations to complete • Define new Impala partition referencing the newly written Parquet file Incoming Data (Messaging System) Reporting Request Impala on HDFS
  21. 21. 21© Cloudera, Inc. All rights reserved. Real-Time Analytics in Hadoop with Kudu Simpler Architecture, Superior Performance over Hybrid Approaches Impala on Kudu Incoming Data (Messaging System) Reporting Request
  22. 22. 22© Cloudera, Inc. All rights reserved. Design & Internals
  23. 23. 23© Cloudera, Inc. All rights reserved. Kudu Basic Design • Typed storage • Basic Construct: Tables • Tables broken down into Tablets (roughly equivalent to partitions) • Maintains consistency through a Paxos-like quorum model (Raft) • Architecture supports geographically disparate, active/active systems
  24. 24. 24© Cloudera, Inc. All rights reserved. Tables and Tablets • Table is horizontally partitioned into tablets • Range or hash partitioning • PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS • Each tablet has N replicas (3 or 5), with Raft consensus • Allow read from any replica, plus leader-driven writes with low MTTR • Tablet servers host tablets • Store data on local disks (no HDFS) 24
  25. 25. 25© Cloudera, Inc. All rights reserved. Client Meta Cache
  26. 26. 26© Cloudera, Inc. All rights reserved. Client Hey Master! Where is the row for ‘todd@cloudera.com’ in table “T”?Meta Cache
  27. 27. 27© Cloudera, Inc. All rights reserved. Client Hey Master! Where is the row for ‘todd@cloudera.com’ in table “T”? It’s part of tablet 2, which is on servers {Z,Y,X}. BTW, here’s info on other tablets you might care about: T1, T2, T3, … Meta Cache
  28. 28. 28© Cloudera, Inc. All rights reserved. Client Hey Master! Where is the row for ‘todd@cloudera.com’ in table “T”? It’s part of tablet 2, which is on servers {Z,Y,X}. BTW, here’s info on other tablets you might care about: T1, T2, T3, … Meta Cache T1: … T2: … T3: …
  29. 29. 29© Cloudera, Inc. All rights reserved. Client Hey Master! Where is the row for ‘todd@cloudera.com’ in table “T”? It’s part of tablet 2, which is on servers {Z,Y,X}. BTW, here’s info on other tablets you might care about: T1, T2, T3, … UPDATE todd@cloudera.com SET … Meta Cache T1: … T2: … T3: …
  30. 30. 30© Cloudera, Inc. All rights reserved. LSM vs Kudu • LSM – Log Structured Merge (Cassandra, HBase, etc) • Inserts and updates all go to an in-memory map (MemStore) and later flush to on-disk files (HFile/SSTable) • Reads perform an on-the-fly merge of all on-disk HFiles • Kudu • Shares some traits (memstores, compactions) • More complex. • Slower writes in exchange for faster reads (especially scans) 30
  31. 31. 31© Cloudera, Inc. All rights reserved. Kudu storage – Inserts and Flushes MemRowSet INSERT(“todd”, “$1000”,”engineer”) name pay role DiskRowSet 1 flush
  32. 32. 32© Cloudera, Inc. All rights reserved. Kudu storage – Inserts and Flushes MemRowSet name pay role DiskRowSet 1 name pay role DiskRowSet 2 INSERT(“doug”, “$1B”, “Hadoop man”) flush
  33. 33. 33© Cloudera, Inc. All rights reserved. Kudu storage - Updates MemRowSet name pay role DiskRowSet 1 name pay role DiskRowSet 2 Delta MS Delta MS Each DiskRowSet has its own DeltaMemStore to accumulate updates base data base data
  34. 34. 34© Cloudera, Inc. All rights reserved. Kudu storage - Updates MemRowSet name pay role DiskRowSet 1 name pay role DiskRowSet 2 Delta MS Delta MS UPDATE set pay=“$1M” WHERE name=“todd” Is the row in DiskRowSet 2? (check bloom filters) Is the row in DiskRowSet 1? (check bloom filters) Bloom says: no! Bloom says: maybe! Search key column to find offset: rowid = 150 150: pay=$1M @ time T base data
  35. 35. 35© Cloudera, Inc. All rights reserved. Metadata • Replicated master* • Acts as a tablet directory (“META” table) • Acts as a catalog (table schemas, etc) • Acts as a load balancer (tracks TS liveness, re-replicates under-replicated tablets) • Caches all metadata in RAM for high performance • 80-node load test, GetTableLocations RPC perf: • 99th percentile: 68us, 99.99th percentile: 657us • <2% peak CPU usage • Client configured with master addresses • Asks master for tablet locations as needed and caches them 35
  36. 36. 36© Cloudera, Inc. All rights reserved. Performance characteristics Very CPU efficient • Written in modern C++, uses specialized CPU instructions, JIT compilation with LLVM Latency dependent on storage hardware capabilities • Expect sub-millisecond response on SSDs and upcoming technologies No garbage collection allows very large memory footprint with no pauses Bloom filters reduce the need for many disk accesses
  37. 37. 37© Cloudera, Inc. All rights reserved. Kudu Trade-Offs • Random updates will be slower • HBase model allows random updates without incurring a disk seek • Kudu requires a key lookup before update, Bloom lookup before insert • Single-row reads may be slower • Columnar design is optimized for scans • Future: may introduce “column groups” for applications where single-row access is more important
  38. 38. 38© Cloudera, Inc. All rights reserved. APIs
  39. 39. 39© Cloudera, Inc. All rights reserved. Native clients • First class clients implemented in Java and C++ • Experimental Cython client that wraps the C++ • Supported operations • DDL (create, alter, delete tables) • DML (insert, update, delete) • Scan (with predicate pushdown, column projection)
  40. 40. 40© Cloudera, Inc. All rights reserved. Java/C++ native clients - Create table KuduClient client = new KuduClient.KuduClientBuilder("master1,master2,master3").build(); ColumnSchema idColumn = new ColumnSchema.ColumnSchemaBuilder("col_id", Type.INT64) .key(true) .build(); ColumnSchema nameColumn = new ColumnSchema.ColumnSchemaBuilder("col_name", Type.STRING) .build(); ColumnSchema emailColumn = new ColumnSchema.ColumnSchemaBuilder("col_email", Type.STRING) .nullable(true) .build(); Schema schema = new Schema(Lists.newArrayList(idColumn, nameColumn, emailColumn)); client.createTable("clouderans", schema);
  41. 41. 41© Cloudera, Inc. All rights reserved. Java/C++ native clients - Insert KuduTable table = client.openTable("clouderans"); KuduSession session = client.newSession(); Insert newRow = table.newInsert(); newRow.addLong(idColumn.getName(), 1435425662); newRow.addString(nameColumn.getName(), "JD"); newRow.addString(emailColumn.getName(), “jdcryans@cloudera.com"); session.apply(newRow); session.flush();
  42. 42. 42© Cloudera, Inc. All rights reserved. Java/C++ native clients - Update Update newUpdate = table.newUpdate(); newUpdate.addLong(idColumn.getName(), 1435425662); newUpdate.addString(nameColumn.getName(), “Jean-Daniel"); session.apply(newUpdate); session.flush();
  43. 43. 43© Cloudera, Inc. All rights reserved. Java/C++ native clients - Scan ColumnRangePredicate predicate = new ColumnRangePredicate(idColumn); predicate.setLowerBound(1435425662); KuduScanner scanner = client.newScannerBuilder(table) .setProjectedColumnNames(Lists.newArrayList("col_name", "col_email")) .addColumnRangePredicate(predicate) .build(); while (scanner.hasMoreRows()) { RowResultIterator rowResultIterator = scanner.nextRows(); while (rowResultIterator.hasNext()) { RowResult rowResult = rowResultIterator.next(); System.out.println("Found a clouderan " + rowResult.getString(0)); } }
  44. 44. 44© Cloudera, Inc. All rights reserved. MapReduce • Very similar to HBase’s • Input format • Generates one Map per Tablet • Configurable like a KuduScanner • A RowResult is given per map() • Output format • Configured for high throughput • Ingests Insert, Update, and Delete
  45. 45. 45© Cloudera, Inc. All rights reserved. Impala + Kudu CREATE TABLE my_first_table ( id BIGINT, name STRING ) DISTRIBUTE BY HASH (id) INTO 16 BUCKETS TBLPROPERTIES( 'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler', 'kudu.table_name' = 'my_first_table', 'kudu.master_addresses' = 'kudu-master.example.com:7051', 'kudu.key_columns' = 'id' );
  46. 46. 46© Cloudera, Inc. All rights reserved. Impala + Kudu – Pre Splitting CREATE TABLE cust_behavior ( _id BIGINT, salary STRING, edu_level INT, usergender STRING, rating INT) DISTRIBUTE BY RANGE(_id) SPLIT ROWS((1439560049342), (1439566253755), (1439572458168), (1439578662581), (1439584866994), (1439591071407)) TBLPROPERTIES( 'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler', 'kudu.table_name' = 'my_first_table', 'kudu.master_addresses' = 'kudu-master.example.com:7051', 'kudu.key_columns' = 'id' );
  47. 47. 47© Cloudera, Inc. All rights reserved. Impala + Kudu – Update & Delete UPDATE my_first_table SET name="bob" where id = 3; DELETE FROM my_first_table WHERE id < 3;
  48. 48. 48© Cloudera, Inc. All rights reserved. Benchmarks
  49. 49. 49© Cloudera, Inc. All rights reserved. TPC-H (Analytics benchmark) • 75TS + 1 master cluster • 12 (spinning) disk each, enough RAM to fit dataset • Using Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4 • TPC-H Scale Factor 100 (100GB) • Example query: • SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue FROM customer, orders, lineitem, supplier, nation, region WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA' AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01’ GROUP BY n_name ORDER BY revenue desc; 49
  50. 50. 50© Cloudera, Inc. All rights reserved. - Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data - Parquet likely to outperform Kudu for HDD-resident (larger IO requests)
  51. 51. 51© Cloudera, Inc. All rights reserved. What about Apache Phoenix? • 10 node cluster (9 worker, 1 master) • HBase 1.0, Phoenix 4.3 • TPC-H LINEITEM table only (6B rows) 51 2152 219 76 131 0.04 1918 13.2 1.7 0.7 0.15 155 9.3 1.4 1.5 1.37 0.01 0.1 1 10 100 1000 10000 Load TPCH Q1 COUNT(*) COUNT(*) WHERE… single-row lookup Time (sec) Phoenix Kudu Parquet
  52. 52. 52© Cloudera, Inc. All rights reserved. Xiaomi Use Case • Gather important RPC tracing events from mobile app and backend service • Service monitoring & troubleshooting tool • High write throughput • >5 Billion records/day and growing • Query latest data and quick response • Identify and resolve issues quickly • Can search for individual records • Easy for troubleshooting
  53. 53. 53© Cloudera, Inc. All rights reserved. Big Data Analytics Pipeline Before Kudu • Long pipeline high latency(1 hour ~ 1 day), data conversion pains • No ordering Log arrival(storage) order not exactly logical order e.g. read 2-3 days of log for data in 1 day
  54. 54. 54© Cloudera, Inc. All rights reserved. Big Data Analysis Pipeline Simplified With Kudu • ETL Pipeline(0~10s latency) Apps that need to prevent backpressure or require ETL • Direct Pipeline(no latency) Apps that don’t require ETL and no backpressure issues OLAP scan Side table lookup Result store
  55. 55. 55© Cloudera, Inc. All rights reserved. Use Case 1: Benchmark Environment • 71 Node cluster • Hardware • CPU: E5-2620 2.1GHz * 24 core Memory: 64GB • Network: 1Gb Disk: 12 HDD • Software • Hadoop2.6/Impala 2.1/Kudu Data • 1 day of server side tracing data • ~2.6 Billion rows • ~270 bytes/row • 17 columns, 5 key columns
  56. 56. 56© Cloudera, Inc. All rights reserved. Use Case 1: Benchmark Results 1.4 2.0 2.3 3.1 1.3 0.9 1.3 2.8 4.0 5.7 7.5 16.7 Q1 Q2 Q3 Q4 Q5 Q6 kudu parquet Total Time(s) Throughput(Total) Throughput(per node) Kudu 961.1 2.8M record/s 39.5k record/s Parquet 114.6 23.5M record/s 331k records/s Bulk load using impala (INSERT INTO): Query latency: * HDFS parquet file replication = 3 , kudu table replication = 3 * Each query run 5 times then take average
  57. 57. 57© Cloudera, Inc. All rights reserved. Getting Started Users: Install the Beta or try a VM: getkudu.io Get help: kudu-user@googlegroups.com Read the white paper: getkudu.io/kudu.pdf Developers: Contribute: github.com/cloudera/kudu (commits) gerrit.cloudera.org (reviews) issues.cloudera.org (JIRAs going back to 2013) Join the Dev list: kudu-dev@googlegroups.com Contributions/participation are welcome and encouraged!
  58. 58. 58© Cloudera, Inc. All rights reserved. Questions?

×