9. CLUSTER SIZES & TOPOLOGIES
• 5 TO TENS OF NODES
• A COUPLE HUNDRED GiB TO A DOZEN TiB OF RAM
• A COUPLE TiB TO A HUNDRED TiB OF RAW SPACE
• ON-PREM AND CLOUD
15. THE “WORST” OF BOTH WORLDS
• KUDU IS SLIGHTLY WORSE THAN HBASE FOR INDEXED OPS
• KUDU SHOULD BE NO MORE THAN 2x WORSE THAN PARQUET FOR LARGE SCANS
16. KUDU BENCHMARKS – INGEST RATE
HIGHER IS BETTER
source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/
17. KUDU BENCHMARKS – RANDOM LOOKUP
LOWER IS BETTER
source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/
18. KUDU BENCHMARKS – SCAN RATE
HIGHER IS BETTER
source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/
19. KUDU BENCHMARKS – SUMMARY
source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/
23. ZWOOX - INGESTION FRAMEWORK
• LOW LATENCY, HIGH THROUGHPUT, HIGHLY AVAILABLE
• NEAR-REAL-TIME FOR KUDU & BATCH FOR HIVE/IMPALA
• BATCH & STREAMING REPLICATION
• AUTOMATIC CONSOLIDATION INTO HDFS-BASED TABLES
• MULTIPLE TABLES WITH SPECIFIC PARTITIONING SCHEMES
• IN-LINE PROCESSING
• AUTOMATIC AUDIT DATA
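A minimal sketch of what a per-table partitioning scheme could look like, assuming a table partitioned by a monthly range on event time combined with hash buckets on a key (Kudu supports both range and hash partitioning). The function names, the record fields `id` and `event_time`, and the use of `crc32` as the hash are hypothetical and are not taken from Zwoox or from Kudu's internals.

```python
import zlib
from datetime import datetime


def hash_bucket(key: str, num_buckets: int) -> int:
    """Map a key to a stable bucket. crc32 is a stand-in, not Kudu's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def range_partition(ts: datetime) -> str:
    """Label the monthly range partition a timestamp falls into."""
    return ts.strftime("%Y-%m")


def route(record: dict, num_buckets: int = 4) -> tuple:
    """Return the (range partition, hash bucket) pair for a record."""
    return (
        range_partition(record["event_time"]),
        hash_bucket(record["id"], num_buckets),
    )


# Hypothetical record: the same key and month always land in the same place.
record = {"id": "sensor-42", "event_time": datetime(2017, 2, 15, 10, 30)}
partition, bucket = route(record)
print(partition, bucket)
```

Range partitioning on time keeps recent data together for scans, while hash bucketing on the key spreads concurrent writes across tablets.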
24. MESSAGE BUS
• JMS SEMANTICS ARE LIMITED
• JMS SCALING IS HARD
• JMS PERFORMANCE IS POOR COMPARED TO KAFKA
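To illustrate the contrast, here is a toy in-memory model of a Kafka-style partitioned log. It is a sketch under stated assumptions, not Kafka's implementation: the class `PartitionedLog` and its methods are hypothetical, and `crc32` stands in for Kafka's actual key hashing. It shows two properties a classic JMS queue does not offer in the same way: records with the same key stay ordered within one partition, and consumers can re-read from any offset because reading does not remove messages.

```python
import zlib


class PartitionedLog:
    """Toy append-only log split into partitions (illustrative only)."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> tuple:
        """Append a value to the partition chosen by key; return (partition, offset)."""
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int) -> list:
        """Return all values from the given offset on; nothing is consumed."""
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
p, first = log.append("account-7", "debit")
log.append("account-7", "credit")

# Reading twice from the same offset yields the same records (replay).
print(log.read(p, first))
print(log.read(p, first))
```

Because each partition is independent, adding partitions and consumers is the usual way such a log scales out.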