SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Downloaden Sie, um offline zu lesen
1 
Kylin: Hadoop OLAP Engine - Tech Deep Dive 
Jiang Xu, Architect of Kylin 
October 2014
Kylin Overview 
Kylin 
- n. (in Chinese art) a mythical animal of composite form 
2 
http://kylin.io 
Kylin is an open source Distributed Analytics Engine from 
eBay Inc. that provides SQL interface and multi-dimensional 
analysis (OLAP) on Hadoop supporting extremely large 
datasets
What Is Kylin? 
• Extremely Fast OLAP Engine at Scale 
Kylin is designed to reduce query latency on Hadoop for 10+ billions of rows of data 
• ANSI-SQL Interface on Hadoop 
Kylin offers ANSI-SQL on Hadoop and supports most ANSI-SQL query functions 
• Interactive Query Capability 
Users can interact with Hadoop data via Kylin at sub-second latency, better than Hive queries for the same 
dataset 
• MOLAP Cube 
User can define a data model and pre-build in Kylin with more than 10+ billions of raw data records 
• Seamless Integration with BI Tools 
Kylin currently offers integration capability with BI Tools like Tableau. Integration with Microstrategy and Excel 
is coming soon 
3
What Is Kylin? - Other Highlights 
• Job Management and Monitoring 
• Compression and Encoding Support 
• Incremental Refresh of Cubes 
• Leverage HBase Coprocessor for query latency 
• Approximate Query Capability for distinct Count (HyperLogLog) 
• Easy Web interface to manage, build, monitor and query cubes 
• Security capability to set ACL at Cube/Project Level 
• Support LDAP Integration 
4
Glance of SQL-on-Hadoop Ecosystem … 
• SQL translated to MapReduce jobs 
– Hive 
– Stinger without Tez 
• SQL processed by a MPP Engine 
– Impala 
– Drill 
– Presto 
– Spark + Shark 
• SQL process by a existing SQL Engine + HDFS 
– EMC Greenplum (postgres) 
– Taobao Garude (mysql) 
• OLAP on Hadoop in other Companies 
– Adobe: HBase Cube 
– LinkedIn: Avatara 
– Salesforce.com: Phoenix 
5
Why Do We Build Kylin? 
• Why existing SQL-on-Hadoop solutions fall short? 
The existing SQL-on-Hadoop needs to scan partial or whole data set to answer a user query. Moreover, table join may trigger 
the data transfer across host. Due to large scan range and network traffic latency, many queries are very slow (minute+ 
latency). 
• What is MOLAP/ROLAP? 
– MOLAP (Multi-dimensional OLAP) is to pre-compute data along different dimensions of interest and store resultant 
values in the cube. MOLAP is much faster but is inflexible. Kylin is more like MOLAP. 
– ROLAP (Relational-OLAP) is to use star or snow-flake schema to do runtime aggregation. ROLAP is flexible but much 
slower. All existing SQL-on-Hadoop is kind of ROLAP. 
• How does Kylin support ROLAP/MOLAP? 
Kylin builds data cube (MOLAP) from hive table (ROLAP) according to the metadata definition. 
– If the query can be fulfilled by data cube, Kylin will route the query to data cube that is MOLAP. 
– If the query can’t be fulfilled by data cube, Kylin will route the query to hive table that is ROLAP. 
– Basically, you can think Kylin as HOLAP on top of MOLAP and ROLAP. 
6
Architecture Overview 
7
Analytics Query Taxonomy 
8 
High Level 
Aggregation 
• Very High Level, e.g GMV by 
site by vertical by weeks 
Analysis 
Query 
• Middle level, e.g GMV by site by vertical, by 
category (level x) past 12 weeks 
Drill Down to 
Detail • Detail Level (SSA Table) 
Low Level 
Aggregation • Seller ID 
Transaction 
Level • Transaction ID 
80+% 
Analytics 
Kylin is designed to accelerate 
Analytics queries!
Query Performance -- Compare to Hive 
9 
200 
100 
0 
SQL #1 SQL #2 SQL #3 
# Query Type Return 
Dataset 
Query 
On Kylin (s) 
Query 
On Hive (s) 
Hive 
Kylin 
Comments 
1 High Level Aggregation 4 0.129 157.437 1,217 times 
2 Analysis Query 22,669 1.615 109.206 68 times 
3 Drill Down to Detail 325,029 12.058 113.123 9 times 
4 Drill Down to Detail 524,780 22.42 6383.21 278 times 
5 Data Dump 972,002 49.054 N/A
Query Performance – Latency & Throughput 
10
Tech Highlights 
11
Component Design 
12 
Query Engine 
(Calcite) 
Metadata Manager 
JDBC Driver 
Storage Engine 
Data Cube 
(HBase) 
Job Engine 
(MapReduce) 
Star Schema 
(Hive) 
SQL 
RESTful Server 
Dictionary 
& Cube 
JDBC Driver ODBC Driver 
HBase 
Coprossor
Background – Cube & Cuboid 
13 
• Cuboid = one combination of dimensions 
• Cube = all combination of dimensions (all cuboids) 
time, item 
time item location supplier 
time, item, location 
item, location 
time, location, supplier 
time, item, location, supplier 
time, location 
Time, supplier 
item, supplier 
location, supplier 
time, item, supplier 
item, location, supplier 
0-D(apex) cuboid 
1-D cuboids 
2-D cuboids 
3-D cuboids 
4-D(base) cuboid
Metadata- Overview 
14 
Cube: … 
Fact Table: … 
Dimensions: … 
Measures: … 
Row Key: … 
HBase Mapping: 
… 
Dim 
Fact 
Dim Dim 
Source 
Star Schema 
row A 
row B 
row C 
Val 1 
Val 2 
Val 3 
Column 
Family 
Row Key 
Column 
Target 
HBase Storage 
Cube Metadata
Metadata – Dimension 
• Dimension 
– Normal 
– Mandatory 
– Hierarchy 
– Derived 
15
Metadata – Measure 
• Measure 
– Sum 
– Count 
– Max 
– Min 
– Average 
– Distinct Count (based on HyperLogLog) 
16
Query Engine - Overview 
• Query engine is based on Calcite 
• Query Execution Plan 
• Optiq Plug-ins in Query Engine 
17
Query Engine – Calcite 
• Calcite (http://incubator.apache.org/projects/calcite.html) is an extensible open source SQL 
engine that is also used in Stinger/Drill/Cascading. 
18
Query Engine – Explain Plan 
19 
SELECT test_cal_dt.week_beg_dt, test_category.lv1_categ, test_category.lv2_categ, test_category.lv3_categ, test_kylin_fact.format_name, 
test_sites.site_name, sum(test_kylin_fact.price) as total_price, count(*) as total_count 
FROM test_kylin_fact 
LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt 
LEFT JOIN test_category_groupings ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND test_kylin_fact.lstg_site_id = 
test_category_groupings.site_id 
LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id 
WHERE test_kylin_fact.seller_id = 123456 OR test_kylin_fact.lstg_format_name = ’New' 
GROUP BY test_cal_dt.week_beg_dt, test_category.lv1_categ, test_category.lv2_categ, test_category.lv3_categ, test_kylin_fact.format_name, 
test_sites.site_name 
OLAPToEnumerableConverter 
OLAPProjectRel(WEEK_BEG_DT=[$0], LV1_CATEG=[$1], LVL2_CATEG=[$2], LVL3_CATEG=[$3], FORMAT_NAME=[$4], 
SITE_NAME=[$5], TOTAL_PRICE=[CASE(=($7, 0), null, $6)], TOTAL_COUNT=[$8]) 
OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()]) 
OLAPProjectRel(WEEK_BEG_DT=[$13], LV1_CATEG=[$21], LVL2_CATEG=[$15], LVL3_CATEG=[$14], FORMAT_NAME=[$5], 
SITE_NAME=[$23], PRICE=[$0]) 
OLAPFilterRel(condition=[OR(=($3, 123456), =($5, ‘New'))]) 
OLAPJoinRel(condition=[=($2, $25)], joinType=[left]) 
OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left]) 
OLAPJoinRel(condition=[=($4, $12)], joinType=[left]) 
OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) 
OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]]) 
OLAPTableScan(table=[[DEFAULT, TEST_CATEGORY_GROUPINGS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]]) 
OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
Query Engine – Calcite Plugin-ins 
• Metadata SPI 
– Provide table schema from kylin metadata 
• Optimize Rule 
– Translate the logic operator into kylin operator 
• Relational Operator 
– Find right cube 
– Translate SQL into storage engine api call 
– Generate physical execute plan by linq4j java implementation 
• Result Enumerator 
– Translate storage engine result into java implementation result. 
• SQL Function 
– Add HyperLogLog for distinct count 
– Implement date time related functions (i.e. Quarter) 
20
Storage Engine - Overview 
• Provide cube query for query engine 
– Common iterator interface for storage engine 
– Isolate query engine from underline storage 
• HBase Storage 
– HBase schema 
– Pre-join & pre-aggregation 
– Dictionary 
– HBase coprocessor 
21
Storage Engine – HBase Schema 
22
Storage Engine – Pre-join & pre-aggregation 
• Pre-join (Hive) 
– Kylin will generate pre-join HQL based on metadata that will join the fact table with dimension tables 
– Kylin will use Hive to execute the pre-join HQL that will generate the pre-joined flat table 
• Pre-aggregation (MapReduce) 
– Kylin will generate a minimum spanning tree of cuboid from cube lattice graph. 
– Kylin will generate MapReduce job base on the minimum spanning tree 
– MapReduce job will do pre-aggregation to generate all cuboids from N dimensions to 1 dimension. 
• MOLAP Cube (HBase) 
– The pre-join & pre-aggregation results (i.e. MOLAP Cube) is stored in HBase 
23
Storage Engine – Dictionary 
• Dictionary maps dimension values into IDs that will reduce the memory and storage footprint. 
• Metadata can define whether one dimension use “dictionary” or not 
• Dictionary is based on Trie 
24
Storage Engine – HBase Coprocessor 
• HBase coprocessor can reduce network traffic and parallelize scan logic 
• HBase client 
– Serialize the filter + dimension + metrics into bytes 
– Send the encoded bytes to corprocessor 
• HBase Corprocessor 
– Deserialize the bytes to filter + dimension + metrics 
– Iterate all rows from each scan range 
• Filter unmatched rows 
• Aggregate the matched rows by dimensions in an cache 
• Send back the aggregated rows from cache 
25
Cube Build - Overview 
• Multi Staged Build 
• Map Reduce Job Flow 
• Cube Build Steps 
26
Cube Build – Multi Staged Build 
27
Cube Build – Map Reduce Job Flow 
28
Cube Build – Steps 
• Build dictionary from dimension tables on local disk. And copy dictionary to HDFS 
• Run Hive query to build a joined flatten table 
• Run map reduce job to build cuboids in HDFS sequence files from tier 1 to tier N 
• Calculate the key distribution of HDFS sequence files. And evenly split the key space into K regions. 
• 
• Translate HDFS sequence files into HBase Hfile 
• Bulk load the HFile into HBase 
29
Cube Optimization - Overview 
• “Curse of dimensionality”: N dimension cube has 2N cuboid 
– Full Cube vs. Partial Cube 
• Hugh data volume 
– Incremental Build 
• Slow Table Scan – TopN Query on High Cardinality Dimension 
– Bitmap inverted index 
– Time range partition 
– In-memory parallel scan: block cache + endpoint coprocessor 
30
Cube Optimization – Full Cube vs. Partial Cube 
• Full Cube 
– Pre-aggregate all dimension combinations 
– “Curse of dimensionality”: N dimension cube has 2N cuboid. 
• Partial Cube 
– To avoid dimension explosion, we divide the dimensions into different aggregation groups 
– For cube with 30 dimensions, if we divide these dimensions into 3 group, the cuboid number will 
reduce from 1 Billion to 3 Thousands 
– Tradeoff between online aggregation and offline pre-aggregation 
31
Cube Optimization – Partial Cube 
32
Cube Optimization – Incremental Build 
33
Cube Optimization – TopN Query on High Cardinality Dimension 
34 
• Bitmap inverted index 
• Separate high cardinality 
dimension from low cardinality 
dimension 
• Time range partition 
• In-memory (block cache) 
• Parallel scan (endpoint 
coprocessor)
Kylin Resources 
• Web Site 
http://kylin.io 
• Google Groups 
https://groups.google.com/forum/#!forum/kylin-olap 
• Source Code 
https://github.com/KylinOLAP/Kylin 
35
Q & A 
36

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecApache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecYang Li
 
Kylin Engineering Principles
Kylin Engineering PrinciplesKylin Engineering Principles
Kylin Engineering PrinciplesXu Jiang
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for HadoopHBaseCon
 
Apache Kylin Streaming
Apache Kylin Streaming Apache Kylin Streaming
Apache Kylin Streaming hongbin ma
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseYang Li
 
Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseHBaseCon
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine TourLuke Han
 
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015Debashis Saha
 
Apache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesApache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesYang Li
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Seshu Adunuthula
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
Design cube in Apache Kylin
Design cube in Apache KylinDesign cube in Apache Kylin
Design cube in Apache KylinYang Li
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopDataWorks Summit
 
Apache kylin (china hadoop summit 2015 shanghai)
Apache kylin (china hadoop summit 2015 shanghai)Apache kylin (china hadoop summit 2015 shanghai)
Apache kylin (china hadoop summit 2015 shanghai)qhzhou
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingLuke Han
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeDataWorks Summit
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanLuke Han
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupLuke Han
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache DrillDataWorks Summit
 

Was ist angesagt? (20)

Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecApache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
 
Kylin Engineering Principles
Kylin Engineering PrinciplesKylin Engineering Principles
Kylin Engineering Principles
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
 
The Evolution of Apache Kylin
The Evolution of Apache KylinThe Evolution of Apache Kylin
The Evolution of Apache Kylin
 
Apache Kylin Streaming
Apache Kylin Streaming Apache Kylin Streaming
Apache Kylin Streaming
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
 
Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBase
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine Tour
 
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015
 
Apache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesApache Kylin 1.5 Updates
Apache Kylin 1.5 Updates
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
Design cube in Apache Kylin
Design cube in Apache KylinDesign cube in Apache Kylin
Design cube in Apache Kylin
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
 
Apache kylin (china hadoop summit 2015 shanghai)
Apache kylin (china hadoop summit 2015 shanghai)Apache kylin (china hadoop summit 2015 shanghai)
Apache kylin (china hadoop summit 2015 shanghai)
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 Beijing
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke Han
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark Meetup
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 

Andere mochten auch

Low Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseLow Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseDataWorks Summit
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?DataWorks Summit
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidTony Ng
 
Cloud for Developers: Azure vs. Google App Engine vs. Amazon vs. AppHarbor
Cloud for Developers: Azure vs. Google App Engine vs. Amazon vs. AppHarborCloud for Developers: Azure vs. Google App Engine vs. Amazon vs. AppHarbor
Cloud for Developers: Azure vs. Google App Engine vs. Amazon vs. AppHarborSvetlin Nakov
 
Combustion in diesel engine
Combustion in diesel engineCombustion in diesel engine
Combustion in diesel engineAmanpreet Singh
 
Search Engine Optimization (SEO)
Search Engine Optimization (SEO)Search Engine Optimization (SEO)
Search Engine Optimization (SEO)Dennis Deacon
 
Periodic Table of SEO Success Factors & Guide to SEO by SearchEngineLand
Periodic Table of SEO Success Factors & Guide to SEO by SearchEngineLandPeriodic Table of SEO Success Factors & Guide to SEO by SearchEngineLand
Periodic Table of SEO Success Factors & Guide to SEO by SearchEngineLandSearch Engine Land
 
Internal Combustion Engines - Construction and Working (All you need to know,...
Internal Combustion Engines - Construction and Working (All you need to know,...Internal Combustion Engines - Construction and Working (All you need to know,...
Internal Combustion Engines - Construction and Working (All you need to know,...Mihir Pai
 
Tk03 Google App Engine Fr
Tk03 Google App Engine FrTk03 Google App Engine Fr
Tk03 Google App Engine FrValtech
 
ppt on 2 stroke and 4 stroke petrol engine
ppt on 2 stroke and 4 stroke petrol engineppt on 2 stroke and 4 stroke petrol engine
ppt on 2 stroke and 4 stroke petrol engineharshid panchal
 
DNUG2015 Frühjahrskonferenz: Brücken bauen, Grenzen überwinden: Domino im Dia...
DNUG2015 Frühjahrskonferenz: Brücken bauen, Grenzen überwinden: Domino im Dia...DNUG2015 Frühjahrskonferenz: Brücken bauen, Grenzen überwinden: Domino im Dia...
DNUG2015 Frühjahrskonferenz: Brücken bauen, Grenzen überwinden: Domino im Dia...JRibbeck
 
I.C.ENGINE PPT
I.C.ENGINE PPTI.C.ENGINE PPT
I.C.ENGINE PPT8695
 
CAP 4: SEO - Optimizacion de Contenido
CAP 4: SEO - Optimizacion de ContenidoCAP 4: SEO - Optimizacion de Contenido
CAP 4: SEO - Optimizacion de ContenidoGary Briceño
 
App engine
App engineApp engine
App engineThirdWay
 
CAP 3: SEO - Keywords Research
CAP 3: SEO - Keywords ResearchCAP 3: SEO - Keywords Research
CAP 3: SEO - Keywords ResearchGary Briceño
 
Desarrollando para Nmap Scripting Engine (NSE) [GuadalajaraCON 2013]
Desarrollando para Nmap Scripting Engine (NSE) [GuadalajaraCON 2013]Desarrollando para Nmap Scripting Engine (NSE) [GuadalajaraCON 2013]
Desarrollando para Nmap Scripting Engine (NSE) [GuadalajaraCON 2013]Websec México, S.C.
 
Presentacion introduccion ibm file net p8 v10
Presentacion introduccion ibm file net p8 v10Presentacion introduccion ibm file net p8 v10
Presentacion introduccion ibm file net p8 v10Javier Laguens Garcia
 

Andere mochten auch (20)

Low Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseLow Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBase
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
 
Cloud for Developers: Azure vs. Google App Engine vs. Amazon vs. AppHarbor
Cloud for Developers: Azure vs. Google App Engine vs. Amazon vs. AppHarborCloud for Developers: Azure vs. Google App Engine vs. Amazon vs. AppHarbor
Cloud for Developers: Azure vs. Google App Engine vs. Amazon vs. AppHarbor
 
Combustion in diesel engine
Combustion in diesel engineCombustion in diesel engine
Combustion in diesel engine
 
RoomCloud Booking Engine
RoomCloud Booking EngineRoomCloud Booking Engine
RoomCloud Booking Engine
 
Search Engine Optimization (SEO)
Search Engine Optimization (SEO)Search Engine Optimization (SEO)
Search Engine Optimization (SEO)
 
Periodic Table of SEO Success Factors & Guide to SEO by SearchEngineLand
Periodic Table of SEO Success Factors & Guide to SEO by SearchEngineLandPeriodic Table of SEO Success Factors & Guide to SEO by SearchEngineLand
Periodic Table of SEO Success Factors & Guide to SEO by SearchEngineLand
 
Ic engine
Ic engineIc engine
Ic engine
 
Internal Combustion Engines - Construction and Working (All you need to know,...
Internal Combustion Engines - Construction and Working (All you need to know,...Internal Combustion Engines - Construction and Working (All you need to know,...
Internal Combustion Engines - Construction and Working (All you need to know,...
 
Tk03 Google App Engine Fr
Tk03 Google App Engine FrTk03 Google App Engine Fr
Tk03 Google App Engine Fr
 
ppt on 2 stroke and 4 stroke petrol engine
ppt on 2 stroke and 4 stroke petrol engineppt on 2 stroke and 4 stroke petrol engine
ppt on 2 stroke and 4 stroke petrol engine
 
DNUG2015 Frühjahrskonferenz: Brücken bauen, Grenzen überwinden: Domino im Dia...
DNUG2015 Frühjahrskonferenz: Brücken bauen, Grenzen überwinden: Domino im Dia...DNUG2015 Frühjahrskonferenz: Brücken bauen, Grenzen überwinden: Domino im Dia...
DNUG2015 Frühjahrskonferenz: Brücken bauen, Grenzen überwinden: Domino im Dia...
 
I.C.ENGINE PPT
I.C.ENGINE PPTI.C.ENGINE PPT
I.C.ENGINE PPT
 
CAP 4: SEO - Optimizacion de Contenido
CAP 4: SEO - Optimizacion de ContenidoCAP 4: SEO - Optimizacion de Contenido
CAP 4: SEO - Optimizacion de Contenido
 
App engine
App engineApp engine
App engine
 
CAP 3: SEO - Keywords Research
CAP 3: SEO - Keywords ResearchCAP 3: SEO - Keywords Research
CAP 3: SEO - Keywords Research
 
El SEOy la geolocalización
El SEOy la geolocalizaciónEl SEOy la geolocalización
El SEOy la geolocalización
 
Desarrollando para Nmap Scripting Engine (NSE) [GuadalajaraCON 2013]
Desarrollando para Nmap Scripting Engine (NSE) [GuadalajaraCON 2013]Desarrollando para Nmap Scripting Engine (NSE) [GuadalajaraCON 2013]
Desarrollando para Nmap Scripting Engine (NSE) [GuadalajaraCON 2013]
 
Presentacion introduccion ibm file net p8 v10
Presentacion introduccion ibm file net p8 v10Presentacion introduccion ibm file net p8 v10
Presentacion introduccion ibm file net p8 v10
 

Ähnlich wie Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive

ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartEvans Ye
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinTyler Wishnoff
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)Stratebi
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeDatabricks
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015Luke Han
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin IntroductionLuke Han
 
ScaleDB Technical Presentation
ScaleDB Technical PresentationScaleDB Technical Presentation
ScaleDB Technical PresentationIvan Zoratti
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Lucas Jellema
 
Grill at HadoopSummit
Grill at HadoopSummitGrill at HadoopSummit
Grill at HadoopSummitamarsri
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersLucidworks
 
Designing, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDesigning, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDenny Lee
 
Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)Nicolas Poggi
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
How we switched to columnar at SpendHQ
How we switched to columnar at SpendHQHow we switched to columnar at SpendHQ
How we switched to columnar at SpendHQMariaDB plc
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Data Con LA
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoopnvvrajesh
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionDmitry Anoshin
 

Ähnlich wie Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive (20)

ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin Introduction
 
ScaleDB Technical Presentation
ScaleDB Technical PresentationScaleDB Technical Presentation
ScaleDB Technical Presentation
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Oow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BIOow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BI
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
 
Grill at HadoopSummit
Grill at HadoopSummitGrill at HadoopSummit
Grill at HadoopSummit
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
 
Designing, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDesigning, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons Learned
 
Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
How we switched to columnar at SpendHQ
How we switched to columnar at SpendHQHow we switched to columnar at SpendHQ
How we switched to columnar at SpendHQ
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Cheetah:Data Warehouse on Top of MapReduce
Cheetah:Data Warehouse on Top of MapReduceCheetah:Data Warehouse on Top of MapReduce
Cheetah:Data Warehouse on Top of MapReduce
 

Kürzlich hochgeladen

Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxmavinoikein
 
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRachelAnnTenibroAmaz
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringSebastiano Panichella
 
Chizaram's Women Tech Makers Deck. .pptx
Chizaram's Women Tech Makers Deck.  .pptxChizaram's Women Tech Makers Deck.  .pptx
Chizaram's Women Tech Makers Deck. .pptxogubuikealex
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power
 
Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Escort Service
 
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...Henrik Hanke
 
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.comSaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.comsaastr
 
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.KathleenAnnCordero2
 
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC  - NANOTECHNOLOGYPHYSICS PROJECT BY MSC  - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC - NANOTECHNOLOGYpruthirajnayak525
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxJohnree4
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSebastiano Panichella
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationNathan Young
 
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxEngaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxAsifArshad8
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxaryanv1753
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸mathanramanathan2005
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEMCharmi13
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSebastiano Panichella
 
Early Modern Spain. All about this period
Early Modern Spain. All about this periodEarly Modern Spain. All about this period
Early Modern Spain. All about this periodSaraIsabelJimenez
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxCarrieButtitta
 

Kürzlich hochgeladen (20)

Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptx
 
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATIONRACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
RACHEL-ANN M. TENIBRO PRODUCT RESEARCH PRESENTATION
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
Chizaram's Women Tech Makers Deck. .pptx
Chizaram's Women Tech Makers Deck.  .pptxChizaram's Women Tech Makers Deck.  .pptx
Chizaram's Women Tech Makers Deck. .pptx
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
 
Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170Call Girls In Aerocity 🤳 Call Us +919599264170
Call Girls In Aerocity 🤳 Call Us +919599264170
 
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
DGT @ CTAC 2024 Valencia: Most crucial invest to digitalisation_Sven Zoelle_v...
 
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.comSaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
SaaStr Workshop Wednesday w/ Kyle Norton, Owner.com
 
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
PAG-UNLAD NG EKONOMIYA na dapat isaalang alang sa pag-aaral.
 
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC  - NANOTECHNOLOGYPHYSICS PROJECT BY MSC  - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptx
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism Presentation
 
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptxEngaging Eid Ul Fitr Presentation for Kindergartners.pptx
Engaging Eid Ul Fitr Presentation for Kindergartners.pptx
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptx
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸
 
Quality by design.. ppt for RA (1ST SEM
Quality by design.. ppt for  RA (1ST SEMQuality by design.. ppt for  RA (1ST SEM
Quality by design.. ppt for RA (1ST SEM
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
Early Modern Spain. All about this period
Early Modern Spain. All about this periodEarly Modern Spain. All about this period
Early Modern Spain. All about this period
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptx
 

Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive

  • 1. 1 Kylin: Hadoop OLAP Engine - Tech Deep Dive Jiang Xu, Architect of Kylin October 2014
  • 2. Kylin Overview Kylin - n. (in Chinese art) a mythical animal of composite form 2 http://kylin.io Kylin is an open source Distributed Analytics Engine from eBay Inc. that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets
  • 3. What Is Kylin? • Extremely Fast OLAP Engine at Scale Kylin is designed to reduce query latency on Hadoop for 10+ billions of rows of data • ANSI-SQL Interface on Hadoop Kylin offers ANSI-SQL on Hadoop and supports most ANSI-SQL query functions • Interactive Query Capability Users can interact with Hadoop data via Kylin at sub-second latency, better than Hive queries for the same dataset • MOLAP Cube User can define a data model and pre-build in Kylin with more than 10+ billions of raw data records • Seamless Integration with BI Tools Kylin currently offers integration capability with BI Tools like Tableau. Integration with Microstrategy and Excel is coming soon 3
  • 4. What Is Kylin? - Other Highlights • Job Management and Monitoring • Compression and Encoding Support • Incremental Refresh of Cubes • Leverage HBase Coprocessor for query latency • Approximate Query Capability for distinct Count (HyperLogLog) • Easy Web interface to manage, build, monitor and query cubes • Security capability to set ACL at Cube/Project Level • Support LDAP Integration 4
  • 5. Glance of SQL-on-Hadoop Ecosystem … • SQL translated to MapReduce jobs – Hive – Stinger without Tez • SQL processed by a MPP Engine – Impala – Drill – Presto – Spark + Shark • SQL process by a existing SQL Engine + HDFS – EMC Greenplum (postgres) – Taobao Garude (mysql) • OLAP on Hadoop in other Companies – Adobe: HBase Cube – LinkedIn: Avatara – Salesforce.com: Phoenix 5
  • 6. Why Do We Build Kylin? • Why existing SQL-on-Hadoop solutions fall short? The existing SQL-on-Hadoop needs to scan partial or whole data set to answer a user query. Moreover, table join may trigger the data transfer across host. Due to large scan range and network traffic latency, many queries are very slow (minute+ latency). • What is MOLAP/ROLAP? – MOLAP (Multi-dimensional OLAP) is to pre-compute data along different dimensions of interest and store resultant values in the cube. MOLAP is much faster but is inflexible. Kylin is more like MOLAP. – ROLAP (Relational-OLAP) is to use star or snow-flake schema to do runtime aggregation. ROLAP is flexible but much slower. All existing SQL-on-Hadoop is kind of ROLAP. • How does Kylin support ROLAP/MOLAP? Kylin builds data cube (MOLAP) from hive table (ROLAP) according to the metadata definition. – If the query can be fulfilled by data cube, Kylin will route the query to data cube that is MOLAP. – If the query can’t be fulfilled by data cube, Kylin will route the query to hive table that is ROLAP. – Basically, you can think Kylin as HOLAP on top of MOLAP and ROLAP. 6
  • 8. Analytics Query Taxonomy 8 High Level Aggregation • Very High Level, e.g GMV by site by vertical by weeks Analysis Query • Middle level, e.g GMV by site by vertical, by category (level x) past 12 weeks Drill Down to Detail • Detail Level (SSA Table) Low Level Aggregation • Seller ID Transaction Level • Transaction ID 80+% Analytics Kylin is designed to accelerate Analytics queries!
  • 9. Query Performance -- Compare to Hive 9 200 100 0 SQL #1 SQL #2 SQL #3 # Query Type Return Dataset Query On Kylin (s) Query On Hive (s) Hive Kylin Comments 1 High Level Aggregation 4 0.129 157.437 1,217 times 2 Analysis Query 22,669 1.615 109.206 68 times 3 Drill Down to Detail 325,029 12.058 113.123 9 times 4 Drill Down to Detail 524,780 22.42 6383.21 278 times 5 Data Dump 972,002 49.054 N/A
  • 10. Query Performance – Latency & Throughput 10
  • 12. Component Design 12 Query Engine (Calcite) Metadata Manager JDBC Driver Storage Engine Data Cube (HBase) Job Engine (MapReduce) Star Schema (Hive) SQL RESTful Server Dictionary & Cube JDBC Driver ODBC Driver HBase Coprossor
  • 13. Background – Cube & Cuboid 13 • Cuboid = one combination of dimensions • Cube = all combination of dimensions (all cuboids) time, item time item location supplier time, item, location item, location time, location, supplier time, item, location, supplier time, location Time, supplier item, supplier location, supplier time, item, supplier item, location, supplier 0-D(apex) cuboid 1-D cuboids 2-D cuboids 3-D cuboids 4-D(base) cuboid
  • 14. Metadata- Overview 14 Cube: … Fact Table: … Dimensions: … Measures: … Row Key: … HBase Mapping: … Dim Fact Dim Dim Source Star Schema row A row B row C Val 1 Val 2 Val 3 Column Family Row Key Column Target HBase Storage Cube Metadata
  • 15. Metadata – Dimension • Dimension – Normal – Mandatory – Hierarchy – Derived 15
  • 16. Metadata – Measure • Measure – Sum – Count – Max – Min – Average – Distinct Count (based on HyperLogLog) 16
  • 17. Query Engine - Overview • Query engine is based on Calcite • Query Execution Plan • Optiq Plug-ins in Query Engine 17
  • 18. Query Engine – Calcite • Calcite (http://incubator.apache.org/projects/calcite.html) is an extensible open source SQL engine that is also used in Stinger/Drill/Cascading. 18
  • 19. Query Engine – Explain Plan 19 SELECT test_cal_dt.week_beg_dt, test_category.lv1_categ, test_category.lv2_categ, test_category.lv3_categ, test_kylin_fact.format_name, test_sites.site_name, sum(test_kylin_fact.price) as total_price, count(*) as total_count FROM test_kylin_fact LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt LEFT JOIN test_category_groupings ON test_kylin_fact.leaf_categ_id = test_category_groupings.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category_groupings.site_id LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id WHERE test_kylin_fact.seller_id = 123456 OR test_kylin_fact.lstg_format_name = ’New' GROUP BY test_cal_dt.week_beg_dt, test_category.lv1_categ, test_category.lv2_categ, test_category.lv3_categ, test_kylin_fact.format_name, test_sites.site_name OLAPToEnumerableConverter OLAPProjectRel(WEEK_BEG_DT=[$0], LV1_CATEG=[$1], LVL2_CATEG=[$2], LVL3_CATEG=[$3], FORMAT_NAME=[$4], SITE_NAME=[$5], TOTAL_PRICE=[CASE(=($7, 0), null, $6)], TOTAL_COUNT=[$8]) OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()]) OLAPProjectRel(WEEK_BEG_DT=[$13], LV1_CATEG=[$21], LVL2_CATEG=[$15], LVL3_CATEG=[$14], FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0]) OLAPFilterRel(condition=[OR(=($3, 123456), =($5, ‘New'))]) OLAPJoinRel(condition=[=($2, $25)], joinType=[left]) OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left]) OLAPJoinRel(condition=[=($4, $12)], joinType=[left]) OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]]) OLAPTableScan(table=[[DEFAULT, TEST_CATEGORY_GROUPINGS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]]) OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
  • 20. Query Engine – Calcite Plugin-ins • Metadata SPI – Provide table schema from kylin metadata • Optimize Rule – Translate the logic operator into kylin operator • Relational Operator – Find right cube – Translate SQL into storage engine api call – Generate physical execute plan by linq4j java implementation • Result Enumerator – Translate storage engine result into java implementation result. • SQL Function – Add HyperLogLog for distinct count – Implement date time related functions (i.e. Quarter) 20
  • 21. Storage Engine - Overview • Provide cube query for query engine – Common iterator interface for storage engine – Isolate query engine from underline storage • HBase Storage – HBase schema – Pre-join & pre-aggregation – Dictionary – HBase coprocessor 21
  • 22. Storage Engine – HBase Schema 22
  • 23. Storage Engine – Pre-join & pre-aggregation • Pre-join (Hive) – Kylin will generate pre-join HQL based on metadata that will join the fact table with dimension tables – Kylin will use Hive to execute the pre-join HQL that will generate the pre-joined flat table • Pre-aggregation (MapReduce) – Kylin will generate a minimum spanning tree of cuboid from cube lattice graph. – Kylin will generate MapReduce job base on the minimum spanning tree – MapReduce job will do pre-aggregation to generate all cuboids from N dimensions to 1 dimension. • MOLAP Cube (HBase) – The pre-join & pre-aggregation results (i.e. MOLAP Cube) is stored in HBase 23
  • 24. Storage Engine – Dictionary • Dictionary maps dimension values into IDs that will reduce the memory and storage footprint. • Metadata can define whether one dimension use “dictionary” or not • Dictionary is based on Trie 24
  • 25. Storage Engine – HBase Coprocessor • HBase coprocessor can reduce network traffic and parallelize scan logic • HBase client – Serialize the filter + dimension + metrics into bytes – Send the encoded bytes to corprocessor • HBase Corprocessor – Deserialize the bytes to filter + dimension + metrics – Iterate all rows from each scan range • Filter unmatched rows • Aggregate the matched rows by dimensions in an cache • Send back the aggregated rows from cache 25
  • 26. Cube Build - Overview • Multi Staged Build • Map Reduce Job Flow • Cube Build Steps 26
  • 27. Cube Build – Multi Staged Build 27
  • 28. Cube Build – Map Reduce Job Flow 28
  • 29. Cube Build – Steps • Build dictionary from dimension tables on local disk. And copy dictionary to HDFS • Run Hive query to build a joined flatten table • Run map reduce job to build cuboids in HDFS sequence files from tier 1 to tier N • Calculate the key distribution of HDFS sequence files. And evenly split the key space into K regions. • • Translate HDFS sequence files into HBase Hfile • Bulk load the HFile into HBase 29
  • 30. Cube Optimization - Overview • “Curse of dimensionality”: N dimension cube has 2N cuboid – Full Cube vs. Partial Cube • Hugh data volume – Incremental Build • Slow Table Scan – TopN Query on High Cardinality Dimension – Bitmap inverted index – Time range partition – In-memory parallel scan: block cache + endpoint coprocessor 30
  • 31. Cube Optimization – Full Cube vs. Partial Cube • Full Cube – Pre-aggregate all dimension combinations – “Curse of dimensionality”: N dimension cube has 2N cuboid. • Partial Cube – To avoid dimension explosion, we divide the dimensions into different aggregation groups – For cube with 30 dimensions, if we divide these dimensions into 3 group, the cuboid number will reduce from 1 Billion to 3 Thousands – Tradeoff between online aggregation and offline pre-aggregation 31
  • 32. Cube Optimization – Partial Cube 32
  • 33. Cube Optimization – Incremental Build 33
  • 34. Cube Optimization – TopN Query on High Cardinality Dimension 34 • Bitmap inverted index • Separate high cardinality dimension from low cardinality dimension • Time range partition • In-memory (block cache) • Parallel scan (endpoint coprocessor)
  • 35. Kylin Resources • Web Site http://kylin.io • Google Groups https://groups.google.com/forum/#!forum/kylin-olap • Source Code https://github.com/KylinOLAP/Kylin 35
  • 36. Q & A 36