Accelerating OLAP Analytics in the Cloud with Alluxio
Shaofeng Shi
Senior Architect, Kyligence
shaofeng.shi@kyligence.io
Confidential, all rights reserved ©Kyligence Inc.
http://kyligence.io
Agenda
• Apache Kylin and Kyligence Inc.
• Kyligence Analytics Platform
• KAP in the Cloud
• Alluxio + KAP
• Summary
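
Apache Kylin: A Leading Big Data Analytics Technology (OLAP on Hadoop)
The world's largest open source software foundation
• Top-level project
  • Apache Kylin is the first Apache top-level open source project from China; its core developers and contributors are all based in China
• Industry recognition
  • Won InfoWorld's "Best Open Source Big Data Tools" award two years in a row, alongside Spark and TensorFlow
• User adoption
  • More than 500 leading companies worldwide use Kylin as their big data analytics platform
"Working with the Apache Kylin team to bring Kylin through incubation to a top-level project has been very exciting for me. Kylin is certainly exciting technically, but just as exciting is that Kylin represents the growing participation of Asian countries, especially China, in the open source community."
Ted Dunning, Vice President, Apache Incubator
• Ecosystem and community
  • An active community with many users and developers, and a broad network of open source and commercial partners
• Technical strengths
  • Pre-computation, parallel computing, and columnar storage together deliver a real-time analytics platform that combines massive data volumes, high concurrency, and sub-second response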
Chart: SQL query latency, SparkSQL (44) vs. Kylin (0.32); a financial institution, 690 million records spanning 15 years of data, query for top users
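
Kyligence = Kylin + Intelligence
Building a leading global open source community
• Enterprise products
• Professional services
• Management and automation
• Cloud computing
• Industry solutions
Founded by the original Apache Kylin team
  • Accounts for 50% of the Apache Kylin PMC
  • Contributed more than 90% of the Kylin source code
Enterprise products built around Kylin
  • KAP: enterprise OLAP platform
  • Kyligence Cloud: cloud computing + big data + intelligent operations
Comprehensive professional services from the original vendor
  • Product support & certification training
  • Platform implementation & architecture consulting
  • Silicon Valley, Shanghai & global service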
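
About Us
• 2014.11: Joined the Apache Incubator; Apache Kylin officially open-sourced
• 2015.11: Graduated to an Apache top-level project
• 2016.3: Kyligence founded; received several million in angel investment from Redpoint Ventures
• 2016.8: Released Kyligence Analytics Platform, an enterprise-grade intelligent big data solution
• 2016.9: Won InfoWorld's Best Open Source Big Data Tools award for the second time
• 2017.4: Completed Series A financing (USD 8 million), led by CBC Capital and Shunwei Capital, with Redpoint China participating
• 2017.5: Kyligence US subsidiary established
• 2017.8: Kyligence became an AWS Technical Partner
• 2017.9: Kyligence Robot released, supporting online intelligent optimization for Apache Kylin
• 2017.12: Kyligence Cloud released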
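
Kyligence Analytics Platform

Kyligence Analytics Platform: Bridging Users and Big Data
• High performance: sub-second query latency that meets the timeliness requirements of interactive analysis; highly optimized for mission-critical scenarios
• High concurrency: scales linearly to meet the surging demand for data analysis in the big data era; supports internet-scale online services
• Easy to use: standard SQL access lowers the technical barrier and hides complex technical interfaces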

Kyligence Analytics Platform (KAP) Architecture
Kyligence Robot: online self-service platform that provides DevOps with
  • System monitoring
  • Cube optimization
  • SQL tuning
Kyligence Analytics Platform
Legend: Kylin/Open Source, KAP/Commercial, Online Service
• Apache Kylin: Open Source OLAP on Hadoop
• KyAnalyzer: Agile BI
• KyStudio: Data Model Designer
• KyManager: Administrator Tool
• KyStorage: Columnar Storage
• Security: Cell Level ACL
• On Demand Deployment: On-Premises, On-Hybrid, On-Cloud
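
Apache Kylin to KAP
• Emerging Hadoop technology
  • Distributed computing framework
  • Scale-out architecture
  • Poor SQL query performance
• Traditional DW products
  • Classic OLAP theory
  • Scale-up architecture
  • Limited Cube capacity, performance, and concurrency
• Apache Kylin = OLAP pre-computation + Hadoop computing framework
• KAP = Kylin + flexible queries + detail queries + intelligent optimization + enterprise-grade security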
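
Trading Space for Time: Cube Fundamentals
Data cube: a multidimensional analysis technique. Results are pre-computed and stored in the space mapped by combinations of dimension values; at query time, answers are obtained quickly by further processing of the Cube.
Dimensional model: a data modeling approach used in data warehousing that organizes data into fact tables and dimension tables; the star schema is the most widely used form.
Aggregation, classification, and sorting are done in advance.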
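
The pre-computation idea on this slide can be sketched in a few lines of Python. This is a toy illustration only, not how Kylin or KAP is implemented: the rows, the dimension names, and the helpers `build_cube` and `query` are all hypothetical. It aggregates a measure for every subset of dimensions (every cuboid) ahead of time, so a query becomes a dictionary lookup instead of a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales).
ROWS = [
    ("east", "phone", 2016, 10),
    ("east", "laptop", 2016, 20),
    ("west", "phone", 2016, 5),
    ("west", "phone", 2017, 7),
    ("east", "phone", 2017, 3),
]
DIMENSIONS = ("region", "product", "year")


def build_cube(rows, dimensions):
    """Pre-compute SUM(sales) for every subset of dimensions (every cuboid)."""
    cube = {}
    indexes = range(len(dimensions))
    for size in range(len(dimensions) + 1):
        for dims in combinations(indexes, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # the last column is the measure
            cube[tuple(dimensions[i] for i in dims)] = dict(cuboid)
    return cube


def query(cube, **filters):
    """Answer SUM(sales) under equality filters with one cuboid lookup."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0)


CUBE = build_cube(ROWS, DIMENSIONS)
print(query(CUBE, region="east"))               # 33
print(query(CUBE, product="phone", year=2017))  # 10
print(query(CUBE))                              # 45
```

With three dimensions the sketch stores 2^3 = 8 cuboids, which is the space-for-time trade-off named in the slide title: storage grows with the number of dimension combinations, while query cost stays close to constant.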

Cube pre-computation is the core technical idea behind KAP

Leveraging Hadoop's Powerful Parallel Computing
Kyligence Analytics Platform
KyAnalyzer, BI Tools, Web App…
ANSI SQL
KyStorage
MapReduce/Spark/Streaming…

Pre-computation keeps query performance stable

KAP: Ultra-High Performance, Ultra-High Concurrency
On standard benchmark datasets, KAP delivers sub-second query responses, more than 100x faster than Hive

Typical Kylin/KAP Users Worldwide
Internet
• eBay
• Yahoo! Japan
• Baidu Maps
• Meituan-Dianping
• NetEase
• Expedia
• JD.com
• Vipshop
• 360
• Toutiao
Finance
• China Pacific Insurance
• Citibank
• UnionPay
• Huatai Securities
• Guotai Securities
• Lufax
• JPMorgan
Telecom
• China Mobile
• China Telecom
• China Unicom
• AT&T
Manufacturing
• SAIC Motor
• Huawei
• Lenovo
• OPPO
• Xiaomi
• VIVO
Others
• MachineZone
• Inovex
• Glispa
• Adobe
• iFLYTEK
Data sourced from public channels and the Kylin community

Kyligence in the Cloud

Kyligence is a partner of Azure and AWS

KAP has landed on Azure
KAP has been onboarded to Azure Global and Mooncake (Azure China)

Kyligence Cloud: One-Click PaaS Deployment with Multi-Cloud Support
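
Kyligence Cloud: Solving the Hard Problems of Big Data in the Cloud
• One-click deployment: deploy KAP and Hadoop within minutes
• Elastic scaling: scale compute resources dynamically based on actual usage, for high scalability
• Cloud native: S3 as storage, auto scaling, deployment from CloudFormation templates
• Cost savings: read/write separation and on-demand start/stop effectively reduce operating costs
• Seamless BI integration: from Hadoop to KAP to BI tools, an end-to-end solution on the AWS cloud
• Easy operations: a fully managed site simplifies operations so you can focus on your business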
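
Challenges for Hadoop & KAP in the Cloud
• Local disks become volatile; HDFS is no longer reliable storage suited to Hadoop/Spark in the cloud
  • When a VM is deleted, its data is erased with it, leaving HDFS with missing blocks
  • Local disk is expensive
• An architecture that separates compute from storage
  • AWS S3, Azure Blob Storage, and similar services are more reliable and cheaper, and suit big data scenarios
  • Separating compute from storage makes the architecture truly scalable; AWS EMR and Azure HDInsight support S3 and WASB as the Hadoop file system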
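
Challenges for Hadoop & KAP in the Cloud (cont.)
• S3 and Blob Storage differ substantially from HDFS
  • Performance depends heavily on network bandwidth
  • Eventual consistency
  • Metadata operations are slow
• KAP's approach in the cloud
  • Interim approach: use HDFS for computation and S3 for backup
  • Better approach: a transparent, fast caching layer on top of S3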

KAP + Alluxio

Alluxio

Highlights of Alluxio
• Memory-speed virtual distributed storage system
• Spark/MapReduce can run over Alluxio just like any other file system
• Supports most cloud storage services, such as S3, GCS, and WASB

KAP + Alluxio Architecture
• Alluxio mounts the S3 bucket as its under file system
• KAP uses Alluxio as its file system in place of S3
• Transparent to applications, with almost no code changes
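
The layering in this architecture can be sketched as a read-through cache in front of a slow store. The Python below is a simplified stand-in, not Alluxio's API: `SlowObjectStore` and `ReadThroughCache` are hypothetical classes, and the path and latency are made-up values. It shows why the first read pays the remote cost while repeated reads are served from memory, and why the caller's code does not change.

```python
import time


class SlowObjectStore:
    """Stand-in for a remote object store such as S3 (hypothetical, in-memory)."""

    def __init__(self, objects, latency_seconds=0.01):
        self._objects = dict(objects)
        self._latency = latency_seconds
        self.reads = 0

    def read(self, path):
        time.sleep(self._latency)  # simulate a network round trip
        self.reads += 1
        return self._objects[path]


class ReadThroughCache:
    """Serves reads from memory, falling back to the under store on a miss."""

    def __init__(self, under_store):
        self._under = under_store
        self._cache = {}

    def read(self, path):
        if path not in self._cache:
            self._cache[path] = self._under.read(path)
        return self._cache[path]


store = SlowObjectStore({"/kylin/cube/part-0": b"columnar-bytes"})
fs = ReadThroughCache(store)

first = fs.read("/kylin/cube/part-0")   # miss: goes to the under store
second = fs.read("/kylin/cube/part-0")  # hit: served from memory
print(first == second, store.reads)
```

Because both classes expose the same `read(path)` method, the caller can swap one for the other without code changes, which mirrors the transparency point on the slide.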

Deploying Alluxio with EMR
• Deploy the Alluxio master on the EMR master node and start Alluxio workers on the core nodes; install through a bootstrap action
• https://github.com/shaofengshi/emr-bootstrap-alluxio

Configuring KAP to Use Alluxio
• Configure Alluxio
  • Copy alluxio-core-client-runtime-<version>.jar to KAP's spark directory
  • Copy alluxio-site.properties to spark/conf
• Use S3 as the file system for writes
  • kylin.env.hdfs-working-dir=s3://mybucket/kylin
• Use Alluxio as the file system for reads
  • kylin.storage.columnar.file-system=alluxio://<master-node>:19998/
• The read/write separation switch does not need to be enabled
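
As a small convenience sketch, the two property lines above could be generated per deployment. The helper `render_kap_overrides` and its parameter names are hypothetical; only the two property keys, the example bucket path, and port 19998 come from the slide.

```python
from urllib.parse import urlparse


def render_kap_overrides(s3_working_dir, alluxio_master_host, alluxio_port=19998):
    """Return the two property lines from the slide for a given deployment.

    Hypothetical helper: only the property keys and the port 19998
    are taken from the slide.
    """
    if urlparse(s3_working_dir).scheme != "s3":
        raise ValueError("working dir must be an s3:// URI")
    read_fs = f"alluxio://{alluxio_master_host}:{alluxio_port}/"
    return [
        f"kylin.env.hdfs-working-dir={s3_working_dir}",       # writes go to S3
        f"kylin.storage.columnar.file-system={read_fs}",      # reads go through Alluxio
    ]


for line in render_kap_overrides("s3://mybucket/kylin", "master-node"):
    print(line)
```

Writes still target S3 directly and only reads go through Alluxio, which is the split the slide describes.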
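
Problems Encountered Using Alluxio with KAP
• Performance got worse instead of better
  • With Alluxio deployed on a separate cluster, queries were actually slower
  • Fix: deploy Alluxio on the same cluster as Spark
• New files not found in Alluxio
  • After a new file is written to S3, it cannot be found through Alluxio
  • Fix: run a recursive ls on the parent directory to trigger Alluxio to sync metadata with S3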
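
The second problem and its workaround can be illustrated with a toy model. This is an assumption-laden sketch of the behavior described on the slide, not Alluxio's actual metadata logic: `UnderStore` and `CachedNamespace` are hypothetical classes. A file written straight to the under store is invisible to the cache layer until a recursive listing of a parent directory refreshes the cached metadata.

```python
class UnderStore:
    """Stand-in for S3: a flat set of object paths (hypothetical, in-memory)."""

    def __init__(self):
        self.paths = set()

    def put(self, path):
        self.paths.add(path)

    def list_recursive(self, directory):
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self.paths if p.startswith(prefix))


class CachedNamespace:
    """Toy cache layer that only knows the paths it has already loaded."""

    def __init__(self, under_store):
        self._under = under_store
        self._known = set()

    def exists(self, path):
        # Consults cached metadata only; never asks the under store.
        return path in self._known

    def ls_recursive(self, directory):
        # Listing a directory refreshes cached metadata from the under store.
        found = self._under.list_recursive(directory)
        self._known.update(found)
        return found


s3 = UnderStore()
ns = CachedNamespace(s3)

s3.put("/kylin/cube/segment-1/part-0")  # written directly to S3, bypassing the cache
before = ns.exists("/kylin/cube/segment-1/part-0")
ns.ls_recursive("/kylin/cube")          # the workaround from the slide
after = ns.exists("/kylin/cube/segment-1/part-0")
print(before, after)
```

In this model the lookup fails before the listing and succeeds after it, matching the symptom and the fix reported on the slide.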

Problems Encountered Using Alluxio with KAP (cont.)
• Little documentation for Azure
  • Many configuration and jar conflict issues
  • Fix: use a newer version of the Azure Storage Java library, and use an HDInsight script action to install and uninstall automatically
  • Automated installation script: https://github.com/shaofengshi/hdinsight-scriptaction-alluxio

SSB Benchmark – S3 vs Alluxio
• https://github.com/Kyligence/ssb-kylin
• Raw data: 91 million rows; Cube size: 20 GB

SSB Benchmark – S3 vs Alluxio
• On average, KAP query latency on Alluxio is about one quarter of the latency on S3

User profile – WASB vs local HDFS vs Alluxio
• User behavior data, 200 million rows
• Cube size: 15 GB

User profile – WASB vs local HDFS vs Alluxio
• Alluxio delivers performance close to local HDFS, which is 3 to 4x faster than WASB

Summary
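• Conclusion
  • Alluxio helps KAP transparently accelerate OLAP queries in the cloud, achieving performance close to that of local data
• Next steps
  • Alluxio provides a unified data namespace, helping KAP access and manage different data tiers
  • Use tiered storage to cache larger data volumes
  • Analyze Alluxio data usage to measure and optimize KAP's storage usage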

Try Kyligence Cloud Free for 90 Days
• Kyligence Cloud is a managed Apache Kylin service that offers elastic enterprise OLAP on Hadoop in the cloud.
• Supports Azure and AWS in global and China regions.
• Console: https://cloud.kyligence.io
THANK YOU
Website: http://kyligence.io
Email: info@kyligence.io
Twitter: @Kyligence

More Related Content

What's hot

Apache hadoop and cdh(cloudera distribution) introduction 基本介紹
Apache hadoop and cdh(cloudera distribution) introduction 基本介紹Apache hadoop and cdh(cloudera distribution) introduction 基本介紹
Apache hadoop and cdh(cloudera distribution) introduction 基本介紹Anna Yen
 
OpenStack and Docke Integration V6
OpenStack and Docke Integration V6OpenStack and Docke Integration V6
OpenStack and Docke Integration V6Guangya Liu
 
Cloudera introduction
Cloudera introductionCloudera introduction
Cloudera introductionPhate334
 
架設Hadoop叢集以及mapreduce開發環境
架設Hadoop叢集以及mapreduce開發環境架設Hadoop叢集以及mapreduce開發環境
架設Hadoop叢集以及mapreduce開發環境Phate334
 
FIT2CLOUD:云管理及DevOps协作平台
FIT2CLOUD:云管理及DevOps协作平台FIT2CLOUD:云管理及DevOps协作平台
FIT2CLOUD:云管理及DevOps协作平台Fit2Cloud
 
Introduction to K8S Big Data SIG
Introduction to K8S Big Data SIGIntroduction to K8S Big Data SIG
Introduction to K8S Big Data SIGJazz Yao-Tsung Wang
 
Full Stack Monitoring with Prometheus and Grafana (Updated)
Full Stack Monitoring with Prometheus and Grafana (Updated)Full Stack Monitoring with Prometheus and Grafana (Updated)
Full Stack Monitoring with Prometheus and Grafana (Updated)Jazz Yao-Tsung Wang
 
美团点评技术沙龙14美团云-Docker平台
美团点评技术沙龙14美团云-Docker平台美团点评技术沙龙14美团云-Docker平台
美团点评技术沙龙14美团云-Docker平台美团点评技术团队
 
阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践drewz lin
 
Hybrid Cloud Based on Ceph Object Storage - ShanChun
Hybrid Cloud Based on Ceph Object Storage - ShanChunHybrid Cloud Based on Ceph Object Storage - ShanChun
Hybrid Cloud Based on Ceph Object Storage - ShanChunCeph Community
 
Cloud formation 基礎設施即程式碼和aws資源佈建-workshop
Cloud formation 基礎設施即程式碼和aws資源佈建-workshopCloud formation 基礎設施即程式碼和aws資源佈建-workshop
Cloud formation 基礎設施即程式碼和aws資源佈建-workshopCKmates
 
Bd paa s - big-data platform as a service
Bd paa s - big-data platform as a serviceBd paa s - big-data platform as a service
Bd paa s - big-data platform as a serviceinwin stack
 
Continuous Delivery - Opening
Continuous Delivery - OpeningContinuous Delivery - Opening
Continuous Delivery - OpeningRick Hwang
 
Train.IO 【第六期-OpenStack 二三事】
Train.IO 【第六期-OpenStack 二三事】Train.IO 【第六期-OpenStack 二三事】
Train.IO 【第六期-OpenStack 二三事】inwin stack
 
AWS雲端架構師 培訓&考試課程介紹
AWS雲端架構師 培訓&考試課程介紹AWS雲端架構師 培訓&考試課程介紹
AWS雲端架構師 培訓&考試課程介紹QCloudMentor
 
00.exalogic概览
00.exalogic概览00.exalogic概览
00.exalogic概览Meng He
 
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese)
Establish The Core of  Cloud Computing Application  by Using Hazelcast (Chinese)Establish The Core of  Cloud Computing Application  by Using Hazelcast (Chinese)
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese)Joseph Kuo
 
新一代企業級雲端資料庫系統
新一代企業級雲端資料庫系統新一代企業級雲端資料庫系統
新一代企業級雲端資料庫系統iServDB & iServCloud
 
Comboware ComboStack 202105
Comboware ComboStack 202105Comboware ComboStack 202105
Comboware ComboStack 202105Elroy Peng
 

What's hot (20)

Databases on AWS
Databases on AWSDatabases on AWS
Databases on AWS
 
Apache hadoop and cdh(cloudera distribution) introduction 基本介紹
Apache hadoop and cdh(cloudera distribution) introduction 基本介紹Apache hadoop and cdh(cloudera distribution) introduction 基本介紹
Apache hadoop and cdh(cloudera distribution) introduction 基本介紹
 
OpenStack and Docke Integration V6
OpenStack and Docke Integration V6OpenStack and Docke Integration V6
OpenStack and Docke Integration V6
 
Cloudera introduction
Cloudera introductionCloudera introduction
Cloudera introduction
 
架設Hadoop叢集以及mapreduce開發環境
架設Hadoop叢集以及mapreduce開發環境架設Hadoop叢集以及mapreduce開發環境
架設Hadoop叢集以及mapreduce開發環境
 
FIT2CLOUD:云管理及DevOps协作平台
FIT2CLOUD:云管理及DevOps协作平台FIT2CLOUD:云管理及DevOps协作平台
FIT2CLOUD:云管理及DevOps协作平台
 
Introduction to K8S Big Data SIG
Introduction to K8S Big Data SIGIntroduction to K8S Big Data SIG
Introduction to K8S Big Data SIG
 
Full Stack Monitoring with Prometheus and Grafana (Updated)
Full Stack Monitoring with Prometheus and Grafana (Updated)Full Stack Monitoring with Prometheus and Grafana (Updated)
Full Stack Monitoring with Prometheus and Grafana (Updated)
 
美团点评技术沙龙14美团云-Docker平台
美团点评技术沙龙14美团云-Docker平台美团点评技术沙龙14美团云-Docker平台
美团点评技术沙龙14美团云-Docker平台
 
阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践阿里自研数据库 Ocean base实践
阿里自研数据库 Ocean base实践
 
Hybrid Cloud Based on Ceph Object Storage - ShanChun
Hybrid Cloud Based on Ceph Object Storage - ShanChunHybrid Cloud Based on Ceph Object Storage - ShanChun
Hybrid Cloud Based on Ceph Object Storage - ShanChun
 
Cloud formation 基礎設施即程式碼和aws資源佈建-workshop
Cloud formation 基礎設施即程式碼和aws資源佈建-workshopCloud formation 基礎設施即程式碼和aws資源佈建-workshop
Cloud formation 基礎設施即程式碼和aws資源佈建-workshop
 
Bd paa s - big-data platform as a service
Bd paa s - big-data platform as a serviceBd paa s - big-data platform as a service
Bd paa s - big-data platform as a service
 
Continuous Delivery - Opening
Continuous Delivery - OpeningContinuous Delivery - Opening
Continuous Delivery - Opening
 
Train.IO 【第六期-OpenStack 二三事】
Train.IO 【第六期-OpenStack 二三事】Train.IO 【第六期-OpenStack 二三事】
Train.IO 【第六期-OpenStack 二三事】
 
AWS雲端架構師 培訓&考試課程介紹
AWS雲端架構師 培訓&考試課程介紹AWS雲端架構師 培訓&考試課程介紹
AWS雲端架構師 培訓&考試課程介紹
 
00.exalogic概览
00.exalogic概览00.exalogic概览
00.exalogic概览
 
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese)
Establish The Core of  Cloud Computing Application  by Using Hazelcast (Chinese)Establish The Core of  Cloud Computing Application  by Using Hazelcast (Chinese)
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese)
 
新一代企業級雲端資料庫系統
新一代企業級雲端資料庫系統新一代企業級雲端資料庫系統
新一代企業級雲端資料庫系統
 
Comboware ComboStack 202105
Comboware ComboStack 202105Comboware ComboStack 202105
Comboware ComboStack 202105
 

Similar to Kyligence Leverages Alluxio to Accelerate OLAP in the Cloud

Kubernetes project update and how to contribute
Kubernetes project update and how to contributeKubernetes project update and how to contribute
Kubernetes project update and how to contributeinwin stack
 
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Etu Solution
 
雲端環境的快取策略-Global Azure Bootcamp 2015 臺北場
雲端環境的快取策略-Global Azure Bootcamp 2015 臺北場雲端環境的快取策略-Global Azure Bootcamp 2015 臺北場
雲端環境的快取策略-Global Azure Bootcamp 2015 臺北場twMVC
 
Hyper: 让Pod以VM为边界
Hyper: 让Pod以VM为边界Hyper: 让Pod以VM为边界
Hyper: 让Pod以VM为边界Xu Wang
 
Oracle 全方位云解决方案概要
Oracle 全方位云解决方案概要Oracle 全方位云解决方案概要
Oracle 全方位云解决方案概要Ethan M. Liu
 
数据科学分析协作平台CDSW
数据科学分析协作平台CDSW数据科学分析协作平台CDSW
数据科学分析协作平台CDSWJianwei Li
 
AWS Summit: Strikingly analytics
AWS Summit:  Strikingly analyticsAWS Summit:  Strikingly analytics
AWS Summit: Strikingly analyticsChase Zhang
 
KubeVela:标准化的云原生平台构建引擎
KubeVela:标准化的云原生平台构建引擎KubeVela:标准化的云原生平台构建引擎
KubeVela:标准化的云原生平台构建引擎suncbing1
 
Oracle saa s paas overview
Oracle saa s paas overviewOracle saa s paas overview
Oracle saa s paas overviewChris Lee
 
Oracle雲端服務介紹 taiwan
Oracle雲端服務介紹   taiwanOracle雲端服務介紹   taiwan
Oracle雲端服務介紹 taiwanChieh-An Yu
 
美团技术沙龙04 美团下一代分布式存储系统
美团技术沙龙04   美团下一代分布式存储系统美团技术沙龙04   美团下一代分布式存储系统
美团技术沙龙04 美团下一代分布式存储系统美团点评技术团队
 
MySQL5.6&5.7 Cluster 7.3 Review
MySQL5.6&5.7 Cluster 7.3 ReviewMySQL5.6&5.7 Cluster 7.3 Review
MySQL5.6&5.7 Cluster 7.3 Review郁萍 王
 
Cloudera企业数据中枢平台
Cloudera企业数据中枢平台Cloudera企业数据中枢平台
Cloudera企业数据中枢平台Jianwei Li
 
Raising The MySQL Bar-Manyi Lu
Raising The MySQL Bar-Manyi LuRaising The MySQL Bar-Manyi Lu
Raising The MySQL Bar-Manyi Lu郁萍 王
 
分会场八和Net backup一起进入云备份时代
分会场八和Net backup一起进入云备份时代分会场八和Net backup一起进入云备份时代
分会场八和Net backup一起进入云备份时代ITband
 
Kube-OVN Introduction
Kube-OVN IntroductionKube-OVN Introduction
Kube-OVN Introduction梦馨 刘
 
2015中国软件技术大会-开放云介绍
2015中国软件技术大会-开放云介绍2015中国软件技术大会-开放云介绍
2015中国软件技术大会-开放云介绍Li Jiansheng
 
hicloud PaaS 雲創平台 for java developer
hicloud PaaS 雲創平台 for java developerhicloud PaaS 雲創平台 for java developer
hicloud PaaS 雲創平台 for java developerhicloud-paas
 

Similar to Kyligence Leverages Alluxio to Accelerate OLAP in the Cloud (20)

Kubernetes project update and how to contribute
Kubernetes project update and how to contributeKubernetes project update and how to contribute
Kubernetes project update and how to contribute
 
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
 
雲端環境的快取策略-Global Azure Bootcamp 2015 臺北場
雲端環境的快取策略-Global Azure Bootcamp 2015 臺北場雲端環境的快取策略-Global Azure Bootcamp 2015 臺北場
雲端環境的快取策略-Global Azure Bootcamp 2015 臺北場
 
Hyper: 让Pod以VM为边界
Hyper: 让Pod以VM为边界Hyper: 让Pod以VM为边界
Hyper: 让Pod以VM为边界
 
Oracle 全方位云解决方案概要
Oracle 全方位云解决方案概要Oracle 全方位云解决方案概要
Oracle 全方位云解决方案概要
 
数据科学分析协作平台CDSW
数据科学分析协作平台CDSW数据科学分析协作平台CDSW
数据科学分析协作平台CDSW
 
AWS Summit: Strikingly analytics
AWS Summit:  Strikingly analyticsAWS Summit:  Strikingly analytics
AWS Summit: Strikingly analytics
 
KubeVela:标准化的云原生平台构建引擎
KubeVela:标准化的云原生平台构建引擎KubeVela:标准化的云原生平台构建引擎
KubeVela:标准化的云原生平台构建引擎
 
Oracle saa s paas overview
Oracle saa s paas overviewOracle saa s paas overview
Oracle saa s paas overview
 
Oracle雲端服務介紹 taiwan
Oracle雲端服務介紹   taiwanOracle雲端服務介紹   taiwan
Oracle雲端服務介紹 taiwan
 
美团技术沙龙04 美团下一代分布式存储系统
美团技术沙龙04   美团下一代分布式存储系统美团技术沙龙04   美团下一代分布式存储系统
美团技术沙龙04 美团下一代分布式存储系统
 
MySQL5.6&5.7 Cluster 7.3 Review
MySQL5.6&5.7 Cluster 7.3 ReviewMySQL5.6&5.7 Cluster 7.3 Review
MySQL5.6&5.7 Cluster 7.3 Review
 
Cloudera企业数据中枢平台
Cloudera企业数据中枢平台Cloudera企业数据中枢平台
Cloudera企业数据中枢平台
 
QIoT ,QuAI
QIoT ,QuAI  QIoT ,QuAI
QIoT ,QuAI
 
Raising The MySQL Bar-Manyi Lu
Raising The MySQL Bar-Manyi LuRaising The MySQL Bar-Manyi Lu
Raising The MySQL Bar-Manyi Lu
 
分会场八和Net backup一起进入云备份时代
分会场八和Net backup一起进入云备份时代分会场八和Net backup一起进入云备份时代
分会场八和Net backup一起进入云备份时代
 
Kube-OVN Introduction
Kube-OVN IntroductionKube-OVN Introduction
Kube-OVN Introduction
 
2015中国软件技术大会-开放云介绍
2015中国软件技术大会-开放云介绍2015中国软件技术大会-开放云介绍
2015中国软件技术大会-开放云介绍
 
Retrive&amp;rank
Retrive&amp;rankRetrive&amp;rank
Retrive&amp;rank
 
hicloud PaaS 雲創平台 for java developer
hicloud PaaS 雲創平台 for java developerhicloud PaaS 雲創平台 for java developer
hicloud PaaS 雲創平台 for java developer
 

More from Alluxio, Inc.

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 

More from Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Kyligence Leverages Alluxio to Accelerate OLAP in the Cloud

  • 1. 使⽤用 Alluxio 加速云上 OLAP 分析 史少锋 Kyligence 资深架构师 shaofeng.shi@kyligence.io
  • 2. Confidential, all rights reserved ©Kyligence Inc. http://kyligence.io 议程 • Apache Kylin and Kyligence Inc. • Kyligence Analytics Platform • KAP in the Cloud • Alluxio + KAP • Summary
  • 3. Confidential, all rights reserved ©Kyligence Inc. http://kyligence.io Apache Kylin:全球领先的大数据分析技术(OLAP-on-Hadoop) 全球最大的开源软件基金会 • 顶级项目 Ø Apache Kylin, 中国第一个 Apache顶级开源项目,核心 开发者及贡献者都在中国 • 行业认可 Ø 连续两年荣获InfoWorld“最佳 开源大数据工具奖”,与 Spark,TensorFlow一起获奖 • 用户认可 Ø 全球超过500家领先企业使用 Kylin大数据分析平台解决方案 与Apache Kylin团队一起合作使 Kylin通过孵化成为顶级项目对我而言 非常激动人心,Kylin在技术方面当然 是振奋人心的,但同样令人兴奋的是 Kylin代表了亚洲国家,特别是中国, 在开源社区中越来越高的参与度。 —Ted Dunning, Apache 孵化项目副总裁 • 生态社区 Ø 活跃的社区,众多用户及开发者, 广泛的开源、商业合作伙伴体系 • 技术优势 Ø 基于预计算+并行计算+列式存储 等优化技术,实现海量数据+高 并发+亚秒级响应的实时数据分 析平台 44 0.32 0 10 20 30 40 50 SparkSQL Kylin 某金融机构,6.9亿数据,15年数据,查询Top用户 SQL查询延迟
  • 4. Confidential, all rights reserved ©Kyligence Inc. http://kyligence.io Kyligence = Kylin + Intelligence 构建领先的 全球开源社区 企业级 产品 专业服务 管理 与 自动化 云计算 行业 解决方案 Apache Kylin原创团队组建 ü 拥有50% Apache Kylin PMC ü 贡献90%+的Kylin源代码 以Kylin为核心的企业级产品 ü KAP:企业级OLAP平台 ü Kyligence Cloud: 云计算+大数据+智能运维 全方位的原厂专业服务 ü 产品支持 & 认证培训 ü 平台实施 & 架构咨询 ü 硅谷上海 & 全球服务
  • 5. Confidential, all rights reserved ©Kyligence Inc. http://kyligence.io 关于我们 2014.11 加入Apache孵化器,Apache Kylin正式开源 2015.11 毕业成为Apache 顶级项目 2016.3 Kyligence 公司建立, 获得红点创投数百万天使投资 2017.4 完成A轮融资(800 万美金),由宽带资 本、顺为资本领投, 红点中国跟投 2016.8 发布企业级智能大 数据解决方案 Kyligence Analytics Platform 2017.5 Kyligence美 国分公司成立 2016.9 二次获得InfoWorld 最佳开源大数据工具 奖 2017.8 Kyligence成为 AWS Technical Partner 2017.9 Kyligence Robot 发布,支持 Apache Kylin在 线智能优化 2017.12 Kyligence Cloud 发布
  • 7. Confidential, all rights reserved ©Kyligence Inc. http://kyligence.io Kyligence Analytics Platform: 搭建⽤用户和⼤大数据之间的桥梁梁 • 高性能:亚秒级查询延迟, 满足交互式分析的时效性要 求,为mission-critical场景高 度优化 • 高并发:线性扩展,满足大 数据时代爆发的数据分析需 求,支持internet scale在线服 务 • 易使用:标准SQL访问,降 低技术门槛,屏蔽复杂的技术 接口
  • 8. Confidential, all rights reserved ©Kyligence Inc. http://kyligence.io Kyligence Analytics Platform (KAP) 架构 Kyligence RoBot 在线自助服务平台 为DevOps提供 系统监控 Cube优化 SQL调优 Kyligence Analytics Platform Kylin/Open Source KAP/Commercial Online Service Apache Kylin Open Source OLAP On Hadoop KyAnalyzer Agile BI KyStudio Data Model Designer KyManager Administrator Tool KyStorage Columnar Storage Security Cell Level ACL On Demand Deployment On-Premises On-Hybrid On-Cloud
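  • 9. From Apache Kylin to KAP • Emerging Hadoop technology: distributed computing framework, scale-out architecture, poor SQL query performance • Traditional DW products: classic OLAP theory, scale-up architecture, limited cube capacity, performance, and concurrency • Apache Kylin: OLAP precomputation + Hadoop computing framework • KAP: Kylin + flexible queries + detail-level queries + intelligent optimization + enterprise-grade security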
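  • 10. Trading space for time: a brief introduction to how a Cube works • Data cube: a multidimensional analysis technique that precomputes results and stores them in the space mapped by combinations of dimension values, so that a query gets its answer quickly by post-processing the cube • Dimensional model: a data modeling approach used in data warehousing that organizes data into fact tables and dimension tables; the star schema is the most widely used form • Aggregation, classification, and sorting are done in advance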
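The space-for-time idea on this slide can be illustrated with a toy example. The sketch below is not how Kylin or KAP builds cubes; the fact rows, dimension names, and the `build_cube` and `query` helpers are all hypothetical. It precomputes SUM(sales) for every combination of dimensions, so a GROUP BY becomes a lookup rather than a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, region, product, sales)
ROWS = [
    (2016, "CN", "A", 10),
    (2016, "US", "A", 5),
    (2017, "CN", "B", 7),
    (2017, "CN", "A", 3),
]
DIMENSIONS = ("year", "region", "product")


def build_cube(rows):
    """Precompute SUM(sales) for every subset of the dimensions."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), k):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # last column is the measure
            cube[tuple(DIMENSIONS[i] for i in dims)] = dict(cuboid)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY from the precomputed cuboid, without touching ROWS."""
    return cube[tuple(d for d in DIMENSIONS if d in group_by)]


CUBE = build_cube(ROWS)
sales_by_region = query(CUBE, {"region"})
```

With three dimensions the cube holds 2^3 = 8 precomputed groupings, which is the storage cost paid for the faster lookups.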
  • 11. Cube precomputation is the core technical idea behind KAP
  • 12. Leveraging Hadoop's powerful parallel computing capability • Kyligence Analytics Platform • KyAnalyzer, BI Tools, Web App… • ANSI SQL • KyStorage • MapReduce/Spark/Streaming…
  • 13. Precomputation keeps query performance stable
  • 14. KAP: ultra-high performance and ultra-high concurrency • On standard benchmark datasets, KAP delivers sub-second query responses, a speedup of more than 100x over Hive
  • 15. Typical Kylin/KAP users worldwide • Internet: eBay, Yahoo! Japan, Baidu Maps, Meituan-Dianping, NetEase, Expedia, JD.com, Vipshop, 360, Toutiao • Finance: China Pacific Insurance, Citibank, UnionPay, Huatai Securities, Guotai Securities, Lufax, JPMorgan • Telecom: China Mobile, China Telecom, China Unicom, AT&T • Manufacturing: SAIC Motor, Huawei, Lenovo, OPPO, Xiaomi, VIVO • Other: MachineZone, Inovex, Glispa, Adobe, iFLYTEK • Figures come from public sources and the Kylin community
  • 16. Kyligence in the Cloud
  • 17. Kyligence is a partner of Azure and AWS
  • 18. KAP has landed on Azure • KAP is onboarded on Azure Global and Mooncake
  • 19. Kyligence Cloud: a PaaS service with one-click deployment and multi-cloud support
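  • 20. Kyligence Cloud: solving the hard problems of big data in the cloud • One-click deployment: KAP and Hadoop deployed within minutes • Dynamic scaling: compute resources scale with actual usage for high scalability • Cloud native: S3 as storage, auto scaling, deployment from CloudFormation templates • Cost savings: read/write separation and on-demand start/stop effectively reduce operating costs • Seamless BI integration: from Hadoop to KAP to BI tools, an end-to-end solution on the AWS cloud • Easy operations: fully managed sites simplify operations so you can focus on your business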
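  • 21. Challenges for Hadoop & KAP in the cloud • Local disks become volatile, so HDFS is no longer suitable as reliable storage for Hadoop/Spark in the cloud • When a VM is deleted, its data is erased with it, leaving HDFS with missing blocks • Local disks are expensive • An architecture that separates compute from storage • AWS S3, Azure Blob Store, and similar services are more reliable and cheaper, and suit big data workloads • Separating compute from storage makes the architecture truly scalable; AWS EMR and Azure HDInsight support S3 and WASB as Hadoop file storage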
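  • 22. Challenges for Hadoop & KAP in the cloud • S3 and Blob Store differ greatly from HDFS • Performance depends heavily on network bandwidth • Eventual consistency • Metadata operations are slow • KAP's approach in the cloud • Interim approach: use HDFS for computation and S3 for backup • Better approach: a transparent, fast caching layer on top of S3 is needed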
  • 23. KAP + Alluxio
  • 24. Alluxio
  • 25. Highlights of Alluxio • Memory-speed virtual distributed storage system • Spark/MapReduce can run on Alluxio just as on any other file system • Supports most cloud storage services, such as S3, GCS, and WASB
  • 26. KAP + Alluxio architecture • Alluxio mounts the S3 bucket as its under file system • KAP uses Alluxio as its file system in place of S3 • Transparent to applications, with almost no code changes
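The role a caching layer plays in this architecture can be sketched as a read-through cache. This is only a conceptual illustration under stated assumptions: the `ReadThroughCache` class, the `FAKE_S3` dictionary, and the path shown are hypothetical, and Alluxio itself is a distributed system with far more machinery than this.

```python
class ReadThroughCache:
    """Toy stand-in for a cache placed in front of a slow object store."""

    def __init__(self, fetch_from_store):
        self._fetch = fetch_from_store  # callable: path -> bytes (the remote read)
        self._cache = {}                # path -> bytes held locally
        self.store_reads = 0            # how many reads actually hit the store

    def read(self, path):
        if path not in self._cache:     # miss: fetch from the under store once
            self._cache[path] = self._fetch(path)
            self.store_reads += 1
        return self._cache[path]        # hit: served from local memory


# Hypothetical under-store contents standing in for an S3 bucket.
FAKE_S3 = {"/kylin/cube/segment-0": b"columnar-data"}

cache = ReadThroughCache(FAKE_S3.__getitem__)
first = cache.read("/kylin/cube/segment-0")   # goes to the store
second = cache.read("/kylin/cube/segment-0")  # served from the cache
```

The caller uses the same `read(path)` call either way, which is the sense in which such a layer is transparent to the application.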
  • 27. Deploying Alluxio together with EMR • The Alluxio master runs on the EMR master node and Alluxio workers are started on the core nodes; installation is done through a bootstrap action • https://github.com/shaofengshi/emr-bootstrap-alluxio
  • 28. Configuring KAP to use Alluxio • Configure Alluxio • Copy alluxio-core-client-runtime-<version>.jar to KAP's spark directory • Copy alluxio-site.properties to spark/conf • Use S3 as the file system for writes • kylin.env.hdfs-working-dir=s3://mybucket/kylin • Use Alluxio as the file system for reads • kylin.storage.columnar.file-system=alluxio://<master-node>:19998/ • The read/write separation switch does not need to be enabled
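To make the write/read split concrete, here is a small sketch that renders the two property lines from this slide for given values. The `kap_alluxio_overrides` helper is hypothetical and not part of KAP; only the two property keys and port 19998 come from the slide, and the host name passed in is a placeholder.

```python
def kap_alluxio_overrides(working_dir, alluxio_master, port=19998):
    """Return the two property lines from the slide, filled in with values."""
    return [
        # Writes (cube build output) go straight to S3.
        f"kylin.env.hdfs-working-dir={working_dir}",
        # Reads (queries) go through Alluxio.
        f"kylin.storage.columnar.file-system=alluxio://{alluxio_master}:{port}/",
    ]


# "master-node" is a placeholder host name, as on the slide.
lines = kap_alluxio_overrides("s3://mybucket/kylin", "master-node")
print("\n".join(lines))
```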
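  • 29. Problems encountered when using Alluxio with KAP • Performance dropped instead of improving • With Alluxio deployed on a separate cluster, queries were actually slower • Fix: deploy Alluxio on the same cluster as Spark • New files could not be found in Alluxio • After a new file was written to S3, it could not be found through Alluxio • Fix: run a recursive ls on the parent directory to trigger Alluxio to sync metadata with S3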
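The recursive-ls workaround could be scripted along the following lines. This is a sketch under assumptions: it presumes the `alluxio` command is on the PATH and that the installed version accepts `alluxio fs ls -R` (some older releases used a separate `lsr` command instead); the helper names and the example path are hypothetical.

```python
import subprocess


def recursive_ls_command(alluxio_dir, alluxio_bin="alluxio"):
    """Build the CLI call for a recursive listing of an Alluxio directory."""
    # Assumption: this Alluxio version supports `fs ls -R`.
    return [alluxio_bin, "fs", "ls", "-R", alluxio_dir]


def refresh_metadata(alluxio_dir, run=subprocess.run):
    """Recursively list alluxio_dir; per the slide, the listing triggers the sync."""
    return run(recursive_ls_command(alluxio_dir), check=True, capture_output=True)


# Example (not executed here): refresh_metadata("/kylin") after a cube build
# has written new files to S3, before queries read them through Alluxio.
```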
  • 30. Problems encountered when using Alluxio with KAP (cont.) • Little documentation for Azure • Many configuration and jar conflict problems • Fix: use a newer version of the Azure Storage Java library, and use an HDInsight script action to install and uninstall automatically • Automated install script: https://github.com/shaofengshi/hdinsight-scriptaction-alluxio
  • 31. SSB Benchmark – S3 vs Alluxio • https://github.com/Kyligence/ssb-kylin • Raw data: 91 million; Cube size: 20 GB
  • 32. SSB Benchmark – S3 vs Alluxio • On average, KAP query latency on Alluxio is ¼ of that on S3
  • 33. User profile – WASB vs local HDFS vs Alluxio • User behavior data, 200 million rows • Cube size: 15 GB
  • 34. User profile – WASB vs local HDFS vs Alluxio • Alluxio delivers performance close to local HDFS and is 3 to 4x faster than WASB
  • 35. Summary
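  • 36. Summary • Conclusion • Alluxio helps KAP transparently accelerate OLAP queries in the cloud, with performance close to that of local data • Next steps • Alluxio provides a unified data namespace that helps KAP access and manage different data tiers • Use tiered storage to cache larger volumes of data • Analyze Alluxio data usage to measure and optimize KAP's storage usage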
  • 37. Free 90-day trial of Kyligence Cloud • Kyligence Cloud is a managed Apache Kylin service that offers elastic enterprise OLAP on Hadoop in the cloud. • Supports Azure and AWS in Global and China regions. • Console: https://cloud.kyligence.io