© Cloudera, Inc. All rights reserved.
Cloudera Data Science Workbench
An enterprise-grade, self-service collaborative analytics platform for data scientists
Jianwei Li (李建伟) | Big Data Architect @ Cloudera
Agenda
• Data science and the challenges it faces
• CDSW features
• CDSW internals and architecture
• Customer churn early warning with CDSW
• Q&A
Customer churn early warning
Customer churn early warning: data
• KS, 128, 415, 382-4657, no, yes, 25, 265.1, 110, 45.07, 197.4, 99, 16.78, 244.7, 91, 11.01, 10, 3, 2.7, 1, False.
• OH, 107, 415, 371-7191, no, yes, 26, 161.6, 123, 27.47, 195.5, 103, 16.62, 254.4, 103, 11.45, 13.7, 3, 3.7, 1, False.
• NJ, 137, 415, 358-1921, no, no, 0, 243.4, 114, 41.38, 121.2, 110, 10.3, 162.6, 104, 7.32, 12.2, 5, 3.29, 0, False.
• OH, 84, 408, 375-9999, yes, no, 0, 299.4, 71, 50.9, 61.9, 88, 5.26, 196.9, 89, 8.86, 6.6, 7, 1.78, 2, False.
• OK, 75, 415, 330-6626, yes, no, 0, 166.7, 113, 28.34, 148.3, 122, 12.61, 186.9, 121, 8.41, 10.1, 3, 2.73, 3, True
Columns: customer tenure, phone number, international roaming, voicemail, number of voicemail messages, daytime call minutes, daytime call count, daytime call charges, evening …, night …, number of customer service calls, churned?
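The sample records above can be loaded into named, typed fields with a few lines of Python. This is a minimal sketch and not code from the deck: the English field names and their positions are assumptions inferred from the column list on the slide (which abbreviates the evening and night columns), only a subset of the 21 values is named, and `parse_record` and `load_records` are hypothetical helpers.

```python
import csv
import io

# Two of the sample records from the slide, copied verbatim (bullets removed).
SAMPLE = """\
KS, 128, 415, 382-4657, no, yes, 25, 265.1, 110, 45.07, 197.4, 99, 16.78, 244.7, 91, 11.01, 10, 3, 2.7, 1, False.
OK, 75, 415, 330-6626, yes, no, 0, 166.7, 113, 28.34, 148.3, 122, 12.61, 186.9, 121, 8.41, 10.1, 3, 2.73, 3, True
"""


def parse_record(fields):
    """Turn one 21-value row into a dict of typed values.

    The field names and positions are assumptions, not taken from the deck.
    """
    return {
        "state": fields[0],
        "tenure": int(fields[1]),
        "area_code": fields[2],
        "phone": fields[3],
        "intl_plan": fields[4] == "yes",
        "voicemail_plan": fields[5] == "yes",
        "voicemail_messages": int(fields[6]),
        "day_minutes": float(fields[7]),
        "day_calls": int(fields[8]),
        "day_charge": float(fields[9]),
        "service_calls": int(fields[19]),
        # Some rows end with a trailing period ("False."), so strip it first.
        "churned": fields[20].rstrip(".") == "True",
    }


def load_records(text):
    """Parse comma-separated rows, ignoring the space after each comma."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [parse_record(row) for row in reader if row]


records = load_records(SAMPLE)
for r in records:
    print(r["state"], r["day_minutes"], r["service_calls"], r["churned"])
```

Keeping the label (`churned`) as a boolean and the numeric columns as `int` or `float` makes the later feature and evaluation steps straightforward.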
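Three approaches to equipment maintenance
Corrective maintenance
• Manual repair after the equipment fails
• Reactive response
Preventive maintenance
• Maintenance on a regular schedule
• Scheduled response
Predictive maintenance
• Continuously monitor equipment operating metrics and intervene when anomalies appear
• Proactive response
Business value: rises from reactive to proactive
Most enterprises use these two approaches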
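• Monitor equipment status and performance metrics in real time through sensors
• Detect anomalous variables and patterns that may lead to failures, and predict when equipment will fail
• Draw up the corresponding inspection and maintenance plans
Lower cost
Less downtime
Higher quality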
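As a rough illustration of the "detect anomalous variables" step, the sketch below flags sensor readings that deviate strongly from their recent history. It is an assumption-laden toy, not the method used in the deck: the rolling z-score technique, the window size, the threshold, the function name `find_anomalies` and the temperature values are all made up for illustration.

```python
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the mean of the previous
    `window` readings by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical gearbox temperature readings; the spike at index 8 is invented.
temps = [61.0, 61.4, 60.8, 61.2, 61.1, 60.9, 61.3, 61.0, 75.5, 61.2]
print(find_anomalies(temps))
```

A production system would more likely learn failure patterns from labeled history, but a threshold on recent behavior is a common first baseline.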
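Business value of predictive maintenance
• Less downtime: real-time data is used to predict and prevent system outages
• Lower cost: predictive maintenance reduces equipment maintenance costs by 10% to 40%
[Chart labels: 50%, 50%, 40%; repair & replace vs. predict & prevent]
Source: McKinsey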
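Case study
Energy: wind power
» Equipment failure prediction
» Higher production efficiency
» Lower cost
• Overall wind turbine condition assessment
• Anemometer health assessment
• Turbine internal gearbox condition assessment
• Turbine external component condition assessment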
Wind turbine failure prediction
An open data science toolset
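Challenges facing data science
Data engineering | Data science (exploratory) | Production (operational)
Data Governance
• Most data science algorithms run in personal tools on small-scale data, so solutions are hard to replicate
• Few models make it into production
• Different departments and teams have different requirements for tools and programming languages
• Large volumes of data have to be copied between different systems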
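Problems data scientists face
Data access
• Security restrictions on internal enterprise data prevent access to data in the big data cluster
• Existing data analysis tools cannot connect to the enterprise Hadoop system
Platform scaling
• Personal computers offer limited storage and compute capacity
• Models are built on sampled data
• Model training takes a long time (8 hours to train a SAS-based model)
User experience
• Software tool versions are hard to maintain
• Python vs R
• Python 2.7 vs 3.5
• Developed models are hard to move into production
• Notebook tools are hard to connect to big data technologies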
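Problems IT teams face
• Multi-tenant management:
  • Managing multiple software packages and their dependencies
  • Software version management
  • Cluster sharing between data engineers and data scientists
• Security and compliance:
  • Data lineage analysis is lost when work goes through notebook tools
• Data quality and data copies:
  • Local data copies go stale
  • Multiple copies of the same datasets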
Hadoop and machine learning
Improve data science efficiency and shorten the time to value from data
Data sources | Data ingestion | Distributed storage and processing | Data analysis and intelligence (machine learning)
Data sources: IoT data, internal enterprise data
• Apache Kafka: stream or batch ingestion of IoT data
• Apache Sqoop: ingestion of data from relational sources
• Apache HDFS: storage (HDFS) & deep batch processing
• Apache Kudu: storage & serving for fast changing data
• Apache HBase: NoSQL data store for real time applications
• Apache Impala: MPP SQL for fast analytics
• Cloudera Search: real time search
• Apache Spark: stream & iterative processing, ML
Secure, scalable & easy to manage
Flexible deployment: data center, cloud
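Hadoop and machine learning
Improve data science efficiency and shorten the time to value from data
• More data, not just better algorithms
• More kinds of data, not just structured data
• More compute engines, not just schema-based SQL engines
• Easy horizontal scaling vs. vertical scaling
• One platform with multiple compute frameworks (batch, streaming, data serving and more) vs. multiple systems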
https://medium.com/@KevinSchmidtBiz/data-engineer-vs-data-scientist-vs-business-analyst-b68d201364bc
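Cloudera Data Science Workbench
An enterprise-grade self-service data science platform
• Data science on Hadoop
  • Data is stored centrally in HDFS
  • Uses Spark, Impala and other Hadoop compute engines
  • Addresses the analytics "silo" problem
• Self-service collaboration platform
  • Run Python, R and Scala in the browser
  • Customize software and environment variables per project
  • Collaborate on analyses and share results
• Meets enterprise requirements
  • Self-service data exploration and analysis for business units
  • Data analysis with data security guaranteed (Kerberos)
  • Flexible deployment: data center or cloud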
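Cloudera enterprise data hub
[Architecture diagram; recoverable labels:]
• Cloudera Data Science Workbench (CDSW): R, Python, Scala; Data Science at Scale
• CDH: 100% open source, commercial edition
• Workloads: data processing, discovery and analytics, online serving
• Unified data services: batch (MR, Hive, Pig), streaming (Spark Streaming), SQL (Impala), full-text search (Solr), modeling (Spark MLlib), online (HBase)
• Resource management: YARN, ZooKeeper
• Security management: Sentry + RecordService
• Storage: distributed file system (HDFS), relational data (Kudu), NoSQL (HBase)
• Data ingestion: Sqoop, Flume, Kafka
• Data governance: Cloudera Navigator (security, audit, lineage, encryption)
• Operations management: Cloudera Manager (management, monitoring, diagnostics, integration)
• Cloudera Director: big data in the cloud
• Navigator Optimizer: migrating traditional databases to Hadoop
• Deployment: public cloud, private cloud, PaaS, data center, any x86 server; cloud application migration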
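The end-to-end data science process
Stages: data engineering | data science (exploratory) | production (operational)
Activities: data acquisition, data transformation, data preprocessing, data cleaning, feature selection, data visualization and analysis, model training and testing, model experiments, model quality, production model preparation, offline applications, online applications, model serving
Development tools: IDEs/notebooks, collaboration
Operations tools: version control, scheduled jobs, workflows, model publishing
Data Governance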
Cloudera Data Science Workbench
An enterprise-grade self-service data science platform
Development: integrated tools
Operations: job management
Feature: data preprocessing
Supports many types of data sources, reducing the heavy, repetitive data processing and cleaning work that precedes modeling and analysis.
Feature: model development
Build data science and advanced analytics solutions with the most powerful tools, including R, Python, SQL and Spark, and accelerate data science from exploration to deployment.
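Feature: data visualization
Deploy model programs automatically and publish data visualization charts, so that data scientists and business teams collaborate closely, build analytics pipelines and models, and bring the enterprise deeper insight.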
Feature: job scheduling and management
Build and manage ETL and model analysis workflows in R, Python, SQL, Spark and more. Build the analytics infrastructure for unrestricted analysis.
CDSW deployment architecture
[Diagram labels: Users, HTTP, CDSW Application, CDSW Nodes; CDH Cluster 1 and CDH Cluster 2, each with CDH Nodes and its own Cloudera Manager; Config: Spark, Impala, Hive, HDFS, etc.]
• Runs as an "edge node cluster"
  • On Docker + Kubernetes
  • CDH 5.11, Spark 2.0+
• Or in cloud environments such as AWS
  • Using virtual machine images (VMs/AMIs)
  • Scripted installation
• Security support: LDAP/SAML/Kerberos
CDSW software architecture
[Diagram labels: CDH nodes managed by Cloudera Manager; one CDSW master node and CDSW worker nodes, each with a CDH gateway (Spark, Impala, Hive, HDFS, …)]
Components on a CDSW node:
• Application pods and engine pods: CDSW application components and user workloads
• Kubernetes: container scheduling service
• Docker: container runtime
• Cloudera Manager Agent: local management of CDH services
CDSW + Spark Architecture
Gateway node requirements
• Operating system: RHEL/CentOS 7.2
• Hardware
  • 1 CDSW master node, 0 or more CDSW worker nodes
  • CPU: 16+ CPU (vCPU) cores
  • Memory: 32+ GB
  • Disk:
    • Root volume: 100+ GB
    • Docker image block device(s): 500+ GB
    • Application block device(s) (master node only): 500+ GB
• Network:
  • Wildcard domain name, for example *.cdsw.company.com
  • Firewall disabled
• Recommended: 8 CPU cores and 16 GB of RAM per user
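The per-user recommendation can be turned into a rough node count. The sketch below is only back-of-the-envelope arithmetic under stated assumptions: it treats the 8 cores and 16 GB per user as a linear requirement per concurrent user, uses the minimum node size from the slide (16 cores, 32 GB) as the default, and ignores operating system and platform overhead. The function name `nodes_needed` is hypothetical.

```python
import math


def nodes_needed(concurrent_users, cores_per_node=16, ram_gb_per_node=32,
                 cores_per_user=8, ram_gb_per_user=16):
    """Estimate how many CDSW nodes cover the per-user guideline.

    Assumes demand scales linearly with users and ignores system overhead.
    """
    total_cores = concurrent_users * cores_per_user
    total_ram_gb = concurrent_users * ram_gb_per_user
    by_cpu = math.ceil(total_cores / cores_per_node)
    by_ram = math.ceil(total_ram_gb / ram_gb_per_node)
    # At least one node (the master) is always required.
    return max(1, by_cpu, by_ram)


print(nodes_needed(5))                                        # minimum-size nodes
print(nodes_needed(5, cores_per_node=32, ram_gb_per_node=128))  # larger nodes
```

With minimum-size nodes, each node covers two concurrent users under this guideline, so larger nodes reduce the node count quickly.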
Modeling workflow
Data acquisition
Feature extraction & feature transformation
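The code shown on this slide did not survive extraction. As a stand-in, here is a minimal standard-library sketch of two common transformations for churn data: encoding yes/no flags as numbers and standardizing numeric columns. It is not the presenter's code. The four chosen columns, their meaning and the helper names are assumptions; the values come from the five sample records earlier in the deck.

```python
from statistics import mean, pstdev

# (international plan, voicemail plan, daytime minutes, customer service calls)
# Column meanings are assumed; values are taken from the sample records.
raw = [
    ("no", "yes", 265.1, 1),
    ("no", "yes", 161.6, 1),
    ("no", "no", 243.4, 0),
    ("yes", "no", 299.4, 2),
    ("yes", "no", 166.7, 3),
]


def encode_flag(value):
    """Map a yes/no string to 1.0/0.0."""
    return 1.0 if value.strip().lower() == "yes" else 0.0


def standardize(column):
    """Rescale a numeric column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def build_features(rows):
    """Return one numeric feature vector per input row."""
    intl = [encode_flag(r[0]) for r in rows]
    vmail = [encode_flag(r[1]) for r in rows]
    minutes = standardize([r[2] for r in rows])
    calls = standardize([float(r[3]) for r in rows])
    return [list(vec) for vec in zip(intl, vmail, minutes, calls)]


features = build_features(raw)
for vec in features:
    print([round(v, 2) for v in vec])
```

Standardizing keeps columns with large ranges, such as call minutes, from dominating columns with small ranges, such as call counts.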
Training dataset & test dataset
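A minimal sketch of splitting records into a training set and a test set, using only the standard library. The 70/30 ratio, the fixed seed and the function name `train_test_split` are illustrative assumptions, not values from the deck.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten customer records
train, test = train_test_split(rows)
print(len(train), len(test))
```

On Spark, `DataFrame.randomSplit` serves the same purpose for distributed data. Holding the test set out of training is what makes the later evaluation an honest estimate.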
Model evaluation
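The evaluation output on this slide was not captured. The sketch below shows how accuracy, precision and recall can be computed for a binary churn classifier. It is illustrative only: the helper names are hypothetical and the labels and predictions are made-up examples, not results from the deck.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(actual, predicted):
    """Return (accuracy, precision, recall) for boolean labels."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual) if actual else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels: True means the customer churned.
actual = [True, False, False, True, False, True, False, False]
predicted = [True, False, True, False, False, True, False, False]
accuracy, precision, recall = evaluate(actual, predicted)
print(accuracy, precision, recall)
```

Because churners are usually a minority, precision and recall say more about a churn model than accuracy alone.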
Model evaluation: ROC
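The ROC plot itself is not part of the extracted text. As a hedged illustration of how such a curve is built, the sketch below sweeps a decision threshold over predicted scores, records the false positive rate and true positive rate at each step, and integrates the curve with the trapezoidal rule. The labels, scores and function names are invented for the example.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr) from sweeping the threshold over scores."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up labels (True = churned) and model scores.
labels = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(round(auc(points), 3))
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of churners above non-churners.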
Model evaluation
Thank you

CDSW: a collaborative data science analytics platform