Cloudera Helps Advance the Development of Taiwan's Big Data Industry
- 1. Cloudera Helps Advance the Development of Taiwan's Big Data Industry
Kai X. Miao (苗凯翔), Vice President, Cloudera Corporation
© Cloudera, Inc. All rights reserved.
- 2. Big Data Is Only Getting Bigger: Particularly Relevant in the Telecom Space
Data growth, 1980 to today: structured data 10%, complex data 90%
User profiles, usage data, mobile & devices, network, marketing & CRM, public & trade
3rd Platform clients, rich user experiences, IoT clients
In 2012, we had 2.8 ZB. By 2020, world data will reach 40 ZB.
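The two figures on this slide (2.8 ZB in 2012, 40 ZB by 2020) imply a yearly growth rate that the slide does not state. A minimal sketch of that arithmetic follows, assuming smooth compound growth between the two years; the function name is illustrative and not from the deck.

```python
import math

def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

# Slide figures: 2.8 ZB in 2012, 40 ZB projected for 2020.
rate = compound_annual_growth(2.8, 40.0, 2020 - 2012)

# Time for the data volume to double at that rate.
doubling_years = math.log(2) / math.log(1 + rate)

print(f"{rate:.1%} per year, doubling about every {doubling_years:.1f} years")
```

Under that assumption the volume grows by a little under 40% per year, so it roughly doubles every two years.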
- 3. Traditional Data Architecture Can't Handle Big Data
Components: instrumentation, collection, storage grid (original raw data), ETL compute grid, RDBMS/EDW, BI reports + interactive apps
• Can't explore original raw data
• Can't scale
• Sending data to graveyard
- 4. A Major Limitation of RDBMS/EDW: Schema-on-Write
• Schema must be created before any data can be loaded
• An explicit load operation has to take place, which transforms data to the DB's internal structure
• New columns must be added explicitly before new data for such columns can be added into the database
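The three schema-on-write constraints listed on this slide can be reproduced with any relational engine. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for an RDBMS/EDW; the table, column names, and sample records are hypothetical. The last part shows the contrasting schema-on-read idea (keep raw records, apply structure when reading), which this slide does not spell out and is included here only for comparison.

```python
import json
import sqlite3

# Schema-on-write: the table must exist before any row can be loaded.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (caller TEXT, seconds INTEGER)")
db.execute("INSERT INTO calls VALUES (?, ?)", ("alice", 120))

# A record carrying a new field is rejected until the column is added explicitly.
new_row = ("bob", 45, "TPE-01")
insert_sql = "INSERT INTO calls (caller, seconds, cell_id) VALUES (?, ?, ?)"
try:
    db.execute(insert_sql, new_row)
    rejected = False
except sqlite3.OperationalError:
    rejected = True

db.execute("ALTER TABLE calls ADD COLUMN cell_id TEXT")
db.execute(insert_sql, new_row)
row_count = db.execute("SELECT COUNT(*) FROM calls").fetchone()[0]

# Schema-on-read (for contrast): store raw records as-is, interpret them at query time.
raw_lines = [
    '{"caller": "alice", "seconds": 120}',
    '{"caller": "bob", "seconds": 45, "cell_id": "TPE-01"}',
]
records = [json.loads(line) for line in raw_lines]
cell_ids = [r.get("cell_id") for r in records]  # missing fields simply read as None
```

The first insert of the new record fails because the column does not exist yet; the raw JSON lines accept the extra field without any up-front change.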
- 5. Expanding Data Requires a New Approach
1980s: Bring Data to Compute
Process-centric businesses use:
• Structured data mainly
• Internal data only
• "Important" data only
Now: Bring Compute to Data
Information-centric businesses use all data: multi-structured, internal & external data of all types
Diagram axis: relative size & complexity of data
©2014 Cloudera, Inc. All rights reserved.
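The "bring compute to data" idea on this slide can be illustrated without any cluster: each partition summarizes its own records where they live, and only the small summaries travel. This is a toy sketch in plain Python, not Hadoop code; the partitions, record types, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical partitions: each list stands for the records stored on one node.
partitions = [
    ["data"] * 500 + ["voice"] * 300,
    ["data"] * 400 + ["sms"] * 100,
    ["voice"] * 250 + ["sms"] * 50 + ["data"] * 200,
]

def local_count(records):
    """Runs next to the data and returns a small per-partition summary."""
    return Counter(records)

def merge(summaries):
    """Combines the small summaries into one global result."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total

summaries = [local_count(p) for p in partitions]
totals = merge(summaries)

# Bring data to compute: every record crosses the network.
records_moved = sum(len(p) for p in partitions)
# Bring compute to data: only one (key, count) pair per distinct key per partition moves.
values_moved = sum(len(s) for s in summaries)

print(dict(totals), records_moved, values_moved)
```

In this toy setup 1,800 records would have to move in the first model, while only 7 key/count pairs move in the second.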
- 6. Hadoop Changes How Data Is Processed
The traditional way: $30,000+ per TB
• Hard to scale
• Network is a bottleneck
• Only handles relational data
• Difficult to add new fields & data types
Expensive, proprietary, "reliable" servers
Expensive software licenses
Data storage (SAN, NAS) and compute (RDBMS, EDW), linked by the network
The Hadoop way: $300 - $1,000 per TB
• Scales out forever
• No bottlenecks
• Easy to ingest any data
• Agile data access
Inexpensive PC servers
Inexpensive, open source software
Compute (CPU), memory, storage (disk)
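To make the per-terabyte figures on this slide concrete, the sketch below applies them to a hypothetical 100 TB data set. Only the per-TB prices come from the slide; the capacity and the function name are assumptions, and "$30,000+" is treated as a lower bound.

```python
# Slide figures (USD per TB).
TRADITIONAL_PER_TB = 30_000        # "$30,000+ per TB", used here as a lower bound
HADOOP_PER_TB = (300, 1_000)       # "$300 - $1,000 per TB"

def storage_cost(terabytes: int, per_tb: int) -> int:
    """Total cost for a given capacity at a flat per-TB price."""
    return terabytes * per_tb

capacity_tb = 100  # hypothetical capacity, not from the slide

traditional = storage_cost(capacity_tb, TRADITIONAL_PER_TB)
hadoop_low, hadoop_high = (storage_cost(capacity_tb, p) for p in HADOOP_PER_TB)

# Cost ratio implied by the slide's prices, independent of capacity.
ratio_min = TRADITIONAL_PER_TB / HADOOP_PER_TB[1]
ratio_max = TRADITIONAL_PER_TB / HADOOP_PER_TB[0]

print(f"traditional: at least ${traditional:,}")
print(f"Hadoop: ${hadoop_low:,} to ${hadoop_high:,}")
print(f"ratio: {ratio_min:.0f}x to {ratio_max:.0f}x")
```

With these prices the traditional stack costs at least 30 to 100 times more per terabyte, whatever the capacity.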
- 7. A Strong Track Record of Innovation
2008: Cloudera founded by Mike Olson, Amr Awadallah & Jeff Hammerbacher
2009: Hadoop creator Doug Cutting joins Cloudera
2009: Cloudera releases CDH, the first commercial Apache Hadoop distribution
2010: Cloudera Manager, the first management application for Hadoop
2011: Cloudera reaches 100 production customers
2011: Cloudera University expands to 140 countries
2012: Cloudera Enterprise 4, the standard for Hadoop in the enterprise
2012: Cloudera Connect reaches 300 partners
2014: The enterprise data hub launched
2013: Cloudera Impala, Cloudera Navigator, Cloudera Search
2013: Tom Reilly joins as CEO; over 800 partners in Cloudera Connect
2014: Series F funding with Intel as key partner; over 900 partners in Cloudera Connect
2014: Cloudera Enterprise 5
Products and themes shown: CDH; Cloudera Manager; Cloudera Enterprise 4; Ask Bigger Questions; Enterprise Data Hub; Cloudera Enterprise 5
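- 8. Cloudera Company Overview
Founded: 2008, co-founded by former employees of [company names missing from the transcript]
Employees: more than 900
World-class technical support:
• 24x7 global staff
• Proactive and predictive support programs
Mission critical:
• Thousands of enterprise users
• Several hundred paying customers
The broadest ecosystem: more than 1,400 business partners
Cloudera University: more than 100,000 people trained
Open source leader: Cloudera's employees are industry-leading developers and providers
Our partnership with Intel will help us open up the market successfully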
- 9. Cloudera's Enterprise Data Hub
Open source: scalable, flexible, cost-effective ✔
Managed: ✖ → ✔
Open architecture: ✖ → ✔
Secure and governed: ✖ → ✔
• Batch processing: MapReduce
• Analytic SQL: Impala
• Search engine: Solr
• Machine learning: Spark
• Stream processing: Spark Streaming
• 3rd party apps
• Workload management: YARN
• Storage for any type of data (unified, elastic, resilient, secure): filesystem (HDFS), online NoSQL (HBase)
• Data management: Cloudera Navigator
• System management: Cloudera Manager
• Sentry
Data sources and ingest: DBMS, sensors, logs (Sqoop, Flume)
- 10. The Modern Information Architecture
Data sources: sys logs, web logs, files, RDBMS
Enterprise data hub, with Cloudera Manager and Cloudera Navigator
Connected systems: web/mobile application, enterprise data warehouse, enterprise reporting, BI / analytics, data modeling, developer SDKs
Users: security admins, system admins, engineers, data scientists, analysts, business users, customers & end users
- 11. Customer Success Across Industries
Financial & Business Services, Telecom, Technology, Healthcare, Life Sciences, Media, Retail, Consumer, Energy, Public Sector
- 12. Categories of Cloudera Big Data Use Cases
Customer 360-degree analysis
• Enhanced customer experience & support
• Personalization, targeted offerings, loyalty programs
• Sentiment analysis
Channel optimization
• Campaign management
• Selection process optimization
Supply chain optimization
• Manufacturing process efficiency
• Supplier/merchant management
Risk management
• Fraud detection
• Intrusion detection & digital forensics
Audit
• Regulatory compliance (retention, privacy)
• Usage analysis and mediation
• e-Discovery
Market intelligence
• Competitive analysis
• Economic factor analysis
• Customer segmentation
Data services
• Data as-a-product
• Data enriched with insights/inferences
- 13. Where Does Manufacturing Data Come From?
Devices & sensors
• Device readings
• Device performance
• Device diagnostics
• Battery / power consumption
• Software logs
• Environmental interactions
• R&D
• Quality / testing
Factory & operations
• MES
• Sensors
• Video / surveillance
• Line productivity
• Machines
• Staffing / scheduling
Supply chain & inventory
• ERP
• Supplier / manufacturer
• Orders / receivables
• Commodity supplies / prices
Marketing & CRM
• Transactions
• Accounts
• Warranties / aftermarket
• Customer service logs
• Campaigns / promotions
• Website / SEO
• Affiliates / merchants
• Surveys
• Competitive intelligence
Public & trade
• Market intelligence
• Policy / regulation
• Demographic / census
• Psychographic
• Inflation / macroeconomic
• Gas prices
• Labor statistics
• Social / search
• Public health data
• Clinical studies
• Store schematics
• Journals / editorial
• Seismic / speculation
- 14. Optimizing Operations: Chevron
Chevron is reducing the cost of sending deepwater drillships into the ocean by more precisely identifying oil reservoirs.
Challenge
• Reduce the cost of sending deepwater drillships out into the ocean ($1M/day)
• Do a better job of processing the vast amounts of data that can help identify reservoirs of oil (0.5 PB)
Solution
• Chevron gathers information in five dimensions: the x and y coordinates of both the wave's source and target, along with the time it was collected.
• Construct a picture of what the terrain looks like under the ocean floor
• The company uses CDH to sort that data.
Benefit
• The more data Chevron can collect, the better it can find pockets of oil and natural gas underground.
• Hadoop can do some of the seismic data processing in a less expensive way: 10x less than traditional technologies on average.
- 15. Automotive & Industrial: Caterpillar
Background
Caterpillar Inc. is headquartered in Illinois, USA. It is one of the world's largest manufacturers of construction machinery and mining equipment, gas engines, and industrial gas turbines, and also one of the world's largest diesel engine manufacturers.
Use case: Proactive Quality Assurance
Build machine learning algorithms that identify production anomalies prior to field testing and find performance flaws that could not be identified in R&D.
Problem: Silos Limit Options
Legacy systems hold historical data from production line telemetry, factory surveillance and sensors, call centers, in-car telematics, etc. That data is useless if it is kept offline and in silos.
Solution: Anomaly Detection
Spark includes MLlib, a library of machine learning algorithms for large data, enabling clustering to identify outliers from typical production patterns.
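The anomaly-detection idea on this slide (cluster typical production patterns, then flag points far from every cluster) can be sketched on a single machine. The code below is plain Python, not Spark MLlib, and is only meant to show the logic; the sensor readings, starting centers, and the "three times the median distance" cutoff are all assumptions chosen for illustration.

```python
import math
from statistics import median

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; `centroids` are the starting centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

def find_outliers(points, centroids, factor=3.0):
    """Flag points whose distance to the nearest center exceeds factor x median distance."""
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    cutoff = factor * median(distances)
    return [p for p, d in zip(points, distances) if d > cutoff]

# Hypothetical (temperature, vibration) readings from a production line.
readings = [
    (70.0, 1.0), (71.0, 1.1), (69.5, 0.9), (70.5, 1.0),   # operating mode A
    (90.0, 2.0), (91.0, 2.1), (89.5, 1.9), (90.5, 2.0),   # operating mode B
    (80.0, 6.0),                                          # unusual reading
]

centers = kmeans(readings, centroids=[readings[0], readings[4]])
outliers = find_outliers(readings, centers)
print(outliers)
```

On real production data the number of clusters and the cutoff would need tuning, and a distributed implementation would replace the in-memory loops.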
- 16. Telco Consumer Profile
• Contact, credit info, date of renewal
• Device type: phone, mobile broadband, tablet
• Data/voice usage and top-up
• App preference, interests, usage
• Usage trends: time of day, data amounts
• Location
• Website usage
• Social networks: like/dislike, profile info
- 17. Customer 360° View
Use case: Actionable Sentiment Analysis
Isolate customer profiles to personalize the mix of plans, services, and offers based on the convergence of information from network, GPS, social, call centers, accounts, etc.
Problem: Can't Scale Beyond Silos
Current systems cannot integrate social, telemetric, and systems data in real time with historical data to tailor product mix and incentive plans to the user.
Solution: Calculate Anything
HBase is a real-time database accommodating complex historic data. Spark and Impala converge ETL, analytics, and reporting for on-demand modeling.
- 18. Where Is the Financial Services Data?
Mapping and consolidation are the tip of the iceberg for big data
Retail Banking
• Bank transactions
• Customer data
• ATM activity
• Online activity
• Mobile activity
• Demographic / census data
• Marketing / CRM
• Social / sentiment
Credit Cards & Payments
• Card transactions
• Customer data
• Online activity
• Demographic / census data
• Marketing / CRM
• Integration with retailers / loyalty
• Social / sentiment
Investment Banking
• Trade data
• Customer data
• Web logs
• Research / publications
• Market data
• Communications / documentation
Insurance
• Claims / policy data
• Customer data
• Demographic / census data
• Weather data
• Vehicle telemetry
• Video / surveillance
• Sensors
• Internet of Things
Services & SROs
• Trade data
• Communications / documentation
• Market data
• Research / publications
• Surveys
- 19. Customer Spotlight: Allstate
Combining 80+ years of data across all business units & all 50 states.
Challenge: Data silos spread across a company with 80+ years' history
• Analysis on 1 state takes 24 hours
• Can't analyze all 50 states at once
Solution: Universal data archive on Cloudera
• Supports storage, ETL, applied math
Benefit: Holistic analysis on all 50 states in 16 hours
• 75X faster performance
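The slide does not say how the 75X figure was derived. One reading that reproduces it, shown below as a sketch, assumes the old process would have needed one 24-hour run per state, executed one after another, and compares that with the single 16-hour run over all 50 states.

```python
# Slide figures.
HOURS_PER_STATE_BEFORE = 24   # "Analysis on 1 state takes 24 hours"
STATES = 50
HOURS_ALL_STATES_AFTER = 16   # "all 50 states in 16 hours"

# Assumption: before, states would be analyzed sequentially, one 24-hour run each.
hours_before = HOURS_PER_STATE_BEFORE * STATES
speedup = hours_before / HOURS_ALL_STATES_AFTER

print(f"{hours_before} h -> {HOURS_ALL_STATES_AFTER} h: {speedup:.0f}x faster")
```

Under that assumption the comparison is 1,200 hours against 16 hours, which matches the stated 75X.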