The Most Valuable Customer on Earth-1298: Comic Book Analysis with Oracle's Big Data Tools
- 2. Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
The Most Valuable Customer on Earth-1298
Comic Book Analysis With Oracle’s Big Data Tools
Dan McClary
Big Data Product Management
Oracle
Oracle Confidential – Internal/Restricted/Highly Restricted
- 3.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
- 4.
• 50% of this talk is about Oracle & Hadoop
• 50% of this talk is about comic books
- 5.
This is a Vendor Talk…
• Does Oracle really use Hadoop?
– Yes
• Does Oracle build anything using Hadoop?
– Also, yes
• Does anyone actually use Oracle’s solutions for Hadoop?
– Definitely
- 6.
Does Oracle Really Use Hadoop?
• Oracle Global Support Services
– 300TB Hadoop cluster
– Every HW failure from every Oracle Engineered System
– System configs, diagnostics and service history
• Today
– Predict HW batch failures early
• Tomorrow
– Promote a better service experience
– Minimize customer downtime and save money
- 7.
Oracle Big Data Discovery: Built on
(Figure labels: Explore, Transform, Discover, Find)
- 8.
#StrataHadoop - Oracle Big Data Architecture
(Figure labels: Oracle Data Integrator: Polyglot ELT Code Generation; Logs; OLTP DB; API/File; Flume; SQOOP; OGG; Kafka; Hive on MR, Tez, Spark; Pig on MR, Tez, Spark; Spark; Hive/HCat, HDFS, HBase; NoSQL; Any DW; OEMM: Metadata Mgmt & Lineage)
- 9.
Oracle Big Data SQL
Unified SQL access to all enterprise data
(Figure label: NoSQL)
- 10.
Fine, but does anyone use this stuff?
- 11.
Hadoop and Oracle Success Stories
• Spain’s largest bank
– Mainframe downsizing
– Modernizing data models
• West Africa’s largest telephone company
– Save 35,000 call minutes/day
– Analyze network traffic 40x faster
• The world’s largest and most profitable CPG company
– Opportunities in Upstream research, Manufacturing, Supply Chain
– Questions answered in 3 days instead of 4 weeks
- 12.
Now let’s talk about comics!
- 13.
Who’s the most important Marvel Hero?
- 14.
There Are Lots of Answers to That
• Answers from Aggregation
– Who has the most appearances?
– Who has the most series?
– Who has the most movie appearances, toys, etc.?
• Answers from Connectivity
– Who’s most central to teams?
– Who has the strongest cross-overs?
– How important is the hero’s community?
Tabular questions: Well-suited to SQL-like tools
Graph questions: We need something different!
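A tabular question such as “Who has the most appearances?” reduces to a GROUP BY with a COUNT. The sketch below assumes a hypothetical list of (hero, issue) appearance rows with invented issue ids; it is plain Python standing in for a SQL-like tool, not an Oracle API.

```python
from collections import Counter

# Hypothetical appearance records: (hero, issue id). All data is invented for illustration.
APPEARANCES = [
    ("Iron Man", "issue-1"),
    ("Captain America", "issue-1"),
    ("Thor", "issue-1"),
    ("Thor", "issue-1"),  # duplicate row, removed by set() below
    ("Iron Man", "issue-2"),
    ("Iron Man", "issue-3"),
    ("Captain America", "issue-3"),
]


def most_appearances(rows, top_n=3):
    """Rough equivalent of:
    SELECT hero, COUNT(DISTINCT issue) FROM appearances GROUP BY hero ORDER BY 2 DESC
    """
    counts = Counter(hero for hero, _issue in set(rows))
    # Sort by count (descending), then by name, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


print(most_appearances(APPEARANCES))
# [('Iron Man', 3), ('Captain America', 2), ('Thor', 1)]
```

Questions from connectivity ("Who has the strongest cross-overs?") do not fit this row-at-a-time shape, which is the point of the slide.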
- 15.
Oracle Big Data Graph
• Massively-Scalable Graph Database
– Scales to trillions of edges
– Apache HBase
– Oracle NoSQL Database
• In-Memory Graph Analytics
– More than 30 graph analysis algorithms
• Simple interfaces
– Java
– TinkerPop: Blueprints, Gremlin, Rexster
– Python
(Figure labels: Detecting Components and Communities; Ranking/Walking; Evaluating Communities; Path-Finding)
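Path-finding is one of the graph analytics named on this slide. As a minimal illustration (plain breadth-first search, not the Oracle Big Data Graph API), the sketch below returns one shortest chain of co-appearances between two heroes in an unweighted graph. The graph and its edges are hypothetical.

```python
from collections import deque

# Hypothetical undirected graph: hero -> set of heroes they have appeared with.
GRAPH = {
    "Vibro": {"Moon Knight"},
    "Moon Knight": {"Vibro", "Captain America"},
    "Captain America": {"Moon Knight", "Iron Man"},
    "Iron Man": {"Captain America", "Thor"},
    "Thor": {"Iron Man"},
    "Carbon": set(),  # no recorded co-appearances
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # predecessor links; also serves as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in sorted(graph[vertex]):  # sorted for reproducible output
            if neighbour in previous:
                continue
            previous[neighbour] = vertex
            if neighbour == goal:
                # Walk the predecessor links back to the start, then reverse.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


print(shortest_path(GRAPH, "Vibro", "Thor"))
# ['Vibro', 'Moon Knight', 'Captain America', 'Iron Man', 'Thor']
```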
- 16.
So How Do Super Heroes Become a Graph?
(Figure labels: Hulk, Thor, Ant-Man, Wasp, Iron Man, Cpt. Amer)
• Each hero is a vertex
• Each joint appearance is an edge
• The original Avengers are fully connected
• Hulk leaves and Captain America joins
• No longer fully connected
• This graph gets complex
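The modelling rules above (heroes as vertices, joint appearances as edges) fit in a few lines. This sketch assumes a hypothetical mapping from invented issue ids to the set of heroes in each issue. As an added assumption, repeated joint appearances are counted as an edge weight, which is one way to score the “strongest cross-overs” question.

```python
from itertools import combinations

ORIGINAL_LINEUP = {"Hulk", "Thor", "Ant-Man", "Wasp", "Iron Man"}
# Hulk leaves and Captain America joins.
LATER_LINEUP = (ORIGINAL_LINEUP - {"Hulk"}) | {"Captain America"}

# Hypothetical issues (ids are invented): which heroes appear together.
ISSUES = {
    "issue-1": ORIGINAL_LINEUP,
    "issue-2": LATER_LINEUP,
}


def build_graph(issues):
    """Return an undirected weighted graph: {hero: {other_hero: joint_appearances}}."""
    graph = {}
    for heroes in issues.values():
        for hero in heroes:
            graph.setdefault(hero, {})  # each hero is a vertex
        for a, b in combinations(sorted(heroes), 2):
            # each joint appearance adds 1 to the weight of edge a-b
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def is_fully_connected(graph):
    """True if every vertex is adjacent to every other vertex (a complete graph)."""
    n = len(graph)
    return all(len(neighbours) == n - 1 for neighbours in graph.values())


print(is_fully_connected(build_graph({"issue-1": ORIGINAL_LINEUP})))  # True
print(is_fully_connected(build_graph(ISSUES)))  # False: Hulk and Captain America never meet
```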
- 17.
- 18.
Are your customers any different from superheroes?
- 19.
Who’s the most important Marvel Hero?
• Captain America
• Iron Man
• Spiderman
• Wolverine
- 20.
Importance as Degree Centrality
• The more edges a vertex has, the higher its degree
• The greater the degree, the more important the vertex is
• This is one way to look at importance
• Is your most connected customer most important?
(Figure labels: Cpt. Amer, Iron Man)
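Degree centrality only needs an adjacency map: count each vertex's distinct neighbours. A minimal sketch, assuming a small hypothetical co-appearance edge list (any edge weights would be ignored here):

```python
# Hypothetical undirected co-appearance edges (invented for illustration).
EDGES = [
    ("Captain America", "Iron Man"),
    ("Captain America", "Thor"),
    ("Captain America", "Wasp"),
    ("Iron Man", "Thor"),
    ("Iron Man", "Wasp"),
    ("Iron Man", "Hulk"),
    ("Thor", "Hulk"),
]


def adjacency(edges):
    """Build {vertex: set(neighbours)} from an undirected edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Degree = number of distinct neighbours of each vertex."""
    return {vertex: len(neighbours) for vertex, neighbours in graph.items()}


def most_connected(graph, top_n=3):
    """Highest-degree vertices first; ties broken alphabetically."""
    degrees = degree_centrality(graph)
    return sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


print(most_connected(adjacency(EDGES)))
# [('Iron Man', 4), ('Captain America', 3), ('Thor', 3)]
```

Many graph libraries normalize degree by n - 1; dividing every degree by the same constant leaves the ranking unchanged.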
- 21.
- 22.
Who’s the most important Marvel Hero?
• Captain America
• Iron Man
• Spiderman
• Wolverine
- 23.
Importance as Page Rank
• Importance can flow through a graph
• A node connected to by important nodes is also important
• This is importance as a measure of
– Trust
– Prominence
• Thinking about customers in a graph requires multiple definitions of importance
(Figure labels: Rocket Racer, Cpt. Amer, Iron Man)
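The idea that importance flows along edges is what PageRank computes. Below is a minimal power-iteration sketch for an undirected graph, assuming the commonly used damping factor of 0.85 and a hypothetical toy graph. This is textbook PageRank, not the implementation shipped in Oracle Big Data Graph.

```python
# Hypothetical undirected co-appearance graph (invented for illustration).
GRAPH = {
    "Captain America": {"Iron Man", "Thor", "Wasp"},
    "Iron Man": {"Captain America", "Thor", "Wasp", "Hulk"},
    "Thor": {"Captain America", "Iron Man", "Hulk"},
    "Wasp": {"Captain America", "Iron Man"},
    "Hulk": {"Iron Man", "Thor"},
}


def pagerank(graph, damping=0.85, iterations=200, tol=1e-12):
    """Power iteration. Each vertex passes damping * rank evenly to its neighbours;
    the remaining (1 - damping) is spread uniformly over all vertices (teleporting)."""
    nodes = sorted(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbours = graph[v]
            if neighbours:
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    new[u] += share
            else:
                # Dangling vertex: redistribute its rank uniformly.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


scores = pagerank(GRAPH)
for hero, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{hero:16s} {score:.3f}")
```

Because each vertex's score depends on its neighbours' scores, a vertex with few edges can still rank well if those edges lead to highly ranked vertices.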
- 24.
- 25.
Who the Heck is Moon Knight?
- 26.
He’s like Batman, but Wears White, and is Dead
• Importance doesn’t have to be a global property
• Most customers are like Moon Knight
– Very important to some
– But get lost behind the Avengers and Galactus
• How can we judge importance relative to a single vertex?
• How can we make suggestions based on a single customer?
- 27.
Personalized Page Rank → Personalized Importance
• Random walks center on a vertex of interest
• Produces a localized measure of importance
• Can be extended into WTF
– Who to Follow
(Figure labels: Hulk, Thor, Ant-Man, Wasp, Iron Man, Cpt. Amer, Moon Knight, Vibro, Carbon)
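Personalized PageRank changes one thing relative to PageRank: every teleport jumps back to the vertex of interest instead of to a random vertex, so the scores measure importance relative to that one vertex. The sketch below assumes a hypothetical graph loosely based on the heroes in the figure, plus a simple “who to follow” rule of our own: recommend the highest-scoring vertices that are not already neighbours. Both the data and the rule are illustrative assumptions.

```python
# Hypothetical undirected graph (edges invented for illustration).
EDGES = [
    ("Iron Man", "Thor"), ("Iron Man", "Wasp"), ("Iron Man", "Captain America"),
    ("Thor", "Wasp"), ("Thor", "Captain America"), ("Wasp", "Captain America"),
    ("Moon Knight", "Captain America"), ("Moon Knight", "Vibro"), ("Vibro", "Carbon"),
]
GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def personalized_pagerank(graph, source, damping=0.85, iterations=200, tol=1e-12):
    """Like PageRank, but the (1 - damping) teleport mass always returns to `source`."""
    nodes = sorted(graph)
    rank = {v: 0.0 for v in nodes}
    rank[source] = 1.0
    for _ in range(iterations):
        new = {v: 0.0 for v in nodes}
        new[source] = 1.0 - damping
        for v in nodes:
            neighbours = graph[v]
            if neighbours:
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    new[u] += share
            else:
                new[source] += damping * rank[v]  # dangling vertex: jump home
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def who_to_follow(graph, source, top_n=2):
    """Hypothetical rule: best-scoring vertices that are not already neighbours of `source`."""
    scores = personalized_pagerank(graph, source)
    candidates = [v for v in graph if v != source and v not in graph[source]]
    return sorted(candidates, key=lambda v: (-scores[v], v))[:top_n]


print(who_to_follow(GRAPH, "Moon Knight", top_n=3))
```

Changing `source` changes every score, which is what makes the measure local rather than global.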
- 28.
- 29.
Communities Matter Too
- 30.
Communities Are Just Special Subgraphs
• Community
– A subgraph in which nodes are more connected to each other than to the rest of the graph
• Communities provoke interesting questions
– How many? How large?
– How do they relate to each other?
• Graph analysis can be performed on a community
– Who’s the most valuable customer in a community?
(Figure labels: Hulk, Thor, Ant-Man, Wasp, Iron Man, Cpt. Amer)
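One common way to find communities is label propagation: every vertex starts with its own label and repeatedly adopts the label most common among its neighbours. The sketch below is a deterministic variant chosen so the toy output is reproducible (fixed visiting order; a vertex keeps its label when that label is tied for most common; remaining ties go to the alphabetically smallest label). The graph is hypothetical, and this is not necessarily the algorithm Oracle Big Data Graph uses. It then answers “who’s the most valuable customer in a community?” using degree counted inside the community only.

```python
from collections import Counter

# Hypothetical graph: two tight groups joined by a single bridge edge.
EDGES = [
    ("Captain America", "Iron Man"), ("Captain America", "Thor"), ("Iron Man", "Thor"),
    ("Iron Man", "Wasp"),
    ("Moon Knight", "Vibro"), ("Moon Knight", "Carbon"), ("Vibro", "Carbon"),
    ("Captain America", "Moon Knight"),  # the bridge
]
GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def label_propagation(graph, max_rounds=100):
    labels = {v: v for v in graph}  # every vertex starts in its own community
    for _ in range(max_rounds):
        changed = False
        for v in sorted(graph):  # fixed order keeps the sketch deterministic
            counts = Counter(labels[u] for u in graph[v])
            if not counts:
                continue  # isolated vertex keeps its own label
            best = max(counts.values())
            tied = {label for label, c in counts.items() if c == best}
            new_label = labels[v] if labels[v] in tied else min(tied)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels


def communities(labels):
    """Group vertices that ended up with the same label."""
    groups = {}
    for vertex, label in labels.items():
        groups.setdefault(label, set()).add(vertex)
    return list(groups.values())


def most_valuable_in(graph, community):
    """Vertex with the most neighbours inside the community (ties: alphabetical)."""
    def inner_degree(v):
        return sum(1 for u in graph[v] if u in community)
    return min(community, key=lambda v: (-inner_degree(v), v))


for group in communities(label_propagation(GRAPH)):
    print(sorted(group), "->", most_valuable_in(GRAPH, group))
```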
- 31.
- 32.
Graph Communities can be Whole Universes
- 33.
Answers From Connectivity Enhance Tabulated Results
• Finding answers from connectivity sometimes requires new tools
– Graph semantics != table semantics
• Merging both types of answers paints a richer picture
– Requires mapping the graph to tables
• Fortunately, this is often how graphs are stored!
– A graph is a sparse matrix
– Or a table of edges and a table of vertices
– Or any other efficient representation of the matrix
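Because a graph can live in two plain tables, blending answers is largely a join. A minimal sketch, assuming hypothetical vertex and edge tables with invented numbers: the edge table is loaded into a dictionary-of-keys sparse structure (only non-zero entries are stored), a graph metric (degree) is computed, and the result is joined back onto the tabular appearance counts.

```python
# Hypothetical tables; all numbers are invented for illustration.
VERTICES = [  # (vertex_id, name, appearances)
    (1, "Captain America", 120),
    (2, "Iron Man", 150),
    (3, "Thor", 90),
    (4, "Moon Knight", 40),
]
EDGES = [  # (src_id, dst_id, joint_appearances)
    (1, 2, 30),
    (1, 3, 12),
    (2, 3, 10),
    (1, 4, 2),
]


def to_sparse_adjacency(edges):
    """Dictionary-of-keys sparse matrix: adjacency[i][j] = weight, stored in both directions."""
    adjacency = {}
    for src, dst, weight in edges:
        adjacency.setdefault(src, {})[dst] = weight
        adjacency.setdefault(dst, {})[src] = weight
    return adjacency


def blended_table(vertices, edges):
    """Join an answer from connectivity (degree) onto an answer from aggregation (appearances)."""
    adjacency = to_sparse_adjacency(edges)
    return [
        (name, appearances, len(adjacency.get(vid, {})))
        for vid, name, appearances in vertices
    ]


for row in blended_table(VERTICES, EDGES):
    print(row)
# ('Captain America', 120, 3)
# ('Iron Man', 150, 2)
# ('Thor', 90, 2)
# ('Moon Knight', 40, 1)
```

With these invented numbers, Iron Man leads on appearances while Captain America leads on degree, which is the kind of disagreement that blending both answers is meant to surface.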
- 34.
- 35.
Wrapping Up
• “Big Data Analytics” sometimes involves blending answers across paradigms
• Answers from Aggregation
– Who has the most appearances?
• Answers from Connectivity
– How important is the community?
• Regardless of implementation, think ahead to integration
- 36.