Hadoop Summit San Diego Feb2013
1. Hadoop Use Cases at Salesforce.com
Narayan Bharadwaj
Director, Product Management, Monitoring & Big Data
Salesforce.com
@nadubharadwaj
2. Safe harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include, but are not limited to, risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
3. Agenda
• Technology
• Big Data use cases
• Use case discussion
• Q&A
4. Got “Cloud Data”?
• 130k customers
• 1 billion transactions/day
• Millions of users
• Terabytes/day
7. Phoenix
“We put the SQL back in NoSQL”
• SQL layer on HBase
• Seamless application integration
– Standard JDBC interface
– DDL statement support
• Low query latency
– SQL query → Multiple HBase scans
– Co-processors, custom filters
– Milliseconds for small queries
– Seconds for tens of millions of rows
• https://github.com/forcedotcom/phoenix
8. Contributions
• @pRaShAnT1784: Prashant Kommireddi
• Lars Hofhansl
• @thefutureian: Ian Varley
10. Big Data Use Cases
• User behavior analysis
• Product Metrics
• Capacity planning
• Monitoring intelligence
• Query Runtime Prediction
• Collections
• Early Warning System
• Collaborative Filtering
• Search Relevancy
Categories: Internal App, Product feature
12. Product Metrics – Problem Statement
• Track feature usage/adoption across 130k+ customers
– E.g.: Accounts, Contacts, Visualforce, Apex, …
• Track standard metrics across all features
– E.g.: #Requests, #UniqueOrgs, #UniqueUsers, AvgResponseTime, …
• Track features and metrics across all channels
– API, UI, Mobile
• Primary audience: Executives, Product Managers
13. Product Metrics Pipeline
[Diagram components: User Input (Page Layout); Collaboration (Chatter); Reports, Dashboards; Workflow; Formula Fields; Feature Metrics (Custom Object); Trend Metrics (Custom Object); API; Client Machine; Java Program; Pig script generator; Workflow; Log Pull; Hadoop; Log Files]
18. Problem Statement
§ How do we reduce the number of clicks on the user interface?
§ What are the top user click path sequences?
§ What are the user clusters/personas?
• Approach:
• Markov transition for click path, D3.js visuals
• K-means (unsupervised) clustering for user groups
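The "Markov transition for click path" idea can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: it assumes a first-order Markov model, and the page names and click paths are hypothetical. The function counts page-to-page transitions and turns each row of counts into probabilities.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from click paths."""
    counts = defaultdict(Counter)
    for path in sessions:
        # Each consecutive pair of pages is one observed transition.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Hypothetical click paths; the page names are made up for illustration.
sessions = [
    ["Home", "Accounts", "Contacts"],
    ["Home", "Accounts", "Reports"],
    ["Home", "Reports"],
]
probs = transition_probabilities(sessions)
print(probs["Home"])
```

Sorting each row by probability gives the most likely next click from a page, which is one way to surface the top click path sequences.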
25. We found this relationship using item-to-item collaborative filtering
• Amazon published this algorithm in 2003.
– Amazon.com Recommendations: Item-to-Item Collaborative Filtering, by Gregory Linden, Brent Smith, and Jeremy York. IEEE Internet Computing, January-February 2003.
• At Salesforce, we adapted this algorithm for Hadoop, and we use it to recommend files to view and users to follow.
26. Example: CF on 5 files
• Vision Statement
• Annual Report
• Dilbert Comic
• Darth Vader Cartoon
• Disk Usage Report
27. View History Table
                 Annual   Vision      Dilbert   Darth Vader   Disk Usage
                 Report   Statement   Cartoon   Cartoon       Report
Miranda (CEO)    1        1           1         0             0
Bob (CFO)        1        1           1         0             0
Susan (Sales)    0        1           1         1             0
Chun (Sales)     0        0           1         1             0
Alice (IT)       0        0           1         1             1
28. Relationships between the files
[Figure: graph with one node per file: Annual Report, Vision Statement, Darth Vader Cartoon, Dilbert Cartoon, Disk Usage Report]
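29. Relationships between the files
[Figure: the same graph, with each edge labeled by the number of users who viewed both files]
Annual Report / Vision Statement: 2
Annual Report / Dilbert Cartoon: 2
Annual Report / Darth Vader Cartoon: 0
Annual Report / Disk Usage Report: 0
Vision Statement / Dilbert Cartoon: 3
Vision Statement / Darth Vader Cartoon: 1
Vision Statement / Disk Usage Report: 0
Dilbert Cartoon / Darth Vader Cartoon: 3
Dilbert Cartoon / Disk Usage Report: 1
Darth Vader Cartoon / Disk Usage Report: 1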
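The relationship tallies on this slide are co-view counts: for every user, each pair of files that user viewed gets one vote. A minimal single-machine sketch, using the view history from the example (the function name `relationship_tallies` is an illustrative choice, not from the talk):

```python
from collections import Counter
from itertools import combinations

# View history from the example: which files each user has viewed.
views = {
    "Miranda": ["Annual Report", "Vision Statement", "Dilbert Cartoon"],
    "Bob": ["Annual Report", "Vision Statement", "Dilbert Cartoon"],
    "Susan": ["Vision Statement", "Dilbert Cartoon", "Darth Vader Cartoon"],
    "Chun": ["Dilbert Cartoon", "Darth Vader Cartoon"],
    "Alice": ["Dilbert Cartoon", "Darth Vader Cartoon", "Disk Usage Report"],
}


def relationship_tallies(views):
    """Count, for each pair of files, how many users viewed both."""
    tallies = Counter()
    for files in views.values():
        # Sort so each unordered pair always maps to the same key.
        for pair in combinations(sorted(set(files)), 2):
            tallies[pair] += 1
    return tallies


tallies = relationship_tallies(views)
for pair, count in sorted(tallies.items()):
    print(pair, count)
```

Pairs that no user viewed together never appear as keys; a `Counter` reports them as 0 on lookup.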
30. Sorted relationships for each file
Annual Report: Dilbert (2), Vision Stmt. (2)
Vision Statement: Dilbert (3), Annual Rpt. (2), Darth Vader (1)
Dilbert Cartoon: Vision Stmt. (3), Darth Vader (3), Annual Rpt. (2), Disk Usage (1)
Darth Vader Cartoon: Dilbert (3), Vision Stmt. (1), Disk Usage (1)
Disk Usage Report: Dilbert (1), Darth Vader (1)
The popularity problem: notice that Dilbert appears first in every other file's list. This is probably not what we want.
The solution: divide the relationship tallies by file popularities.
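31. Normalized relationships between the files
[Figure: the same graph, with each edge labeled by the tally divided by the file popularities]
Annual Report / Vision Statement: .82
Annual Report / Dilbert Cartoon: .63
Annual Report / Darth Vader Cartoon: 0
Annual Report / Disk Usage Report: 0
Vision Statement / Dilbert Cartoon: .77
Vision Statement / Darth Vader Cartoon: .33
Vision Statement / Disk Usage Report: 0
Dilbert Cartoon / Darth Vader Cartoon: .77
Dilbert Cartoon / Disk Usage Report: .45
Darth Vader Cartoon / Disk Usage Report: .58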
32. Sorted relationships for each file, normalized by file popularities
Annual Report: Vision Stmt. (.82), Dilbert (.63)
Vision Statement: Annual Report (.82), Dilbert (.77), Darth Vader (.33)
Dilbert Cartoon: Darth Vader (.77), Vision Stmt. (.77), Annual Report (.63), Disk Usage (.45)
Darth Vader Cartoon: Dilbert (.77), Disk Usage (.58), Vision Stmt. (.33)
Disk Usage Report: Darth Vader (.58), Dilbert (.45)
High relationship tallies AND similar popularity values now drive closeness.
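The normalized lists can be reproduced with a short single-machine sketch. It assumes that "divide by file popularities" means dividing each tally by the square root of the product of the two popularities (the cosine form), which matches the scores in the example; the short file names and variable names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# View history from the example (short file names used for brevity).
views = {
    "Miranda": {"Annual", "Vision", "Dilbert"},
    "Bob": {"Annual", "Vision", "Dilbert"},
    "Susan": {"Vision", "Dilbert", "Vader"},
    "Chun": {"Dilbert", "Vader"},
    "Alice": {"Dilbert", "Vader", "Disk"},
}

# Popularity: number of users who viewed each file.
popularity = Counter(f for files in views.values() for f in files)
# Tally: number of users who viewed both files of a pair.
tallies = Counter(
    pair for files in views.values() for pair in combinations(sorted(files), 2)
)

similar = {f: [] for f in popularity}
for (a, b), tally in tallies.items():
    # Assumed normalization: tally / sqrt(popularity(a) * popularity(b)).
    score = tally / sqrt(popularity[a] * popularity[b])
    similar[a].append((b, score))
    similar[b].append((a, score))
for ranked in similar.values():
    ranked.sort(key=lambda item: item[1], reverse=True)

for f, ranked in similar.items():
    print(f, [(other, round(s, 2)) for other, s in ranked])
```

After normalization, Dilbert no longer leads every list: for the Disk Usage Report, Darth Vader now ranks above Dilbert.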
33. The item-to-item CF algorithm
1) Compute file popularities
2) Compute relationship tallies and divide by file popularities
3) Sort and store the results
34. MapReduce Overview
Map → Shuffle → Reduce
(adapted from http://code.google.com/p/mapreduce-framework/wiki/MapReduce)
35. 1. Compute File Popularities
<user, file>
Inverse identity map
<file, List<user>>
Reduce
<file, (user count)>
Result is a table of (file, popularity) pairs that you store in the Hadoop distributed cache.
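To make the data flow of this step concrete, here is a small in-memory simulation of map, shuffle, and reduce in plain Python. It is a sketch only: it does not use Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative. The input records come from the view history example.

```python
from collections import defaultdict

# <user, file> records from the view history example.
records = [
    ("Miranda", "Annual"), ("Miranda", "Vision"), ("Miranda", "Dilbert"),
    ("Bob", "Annual"), ("Bob", "Vision"), ("Bob", "Dilbert"),
    ("Susan", "Vision"), ("Susan", "Dilbert"), ("Susan", "Vader"),
    ("Chun", "Dilbert"), ("Chun", "Vader"),
    ("Alice", "Dilbert"), ("Alice", "Vader"), ("Alice", "Disk"),
]


def map_phase(records):
    """Inverse identity map: swap <user, file> into <file, user>."""
    for user, file in records:
        yield file, user


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce <file, List<user>> to <file, user count>."""
    return {file: len(set(users)) for file, users in grouped.items()}


popularity = reduce_phase(shuffle(map_phase(records)))
print(popularity)
```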
39. 2b. Tally the relationship votes - just a word count, where each relationship occurrence is a word
<(file1, file2), Integer(1)>
Identity map
<(file1, file2), List<Integer(1)>>
Reduce: count and divide by popularities
<file1, (file2, similarity score)>, <file2, (file1, similarity score)>
Note that we emit each result twice, once for each file that belongs to a relationship.
40. Example 2b: the Dilbert/Darth Vader relationship
<(Dilbert, Vader), Integer(1)>, <(Dilbert, Vader), Integer(1)>, <(Dilbert, Vader), Integer(1)>
Identity map
<(Dilbert, Vader), {1, 1, 1}>
Reduce: count and divide by popularities
<Dilbert, (Vader, sqrt(3/5))>, <Vader, (Dilbert, sqrt(3/5))>
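The reduce step of 2b can be sketched in plain Python for the Dilbert/Darth Vader pair. The popularities (5 and 3) are the view counts from the view history table. Dividing by the square root of their product is an assumption, chosen because it reproduces the sqrt(3/5) score on this slide; the function names are illustrative and no Hadoop API is used.

```python
from collections import defaultdict
from math import isclose, sqrt

# Popularities from step 1: Dilbert was viewed by 5 users, Darth Vader by 3.
popularity = {"Dilbert": 5, "Vader": 3}

# One <(file1, file2), 1> record per user who viewed both files.
occurrences = [(("Dilbert", "Vader"), 1)] * 3


def shuffle(pairs):
    """Group the 1s by relationship key (the map itself is an identity)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_pair(pair, ones, popularity):
    """Count the votes, normalize, and emit the result once per file."""
    file1, file2 = pair
    # Assumed normalization: count / sqrt(popularity(file1) * popularity(file2)).
    score = sum(ones) / sqrt(popularity[file1] * popularity[file2])
    return [(file1, (file2, score)), (file2, (file1, score))]


results = []
for pair, ones in shuffle(occurrences).items():
    results.extend(reduce_pair(pair, ones, popularity))
print(results)
```

Emitting the score under both files is what lets the final step build a complete "similar files" list for each file with a simple group-by on the key.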
41. 3. Sort and store results
<file1, (file2, similarity score)>
Identity map
<file1, List<(file2, similarity score)>>
Reduce
<file1, {top n similar files}>
Store the results in your location of choice
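The final reduce can be sketched as a group-by followed by a top-n selection. The sketch below is plain Python rather than a Hadoop job, the function name `top_n_similar` is illustrative, and the input scores are the rounded values from the normalized example.

```python
import heapq
from collections import defaultdict


def top_n_similar(records, n):
    """Group <file1, (file2, score)> records by file1 and keep the n best."""
    grouped = defaultdict(list)
    for file1, (file2, score) in records:
        grouped[file1].append((file2, score))
    return {
        file1: heapq.nlargest(n, candidates, key=lambda item: item[1])
        for file1, candidates in grouped.items()
    }


# Rounded similarity scores taken from the normalized example.
records = [
    ("Vision", ("Vader", 0.33)),
    ("Vision", ("Annual", 0.82)),
    ("Vision", ("Dilbert", 0.77)),
    ("Disk", ("Dilbert", 0.45)),
    ("Disk", ("Vader", 0.58)),
]
top = top_n_similar(records, 2)
print(top)
```

Using `heapq.nlargest` avoids sorting the full candidate list when only a few recommendations per file are kept.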
43. Appendix
• Cosine formula and normalization trick to avoid the distributed cache
$$\cos\theta_{AB} = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{A}{\|A\|} \cdot \frac{B}{\|B\|}$$
• Mahout has CF
• Asymptotic order of the algorithm is O(M*N^2) in worst case, but is helped by sparsity.
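One reading of the normalization trick, offered here as an interpretation rather than the talk's stated method: if each file's view vector is scaled to unit length up front, the plain dot product of two vectors already equals the cosine, so the reduce step no longer needs a popularity lookup. The sketch below checks both sides of the cosine formula on the Dilbert and Darth Vader columns of the view history table; the helper names are illustrative.

```python
from math import isclose, sqrt

# Columns of the view history table (rows: Miranda, Bob, Susan, Chun, Alice).
dilbert = [1, 1, 1, 1, 1]
vader = [0, 0, 1, 1, 1]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return sqrt(dot(a, a))


def unit(a):
    """Scale a vector to length 1."""
    n = norm(a)
    return [x / n for x in a]


# Left form: divide the dot product by both norms afterwards.
direct = dot(dilbert, vader) / (norm(dilbert) * norm(vader))
# Right form: normalize first, then take a plain dot product.
prenormalized = dot(unit(dilbert), unit(vader))
print(round(direct, 2), round(prenormalized, 2))
```

For 0/1 view vectors the squared norm is the file's popularity, which is why dividing the tally by the square root of the product of popularities gives the same score.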