1. A Journey In The Public Clouds
With Datadog
Alexis Lê-Quôc (Product Guy) at Datadog
IASA New York Chapter
June 28th, 2011
2. What I’m going to talk about
‣What we do and for whom
‣The kind of data we deal with
‣Our architecture
‣Our architecture in a public cloud (AWS)
‣What we learned
‣Q+A
4. The Mess
Too many data streams, too many silos
Too many choices to make, too often
Only getting worse as SaaS silos multiply
Separate Dev and Ops teams, looking at separate data streams
[Diagram: Dev team and Ops team surrounded by Usage Analytics, IAAS / PAAS, Issue Resolution, Servers and Devices, Applications, Cap. Planning, SDLC support, Monitoring, Hosting, Asset Mgmt and CDNs, connected by arrows labeled metrics, choices, changes, events, billing, insights, and advice + feedback]
Data-Driven decision making in IT is rarely happening.
Too slow, Too expensive, requires too much discipline.
5. We Simplify
Datadog to the rescue
Inputs: system metrics, quality metrics, SaaS data, capacity metrics, usage analytics, cloud billing, code metrics, config changes, IaaS pricing, business metrics, perf. data, vendors info, curated metadata
Outputs: key metrics, visibility, recommendations
Audience: Alice (Dev), Bob (Ops), Charlie (CEO)
Aggregation Correlation Collaboration
15. CLASSICS
ACID: Atomicity, Consistency, Isolation, Durability (e.g. SQL DBs)
BASE: Basically Available, Soft-state, Eventual consistency (e.g. DNS)
http://en.wikipedia.org/wiki/Eventual_consistency
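To make the BASE side concrete, here is a minimal sketch of eventual consistency using last-write-wins replicas. The `Replica` class, the metric name and the timestamps are all hypothetical and only illustrate the idea; this is not how DNS or any particular database implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a key-value store; each value carries its write timestamp."""
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last write wins: keep the entry with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy pass: pull every entry the other replica holds.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("host1.cpu", 0.42, timestamp=1)
b.write("host1.cpu", 0.57, timestamp=2)

stale = a.read("host1.cpu")  # replica a has not seen the newer write yet

a.sync_from(b)
b.sync_from(a)
converged = (a.read("host1.cpu"), b.read("host1.cpu"))
```

Before the sync, a reader of replica `a` sees a stale value (soft state); after both replicas exchange entries they agree on the newest write.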
16. NEW COMER
DIRT: Data-Intensive Real-Time (e.g. real-time web)
Bryan Cantrill: http://dtrace.org/resources/bmc/DIRT.pdf
17. Aggregation
Constant data influx
Large data sets
Correlation
On-demand visualization
Background data analysis
Collaboration
Real-time updates
On-the-fly data analysis
22. Aggregation (BASE, DIRT)
Constant data influx
Large data sets
Correlation (BASE)
On-demand visualization
Background data analysis
Collaboration (DIRT)
Real-time updates
On-the-fly data analysis
Datadog = DIRT + BASE + a tiny bit of ACID
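As an illustration of the aggregation workload (a constant influx of points that must be reduced to something queryable), here is a minimal sketch that averages raw points into fixed-width time buckets. The function name `roll_up`, the 60-second bucket and the sample points are assumptions made for the example, not a description of Datadog's pipeline.

```python
from collections import defaultdict


def roll_up(points, bucket_seconds=60):
    """Average (timestamp, value) points into fixed-width time buckets."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
    for timestamp, value in points:
        bucket = int(timestamp // bucket_seconds) * bucket_seconds
        sums[bucket][0] += value
        sums[bucket][1] += 1
    return {bucket: total / count for bucket, (total, count) in sorted(sums.items())}


# Hypothetical samples: (seconds since start, metric value)
points = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0), (120, 9.0)]
rolled = roll_up(points)
```

Five raw points collapse into three one-minute averages, which is the kind of reduction that keeps on-demand visualization cheap even as the raw data set grows.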
23. How It All Fits Together
http://www.flickr.com/photos/tom-margie/1253798184/
34. ON-PREMISE TRAITS
Compute: Fast, Inelastic
Network: Fast, Localized
Storage: Fast, Centralized, Redundant
Management: People-based, Full access
http://www.flickr.com/photos/theplanetdotcom/4879419788/sizes/l/in/photostream/
46. Storage
[Chart: latency vs. capacity, placing Amazon S3 (BASE), Apache Cassandra (BASE), PostgreSQL (ACID) and Redis (DIRT); annotations: latency, jittery, limited throughput, low memory]
60. Network Block Storage Is The Dark Side
Bait For Enterprise Customers
Hard Problem For Cloud Providers
61. Some Dos And Don’ts
‣Don’t rely on networked block storage
  Small data sets only if you have to
‣Don’t trust data-at-rest
  Copy, replicate, back up
‣Do use S3 if you can
  Object semantics a limitation
  Slow but durable
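One way to act on "don't trust data-at-rest" and "copy, replicate, back up" is to record a checksum when data is written and verify it before using any copy. The sketch below uses local files as a stand-in for whatever storage is in play; the helper names (`sha256_of`, `replicate`, `first_intact_copy`) and file names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, destinations):
    """Copy source to each destination; return the checksum taken at write time."""
    expected = sha256_of(source)
    for destination in destinations:
        shutil.copyfile(source, destination)
    return expected


def first_intact_copy(paths, expected):
    """Return the first copy whose checksum still matches, or None."""
    for path in paths:
        if os.path.exists(path) and sha256_of(path) == expected:
            return path
    return None


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "metrics.db")
with open(source, "wb") as handle:
    handle.write(b"cpu=0.42\n" * 1000)

copies = [os.path.join(workdir, f"copy{i}.db") for i in range(2)]
checksum = replicate(source, copies)

# Simulate silent corruption of the first copy while it sits at rest.
with open(copies[0], "wb") as handle:
    handle.write(b"garbage")

good = first_intact_copy(copies, checksum)
```

The corrupted copy is skipped and the second replica is used instead, which only works because more than one copy exists and the checksum was stored separately from the data.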
63. Compute
[Chart: “Performance” vs. number of nodes. ACID nodes: scale up, shard. BASE and DIRT nodes: add more nodes.]
64. Some Dos And Don’ts
‣Don’t rely on scale-ups
  Low memory a hard limit for DBs
  Noisy neighbors
  Individual performance poor and jittery
‣Scale out
  First scale up
  Then shard
  Parallelize across machines
  Vector-processing via GPUs
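The "then shard" step can be sketched as hash-based partitioning: every key is mapped deterministically to one of N shards, so work spreads across machines. The function names, the shard count of 4 and the key format below are assumptions for illustration only.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(keys, shard_count):
    """Group keys into shard_count lists according to shard_for."""
    shards = [[] for _ in range(shard_count)]
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


# Hypothetical metric keys spread over four shards.
hosts = [f"host{i}.cpu" for i in range(1000)]
shards = partition(hosts, 4)
```

Plain modulo sharding like this remaps most keys when the shard count changes; consistent hashing is a common alternative when nodes are added or removed often.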
66. An API for everything
Compute
Storage
Network
Management
67. Some Dos And Don’ts
‣Do use the AWS APIs
  Almost like magic
  Rich libraries
  Ever expanding
‣Do use tools
  e.g. Chef, Puppet, cfengine, etc.
  Datadog
‣Do Kill and Respawn
  Low-level debugging impossible
  Instance creation is cheap
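"Kill and respawn" can be sketched as a loop that replaces any unhealthy instance instead of debugging it. To keep the example runnable without credentials, it uses `FakeCloud`, a hypothetical in-memory stand-in for a provider API; it is not a real AWS client, and the method names are invented for the example.

```python
import itertools


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider's API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}  # instance id -> healthy flag

    def launch(self):
        instance_id = f"i-{next(self._ids):04d}"
        self.instances[instance_id] = True
        return instance_id

    def terminate(self, instance_id):
        del self.instances[instance_id]

    def is_healthy(self, instance_id):
        return self.instances[instance_id]


def kill_and_respawn(cloud, fleet):
    """Replace every unhealthy instance with a fresh one; return the new fleet."""
    new_fleet = []
    for instance_id in fleet:
        if cloud.is_healthy(instance_id):
            new_fleet.append(instance_id)
        else:
            cloud.terminate(instance_id)
            new_fleet.append(cloud.launch())
    return new_fleet


cloud = FakeCloud()
fleet = [cloud.launch() for _ in range(3)]
cloud.instances[fleet[1]] = False  # simulate one instance going bad

fleet_after = kill_and_respawn(cloud, fleet)
```

The fleet size stays constant while the bad instance is swapped out. With a real provider the same loop would call its API, and a configuration tool such as Chef or Puppet would bring the new instance to the desired state.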