Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL matters Barcelona 2014


Ted Dunning – Very High Bandwidth Time Series Database Implementation

This talk describes our work creating time series databases with very high ingest rates (over 100 million points / second) on very small clusters. Starting with OpenTSDB and the off-the-shelf version of MapR-DB, we were able to accelerate ingest by >1000x. I will describe our techniques in detail and talk about the architectural changes required. We are also working to allow access to OpenTSDB data using SQL via Apache Drill. In addition, I will talk about how this work bears on the much-fabled Internet of Things, and tell some stories about the origins of open source big data in the 19th century at sea.



1. Title slide: Very High Bandwidth Time Series Database Implementation (NoSQL matters Barcelona 2014)
2. Agenda
• The Internet is turning upside down
• Distributed nervous system
• The last (mile) shall be first
• Time series on NoSQL
• Faster time series on NoSQL
3. How the Internet Works
• Big content servers feed data across the backbone to
• Regional caches and servers feed data across neighborhood transport to
• The “last mile”
• Bits are nearly conserved, $ are concentrated centrally
– But total $ mass at the edge is much higher
4. How The Internet Works
(diagram: server → caches → gateways, switches, and firewalls → clients c1, c2)
5. Conservation of Bits Decreases Bandwidth
(same network diagram)
6. Total Investment Dominated by Last Mile
(same network diagram)
7. The Rub
• What's the problem?
– Speed (end-to-end latency, backbone bandwidth)
– Feasibility (cost of consumer links)
– Caching
• What do we need?
– Cheap last-mile hardware
– Good caches
8. What has changed? Where will it lead?
9–18. (image-only slides, no text to transcribe)
19. Things
20. Emitting data
21. How The Internet Works
(same network diagram as slide 4)
22. How the Internet is Going to Work
(diagram: server and caches now reach controllers, switches, and gateways serving machines m1–m6 at the edge)
23. Where Will The $ Go?
(same edge-device diagram)
24. Sensors
25. Controllers
26. The Problems
• Sensors and controllers have little processing power or space
– SIM cards: 20 MHz processor, 128 kb (= 16 kB) of space
– Arduino Mini: 15 kB RAM (more EEPROM)
– BeagleBone / Raspberry Pi: ~500 MB RAM
• Sensors and controllers have little power
– Very common to power down 99% of the time
• Sensors and controllers often have very low bandwidth
– Mesh networks with base rates << 1 Mb/s
– Power-line networking
– Intermittent 3G/4G/LTE connectivity
27. What Do We Need to Do With a Time Series?
• Acquire
– Measurement, transmission, reception
– Mostly not our problem
• Store
– We own this
• Retrieve
– We have to allow this
• Analyze and visualize
– We facilitate this via retrieval
28. Retrieval Requirements
• Retrieve by time series, time range, tags
– Possibly pull millions of data points at a time
– Possibly do on-the-fly windowed aggregations (sketched below)
• Search by unstructured data
– Typically requires time-windowed faceting after search
– Also need to dive in with the first kind of retrieval
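A minimal sketch of the kind of on-the-fly windowed aggregation meant here, assuming samples arrive sorted by time (which a time-range scan naturally provides); the Sample class and window parameter are illustrative, not part of OpenTSDB's API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sample type; OpenTSDB's own data point classes differ.
class Sample {
    final long timestampMs;
    final double value;
    Sample(long timestampMs, double value) { this.timestampMs = timestampMs; this.value = value; }
}

public class WindowedMean {
    // Averages time-sorted samples into fixed windows of windowMs milliseconds.
    static List<double[]> windowedMean(List<Sample> samples, long windowMs) {
        List<double[]> out = new ArrayList<>();   // {windowStart, mean} pairs
        long windowStart = Long.MIN_VALUE;
        double sum = 0;
        int n = 0;
        for (Sample s : samples) {
            long w = s.timestampMs - s.timestampMs % windowMs;
            if (w != windowStart && n > 0) {      // crossed a window boundary
                out.add(new double[]{windowStart, sum / n});
                sum = 0;
                n = 0;
            }
            windowStart = w;
            sum += s.value;
            n++;
        }
        if (n > 0) {
            out.add(new double[]{windowStart, sum / n});
        }
        return out;
    }
}
```

One streaming pass like this avoids materializing millions of raw points in the client when only the aggregates are wanted.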
29. Storage Choices and Trade-offs
• Flat files
– Great for rapid ingest of massive data
– Handle essentially any data type
– Less good for data requiring frequent updates
– Harder to find specific ranges
• Traditional relational DB
– Ingests up to ~10,000 rows/sec; prefers well-structured (numerical) data; expensive
• Non-relational DB: tables (such as MapR tables in M7, or HBase)
– Ingests up to ~100,000 rows/sec
– Handles a wide variety of data
– Good for frequent updates
– Easily scanned over a range (see the scan sketch below)
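A sketch of why "easily scanned over a range" matters, using the 2014-era HBase client API; the table name "tsdb" and the (seriesId, baseTime) key layout are assumptions for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "tsdb");  // assumed table name

        // Row keys sort as (seriesId, baseTime), so one day of one series
        // is a single contiguous key range rather than a full-table filter.
        byte[] start = Bytes.add(Bytes.toBytes(42), Bytes.toBytes(1400000000L));
        byte[] stop  = Bytes.add(Bytes.toBytes(42), Bytes.toBytes(1400086400L));

        ResultScanner scanner = table.getScanner(new Scan(start, stop));
        for (Result row : scanner) {
            // each Result carries all columns (points) stored under one row key
        }
        scanner.close();
        table.close();
    }
}
```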
30. Specific Example
• Consider a server farm
• Lots of system metrics
• Typically 100–300 stats / 30 s
• Loads, RPCs, packets, requests/s
• Common to have 100–10,000 machines
31. The General Outline
• 10 samples / second / machine × 1,000 machines = 10,000 samples / second
• This is what OpenTSDB was designed to handle
• Install and go, but don't test at scale
32. Specific Example
• Consider oil drilling rigs
• When drilling wells, there are *lots* of moving parts
• Typically a drilling rig makes about 10K samples/s
• Temperatures, pressures, magnetics, machine vibration levels, salinity, voltage, currents, many others
• A typical project has 100 rigs
33. The General Outline
• 10K samples / second / rig × 100 rigs = 1M samples / second
34. The General Outline
• 10K samples / second / rig × 100 rigs = 1M samples / second
• But wait, there's more
– Suppose you want to test your system
– Perhaps with a year of data
– And you want to load that data in << 1 year
• 100× real time = 100M samples / second (worked through in the snippet below)
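The same arithmetic as a tiny runnable check; the numbers come straight from the slides, nothing else is assumed:

```java
public class ReplayRate {
    public static void main(String[] args) {
        long samplesPerRig = 10_000;           // samples / second / rig
        long rigs = 100;
        long liveRate = samplesPerRig * rigs;  // 1,000,000 samples / second live

        long speedup = 100;                    // replay history at 100x real time
        long replayRate = liveRate * speedup;  // 100,000,000 samples / second
        double replayDays = 365.0 / speedup;   // a year of data in ~3.65 days

        System.out.printf("live %d/s, replay %d/s, done in %.2f days%n",
                liveRate, replayRate, replayDays);
    }
}
```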
35. How Does That Work? (OpenTSDB on MapR)
(diagram: samples → message queue → collector → MapR table → web service → users)
36. Introduction to OpenTSDB
(architecture diagram; runs on HBase or MapR-DB)
37. Wide Table Design: Point-by-Point
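The slide's table image isn't reproduced here, but the point-by-point idea in OpenTSDB's published schema is one row per series per hour with one column per sample. A simplified sketch; real OpenTSDB packs metric and tag UIDs plus value flags into these bytes more compactly:

```java
import java.nio.ByteBuffer;

public class PointLayout {
    // Row key: series id plus the start of the hour the sample falls in,
    // so all of one hour's points land in one wide row.
    static byte[] rowKey(int seriesId, long epochSeconds) {
        long baseTime = epochSeconds - epochSeconds % 3600;
        return ByteBuffer.allocate(12).putInt(seriesId).putLong(baseTime).array();
    }

    // Column qualifier: offset in seconds into the hour (0..3599),
    // so columns within the row sort in time order.
    static byte[] qualifier(long epochSeconds) {
        return ByteBuffer.allocate(2).putShort((short) (epochSeconds % 3600)).array();
    }
}
```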
38. Wide Table Design: Hybrid Point-by-Point + Blob
• Inserting the data as a blob makes the original columns redundant
• This is the way that TSD should work, not quite how it does work
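A sketch of the hybrid step under the same simplified layout: fold one row's (offset, value) points into a single blob column, after which the per-point columns are redundant and can be deleted. The blob encoding here is an assumption, not OpenTSDB's exact format:

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.SortedMap;

public class BlobCompactor {
    // Packs one hour of points into a single value: 2 bytes of offset
    // followed by 8 bytes of sample, repeated, in time order.
    static byte[] toBlob(SortedMap<Short, Double> pointsByOffset) {
        ByteBuffer buf = ByteBuffer.allocate(pointsByOffset.size() * 10);
        for (Map.Entry<Short, Double> e : pointsByOffset.entrySet()) {
            buf.putShort(e.getKey());     // seconds into the hour
            buf.putDouble(e.getValue());  // sample value
        }
        // The caller writes this under one fixed qualifier, then deletes
        // the individual point columns that the blob now replaces.
        return buf.array();
    }
}
```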
39. Speeding up OpenTSDB
• 20,000 data points per second per node in the test cluster
• Why can't it be faster?
40. Status to This Point
• Each sample requires one insertion; compaction requires another
• Typical performance on SE cluster
– 1 edge node + 4 cluster nodes
– 20,000 samples per second observed
– Would be faster on a performance cluster, but possibly not by a lot
• Suitable for server monitoring
• Not suitable for large-scale history ingestion
• Bulk load helps a little, but not much
• Still 1000× too slow for industrial work
41. Small Trick … Buffer Data in Memory
(diagram: samples → message queue → collector with local log → MapR table → web service → users)
• Buffering data for 1 hour in the collector allows a >1000× decrease in insertion rate
• Logging the latest hour of data allows clean restart of the collector (lambda + epsilon architecture)
• The web service queries both the database and the collector (sketched below)
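A sketch of the buffering trick; appendToLog and flushBlob are hypothetical hooks standing in for the restart log and the table write. The point is that 3600 one-second samples collapse into a single insertion:

```java
import java.util.ArrayList;
import java.util.List;

public class BufferingCollector {
    private final List<double[]> buffer = new ArrayList<>();  // {time, value} pairs
    private long currentHour = -1;

    synchronized void add(long epochSeconds, double value) {
        appendToLog(epochSeconds, value);        // log first: enables clean restart
        long hour = epochSeconds / 3600;
        if (hour != currentHour && !buffer.isEmpty()) {
            flushBlob(currentHour, buffer);      // one table insertion per hour
            buffer.clear();
        }
        currentHour = hour;
        buffer.add(new double[]{epochSeconds, value});
    }

    // Placeholder: append to a local log so the latest hour survives a crash.
    private void appendToLog(long epochSeconds, double value) { }

    // Placeholder: write the buffered hour as one blob row (see slide 38).
    private void flushBlob(long hour, List<double[]> points) { }
}
```

Because the latest hour lives only in the collector until it is flushed, queries must consult both the table and the collector's buffer, which is exactly what the slide's diagram shows.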
42. Speeding up OpenTSDB: Open-Source MapR Extensions
• Available on GitHub: https://github.com/mapr-demos/opentsdb
43. Status to This Point
• 3600 samples require one insertion
• Typical results on SE cluster
– 1 edge node + 4 cluster nodes
– 14 million samples per second observed
– ~700× faster ingestion
• Typical results on performance cluster
– 2–4 edge nodes + 4–9 cluster nodes
– 110 million samples/s (4 nodes) to >200 million samples/s (8 nodes)
• Suitable for large-scale history ingestion
• 30 million data points retrieved in 20 s
• Ready for industrial work
44. Key Lessons
• Ingestion is network-limited
– Edge nodes are the critical resource
– The number of edge nodes sets the limit on scaling
• With enough edge nodes, scaling is near perfect
• Performance of raw OpenTSDB is limited by the stateless daemon
• Modified OpenTSDB can run 1000× faster
45. Overall Ingestion Rate
(chart: total ingestion rate, in millions of points / second, vs. cluster size at 4, 5, 8, and 9 nodes)
46. Normalized Ingestion Rate
(chart: ingestion per node, in millions of points / second, vs. cluster size at 4, 5, 8, and 9 nodes)
47. Why MapR?
• MapR tables are inherently faster and safer
– Sustained >1 GB/s ingest rate in tests
• Mirror to an M5 or M7 cluster to isolate analytics load
• Transaction logging involves frequent appends across many files
48. When Is This All Wrong?
• In some cases, retrieval by series-id + time range is not sufficient
• May need very flexible retrieval of events based on text-like criteria
• Search may then be better than a classic time-series database (see the sketch below)
• Lucene-based search can scale to >1 million events / second
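For the search-shaped case, a sketch of pushing events into Lucene; the field names and index path are illustrative, and this uses a recent Lucene API (6+) rather than the 2014-era one:

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class EventIndexer {
    public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/event-index")),   // illustrative path
                new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            // Indexed timestamp supports time-windowed faceting after search.
            doc.add(new LongPoint("ts", System.currentTimeMillis()));
            // Free-text body supports the text-like retrieval criteria.
            doc.add(new TextField("body",
                    "pressure spike on rig 12 pump 7", Field.Store.YES));
            writer.addDocument(doc);
        }
    }
}
```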
49. Summary
• The Internet is turning upside down
• This will make time series ubiquitous
• Current open source systems are much too slow
• We can fix that with modern NoSQL systems
– (I wear a red hat for a reason)
50. Questions
51. Thank You
Ted Dunning, Chief Application Architect, MapR Technologies
tdunning@mapr.com / tdunning@apache.org
@mapr · maprtech · mapr-technologies
