5. Who Am I?
Member of Netflix's Platform Engineering team, working on very large-scale data infrastructure (@g9yuayon)
Built and operated Netflix's cloud crypto service
Worked with Jae Bae on querying multi-dimensional data in real time
Friday, March 1, 2013
7. No Monitoring Metrics Today
Developers usually think of monitoring metrics when "real-time" data is mentioned. We have powerful monitoring systems that track millions of metrics per second, but I'm not going to talk about them today. Monitoring metrics are crucial data, and they would warrant a separate multi-hour talk by our monitoring team. :-)
12. Highly Reliable Data Pipeline
[Diagram: Server Farms → Log Collectors (Log Filter → Sink Plugin) → Hadoop, Kafka, Druid, ElasticSearch]
Photo credit: http://www.flickr.com/photos/decade_null/142235888/sizes/m/in/photostream/
We have tens of thousands of machines, all of which send log data over a robust data pipeline to highly reliable data collectors. The collectors then filter and transform the data and dispatch it to different destinations for further processing.
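The collector flow described above (filter, transform, dispatch through sink plugins) can be modeled roughly as follows. This is a minimal, hypothetical Python sketch, not Netflix's implementation: the names `ListSink`, `Route`, and `Collector` are illustrative assumptions, each route pairs one log filter with one sink plugin as the diagram suggests, and the in-memory sinks stand in for real writers to Hadoop, Kafka, Druid, or ElasticSearch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical log event: a flat mapping of field names to values.
LogEvent = Dict[str, object]


@dataclass
class ListSink:
    """Stand-in sink plugin that buffers events in memory.

    Assumption: a real sink would forward to Hadoop, Kafka, Druid,
    or ElasticSearch instead of appending to a list.
    """
    name: str
    events: List[LogEvent] = field(default_factory=list)

    def write(self, event: LogEvent) -> None:
        self.events.append(event)


@dataclass
class Route:
    """One destination: a log filter paired with a sink plugin."""
    keep: Callable[[LogEvent], bool]
    sink: ListSink


@dataclass
class Collector:
    """Transforms each incoming event once, then dispatches it to every
    route whose filter accepts it."""
    transform: Callable[[LogEvent], LogEvent]
    routes: List[Route] = field(default_factory=list)

    def receive(self, event: LogEvent) -> None:
        shaped = self.transform(event)
        for route in self.routes:
            if route.keep(shaped):
                route.sink.write(shaped)


# Usage: archive everything to "hadoop", send only server errors to "druid".
hadoop, druid = ListSink("hadoop"), ListSink("druid")
collector = Collector(
    transform=lambda e: {**e, "Server": str(e["Server"]).lower()},
    routes=[
        Route(keep=lambda e: True, sink=hadoop),
        Route(keep=lambda e: int(e["StatusCode"]) >= 500, sink=druid),
    ],
)
collector.receive({"Client": "API", "Server": "Cryptex", "StatusCode": 200})
collector.receive({"Client": "API", "Server": "Cryptex", "StatusCode": 503})
```

Giving each destination its own filter lets one collector feed a full archive (Hadoop) and a narrower real-time stream (Druid) from the same input without duplicating the transform step.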
13. A Humble Beginning
We didn't build everything overnight. We had a humble start: I did a lot of log scraping like this, and I also used R to analyze logs. But these were specific tasks, and at some point…
19. [Diagram: a rapidly growing number of applications]
Something happened. Our traffic grew like a hockey stick, and the number of applications exploded. Log traffic exploded with it, and simple log scraping wouldn't cut it anymore.
22. So We Evolved
hgrep -C 10 -k 5,2,3 'users.*[1-9]{3}' *catalina.out s3://bucket
So we evolved. One thing we built was a Hadoop grep, which searches terabytes of data. It is much more useful than the one provided by the Apache Hadoop distribution, because it supports many more grep options, such as context lines and sorting by columns. And DSE's Hadoop-as-a-service greatly helps each team.
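hgrep is an internal tool, so its exact flag semantics are not documented here. As a rough, local illustration of the two options the notes mention, the sketch below returns matching lines with N lines of surrounding context (as GNU `grep -C N` does) and sorts lines by whitespace-separated columns. It assumes, hypothetically, that `-k 5,2,3` means "sort by column 5, then 2, then 3" (1-based), and it works on an in-memory list of lines rather than terabytes of data on Hadoop.

```python
import re
from typing import List, Sequence, Tuple


def grep_with_context(lines: Sequence[str], pattern: str, context: int) -> List[str]:
    """Return each matching line plus `context` lines before and after it,
    merging overlapping windows and preserving the original order."""
    regex = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))
    return [lines[i] for i in sorted(keep)]


def sort_by_columns(lines: Sequence[str], keys: Sequence[int]) -> List[str]:
    """Sort lines by the given 1-based whitespace-separated columns, in
    priority order. Comparison is lexicographic on strings; a missing
    column sorts as the empty string."""
    def key(line: str) -> Tuple[str, ...]:
        cols = line.split()
        return tuple(cols[k - 1] if k <= len(cols) else "" for k in keys)
    return sorted(lines, key=key)
```

For example, `sort_by_columns(grep_with_context(log_lines, r"users.*[1-9]{3}", 10), [5, 2, 3])` approximates the command on the slide for a hypothetical list `log_lines`. Numeric columns such as response times would need an `int` key instead of string comparison.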
24.
We also developed a tool that searches the logs of live instances.
30.
Field Name     Field Value
Client         "API"
Server         "Cryptex"
StatusCode     200
ResponseTime   73
Hive became indispensable.
35. Still, We Have a Real-Time Itch
So we built yet another tool to scratch it with the help of Druid.
36.
An error summary for the past 10 seconds. You can slice and dice through arbitrary combinations of dimensions across multiple time series.
Trends in searches for "90210" by Canadians
How many people started streaming any episode of House of Cards in the past hour, grouped
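"Slicing and dicing" here means filtering events on some dimensions, grouping by others, and aggregating within a time window. The sketch below shows that query shape in memory over records like the sample log event on slide 30 (Client, Server, StatusCode), plus a hypothetical `timestamp` field in epoch seconds. It illustrates the semantics only and is not how Druid executes queries; the function name `slice_and_dice` and its parameters are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Tuple

Event = Mapping[str, object]


def slice_and_dice(
    events: Iterable[Event],
    now: float,
    window_seconds: float,
    group_by: Tuple[str, ...],
    where: Optional[Mapping[str, object]] = None,
) -> Dict[Tuple[object, ...], int]:
    """Count events whose timestamp falls in (now - window_seconds, now],
    keep only those matching every exact-match `where` condition, and
    group the counts by the `group_by` dimensions."""
    where = where or {}
    counts: Counter = Counter()
    for e in events:
        ts = float(e["timestamp"])  # hypothetical epoch-seconds field
        if not (now - window_seconds < ts <= now):
            continue  # outside the time window
        if any(e.get(dim) != value for dim, value in where.items()):
            continue  # fails a filter condition
        counts[tuple(e.get(dim) for dim in group_by)] += 1
    return dict(counts)
```

An "error summary in the past 10 seconds" would then be a call such as `slice_and_dice(events, now, 10, ("Server", "StatusCode"))`. This loop scans every raw event; a store like Druid avoids that by indexing dimension columns, which is what makes answers come back at interactive speed.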
39.
A query of all the users who started streaming House of Cards in the past three hours; the results came back in seconds.
43. See You Tomorrow
If you're interested in how we did real-time interactive queries with the help of Druid, do come to our talk. See you tomorrow!