How do you run analytics queries on a dataset that grows by billions of entries a day? What if you need to be able to drill into it, filter it or aggregate it by any available dimension? On demand, without precomputing, and with sub second latency? Oh and this isn't for an internal dashboard. The interface is customer-facing and is going to be accessed by thousands of clients.
In this talk, I'll tell you how Criteo, one of the biggest ad tech firms in the world, uses Druid to build its new analytics platform.
Druid is an open-source, real-time data store designed to power interactive applications at scale. I’ll walk you through its architecture, explain how it scales and how data is stored on disk and in memory to serve queries faster than you can blink.
44. SELECT sum(revenue) AS “Revenue",
sum(sales) AS “Sales"
FROM customer-insights
WHERE client_id = 2255 AND date BETWEEN "2014-08-01" AND "2014-08-08"
GROUP BY day(date)
58. Iphone Google 0.35€08:12:00
Android Yahoo 0.2€08:12:00
Iphone Google 0.1€08:12:37
Android Yahoo 0.2€08:12:38
Iphone Google 0.15€08:12:39
Iphone Google 0.1€08:12:40
59. Iphone Google 0.1€08:12:37
Android Yahoo 0.2€08:12:38
Iphone Google 0.15€08:13:02
Iphone Google 0.1€08:13:08
bob@mail.com
joe@mail.com
bob@mail.com
tony@mail.com
60.
61.
62.
63.
64. Iphone Google Computer 0.1€08:12:37
Android Yahoo Cloth 0.2€08:12:38
Iphone Google Computer 0.1€08:12:37
Android Yahoo Cloth 0.2€08:12:38