Snowplow: where we came from and where we are going - March 2016


A look back at the history of the Snowplow platform, and a look forward at the key areas of product development on our roadmap


  1. Where we came from and where we’re going, March 2016
  2. Snowplow was born in 2012
     • Web data is rich, but GA / SiteCatalyst are limited: marketing, not product analytics; silo’d, so it can’t be joined with other customer data
     • “Big data” tech: open source frameworks, cloud services
     • Snowplow: an open source clickstream data warehouse, event level (any query), built on top of CloudFront / EMR / Hadoop
  3. The plan: spend 6 months building a pipeline… then get back to using the data
  4. So what went wrong?
  5. Increased project scope
     • Clickstream data warehouse -> event analytics platform
     • Collect events from anywhere, not just the web
     • Make event data actionable in real time
     • Support more in-pipeline processing steps (enrichment and modeling)
     • Support more storage targets (where your data is has big implications for what you can do with that data)
  6. Track events from anywhere
     • Events
     • Entities
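The events/entities split on the slide above can be made concrete. Snowplow represents both as self-describing JSON: a payload bundled with a reference to the Iglu schema that validates it. A minimal sketch, in which the vendor (`com.acme`) and schema names (`add_to_basket`, `user`) are illustrative, not real published schemas:

```python
# A self-describing event: the "schema" URI names the Iglu schema
# that the "data" payload must validate against.
event = {
    "schema": "iglu:com.acme/add_to_basket/jsonschema/1-0-0",
    "data": {"sku": "SP-001", "quantity": 2},
}

# Entities (also called contexts) describe the things involved in the
# event, e.g. the user or the product, and can be attached to any event.
entities = [
    {
        "schema": "iglu:com.acme/user/jsonschema/1-0-0",
        "data": {"user_id": "u-42", "plan": "pro"},
    },
]
```

Because entities are defined independently of events, the same `user` entity can ride along with a page view, a purchase, or a mobile screen view without redefining it each time.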
  7. Make event data actionable in real time
     • Personalization
     • Marketing automation
     • Content analytics
  8. Today, Snowplow is an event data pipeline
  9. What makes Snowplow special?
     • Data pipeline evolves with your business
     • Channel coverage
     • Flexibility: where your data is delivered
     • Flexibility: how your data is processed (enrichment and modeling)
     • Data quality
     • Speed
     • Transparency
  10. Used by 100s (1000s?) of companies… to answer their most important business questions
  11. But there’s still much more to build!
     • Improve automation around schema evolution (Iglu: machine-readable schema registry)
     • Make modeling event data easier, more robust, and more performant (data modeling in Spark)
     • Support more storage targets (Druid, BigQuery, graph databases)
     • Make it easier to act on event data (Analytics SDKs, Sauna)
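The machine-readable registry mentioned above works because every schema has a structured address. Iglu schema URIs take the form `iglu:vendor/name/format/version`, which a resolver splits apart to look the schema up. A small sketch of that parsing step (the `SchemaKey` name and error messages are my own, not Iglu's API):

```python
from typing import NamedTuple


class SchemaKey(NamedTuple):
    """The four components of an Iglu schema URI."""
    vendor: str
    name: str
    format: str
    version: str


def parse_iglu_uri(uri: str) -> SchemaKey:
    """Split e.g. "iglu:com.acme/user/jsonschema/1-0-0" into its parts."""
    if not uri.startswith("iglu:"):
        raise ValueError(f"not an Iglu URI: {uri}")
    parts = uri[len("iglu:"):].split("/")
    if len(parts) != 4:
        raise ValueError(f"expected vendor/name/format/version: {uri}")
    return SchemaKey(*parts)


key = parse_iglu_uri("iglu:com.acme/add_to_basket/jsonschema/1-0-0")
```

Keeping the version in the URI is what enables the automated schema evolution on the roadmap: a pipeline can tell at a glance which revision of a schema each incoming event claims to conform to.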
  12. Questions? Happy to take questions now, or after the other talks