Collecting and Making Sense of Diverse Data at WayUp
1. Collecting and Making Sense of Diverse Data at WayUp
Harlan D. Harris, PhD
Director of Data Science
DataEngConf 2017
Thanks to:
JJ Fliegelman (CTO)
WayUp Engineers!
2. Why we built WayUp...
The leading digital platform for employers to reach, recruit, and engage candidates in an authentic way.
… with the focus on college students and recent grads.
One of thirty innovative companies changing the world.
4. Talking about Choices
● Where we focus effort
○ Event Collection & Data Refinement
● Tech stack
○ Segment, Redshift, dbt, Periscope
● Warehouse table design
○ ELT, layers & abstractions
● We’re Hiring!
6. Why We Warehouse
Support Business Analytics and Product (Data Science)
● Clean, Normalized Tables
● Abstract over Changes in Systems
● Right Type of Domain Knowledge
Data Reflects the World
Decisions & Products Reflect the World
8. Event Tracking
● Heap approach
○ Developers don’t make choices
○ Automatically get every load and click
○ UI changes can lose continuity
● Traditional approach
○ Developers choose what to track
○ Can miss stuff -- requires communication!
○ Can keep semantic continuity across changes
○ Less lock-in
“Actions with Meaning”
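The traditional approach can be sketched as a small helper that names events by what the user did, not by which UI element was clicked. This is a minimal illustration only: the `track` function, the event catalogue, and the property keys are all hypothetical, and none of it is WayUp's tracking code or Segment's API.

```python
from datetime import datetime, timezone

# Hypothetical catalogue of semantic event names, agreed between
# developers and analysts ahead of time.
ALLOWED_EVENTS = {"Listing Viewed", "Application Submitted"}


def track(user_id, event, properties=None, now=None):
    """Build one tracking record; reject names outside the agreed catalogue."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    ts = now or datetime.now(timezone.utc)
    return {
        "user_id": user_id,
        "event": event,
        "properties": dict(properties or {}),
        "timestamp": ts.isoformat(),
    }


# The same event name survives a redesign; only the properties change.
old_ui = track("alice", "Listing Viewed", {"source": "search_results"})
new_ui = track("alice", "Listing Viewed", {"source": "recommendation_card"})
```

Rejecting unknown names is one way to force the communication the slide mentions: a new event has to be added to the catalogue before it can be sent.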
9. Redshift and Spectrum
● Value of familiarity, broad support
● Sweet spot in scale, room to grow
● Spectrum
○ External tables on S3 CSV
○ Query and join like internal tables
○ Avoid or delay loading until needed
○ Use Transform tools to load
10. The ELT Pattern
● “Data Lake” in columnar database
● Piped in via Segment data loader
● Transform on-database vs. in-transit
● Requires compute power, but space is cheap
● Can be more agile, “schema on read”
“most data transformation use cases can be much more effectively handled in-database rather than in some external processing layer” -dbt
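A rough sketch of the ELT idea: load raw rows as they arrive, then transform them with SQL inside the database. Python's built-in `sqlite3` stands in for the warehouse here purely so the example runs anywhere; the table names, column names, and sample rows are invented for illustration and are not taken from the talk.

```python
import sqlite3

# sqlite3 is a stand-in for the warehouse; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows land untouched, inconsistencies and all.
conn.execute("CREATE TABLE raw_tracks (user_id TEXT, event TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO raw_tracks VALUES (?, ?, ?)",
    [
        ("alice", " Listing Viewed ", "2017-11-02T10:00:00"),
        ("alice", "listing viewed", "2017-11-02T10:05:00"),
        ("bob", "Application Submitted", "2017-11-02T11:00:00"),
    ],
)

# Transform: a view inside the database normalizes event names at read time,
# so the raw table never has to be rewritten when the rules change.
conn.execute(
    """
    CREATE VIEW clean_tracks AS
    SELECT user_id,
           lower(trim(event)) AS event,
           ts
    FROM raw_tracks
    """
)

counts = dict(
    conn.execute(
        "SELECT event, count(*) FROM clean_tracks GROUP BY event"
    ).fetchall()
)
```

Because the cleanup lives in a view, changing the normalization rule means redefining the view, not reloading the data.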
13. Dimension Tables and Activity Streams
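[Diagram: user tables and an activity stream]
● Tables: hist_user, now_user, dim_user, act_user
● Activity stream columns: actor, ts, action, object, ob_type, properties
○ Example row: Alice, Nov 2nd, viewed, sales-123, listing, { pos: 3 }
● Fact table with specific, consistent structure (see WeWork talk!)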
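The activity-stream shape from the slide can be sketched as a record type with the same six columns. The class name, the helper function, and the year in the timestamp are assumptions made for illustration; the slide only gives "Nov 2nd".

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Activity:
    """One row of an activity stream: who did what to which thing, and when."""
    actor: str
    ts: datetime
    action: str
    object: str      # identifier of the thing acted on
    ob_type: str     # what kind of thing the object is
    properties: dict = field(default_factory=dict)  # free-form extras


# The example row from the slide; the year is assumed.
stream = [
    Activity("Alice", datetime(2017, 11, 2), "viewed", "sales-123",
             "listing", {"pos": 3}),
]


def actions_by_type(rows):
    """Count rows per (action, ob_type) pair."""
    return Counter((r.action, r.ob_type) for r in rows)
```

Keeping every event in one consistent shape means a single query or helper like `actions_by_type` works across all event kinds, with anything event-specific pushed into `properties`.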
14. What We’ve Learned; Where We’re Going
● Pay close attention to what you store, and how you refine data
● Tools now are amazing
● Design with empathy and creativity
● Grow it with the business!
● Build insights & products to help our users and customers!
15. Thanks!
We’re Hiring!
● Data Scientist (Recommender Systems)
● Data Engineer (this stuff!)
● FS & BE Engineers (Python)
harlan@wayup.com, @harlanh