Path to 400M Members: LinkedIn’s Data Powered Journey
1. Xin Fu, Carl Steinbach
Hadoop Summit
Tokyo, October 26, 2016
Path to 400M* Members: LinkedIn’s Data Powered Journey
* As of Q2 2016, LinkedIn had 450M members worldwide
7. What is This Phase Comprised of?
● Dashboards
● Reports
● Trend explanation
○ Short-term fluctuation: investigation
○ Long-term trend: strategic analysis
8. Past Challenges
Reliability
● Pipelines broke easily without operational support, and maintenance consumed a huge amount of time
Diverse technology
● Self maintained pipelines
● Various UIs with different visualization capabilities
● Redundant computation
9. Standardized Reporting Tool
● Reduces dependency on 3rd party BI tools
● Closer integration with LinkedIn’s ecosystem of experimentation and anomaly detection solutions
10. Towards Real Time Monitoring
[Figure: sign-up monitoring, broken down by dimension]
● Country
● Platform
● Language
● Browser
● Signup Type
● OS
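A minimal sketch of slicing sign-ups by the dimensions listed on this slide. The event records, field names, and the `count_by` helper are hypothetical and only illustrate the idea; they are not LinkedIn's monitoring pipeline.

```python
from collections import Counter

# Hypothetical sign-up events; the keys mirror some of the slide's dimensions.
signups = [
    {"country": "JP", "platform": "mobile", "os": "iOS"},
    {"country": "JP", "platform": "desktop", "os": "Windows"},
    {"country": "US", "platform": "mobile", "os": "Android"},
]


def count_by(events, dimension):
    """Count events per distinct value of one dimension."""
    return Counter(event[dimension] for event in events)


by_country = count_by(signups, "country")
by_platform = count_by(signups, "platform")
```

In a real-time setting the same counts would be maintained incrementally per time window rather than recomputed over a list.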
12. What is This Phase Comprised of?
● Experiment design
● Experiment analysis to inform ramp decisions
● Learning from multiple experiments to identify what works and what doesn’t
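As one illustration of experiment analysis informing a ramp decision, the sketch below compares conversion rates of a control and a treatment group with a two-proportion z-test. The talk does not say which statistical test LinkedIn uses, so the choice of test, the function name, and the sample numbers are all assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100/1000 conversions in control, 150/1000 in treatment.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
```

A ramp decision would normally weigh several metrics at once, not a single test like this one.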
13. Past Challenges
Experiment design
● Interaction between experiments
Experiment analysis and ramp decision
● Manual analysis, extended time-to-decision
● Ramp decisions based on localized metrics
● Reruns sometimes needed due to undetected errors in setup
Worst of all, some ramps happened without A/B testing
● e.g. infrastructural changes
14. Experimentation Platform @ LinkedIn
● Company-wide platform for A/B testing, ramping, and advanced targeting needs
● Automated reporting and analysis capabilities
15. Tiering of Metrics
Metrics at different tiers have:
● Different review processes
● Different levels of visibility in dashboards and experiment scorecards
● Different computation priorities and SLAs in data pipelines
● Different life cycles
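The tiering idea can be sketched as a small lookup table. Every tier number, review label, SLA value, and metric name below is hypothetical; the slide does not give the real tier definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    review: str          # who must approve changes to the metric definition
    on_scorecard: bool   # whether the metric appears on experiment scorecards
    sla_hours: int       # pipeline deadline for computing the metric


# Hypothetical policies: lower tier number means higher priority.
TIER_POLICIES = {
    1: TierPolicy(review="central review", on_scorecard=True, sla_hours=6),
    2: TierPolicy(review="team review", on_scorecard=True, sla_hours=24),
    3: TierPolicy(review="owner only", on_scorecard=False, sla_hours=72),
}

# Hypothetical assignment of metrics to tiers.
METRIC_TIERS = {"signups": 1, "profile_views": 2, "debug_counter": 3}


def policy_for(metric):
    """Look up the policy that applies to a metric via its tier."""
    return TIER_POLICIES[METRIC_TIERS[metric]]


def schedule(metrics):
    """Order metrics for computation, highest-priority tier first."""
    return sorted(metrics, key=lambda m: METRIC_TIERS[m])
```

For example, `schedule(["debug_counter", "signups"])` places `signups` first because it sits in the higher-priority tier.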
18. Tracking Data Lifecycle and Teams
● Product teams: PMs, Developers, TestEng
● Infra teams: Hadoop, Kafka, DWH, ...
● Data teams: Analytics, Relevance Engineers, ...
19. Example: How Do We Track a Profile View?
PageViewEvent
Record 1:
{
  "header" : {
    "memberId" : 12345,
    "time" : 1454745292951,
    "appName" : {
      "string" : "LinkedIn"
    },
    "pageKey" : "profile_page"
  },
  "trackingInfo" : {
    "vieweeID" : "23456",
    ...
  }
}
pageViews = LOAD '/data/tracking/PageViewEvent';
profileViews = FILTER pageViews BY header.pageKey == 'profile_page';
20. Example: How Do We Track a Profile View?
PageViewEvent
Record 1:
{
  "header" : {
    "memberId" : 12345,
    "time" : 1454745292951,
    "appName" : {
      "string" : "LinkedIn"
    },
    "pageKey" : "new_profile_page"
  },
  "trackingInfo" : {
    "vieweeID" : "23456",
    ...
  }
}
pageViews = LOAD '/data/tracking/PageViewEvent';
profileViews = FILTER pageViews BY header.pageKey == 'profile_page' OR
    header.pageKey == 'new_profile_page';
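The two filters above can be compared side by side in a small Python sketch. The sample events are hypothetical (including the `feed_page` key and the extra member IDs); the point is only that a consumer still using the original filter silently undercounts once producers start emitting `new_profile_page`.

```python
# Hypothetical page view events, reduced to the fields the filter needs.
events = [
    {"header": {"memberId": 12345, "pageKey": "profile_page"}},
    {"header": {"memberId": 12346, "pageKey": "new_profile_page"}},
    {"header": {"memberId": 12347, "pageKey": "feed_page"}},
]


def profile_views_old(events):
    """Original filter: only knows the old page key."""
    return [e for e in events if e["header"]["pageKey"] == "profile_page"]


# Every consumer has to learn about each new key the producers introduce.
PROFILE_PAGE_KEYS = {"profile_page", "new_profile_page"}


def profile_views_new(events):
    """Updated filter: accepts both the old and the new page key."""
    return [e for e in events if e["header"]["pageKey"] in PROFILE_PAGE_KEYS]


old_count = len(profile_views_old(events))
new_count = len(profile_views_new(events))
```

Each consumer script that hard-codes the key has to be found and patched the same way, which is the maintenance burden the following slides address.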
22. How Do We Handle Old and New?
Producers Consumers
23. DALI: A Data Access Layer for LinkedIn
We had been working on something that could help...
Abstract away underlying physical details to allow users to focus solely on logical concerns
● Logical Tables + Views
● Logical FileSystem
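To make the idea of a logical view concrete, here is a toy sketch in which consumers read a named view instead of filtering the physical event data themselves. This is not DALI's actual API: the registry, the `view` decorator, `read_view`, the view name, and the output field names are all hypothetical.

```python
from typing import Callable, Dict, List

# Stand-in for physical storage, keyed by path (sample rows are hypothetical).
PHYSICAL_DATA: Dict[str, List[dict]] = {
    "/data/tracking/PageViewEvent": [
        {"header": {"memberId": 12345, "pageKey": "profile_page"},
         "trackingInfo": {"vieweeID": "23456"}},
        {"header": {"memberId": 12346, "pageKey": "new_profile_page"},
         "trackingInfo": {"vieweeID": "23456"}},
        {"header": {"memberId": 12347, "pageKey": "feed_page"},
         "trackingInfo": {}},
    ],
}

# Registry of logical views: name -> function producing the view's rows.
VIEWS: Dict[str, Callable[[], List[dict]]] = {}


def view(name: str):
    """Register a function as the definition of a logical view."""
    def register(fn):
        VIEWS[name] = fn
        return fn
    return register


@view("profile_views")
def _profile_views() -> List[dict]:
    # The page-key details live here, in one place owned by the producer.
    keys = {"profile_page", "new_profile_page"}
    rows = PHYSICAL_DATA["/data/tracking/PageViewEvent"]
    return [
        {"viewerId": r["header"]["memberId"],
         "vieweeId": r["trackingInfo"]["vieweeID"]}
        for r in rows
        if r["header"]["pageKey"] in keys
    ]


def read_view(name: str) -> List[dict]:
    """Consumer entry point: read by logical name, not by path or page key."""
    return VIEWS[name]()


rows = read_view("profile_views")
```

When a producer introduces another page key, only the view definition changes; callers of `read_view("profile_views")` stay untouched.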
24. DALI: Implementation Details in Context
● Data Catalog + Discovery (DALI)
● DALI CLI
● Dataflow APIs (MR, Spark, Scalding)
● Query Layers (Hive, Pig, Spark)
● DALI Datasets (Tables + Views)
● View Defs + UDFs (Artifactory, Git)
● Processing Engine (MapReduce, Spark, Presto)
● DaliFileSystem Client
● Data Source (HDFS)
● Data Sink (HDFS)
26. State of the World Today with Dali
~ 100 producer views
~ 200 consumer views
~ 80 unique tracking event data sources
What’s next?
● Views on streaming data
● Selective materialization and caching
● Open source
31. Interesting Challenges
- Metric trade-offs, e.g. engagement vs. monetization
- Real-time everything?
- A/B testing in a social network
- Human judges for personalized search
- Value of an action
32. It Took a Village
Thanks to all the Data Scientists, Engineers and Product partners at
LinkedIn for being part of this great journey!
https://engineering.linkedin.com/data