Developing Data Products
SF Data Science Meetup
Pete Skomoroch @peteskomoroch
September 19, 2013
©2012 LinkedIn Corporation. All Rights Reserved.
Developing Data Products
Examples, Techniques, & Lessons Learned
Our Mission
Connect the world’s professionals to make them
more productive and successful.
Our Vision
Create economic opportunity for every
professional in the world.
Members First!
LinkedIn is the leading professional network site
Worldwide Workforce: 3,300M+
Worldwide Professionals: 640M+
LinkedIn Members: 238M+
LinkedIn profiles represent our professional identity
238M Members, 238M Member Profiles
We have a lot of data.
And (like everyone else), we store it in Hadoop.
And people build awesome things with that data.
What do we mean by data products?
Building products from data at LinkedIn
A few examples:
• People You May Know
• Skills and Endorsements
• Year in Review
• Network Updates Digest
• InMaps
• Who’s viewed my profile
• Collaborative Filtering
• Groups You May Like
• and more…
Collaborative Filtering: LinkedIn Skill Pages
Classification: giving structure to unstructured data
Extract
Clustering & Disambiguation
De-duplication and Normalization
Network Algorithms: Relevance & Ranking
Prediction: Personalized Skill Recommendations
Skill Endorsements: Over 2 Billion and Growing
Social Proof and the Skill Endorsement Graph
The Economic Graph: Skills, Jobs, People, Locations…
Location
Lessons learned developing data products
Collect the right data at the right time
Large amounts of data can reveal new patterns
[Figure: probability of job title vs. time since graduation]
Be wary of “black-box” approaches
Look at your data
Aggregate statistics can be misleading
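A minimal sketch of this point, not taken from the talk: the two lists below are made-up per-member counts (hypothetical data), and `summarize` is an illustrative helper. Both groups share the same mean, yet they look nothing alike once you go past the aggregate.

```python
from statistics import mean, median

# Hypothetical per-member counts for two groups (illustrative data only).
steady = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
skewed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 50]


def summarize(values):
    """Return a few summary statistics for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
    }


steady_summary = summarize(steady)
skewed_summary = summarize(skewed)

# The means match, but the medians and maxima tell different stories.
print("steady:", steady_summary)
print("skewed:", skewed_summary)
```

Reporting only the mean would hide that, in the second group, a single member accounts for all of the activity.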
Build a viewer app, “micro-listen”
Algorithmic intuition: include data geeks in design
OODA: Think like a jet fighter
OODA: Observe, Orient, Decide, Act
OODA: The speed at which you can move determines victory
Red teaming: what can go wrong likely will
Error data is valuable: analyze it and adapt
Conclusion: tips for developing data products
• Collect the right data at the right time
• Large amounts of data can reveal new patterns
• Be wary of “black box” approaches
• Look at your raw data
• Aggregate statistics can be misleading
• Build and use viewer apps
• Include data geeks in design process
• OODA: Think like a jet fighter
• Red-teaming: anticipate edge cases
• Find opportunity in your error data
Questions?
@peteskomoroch
©2012 LinkedIn Corporation. All Rights Reserved. 36

Weitere ähnliche Inhalte

Was ist angesagt?

Bg wesleyan liberal arts to silicon valley oct 2016
Bg wesleyan liberal arts to silicon valley oct 2016Bg wesleyan liberal arts to silicon valley oct 2016
Bg wesleyan liberal arts to silicon valley oct 2016Bhaskar Ghosh
 
Data Infused Product Design and Insights at LinkedIn
Data Infused Product Design and Insights at LinkedInData Infused Product Design and Insights at LinkedIn
Data Infused Product Design and Insights at LinkedInYael Garten
 
The Data Effect: Canadian Big Data & Analytics Update - Dr. Alison Brooks Dir...
The Data Effect: Canadian Big Data & Analytics Update - Dr. Alison Brooks Dir...The Data Effect: Canadian Big Data & Analytics Update - Dr. Alison Brooks Dir...
The Data Effect: Canadian Big Data & Analytics Update - Dr. Alison Brooks Dir...CityAge
 
Become an ai product manager
Become an ai product managerBecome an ai product manager
Become an ai product managerKetan Raval
 
Data Prep - A Key Ingredient for Cloud-based Analytics
Data Prep - A Key Ingredient for Cloud-based AnalyticsData Prep - A Key Ingredient for Cloud-based Analytics
Data Prep - A Key Ingredient for Cloud-based AnalyticsDATAVERSITY
 
Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Edureka!
 
Successfully Kickstarting Data Governance's Social Dynamics: Define, Collabor...
Successfully Kickstarting Data Governance's Social Dynamics: Define, Collabor...Successfully Kickstarting Data Governance's Social Dynamics: Define, Collabor...
Successfully Kickstarting Data Governance's Social Dynamics: Define, Collabor...Stijn (Stan) Christiaens
 
Democratizing Intelligence - Sri Ambati, CEO & Co-Founder, H2O.ai
Democratizing Intelligence - Sri Ambati, CEO & Co-Founder, H2O.aiDemocratizing Intelligence - Sri Ambati, CEO & Co-Founder, H2O.ai
Democratizing Intelligence - Sri Ambati, CEO & Co-Founder, H2O.aiSri Ambati
 
Lessons Learned The Hard Way: 32+ Data Science Interviews
Lessons Learned The Hard Way: 32+ Data Science InterviewsLessons Learned The Hard Way: 32+ Data Science Interviews
Lessons Learned The Hard Way: 32+ Data Science InterviewsGregory Kamradt
 
Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!Edureka!
 
Leave IT Alone – The Vast Value of Self-Service
Leave IT Alone – The Vast Value of Self-ServiceLeave IT Alone – The Vast Value of Self-Service
Leave IT Alone – The Vast Value of Self-ServiceDATAVERSITY
 
Artificial Intelligence Beyond Theory & Concepts - Our AI Summer Academy Empo...
Artificial Intelligence Beyond Theory & Concepts - Our AI Summer Academy Empo...Artificial Intelligence Beyond Theory & Concepts - Our AI Summer Academy Empo...
Artificial Intelligence Beyond Theory & Concepts - Our AI Summer Academy Empo...Jothi Periasamy
 
Building Data-Centric Businesses
Building Data-Centric BusinessesBuilding Data-Centric Businesses
Building Data-Centric BusinessesThoughtworks
 
Executive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkExecutive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkPeter Skomoroch
 
Data cleansing presentation by Digital Bucket Company
Data cleansing presentation by Digital Bucket CompanyData cleansing presentation by Digital Bucket Company
Data cleansing presentation by Digital Bucket CompanyHumayun Qureshi
 
Machine Learning and Blockchain by Director of Product at Target
Machine Learning and Blockchain by Director of Product at TargetMachine Learning and Blockchain by Director of Product at Target
Machine Learning and Blockchain by Director of Product at TargetProduct School
 
2017 06-14-getting started with data science
2017 06-14-getting started with data science2017 06-14-getting started with data science
2017 06-14-getting started with data scienceThinkful
 
What does an internet of things business look like?
What does an internet of things business look like?What does an internet of things business look like?
What does an internet of things business look like?Alexandra Deschamps-Sonsino
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 

Was ist angesagt? (20)

Bg wesleyan liberal arts to silicon valley oct 2016
Bg wesleyan liberal arts to silicon valley oct 2016Bg wesleyan liberal arts to silicon valley oct 2016
Bg wesleyan liberal arts to silicon valley oct 2016
 
Data Infused Product Design and Insights at LinkedIn
Data Infused Product Design and Insights at LinkedInData Infused Product Design and Insights at LinkedIn
Data Infused Product Design and Insights at LinkedIn
 
Benchmarking IT Agility Final Report
Benchmarking IT Agility Final ReportBenchmarking IT Agility Final Report
Benchmarking IT Agility Final Report
 
The Data Effect: Canadian Big Data & Analytics Update - Dr. Alison Brooks Dir...
The Data Effect: Canadian Big Data & Analytics Update - Dr. Alison Brooks Dir...The Data Effect: Canadian Big Data & Analytics Update - Dr. Alison Brooks Dir...
The Data Effect: Canadian Big Data & Analytics Update - Dr. Alison Brooks Dir...
 
Become an ai product manager
Become an ai product managerBecome an ai product manager
Become an ai product manager
 
Data Prep - A Key Ingredient for Cloud-based Analytics
Data Prep - A Key Ingredient for Cloud-based AnalyticsData Prep - A Key Ingredient for Cloud-based Analytics
Data Prep - A Key Ingredient for Cloud-based Analytics
 
Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?
 
Successfully Kickstarting Data Governance's Social Dynamics: Define, Collabor...
Successfully Kickstarting Data Governance's Social Dynamics: Define, Collabor...Successfully Kickstarting Data Governance's Social Dynamics: Define, Collabor...
Successfully Kickstarting Data Governance's Social Dynamics: Define, Collabor...
 
Democratizing Intelligence - Sri Ambati, CEO & Co-Founder, H2O.ai
Democratizing Intelligence - Sri Ambati, CEO & Co-Founder, H2O.aiDemocratizing Intelligence - Sri Ambati, CEO & Co-Founder, H2O.ai
Democratizing Intelligence - Sri Ambati, CEO & Co-Founder, H2O.ai
 
Lessons Learned The Hard Way: 32+ Data Science Interviews
Lessons Learned The Hard Way: 32+ Data Science InterviewsLessons Learned The Hard Way: 32+ Data Science Interviews
Lessons Learned The Hard Way: 32+ Data Science Interviews
 
Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!
 
Leave IT Alone – The Vast Value of Self-Service
Leave IT Alone – The Vast Value of Self-ServiceLeave IT Alone – The Vast Value of Self-Service
Leave IT Alone – The Vast Value of Self-Service
 
Artificial Intelligence Beyond Theory & Concepts - Our AI Summer Academy Empo...
Artificial Intelligence Beyond Theory & Concepts - Our AI Summer Academy Empo...Artificial Intelligence Beyond Theory & Concepts - Our AI Summer Academy Empo...
Artificial Intelligence Beyond Theory & Concepts - Our AI Summer Academy Empo...
 
Building Data-Centric Businesses
Building Data-Centric BusinessesBuilding Data-Centric Businesses
Building Data-Centric Businesses
 
Executive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkExecutive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you think
 
Data cleansing presentation by Digital Bucket Company
Data cleansing presentation by Digital Bucket CompanyData cleansing presentation by Digital Bucket Company
Data cleansing presentation by Digital Bucket Company
 
Machine Learning and Blockchain by Director of Product at Target
Machine Learning and Blockchain by Director of Product at TargetMachine Learning and Blockchain by Director of Product at Target
Machine Learning and Blockchain by Director of Product at Target
 
2017 06-14-getting started with data science
2017 06-14-getting started with data science2017 06-14-getting started with data science
2017 06-14-getting started with data science
 
What does an internet of things business look like?
What does an internet of things business look like?What does an internet of things business look like?
What does an internet of things business look like?
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Ähnlich wie SF Data Science: Developing Data Products

Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips.
Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips. Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips.
Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips. Hakka Labs
 
7 Badass Tactics for SlideShare Content Domination
7 Badass Tactics for SlideShare Content Domination7 Badass Tactics for SlideShare Content Domination
7 Badass Tactics for SlideShare Content DominationLinkedIn
 
7 Badass Tactics for Slideshare Content Domination
7 Badass Tactics for Slideshare Content Domination 7 Badass Tactics for Slideshare Content Domination
7 Badass Tactics for Slideshare Content Domination Jason Miller
 
7 Badass SlideShare Tactics - Jason Miller (Social Fresh WEST 2013)
7 Badass SlideShare Tactics - Jason Miller (Social Fresh WEST 2013)7 Badass SlideShare Tactics - Jason Miller (Social Fresh WEST 2013)
7 Badass SlideShare Tactics - Jason Miller (Social Fresh WEST 2013)Social Fresh Conference
 
Big Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInBig Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInMinh-Hoang Nguyen
 
Big data arch_analytics
Big data arch_analyticsBig data arch_analytics
Big data arch_analyticsSrinu Adira
 
How Linkedin uses Automic for Big Data Processes
How Linkedin uses Automic for Big Data ProcessesHow Linkedin uses Automic for Big Data Processes
How Linkedin uses Automic for Big Data ProcessesCA | Automic Software
 
Emil Eifrém - The Data Platform for Today’s Intelligent Applications
Emil Eifrém - The Data Platform for Today’s Intelligent ApplicationsEmil Eifrém - The Data Platform for Today’s Intelligent Applications
Emil Eifrém - The Data Platform for Today’s Intelligent ApplicationsNeo4j
 
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Jason Miller
 
Solving Biz Problems with SugarExchange: Session 9: How to Run Contributor Ca...
Solving Biz Problems with SugarExchange: Session 9: How to Run Contributor Ca...Solving Biz Problems with SugarExchange: Session 9: How to Run Contributor Ca...
Solving Biz Problems with SugarExchange: Session 9: How to Run Contributor Ca...SugarCRM
 
SugarCon 2012 Presentation
SugarCon 2012 PresentationSugarCon 2012 Presentation
SugarCon 2012 PresentationJesus Hoyos
 
Keynote Presentation at GraphTalk Oslo 2023
Keynote Presentation at GraphTalk Oslo 2023Keynote Presentation at GraphTalk Oslo 2023
Keynote Presentation at GraphTalk Oslo 2023Neo4j
 
Government GraphSummit: Keynote - Graphs in Government
Government GraphSummit: Keynote - Graphs in GovernmentGovernment GraphSummit: Keynote - Graphs in Government
Government GraphSummit: Keynote - Graphs in GovernmentNeo4j
 
Tamm & kitt
Tamm & kittTamm & kitt
Tamm & kittJeff Roy
 
The LCG Digital Transformation Maturity Model
The LCG Digital Transformation Maturity ModelThe LCG Digital Transformation Maturity Model
The LCG Digital Transformation Maturity ModelLima Consulting Group
 
Rethink business impact of technology
Rethink business impact of technologyRethink business impact of technology
Rethink business impact of technologyMicrosoft Schweiz
 
Remarkable Content for AMA New Orleans, Nov 2014
Remarkable Content for AMA New Orleans, Nov 2014Remarkable Content for AMA New Orleans, Nov 2014
Remarkable Content for AMA New Orleans, Nov 2014David Shephard
 
Building the perfect profile on LinkedIn
Building the perfect profile on LinkedInBuilding the perfect profile on LinkedIn
Building the perfect profile on LinkedInAlex Charraudeau
 
Best practices For Creating Compelling Dashboards
Best practices For Creating Compelling DashboardsBest practices For Creating Compelling Dashboards
Best practices For Creating Compelling DashboardsGoodData
 

Ähnlich wie SF Data Science: Developing Data Products (20)

Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips.
Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips. Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips.
Crowdsourcing Series: LinkedIn. By Vitaly Gordon & Patrick Philips.
 
7 Badass Tactics for SlideShare Content Domination
7 Badass Tactics for SlideShare Content Domination7 Badass Tactics for SlideShare Content Domination
7 Badass Tactics for SlideShare Content Domination
 
7 Badass Tactics for Slideshare Content Domination
7 Badass Tactics for Slideshare Content Domination 7 Badass Tactics for Slideshare Content Domination
7 Badass Tactics for Slideshare Content Domination
 
7 Badass SlideShare Tactics - Jason Miller (Social Fresh WEST 2013)
7 Badass SlideShare Tactics - Jason Miller (Social Fresh WEST 2013)7 Badass SlideShare Tactics - Jason Miller (Social Fresh WEST 2013)
7 Badass SlideShare Tactics - Jason Miller (Social Fresh WEST 2013)
 
Big Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInBig Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedIn
 
Big data arch_analytics
Big data arch_analyticsBig data arch_analytics
Big data arch_analytics
 
How Linkedin uses Automic for Big Data Processes
How Linkedin uses Automic for Big Data ProcessesHow Linkedin uses Automic for Big Data Processes
How Linkedin uses Automic for Big Data Processes
 
Emil Eifrém - The Data Platform for Today’s Intelligent Applications
Emil Eifrém - The Data Platform for Today’s Intelligent ApplicationsEmil Eifrém - The Data Platform for Today’s Intelligent Applications
Emil Eifrém - The Data Platform for Today’s Intelligent Applications
 
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
 
Solving Biz Problems with SugarExchange: Session 9: How to Run Contributor Ca...
Solving Biz Problems with SugarExchange: Session 9: How to Run Contributor Ca...Solving Biz Problems with SugarExchange: Session 9: How to Run Contributor Ca...
Solving Biz Problems with SugarExchange: Session 9: How to Run Contributor Ca...
 
SugarCon 2012 Presentation
SugarCon 2012 PresentationSugarCon 2012 Presentation
SugarCon 2012 Presentation
 
Keynote Presentation at GraphTalk Oslo 2023
Keynote Presentation at GraphTalk Oslo 2023Keynote Presentation at GraphTalk Oslo 2023
Keynote Presentation at GraphTalk Oslo 2023
 
Government GraphSummit: Keynote - Graphs in Government
Government GraphSummit: Keynote - Graphs in GovernmentGovernment GraphSummit: Keynote - Graphs in Government
Government GraphSummit: Keynote - Graphs in Government
 
The value of our data
The value of our dataThe value of our data
The value of our data
 
Tamm & kitt
Tamm & kittTamm & kitt
Tamm & kitt
 
The LCG Digital Transformation Maturity Model
The LCG Digital Transformation Maturity ModelThe LCG Digital Transformation Maturity Model
The LCG Digital Transformation Maturity Model
 
Rethink business impact of technology
Rethink business impact of technologyRethink business impact of technology
Rethink business impact of technology
 
Remarkable Content for AMA New Orleans, Nov 2014
Remarkable Content for AMA New Orleans, Nov 2014Remarkable Content for AMA New Orleans, Nov 2014
Remarkable Content for AMA New Orleans, Nov 2014
 
Building the perfect profile on LinkedIn
Building the perfect profile on LinkedInBuilding the perfect profile on LinkedIn
Building the perfect profile on LinkedIn
 
Best practices For Creating Compelling Dashboards
Best practices For Creating Compelling DashboardsBest practices For Creating Compelling Dashboards
Best practices For Creating Compelling Dashboards
 

Mehr von Peter Skomoroch

Bridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder SupportBridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder SupportPeter Skomoroch
 
Managing Machines: The New AI Dev Stack
Managing Machines: The New AI Dev StackManaging Machines: The New AI Dev Stack
Managing Machines: The New AI Dev StackPeter Skomoroch
 
Product Management for AI
Product Management for AIProduct Management for AI
Product Management for AIPeter Skomoroch
 
O'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustO'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustPeter Skomoroch
 
LinkedIn Endorsements: Reputation, Virality, and Social Tagging
LinkedIn Endorsements: Reputation, Virality, and Social TaggingLinkedIn Endorsements: Reputation, Virality, and Social Tagging
LinkedIn Endorsements: Reputation, Virality, and Social TaggingPeter Skomoroch
 
Practical Problem Solving with Data - Onlab Data Conference, Tokyo
Practical Problem Solving with Data - Onlab Data Conference, TokyoPractical Problem Solving with Data - Onlab Data Conference, Tokyo
Practical Problem Solving with Data - Onlab Data Conference, TokyoPeter Skomoroch
 
Street Fighting Data Science
Street Fighting Data ScienceStreet Fighting Data Science
Street Fighting Data SciencePeter Skomoroch
 
Data Mashups -Data Science Summit
Data Mashups -Data Science SummitData Mashups -Data Science Summit
Data Mashups -Data Science SummitPeter Skomoroch
 
Geo Analytics Tutorial - Where 2.0 2011
Geo Analytics Tutorial - Where 2.0 2011Geo Analytics Tutorial - Where 2.0 2011
Geo Analytics Tutorial - Where 2.0 2011Peter Skomoroch
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopPeter Skomoroch
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPeter Skomoroch
 

Mehr von Peter Skomoroch (12)

Bridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder SupportBridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder Support
 
Managing Machines: The New AI Dev Stack
Managing Machines: The New AI Dev StackManaging Machines: The New AI Dev Stack
Managing Machines: The New AI Dev Stack
 
Product Management for AI
Product Management for AIProduct Management for AI
Product Management for AI
 
O'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustO'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data Exhaust
 
LinkedIn Endorsements: Reputation, Virality, and Social Tagging
LinkedIn Endorsements: Reputation, Virality, and Social TaggingLinkedIn Endorsements: Reputation, Virality, and Social Tagging
LinkedIn Endorsements: Reputation, Virality, and Social Tagging
 
Practical Problem Solving with Data - Onlab Data Conference, Tokyo
Practical Problem Solving with Data - Onlab Data Conference, TokyoPractical Problem Solving with Data - Onlab Data Conference, Tokyo
Practical Problem Solving with Data - Onlab Data Conference, Tokyo
 
Street Fighting Data Science
Street Fighting Data ScienceStreet Fighting Data Science
Street Fighting Data Science
 
Data Mashups -Data Science Summit
Data Mashups -Data Science SummitData Mashups -Data Science Summit
Data Mashups -Data Science Summit
 
Geo Analytics Tutorial - Where 2.0 2011
Geo Analytics Tutorial - Where 2.0 2011Geo Analytics Tutorial - Where 2.0 2011
Geo Analytics Tutorial - Where 2.0 2011
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With Hadoop
 
Prototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.orgPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org
 
Elasticwulf Pycon Talk
Elasticwulf Pycon TalkElasticwulf Pycon Talk
Elasticwulf Pycon Talk
 

Kürzlich hochgeladen

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 


SF Data Science: Developing Data Products

  • 1. Developing Data Products. SF Data Science Meetup. Pete Skomoroch (@peteskomoroch), September 19, 2013. ©2012 LinkedIn Corporation. All Rights Reserved.
  • 2. Developing Data Products Examples, Techniques, & Lessons Learned
  • 3. Our Mission: Connect the world’s professionals to make them more productive and successful. Our Vision: Create economic opportunity for every professional in the world. Members First!
  • 4. LinkedIn is the leading professional network site. Worldwide workforce: 3,300M+. Worldwide professionals: 640M+. LinkedIn members: 238M+.
  • 5. LinkedIn profiles represent our professional identity. 238M members, 238M member profiles.
  • 6. We have a lot of data.
  • 7. We have a lot of data. And (like everyone else), we store it in Hadoop.
  • 8. We have a lot of data. And (like everyone else), we store it in Hadoop. And people build awesome things with that data.
  • 9. What do we mean by data products?
  • 10. Building products from data at LinkedIn. A few examples: People You May Know; Skills and Endorsements; Year in Review; Network Updates Digest; InMaps; Who’s viewed my profile; Collaborative Filtering; Groups You May Like; and more…
  • 11. Collaborative Filtering: LinkedIn Skill Pages
  • 12. Classification: giving structure to unstructured data (figure label: Extract)
  • 13. Clustering & Disambiguation
  • 14. De-duplication and Normalization
  • 15. Network Algorithms: Relevance & Ranking
  • 16. Prediction: Personalized Skill Recommendations
  • 17.
  • 18.
  • 19. Skill Endorsements: Over 2 Billion and Growing
  • 20. Social Proof and the Skill Endorsement Graph
  • 21. The Economic Graph: Skills, Jobs, People, Locations… (figure label: Location)
  • 22. Lessons learned developing data products
  • 23. Collect the right data at the right time
  • 24. Large amounts of data can reveal new patterns (figure: probability of job title vs. time since graduation)
  • 25. Be wary of “black-box” approaches
  • 26. Look at your data
  • 27. Aggregate statistics can be misleading
  • 28. Build a viewer app, “micro-listen”
  • 29. Algorithmic intuition: include data geeks in design
  • 30. OODA: Think like a jet fighter
  • 31. OODA: Observe, Orient, Decide, Act
  • 32. OODA: The speed you can move determines victory
  • 33. Red teaming: what can go wrong likely will
  • 34. Error data is valuable, analyze it and adapt
  • 35. Conclusion: tips for developing data products. Collect the right data at the right time; Large amounts of data can reveal new patterns; Be wary of “black box” approaches; Look at your raw data; Aggregate statistics can be misleading; Build and use viewer apps; Include data geeks in design process; OODA: Think like a jet fighter; Red-teaming: anticipate edge cases; Find opportunity in your error data.

Editor's notes

  1. Mission: For us, fundamentally changing the way the world works begins with our mission statement: to connect the world’s professionals to make them more productive and successful. This means not only helping people to find their dream jobs, but also enabling them to be great at the jobs they’re already in. Vision: But we’re just getting started. By our measure, there are more than 640 million professionals in the world, and roughly 3.3 billion people in the global workforce. Ultimately, our vision is to create economic opportunity for every professional, which we believe is an especially crucial objective in light of current macroeconomic trends. Our most important core value is that members come first.