Big Data EcoSystem @ LinkedIn
October 20, 2012
LinkedIn Confidential ©2013 All Rights Reserved
Sunil Shirguppi
Head of Data Services- International
LinkedIn Corporation
http://www.linkedin.com/in/sunilshirguppi
Outline
LinkedIn Overview
Data Science
Big Data Eco-System
Learnings
Our Mission
Connect the world’s professionals
to make them more productive and successful
We are the professional profile of record
Googled yourself lately?
Don’t feel bad, we all do it.
Executives from all
Companies are
LinkedIn members
The LinkedIn Opportunity
Fundamentally transforming the way the world works
Connect talent with opportunity at massive scale
+
The World’s Largest Professional Network
LinkedIn Members (Millions), by year
2004: 2
2005: 4
2006: 8
2007: 17
2008: 32
2009: 55
2010: 90
LinkedIn Members: 175M+*
Fortune 100 Companies that use LinkedIn to hire: 82%
Company Pages: >2M**
New Members joining: ~2/sec
Professional searches in 2011: ~4.2B
*as of Nov 4, 2011
**as of June 30, 2011
Multiple revenue channels
• Premium Subscriptions
• Self Serve Ads
• Hiring Solutions
• Marketing Solutions
Let’s talk Data…
Business is recognizing the importance of analytics
Data Scientist = Curiosity + Intuition + Data
gathering + Standardization + Statistics + Modeling
+ Visualization + Communication
What makes a Data Scientist?
Big Data at LinkedIn
* Chart from Philip Russom- Research Director: TDWI
What do we do with Data?
• Data Standardization
• Build innovative data products to help professionals
• Draw insights
• Drive the business
Before we can do that...
There are a few challenges that we have to overcome
• Scale
• Standardization
• Infrastructure
A Few Data-Driven Products
Pandora Search for People
Events You May Be Interested In
Groups browse maps
How do we do it?
LinkedIn Sample Data Stack
Crowdsourcing
Big Data at LinkedIn
Diagram components: Users, Application, Online Data Store, Near-Line Data Store, Offline Data Store, Web Logs
High-level data environment
Challenges too complex for off-the-shelf products or any small set of technologies to address
Built our own combination of toolsets and technologies to meet specific requirements
LinkedIn Data Stack – Online
Systems Capabilities
• Rich structures (e.g. indexes)
• Change capture capability
LinkedIn Data Stack – Nearline
Systems Capabilities
• Key value access: Voldemort
• Search platform: Zoie, Bobo, Sensei
• Distributed Graph engine: D-Graph
LinkedIn Data Stack – Pipeline
Systems Capabilities
• Messaging for site events,
monitoring
• Change data capture streams
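As a rough illustration of what a change data capture stream does, the sketch below replays an ordered list of change events into a downstream copy of a key-value store. It is a minimal in-memory example and not LinkedIn's pipeline: the `ChangeEvent` type, its fields, the `apply_changes` helper, and the `member:1` style keys are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    sequence: int            # position of the change in the stream
    key: str                 # key of the record that changed
    value: Optional[dict]    # new record contents; None marks a delete


def apply_changes(store: Dict[str, dict],
                  events: Iterable[ChangeEvent],
                  last_applied: int = 0) -> int:
    """Apply events to `store` in sequence order and return the new checkpoint.

    Events at or below `last_applied` are skipped, so replaying the same
    stream twice leaves the store unchanged.
    """
    for event in sorted(events, key=lambda e: e.sequence):
        if event.sequence <= last_applied:
            continue  # already applied earlier
        if event.value is None:
            store.pop(event.key, None)   # delete
        else:
            store[event.key] = event.value  # insert or update
        last_applied = event.sequence
    return last_applied


# Example: two inserts followed by a delete of the first record.
replica: Dict[str, dict] = {}
events = [
    ChangeEvent(1, "member:1", {"title": "Engineer"}),
    ChangeEvent(2, "member:2", {"title": "Analyst"}),
    ChangeEvent(3, "member:1", None),
]
checkpoint = apply_changes(replica, events)
```

Keeping a checkpoint (the last applied sequence number) is one common way to let a consumer resume or replay a stream safely; a production system would persist it and read events from a messaging layer rather than a list.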
LinkedIn Data Stack – Offline
Systems Capabilities
• Machine learning, ranking,
relevance
• Warehouse and analytics
LinkedIn with Hadoop, Aster, and Teradata
Integrated Data Warehouse
• Exec Dashboards
• Adhoc/OLAP
• Complex SQL
• SQL
Integration with structured data, operational intelligence, scalable distribution of analytics
Data transformation & batch processing
• Image processing
• Search indexes
• Graph (PYMK)
• MapReduce
Batch data transformations for engineering groups using HDFS + MapReduce
Analytic Platform for data discovery
• nPath Pattern/Path
• Clickstream analysis
• A/B site testing
• Data Sciences discovery
• SQL-MapReduce
Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce
Connectors
• Aster/Teradata Bi-Directional Connector
• Aster/Teradata Hadoop Connectors
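The MapReduce-style batch processing named on this slide can be sketched in a few lines. The example below counts page views from clickstream log lines using separate map, shuffle, and reduce steps. It runs in a single Python process and is only an illustration of the programming model, not Hadoop and not LinkedIn's jobs; the `member_id page` log format and all function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(log_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (page, 1) for each well-formed 'member_id page' line."""
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines (assumed log format)
        _member_id, page = parts
        yield page, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_page_views(log_lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in order over the input lines."""
    return reduce_phase(shuffle(map_phase(log_lines)))


# Example with a tiny, made-up clickstream.
views = count_page_views(["m1 /home", "m2 /jobs", "m1 /jobs", "bad"])
```

In a real cluster the map and reduce functions run in parallel across many machines and the shuffle is handled by the framework; the shape of the two user-written functions is what carries over.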
It’s a global economy
Country connectedness on LinkedIn
Data deep dives
Job migration after financial collapse
How Often do people change jobs?
Visualization is important
If your name is Chip, you are likely in sales!
Industry Growth
Buzzwords
What next?
• Self service analytics
• Metadata framework
• Integrate reporting solutions
• Go Mobile!
• Scalability and Data Quality
Challenges
• Data volumes and availability
– Billion+ rows every day
– Users in Global locations need data
• Multiple platforms
– Agile development
– Data Integration
• Data Quality
– User input data
– Data standardization
Key Learnings
• Self Service
– Making data accessible to key stakeholders in a timely
manner creates tremendous value.
– Viz is more important than we think
• Measuring your future investments
– Performance is not the only measure
– Company fundamentals matter
• As a data team, be in control of your destiny
– Identify what to measure and lead by metrics
– Become the Think-tank
Web 3.0 – It’s all about data!!
ULTIMATELY…
It is all about the people!
Thank You!
