How LinkedIn leveraged its data to become the world's
largest professional network
About me
©2013 LinkedIn Corporation. All Rights Reserved. 2
Vitaly Gordon
Agenda
1 What is Big Data?
2 Big Data Applications
3 LinkedIn’s Big Data Solutions
4 Finding Experts
5 Big Data Recipe
6 Summary
1 What is Big Data?
Data sets that are too large and complex
to manipulate or interrogate with standard
methods or tools.
Oxford Dictionary
Big Data Growth
[Chart: Storage Growth vs. Data Growth, logarithmic axis from 1E+00 to 1E+09]
2 Big Data Applications
increase in sales
of watched content
40M users in 18 months
Big Data is more about Business
than Data
3 LinkedIn’s Big Data Solutions
LinkedIn Revenue
Quarterly Revenue ($ millions)
Segments: Hiring Solutions, Marketing Solutions, Premium Subscriptions
Year   Q1   Q2   Q3   Q4
2009   23   28   30   39
2010   45   55   62   82
2011   94  121  139  168
2012  188  228  252  304
2013  325
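As a quick check on the revenue chart, the quarterly figures can be summed per year. This is a minimal sketch using the values read off the slide's bar labels (in $ millions); the grouping into years assumes the series starts at Q1 2009 and runs four quarters per year, which is inferred from the axis labels rather than stated on the slide. The function name `revenue_by_year` is made up for illustration.

```python
# Quarterly revenue in $ millions, as read off the slide's bar labels.
# Assumption: the series starts at Q1 2009 and ends at Q1 2013.
quarterly = [23, 28, 30, 39, 45, 55, 62, 82, 94,
             121, 139, 168, 188, 228, 252, 304, 325]


def revenue_by_year(values, first_year=2009):
    """Group consecutive quarters into years and sum each group."""
    totals = {}
    for i, value in enumerate(values):
        year = first_year + i // 4
        totals[year] = totals.get(year, 0) + value
    return totals


totals = revenue_by_year(quarterly)
for year, total in totals.items():
    print(year, total)
```

Under that assumption the 2012 total comes to 972, in line with the "almost a billion dollars last year" in the speaker notes; the 2013 entry covers only the first quarter.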
Premium Subscriptions
Marketing Solutions
Talent Solutions
Connecting Talent With Opportunity
Jobs You May Be Interested In (JYMBII) – Case Study
Software Engineer at
Data Scientist at
Product Manager at
JYMBII – Building The Product
Design Algorithms Framework
Design
1,000X more users
Start simple
Grow with success
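The speaker notes describe showing the JYMBII widget to "a certain percentage of the users" before rolling it out more widely. One common way to do that kind of partial rollout is to hash each member ID into a stable bucket. The sketch below illustrates that general technique only; it is not LinkedIn's actual mechanism, and the names `in_test_group`, `jymbii-widget` and the member IDs are hypothetical.

```python
import hashlib


def in_test_group(member_id, experiment, percent):
    """Return True if this member falls into the first `percent` of buckets.

    Hashing (experiment, member_id) keeps the assignment stable across
    visits and independent between experiments.
    """
    key = f"{experiment}:{member_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 10_000  # 0..9999, hundredths of a percent
    return bucket < percent * 100


# Hypothetical member IDs, for illustration only.
members = [f"member-{i}" for i in range(10_000)]
exposed = [m for m in members if in_test_group(m, "jymbii-widget", 5.0)]
share = len(exposed) / len(members)
print(f"{share:.1%} of members see the widget")
```

Because the bucket depends only on the member and the experiment name, the same member sees the same variant on every visit, and raising `percent` only ever adds members to the exposed group.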
Algorithms
50% better results
Start simple
Grow with success
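The speaker notes describe the first JYMBII algorithm as plain keyword matching between a member's profile and a job posting, with a score per job and the top matches shown to the member. A minimal sketch of that idea follows. The stopword list, the scoring formula (share of the job's keywords found in the profile) and the sample texts are all assumptions made for illustration, not LinkedIn's implementation.

```python
import re

# Assumption: a tiny hand-picked stopword list, enough for the example.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "the", "to", "with"}


def keywords(text):
    """Lowercase the text and return its distinct non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def match_score(profile, job):
    """Fraction of the job's keywords that also appear in the profile."""
    job_kw = keywords(job)
    if not job_kw:
        return 0.0
    return len(job_kw & keywords(profile)) / len(job_kw)


def top_jobs(profile, jobs, k=3):
    """Score every job against the profile and return the k best matches."""
    scored = [(title, match_score(profile, text)) for title, text in jobs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up profile and postings, for illustration only.
profile = "Data scientist with experience in machine learning, Python and Hadoop"
jobs = {
    "Data Scientist": "machine learning, Python, statistics",
    "Product Manager": "roadmap, stakeholders, product strategy",
}
ranked = top_jobs(profile, jobs)
print(ranked)
```

Later iterations described in the notes (education, experience, saved and applied jobs) would add further signals to `match_score`; the point of the sketch is that the first version needs nothing beyond set intersection.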
Technology
Some people, when confronted with a big
data problem, think, I'll use Hadoop.
Now they have a big data problem and a
big Hadoop cluster.
Dmitry Ryaboy, Twitter Engineering Manager
Technology Advancement
50X faster
Kafka
Start simple, grow with success
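The speaker notes give three turnaround times for the JYMBII pipeline: 6 weeks on the first prototype, 1 week on a parallel database, and daily on Hadoop with in-house infrastructure. The arithmetic behind the speedup can be sketched as below, assuming "every day" means a one-day turnaround; the stage labels are shorthand chosen for this example.

```python
DAYS_PER_WEEK = 7

# Run durations from the speaker notes, expressed in days.
# Assumption: "every day" is treated as a one-day turnaround.
stages = {
    "Oracle + scripts": 6 * DAYS_PER_WEEK,
    "parallel database": 1 * DAYS_PER_WEEK,
    "Hadoop + in-house infrastructure": 1,
}

baseline = stages["Oracle + scripts"]
speedups = {name: baseline / days for name, days in stages.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.0f}x")
```

With these inputs the last ratio is 42, the same order of magnitude as the slide's 50X figure.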
4 Finding Experts
Finding Data Experts
30X increase in demand for big data experts
33% are new analytics experts
Be challenged at LinkedIn
We're looking for superb analytical minds
of all levels to expand our small team that
will build some of the most innovative
products at LinkedIn.
No specific technical skills are required
(we'll help you learn SQL, Python, and R).
You should be extremely intelligent, have a
quantitative background, and be able to
learn quickly and work independently.
This is the perfect job for someone who's
really smart, driven, and extremely skilled
at creatively solving problems. You'll learn
statistics, data mining, programming, and
product design, but you've gotta start with
what we can't teach—intellectual
sharpness and creativity.
LinkedIn Experts
Don't wait for a big data expert to
knock on your door - create your own
5 Big Data Recipe
INGREDIENTS
1. Important business metric
2. Correlating factors
3. Causing factors
4. Product to affect the behavior
METHOD OF PREPARATION
1. Build a simple prototype
2. Measure the effect
3. Improve logic and scale
4. Measure the effect
5. Improve logic and scale
6. Measure the effect
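The "Measure the effect" steps in the recipe can be made concrete by comparing the business metric between members who saw the change and members who did not. This is a minimal sketch under stated assumptions: the metric is a simple average rate per group, the function names are hypothetical, and the numbers are made-up illustration values rather than LinkedIn data.

```python
def relative_lift(control_rate, treatment_rate):
    """Relative change of the treatment rate over the control rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treatment_rate - control_rate) / control_rate


def keep_iterating(lift, minimum_lift=0.0):
    """Decide whether the latest change earned another round of investment."""
    return lift > minimum_lift


# Made-up illustration values: visits per member per week in each group.
control, treatment = 2.0, 2.5
lift = relative_lift(control, treatment)
print(f"lift: {lift:.0%}, continue: {keep_iterating(lift)}")
```

A real measurement would also check that the difference is statistically significant before acting on it; that step is left out here to keep the sketch short.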
6 Summary
Thank you

Big Data World 2013 - How LinkedIn leveraged its data to become the world's largest professional network

Editor's Notes

  1. LinkedIn – I am currently a data scientist at LinkedIn, one of the world's most advanced big data companies. LivePerson – I previously worked at LivePerson, where I was the first person hired to build their big data solution, so I have experienced both the very beginning of big data solutions and the cutting edge. I will share with you the lessons I've learned while working on big data at both ends of the spectrum. I also have a business degree from the Israeli Institute of Technology and a computer science degree from Ben-Gurion University.
  2. This is what I am going to talk about. I chose these subjects because they answer the most burning questions I had, both when I was starting with big data and when I was perfecting my craft.
  3. The term Big Data is used in many ways, so before I start talking about big data, I want to explain what it is.
  4. Yes, there is an entry in the Oxford English Dictionary for Big Data.
  5. The main word here is standard. Before Big Data, standard methods and tools were enough to process the data we had; now they are not. But what happened?
  6. Data created opportunities, which in turn created demand for even more data, and the amount of data in the world grew larger and larger.
  7. So what are those big data opportunities I mentioned? The best way to see is through examples.
  8. Amazon, the e-commerce giant, analyzes data about its shoppers. It analyzes what products they are looking at, what products they are searching for and, most importantly, what products they are buying. This analysis enables them to produce a product I am sure you have all seen ...
  9. Here we can see that if I look at the book "Big Data Analytics", Amazon provides me with recommendations for similar books. -- Show increase in sales -- So why did it increase sales so much? The logic here is simple: the more products customers see, the higher the chance they will buy something. Amazon wants to show us as many products as it can in order to get us to buy something.
  10. My second example is Netflix. Netflix is an American company that started as a DVD rental service and quickly became a streaming platform for movies and TV shows. It has about 30 million subscribers. At the end of each movie, Netflix asks viewers to rate the movie they just watched. Netflix has billions of movie ratings from millions of users, and it uses this data to create the following product.
  11. Using our rating history, Netflix calculates a unique "taste" for every one of its subscribers and uses this taste to recommend movies to them. This product is so important to Netflix that in 2006 it offered a prize of one million dollars to whoever could improve its algorithm by more than 10%. -- Show statistics -- So why is this recommendation engine so important? The more users find movies they like on Netflix, the longer they will keep their subscription, earning money for Netflix.
  12. My third example is a small Israeli startup. Waze is a GPS mobile app that tracks where people are and at what speed they are travelling.
  13. Waze uses this data to compute traffic maps that show which streets have traffic jams, and it routes you accordingly, providing much better route suggestions than apps that don't use traffic information. After gaining more than 50 million users for its app, Waze was acquired by Google for about 1.1 billion dollars. Side note: I understand there will be a talk later today by a Korean company that does something very similar.
  14. The above examples, and many more, lead me to the first lesson I've learned about big data.
  15. These are great examples. But to dive even deeper into big data applications, let's look at the company I currently work for, LinkedIn. Since we said that Big Data is more about business than data, let me first show you what LinkedIn's business is.
  16. LinkedIn is the largest professional social network in the world. It has more than 225M members. Our largest markets today are North America and Europe, but Asia is growing very well too, with several countries having more than a million members on LinkedIn.
  17. Not only does LinkedIn have a lot of members, it also makes significant revenue. Across its 3 business lines, LinkedIn made almost a billion dollars last year and about 325 million in the first quarter of 2013.
  18. These 3 product lines are Premium Subscriptions, Marketing Solutions and Talent Solutions. Let's dive more deeply into each one of them to understand them better.
  19. The premium subscriptions business is for LinkedIn users who want extra features on LinkedIn. Those features might be better analytics about who viewed their profile and the ability to contact anyone on LinkedIn through InMail, LinkedIn's personal messaging system. This product really separates LinkedIn from other social networks, in that some of the network's users pay extra to use it.
  20. Marketing Solutions is more similar to what you can find on other social networks. We offer companies the ability to market their products to our members. Since LinkedIn is a professional network where most members have a job, often a lucrative one, the target population is very appealing to marketers.
  21. Our third product line, and the largest in terms of revenue, is Talent Solutions. Here companies like Sony, Walmart and L'Oréal pay for their recruiters to have additional functionality for their recruiting needs. This is almost like another product inside LinkedIn for our recruiter members. This product line brings in about 57% of LinkedIn's revenue.
  22. LinkedIn's number 1 mission is connecting talent with opportunity: both helping companies find new talent and helping our 225+ million members find new opportunities when they need them. One of the first big data applications at LinkedIn was to help members find a new job, and I will now dive deep into how it was done.
  23. JYMBII is a big data product that matches members with job postings on LinkedIn. For example, here is me, and some of the jobs companies posted on LinkedIn. For every job, we compute a score for how good a fit the job is for the member. Here you can see that I am a good match for a data scientist position at Facebook, and not such a good match for a product manager position at Yahoo.
  24. After creating scores for all the jobs in our database, we create a small widget on our homepage where every member can see their top matching jobs.
  25. I will walk you through the 3 pillars of every big data product – Design, Algorithms and Infrastructure/Framework.
  26. Let's start with design. In a consumer-oriented company design is very important, because this is how users interact with your product. Also, in many cases, design is the hardest thing for a single small team to change because so many teams are involved. In most companies the big data team is separate from the team that works on the main product, so those of you who have already started implementing big data solutions probably know how difficult it is to run tests on the main product. Do anything you can to bypass other teams in your organization to test your big data solutions. When LinkedIn's Data Science team decided to build JYMBII, they wanted a very simple way to test whether their product was working without making too many changes to the main site. This is how they did it: they started with email. Here you can see how the actual email looks today, where I got some recommendations for jobs I might be interested in. They chose email because it lets you test your product on a small subset of users, without affecting everyone who comes to your website and without making any changes to the main website.
  27. After the initial emails showed great success and that people were actually interested, our team built this very small widget that shows the top jobs you might be interested in. Again, it was done with minimal integration with the main website, by having this widget replace one of the ads we had on the site for a certain percentage of the users.
  28. After the great success of the widget, Jobs now has its own section on the LinkedIn website where users can search for jobs and more. Having the jobs section resulted in 1,000 times more users looking at LinkedIn jobs than beforehand. Remember, JYMBII did not start with its own section of the site, but grew to have one.
  29. My main message about how to design data products is to start simple and grow with success.
  30. Let's now talk about algorithms, or how LinkedIn matches members with job postings. The first iteration of the algorithm was very simple. We look at the member's profile, we look at the job posting and we do keyword matching, very similar to how recruiters screen candidate resumes for a potential match. In this example we can see that my profile is a pretty decent match for this job opportunity. There is no need for a natural language processing expert or a computer science PhD to implement this algorithm. It is pretty simple and worked pretty well for our first prototype.
  31. When the first prototype of the email succeeded, the team moved to improve the algorithm a bit further, adding features like education and experience, which are also very important for determining a candidate's fit for a position. These changes improved the recommendations even further, resulting in more people engaging with jobs on the LinkedIn website.
  32. Finally, now that we have our jobs page on the website, where users can search for jobs, save jobs and apply for jobs, we can use all of these signals to recommend jobs similar to the ones users found themselves. All of these improvements resulted in 50% more accurate job recommendations for our members.
  33. The message for algorithms is the same as it is for design: don't try to implement something very difficult before you know your customers even want it. Start simple and grow with success.
  34. Here is a quote from a Twitter engineering manager that I like very much. What it says is that most of the time Hadoop doesn't solve a big data problem; it actually brings a set of new problems to deal with, even before we know that what we are trying to build is worth building.
  35. The first JYMBII prototype was developed using very simple technology: Oracle, some Perl scripts and some shell scripts in between. The process involved someone manually copying files from one computer to another, running some scripts on that computer and then copying back the results. The process was so inefficient that it took 6 weeks to run. But 6 weeks is better than never.
  36. After the success of the initial product, LinkedIn decided to make some infrastructure investment, buying a parallel database from companies like Greenplum and Aster Data. This sped up the process so that it ran in a single week instead of 6.
  37. Eventually LinkedIn not only moved to Hadoop but also built its own infrastructure with projects like Kafka, Voldemort and Zoie. You can find more information about them on the LinkedIn open source page. Now we are generating new recommendations every day, which is 50 times better than having them every 6 weeks. You have probably figured out the second lesson by now ...
  38. One of the most important questions, one that kept me busy for a long time as well, is where you find big data experts. Before I give you the answer, I would like to show you 2 graphs.
  39. Here you can see that at the beginning of 2011 the demand for big data experts was 30 times higher than the year before. Now it is even higher. Everyone is looking for big data experts.
  40. Here is a graph from LinkedIn's own analytics team. Here you can see that 33% of the people who started a job as data scientists or analysts are new to this job. You can probably see where I am going with this. Most people who work in big data are new to big data. LinkedIn realized this quickly, and here is the proof ...
  41. Here is an actual LinkedIn job posting from 2008, when LinkedIn had just started with big data. The key message is this ... No specific technical skills are required. Here is an example of how LinkedIn has implemented this strategy with 2 of my colleagues.
  42. Joseph Adler came to LinkedIn from Netflix, where he did Operations Engineering. Now he is one of our top experts on big data and has even written a very successful book about it.
  43. Jason is a new data scientist at LinkedIn. Prior to that he was a radar signal processing expert. He is still just at the beginning of his career at LinkedIn, but so far he is doing very well and educating himself quickly.
  44. My third lesson is a bit hard to swallow, but if you follow my previous 2, it becomes easier. Look for big data experts everywhere and at all times, but don't let it stop you from starting your projects.
  45. So how do you start a big data project? I would like to show you a very simple recipe you could follow.
  46. As always, to make it clearer, I will use an example to guide us through the recipe. People You May Know is a LinkedIn Big Data product that traverses your profile and the entire LinkedIn graph to suggest people you should connect with. Let's see how we can use our recipe to create big data applications such as People You May Know.
  47. Important business metric – how often members visit the website. Correlating factors – how many new items they have on their news feed. But that is not the root cause; something else is affecting it. Causing factors – how many connections they have. Product – recommend new connections to users: People You May Know. Beware of the second-system effect: how many of you have been involved with projects where the first prototype was pretty successful and the second one was much bigger and failed?
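In the spirit of "build a simple prototype", a first version of a People You May Know style product could rank second-degree connections by the number of connections they share with the member. The sketch below shows that common baseline only; it is an assumption for illustration, not a description of LinkedIn's actual algorithm, and the graph and names are made up.

```python
from collections import Counter


def suggest_connections(graph, member, k=3):
    """Rank non-connections of `member` by number of shared connections."""
    direct = graph.get(member, set())
    shared = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != member and candidate not in direct:
                shared[candidate] += 1
    # Sort by count (descending), then by name, so ties are deterministic.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))[:k]


# Made-up undirected connection graph, for illustration only.
graph = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "dee", "eli"},
    "cy": {"ana", "dee"},
    "dee": {"bo", "cy"},
    "eli": {"bo"},
}
suggestions = suggest_connections(graph, "ana")
print(suggestions)
```

Following the recipe, the next step would be to measure whether members who receive such suggestions add connections and visit more often, and only then invest in better logic and scale.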