Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
Building Data Products using Hadoop at Linkedin <ul><li>Mitul Tiwari </li></ul><ul><li>Search, Network, and Analytics (SNA...
Who am I?
<ul><li>What do I mean by Data Products? </li></ul>
People You May Know
Profile Stats: WVMP
Viewers of this profile also ...
Skills
InMaps
Data Products: Key Ideas <ul><li>Recommendations </li></ul><ul><ul><li>People You May Know, Viewers of this profile ... </...
Data Products: Challenges <ul><li>LinkedIn: 2nd largest social network </li></ul><ul><li>120 million members on LinkedIn <...
Outline <ul><li>What do I mean by Data Products?  </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build...
Systems and Tools <ul><li>Kafka (LinkedIn) </li></ul><ul><li>Hadoop (Apache) </li></ul><ul><li>Azkaban (LinkedIn) </li></u...
Systems and Tools <ul><li>Kafka </li></ul><ul><ul><li>publish-subscribe messaging system </li></ul></ul><ul><ul><li>transf...
Systems and Tools <ul><li>Kafka </li></ul><ul><li>Hadoop </li></ul><ul><ul><li>Java MapReduce and Pig </li></ul></ul><ul><...
Systems and Tools <ul><li>Kafka </li></ul><ul><li>Hadoop </li></ul><ul><li>Azkaban </li></ul><ul><ul><li>Hadoop workflow m...
Systems and Tools <ul><li>Kafka </li></ul><ul><li>Hadoop </li></ul><ul><li>Azkaban </li></ul><ul><li>Voldemort </li></ul><...
Outline <ul><li>What do I mean by Data Products?  </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build...
People You May Know Alice Bob Carol How do people know each other?
People You May Know Alice Bob Carol How do people know each other?
People You May Know Alice Bob Carol Triangle closing How do people know each other?
People You May Know Alice Bob Carol Triangle closing Prob(Bob knows Carol) ~ the # of common connections How do people kno...
Triangle Closing in Pig --  connections in (source_id, dest_id) format in both directions connections = LOAD `connections`...
Pig Overview <ul><li>Load: load data, specify format  </li></ul><ul><li>Store: store data, specify format </li></ul><ul><l...
Triangle Closing in Pig --  connections in (source_id, dest_id) format in both directions connections = LOAD `connections`...
Triangle Closing in Pig --  connections in (source_id, dest_id) format in both directions connections = LOAD `connections`...
Triangle Closing in Pig --  connections in (source_id, dest_id) format in both directions connections = LOAD `connections`...
Triangle Closing in Pig --  connections in (source_id, dest_id) format in both directions connections = LOAD `connections`...
Triangle Closing in Pig --  connections in (source_id, dest_id) format in both directions connections = LOAD `connections`...
Triangle Closing Example Alice Bob Carol <ul><li>(A,B),(B,A),(A,C),(C,A)  </li></ul><ul><li>(A,{B,C}),(B,{A}),(C,{A}) </li...
Triangle Closing Example Alice Bob Carol <ul><li>(A,B),(B,A),(A,C),(C,A)  </li></ul><ul><li>(A,{B,C}),(B,{A}),(C,{A}) </li...
Triangle Closing Example Alice Bob Carol <ul><li>(A,B),(B,A),(A,C),(C,A)  </li></ul><ul><li>(A,{B,C}),(B,{A}),(C,{A}) </li...
Triangle Closing Example Alice Bob Carol <ul><li>(A,B),(B,A),(A,C),(C,A)  </li></ul><ul><li>(A,{B,C}),(B,{A}),(C,{A}) </li...
Our Workflow triangle-closing
Our Workflow triangle-closing top-n
Our Workflow triangle-closing top-n push-to-prod
Outline <ul><li>What do I mean by Data Products?  </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build...
Our Workflow triangle-closing top-n push-to-prod
Our Workflow triangle-closing top-n push-to-prod remove connections
Our Workflow triangle-closing top-n push-to-prod remove connections push-to-qa
PYMK Workflow
Workflow Requirements <ul><li>Dependency management </li></ul><ul><li>Regular Scheduling </li></ul><ul><li>Monitoring </li...
Workflow Requirements <ul><li>Dependency management </li></ul><ul><li>Regular Scheduling </li></ul><ul><li>Monitoring </li...
Sample Azkaban Job Spec <ul><li>type=pig </li></ul><ul><li>pig.script=top-n.pig </li></ul><ul><li>dependencies=remove-conn...
Azkaban Workflow
Azkaban Workflow
Azkaban Workflow
Our Workflow triangle-closing top-n push-to-prod remove connections
Our Workflow triangle-closing top-n push-to-prod remove connections
Outline <ul><li>What do I mean by Data Products?  </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build...
Production Storage <ul><li>Requirements </li></ul><ul><ul><li>Large amount of data/Scalable </li></ul></ul><ul><ul><li>Qui...
Voldemort Storage <ul><ul><li>Large amount of data/Scalable </li></ul></ul><ul><ul><li>Quick lookup/low latency </li></ul>...
Data Cycle
Voldemort RO Store
Our Workflow triangle-closing top-n push-to-prod remove connections
Outline <ul><li>What do I mean by Data Products?  </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build...
Data Quality <ul><li>Verification </li></ul><ul><li>QA store with viewer </li></ul><ul><li>Explain </li></ul><ul><li>Versi...
Outline <ul><li>What do I mean by Data Products?  </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build...
Performance <ul><li>Symmetry </li></ul><ul><ul><li>Bob knows Carol then Carol knows Bob </li></ul></ul><ul><li>Limit </li>...
Things Covered <ul><li>What do I mean by Data Products?  </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’...
SNA Team <ul><li>Thanks to SNA Team at LinkedIn </li></ul><ul><li>http://sna-projects.com </li></ul><ul><li>We are hiring!...
Questions?
Outline <ul><li>What do I mean by Data Products?  </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build...
Outline <ul><li>What do I mean by Data Products?   </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s buil...
Nächste SlideShare
Wird geladen in …5
×

Building Data Products using Hadoop at Linkedin - Mitul Tiwari

3.669 Aufrufe

Veröffentlicht am

Hadoop and other big data tools such as Voldemort, Azkaban, and Kafka, drive many data driven products at LinkedIn such as “People You MayKnow” and various recommendation products such as “Jobs You May Be Interested In”. Each of these products can be viewed as a large scale social recommendation problems, which analyzes billions of possible options, and suggest appropriate recommendation.

Since these products analyzes billions of edges and terabytes of data daily, it can be built only using a large scale distributed compute infrastructure. Kafka publish-subscribe messaging system is used to get the data in Hadoop file system. Hadoop MapReduce is used as the basic building block to analyze billions of potential options, and predict recommendation. Over a hundred MapReduce tasks are combined together in a work-flow uising Azkaban, a Hadoop work-flow management tool. The output of Hadoop jobs is finally stored in Voldemort key-value store to serve the data at run-time for efficiency.

During this talk audience will get a basic understanding of link prediction problem behind “ People You May Know” feature, which is a large scale social recommendation problem. Overview of the solution of this problem using Hadoop MapReduce, Azkaban workflow management tool, and Voldemort key-value store will be presented. I will also describe how to efficiently compute the number of common connections (triangle closing) using Hadoop Mapreduce, which is one of the many signals in link prediction.

Overall, people interested in building interesting applications using Hadoop MapReduce will hugely benefit from this talk.

Veröffentlicht in: Technologie, Business
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... ,DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... ,DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ,DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/yyxo9sk7 } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier
  • DOWNLOAD FULL. BOOKS INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y8nn3gmc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Antworten 
    Sind Sie sicher, dass Sie …  Ja  Nein
    Ihre Nachricht erscheint hier

Building Data Products using Hadoop at Linkedin - Mitul Tiwari

  1. 1. Building Data Products using Hadoop at Linkedin <ul><li>Mitul Tiwari </li></ul><ul><li>Search, Network, and Analytics (SNA) </li></ul><ul><li>LinkedIn </li></ul>
  2. 2. Who am I?
  3. 3. <ul><li>What do I mean by Data Products? </li></ul>
  4. 4. People You May Know
  5. 5. Profile Stats: WVMP
  6. 6. Viewers of this profile also ...
  7. 7. Skills
  8. 8. InMaps
  9. 9. Data Products: Key Ideas <ul><li>Recommendations </li></ul><ul><ul><li>People You May Know, Viewers of this profile ... </li></ul></ul><ul><li>Analytics and Insight </li></ul><ul><ul><li>Profile Stats: Who Viewed My Profile, Skills </li></ul></ul><ul><li>Visualization </li></ul><ul><ul><li>InMaps </li></ul></ul>
  10. 10. Data Products: Challenges <ul><li>LinkedIn: 2nd largest social network </li></ul><ul><li>120 million members on LinkedIn </li></ul><ul><li>Billions of connections </li></ul><ul><li>Billions of pageviews </li></ul><ul><li>Terabytes of data to process </li></ul>
  11. 11. Outline <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>
  12. 12. Systems and Tools <ul><li>Kafka (LinkedIn) </li></ul><ul><li>Hadoop (Apache) </li></ul><ul><li>Azkaban (LinkedIn) </li></ul><ul><li>Voldemort (LinkedIn) </li></ul>
  13. 13. Systems and Tools <ul><li>Kafka </li></ul><ul><ul><li>publish-subscribe messaging system </li></ul></ul><ul><ul><li>transfer data from production to HDFS </li></ul></ul><ul><li>Hadoop </li></ul><ul><li>Azkaban </li></ul><ul><li>Voldemort </li></ul>
  14. 14. Systems and Tools <ul><li>Kafka </li></ul><ul><li>Hadoop </li></ul><ul><ul><li>Java MapReduce and Pig </li></ul></ul><ul><ul><li>process data </li></ul></ul><ul><li>Azkaban </li></ul><ul><li>Voldemort </li></ul>
  15. 15. Systems and Tools <ul><li>Kafka </li></ul><ul><li>Hadoop </li></ul><ul><li>Azkaban </li></ul><ul><ul><li>Hadoop workflow management tool </li></ul></ul><ul><ul><li>to manage hundreds of Hadoop jobs </li></ul></ul><ul><li>Voldemort </li></ul>
  16. 16. Systems and Tools <ul><li>Kafka </li></ul><ul><li>Hadoop </li></ul><ul><li>Azkaban </li></ul><ul><li>Voldemort </li></ul><ul><ul><li>Key-value store </li></ul></ul><ul><ul><li>store output of Hadoop jobs and serve in production </li></ul></ul>
  17. 17. Outline <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>
  18. 18. People You May Know Alice Bob Carol How do people know each other?
  19. 19. People You May Know Alice Bob Carol How do people know each other?
  20. 20. People You May Know Alice Bob Carol Triangle closing How do people know each other?
  21. 21. People You May Know Alice Bob Carol Triangle closing Prob(Bob knows Carol) ~ the # of common connections How do people know each other?
  22. 22. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage();
  23. 23. Pig Overview <ul><li>Load: load data, specify format </li></ul><ul><li>Store: store data, specify format </li></ul><ul><li>Foreach, Generate: Projections, similar to select </li></ul><ul><li>Group by: group by column(s) </li></ul><ul><li>Join, Filter, Limit, Order, ... </li></ul><ul><li>User Defined Functions (UDFs) </li></ul>
  24. 24. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage();
  25. 25. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage();
  26. 26. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage();
  27. 27. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage();
  28. 28. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage();
  29. 29. Triangle Closing Example Alice Bob Carol <ul><li>(A,B),(B,A),(A,C),(C,A) </li></ul><ul><li>(A,{B,C}),(B,{A}),(C,{A}) </li></ul><ul><li>(A,{B,C}),(A,{C,B}) </li></ul><ul><li>(B,C,1), (C,B,1) </li></ul>connections = LOAD `connections` USING PigStorage();
  30. 30. Triangle Closing Example Alice Bob Carol <ul><li>(A,B),(B,A),(A,C),(C,A) </li></ul><ul><li>(A,{B,C}),(B,{A}),(C,{A}) </li></ul><ul><li>(A,{B,C}),(A,{C,B}) </li></ul><ul><li>(B,C,1), (C,B,1) </li></ul>group_conn = GROUP connections BY source_id;
  31. 31. Triangle Closing Example Alice Bob Carol <ul><li>(A,B),(B,A),(A,C),(C,A) </li></ul><ul><li>(A,{B,C}),(B,{A}),(C,{A}) </li></ul><ul><li>(A,{B,C}),(A,{C,B}) </li></ul><ul><li>(B,C,1), (C,B,1) </li></ul>pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);
  32. 32. Triangle Closing Example Alice Bob Carol <ul><li>(A,B),(B,A),(A,C),(C,A) </li></ul><ul><li>(A,{B,C}),(B,{A}),(C,{A}) </li></ul><ul><li>(A,{B,C}),(A,{C,B}) </li></ul><ul><li>(B,C,1), (C,B,1) </li></ul>common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;
  33. 33. Our Workflow triangle-closing
  34. 34. Our Workflow triangle-closing top-n
  35. 35. Our Workflow triangle-closing top-n push-to-prod
  36. 36. Outline <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>
  37. 37. Our Workflow triangle-closing top-n push-to-prod
  38. 38. Our Workflow triangle-closing top-n push-to-prod remove connections
  39. 39. Our Workflow triangle-closing top-n push-to-prod remove connections push-to-qa
  40. 40. PYMK Workflow
  41. 41. Workflow Requirements <ul><li>Dependency management </li></ul><ul><li>Regular Scheduling </li></ul><ul><li>Monitoring </li></ul><ul><li>Diverse jobs: Java, Pig, Clojure </li></ul><ul><li>Configuration/Parameters </li></ul><ul><li>Resource control/locking </li></ul><ul><li>Restart/Stop/Retry </li></ul><ul><li>Visualization </li></ul><ul><li>History </li></ul><ul><li>Logs </li></ul>
  42. 42. Workflow Requirements <ul><li>Dependency management </li></ul><ul><li>Regular Scheduling </li></ul><ul><li>Monitoring </li></ul><ul><li>Diverse jobs: Java, Pig, Clojure </li></ul><ul><li>Configuration/Parameters </li></ul><ul><li>Resource control/locking </li></ul><ul><li>Restart/Stop/Retry </li></ul><ul><li>Visualization </li></ul><ul><li>History </li></ul><ul><li>Logs </li></ul>Azkaban
  43. 43. Sample Azkaban Job Spec <ul><li>type=pig </li></ul><ul><li>pig.script=top-n.pig </li></ul><ul><li>dependencies=remove-connections </li></ul><ul><li>top.n.size=100 </li></ul>
  44. 44. Azkaban Workflow
  45. 45. Azkaban Workflow
  46. 46. Azkaban Workflow
  47. 47. Our Workflow triangle-closing top-n push-to-prod remove connections
  48. 48. Our Workflow triangle-closing top-n push-to-prod remove connections
  49. 49. Outline <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>
  50. 50. Production Storage <ul><li>Requirements </li></ul><ul><ul><li>Large amount of data/Scalable </li></ul></ul><ul><ul><li>Quick lookup/low latency </li></ul></ul><ul><ul><li>Versioning and Rollback </li></ul></ul><ul><ul><li>Fault tolerance </li></ul></ul><ul><ul><li>Offline index building </li></ul></ul>
  51. 51. Voldemort Storage <ul><ul><li>Large amount of data/Scalable </li></ul></ul><ul><ul><li>Quick lookup/low latency </li></ul></ul><ul><ul><li>Versioning and Rollback </li></ul></ul><ul><ul><li>Fault tolerance through replication </li></ul></ul><ul><ul><li>Read only </li></ul></ul><ul><ul><li>Offline index building </li></ul></ul>
  52. 52. Data Cycle
  53. 53. Voldemort RO Store
  54. 54. Our Workflow triangle-closing top-n push-to-prod remove connections
  55. 55. Outline <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>
  56. 56. Data Quality <ul><li>Verification </li></ul><ul><li>QA store with viewer </li></ul><ul><li>Explain </li></ul><ul><li>Versioning/Rollback </li></ul><ul><li>Unit tests </li></ul>
  57. 57. Outline <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>
  58. 58. Performance <ul><li>Symmetry </li></ul><ul><ul><li>Bob knows Carol then Carol knows Bob </li></ul></ul><ul><li>Limit </li></ul><ul><ul><li>Ignore members with > k connections </li></ul></ul><ul><li>Sampling </li></ul><ul><ul><li>Sample k-connections </li></ul></ul>
  59. 59. Things Covered <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>
  60. 60. SNA Team <ul><li>Thanks to SNA Team at LinkedIn </li></ul><ul><li>http://sna-projects.com </li></ul><ul><li>We are hiring! </li></ul>
  61. 61. Questions?
  62. 62. Outline <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>
  63. 63. Outline <ul><li>What do I mean by Data Products? </li></ul><ul><li>Systems and Tools we use </li></ul><ul><li>Let’s build “People You May Know” </li></ul><ul><li>Managing workflow </li></ul><ul><li>Serving data in production </li></ul><ul><li>Data Quality </li></ul><ul><li>Performance </li></ul>

×