CouchbasetoHadoop_Matt_Michael_Justin v4


  1. Couchbase to Hadoop at LinkedIn: Kafka is Enabling the Big Data Pipeline
  2. Agenda • Define Problem Domain: Justin Michaels, Solution Architect, Couchbase • Use case at LinkedIn: Michael Kehoe, Site Reliability Engineer, LinkedIn • Supporting Technology Overview and Demo: Matt Ingenthron, Senior Director, Couchbase • Q&A
  3. Lambda Architecture [diagram: stages 1 to 5 labeled DATA, BATCH, SPEED, SERVE, QUERY]
  4. Lambda Architecture: Interactive and Real Time Applications [diagram: stages 1 to 5 labeled DATA, BATCH, SPEED, SERVE, QUERY; component labels: HADOOP, COUCHBASE, STORM, COUCHBASE, Broker Cluster, Spout for Topic, Kafka Producers, Ordered Subscriptions]
  5. • Hadoop … an open-source framework written for distributed storage and distributed processing of very large data sets on commodity hardware • Kafka … an append-only write-ahead log that records messages to a persistent store and allows subscribers to read and apply these changes to their own stores in an appropriate time frame • Storm … a distributed framework that uses custom-created "spouts" and "bolts" to define information sources and manipulations for processing of streaming data • Couchbase … an open-source, distributed NoSQL document-oriented database that is optimized for interactive applications, with an integrated data cache and an incremental map-reduce facility
  6. [diagram labels] COMPLEX EVENT PROCESSING Real Time REPOSITORY PERPETUAL STORE ANALYTICAL DB BUSINESS INTELLIGENCE MONITORING CHAT/VOICE SYSTEM BATCH TRACK REAL-TIME TRACK DASHBOARD
  7. [diagram labels] TRACKING AND COLLECTION, ANALYSIS AND VISUALIZATION, REST, FILTER, METRICS
  8. Use Case at LinkedIn
  9. Michael Kehoe • Site Reliability Engineer (SRE) at LinkedIn • SRE for Profile & Higher-Education • Member of LinkedIn’s CBVT • B.E. (Electrical Engineering) from the University of Queensland, Australia
  10. Kafka @ LinkedIn • Kafka was created by LinkedIn • Kafka is a publish-subscribe system built as a distributed commit log • Processes 500+ TB/day (~500 billion messages) @ LinkedIn
  11. LinkedIn’s uses of Kafka • Monitoring • InGraphs • Traditional Messaging (Pub-Sub) • Analytics • Who Viewed my Profile • Experiment reports • Executive reports • Building block for (log) distributed applications • Pinot • Espresso
  12. Use Case: Kafka to Hadoop (Analytics) • LinkedIn tracks data to better understand how members use our products • Information such as which page was viewed and which content was clicked on is sent into a Kafka cluster in each data center • Some of these events are centrally collected and pushed onto our Hadoop grid for analysis and daily report generation
  13. Couchbase @ LinkedIn • About 25 separate services with one or more clusters in multiple data centers • Up to 100 servers in a cluster • Single and Multi-tenant clusters
  14. Use Case: Jobs Cluster • Read scaling, Couchbase ~80k QPS, 24 server cluster(s) • Hadoop to pre-build data by partition • Couchbase 99th percentile latencies
  15. Hadoop to Couchbase • Our primary use case for Hadoop → Couchbase is building (warming) / recovering Couchbase buckets • LinkedIn built its own in-house solution to work with our ETL processes, cache invalidation procedures, etc.
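
Slide 5 describes Kafka as an append-only log from which subscribers read changes and apply them to their own stores at their own pace. The following is a minimal, in-memory sketch of that idea only. It does not use the Kafka client API, and the names `AppendOnlyLog`, `Subscriber`, `read_from`, and `poll` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Toy stand-in for one ordered log; entries are never changed or removed."""
    messages: list = field(default_factory=list)

    def append(self, message):
        """Add a message at the end and return its offset (its position)."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read_from(self, offset, max_messages=100):
        """Return up to max_messages entries starting at offset."""
        return self.messages[offset:offset + max_messages]


@dataclass
class Subscriber:
    """Keeps its own offset and applies log entries to its own store."""
    store: dict = field(default_factory=dict)
    offset: int = 0

    def poll(self, log, max_messages=100):
        """Read the next batch, apply it locally, and advance the offset."""
        batch = log.read_from(self.offset, max_messages)
        for key, value in batch:
            self.store[key] = value  # last write per key wins
        self.offset += len(batch)
        return len(batch)


# Hypothetical tracking events, loosely modeled on slide 12.
log = AppendOnlyLog()
log.append(("member:1", "viewed /jobs"))
log.append(("member:2", "clicked /profile"))

fast = Subscriber()
slow = Subscriber()
fast.poll(log)                   # consumes both entries
slow.poll(log, max_messages=1)   # consumes only the first entry

log.append(("member:1", "viewed /feed"))
fast.poll(log)                   # picks up the new entry from its saved offset
```

Because each subscriber tracks its own offset, a slow consumer (for example, a batch load into Hadoop) does not hold back a fast one, and either can resume from where it left off.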
