
Change data capture with MongoDB and Kafka.

7,186 views


In any modern web platform you end up needing to store different views of your data in many different datastores. I will cover how we have coped with doing this reliably at State.com, across a range of languages, tools and datastores.

Published in: Internet


  1. Change Data Capture with Mongo + Kafka. By Dan Harvey
  2. High level stack: React.js - Website; Node.js - API Routing; Ruby on Rails + MongoDB - Core API; Java - Opinion Streams, Search, Suggestions; Redshift - SQL Analytics
  3. Problems • Keep user experience consistent • Streams / search index need to update • Keep developers efficient • Loosely couple services • Trust denormalisations
  4. Use case • User to User recommender • Suggest “interesting” users to a user • Update as soon as you make a new opinion • Instant feedback for contributing content
  5. Log transformation (diagram): the Rails API writes JSON/BSON documents to Mongo; an Optailer tails the oplog (change data capture) into Avro-encoded Kafka topics (User, Opinion); Java services consume those topics for stream processing, e.g. the User Recommender.
  6. Op(log)tailer • Converts BSON/JSON to Avro • Guarantees latest document in topic (eventually) • Does not guarantee all changes • Compacting Kafka topic (only keeps latest)
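The compaction guarantee on slide 6 can be sketched as follows (names are illustrative, not from the Optailer code): a compacted topic retains only the last value per key, so a replay yields the latest document but not every intermediate change.

```python
# Sketch of why a compacted Kafka topic guarantees the *latest* document
# per key but not every intermediate change (names are illustrative).
def compact(messages):
    """Keep only the last value seen for each key."""
    latest = {}
    for key, value in messages:
        latest[key] = value  # later writes overwrite earlier ones
    return latest

oplog_topic = [
    ("user:1", {"name": "Ann", "opinions": 1}),
    ("user:2", {"name": "Bob", "opinions": 0}),
    ("user:1", {"name": "Ann", "opinions": 2}),  # intermediate state
    ("user:1", {"name": "Ann", "opinions": 3}),
]

state = compact(oplog_topic)
# A consumer replaying the topic sees user:1 only at opinions=3; the
# opinions=1 and opinions=2 states may have been compacted away.
```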
  7. Avro Schemas • Each Kafka topic has a schema • Schemas evolve over time • Readers and Writers will have different schemas • Allows us to update services independently
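The reader/writer independence on slide 7 can be sketched with a hand-rolled projection (not the real Avro resolution machinery; the field names are hypothetical): a reader whose schema added a field with a default can still decode records from an older writer.

```python
# Hand-rolled sketch of Avro-style schema resolution (not the real Avro
# library; field names are hypothetical). A reader whose schema added a
# field with a default can still decode records from an older writer.
READER_DEFAULTS = {"id": None, "name": None, "bio": ""}  # "bio" added later, with a default

def resolve(record):
    """Project a writer record onto the reader schema, filling
    missing fields from the reader's defaults."""
    return {field: record.get(field, default)
            for field, default in READER_DEFAULTS.items()}

old_record = {"id": 7, "name": "Dan"}    # written before "bio" existed
print(resolve(old_record))               # {'id': 7, 'name': 'Dan', 'bio': ''}
```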
  8. Schema Changes • Schema to ID mapping managed by the Confluent registry • Readers and writers discover schemas • Avro deals with resolution to the compiled schema • Must be forwards and backwards compatible (diagram: Kafka message byte[] = schema ID int + message byte[])
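The message layout drawn on slide 8 corresponds to the Confluent wire format, which prepends a magic byte and a 4-byte big-endian schema ID to the Avro payload so a reader can fetch the writer's schema from the registry. A minimal framing sketch:

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format: magic byte, 4-byte schema ID, payload

def frame(schema_id, avro_bytes):
    """Prepend the registry schema ID so readers can look up the writer schema."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_bytes

def unframe(message):
    """Split a framed Kafka message back into (schema ID, Avro payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    assert magic == MAGIC_BYTE
    return schema_id, message[5:]

msg = frame(42, b"\x0eexample")   # payload bytes are illustrative
sid, payload = unframe(msg)
```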
  9. Search indexing • User / Topic / Opinion search • Re-use Kafka topics from before • Index from Kafka to Elasticsearch • Need to update quickly and reliably
  10. Samza Indexers • Index from Kafka to Elasticsearch • Used Samza for transform and loading • Far less code than Java Kafka consumers • Stores offsets and state in Kafka
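A toy model of the indexing step on slide 10 (deliberately using no real Samza or Elasticsearch APIs): a per-message handler upserts documents by key, so replaying the compacted topic from offset 0 rebuilds the entire index.

```python
# Toy model of the Samza indexing step (no real Samza or Elasticsearch
# APIs): a per-message handler upserts documents by key, so replaying
# the compacted Kafka topic from offset 0 rebuilds the whole index.
index = {}

def process(message):
    """Samza-style handler: one message in, one upsert out."""
    key, doc = message
    index[key] = doc

for message in [("user:1", {"name": "Ann"}),
                ("user:1", {"name": "Ann B."})]:  # later write wins
    process(message)
```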
  11. Elasticsearch Producer • Samza consumers/producers deal with I/O • Wrote new ElasticsearchSystemProducer • Contributed back to Samza project • Included in Samza 0.10.0 (released soon)
  12. Samza Good/Bad • Good API • Simple transformations easy • Simple ops: logging, metrics all built in • Only depends on Kafka • Inbuilt state management • Joins tricky, need consistent partitioning • Complex flows are hard (Flink/Spark better)
  13. Decoupling Good/Bad • Easy to try out complex new services • Easy to keep data stores in sync, low latency • Started to duplicate core logic • More overhead with more services • Need high level framework for denormalisations • Samza SQL being developed
  14. Ruby Workers • Ruby Kafka consumers not great… • Optailer to AWS SQS (Shoryuken gem) • No order guarantee like Kafka topics • But guaranteed trigger off database writes • Better for core data transformations
  15. Future • Segment.io user interaction logs to Kafka • Use in product, view counts, etc… • Fill Redshift for analytics (currently batch) • Kafka CopyCat instead of our Optailer • Avro transformation in Samza
  16. Questions? • email: dan@state.com • twitter: @danharvey