0-60: Tesla's Streaming Data Platform (Jesse Yates, Tesla) Kafka Summit SF 2019


Tesla ingests trillions of events every day from hundreds of unique data sources through our streaming data platform. Find out how we developed a set of high-throughput, non-blocking primitives that allow us to transform and ingest data into a variety of data stores with minimal development time. Additionally, we will discuss how these primitives allowed us to completely migrate the streaming platform in just a few months. Finally, we will talk about how we scale team size sub-linearly to data volumes, while continuing to onboard new use cases.


  1. 0-60: Tesla's Streaming Data Platform. Jesse Yates, Staff Engineer. Kafka Summit SF, September 30th, 2019
  2. Who am I? • Staff Engineer @ Tesla, Big Data • "Big Data" & Cloud specialist • Apache HBase Committer • Apache Phoenix PMC • Occasional blogger • Triathlete
  3. Agenda • Challenges • Design Overview • Build a data flow • Operations
  4. Data Challenges. Usual suspects: • Volume • Velocity • Variety. IoT challenges: • Bursty • Low-latency • Payload explosion
  5. 100s of Powerpacks, 10s of pods/pack, 1000s of signals/sec/pod
  6. Payload Explosion
  7. A little bit of history: trillions/day
  8. Designing an ETL System
  9. Design Requirements • "Just works" • Flexible batching • One-stream, one-app • Scale with multiple degrees of freedom
  10. Kafka Channels • Backpressure, buffering, non-blocking, fault tolerant • Mostly configurable, extendable as needed • Highly composable • Limit functionality to increase operability
  11. Channel Building Blocks: Source, Filter, Batch, Broadcast, Database, FileSystem. Example channels: KafkaToFileSystem, KafkaToDB, KafkaToCanonical, KafkaToKafka
  12. Composable Components: Mirror, Raw to Canonical, Storage, all operating on (K:V) payloads
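The composability idea above can be sketched as pipeline stages that each either transform or drop a record, chained into a channel. This is a minimal illustration of the concept, not Tesla's actual API; the class and method names here are hypothetical.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// A channel stage maps a record to Optional: present = pass along,
// empty = dropped (e.g. by a filter). Channels compose by chaining stages.
public class Channel<T> {
    private final Function<T, Optional<T>> stage;

    private Channel(Function<T, Optional<T>> stage) { this.stage = stage; }

    // Filter stage: drop records that fail the predicate.
    public static <T> Channel<T> filter(Predicate<T> p) {
        return new Channel<>(t -> p.test(t) ? Optional.of(t) : Optional.<T>empty());
    }

    // Transform stage: rewrite each record.
    public static <T> Channel<T> map(Function<T, T> f) {
        return new Channel<>(t -> Optional.of(f.apply(t)));
    }

    // Compose: the output of this channel feeds the next one.
    public Channel<T> then(Channel<T> next) {
        return new Channel<>(t -> stage.apply(t).flatMap(next.stage));
    }

    public Optional<T> process(T record) { return stage.apply(record); }
}
```

Because every stage has the same shape, building a new flow is mostly configuration: pick blocks and chain them, which is what keeps development time low.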
  13. Let's build a data flow! Not really, because, you know, live demos
  14. A Simple Use Case • Gzipped, custom event format • Collected in an edge Kafka cluster • Land in central DataLake for analysts • … • Profit!
  15. Mirror from Edge Kafka • It's just another Channel application • No surprise bugs or operations • Regex mapping (e.g. edge.cool_data → cool_data, edge.legacy_data) • Sampling
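The regex mapping and sampling from the slide might look like the following sketch: strip an `edge.` prefix to get the destination topic, and keep a deterministic fraction of records by key. The class name and the exact hashing scheme are illustrative assumptions, not the real mirror's code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MirrorMapper {
    // Source topics like "edge.cool_data" map to "cool_data".
    private static final Pattern EDGE = Pattern.compile("^edge\\.(.+)$");

    // Unmatched topics are simply not mirrored.
    public static Optional<String> mapTopic(String sourceTopic) {
        Matcher m = EDGE.matcher(sourceTopic);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Deterministic key-hash sampling: keeps roughly `rate` of traffic,
    // and a given key is always either kept or dropped.
    public static boolean sample(String key, double rate) {
        int bucket = Math.floorMod(key.hashCode(), 100);
        return bucket < rate * 100;
    }
}
```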
  16. Our Flow: 10% Mirror
  17. Raw to Canonical • Many-to-many • Decoders: gzip, b64 • Built-in parsers: JSON, CSV • Custom Parser • Flexible for unplanned uses • Highly parallelizable • Pipeline: Kafka Source, Decode, Parse, Produce, Commit
  18. Parser API: public parse(byte[]) :: Iterator<Map<String, Object>> • Exception during call -> skip the record • Exception during iteration -> halt the stream • Sometimes means early materialization
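A custom parser with the slide's shape, `parse(byte[]) -> Iterator<Map<String, Object>>`, could look like this toy CSV parser. It is an illustrative sketch, not one of the built-in parsers; note how a malformed row throws during iteration, which per the contract above halts the stream rather than skipping the record.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvParser {
    private final String[] columns;

    public CsvParser(String... columns) { this.columns = columns; }

    // One payload may yield many records, hence the Iterator.
    public Iterator<Map<String, Object>> parse(byte[] payload) {
        String body = new String(payload, StandardCharsets.UTF_8);
        return Arrays.stream(body.split("\n"))
                .map(line -> {
                    String[] cells = line.split(",", -1);
                    if (cells.length != columns.length) {
                        // Thrown lazily during iteration -> halts the stream.
                        throw new IllegalStateException("bad row: " + line);
                    }
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 0; i < columns.length; i++) {
                        row.put(columns[i], cells[i]);
                    }
                    return row;
                })
                .iterator();
    }
}
```

The laziness is the subtle part: a parser that must validate the whole payload up front ("early materialization") trades memory for the ability to fail inside `parse()`, where the record can still be skipped.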
  19. Using Our Custom Parser
  20. Our Flow: 10% Mirror → Custom Parser to Avro
  21. KafkaToFileSystem • Parquet format • System-time partitioned • Many-to-one • Native batching • e.g. /root/cool.db/sys_date=2019-09-30/channel.cool_data/somedata.parquet.1
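The system-time partitioned layout on the slide can be captured in a small path-building helper. This is a hypothetical illustration of the convention `/root/<db>/sys_date=<date>/channel.<topic>/<file>`, not the actual KafkaToFileSystem code.

```java
import java.time.LocalDate;

public class PartitionPath {
    // sys_date partitioning uses ingest (system) time, not event time,
    // so late-arriving data still lands in a predictable directory.
    public static String forFile(String root, String db, String topic,
                                 LocalDate sysDate, String fileName) {
        return String.format("%s/%s/sys_date=%s/channel.%s/%s",
                root, db, sysDate, topic, fileName);
    }
}
```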
  22. Our Flow: 10% Mirror → Custom Parser to Avro → FS (HDFS)
  23. Monitoring & Operations
  24. Kafka Monitoring • Kafka is very simple to monitor and observe • One dashboard can tell you everything at a glance • But people don't think in offsets and counts: use % SLOs and time-based lag monitoring
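Time-based lag, as opposed to offset counts, can be sketched as: how far behind "now" is the newest record this consumer has committed, and does that exceed a time SLO? The helper below is an illustrative assumption about the approach, not the freshness tracker from kafka-helmsman.

```java
import java.time.Duration;
import java.time.Instant;

public class FreshnessLag {
    // Lag is measured in wall-clock terms; a future-stamped record
    // clamps to zero rather than going negative.
    public static Duration lag(Instant newestCommittedRecordTime, Instant now) {
        Duration d = Duration.between(newestCommittedRecordTime, now);
        return d.isNegative() ? Duration.ZERO : d;
    }

    // Alerting compares lag to a human-meaningful SLO ("no more than
    // 5 minutes behind"), not to an offset count.
    public static boolean violatesSlo(Duration lag, Duration slo) {
        return lag.compareTo(slo) > 0;
    }
}
```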
  25. Operations • Many open source tools • Kafka Monitor https://github.com/linkedin/kafka-monitor • Burrow https://github.com/linkedin/Burrow • Cruise Control https://github.com/linkedin/cruise-control • Our own tools https://github.com/teslamotors/kafka-helmsman • Freshness tracker • Topic Enforcer • Rolling Restart
  26. Kubernetes • Dynamic scalability • Incidents or usual growth • Handle daily peaks • Load smearing across streams • Not free: infra is non-trivial
  27. What about when things go sideways? • A rack fails • Your database chokes • The network is having a bad day • And your users need their data RIGHT NOW!
  28. Channels Backfill • "Freshest" data can be ingested immediately • Looks just like a regular channel • Just select a range in the past & deploy
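The "select a range in the past" idea can be sketched as a simple time window the backfill channel applies to record timestamps: consume from an earlier position and keep only records inside the requested range, while the regular channel keeps serving the freshest data. This helper is a hypothetical illustration of that filter, not the real backfill implementation.

```java
import java.time.Instant;

public class BackfillRange {
    private final Instant start; // inclusive
    private final Instant end;   // exclusive

    public BackfillRange(Instant start, Instant end) {
        this.start = start;
        this.end = end;
    }

    // A backfill channel drops anything outside its window, so it can be
    // deployed alongside the live channel without double-ingesting data.
    public boolean contains(Instant recordTime) {
        return !recordTime.isBefore(start) && recordTime.isBefore(end);
    }
}
```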
  29. Channels Backfill
  30. Summary • Lots of kinds of data + IoT challenges • Simplicity for operations at scale • Backpressure, non-blocking, high-throughput • Flexible, configuration-based
  31. A note on global warming
  32. Accelerate the world's transition to sustainable energy. We are hiring! Jesse Yates, @jesse_yates, jyates@tesla.com
  33. Backup Slides
  34. Future Directions • S3 storage as first-class citizen • Managing hundreds of flows with multiple steps • Internal library • Self-service flows exposed to users • Open source?
