The Future of ETL Isn't What It Used to Be

Speaker: Gwen Shapira, Principal Data Architect, Confluent

Join Gwen Shapira, Apache Kafka® committer and co-author of "Kafka: The Definitive Guide," as she presents core patterns of modern data engineering and explains how you can use microservices, event streams and a streaming platform like Apache Kafka to build scalable and reliable data pipelines designed to evolve over time.

This is part 1 of 3 in Streaming ETL - The New Data Integration series.

Watch the recording: https://videos.confluent.io/watch/q7roRtNZBnjiT9C3ii88fo?.

Published in: Technology

  1. Streaming ETL - Part 01: The Future of ETL
     Gwen Shapira, Principal Data Architect
     @gwenshap
     https://www.confluent.io/blog/the-future-of-etl-isnt-what-it-used-to-be/
  2. Gwen Shapira, Principal Data Architect, Confluent
     Gwen is a principal data architect at Confluent helping customers achieve success with their Apache Kafka® implementation. She has 15 years of experience working with code and customers to build scalable data architectures. Gwen is a committer on the Apache Kafka project, author of “Kafka - The Definitive Guide,” and a frequent presenter at industry conferences.
  3. Housekeeping Items!
     • This session will last about an hour.
     • This session will be recorded.
     • You can submit your questions by entering them into the GoToWebinar panel.
     • The last 10-15 minutes will consist of Q&A.
     • The slides and recording will be available after the talk.

  4. You extrapolate the trends that you see
  5. So, to think about the future of ETL, we want to look at:
     • The future we imagined in the past
     • The trends we are seeing now
     • The future these trends indicate
     • Practical suggestions
     • Final thoughts
  6. What was it like 15 years ago?
     • Very few sources
     • One Data Warehouse to rule them all
     • One big “friendly” ETL tool
     • Significant effort modeling, cleaning and converging
     • “Although it is a huge leap to move from a monthly or weekly process to a nightly one…”
  7. Main pain-points of data integration?
  8. What did we dream about then?
     • Data modeling is painful. Is there a way around that?
     • We are always missing data. Can we handle more diverse data sources?
     • Scale is painful. Our daily run takes 25 hours. Can we do it in 5?
     • Data Warehouses are expensive. If only there was a free option.
  9. And 5 years ago?
     • Many diverse sources
     • One big target
     • Complete confusion of ETL tools
     • “Schema on Read”
     • … and still loading data around once a day
  10. And since change is difficult, we also got this:
  11. What did we dream about then?
      • Real-time. We don’t want to wait even 5 minutes for insights.
      • Governance. Save us from the mess we created.
      • Better tooling. We imagined really fancy drag-and-drop.
  12. But we missed some key ways in which the world is changing
  13. #1 Cloud
  14. We want ETL that:
      • Integrates with Managed Services
      • Seamlessly integrates across many DCs
      • Is cloud-native: containerized, dynamic scale, automatic recovery
  15. #2 Microservices
  16. We want ETL that:
      • Integrates applications and data stores
      • Is a platform for building data-intensive apps
      • Is agile and composable. DRY.
  17. Cloud and Microservices drove more changes
      (diagram labels: Front End, Middleware, Backend, ETL, Databases, Ops; Sysadmins, Software Engineers)
  18. #3 DevOps
  19. We want ETL that:
      • Open Source
      • Integrates with engineering tools
      • Low / No inter-team dependencies
      • APIs
      • Source Control, CI/CD, Test frameworks
      • Part of the application
  20. What shall we build now?
  21. #1 Model Everything as a Stream
  22. Imagine… all your data as a stream
  26. A stream fixes the madness
  27. #2 Move Fast, Don’t Break Compatibility
  28. (diagram labels: Push, Merge, Build; Test, Staging; Dev; Prod; MVN Plugin; Test Registry; Prod Registry)
  29. #3 Integrate Databases and Applications at Scale
  30. Do you think that’s a table you are querying?
  31. The Stream/Table Duality
      Stream (Account ID, Amount), in time order:
        12345  + €50
        12345  + €25
        12345  - €60
      Table (Account ID, Balance), after each event:
        12345  €50
        12345  €75
        12345  €15
      http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
  32. Full streaming platform with Connect + Streams
  33. Let’s look at an example. Our marketing team has a small request:
      Promotion! We want to send Platinum members who are looking at beach properties an email about a discount package in a new Florida hotel.
  34. Pageview Events → Consume → Enrich → Decide → Produce → Promotions Events
  36. Pageview Events → Consumer → Enrich → Decide → Produce → Promotions Events; DB
  37. Pageview Events → Consumer → Enrich → Decide → Produce → Promotions Events; DB (latency, throughput, availability)
  38. Pageview Events → Consumer → Enrich → Decide → Produce → Promotions Events; DB, Cache
  39. Pageview Events → Consumer → Enrich → Decide → Produce → Promotions Events; DB, DB Change Events (insert, update, delete), Consumer
  40. Kafka Connect + CDC Connector, DB
      KSQL> CREATE STREAM promotions AS
              SELECT customer_email, customer_name
              FROM pageview_events
              LEFT OUTER JOIN customers ON c_id = pageview_cust_id
              WHERE customer_class = 'Prime';
  41. 41. What’s next?
  43. Almost here? Will this ever happen?
      • Serverless
      • Observability
      • Operational Patterns
      • Metadata for Engineers
      • Stronger guarantees
      • Standardization
      • Return of ETL basics
      • Metadata-driven, ML-driven ETL
      • …
      • …
      • Organizational Enlightenment
  44. Is ETL dead?
  45. Resources and Next Steps
      https://github.com/confluentinc/ksql
      https://github.com/confluentinc/cp-demo
      https://www.confluent.io/download/
      https://slackpass.io/confluentcommunity
      https://www.confluent.io/blog
  46. Questions?
      @gwenshap
      gwen@confluent.io
  47. Thank you for joining us!
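
The promotions example (slides 33 to 40) and the stream/table duality (slide 31) can be illustrated with a small Python sketch. This is not the presenter's code and it does not use Kafka, Kafka Connect or KSQL: plain Python lists stand in for the topics, and a dictionary stands in for the customers table. The field names (c_id, pageview_cust_id, customer_email, customer_name, customer_class) and the 'Prime' filter are taken from the KSQL statement on slide 40; the event layout ("op", "row"), the function names and the sample data are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


def apply_change(table: Dict[str, Row], event: dict) -> None:
    """Fold one DB change event into a local table (stream -> table)."""
    if event["op"] == "delete":
        table.pop(event["c_id"], None)
    else:
        # Insert and update are treated alike: the latest row per key wins.
        table[event["c_id"]] = event["row"]


def promotions(pageviews: Iterable[Row], customers: Dict[str, Row],
               customer_class: str = "Prime") -> Iterator[Row]:
    """Enrich each pageview from the local table, then filter by class."""
    for view in pageviews:
        customer = customers.get(view["pageview_cust_id"])
        # Unknown customers cannot pass the class filter, so they are dropped.
        if customer is not None and customer["customer_class"] == customer_class:
            yield {
                "customer_email": customer["customer_email"],
                "customer_name": customer["customer_name"],
            }


# Hypothetical change events, standing in for what a CDC connector would emit.
change_events = [
    {"op": "insert", "c_id": "1", "row": {
        "customer_email": "ana@example.com", "customer_name": "Ana",
        "customer_class": "Basic"}},
    {"op": "insert", "c_id": "2", "row": {
        "customer_email": "bo@example.com", "customer_name": "Bo",
        "customer_class": "Prime"}},
    {"op": "update", "c_id": "1", "row": {
        "customer_email": "ana@example.com", "customer_name": "Ana",
        "customer_class": "Prime"}},
    {"op": "delete", "c_id": "2"},
]

# Replaying the change stream rebuilds the current state of the table.
customers: Dict[str, Row] = {}
for event in change_events:
    apply_change(customers, event)

# Hypothetical pageview events: customer 3 never appeared in the change stream.
pageviews = [
    {"pageview_cust_id": "1"},
    {"pageview_cust_id": "2"},
    {"pageview_cust_id": "3"},
]

result = list(promotions(pageviews, customers))
print(result)
```

The point of the sketch is that the enrichment step reads from a local copy of the table that is kept current by the change events, so it does not query the source database for every pageview, which is the latency, throughput and availability concern raised on slide 37.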
