
Data governance and discoverability at AO.com | Jon Vines, AO.com and Christoph Schubert, Confluent

One challenge of widespread adoption of any technology within an organisation is balancing organic growth with maintaining standards and best practice. At AO.com - one of the UK's largest online electrical retailers - we’ve invested in tooling to simplify application onboarding into the event processing platform. This includes creating topics, defining access to the platform and supporting our governance functions.
A key part of any data function is data governance and discoverability. Through standardised definitions of Kafka topics and use of Avro schemas, we can map which topics exist, what data they contain and who has access to them. This allows us to support multiple cross-functional teams using automatic data gathering.
In this session, AO.com and Confluent Professional Services will share how we tackled the challenge of platform adoption and provide hands-on examples of the open-source tools we developed, "Kafka Clusterstate Tools" and "Kafka Streams Inspector".


  1. Data Governance and Discoverability at AO.com. Jon Vines, Christoph Schubert, Mirko Kämpf. © 2020 AO World. All Rights Reserved.
  2. Agenda
      • Introduction
      • AO and Kafka
      • Data Governance and Discoverability
      • Introducing tools developed
      • Next Steps
  3. Who are we?
      • Jon Vines, Engineering Lead, AO.com
      • Christoph Schubert, Senior Solutions Architect, Confluent
      • Mirko Kämpf, Senior Solutions Architect, Confluent
  4. (no text)
  5. “The pandemic accelerated a shift in customer behaviour towards online shopping – we saw ten years’ change in consumer behaviour in just ten weeks.” John Roberts, AO World CEO, 2020
  6. Kafka at AO (timeline: 2018, 2019, 2020, 2021)
      • Exploratory usage
      • First high-scale production use case: clickstream
      • Platform thinking
      • Confluent PS engagement
      • Migrate to Confluent Cloud
      • Kafka Connect, Schema Registry, KSQL, .NET client APIs
  7. AO Clickstream
      • Generate Session and Visitor views in real-time
      • Create decision points from that insight
      • Tailor visitor experience based on those decisions
  8. One AO
  9. Understanding context: clickstream, products, pricing, stock, orders, van location, …
  10. Problem outline
      • Number of teams and data sources
      • No defined way to onboard new data
      • Manage wider data concerns around governance and discoverability
  11. Goals
      • Lead time to platform onboarding
      • Security, Governance and Discoverability baked in
      • Reduce cognitive load on teams
  12. Kafka Cluster State Tools: declarative resource management. Christoph Schubert, Senior Solutions Architect, Confluent
  13. A step back: what is the state of a Kafka cluster? What resources exist in an Apache Kafka® cluster?
      • Topics
      • Schemas and other metadata
      • Information about applications (consumer groups, transactional IDs)
      • We want to be able to restrict access to each of these
  14. Goals for resource management
      • Declarative
        • Describe the desired state
        • Tooling will figure out a way to reach this state
      • Decentralized
        • Give each domain/application team a namespace to work in
        • Changes in one namespace should not affect other namespaces
        • Each team/domain governs access to their namespace
      • Right level of abstraction
        • Abstract and extensible role model for access control
        • Will be compiled to proper ACLs or role bindings
  15. Kafka Clusterstate Tool
      • Describe resources as domain files (YAML)
      • Each domain corresponds to a namespace
      • Namespacing implemented using name prefixes
      • Domain files can contain additional metadata
  16. Abstract role definitions: abstract roles such as consumer / producer or Kafka Streams application will be compiled to access rights according to best practices.
  17. Implementation
      • Domain files are compiled into the desired cluster-state
      • The actual cluster-state is extracted
      • A diff of the two gives the state-diff, a list of actions to apply
      • Diagram annotations: standardized format; can be filtered based on policy (e.g. no deletes, all schema changes backward compatible)
  18. Governance workflow
      • No mandated governance workflow
      • Natural fit for a GitOps workflow
        • All resource/access changes lead to a pull request
        • Data stewards need to approve the pull request
        • Optional: downstream applications need to approve as well
  19. KS Inspector: decision support for architects and operators. Mirko Kämpf, Senior Solutions Architect, Confluent
  20. Application Deployment: connect a streaming application to a data plane
  21. Application Lifecycle Management: Develop the app; Test in DEV; Deploy to DEV; Release the app; Performance Test in QA; Deploy to QA; Deploy to PROD
  22. The Full Picture: Develop the app; Test in DEV; Deploy to DEV; Release the app; Performance Test in QA; Deploy to QA; Deploy to PROD. Flexibility; Strong Governance
  23. Let's Build a Knowledge Graph. How do we improve transparency?
  24. How to use all the facts?
      • Choose from a collection of named queries for compliance analysis.
      • Use ad-hoc queries for inspection and exploration.
  25. Components of our solution:
  26. Enjoy working with facts ...
      • Application life-cycle management starts with onboarding:
        • The central operator provides details regarding the defined target environment to the developer.
        • Developers can request a change on the topic setup.
      • Queries for impact analysis:
        • Operators validate change requests using queries on our graph data model.
        • Architects can simulate changes so that side effects can be identified before a change.
      • "Reading the Graph" supports flow optimisation and cost reduction:
        • Redundant processing flows can be identified.
        • Dependencies and complicated communication structures can be found in this data.
  27. Next Steps. Jon Vines, Engineering Lead, AO.com
  28. Achievements
      • Defined developer workflow
        • Topic naming strategy
        • Domain language
      • Domain agnostic platform capability
      • Guardrails built into deployment process
      • Discoverability through Knowledge Graph
  29. Data Mesh Paradigm Shift
      • Distributed domain driven architecture: teams own creation of source-of-truth data
      • Self-serve platform design: teams able to use platform components with appropriate guardrails
      • Product thinking: data is curated as a product, with the same level of ownership as a software product
      Ref: How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
  30. Data Mesh Continued
      • Domains: clickstream, products, pricing, stock, …
      • Federated governance and access policies
      • Domain oriented event streams as a product
      • Self-serve platform components
      • topic provisioning, access control, schema creation, topic naming, schema definition, access control, app lifecycle, data pipeline, knowledge graph, encryption, monitoring, data lineage
  31. Data Mesh Continued
      • Domains: clickstream, products, pricing, stock, …
      • Federated governance and access policies
      • Domain oriented event streams as a product
      • Self-serve platform components
      • topic provisioning, access control, schema creation, topic naming, schema definition, access control, app lifecycle, data pipeline, knowledge graph, encryption, monitoring, data lineage
  32. What’s next
      • Industrialise self-serve onboarding
        • Integration with Service Catalogue
        • Event definition and validation
      • Organisation wide enablement
  33. (no text)
  34. Thank you. Links:
      • https://github.com/christophschubert/kafka-clusterstate-tools
      • https://github.com/kamir/ks-inspector
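
The declarative approach on slides 14, 15 and 17 (compile domain files to a desired state, extract the actual state, diff the two, filter the resulting actions by policy) can be illustrated with a minimal Python sketch. This is not the code or file format of kafka-clusterstate-tools: the dictionary layout, the "." prefix separator, the domain and topic names, and the function names `compile_domain`, `diff_state` and `apply_policy` are all assumptions made for illustration, and only topics are modelled (no schemas or access rights).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str   # "create" or "delete"
    topic: str  # fully qualified topic name


def compile_domain(domain: dict) -> set[str]:
    """Expand a domain description into prefixed (namespaced) topic names."""
    prefix = domain["name"] + "."  # assumed separator, not taken from the slides
    return {prefix + topic for topic in domain["topics"]}


def diff_state(desired: set[str], actual: set[str], prefix: str) -> list[Action]:
    """Return the actions that would move the actual state to the desired state.

    Only topics inside the domain's own prefix are candidates for deletion,
    so a change in one namespace cannot touch another namespace.
    """
    creates = [Action("create", t) for t in sorted(desired - actual)]
    deletes = [
        Action("delete", t)
        for t in sorted(actual - desired)
        if t.startswith(prefix)
    ]
    return creates + deletes


def apply_policy(actions: list[Action], allow_deletes: bool = False) -> list[Action]:
    """Filter the action list by policy, here the 'no deletes' example."""
    return [a for a in actions if allow_deletes or a.kind != "delete"]


# Hypothetical domain description, standing in for a YAML domain file.
domain = {"name": "pricing", "topics": ["price-changes", "promotions"]}
# Hypothetical topics currently present in the cluster.
actual = {"pricing.price-changes", "pricing.legacy", "stock.levels"}

desired = compile_domain(domain)
plan = diff_state(desired, actual, prefix="pricing.")
for action in apply_policy(plan):
    print(action.kind, action.topic)
```

With the default "no deletes" policy, the stale topic in the pricing namespace stays in the plan but is filtered out before anything is applied, and the topic in the stock namespace is never considered. In a GitOps workflow such as the one on slide 18, the unfiltered plan is the kind of output a data steward could review in a pull request.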

Editor's notes

  • AO is one of the UK’s leading electricals retailers, operating predominantly online for the last 20 years and selling over 9,000 electrical products to millions of customers across the UK and Germany. AO prides itself on putting customers first and is just as dedicated to creating an amazing place for its 4,000 employees to not only work but thrive.
  • Trends in retail:

    Highly competitive market, work to thin margins
    Moving from High Street to Online
    Personalised Customer Experience – optimal buyer journey

    There are a number of disruptive trends in retail. We operate in a highly competitive market, working to thin margins. Across the sector, including among our competitors, we’re seeing a growing shift from the high street to online. We’re also seeing increased use of personalisation to build an optimal buyer journey.

    We can find opportunities in customer experience (personalised promotions and proactive decision making), in operational efficiencies (real-time automation of customer interactions) and in new business models (such as warehouse logistics aligned with real-time customer demand).

    The COVID-19 pandemic caused a dramatic shift in consumer shopping habits, which led to a sharp increase in growth at AO. John Roberts, AO Founder and Chief Executive Officer, underscored the scale of the change when he explained that “the pandemic accelerated a shift in customer behaviour towards online shopping – we saw ten years’ change in consumer behaviour in just ten weeks.”
  • 100-150 rps from on site; 1,000s of rps through the system
    Approx. 300 GB
  • AO is not just onsite… we’ve got multiple lines of business which all interconnect
  • Reduce the lead time to widespread adoption of the data platform 
    Ensure security and governance policies are baked in from the start 
    Enable self-serve capabilities for onboarding new use cases onto the data platform.
    Reduce cognitive load on the teams by supplying reference architectures and established patterns of use
  • Finish by saying, we are growing massively at AO with positions across tech in engineering and data… if you’d like to learn more, please reach out.
