Developing Enterprise Consciousness: Building Modern Open Data Platforms

ScyllaDB
Jan 30, 2023

  1. Developing Enterprise Consciousness: Building Modern Open Data Platforms Rahul Singh, CEO
  2. Rahul Singh ■ Built hosting companies and data centers in high school. (Servers, Switches, DNS, etc.) ■ Built and managed CMS/KMS, Portals, SaaS apps for clients (.NET, Java, SQL Server, MySQL). ■ Dove deep into big data to get better at enterprise search for massive knowledge / content systems. (Spark, Scala, Cassandra, Solr, Elastic) ■ Focus now on global scale real-time data platforms for large organizations or organizations with a large audience. ■ Published Cassandra.Link, Cassandra.Tools, more on the way.
  3. ■ Playbook ■ Design ■ Framework ■ Approach Presentation Agenda ■ Use Cases ■ Migration ■ Standard Data Fabric ■ Cloud vs. Open Core
  4. Business / Platform Dream Enterprise Consciousness: People, Processes, Information, and Systems Connected / Synchronized. Business has been chasing this dream for a while. As technologies improve, this becomes more accessible. Image Source: Digital Business Technology Platforms, Gartner 2016
  5. Going Beyond “Reactive Manifesto” / 12 Factor References: https://12factor.net/, https://www.reactivemanifesto.org/ - Current Business Information is available to People in the swiftest way possible within the bounds of reasonable costs. - Business Information is generally available to the enterprise, siloed only by security and governance. - Data platforms make use of appropriate resources for hot vs. cold, raw vs. enhanced data. - Data platforms are always available, redundant, always trying to achieve an RPO/RTO of zero.
  6. Challenges of Managing Data Platforms in a Growing Enterprise
  7. Phases of Business Modularity: Business Silos → Standardized Platform → Optimized Core → Business Modularity. Optimized Core enabled Business Modularity. This process needs to be done in sequence. Otherwise we end up having to redo the work.
  8. Generic Data Platform Operations
  9. How Distributed Data Helps Transformation XDCR: Cross datacenter replication is the ultimate data fabric. Resilience, performance, availability, and scale. Made widely available by Cassandra and Couchbase, expanded and accelerated by ScyllaDB
  10. Modern Open Data Platform
  11. Design Contexts Responsibilities Approach Framework Tools
  12. So Many Different “Modern Stacks?” Lots of “reference” architectures available. They tend not to think about the speed layer since they are focusing on batch. What about SPEED?
  13. How Do You Choose From the Landscape? Lots and lots of components in the Data & AI Landscape. Which ones are the right ones for your business?
  14. Playbook for Modern Open Data Platform Platform Design Discovery (Inventory) - People - Process - Information (Objects) - Systems (Apps) Evaluate Framework User Experience - No-Code/Low Code Apps/Form Builders - Automatic API Generator/Platform - Customer App/API Framework Cloud - Public - Private - Hybrid Data - Data:Object - Data:Stream - Data:Table - Data:Index - Processor:Batch - Processor:Stream DevOps - Infrastructure as Code - Systems Automation - Application CICD DataOps - ETL/ELT/EtLT - Reverse ETL - Orchestration Execute Approach Architecture (Design) - Cloud - Data - DevOps - DataOps Engineering - Configuration - Scripting - Programming Operation - Setup / Deploy - Monitoring/Alerts - Administration
  15. Framework
  16. Design Distributed Realtime Extendable / Open Automated Monitored / Managed
  17. Public Cloud Native - Amazon 100% Serverless Data Platform Architecture on Amazon AWS
  18. Public Cloud Native - Microsoft 100% Azure + Azure Databricks Data Platform Architecture on Microsoft Azure Cloud
  19. Public Cloud Native - Google 100% GCP / Google Data Platform Architecture on GCP
  20. Use Case: Optimizing Distributed Data with Cloud vs. Open Core
  21. Open Core Distributed Data Platform To create globally distributed, real-time platforms, you need to build on distributed real-time technologies. Here are some. Which ones should you choose?
  22. Open Core Data Modernization / Automation /Integration In addition to vastly scalable tools, there are also modern innovations that can help teams automate and maximize human capital by making data platform management easier.
  23. Framework Components ■ Major Components ■ Persistent Queues (RAM/BUS) ■ Queue Processing & Compute (CPU) ■ Persistent Storage (DISK/RAM) ■ Reporting Engine (Display) ■ Orchestration Framework (Motherboard) ■ Scheduler (Operating System) ■ Strategies ■ Cloud Native on Google ■ Self-Managed Open Source ■ Self-Managed Commercial Source ■ Managed Commercial Source Customers want options, so we decided to create a Framework that can scale with whatever Infrastructure and Software strategy they want to use.
  24. Framework
  25. Approach
  26. Approach Setup Training Administration Configuration Knowledge
  27. Approach
  28. Sample STACK Outline Framework Platform Components Resources Platform Setup Training Administration Configuration Knowledge ● Components ○ Infrastructure ■ Source / Git ■ GitHub ■ GitLab ■ Cloud / Public ■ AWS ■ Azure ■ GCP ■ DO ■ Orchestration ■ Terraform ■ Terraform / Atlantis ■ Configuration ■ Ansible ■ Ansible / AWX / Semaphore ○ Compute ■ DataStax / Spark ■ DataStax / Livy ■ Databricks ○ Data / Open Core ■ DataStax Enterprise ■ Cassandra ■ Search / Solr ■ Graph ■ Confluent Platform ○ Data / Cloud ■ DataStax / Astra ■ Confluent Cloud ○ Data / Open Source ■ Cassandra ■ Kafka ■ Elassandra ■ YugabyteDB ■ ScyllaDB ■ Pulsar ○ Application ■ Airflow ■ Airbyte ■ Kafka Streams ■ Jupyter ■ Redash ■ Metabase ■ Superset ■ Zeppelin
  29. Use Case: Standard Data Fabric
  30. How ScyllaDB Allows Us To Go Further… All the benefits of XDCR and …. - More Data Density at High Speed / Multiple Workloads on the Same Datacenter - Better Memory / CPU management due to C++ Seastar Framework, Faster Caches - CQL Queries to support Non Relational / C* CQL like queries. - DynamoDB Queries to support legacy Dynamo - Transactions/Consistency - …
  31. Let’s Get Data Into ScyllaDB - Easier Today Open Source: - Airbyte / RudderStack make ETL easier and are open source - Kafka Connect / Pulsar IO can convert ETL into Streaming ETL SaaS/PaaS: - SaaS like Stitch/HevoData/Make - Supported versions of Airbyte/RudderStack Once It’s There, Serve It, Do More Processing Open Source: - Flink / Spark / Kafka Streams can be used to save Analytics / ML processed data. - Accelerator can help serve data as DynamoDB via REST. - Several GraphQL Solutions Available Let’s Send It Back via Reverse ETL! Open Source: - Grouparoo / Airbyte / RudderStack are free. Others are paid. - You can always use Kafka Connect / Pulsar IO to send data back as well. “Reverse ETL is the process of copying data from a warehouse into business applications like CRM, analytics, and marketing automation software. You perform this process by using a reverse ETL tool that integrates with your data source and your business SaaS tools.” - Segment Blog
  32. Let’s Put It All Together Now - ONE DATA FABRIC Still need design, but hopefully less useless plumbing code. One cluster, many workloads. With any other pure relational database, this would be problematic. With ScyllaDB, this is a core feature.
  33. Key Takeaways for Open Data Platforms Don’t reinvent the wheel. Identify the Objectives Prioritize DevOps / DataOps Use open tools that are well supported Document the STACK - Identify the objectives so that you know what success looks like. - DevOps / DataOps combined with a true agile approach allows you to iterate your platform quickly. - Put the data into ScyllaDB, and possibly archive it into Parquet/Iceberg (historical data) - Get the data out to your Systems using “Reverse ETL” tools.
  34. Thank You and Dream Big Check us out - Design Workshops - Innovation Sprints - Service Catalog - Big Data DLM Toolkit Anant.us - Read our Playbook - Join our Mailing List - Read up on Data Platforms - Watch our Videos - Download Examples Weekly Webinars - Data Engineer’s Lunch - Cassandra Lunch
  35. Thank You Stay in Touch Rahul Singh rahul.singh@anant.us @xingh xingh https://www.linkedin.com/in/xingh
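
Slide 9 describes cross datacenter replication (XDCR) as the basis of the data fabric. As a rough sketch of how that is typically expressed in CQL for Cassandra or ScyllaDB, the helper below builds a `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy` with one replication factor per datacenter. The function name, the keyspace name `fabric`, and the datacenter names `dc_us` and `dc_eu` are assumptions made for illustration and do not come from the talk.

```python
from typing import Mapping


def create_keyspace_cql(keyspace: str, replicas_per_dc: Mapping[str, int]) -> str:
    """Build a CQL statement for a keyspace replicated across datacenters.

    Hypothetical helper: it only assembles the statement text and does not
    connect to a cluster.
    """
    if not keyspace.isidentifier():
        raise ValueError("keyspace must be a simple identifier")
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")

    # One "'<datacenter>': <replication factor>" entry per datacenter.
    dc_options = ", ".join(
        "'" + dc + "': " + str(int(rf)) for dc, rf in replicas_per_dc.items()
    )
    options = "'class': 'NetworkTopologyStrategy', " + dc_options
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + options + "};"
    )


if __name__ == "__main__":
    # Datacenter names are placeholders; they must match the names the
    # cluster's snitch reports for each datacenter.
    print(create_keyspace_cql("fabric", {"dc_us": 3, "dc_eu": 3}))
```

With a keyspace defined this way, a write accepted in one datacenter is replicated to the others by the database itself, which is the property the talk relies on when it calls XDCR a data fabric.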