Developing Enterprise Consciousness: Building Modern Open Data Platforms

Rahul Singh, CEO

Rahul Singh
■ Built hosting companies and data centers in high school
(servers, switches, DNS, etc.).
■ Built and managed CMS/KMS, Portals, SaaS apps for clients
(.NET, Java, SQL Server, MySQL).
■ Dove deep into big data to get better at enterprise search for
massive knowledge / content systems. (Spark, Scala,
Cassandra, Solr, Elastic)
■ Focus now on global scale real-time data platforms for large
organizations or organizations with a large audience.
■ Published Cassandra.Link, Cassandra.Tools, more on the way.

Presentation Agenda
■ Playbook
■ Design
■ Framework
■ Approach
■ Use Cases
■ Migration
■ Standard Data Fabric
■ Cloud vs. Open Core

Business / Platform Dream
Enterprise Consciousness:
- People
- Processes
- Information
- Systems
Connected / Synchronized.
Business has been chasing this
dream for a while. As technologies
improve, this becomes more
accessible.
Image Source: Digital Business
Technology Platforms, Gartner 2016

Going Beyond “Reactive Manifesto” / 12 Factor
References: https://12factor.net/,
https://www.reactivemanifesto.org/
- Current Business Information is available to People in the swiftest way
possible within the bounds of reasonable costs.
- Business Information is generally available to the enterprise, siloed only by
security and governance.
- Data platforms make use of appropriate resources for hot vs. cold, raw vs.
enhanced data.
- Data platforms are always available and redundant, always trying to achieve an
RPO/RTO of zero.
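
As a rough illustration of the hot vs. cold principle above, the sketch below picks a storage tier from how recently a record was accessed. The 30-day cutoff and the function name are assumptions made for this example, not figures from the talk; real thresholds depend on access patterns and cost.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff: anything touched within this window counts as "hot".
HOT_WINDOW = timedelta(days=30)


def storage_tier(last_accessed: datetime, now: datetime) -> str:
    """Return "hot" for recently used data and "cold" for everything else."""
    return "hot" if now - last_accessed <= HOT_WINDOW else "cold"


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    print(storage_tier(datetime(2024, 1, 20, tzinfo=timezone.utc), now))
    print(storage_tier(datetime(2023, 6, 1, tzinfo=timezone.utc), now))
```

In practice the "hot" tier would map to a low-latency store and the "cold" tier to cheaper archival storage.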

Challenges of Managing
Data Platforms in a
Growing Enterprise

Phases of Business Modularity
- Business Silos
- Standardized Platform
- Optimized Core
- Business Modularity
Optimized Core enabled Business Modularity
This process needs to
be done in sequence.
Otherwise we end up
having to redo the
work.

Generic Data Platform Operations

How Distributed Data Helps Transformation
XDCR: Cross datacenter
replication is the ultimate
data fabric.
Resilience, performance,
availability, and scale.
Made widely available by
Cassandra and Couchbase,
expanded and accelerated
by ScyllaDB
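
To make XDCR a little more concrete: in Cassandra and ScyllaDB, replication across datacenters is usually declared per keyspace with `NetworkTopologyStrategy`. The minimal sketch below only builds the CQL statement as a string; the keyspace name, datacenter names, and replica counts are hypothetical, and running the statement would need a CQL driver that is not shown here.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CQL statement that replicates a keyspace across datacenters."""
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    # NetworkTopologyStrategy takes one replication factor per datacenter name.
    dc_options = ", ".join(
        f"'{dc}': {rf}" for dc, rf in sorted(replicas_per_dc.items())
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {dc_options}}};"
    )


if __name__ == "__main__":
    # Hypothetical datacenter names; use the names your cluster reports.
    print(create_keyspace_cql("orders", {"us_east": 3, "eu_west": 3}))
```

With a definition like this, each write is replicated to every listed datacenter, which is what gives the fabric its resilience and locality.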

Design
Contexts
Responsibilities
Approach
Framework
Tools

So Many Different “Modern Stacks?”
Lots of “reference” architectures
available. They tend not to think about the
speed layer since they are focusing on
batch. What about SPEED?

How Do You Choose From the Landscape?
Lots and lots of components in the Data &
AI Landscape. Which ones are the right
ones for your business?

Playbook for Modern Open Data Platform
Platform Design
Discovery (Inventory)
- People
- Process
- Information (Objects)
- Systems (Apps)
Evaluate Framework
User Experience
- No-Code/Low Code Apps/Form Builders
- Automatic API Generator/Platform
- Customer App/API Framework
Cloud
- Public
- Private
- Hybrid
Data
- Data:Object
- Data:Stream
- Data:Table
- Data:Index
- Processor:Batch
- Processor:Stream
DevOps
- Infrastructure as Code
- Systems Automation
- Application CI/CD
DataOps
- ETL/ELT/EtLT
- Reverse ETL
- Orchestration
Execute Approach
Architecture (Design)
- Cloud
- Data
- DevOps
- DataOps
Engineering
- Configuration
- Scripting
- Programming
Operation
- Setup / Deploy
- Monitoring/Alerts
- Administration

Design
Distributed
Realtime
Extendable / Open
Automated
Monitored / Managed

Public Cloud Native - Amazon
100% Serverless Data Platform Architecture on Amazon AWS

Public Cloud Native - Microsoft
100% Azure + Azure Databricks Data
Platform Architecture on Microsoft
Azure Cloud

Public Cloud Native - Google
100% GCP / Google
Data Platform
Architecture on GCP

Use Case: Optimizing
Distributed Data with
Cloud vs. Open Core

Open Core Distributed Data Platform
To create globally distributed, real-time platforms, you need to build on
distributed real-time technologies. Here are some.
Which ones should you choose?

Open Core Data Modernization / Automation / Integration
In addition to vastly scalable tools, there are also modern
innovations that can help teams automate and maximize
human capital by making data platform management easier.

Framework Components
■ Major Components
■ Persistent Queues (RAM/BUS)
■ Queue Processing & Compute (CPU)
■ Persistent Storage (DISK/RAM)
■ Reporting Engine (Display)
■ Orchestration Framework (Motherboard)
■ Scheduler (Operating System)
■ Strategies
■ Cloud Native on Google
■ Self-Managed Open Source
■ Self-Managed Commercial Source
■ Managed Commercial Source
Customers want options, so we decided to create a
Framework that can scale with whatever Infrastructure
and Software strategy they want to use.
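
One way to picture the framework is as a set of roles that each strategy must fill with a concrete tool. The sketch below is only an illustration under assumptions: the role names follow the component list above, but the tool-to-role mapping (for example Airflow as the scheduler) is a hypothetical choice for this example, not the mapping used in the talk.

```python
from enum import Enum


class Role(Enum):
    QUEUE = "Persistent Queues"
    COMPUTE = "Queue Processing & Compute"
    STORAGE = "Persistent Storage"
    REPORTING = "Reporting Engine"
    ORCHESTRATION = "Orchestration Framework"
    SCHEDULER = "Scheduler"


def missing_roles(stack: dict[Role, str]) -> list[Role]:
    """List the framework roles that a proposed stack has not filled yet."""
    return [role for role in Role if not stack.get(role)]


# Hypothetical "Self-Managed Open Source" stack; the tool choices are assumptions.
open_source_stack = {
    Role.QUEUE: "Kafka",
    Role.COMPUTE: "Spark",
    Role.STORAGE: "Cassandra",
    Role.REPORTING: "Superset",
    Role.SCHEDULER: "Airflow",
}

if __name__ == "__main__":
    for role in missing_roles(open_source_stack):
        print(f"Not yet chosen: {role.value}")
```

Swapping strategies then means swapping the values in the mapping while the roles stay fixed.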

Approach
Setup
Training
Administration
Configuration
Knowledge

Sample STACK Outline
Framework
Platform
Components
Resources
Platform
Setup
Training
Administration
Configuration
Knowledge
● Components
○ Infrastructure
■ Source / Git
■ GitHub
■ GitLab
■ Cloud / Public
■ AWS
■ Azure
■ GCP
■ DO
■ Orchestration
■ Terraform
■ Terraform / Atlantis
■ Configuration
■ Ansible
■ Ansible / AWX / Semaphore
○ Compute
■ DataStax / Spark
■ DataStax / Livy
■ Databricks
○ Data / Open Core
■ DataStax Enterprise
■ Cassandra
■ Search / Solr
■ Graph
■ Confluent Platform
○ Data / Cloud
■ DataStax / Astra
■ Confluent Cloud
○ Data / Open Source
■ Cassandra
■ Kafka
■ Elassandra
■ YugabyteDB
■ ScyllaDB
■ Pulsar
○ Application
■ Airflow
■ Airbyte
■ Kafka Streams
■ Jupyter
■ Redash
■ Metabase
■ Superset
■ Zeppelin

Use Case: Standard
Data Fabric

How ScyllaDB Allows Us To Go Further…
All the benefits of XDCR and…
- More data density at high speed /
multiple workloads on the same
datacenter
- Better memory / CPU management
due to the C++ Seastar framework,
faster caches
- CQL queries to support non-relational,
Cassandra (C*) CQL-style queries
- DynamoDB-compatible queries to support
legacy DynamoDB workloads
- Transactions/Consistency
- …

Let’s Get Data Into ScyllaDB - Easier
Today
Open Source:
- Airbyte / RudderStack make
ETL easier and are open source
- Kafka Connect / Pulsar IO can
convert ETL into Streaming ETL
SaaS/PaaS:
- SaaS like Stitch/HevoData/Make
- Supported versions of Airbyte/RudderStack
Once It’s There, Serve it, Do More
Processing
Open Source:
- Flink / Spark / Kafka Streams
can be used to save Analytics /
ML processed data.
- Alternator (ScyllaDB's DynamoDB-compatible API)
can help serve data as DynamoDB via REST.
- Several GraphQL Solutions
Available
Let’s Send It Back via Reverse ETL!
Open Source:
- Grouparoo / Airbyte /
RudderStack are free. Others
are paid.
- You can always use Kafka
Connect / Pulsar IO to send
data back also.
Reverse ETL is the process of copying data from a warehouse into business applications like CRM,
analytics, and marketing automation software. You perform this process by using a reverse ETL tool
that integrates with your data source and your business SaaS tools.
- Segment Blog
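
The definition quoted above can be sketched in a few lines: read rows from the warehouse, rename the columns to the fields the business tool expects, and send each record on. This is a minimal illustration, not how any of the tools named on this slide work internally; the column names, the target field names, and the in-memory list standing in for a CRM client are all hypothetical.

```python
from typing import Callable, Iterable, Mapping


def reverse_etl(
    rows: Iterable[Mapping[str, object]],
    field_map: Mapping[str, str],
    send: Callable[[dict], None],
) -> int:
    """Copy warehouse rows into a business tool, renaming fields on the way."""
    sent = 0
    for row in rows:
        # Keep only mapped columns and rename them to the target tool's fields.
        payload = {
            target: row[source]
            for source, target in field_map.items()
            if source in row
        }
        if payload:
            send(payload)
            sent += 1
    return sent


if __name__ == "__main__":
    warehouse_rows = [
        {"email": "a@example.com", "lifetime_value": 1200},
        {"email": "b@example.com", "lifetime_value": 87},
    ]
    crm_inbox: list[dict] = []  # stand-in for a real CRM API client
    count = reverse_etl(
        warehouse_rows,
        {"email": "Email", "lifetime_value": "LifetimeValue"},
        crm_inbox.append,
    )
    print(count, crm_inbox)
```

A dedicated reverse ETL tool adds scheduling, retries, and incremental sync on top of this basic loop.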

Let’s Put It All Together Now - ONE DATA
FABRIC
Still need design, but
hopefully less
useless plumbing
code.
One cluster, many workloads.
With a purely relational
database, this would be
problematic. With ScyllaDB, this
is a core feature.

Key Takeaways for Open Data Platforms
Don’t reinvent the wheel.
Identify the Objectives
Prioritize DevOps / DataOps
Use open tools that are well
supported
Document the STACK
- Identify the objectives so that you
know what success looks like.
- DevOps / DataOps combined with a
true agile approach allows you to
iterate your platform quickly.
- Put the data into ScyllaDB, and
possibly archive it into
Parquet/Iceberg (historical data)
- Get the data out to your Systems using
“Reverse ETL” tools.

Thank You and Dream Big
Check us out
- Design Workshops
- Innovation Sprints
- Service Catalog
- Big Data DLM Toolkit
Anant.us
- Read our Playbook
- Join our Mailing List
- Read up on Data Platforms
- Watch our Videos
- Download Examples
Weekly Webinars
- Data Engineer’s Lunch
- Cassandra Lunch

Thank You
Stay in Touch
Rahul Singh
rahul.singh@anant.us
@xingh
xingh
https://www.linkedin.com/in/xingh
