Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platform (CDP) with ScyllaDB
Shubham Patil, Lead Software Engineer
Safal Pandita, Senior Software Engineer
Presenters
Shubham Patil, Lead Software Engineer
■ Leads the platform engineering team at Zeotap for CDP product suite
■ Responsible for its architecture, design and engineering delivery
■ 6 years of experience building scalable distributed systems
Safal Pandita, Senior Software Engineer
■ Leads the Scylla integrations at Zeotap for CDP product suite
■ 4 years of experience in building scalable distributed systems
About Zeotap
Zeotap is a privacy-focused 360° Customer Data Platform (CDP) made for privacy-sensitive marketers
■ Enables brands to better understand their customers - 360° view
■ Built on GCP
■ Native 3P data enrichment from over 130 premium sources
PRIVACY AND SECURITY IS IN OUR DNA
2018-2021: Customer Data Platform
2014-2021: Stitching data from 120 companies for 500m customers under strict EU privacy law, for better targeting for brands
https://www.youtube.com/watch?v=XS790sG1Y7I
Vertical: CDP/CIP
What is a Customer Data Platform (CDP) ?
Consented and actionable trusted golden records of 1P customer profiles to support marketing goals
■ Data Unification: build your single customer view
■ Consent Unification: unify consent across user IDs and channels
■ IDs unified: Client ID, MAID, Email, Phone, Web Cookies, Other IDs
■ Consent unified: Marketing Preferences, Consent Purposes
■ A GOLDEN RECORD - your own private identity graph: Universal ID, Contract History, Demographics, Loyalty Status
CDP: Unification of all silos
Zeotap’s CDP Tech Requirements
Batch (Data Onboarding)
■ Ingestion of e.g. CRM/database dumps.
■ Batch activation of user audience in DMPs
■ Bulk data exports to client databases/sinks
Realtime (Event Orchestration)
■ Ingestion of user data from website interactions in real time.
■ Real time activation of user audience.
Privacy/Compliance (Consent Mastering)
■ User opt-out, consent management and mastering etc.
CDP Tech Matrix
Requirements v1 v2 v3
Multi-regional, Multi-Tenant, Privacy and GDPR compliant deployment
Sub-second/Realtime writes (with BQ streaming inserts)
Sub-second/Realtime reads/deletes (for ‘On The Fly’ User Unification)
Point Lookups
Works for data at every scale (few MegaBs to PetaBs)
Mature and transparent monitoring stack
Supports Spark integration to export data dumps to data lakes
Complete control on sizing of cluster/processing
Supports Encryption: At rest, value level, rotation (RawPII)
Complete control on underlying data model and scans
Simple SQL-like query capabilities
Enterprise Support
Before Scylla : CDP v1.0
CDP Tech Matrix Review
Requirements v1 v2 v3
Multi-regional, Multi-Tenant, Privacy and GDPR compliant deployment ✅
Sub-second/Realtime writes (with BQ streaming inserts) ✅
Sub-second/Realtime reads/deletes (for ‘On The Fly’ User Unification) ❌
Point Lookups ❌
Works for data at every scale (few MegaBs to PetaBs) ✅
Mature and transparent monitoring stack ❌
Supports Spark integration to export data dumps to data lakes ✅
Complete control on sizing of cluster/processing ❌
Supports Encryption: At rest, value level, rotation (RawPII) ✅
Complete control on underlying data model and scans ✅
Simple SQL-like query capabilities ✅
Enterprise Support ✅
Before Scylla : CDP v2.0
CDP Tech Matrix Review
Requirements v1 v2 v3
Multi-regional, Multi-Tenant, Privacy and GDPR compliant deployment ✅ ✅
Sub-second/Realtime writes (with BQ streaming inserts) ✅ ✅
Sub-second/Realtime reads/deletes (for ‘On The Fly’ User Unification) ❌ ✅
Point Lookups ❌ ❌
Works for data at every scale (few MegaBs to PetaBs) ✅ ❌
Mature and transparent monitoring stack ❌ ❌
Supports Spark integration to export data dumps to data lakes ✅ ✅
Complete control on sizing of cluster/processing ❌ ✅
Supports Encryption: At rest, value level, rotation (RawPII) ✅ ✅
Complete control on underlying data model and scans ✅ ❌
Simple SQL-like query capabilities ✅ ❌
Enterprise Support ✅ ❌
With Scylla : CDP v3.0
CDP Tech Matrix Review
Requirements | v1 | v2 | v3
Multi-regional, Multi-Tenant, Privacy and GDPR compliant deployment | ✅ | ✅ | ✅
Sub-second/Realtime writes (with BQ streaming inserts) | ✅ | ✅ | ✅
Sub-second/Realtime reads/deletes (for ‘On The Fly’ User Unification) (600 ms in JG vs 30 ms in Scylla) | ❌ | ✅ | ✅
Point Lookups | ❌ | ❌ | ✅
Works for data at every scale (few MegaBs to PetaBs) | ✅ | ❌ | ✅
Mature and transparent monitoring stack | ❌ | ❌ | ✅
Supports Spark integration to export data dumps to data lakes | ✅ | ✅ | ✅
Complete control on sizing of cluster/processing | ❌ | ✅ | ✅
Supports Encryption: At rest, value level, rotation (RawPII) | ✅ | ✅ | ✅
Complete control on underlying data model and scans | ✅ | ❌ | ✅
Simple SQL-like query capabilities | ✅ | ❌ | ✅
Enterprise Support | ✅ | ❌ | ✅
The Data Model
Requirements from a User Store
■ On-The-Fly User Unification (ID Resolution)
■ Fast lookup store with low latencies for both read and write
■ Flexible enough to be used as a Profile/Consent/ID store
■ Needed to be used as a linkage store
■ We needed TTL in a few different ways
• Profiles/Consents (Attributes in a Map)
• ID (Elements of a collection)
• ID Store (row level)
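The three TTL granularities listed above map onto standard CQL `USING TTL` clauses. The sketch below only builds the statement strings; the table names (`user_store`, `id_store`) and column names (`profile`, `linked_ids`, `ucid`) are assumptions for illustration, since the slides do not show the actual schema.

```python
# Hypothetical tables and columns; only the USING TTL syntax is standard CQL.
DAY = 24 * 60 * 60  # seconds


def ttl_statements(ttl_seconds):
    """Return one parameterised CQL statement per TTL granularity."""
    return {
        # TTL on a single attribute stored as a map entry (profiles/consents)
        "map_attribute": (
            f"UPDATE user_store USING TTL {ttl_seconds} "
            "SET profile['loyalty_status'] = ? WHERE ucid = ?"
        ),
        # TTL on the elements added to a collection (IDs)
        "collection_element": (
            f"UPDATE user_store USING TTL {ttl_seconds} "
            "SET linked_ids = linked_ids + ? WHERE ucid = ?"
        ),
        # TTL on a whole row (ID store)
        "row": (
            "INSERT INTO id_store (id_type, id_value, ucid) "
            f"VALUES (?, ?, ?) USING TTL {ttl_seconds}"
        ),
    }


statements = ttl_statements(30 * DAY)
```

With `USING TTL` on an `UPDATE` that touches a collection, the TTL applies only to the elements written by that statement, which is what makes per-attribute and per-element expiry possible.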
Read/Write Query Patterns
Pattern 1: Read IdStore by Id Type and Value
Pattern 2: Find User Profiles by UCID
Pattern 3: Stamp UCID in Id Store
Pattern 4: Insert profiles/consents/preferences in User Store
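How the four query patterns could combine into on-the-fly ID resolution is sketched below with plain dictionaries standing in for the ID store and user store. This is a toy model under stated assumptions, not the production schema: the class and function names are hypothetical, and merging two previously separate UCIDs is deliberately left out.

```python
import uuid


class InMemoryStores:
    """Toy stand-in for the ID store and user store (hypothetical layout)."""

    def __init__(self):
        self.id_store = {}    # (id_type, id_value) -> ucid
        self.user_store = {}  # ucid -> {"profile": {...}, "consent": {...}}

    def read_id(self, id_type, id_value):
        """Pattern 1: read the ID store by ID type and value."""
        return self.id_store.get((id_type, id_value))

    def find_profile(self, ucid):
        """Pattern 2: find the user profile by UCID."""
        return self.user_store.get(ucid)

    def stamp_ucid(self, id_type, id_value, ucid):
        """Pattern 3: stamp the UCID in the ID store."""
        self.id_store[(id_type, id_value)] = ucid

    def upsert_user(self, ucid, profile=None, consent=None):
        """Pattern 4: insert profiles/consents in the user store."""
        record = self.user_store.setdefault(ucid, {"profile": {}, "consent": {}})
        record["profile"].update(profile or {})
        record["consent"].update(consent or {})
        return record


def resolve_and_store(stores, ids, profile):
    """Resolve incoming IDs to one UCID, then store the profile under it.

    Simplification: the first known UCID wins; merging distinct UCIDs
    is not handled in this sketch.
    """
    known = [stores.read_id(id_type, id_value) for id_type, id_value in ids]
    ucid = next((u for u in known if u is not None), None) or str(uuid.uuid4())
    for id_type, id_value in ids:
        stores.stamp_ucid(id_type, id_value, ucid)
    stores.upsert_user(ucid, profile=profile)
    return ucid
```

An event carrying an already known email plus a new MAID resolves to the existing UCID, and the MAID is linked to it as a side effect.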
Data Model v1.0
■ Isolation for each client achieved through keyspaces
■ Separate clusters for each region (EU, US, IN, UK)
■ Each keyspace could have a different schema
■ RF = 3, ICS Compaction, CL=QUORUM for Read/Write
■ Single table - acts as our profile, consent and linkage store
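The keyspace-per-client isolation and the RF = 3 / CL=QUORUM settings of this data model can be illustrated with a short sketch. The keyspace naming scheme (`client_<name>`) and the datacenter name are assumptions; the quorum formula is the standard one.

```python
def quorum(replication_factor):
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def create_keyspace_cql(client, datacenter, rf=3):
    """Build a CREATE KEYSPACE statement for one client (tenant).

    The 'client_' prefix and the datacenter name are hypothetical.
    """
    keyspace = f"client_{client}"
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{datacenter}': {rf}}}"
    )


rf = 3
# QUORUM reads plus QUORUM writes touch 2 + 2 = 4 > 3 replicas,
# so every read overlaps with the latest acknowledged write.
reads_see_latest_write = quorum(rf) + quorum(rf) > rf
```

With RF = 3, QUORUM needs two replicas, so reads and writes keep working with one replica down while still overlapping on at least one up-to-date replica.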
Problems faced
■ Batch sizes became a bottleneck since our
transactions needed to be atomic across partitions.
We crossed the recommended limit of ~100K per
batch.
■ Collection sizes started increasing beyond the
recommended size of ~1MB
■ Latencies worsened due to large batches and
collection sizes. In some cases, queries started timing
out
Applied Solutions
■ Split queries into multiple batches, with multiple retries for each batch.
■ Use prepared statements to improve performance
■ Use TTL to keep total volume in check
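Splitting one oversized batch into smaller ones with per-batch retries might look like the following sketch. `send_batch` is a hypothetical callable standing in for whatever driver call executes a batch; the batch size and retry counts are placeholder values, not the ones used in production.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def execute_in_batches(statements, send_batch, batch_size=50,
                       max_retries=3, backoff_s=0.0):
    """Send statements in small batches, retrying each batch on failure.

    `send_batch` is a hypothetical callable that raises on failure.
    Returns the number of batches that were sent successfully.
    """
    sent = 0
    for batch in chunked(statements, batch_size):
        for attempt in range(1, max_retries + 1):
            try:
                send_batch(batch)
                sent += 1
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up on this batch after the last retry
                time.sleep(backoff_s * attempt)  # linear backoff
    return sent
```

One trade-off worth noting: each small batch can still be atomic, but the set as a whole no longer is, so the writes should be idempotent for the retries to be safe.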
Bottlenecks - Hot Rows
■ Storing linkages in a collection became a
bottleneck due to our increasing scale
■ Going beyond the recommended ~1MB per collection degrades latency and breaks latency SLAs
■ Collections go through a serialization/deserialization step in Scylla, which makes them slower than other data types
Data Model v2.0
Updated Data Model
■ Queries that were earlier timing out (>10s) due to a high number of linkages started succeeding within our SLAs (~30ms)
■ Separate linkages store - TTLs are easier to maintain on rows, which was complicated earlier on individual elements of a collection
■ No arbitrary limit (~1MB) on the number of linkages, which allowed us to scale more effectively
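Moving linkages out of a collection and into their own rows could be modelled roughly as below. The slides do not show the actual v2.0 schema, so the table name `linkages`, its columns, and the choice of `ucid` as partition key are all assumptions made for illustration.

```python
# Hypothetical linkage table: one row per linkage instead of one collection element.
LINKAGE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS linkages (
    ucid text,
    id_type text,
    id_value text,
    PRIMARY KEY ((ucid), id_type, id_value)
)
""".strip()

INSERT_LINKAGE_CQL = (
    "INSERT INTO linkages (ucid, id_type, id_value) VALUES (?, ?, ?) "
    "USING TTL ?"
)


def linkage_writes(ucid, ids, ttl_seconds):
    """Return one (statement, parameters) pair per linkage row.

    Each linkage carries its own row-level TTL, so expiry no longer
    depends on per-element TTLs inside a collection.
    """
    return [
        (INSERT_LINKAGE_CQL, (ucid, id_type, id_value, ttl_seconds))
        for id_type, id_value in ids
    ]
```

Because each linkage is a separate row, the number of linkages per UCID is no longer bounded by a collection size limit, and a read can page through the rows instead of deserializing a whole collection.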
Production Gotchas - PK Migration
■ Problem: No easy way to migrate your primary key once the data is live in the tables.
■ Solution: Use Scylla Migrator to move the data to an intermediate/temporary table with the required schema.
• Since we wanted to reuse the names of our original tables (you can’t rename a table), we had to copy the SSTables from our migrated schema.
• Lesson: Choose your PK wisely
Production Gotchas - Schema Corruption
■ Problem: Schemas can get corrupted while copying SSTables. Schema settlement under load can sometimes take more than a minute and can cause the cluster to crash.
■ Solutions
• Always check that your schema is correctly replicated on all nodes before attempting SSTable copying.
• To check, ask the Scylla team, SSH into the nodes manually, or write your own service around cqlsh
• The Scylla team resolved our issue by restoring our snapshots and redoing the migration for the affected schemas.
• Lesson: ALWAYS BACK UP YOUR DATA
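A pre-flight check that the schema has settled on every node before SSTables are copied might look like this sketch. The function only compares versions it is given; how they are collected is left open. One option, stated here as an assumption about the setup, is to read the `schema_version` column of `system.local` and `system.peers` on each node, or to parse `nodetool describecluster`.

```python
from collections import defaultdict


def schema_disagreements(schema_versions):
    """Group nodes by schema version.

    `schema_versions` maps node address -> schema version string.
    Returns {} when all nodes agree, otherwise {version: [nodes...]}.
    """
    by_version = defaultdict(list)
    for node, version in schema_versions.items():
        by_version[version].append(node)
    if len(by_version) <= 1:
        return {}
    return {version: sorted(nodes) for version, nodes in by_version.items()}


def safe_to_copy_sstables(schema_versions):
    """Only proceed when at least one node reported and all nodes agree."""
    return bool(schema_versions) and not schema_disagreements(schema_versions)
```

Running such a check in a loop with a timeout, and refusing to copy SSTables until it passes, is one way to wrap the manual verification step in a small service.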
Production Setup
■ 4 clusters (EU, UK, IN, US) - 6 n2-highmem-64 nodes - Scylla v2021.1.5
■ 130+ client/keyspaces being managed
■ Max 60K QPS
■ 30 ms avg. read
■ 10 ms avg. write
■ 5.4 TBs - data ingested
■ 50 GBs - max keyspace
Future Plans
■ Microservices around handling Schema corruption and updates
■ Explore Lightweight Transactions (LWTs) for consistency guarantees
■ Explore encrypted data rotation w/o blocking real time writes
Thank you!
Stay in touch
Shubham Patil & Safal Pandita
/itsshubhpatil, /safalpandita
patil.sm17@gmail.com
safalpandita@gmail.com

 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platform (CDP) with ScyllaDB

  • 1. Building Zeotap's Privacy Compliant Customer Data Platform (CDP) with ScyllaDB Shubham Patil, Lead Software Engineer Safal Pandita, Senior Software Engineer
  • 2. Presenters Shubham Patil, Lead Software Engineer ■ Leads the platform engineering team at Zeotap for CDP product suite ■ Responsible for its architecture, design and engineering delivery ■ 6 years of experience building scalable distributed systems Safal Pandita, Senior Software Engineer ■ Leads the Scylla integrations at Zeotap for CDP product suite ■ 4 years of experience in building scalable distributed systems
  • 3. About Zeotap Zeotap is a privacy-focused 360º Customer Data Platform (CDP) made for privacy-sensitive marketers ■ Enables brands to better understand their customers - 360º view ■ Built on GCP ■ Native 3P data enrichment from over 130 premium sources PRIVACY AND SECURITY IS IN OUR DNA 2018-2021: Customer Data Platform 2014-2021: Stitching Data from 120 companies for 500m customers under Strict EU Privacy Law For Better Targeting for Brands https://www.youtube.com/watch?v=XS790sG1Y7I
  • 5. What is a Customer Data Platform (CDP) ? CONSENTED AND ACTIONABLE TRUSTED GOLDEN RECORDS OF 1P CUSTOMER PROFILES TO SUPPORT MARKETING GOALS Data Unification Build your single customer view Consent Unification Unify consent across user Ids and channels Client ID MAID Email Phone Web Cookies Other IDs Marketing Preferences Consent Purposes A GOLDEN RECORD Your own private identity graph Universal ID Contract History Demographics Loyalty Status CDP: Unification of all silos
  • 7. Zeotap’s CDP Tech Requirements Batch (Data Onboarding) Realtime (Event Orchestration) Privacy/Compliance (Consent Mastering) ■ Ingestion of user data from website interactions in real time. ■ Real time activation of user audience. ■ User opt-out, consent management and mastering etc. ■ Ingestion of e.g. CRM/database dumps. ■ Batch activation of user audience in DMPs ■ Bulk data exports to client databases/sinks
  • 8. CDP Tech Matrix: requirements evaluated for v1, v2 and v3
      ■ Multi-regional, Multi-Tenant, Privacy and GDPR compliant deployment
      ■ Sub-second/Realtime writes (with BQ streaming inserts)
      ■ Sub-second/Realtime reads/deletes (for ‘On The Fly’ User Unification)
      ■ Point Lookups
      ■ Works for data at every scale (a few MB to PB)
      ■ Mature and transparent monitoring stack
      ■ Supports Spark integration to export data dumps to data lakes
      ■ Complete control on sizing of cluster/processing
      ■ Supports Encryption: At rest, value level, rotation (RawPII)
      ■ Complete control on underlying data model and scans
      ■ Simple SQL-like query capabilities
      ■ Enterprise Support
  • 9. Before Scylla : CDP v1.0
  • 10. CDP Tech Matrix Review
      Requirement | v1
      Multi-regional, Multi-Tenant, Privacy and GDPR compliant deployment | ✅
      Sub-second/Realtime writes (with BQ streaming inserts) | ✅
      Sub-second/Realtime reads/deletes (for ‘On The Fly’ User Unification) | ❌
      Point Lookups | ❌
      Works for data at every scale (a few MB to PB) | ✅
      Mature and transparent monitoring stack | ❌
      Supports Spark integration to export data dumps to data lakes | ✅
      Complete control on sizing of cluster/processing | ❌
      Supports Encryption: At rest, value level, rotation (RawPII) | ✅
      Complete control on underlying data model and scans | ✅
      Simple SQL-like query capabilities | ✅
      Enterprise Support | ✅
  • 11. Before Scylla : CDP v2.0
  • 12. CDP Tech Matrix Review
      Requirement | v1 | v2
      Multi-regional, Multi-Tenant, Privacy and GDPR compliant deployment | ✅ | ✅
      Sub-second/Realtime writes (with BQ streaming inserts) | ✅ | ✅
      Sub-second/Realtime reads/deletes (for ‘On The Fly’ User Unification) | ❌ | ✅
      Point Lookups | ❌ | ❌
      Works for data at every scale (a few MB to PB) | ✅ | ❌
      Mature and transparent monitoring stack | ❌ | ❌
      Supports Spark integration to export data dumps to data lakes | ✅ | ✅
      Complete control on sizing of cluster/processing | ❌ | ✅
      Supports Encryption: At rest, value level, rotation (RawPII) | ✅ | ✅
      Complete control on underlying data model and scans | ✅ | ❌
      Simple SQL-like query capabilities | ✅ | ❌
      Enterprise Support | ✅ | ❌
  • 13. With Scylla : CDP v3.0
  • 14. CDP Tech Matrix Review
      Requirement | v1 | v2 | v3
      Multi-regional, Multi-Tenant, Privacy and GDPR compliant deployment | ✅ | ✅ | ✅
      Sub-second/Realtime writes (with BQ streaming inserts) | ✅ | ✅ | ✅
      Sub-second/Realtime reads/deletes (for ‘On The Fly’ User Unification) (600 ms in JG vs 30 ms in Scylla) | ❌ | ✅ | ✅
      Point Lookups | ❌ | ❌ | ✅
      Works for data at every scale (a few MB to PB) | ✅ | ❌ | ✅
      Mature and transparent monitoring stack | ❌ | ❌ | ✅
      Supports Spark integration to export data dumps to data lakes | ✅ | ✅ | ✅
      Complete control on sizing of cluster/processing | ❌ | ✅ | ✅
      Supports Encryption: At rest, value level, rotation (RawPII) | ✅ | ✅ | ✅
      Complete control on underlying data model and scans | ✅ | ❌ | ✅
      Simple SQL-like query capabilities | ✅ | ❌ | ✅
      Enterprise Support | ✅ | ❌ | ✅
  • 16. Requirements from a User Store ■ On-The-Fly User Unification (ID Resolution) ■ Fast lookup store with low latencies for both read and write ■ Flexible enough to be used as a Profile/Consent/ID store ■ Needed to be used as a linkage store ■ We needed TTL in a few different ways • Profiles/Consents (Attributes in a Map) • ID (Elements of a collection) • ID Store (row level)
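The three TTL granularities on slide 16 map onto standard CQL `USING TTL` forms. The sketch below only builds the statement strings; the table name `user_store` and the columns `attributes`, `linked_ids`, `id_type`, `id_value` and `ucid` are hypothetical and are not taken from the talk.

```python
# Hypothetical table and column names; only the USING TTL syntax is standard CQL.
TABLE = "user_store"


def _check_ttl(ttl_seconds: int) -> int:
    """Reject non-positive TTLs so a bad value never reaches the statement."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return ttl_seconds


def ttl_map_entry(ttl_seconds: int) -> str:
    """TTL on one map entry, e.g. a single profile or consent attribute."""
    return (
        f"UPDATE {TABLE} USING TTL {_check_ttl(ttl_seconds)} "
        "SET attributes[?] = ? WHERE id_type = ? AND id_value = ?"
    )


def ttl_set_element(ttl_seconds: int) -> str:
    """TTL on the elements being added to a collection, e.g. linked IDs."""
    return (
        f"UPDATE {TABLE} USING TTL {_check_ttl(ttl_seconds)} "
        "SET linked_ids = linked_ids + ? WHERE id_type = ? AND id_value = ?"
    )


def ttl_row(ttl_seconds: int) -> str:
    """TTL on everything written by this INSERT, i.e. row-level expiry."""
    return (
        f"INSERT INTO {TABLE} (id_type, id_value, ucid) "
        f"VALUES (?, ?, ?) USING TTL {_check_ttl(ttl_seconds)}"
    )
```

Each string keeps `?` bind markers so it could be prepared once and reused with different values.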
  • 17. Read/Write Query Patterns Pattern 1: Read IdStore by Id Type and Value Pattern 2: Find User Profiles by UCID Pattern 3: Stamp UCID in Id Store Pattern 4: Insert profiles/consents/preferences in User Store
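To make the four patterns on slide 17 concrete, here is a minimal in-memory sketch of how they could combine during ingestion. It is an illustration under assumptions, not Zeotap's implementation: the class, its method names and the "reuse the first known UCID" rule are all hypothetical, and merging two already-distinct UCIDs is deliberately left out.

```python
import uuid
from typing import Dict, List, Optional, Tuple

IdKey = Tuple[str, str]  # (id_type, id_value), e.g. ("email", "a@example.com")


class UserStoreSketch:
    """In-memory stand-in for the ID store and the user (profile) store."""

    def __init__(self) -> None:
        self.id_store: Dict[IdKey, str] = {}   # ID -> UCID
        self.profiles: Dict[str, dict] = {}    # UCID -> attributes

    def read_id(self, id_type: str, id_value: str) -> Optional[str]:
        """Pattern 1: read the ID store by ID type and value."""
        return self.id_store.get((id_type, id_value))

    def find_profile(self, ucid: str) -> dict:
        """Pattern 2: find the user profile by UCID."""
        return self.profiles.get(ucid, {})

    def stamp_ucid(self, id_type: str, id_value: str, ucid: str) -> None:
        """Pattern 3: stamp the UCID onto an ID in the ID store."""
        self.id_store[(id_type, id_value)] = ucid

    def upsert_profile(self, ucid: str, attributes: dict) -> None:
        """Pattern 4: insert profile/consent/preference attributes."""
        self.profiles.setdefault(ucid, {}).update(attributes)

    def ingest(self, ids: List[IdKey], attributes: dict) -> str:
        """Resolve the incoming IDs to one UCID, then write the attributes."""
        ucid = None
        for id_type, id_value in ids:
            ucid = self.read_id(id_type, id_value)
            if ucid is not None:
                break
        if ucid is None:
            ucid = str(uuid.uuid4())  # no known ID: mint a new UCID
        for id_type, id_value in ids:
            self.stamp_ucid(id_type, id_value, ucid)
        self.upsert_profile(ucid, attributes)
        return ucid
```

Ingesting an event that carries one known ID and one new ID links the new ID to the existing UCID, which is the on-the-fly unification the slides describe.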
  • 18. Data Model v1.0 ■ Isolation for each client achieved through keyspaces ■ Separate clusters for each region (EU, US, IN, UK) ■ Each keyspace could have a different schema ■ RF = 3, ICS Compaction, CL=QUORUM for Read/Write ■ Single table - acts as our profile, consent and linkage store
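The RF = 3 with CL=QUORUM setting on slide 18 can be checked with a little arithmetic: a quorum is a majority of replicas, and reads see the latest acknowledged write whenever the read and write replica sets must overlap. The helper names below are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer for a QUORUM read or write."""
    return replication_factor // 2 + 1


def read_write_overlap(replication_factor: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read set shares at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)
```

With RF = 3, `quorum(3)` is 2, so QUORUM reads plus QUORUM writes overlap (2 + 2 > 3) and one replica per partition can be unavailable without failing requests.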
  • 19. Problems faced ■ Batch sizes became a bottleneck since our transactions needed to be atomic across partitions. We crossed the recommended limit of ~100K per batch. ■ Collection sizes started increasing beyond the recommended size of ~1MB ■ Latencies worsened due to large batches and collection sizes. In some cases, queries started timing out
  • 20. Applied Solutions ■ Split queries into multiple batches, with multiple retries per batch. ■ Use prepared statements to improve performance ■ Use TTL to keep total data volume in check
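The first solution on slide 20, splitting one oversized batch into smaller ones and retrying each, could look roughly like the sketch below. `execute_batch` is a hypothetical callable standing in for whatever sends a batch to the database, and the default sizes are placeholders, not values from the talk. Note that splitting gives up atomicity across the sub-batches, so this assumes the writes are safe to retry.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def execute_in_batches(
    statements: Sequence[T],
    execute_batch: Callable[[List[T]], None],  # hypothetical sender
    batch_size: int = 50,
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> int:
    """Send statements in small batches; retry each batch up to max_attempts."""
    sent = 0
    for batch in chunked(statements, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                execute_batch(batch)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up on this batch and surface the error
                time.sleep(backoff_seconds * attempt)  # linear backoff
        sent += 1
    return sent
```

The function returns the number of batches that were sent successfully, which is handy for logging how far a large write got before a failure.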
  • 21. Bottlenecks - Hot Rows ■ Storing linkages in a collection became a bottleneck due to our increasing scale ■ Going beyond the recommended ~1MB per collection pushes latencies past our SLAs ■ Collections go through a serialization/deserialization step in Scylla, which makes them slower compared to other data types
  • 23. Updated Data Model ■ Queries that were timing out earlier (>10s) due to a high number of linkages started succeeding within our SLAs (~30ms) ■ Separate linkages store - TTLs are easier to maintain on rows, which was complicated earlier on individual elements of a collection ■ Linkages are no longer capped by the ~1MB collection size recommendation, which allowed us to scale more effectively
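A row-per-linkage layout of the kind slide 23 describes might be declared as follows. The talk does not show the actual schema, so the table name `linkages`, its columns and the choice of `ucid` as partition key are assumptions made purely for illustration.

```python
# Hypothetical schema: one row per linkage instead of one collection per user.
LINKAGES_DDL = """
CREATE TABLE IF NOT EXISTS linkages (
    ucid text,
    id_type text,
    id_value text,
    PRIMARY KEY ((ucid), id_type, id_value)
)
""".strip()


def add_linkage_cql(ttl_seconds: int) -> str:
    """Insert one linkage row that expires on its own after ttl_seconds."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return (
        "INSERT INTO linkages (ucid, id_type, id_value) "
        f"VALUES (?, ?, ?) USING TTL {ttl_seconds}"
    )


def linkages_for_ucid_cql() -> str:
    """Read all linkages of one user with a single-partition query."""
    return "SELECT id_type, id_value FROM linkages WHERE ucid = ?"
```

Because each linkage is its own row, expiry is a plain row-level TTL and adding a linkage never rewrites or deserializes a large collection.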
  • 24. Production Gotchas - PK Migration ■ Problem : No easy way to migrate your primary key once the data is live in the tables. ■ Solution : Use Scylla Migrator to move the data to intermediate/temporary table with the required schema. • Since we wanted to reuse the names of our original tables (You can’t rename a table), we had to copy the SSTables from our migrated schema. • Lesson : Choose your PK wisely
  • 25. Production Gotchas - Schema Corruption ■ Problem : Schemas can get corrupted while copying SSTables. Schema settlement under load can sometimes take more than a minute and can cause the cluster to crash. ■ Solutions • Always check that your schema is correctly replicated on all nodes before attempting SSTable copying. • To check, ask the Scylla team, SSH into the nodes manually, or write your own service around cqlsh • The Scylla team resolved our issue by restoring our snapshots and redoing the migration for the affected schemas. • Lesson : ALWAYS BACKUP YOUR DATA
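One way to automate the "check that your schema is replicated on all nodes" step is to compare the `schema_version` column of `system.local` with the `schema_version` values in `system.peers` and wait until they all match. The sketch below keeps the database access behind a hypothetical `fetch_versions` callable so it stays independent of any driver; the function names and timeouts are assumptions.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def schema_in_agreement(local_version: str, peer_versions: Iterable[Optional[str]]) -> bool:
    """True when every peer reports the same schema version as the local node."""
    versions = {local_version}
    for version in peer_versions:
        if version is None:
            return False  # a peer has not reported a version yet
        versions.add(version)
    return len(versions) == 1


def wait_for_schema_agreement(
    fetch_versions: Callable[[], Tuple[str, Iterable[Optional[str]]]],  # hypothetical
    timeout_seconds: float = 120.0,
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until all nodes agree on the schema or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        local, peers = fetch_versions()
        if schema_in_agreement(local, peers):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

In practice `fetch_versions` would run `SELECT schema_version FROM system.local` and `SELECT schema_version FROM system.peers`; a `False` result should block the SSTable copy rather than be ignored.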
  • 26. Production Setup ■ 4 clusters (EU, UK, IN, US) ■ 6 n2-highmem-64 nodes ■ Scylla v2021.1.5 ■ 130+ clients/keyspaces being managed ■ Max 60K QPS ■ 30 ms avg. read ■ 10 ms avg. write ■ 5.4 TB data ingested ■ 50 GB max keyspace
  • 27. Future Plans ■ Microservices around handling schema corruption and updates ■ Explore Lightweight Transactions (LWTs) for consistency guarantees ■ Explore encrypted data rotation without blocking real-time writes
  • 28. Thank you! Stay in touch Shubham Patil & Safal Pandita /itsshubhpatil, /safalpandita patil.sm17@gmail.com safalpandita@gmail.com