SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
August 8, 2012




Cassandra at eBay
    Time left: 29m 59s




                     Jay Patel
                     Architect, Platform Systems
                     @pateljay3001
eBay Marketplaces
 97 million active buyers and sellers
 200+ million items
 2 billion page views each day
 80 billion database calls each day
 5+ petabytes of site storage capacity
 80+ petabytes of analytics storage capacity

                                                2
How do we scale databases?
 Shard
   – Patterns: Modulus, lookup-based, range, etc.
   – Application sees only logical shard/database
 Replicate
   – Disaster recovery, read availability/scalability
 Big NOs
   – No transactions
   – No joins
   – No referential integrity constraints
                                                        3
We like Cassandra
 Multi-datacenter (active-active)    Write performance
 Availability - No SPOF              Distributed counters
 Scalability                         Hadoop support



We also utilize MongoDB & HBase




                                                              4
Are we replacing RDBMS with NoSQL?

          Not at all! But, complementing.
 Some use cases don’t fit well - sparse data, big data, schema
  optional, real-time analytics, …
 Many use cases don’t need top-tier set-ups - logging, tracking, …




                                                                  5
A glimpse on our Cassandra deployment
 Dozens of nodes across multiple clusters
 200 TB+ storage provisioned
 400M+ writes & 100M+ reads per day, and growing
 QA, LnP, and multiple Production clusters




                                                    6
Use Cases on Cassandra
      Social Signals on eBay product & item pages
      Hunch taste graph for eBay users & items
      Time series use cases (many):
     Mobile notification logging and tracking
     Tracking for fraud detection
     SOA request/response payload logging
     RedLaser server logs and analytics

                                                    7
Served by
Cassandra




            8
Manage signals via “Your Favorites”




                                      Whole page is
                                      served by
                                      Cassandra




                                                9
Why Cassandra for Social Signals?
 Need scalable counters
 Need real (or near) time analytics on collected social data
 Need good write performance
 Reads are not latency sensitive




                                                                10
Deployment
                 User request has no datacenter affinity


                           Non-sticky load balancing




Topology - NTS           Data is backed up periodically
RF - 2:2                 to protect against human or
Read CL - ONE            software error
Write CL – ONE

                                                       11
Data Model
             depends on query patterns




                                         12
Data Model (simplified)




                          13
Wait…



                    Duplicates!




        Oh, toggle button!
        Signal --> De-signal --> Signal…
                                       14
Yes, eventual consistency!
One scenario that produces duplicate signals in UserLike CF:
   1. Signal
   2. De-signal (1st operation is not propagated to all replica)
   3. Signal, again (1st operation is not propagated yet!)



 So, what’s the solution? Later…

                                                                   15
Social Signals, next phase: Real-time Analytics
 Most signaled or popular items per affinity groups (category, etc.)
 Aggregated item count per affinity group



                                                     Example affinity group




                                                                              16
Initial Data Model for real-time analytics

                                               Items in an affinitygroup
                                               is physically stored
                                               sorted by their signal
                                               count




                           Update counters for both individual item
                           and all the affinity groups that item
                           belongs to
Deployment, next phase




Topology - NTS
RF - 2:2:2
user1       bid
                                  item1
        buy

item2         watch               sell
                        user2




                                          19
Graph in Cassandra
Event consumers listen for site events (sell/bid/buy/watch) & populate graph in Cassandra




   30 million+ writes daily                Batch-oriented reads
   14 billion+ edges already                (for taste vector updates)
                                                                                    20
 Mobile notification logging and tracking
 Tracking for fraud detection
 SOA request/response payload logging
 RedLaser server logs and analytics




                                             21
A glimpse on Data Model
RedLaser tracking & monitoring console




                                         23
That’s all about the use cases..
Remember the duplicate problem in Use Case #1?




  Let’s see some options we considered to solve this…
                                                    24
Option 1 – Make ‘Like’ idempotent for UserLike
 Remove time (timeuuid) from the composite column name:
    Multiple signal operations are now Idempotent
    No need to read before de-signaling (deleting)




    X            Need timeuuid for ordering!
                 Already have a user with more than 1300 signals   25
Option 2 – Use strong consistency

 Local Quorum
  – Won’t help us. User requests are not geo-load balanced
    (no DC affinity).
 Quorum
  – Won’t survive during partition between DCs (or, one of the
    DC is down). Also, adds additional latency.

              X      Need to survive!
                                                             26
Option 3 – Adapt to eventual consistency
If desire survival!




                                                                              27
                      http://www.strangecosmos.com/content/item/101254.html
Adjustments to eventual consistency
 De-signal steps:
      – Don’t check whether item is already signaled by a user, or not
      – Read all (duplicate) signals from UserLike_unordered (new CF to avoid reading
        whole row from UserLike)
      – Delete those signals from UserLike_unordered and UserLike




Still, can get duplicate signals or false positives as there is a ‘read before delete’.
To shield further, do ‘repair on read’.                  Not a full story!
                                                                                     28
Lessons & Best Practices
• Choose proper Replication Factor and Consistency Level.
    – They alter latency, availability, durability, consistency and cost.
    – Cassandra supports tunable consistency, but remember strong consistency is not free.
• Consider all overheads in capacity planning.
    – Replicas, compaction, secondary indexes, etc.
• De-normalize and duplicate for read performance.
    – But don’t de-normalize if you don’t need to.
• Many ways to model data in Cassandra.
    – The best way depends on your use case and query patterns.
                More on http://ebaytechblog.com?p=1308
Thank You
  @pateljay3001
  #cassandra12
                  30

Weitere ähnliche Inhalte

Was ist angesagt?

Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisArnab Mitra
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016DataStax
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraRobert Stupp
 
DNS Security Presentation ISSA
DNS Security Presentation ISSADNS Security Presentation ISSA
DNS Security Presentation ISSASrikrupa Srivatsan
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDBMongoDB
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysisAmazon Web Services
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 

Was ist angesagt? (20)

Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
DNS Security Presentation ISSA
DNS Security Presentation ISSADNS Security Presentation ISSA
DNS Security Presentation ISSA
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 

Andere mochten auch

Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseSolr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseDataStax Academy
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query languageCourtney Robinson
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...DataStax Academy
 
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...DataStax
 

Andere mochten auch (6)

Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseSolr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query language
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
 
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
 

Ähnlich wie Cassandra at eBay - Cassandra Summit 2012

Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...SL Corporation
 
Cassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsCassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsPanagiotis Papadopoulos
 
Stream Processing with CompletableFuture and Flow in Java 9
Stream Processing with CompletableFuture and Flow in Java 9Stream Processing with CompletableFuture and Flow in Java 9
Stream Processing with CompletableFuture and Flow in Java 9Trayan Iliev
 
Real World Cassandra
Real World CassandraReal World Cassandra
Real World CassandraGiltTech
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applicationsDing Li
 
The 5 Stages of Scale
The 5 Stages of ScaleThe 5 Stages of Scale
The 5 Stages of Scalexcbsmith
 
How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastMapR Technologies
 
Dynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationDynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationShanley Kane
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...DataStax Academy
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists jlacefie
 
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudJaipaul Agonus
 
Apache Cassandra overview
Apache Cassandra overviewApache Cassandra overview
Apache Cassandra overviewElifTech
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Michael Hiskey
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Stavros Kontopoulos
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...netvis
 
Monitoring applications on cloud - Indicthreads cloud computing conference 2011
Monitoring applications on cloud - Indicthreads cloud computing conference 2011Monitoring applications on cloud - Indicthreads cloud computing conference 2011
Monitoring applications on cloud - Indicthreads cloud computing conference 2011IndicThreads
 
High-speed, Reactive Microservices 2017
High-speed, Reactive Microservices 2017High-speed, Reactive Microservices 2017
High-speed, Reactive Microservices 2017Rick Hightower
 

Ähnlich wie Cassandra at eBay - Cassandra Summit 2012 (20)

Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D...
 
Cassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and LimitationsCassandra Consistency: Tradeoffs and Limitations
Cassandra Consistency: Tradeoffs and Limitations
 
Stream Processing with CompletableFuture and Flow in Java 9
Stream Processing with CompletableFuture and Flow in Java 9Stream Processing with CompletableFuture and Flow in Java 9
Stream Processing with CompletableFuture and Flow in Java 9
 
Real World Cassandra
Real World CassandraReal World Cassandra
Real World Cassandra
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
The 5 Stages of Scale
The 5 Stages of ScaleThe 5 Stages of Scale
The 5 Stages of Scale
 
How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and Fast
 
Dynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationDynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 Presentation
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
 
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
 
Apache Cassandra overview
Apache Cassandra overviewApache Cassandra overview
Apache Cassandra overview
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
 
Monitoring applications on cloud - Indicthreads cloud computing conference 2011
Monitoring applications on cloud - Indicthreads cloud computing conference 2011Monitoring applications on cloud - Indicthreads cloud computing conference 2011
Monitoring applications on cloud - Indicthreads cloud computing conference 2011
 
High-speed, Reactive Microservices 2017
High-speed, Reactive Microservices 2017High-speed, Reactive Microservices 2017
High-speed, Reactive Microservices 2017
 

Kürzlich hochgeladen

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Cassandra at eBay - Cassandra Summit 2012

  • 1. August 8, 2012 Cassandra at eBay Time left: 29m 59s Jay Patel Architect, Platform Systems @pateljay3001
  • 2. eBay Marketplaces  97 million active buyers and sellers  200+ million items  2 billion page views each day  80 billion database calls each day  5+ petabytes of site storage capacity  80+ petabytes of analytics storage capacity 2
  • 3. How do we scale databases?  Shard – Patterns: Modulus, lookup-based, range, etc. – Application sees only logical shard/database  Replicate – Disaster recovery, read availability/scalability  Big NOs – No transactions – No joins – No referential integrity constraints 3
  • 4. We like Cassandra  Multi-datacenter (active-active)  Write performance  Availability - No SPOF  Distributed counters  Scalability  Hadoop support We also utilize MongoDB & HBase 4
  • 5. Are we replacing RDBMS with NoSQL? Not at all! But, complementing.  Some use cases don’t fit well - sparse data, big data, schema optional, real-time analytics, …  Many use cases don’t need top-tier set-ups - logging, tracking, … 5
  • 6. A glimpse on our Cassandra deployment  Dozens of nodes across multiple clusters  200 TB+ storage provisioned  400M+ writes & 100M+ reads per day, and growing  QA, LnP, and multiple Production clusters 6
  • 7. Use Cases on Cassandra Social Signals on eBay product & item pages Hunch taste graph for eBay users & items Time series use cases (many):  Mobile notification logging and tracking  Tracking for fraud detection  SOA request/response payload logging  RedLaser server logs and analytics 7
  • 9. Manage signals via “Your Favorites” Whole page is served by Cassandra 9
  • 10. Why Cassandra for Social Signals?  Need scalable counters  Need real (or near) time analytics on collected social data  Need good write performance  Reads are not latency sensitive 10
  • 11. Deployment User request has no datacenter affinity Non-sticky load balancing Topology - NTS Data is backed up periodically RF - 2:2 to protect against human or Read CL - ONE software error Write CL – ONE 11
  • 12. Data Model depends on query patterns 12
  • 14. Wait… Duplicates! Oh, toggle button! Signal --> De-signal --> Signal… 14
  • 15. Yes, eventual consistency! One scenario that produces duplicate signals in UserLike CF: 1. Signal 2. De-signal (1st operation is not propagated to all replica) 3. Signal, again (1st operation is not propagated yet!) So, what’s the solution? Later… 15
  • 16. Social Signals, next phase: Real-time Analytics  Most signaled or popular items per affinity groups (category, etc.)  Aggregated item count per affinity group Example affinity group 16
  • 17. Initial Data Model for real-time analytics Items in an affinitygroup is physically stored sorted by their signal count Update counters for both individual item and all the affinity groups that item belongs to
  • 19. user1 bid item1 buy item2 watch sell user2 19
  • 20. Graph in Cassandra Event consumers listen for site events (sell/bid/buy/watch) & populate graph in Cassandra  30 million+ writes daily  Batch-oriented reads  14 billion+ edges already (for taste vector updates) 20
  • 21.  Mobile notification logging and tracking  Tracking for fraud detection  SOA request/response payload logging  RedLaser server logs and analytics 21
  • 22. A glimpse on Data Model
  • 23. RedLaser tracking & monitoring console 23
  • 24. That’s all about the use cases.. Remember the duplicate problem in Use Case #1? Let’s see some options we considered to solve this… 24
  • 25. Option 1 – Make ‘Like’ idempotent for UserLike  Remove time (timeuuid) from the composite column name:  Multiple signal operations are now Idempotent  No need to read before de-signaling (deleting) X Need timeuuid for ordering! Already have a user with more than 1300 signals 25
  • 26. Option 2 – Use strong consistency  Local Quorum – Won’t help us. User requests are not geo-load balanced (no DC affinity).  Quorum – Won’t survive during partition between DCs (or, one of the DC is down). Also, adds additional latency. X Need to survive! 26
  • 27. Option 3 – Adapt to eventual consistency If desire survival! 27 http://www.strangecosmos.com/content/item/101254.html
  • 28. Adjustments to eventual consistency De-signal steps: – Don’t check whether item is already signaled by a user, or not – Read all (duplicate) signals from UserLike_unordered (new CF to avoid reading whole row from UserLike) – Delete those signals from UserLike_unordered and UserLike Still, can get duplicate signals or false positives as there is a ‘read before delete’. To shield further, do ‘repair on read’. Not a full story! 28
  • 29. Lessons & Best Practices • Choose proper Replication Factor and Consistency Level. – They alter latency, availability, durability, consistency and cost. – Cassandra supports tunable consistency, but remember strong consistency is not free. • Consider all overheads in capacity planning. – Replicas, compaction, secondary indexes, etc. • De-normalize and duplicate for read performance. – But don’t de-normalize if you don’t need to. • Many ways to model data in Cassandra. – The best way depends on your use case and query patterns. More on http://ebaytechblog.com?p=1308
  • 30. Thank You @pateljay3001 #cassandra12 30