Real World Cassandra
Cassandra in Online Advertising: Real Time Bidding
The prospect engine for brands.
Who are we?
Costa Sevdinoglou & Edward Capriolo
Impressions look like…
A High-Level Look at RTB
1. Browsers visit publishers and create impressions.
2. Publishers sell impressions via exchanges.
3. Exchanges serve as auction houses for the impressions.
4. On behalf of the marketer, m6d bids on the impressions via the auction house. If m6d wins, we display our ad to the browser.
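The slide does not say how an exchange picks the winner. This is a minimal sketch that assumes the simplest rule: the highest bid wins, and ties go to the first bid received. Under that rule, step 4 reduces to picking the maximum bid. The `Bid` type, the bidder names, and the CPM prices are all hypothetical. Real exchanges apply their own rules, such as price floors or second-price clearing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bid:
    bidder: str
    price_cpm: float  # hypothetical: price per 1,000 impressions


def run_auction(bids: list[Bid]) -> Optional[Bid]:
    """Return the winning bid, or None if nobody bid.

    Assumption: highest bid wins; ties go to the earliest bid received.
    """
    winner: Optional[Bid] = None
    for bid in bids:
        # Strict ">" keeps the earlier bid on a tie.
        if winner is None or bid.price_cpm > winner.price_cpm:
            winner = bid
    return winner


bids = [Bid("bidder-a", 1.20), Bid("m6d", 1.75), Bid("bidder-b", 0.90)]
winner = run_auction(bids)
if winner is not None and winner.bidder == "m6d":
    print("m6d wins: display our ad to the browser")
```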
Performance and Data
• Billions and billions of bid requests a day
  • A single request can result in multiple Cassandra operations!
  • One cluster is just under 10 TB and growing
• Low latency requirement: typically below 120 ms
• Limited data available to m6d via the exchange
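With a budget of roughly 120 ms per request, a bidder has to give up on slow lookups rather than answer late. This minimal sketch assumes the budget is enforced per request with a thread-pool timeout. `lookup_segments` is a hypothetical stand-in for the Cassandra reads a request triggers. Passing on the impression is an assumed fallback, not necessarily what m6d does.

```python
import concurrent.futures
import time

# Assumption: the slide's "typically below 120 ms" treated as a hard per-request budget.
LATENCY_BUDGET_S = 0.120

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def lookup_segments(user_id: str, delay_s: float) -> set[str]:
    """Hypothetical stand-in for one or more Cassandra reads; delay_s simulates latency."""
    time.sleep(delay_s)
    return {"segment-42"}


def decide_bid(user_id: str, delay_s: float) -> bool:
    """Return True to bid, False to pass; never wait past the budget."""
    future = _pool.submit(lookup_segments, user_id, delay_s)
    try:
        segments = future.result(timeout=LATENCY_BUDGET_S)
    except concurrent.futures.TimeoutError:
        # Too slow: skip this impression instead of answering late.
        return False
    return bool(segments)


print(decide_bid("user-1", 0.01))  # lookup well within the budget
print(decide_bid("user-2", 0.50))  # lookup exceeds the budget
```

`future.result(timeout=...)` only stops the caller from waiting. The slow lookup keeps running on its worker thread, so the client's own read timeout should also be set near the budget.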
Segment Data

Segments are how we assign product or service affinity to a group of users. Users we consider to be like-minded with respect to a given brand are placed in the same segment.

Segment data is just one component of our overarching data model.

Segments help reduce the number of calculations we do in real time.
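The point of segments is to move expensive work out of the bid path. This minimal sketch assumes segment membership has already been computed offline and kept as a user-to-segments map. At bid time, matching a user to a brand's campaigns is then a set intersection rather than a per-request calculation. The campaign and segment names are hypothetical.

```python
from collections import defaultdict

# Hypothetical precomputed memberships: user id -> segment ids.
# Building this is the expensive part and happens ahead of time.
segments_by_user: dict[str, set[str]] = defaultdict(set)

# Hypothetical mapping from a brand's campaign to the segments it targets.
campaign_segments = {"coffee-brand": {"coffee-lovers", "early-risers"}}


def assign(user_id: str, segment_id: str) -> None:
    segments_by_user[user_id].add(segment_id)


def matching_campaigns(user_id: str) -> list[str]:
    """At bid time: a set intersection per campaign, not a model evaluation."""
    # .get() avoids inserting empty entries into the defaultdict for unknown users.
    user_segments = segments_by_user.get(user_id, set())
    return sorted(c for c, segs in campaign_segments.items() if user_segments & segs)


assign("user-1", "coffee-lovers")
print(matching_campaigns("user-1"))  # ['coffee-brand']
print(matching_campaigns("user-2"))  # []
```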
Old Approach for Segment Data
Diagram: Application Nodes (Tomcat + MySQL) produce Event Logs, which are aggregated in Hadoop; the results return to the application nodes as a MySQL Data Push.
Limitations
• Periodically updated.
• Only a subsection of the data.
• Cluster performance is affected during a data push.
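The batch pipeline can be sketched in three steps: a map step over event logs, a reduce step per user, and a flattening step that turns the result into rows for a bulk MySQL load. This is a hypothetical, single-process illustration of the pattern, not m6d's actual Hadoop job. The tab-separated log format and the function names are assumptions. The sketch shows why the data is only as fresh as the last run, and why every push is one large write.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Map: parse hypothetical log lines of the form '<user_id>\\t<segment_id>'."""
    for line in lines:
        user_id, segment_id = line.rstrip("\n").split("\t")
        yield user_id, segment_id


def reduce_by_user(pairs: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Reduce: collect each user's segments; duplicate events collapse."""
    memberships: dict[str, set[str]] = defaultdict(set)
    for user_id, segment_id in pairs:
        memberships[user_id].add(segment_id)
    return dict(memberships)


def to_mysql_rows(memberships: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Flatten into (user_id, segment_id) rows for one bulk load per push."""
    return sorted((u, s) for u, segs in memberships.items() for s in segs)


log = ["u1\tcoffee-lovers", "u2\tearly-risers", "u1\tearly-risers", "u1\tcoffee-lovers"]
rows = to_mysql_rows(reduce_by_user(map_events(log)))
print(rows)
```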
Cassandra Approach for Segment Data
Diagram: Application Nodes (Tomcat + less MySQL usage) backed by Cassandra.
Better!
• Updating in real time is now possible
• Distributed, not duplicated
• Less complexity to manage
• Storing more information
• We can now bid on users sooner!
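With Cassandra, each event can be written as it happens. This minimal in-memory sketch assumes a wide-row layout in a hypothetical column family `UserSegments`: the row key is the user id, with one column per segment. Inserts are upserts, so re-recording the same membership is harmless. The next bid request sees the write without waiting for a batch push.

```python
import time
from collections import defaultdict


class WideRowStore:
    """In-memory stand-in for a hypothetical UserSegments column family.

    Row key = user id; column name = segment id; column value = event time.
    Writes are upserts: re-writing a column replaces its value.
    """

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, int]] = defaultdict(dict)

    def insert(self, row_key: str, column: str, value: int) -> None:
        self.rows[row_key][column] = value

    def get_slice(self, row_key: str) -> dict[str, int]:
        # .get() so that reading a missing row does not create it.
        return dict(self.rows.get(row_key, {}))


store = WideRowStore()


def on_event(user_id: str, segment_id: str) -> None:
    # Called from the event path: no batch job, no push window.
    # Value: event time in microseconds (a hypothetical choice).
    store.insert(user_id, segment_id, time.time_ns() // 1000)


on_event("u1", "coffee-lovers")
print(sorted(store.get_slice("u1")))  # ['coffee-lovers']
```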
One Ring to rule them all
Image source: http://askyyy.blog.163.com/blog/static/1234575992010428819399/
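The ring in the title is Cassandra's token ring. This sketch assumes RandomPartitioner-style tokens: the key's MD5 digest is read as a signed integer, and its absolute value is the token. It also assumes SimpleStrategy-style placement. A row key's natural endpoints are then the first node whose token is greater than or equal to the key's token, plus the next nodes clockwise. The four-node ring and its evenly spaced tokens are hypothetical.

```python
import bisect
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """RandomPartitioner-style token: abs() of the MD5 digest read as a
    signed big-endian integer, so it falls in [0, 2**127]."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


class Ring:
    def __init__(self, node_tokens: dict[str, int]) -> None:
        # Sort nodes by token to form the ring.
        self.nodes = sorted(node_tokens, key=node_tokens.get)
        self.tokens = [node_tokens[n] for n in self.nodes]

    def natural_endpoints(self, key: bytes, rf: int) -> list[str]:
        """SimpleStrategy-style placement: first node with token >= key token
        (wrapping past the largest token), then the next rf-1 nodes clockwise."""
        token = random_partitioner_token(key)
        start = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        count = min(rf, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]


# Hypothetical 4-node ring with evenly spaced tokens over [0, 2**127).
ring = Ring({f"node{i}": i * (2**127 // 4) for i in range(4)})
print(ring.natural_endpoints(b"user-1", rf=3))
```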
Peer to Peer
• Per-operation replication
• Fail fast, self-healing
• Each write goes to all natural endpoints
• Hinted handoff if a destination is down
• Repair on read
• No more:
      STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;
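The write and read paths listed on this slide can be modeled in a few lines. This is a hypothetical in-memory model, not Cassandra's implementation:

- Every write is sent to all natural endpoints.
- A down replica gets a stored hint that is replayed later.
- A read returns the newest cell by timestamp and pushes it back to any stale live replica.

Failure detection, consistency levels, and where hints are actually stored are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional

Cell = tuple[int, str]  # (write timestamp, value)


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict[str, Cell] = field(default_factory=dict)

    def apply(self, key: str, cell: Cell) -> None:
        # Last-write-wins: keep whichever cell has the newer timestamp.
        current = self.data.get(key)
        if current is None or cell[0] > current[0]:
            self.data[key] = cell


@dataclass
class Coordinator:
    replicas: list[Replica]  # the key's natural endpoints
    hints: list[tuple[Replica, str, Cell]] = field(default_factory=list)

    def write(self, key: str, value: str, ts: int) -> None:
        for replica in self.replicas:  # each write goes to all natural endpoints
            if replica.up:
                replica.apply(key, (ts, value))
            else:
                self.hints.append((replica, key, (ts, value)))  # hinted handoff

    def replay_hints(self) -> None:
        pending = []
        for replica, key, cell in self.hints:
            if replica.up:
                replica.apply(key, cell)
            else:
                pending.append((replica, key, cell))
        self.hints = pending

    def read(self, key: str) -> Optional[str]:
        live = [r for r in self.replicas if r.up]
        cells = [r.data[key] for r in live if key in r.data]
        if not cells:
            return None
        newest = max(cells)  # newest timestamp wins
        for replica in live:  # repair on read: fix stale live replicas
            replica.apply(key, newest)
        return newest[1]


a, b, c = Replica("a"), Replica("b"), Replica("c")
coord = Coordinator([a, b, c])
b.up = False
coord.write("user-1", "coffee-lovers", ts=1)  # a and c get it; b gets a hint
b.up = True
print(coord.read("user-1"))  # coffee-lovers
```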
Multi Data Center
• No designing and managing complex replication topologies:
      create keyspace world
        with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
        and strategy_options = {1:3, 2:3, 3:3};
  (three replicas in each of the data centers named 1, 2 and 3)
• The same process as for a single data center
• No log shipping or separate processes to run
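With `NetworkTopologyStrategy`, `strategy_options` maps each data center name to a replica count, so `{1:3, 2:3, 3:3}` means nine copies in total. This is a simplified sketch of the placement. It assumes a hypothetical 12-node ring with data centers named "1", "2" and "3", and it ignores racks. For each data center, it walks the ring clockwise from the key's token and takes the first three nodes in that data center.

```python
import bisect

# Hypothetical ring entries: (token, node, data center).
RING = sorted([
    (0, "n1", "1"), (10, "n2", "2"), (20, "n3", "3"),
    (30, "n4", "1"), (40, "n5", "2"), (50, "n6", "3"),
    (60, "n7", "1"), (70, "n8", "2"), (80, "n9", "3"),
    (90, "n10", "1"), (100, "n11", "2"), (110, "n12", "3"),
])
# Mirrors strategy_options = {1:3, 2:3, 3:3} from the keyspace definition.
STRATEGY_OPTIONS = {"1": 3, "2": 3, "3": 3}


def network_topology_endpoints(key_token: int) -> dict[str, list[str]]:
    """Simplified placement: per data center, walk the ring clockwise from the
    key's token and take the first N nodes in that DC (racks ignored)."""
    tokens = [t for t, _, _ in RING]
    start = bisect.bisect_left(tokens, key_token) % len(RING)
    placement: dict[str, list[str]] = {dc: [] for dc in STRATEGY_OPTIONS}
    for i in range(len(RING)):
        _, node, dc = RING[(start + i) % len(RING)]
        if dc in placement and len(placement[dc]) < STRATEGY_OPTIONS[dc]:
            placement[dc].append(node)
    return placement


print(network_topology_endpoints(25))
# {'1': ['n4', 'n7', 'n10'], '2': ['n5', 'n8', 'n11'], '3': ['n6', 'n9', 'n12']}
```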
Monitoring & Management
• Many, many things to monitor with JMX
• Nice command-line tools
• Most values can be tweaked at run time
Capacity Planning
• How many:
    • Rows
    • Columns
    • Size of the average column
• Latency requirements
• Throughput: reads and writes per second
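The questions on this slide combine into a back-of-the-envelope estimate. This minimal sketch uses hypothetical inputs, not m6d's numbers. The 15-byte per-column overhead is an assumed figure for the storage format of that era; check it against the documentation for the version you run.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    rows: int
    columns_per_row: int
    avg_column_bytes: int  # column name + value
    replication_factor: int
    column_overhead_bytes: int = 15  # assumption: per-column storage overhead
    requests_per_day: int = 0
    ops_per_request: int = 1

    def raw_bytes(self) -> int:
        per_column = self.avg_column_bytes + self.column_overhead_bytes
        return self.rows * self.columns_per_row * per_column

    def cluster_bytes(self) -> int:
        # Every replica stores a full copy of the rows it owns.
        return self.raw_bytes() * self.replication_factor

    def avg_ops_per_second(self) -> float:
        return self.requests_per_day * self.ops_per_request / SECONDS_PER_DAY


# Hypothetical numbers: 100M users, 20 segment columns each, 40-byte columns, RF 3.
w = Workload(rows=100_000_000, columns_per_row=20, avg_column_bytes=40,
             replication_factor=3, requests_per_day=2_000_000_000, ops_per_request=2)
print(f"{w.cluster_bytes() / 1e12:.2f} TB across the cluster")  # 0.33 TB across the cluster
print(f"{w.avg_ops_per_second():,.0f} ops/s on average")  # 46,296 ops/s on average
```

The estimate ignores row-level overhead, indexes, and the free disk space compaction needs. The throughput figure is a daily average, so size the cluster for peak traffic and the latency target, not the mean.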
Unit Tests FTW!
Max 2 billion columns per row
• Awesome
    • Unless you accidentally write 2 billion columns to a row key named “null”
• Check maxRowSize via JMX
• Watch the logs for messages about compacting large rows
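In Java, concatenating a missing id into a string produces a row key named “null” (`"" + null` yields `"null"`). That is one plausible way such a row grows toward the limit. In the spirit of the “Unit Tests FTW!” slide, here is a minimal sketch of a guard plus its tests, built around a hypothetical `row_key` helper and Python's standard `unittest`. In Python the equivalent trap is `str(None)`, which yields `"None"`.

```python
import unittest
from typing import Optional


def row_key(user_id: Optional[str]) -> str:
    """Hypothetical helper: build a row key, refusing to silently produce
    'None'/'null' (or an empty key) when the user id is missing."""
    if user_id is None or user_id.strip() == "" or user_id.lower() in {"null", "none"}:
        raise ValueError(f"refusing to write with invalid row key: {user_id!r}")
    return user_id


class RowKeyTest(unittest.TestCase):
    def test_valid_key_passes_through(self):
        self.assertEqual(row_key("user-1"), "user-1")

    def test_missing_user_id_is_rejected(self):
        # Without the guard, f"{None}" becomes "None" and every event for every
        # unidentified user piles into one ever-growing row.
        for bad in (None, "", "  ", "null", "None"):
            with self.assertRaises(ValueError):
                row_key(bad)


if __name__ == "__main__":
    suite = unittest.TestLoader().loadTestsFromTestCase(RowKeyTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```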
Local (NYC) Meetups
• www.meetup.com/NYC-Cassandra-User-Group/
