Yieldbot Tech Talk – MongoDB to k/v




                        © 2012 Yieldbot
            © 2012 Yieldbot / CONFIDENTIAL
Yieldbot Tech Talk – MongoDB to key/value, Sept 20, 2012


                   What We Do
• Yieldbot technology creates marketplaces where
  advertisers target realtime consumer intent flowing
  through premium publishers.
• At a high level: Analytics + Ad Serving
   – Geo-distributed
      • Data collection
      • Realtime ad matching
   – Cascalog batch analytics
   – Rich Analytics Results visualizations





          Why MongoDB (Dec 2009)
•   Needed to be manageable by the dev team (1 person!)
•   Flexible
•   Easy to get started, run on laptop or deploy
•   Scale wasn’t initially biggest concern
•   Could focus on other stuff
     – Lucene
     – Analytics
     – Ad serving dynamics






       How MongoDB Was Used Initially
• Configuration
   – Publisher profiles, ad matching rules, etc.
• Data collection
   – Pageviews, impressions, clicks
• Analytics results
• Task state tracking
• Lookup tables for ad serving
• Real-time ad stats






          A Couple of Aspects of Note
• Master/Slave
   – convenient for simple durability
   – convenient for geo distribution
   – not unique to Mongo; we now run a similar redis topology
• Indexing
   – Easy to set up, but eventually RAM scaling issue
   – initially great for efficient views of data in UI
   – later moved analytics results to key/value access in Mongo
• Durable sharded config (replica sets) expensive





                 Data Collection
• Mongo: collections for pageviews, impressions, clicks
   – Wasn’t archived anywhere else
   – Not where you want to infinitely scale
• Now flows through redis, to files, to S3






     Data Collection with redis Assist
•   redis lists populated as events come in
•   Daemons pull off lists and write to files
•   Periodically compress and archive files to S3
•   S3 files used for input later
     – Hadoop (Cascalog) batch analytics
     – Advertising Stats Calculations
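
The collection flow above can be sketched roughly as follows. This is a minimal illustration, not Yieldbot's code: an in-memory deque stands in for the redis list (append on the right, pop from the left, the same shape as RPUSH and LPOP), the function names are hypothetical, and the S3 upload is left out; the gzip file is what would be archived.

```python
import gzip
import json
import os
import tempfile
from collections import deque

# Stand-in for a redis list (assumption): in-memory only, for illustration.
event_queue = deque()

def record_event(event):
    """Called as events come in (pageview, impression, click)."""
    event_queue.append(json.dumps(event))

def drain_to_file(path, max_events=1000):
    """Daemon step: pull events off the list and append them to a log file."""
    written = 0
    with open(path, "a", encoding="utf-8") as log:
        while event_queue and written < max_events:
            log.write(event_queue.popleft() + "\n")
            written += 1
    return written

def compress_for_archive(path):
    """Periodic step: gzip the log file; the .gz is what would go to S3."""
    archive_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(archive_path, "wb") as dst:
        dst.write(src.read())
    os.remove(path)
    return archive_path

def run_demo():
    """Push two events through the whole flow and read the archive back."""
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "events.log")
    record_event({"type": "pageview", "url": "/a"})
    record_event({"type": "click", "ad": "ad-1"})
    drain_to_file(log_path)
    archive = compress_for_archive(log_path)
    with gzip.open(archive, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Writing one JSON object per line keeps the archived files easy to split and feed into batch jobs later.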






          Matching Lookup Tables
• Mongo: collections for different lookup types
   – E.g., geo, URL
   – Built periodically, updated on config change
   – Lookup in each, correlate results
• redis
   – Ability to pipeline operations in single server call
   – Set intersection across lookup dimensions and one
     response back
   – Same master/slave as Mongo for distribution
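
A small sketch of the set-intersection idea, using plain Python sets as a stand-in for redis sets. The key names and ad ids are made up for illustration; in redis the same result would come from a single SINTER over the per-dimension set keys instead of one lookup per collection followed by correlation in the application.

```python
# Hypothetical lookup tables: each key names one value of one dimension and
# maps to the set of ad ids eligible for it (one redis set per key).
lookup_sets = {
    "geo:US": {"ad-1", "ad-2", "ad-3"},
    "geo:CA": {"ad-2"},
    "url:/sports": {"ad-2", "ad-3", "ad-4"},
    "url:/news": {"ad-1"},
}

def match_ads(keys):
    """Intersect the sets for the given keys, like SINTER key1 key2 ...

    A missing key is treated as an empty set, so the result is empty.
    """
    if not keys:
        return set()
    sets = [lookup_sets.get(key, set()) for key in keys]
    return set.intersection(*sets)

matched = match_ads(["geo:US", "url:/sports"])
print(sorted(matched))  # ['ad-2', 'ad-3']
```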





                  Configuration
• Mongo
   – Database per publisher
   – Collections for objects
   – Denormalized where possible
   – Manual Foreign Keys
   – Obviously best candidate for relational model
• History and versioning were paramount to us
   – Rolled our own: HeroDB






                        HeroDB
• History and granular versioning highest goal
• Database built on top of git
   – Golden database is a bare repo
   – Can clone to anywhere, make changes, push
   – Changes in single commit are atomic
• How, when, and who changed it
• Ability to reset the DB to a specific previous state
• Much more to do, in production 6+ months
   – Recent change, caching
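
To make the versioning goals concrete, here is a toy commit log in Python. It is not HeroDB's API and does not use git; the class and method names are hypothetical. It only illustrates the three properties listed above: changes in one commit land together, every commit records who changed what and when, and any previous state can be recovered.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Commit:
    commit_id: int
    author: str
    message: str
    timestamp: float
    snapshot: dict  # full state after this commit

class VersionedStore:
    """Toy commit log; NOT HeroDB's API, just the idea of versioned state."""

    def __init__(self):
        self.commits = []

    def head(self):
        """Return a copy of the latest state (empty if nothing committed)."""
        return dict(self.commits[-1].snapshot) if self.commits else {}

    def commit(self, author, message, changes):
        """Apply all changes in one commit, so they become visible together."""
        state = self.head()
        for key, value in changes.items():
            if value is None:
                state.pop(key, None)  # None means delete the key
            else:
                state[key] = value
        entry = Commit(len(self.commits), author, message, time.time(), state)
        self.commits.append(entry)
        return entry.commit_id

    def state_at(self, commit_id):
        """Return the database as it was right after the given commit."""
        return dict(self.commits[commit_id].snapshot)

    def log(self):
        """Who changed what, in order."""
        return [(c.commit_id, c.author, c.message) for c in self.commits]

store = VersionedStore()
first = store.commit("alice", "add publisher",
                     {"pub/1/name": "Example", "pub/1/active": True})
second = store.commit("bob", "deactivate", {"pub/1/active": False})
```

Storing a full snapshot per commit is wasteful; git avoids that by sharing unchanged objects between commits, which is part of the appeal of building on it.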





                Analytics Results
• ARCv1, Mongo: indexed collections
   – Very easy to code to
   – Initially with everything else in same server
   – Moved out to dedicated server
   – Memory became an issue
       • Indexes bigger than data itself
   – Overhead of importing Cascalog results
       • Pull json files from S3 to local disk
       • mongoimport files into DB





         Analytics Results Cont’d
• ARCv2, Mongo: paged data, key/value
   – Migrated app to key/value access pattern
   – Much better memory usage
   – Application sharded, publishers spread around
   – DB per day per publisher, most recent 7 held
   – Still overhead of importing Hadoop results
      • Pull json files from S3 to local disk
      • mongoimport files into DB
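
One way the "DB per day per publisher, most recent 7 held" scheme could look, sketched in Python. The naming pattern and function names are assumptions for illustration only; the slide does not say how the databases were actually named or dropped.

```python
from datetime import date, timedelta

RETAINED_DAYS = 7  # "most recent 7 held"

def db_name(publisher_id, day):
    """Hypothetical naming scheme: one database per publisher per day."""
    return f"arc_{publisher_id}_{day:%Y%m%d}"

def dbs_to_drop(existing, publisher_id, today):
    """Return this publisher's databases that fall outside the retention window."""
    keep = {
        db_name(publisher_id, today - timedelta(days=n))
        for n in range(RETAINED_DAYS)
    }
    prefix = f"arc_{publisher_id}_"
    return sorted(
        name for name in existing
        if name.startswith(prefix) and name not in keep
    )
```

Dropping a whole database per expired day avoids deleting individual documents, which is one plausible reason for partitioning this way.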






    Analytics Results - ElephantDB
• Cascalog support to directly write EDB format
   – Berkeley DB or LevelDB
• Ring Topology
   – Shards distributed around ring, consistent hashing
   – Configurable replication factor
   – Request to any node, forwards as necessary
   – Incrementally increase ring size
• Import from S3 efficient
   – Copy shard from S3 to local disk
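
The ring topology can be illustrated with a generic consistent-hashing sketch. This is not ElephantDB's implementation; the class name, the MD5-based placement, and the single ring position per node are simplifying assumptions. It shows how a key maps to a configurable number of replica nodes and why growing the ring only moves some of the keys.

```python
import bisect
import hashlib

def _hash(value):
    """Stable hash to a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        # Sorted (position, node) pairs; one position per node for simplicity.
        self._ring = sorted((_hash(node), node) for node in nodes)

    def add_node(self, node):
        """Grow the ring incrementally by placing one more node on it."""
        bisect.insort(self._ring, (_hash(node), node))

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting replica nodes."""
        positions = [pos for pos, _ in self._ring]
        start = bisect.bisect_right(positions, _hash(key))
        count = min(self.replication_factor, len(self._ring))
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(count)
        ]
```

Because a request can reach any node, each node only needs this mapping to forward the request to one of the owners returned by `nodes_for`.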





              Real-time Ad Stats
• Mongo: DB per day, collection by entity type
   – Document per entity instance
   – stat_type.hour.minute nested values, atomic
     increment
   – Never a good story around aggregating at larger
     timeframes
• Enter redis again
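
The nested-counter layout described above can be sketched like this. MongoDB's `$inc` operator accepts a dotted field path, so one update can bump a single stat_type.hour.minute counter atomically; the helper names are hypothetical, and `apply_inc` is only a local stand-in for what the server would do with that update.

```python
from datetime import datetime

def inc_update(stat_type, when, amount=1):
    """Build a MongoDB-style update that bumps one stat_type.hour.minute counter."""
    path = f"{stat_type}.{when.hour}.{when.minute}"
    return {"$inc": {path: amount}}

def apply_inc(document, update):
    """Local stand-in for the server applying $inc on a dotted path."""
    for path, amount in update["$inc"].items():
        *parents, leaf = path.split(".")
        node = document
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = node.get(leaf, 0) + amount
    return document

def hourly_total(document, stat_type, hour):
    """Aggregate to a larger timeframe by summing minute buckets client-side."""
    return sum(document.get(stat_type, {}).get(str(hour), {}).values())
```

The last function hints at the aggregation problem: any timeframe larger than a minute has to be summed from the minute buckets, either on read or by a separate job.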






          Real-time Ad Stats Cont’d
• redis has robust access patterns
    – More pipelining
•   Initially realtime and aggregated kept in redis
•   Scaling issue with redis: the DB has to fit in memory
•   Time-period aggregations now kept in HBase
•   Only most recent hours kept in redis
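
A rough sketch of the split between a small in-memory store and a larger aggregate store. Two plain dicts stand in for redis and HBase, and the retention window, the daily granularity, and the function name are all assumptions; the slides only say that recent hours stay in redis and time-period aggregations live in HBase.

```python
RECENT_HOURS = 3  # hypothetical retention window for the in-memory store

def roll_up(hot, cold, current_hour):
    """Move hour buckets older than the window from `hot` into daily totals.

    hot:  {(entity, hour_index): count}   stand-in for redis
    cold: {(entity, day_index): count}    stand-in for HBase
    """
    cutoff = current_hour - RECENT_HOURS
    expired = [key for key in hot if key[1] <= cutoff]
    for entity, hour in expired:
        day = hour // 24
        cold[(entity, day)] = cold.get((entity, day), 0) + hot.pop((entity, hour))
    return hot, cold
```

Running a job like this periodically keeps the in-memory data set bounded by the window instead of growing with history.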






              Task State Tracking
• The last holdout
• Collection of tasks
   – Each task is a document
   – Indexed as needed
   – Mongo query and update syntax convenient
       • Both in static code, but also in Python or Mongo
         repl






              Honorable Mention
• redis for the celery backend, used for task messaging
  infrastructure
• but that was never on Mongo anyway...






      MongoDB Migration Summary
•   Configuration                     HeroDB
•   Data Collection                   to S3 via redis
•   Analytics Results                 ElephantDB
•   Task State Tracking               still Mongo
•   Matcher Lookup Tables             redis
•   Real-time Ad Stats                redis/HBase






                       Thanks!



Site: yieldbot.com
Blog: blog.yieldbot.com
Twitter: @yieldbot
Email: info@yieldbot.com




