Finding the right NoSQL DB for the job - The path to a non-RDBMS solution at Traackr

Editor's notes

  1. While some people definitely mistook the rise of NoSQL for a complete replacement of RDBMS, there were equal misunderstandings in the RDBMS camp. Eventual consistency is not the only way to operate MongoDB: write-ahead journaling and commit acknowledgement with the fsync and j options are available, just as in RDBMS systems. One does not need to be a distributed search engine managing petabytes of data to use these types of tools (which is our point).
  2. Taking a look at the amount of storage we are using as of a month ago in Mongo; this includes indexes
  3. The point is that we don’t need to track the entire web: just the subset belonging to influencers!
  4. There is a different perspective on “Web Scale” that has to do with the nature of the data on the web
  5. Take the approach of using a simplified entity model
  6. …with semi-structured data storage formats like JSON, which facilitate capturing related attribute structures and enable the flexibility of defining new attributes as they are discovered.
  7. Pre-web: we knew exactly the questions we wanted to ask and how to model the data for them. Post-web: questions and data are hard to predict; we need storage tools that are built to support this.
  8. CLOB pre-allocated space
  9. Sparse maps
  10. This is something we thought we needed back in early 2010. Traackr needs to score its entire DB of influencers on a weekly basis to adjust the weighted averages and stats that drive the scores. This means processing north of 750K sites, over 650K influencers and, soon, millions of posts (post-level attributes).
  11. Graph Databases: while we can model our domain as a graph we don’t want to pigeonhole ourselves into this structure. We’d rather use these tools for specialized data analysis but not as the main data store.
  12. Memcache: memory-based; we need true persistence.
  13. Amazon SimpleDB: not willing to store our data in a proprietary datastore.
  14. Redis and LinkedIn’s Project Voldemort: no query filters, better used as queues or distributed caches
  15. CouchDB: no ad-hoc queries; maturity in early 2010 made us shy away although we did try early prototypes
  16. Cassandra: in early 2010, maturity questions, no secondary indexes and no batch processing options (came later on).
  17. MongoDB: in early 2010, maturity questions, adoption questions and no batch processing options
  18. Riak: very close but in early 2010, we had adoption questions
  19. HBase: came across as the most mature at the time, with several deployments, a healthy community, "out-of-the-box" secondary indexes through a contrib and support for batch processing using Hadoop/MR. Hadoop and its maturity were a big reason we picked HBase.
  20. We had to deal with a complex setup right from the start: a minimum number of data nodes to support replication; an odd number of ZooKeeper nodes to avoid voting deadlocks; co-located region servers, which meant paying close attention to JVM resources; the Master as a SPOF; and co-located job trackers, which again meant paying close attention to JVM resources.
  21. Quick overview of how we modeled a list in HBase. This is what our customers see. Let's consider the name, the ranks of the influencers and the influencer references.
  22. Each row has a unique key: the Alist ID. We would group general attributes under one family of columns, appropriately named “attributes”. Benefit: we can get Alist information without loading all the influencers. We would group the influencer references under another family of columns named “influencerIds”.
  23. Column prefixes = family names; column suffixes = attribute names.
  24. Now we can see where the attributes we see on the screen are stored
  25. We coded the pagination and indexing features ourselves and contributed them back. Felt really good about it!
  26. As if it weren’t bad enough that we had to write our own code to support our indexing needs, we now had to maintain a third-party code base that was quickly becoming outdated!
  27. Simplified example for posts
  28. Denormalized/duplicated for fast runtime access and storage of influencer-to-site relationship properties
  29. Content attribution logic could sometimes mis-attribute posts because of the duplicated data.
  30. Exacerbated when we started tracking people’s content on a daily basis in mid-2011
  31. Graph Databases: we looked at Neo4j a bit closer but passed again for the same reasons as before
  32. CouchDB: more mature but still no ad-hoc queries
  33. Cassandra: matured quite a bit, added secondary indexes and batch processing options, but more restrictive in its use than other solutions. After the HBase lesson, simplicity of use was now more important.
  34. Riak: strong contender still but adoption questions
  35. MongoDB: matured by leaps and bounds, with increased adoption, support from 10gen, advanced indexing out-of-the-box and some batch processing options. It was a breeze to use, well documented, and fit into our existing code base very nicely.
  36. Embedded list of references to sites augmented with influencer-specific site attributes (e.g. percent contribution to content)
  37. siteId indexed for “find influencers connected to site X”
  38. Embedded list of influencer references augmented with “usernames” (useful for content attribution)
  39. Indexed for “find sites associated to influencer X”
  40. This is an example of a simple report written in JavaScript, meant to count the number of Twitter profiles we have counted total retweets for. It is easy to write and test if you know JavaScript (no complicated Java MR jobs) and easy to execute as a cron job, piping the results to an email. MR is slightly more involved but still much more approachable than Java MR (or Pig).
  41. Easily configurable replica sets
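
The document model described in notes 36 and 37 (an influencer embedding a list of site references, with siteId indexed for "find influencers connected to site X") can be sketched in plain Python. This is only an illustration under stated assumptions, not Traackr's actual schema or code: the field names `name`, `sites` and `percentContribution`, the sample values, and the helpers `build_site_index` and `influencers_for_site` are all hypothetical, and a dictionary stands in for the database index. In MongoDB itself, an index on the embedded siteId field would presumably play the role of `build_site_index`.

```python
from collections import defaultdict

# Hypothetical influencer documents. Each one embeds its site references,
# augmented with an influencer-specific attribute (see note 36).
influencers = [
    {
        "_id": "inf-1",
        "name": "Alice",
        "sites": [
            {"siteId": "site-a", "percentContribution": 80},
            {"siteId": "site-b", "percentContribution": 20},
        ],
    },
    {
        "_id": "inf-2",
        "name": "Bob",
        "sites": [
            {"siteId": "site-a", "percentContribution": 100},
        ],
    },
]


def build_site_index(docs):
    """Map each embedded siteId to the ids of the documents referencing it."""
    index = defaultdict(list)
    for doc in docs:
        for ref in doc.get("sites", []):
            index[ref["siteId"]].append(doc["_id"])
    return dict(index)


# Stand-in for the database index on the embedded siteId field (see note 37).
site_index = build_site_index(influencers)


def influencers_for_site(site_id):
    """Answer "find influencers connected to site X" via the index."""
    return site_index.get(site_id, [])


print(influencers_for_site("site-a"))
```

Because the relationship attributes live inside the influencer document, one read returns an influencer together with its site-specific data, while the index keeps the reverse lookup from a site to its influencers cheap.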