2. Today's Presentation
- Problems faced with Social Media Monitoring/Analytics
- Why choose NoSQL over SQL
- Why choose MongoDB
- NoSQL vs SQL Schema Design
- Infinite scalability with commodity hardware & .NET
- Why we still use .NET (why not Ruby/Java/Python)
- Lessons Learned
4. About BuzzNumbers
- SaaS Web Product Company
- Web and Social Media Analytics
- Collect "big data" web content
  - Near-real-time data capture
  - News, blogs, social media, etc.
  - Scraping, APIs, feeds
- Analytics & Business Intelligence
  - BI, text, sentiment, locations, NLP, machine learning
5. BuzzNumbers Project Team
- Nick Holmes a Court - @nickhac
- Brett Anderson - @brehtt
- Steve Casey - @stevencasey
- Jacinto Santamaria
- Chris Fulstow - @chrisfulstow
- Josie Kidd - @jose9
7. Problems faced at BuzzNumbers
- Large and fast-growing DB tables
- Lots of reads/writes from 24/7 data collection
- Massive table scans for user reports (< 3 sec SLA)
- Large joins (10+ tables) with nested views
- Complex queries (aggregates, WHERE clauses, full-text)
- Full-text search indexes needed real-time updates
- Read/write contention
- Rapid index fragmentation, slow rebuilds
- DB locks occurring (with no implicit transactions)
- Blocking transactions (on both small and large tables)
8. Outgrew SQL Server 2008 Enterprise
- "Free" software from MSFT via BizSpark
- Tried everything with SQL Enterprise
  - Significant SQL performance tuning
  - Dirty reads (NOLOCK), offline index rebuilds
  - Replication / clustering / multi-instance
- Problems
  - Schema changes impossible with our uptime requirements
  - DBA tasks made the system unavailable for hours/days
  - Hardware / SQL DBA costs got very expensive
  - Web users experienced annoying, unnecessary waits when simple queries were blocked because of joins
10. What is NoSQL?
- New generation of "databases"
- "Not Only SQL" - mostly open source
- NoSQL: distributed databases designed to deliver
  - Distributed "big data" storage
  - Distributed processing of queries/calculations
- NoSQL examples include
  - Google - BigTable
  - Yahoo - Hadoop (30k+ nodes)
  - Facebook - Cassandra
  - Foursquare - MongoDB
11. Why NoSQL over SQL
SQL
- Guaranteed consistency
- Transactions
- Schemas / data types
- Joins / foreign keys
- T-SQL / PL/SQL (views, procs)
- Scale up (hardware)
- Many benefits, including
  - Ease of use
  - Many developers skilled in SQL
  - Trusted and proven for decades
NoSQL
- Eventual consistency
- No transaction support
- Key/value data (mostly)
- Flat data (no joins)
- Key lookups / MapReduce / code
- Scale out (distributed)
- Many benefits, including
  - Performance / scale
  - Lower license costs
  - Solves Web 2.0 problems
12. Why NoSQL over SQL
CAP Theorem
- Consistency
- Availability
- Partition tolerance
- Only 2 of the 3 are possible at the same time:
  - Consistency / Availability: RDBMS
  - Availability / Partition tolerance: NoSQL
  - Consistency / Partition tolerance: availability issues (no one wants this)
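To make "eventual consistency" concrete, here is a toy model in plain JavaScript (Node). It is a sketch under stated assumptions, not MongoDB's replication protocol: writes go to a primary and are appended to an operation log, and a secondary applies that log later, so a read from the secondary can return stale data until it catches up. The `Primary`/`Secondary` classes, the `replicate()` step and the `"mentions:acme"` key are hypothetical.

```javascript
// Toy model of asynchronous (eventually consistent) replication.
// Hypothetical classes for illustration only; NOT how MongoDB replica sets work internally.
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = []; // ordered list of writes, shipped to secondaries later
  }
  write(key, value) {
    this.data.set(key, value);
    this.oplog.push({ key, value });
  }
  read(key) {
    return this.data.get(key);
  }
}

class Secondary {
  constructor(primary) {
    this.primary = primary;
    this.data = new Map();
    this.applied = 0; // number of oplog entries applied so far
  }
  // Apply any oplog entries not yet seen (a real system does this in the background).
  replicate() {
    const log = this.primary.oplog;
    while (this.applied < log.length) {
      const { key, value } = log[this.applied++];
      this.data.set(key, value);
    }
  }
  read(key) {
    return this.data.get(key);
  }
}

const primary = new Primary();
const secondary = new Secondary(primary);

primary.write("mentions:acme", 10);
const staleRead = secondary.read("mentions:acme"); // not replicated yet
secondary.replicate();
const caughtUpRead = secondary.read("mentions:acme"); // replicas have converged
console.log(staleRead, caughtUpRead); // undefined 10
```

The window between the write and `replicate()` is exactly the inconsistency the NoSQL side accepts in exchange for availability and partition tolerance.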
16. Why Mongo
- Proven for multiple usage scenarios
- High performance (eventual consistency)
- Data stored as JSON (BSON) documents (not only key/value)
- Supports multiple indexes (anywhere in the JSON)
- Easy to install, easy to use (Linux/Windows)
- Easy to scale for high-volume writes (sharding)
- Easy to scale for high-volume reads (replica sets)
- Automatic failover and redundancy (replica sets)
- REST interface and drivers for Ruby/.NET/Java/etc.
- Easy to query via multiple techniques
  - Key/value, Mongo query, JavaScript, MapReduce
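Sharding spreads writes by splitting a collection into chunks, each covering a range of a shard key, and routing every operation to the shard that owns the matching chunk. The sketch below shows only that routing idea in plain JavaScript; the chunk boundaries, the shard names and the choice of `WebsiteID` as shard key are hypothetical, and real MongoDB clusters split and balance chunks automatically through `mongos`.

```javascript
// Simplified range-based shard routing (illustration only).
// Chunk boundaries and shard names are hypothetical; chunks are sorted by `min`
// and together cover the whole key space.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// Return the shard owning the chunk whose [min, max) range contains the key.
function routeByShardKey(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1; // binary search over the sorted chunks
    const c = chunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c.shard;
  }
  throw new Error(`no chunk covers key ${key}`);
}

// Writes for different websites land on different shards, spreading the write load.
for (const websiteId of [42, 2500, 12345]) {
  console.log(websiteId, "->", routeByShardKey(websiteId));
}
```

Range partitioning keeps neighbouring keys together, which helps range queries; a monotonically increasing key would, however, send all new writes to the last chunk, so the shard key deserves as much planning as the schema.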
22. BuzzNumbers NoSQL Presentation
- RDBMS schema
- Mongo JSON document store
  - Line items with rich data as nested arrays
  - Use JavaScript or MapReduce to query
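A minimal sketch of querying nested line items with MapReduce, emulated in memory so it runs under Node without a database. The sample documents are hypothetical, shaped like a daily page-visit document with a nested `PageVisits` array. In the Mongo shell, the map function would instead read the document as `this` and call a global `emit`, and both functions would be passed to `db.collection.mapReduce`.

```javascript
// Hypothetical daily documents with line items stored as a nested array.
const docs = [
  { WebsiteID: 12345, DateSummary: "2011-09-21",
    PageVisits: [{ UserID: 1, PageName: "Home" }, { UserID: 2, PageName: "About" }] },
  { WebsiteID: 12345, DateSummary: "2011-09-22",
    PageVisits: [{ UserID: 1, PageName: "Home" }, { UserID: 3, PageName: "Home" },
                 { UserID: 4, PageName: "Contact" }] },
];

// Map phase: emit (PageName, 1) for every element of the nested PageVisits array.
function mapVisits(doc, emit) {
  for (const visit of doc.PageVisits) emit(visit.PageName, 1);
}

// Reduce phase: combine all values emitted for one key.
function reduceVisits(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Minimal driver: group emitted values by key, then reduce each group.
function mapReduce(documents, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  documents.forEach((doc) => mapFn(doc, emit));
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(key, values);
  return result;
}

const visitsPerPage = mapReduce(docs, mapVisits, reduceVisits);
console.log(visitsPerPage); // { Home: 3, About: 1, Contact: 1 }
```

A plain sum is a safe reducer because MongoDB may re-reduce partial results for the same key, so reduce must return a value of the same shape as the ones it receives.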
23. Basic SQL vs Mongo Syntax
- SELECT * FROM Clients -> db.clients.find()
- SELECT * FROM Clients WHERE ClientID = 1 -> db.clients.find({"ClientID": 1})
- INSERT INTO Clients (ClientID, Name) VALUES (1, 'ACME') -> db.clients.insert({"ClientID": 1, "Name": "ACME"})
- CREATE TABLE / ALTER TABLE -> just start inserting: db.clients.insert({JSON HERE})
- CREATE INDEX -> db.clients.ensureIndex({"ClientID": 1, "Name": 1})
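The first two mappings rest on a simple idea: a query is a JSON document, and an empty document matches everything. The in-memory sketch below illustrates only top-level equality matching, the counterpart of `WHERE column = value`; the `find` and `matchesEquality` helpers and the "Globex" client are hypothetical, and real MongoDB filters also support operators such as `$gt` and `$in`, dotted paths and array-element matching.

```javascript
// Simplified equality matching, mirroring SELECT ... WHERE col = value.
// Handles only top-level scalar fields (hypothetical helper, not the MongoDB engine).
function matchesEquality(doc, filter) {
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

// An empty filter {} matches every document, like SELECT * with no WHERE clause.
function find(collection, filter = {}) {
  return collection.filter((doc) => matchesEquality(doc, filter));
}

const clients = [
  { ClientID: 1, Name: "ACME" },
  { ClientID: 2, Name: "Globex" }, // hypothetical second client
];

console.log(find(clients).length); // 2
console.log(find(clients, { ClientID: 1 })); // [ { ClientID: 1, Name: 'ACME' } ]
```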
26. Infinite Scale with .NET
- Use .NET for rapid product development
  - Web applications (IIS, ASP.NET, user databases)
  - Server applications (scraping, apps, services, data)
  - Scheduled tasks / backend jobs
- Use open source for infinite scale on Linux
  - MongoDB for big data storage
  - Solr (distributed Lucene) for full-text indexing
  - .NET drivers available for Mongo/Solr
27. Infinite Scale with .NET
- Cloud hosting for low-cost scale
  - Rackspace Cloud ($200 per month per 4 GB RAM server)
  - Windows and Ubuntu: image/clone/API support
  - Zabbix monitoring: notify when near capacity
  - Amazon/Heroku/dotCloud alternatives
- Tips to deliver fantastic performance at scale
  - Indexes MUST fit in RAM (disk reads are slow)
  - SSDs are worth the extra price over hard disks
  - 4 GB RAM / 160 GB disk seems to be the optimum price/performance per node in a distributed system
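"Indexes must fit in RAM" turns into a quick sizing calculation. The sketch below assumes one index entry per document per index (no multikey indexes), an even spread of index data across shards, and ignores replicas; the bytes-per-entry figure and the share of RAM reserved for indexes are illustrative guesses. Measure real index sizes (for example with `db.collection.totalIndexSize()` in the Mongo shell) before sizing a cluster.

```javascript
// Back-of-envelope estimate: how many nodes are needed so indexes stay in RAM?
// All inputs are assumptions supplied by the caller.
function nodesNeededForIndexes({ documents, indexesPerDoc, bytesPerIndexEntry,
                                 ramPerNodeBytes, ramFractionForIndexes }) {
  const totalIndexBytes = documents * indexesPerDoc * bytesPerIndexEntry;
  const usablePerNode = ramPerNodeBytes * ramFractionForIndexes;
  return {
    totalIndexGiB: totalIndexBytes / 2 ** 30,
    nodes: Math.ceil(totalIndexBytes / usablePerNode), // round up: partial nodes don't exist
  };
}

const estimate = nodesNeededForIndexes({
  documents: 1e9,                // "big data" scale: 1BN records
  indexesPerDoc: 2,              // assumed
  bytesPerIndexEntry: 50,        // assumed average, including overhead
  ramPerNodeBytes: 4 * 2 ** 30,  // 4 GB RAM node
  ramFractionForIndexes: 0.5,    // assumed: half the RAM left for OS and data
});
console.log(estimate.totalIndexGiB.toFixed(1), estimate.nodes); // 93.1 47
```

Under these assumptions roughly 93 GiB of index data needs about 47 of the 4 GB nodes, which is why the per-node RAM figure drives the cost of scaling out.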
29. Why we stay with .NET
- Visual Studio: best IDE!!!
- SQL Server: great database for most data
- Proven tech stack (low corporate risk)
- Lots of support (MSFT and consultants)
- Large online community with code samples
- Many open source libraries
- ASP.NET MVC Razor is RAD
- Non-complex sysadmin for Windows servers
- Drivers/integration available for most OSS projects
- Lots of Agile/Scrum/TDD/CI/project management tools
- Lots of smart .NET web developers & engineers
31. Lessons Learned
- "Big data" is not 100M records, but 1BN+
- Don't scale until you need to (premature optimisation costs big time)
- A SQL RDBMS solves most problems, but scale-up costs are prohibitive for startups, so plan in advance for when you might need to switch
- Mixing SQL for small data and NoSQL for big data delivers both ease/speed of development and performance
- Mongo/Solr works well to solve specific performance problems
- Not all problems are equal: optimise each solution per performance problem
- Don't go NoSQL unless you absolutely need to
  - Very early technology with lots of learning overhead, risks and production issues
  - Skilled .NET/Mongo/Solr engineers are very hard to find
- If client/data segmentation is possible, multiple SQL instances can deliver the required scale
- Ensure indexes fit in memory
- Spend time planning your schema in advance, based on query requirements
33. Thanks for your time
- Speak with one of the Buzz team tonight
- Join our team? We're hiring!
  - Web developers
  - Software engineers
  - UX / web designers
- Immediate and future roles… talk to us!
Speaker notes
{
  "WebsiteID": 12345,
  "DomainName": "buzznumbershq.com",
  "DateSummary": "2011-09-22",
  "UserIDSummary": [1, 2, 3, 4, 5, 6, 7, 8],
  "PageVisitSummary": {
    "Home": {"VisitCount": 20000, "Uniques": 55},
    "About": {"VisitCount": 1667, "Uniques": 44},
    "Products": {"VisitCount": 1223, "Uniques": 33},
    "Contact": {"VisitCount": 50, "Uniques": 22}
  },
  "PageVisits": [
    {"UserID": 1, "PageName": "Home"},
    {"UserID": 2, "PageName": "About"},
    {"UserID": 3, "PageName": "Products"},
    {"UserID": 4, "PageName": "Contact"},
    ...
  ]
}

- Proven tech stack (low risk)
- Lots of smart web developers/engineers
- Visual Studio: best IDE by miles
- Lots of support (MSFT and consultants)
- Large online community with code samples
- Many open source libraries
- ASP.NET MVC Razor is RAD
- Low levels of sysadmin
- Drivers/integration available for most OSS
- Lots of Agile/Scrum/TDD/CI/project management tools