From SQL Server to MongoDB
Ryan Hoffman, Senior Software Architect
@tekmaven
http://architectryan.com
TNTP + TeacherTrack


          • TNTP is a national nonprofit committed to ending the
            injustice of educational inequality. Founded by
            teachers in 1997, TNTP works with schools, districts
            and states to provide excellent teachers to the students
            who need them most and advance policies and
            practices that ensure effective teaching in every
            classroom.
          • TeacherTrack is a web-based applicant tracking and
            teacher evaluation system. TNTP recruits teachers for
            districts nationwide, including in New Orleans,
            Philadelphia, and New York City, with TeacherTrack.



© TNTP 2012                                                        2
TeacherTrack Technology


          • .NET 4.0
             o ASP.NET Web Forms
             o ASP.NET MVC
             o WCF
             o WF4
          • NHibernate ORM for SQL Server
          • MongoDB .NET Driver
          • NServiceBus
          • Lucene.NET
          • Much, much more…



Survey Templates


          • TeacherTrack uses a flexible data structure called a
            Survey to store a majority of data. A survey works
            very similarly to the conceptual model of a
            SurveyMonkey survey.
          • A Survey Template is a “master” survey from which
            blank survey instances are created. A Survey
            Template consists of some header data (a string key, an
            ID, and the site it is for) and an array of
            questions.
          • Each question contains the question text, as well as
            properties that govern how the question is rendered
            (for example, whether it is a text box or a drop-down).


Surveys


          • A blank survey is instantiated from the survey
            template. It contains header data that associates the
            survey with a user, and an array of responses.
          • Each response contains the entire set of data from the
            question.
             o If the original survey template is changed, we will
               always be able to load the original questions the
               survey was filled out with.
             o It also allows for rendering a survey without
               needing to load a template.
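
The copy-on-instantiate idea above can be sketched as follows. This is a minimal illustration, not the TeacherTrack code: `Question`, `SurveyTemplate`, `SurveyFactory`, and the string-typed `ControlType` are hypothetical names, and the real classes carry more fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down types for illustration only.
public class Question
{
    public string Text { get; set; }
    public string ControlType { get; set; } // e.g. "TextBox" or "DropDown"
}

public class SurveyTemplate
{
    public string Key { get; set; }
    public List<Question> Questions { get; set; }
}

public class Response
{
    public Guid Id { get; set; }
    public string Value { get; set; }               // stays null until the user answers
    public string QuestionText { get; set; }        // copied from the template
    public string QuestionControlType { get; set; } // copied from the template
}

public class Survey
{
    public Guid Id { get; set; }
    public Guid AccountId { get; set; }
    public List<Response> Responses { get; set; }
}

public static class SurveyFactory
{
    // Copies every question into its own response so the survey is
    // self-contained and unaffected by later template edits.
    public static Survey Instantiate(SurveyTemplate template, Guid accountId)
    {
        return new Survey
        {
            Id = Guid.NewGuid(),
            AccountId = accountId,
            Responses = template.Questions.Select(q => new Response
            {
                Id = Guid.NewGuid(),
                QuestionText = q.Text,
                QuestionControlType = q.ControlType
            }).ToList()
        };
    }
}
```

Because each response holds its own copy of the question data, editing the template afterwards leaves existing surveys untouched, and a survey can be rendered without loading its template.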




TeacherTrack Survey Demo
Storing Surveys and Survey Object



    One table for Surveys and another for Responses.

    • 1 row in the Survey table.
    • 1 row per response in Response table.

    A survey with 20 responses would be stored in 21 rows.

    class Survey {
      Guid Id { get; set; }
      Guid AccountId { get; set; }
      string Title { get; set; }
      List<Response> Responses { get; set; }
    }

    class Response {
      Guid Id { get; set; }
      string Value { get; set; }
      string QuestionText { get; set; }
      string QuestionTitle { get; set; }
      ElementTypes QuestionElementType { get; set; }
      ControlTypes QuestionControlType { get; set; }
      string Watermark { get; set; }
    }
    //Additional fields omitted for brevity




SQL Server Challenges



      • Performance!
         • Joining between the two tables was slow! We had >1 million
            surveys and >16 million responses before converting to
            MongoDB.
         • Actual query time in the application could easily be >200ms
            for one survey.
         • Some existing pages in the application could
            easily need to load over 20 surveys. 10-second page load
            times are not fun to work with.
      • Iterative Development
         • When ALTER TABLE statements take 20 minutes to run, deployment
            scripts that were not designed with this in mind break and
            time out.



Why TNTP selected MongoDB
          • Performance, durability, and scaling.
             o Document databases allow for a richer schema.
             o Replica sets are elegant, easy to set up, and reliable.
             o Auto-sharding is a great future option to scale.
          • 10gen rocks.
             o Training. Switching from an RDBMS to a document database
               is a big paradigm shift. 10gen’s Developer and Administrator
               training did a great job giving key team members the skills to
               make this possible.
             o Great support options. TNTP uses MMS to get insight into its
               MongoDB servers, and we love that 10gen can proactively
               reach out to us based on server telemetry.
             o Great people. From day one at training, I met many 10gen
               employees, including people responsible for the Windows
               version. This type of access and interaction cannot be
               overstated.

Survey Documents in MongoDB

       • Surveys are a great match for MongoDB.
       • The number of responses never changes after a survey is instantiated,
         making it an ideal candidate for being an embedded array in the survey
         document.
       • <10ms query times!

              {
                  "_id" : BinData(3,"vD+ifVfvS0qlk5vN8OPQOQ=="),
                  "AccountId" : BinData(3,"B1giiULLskSEG7rYmdqBUA=="),
                  "Title" : "Registering",
                  "Responses" : [
                    {
                      "_id" : BinData(3,"UvqabcPS1UGZipKODPKgGA=="),
                      "Value" : "Ryan",
                      "QuestionText" : "What is your first name?",
                      "QuestionElementType" : 1,
                      "QuestionControlType" : 1
                    }
                  ]
              }


Conversion




              Query SQL Server  ->  Convert to BSON  ->  Insert into Mongo



Conversion - Multithreading


          • The original proof of concept was single-threaded. It took over
            two days to convert the data. When we refactored to a
            multithreaded model, conversion took less than 20 hours.
          • Each of the three parts of the conversion runs in its own thread.
          • A queue between each pair of threads allows the threads to pass
            data along.
             o The query thread adds objects to a conversion queue for the
               conversion thread.
             o Similarly, the conversion thread adds converted objects to the
               insert queue for the insert thread.
             o System.Collections.Concurrent.BlockingCollection<T>
               made this very easy.
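
A three-stage pipeline like the one described above might look roughly like this. It is a sketch under assumptions, not the actual conversion tool: `ConversionPipeline`, `Run`, and `queueCapacity` are made-up names, and the three delegates stand in for the SQL Server reader, the BSON mapping, and the MongoDB insert.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConversionPipeline
{
    // Runs query -> convert -> insert, one long-running task per stage,
    // with a bounded BlockingCollection<T> between neighbouring stages.
    public static void Run<TRow, TDoc>(
        IEnumerable<TRow> query,     // stand-in for the SQL Server reader
        Func<TRow, TDoc> convert,    // stand-in for the row-to-BSON mapping
        Action<TDoc> insert,         // stand-in for the MongoDB insert
        int queueCapacity = 1000)    // assumed bound; keeps memory use flat
    {
        using (var toConvert = new BlockingCollection<TRow>(queueCapacity))
        using (var toInsert = new BlockingCollection<TDoc>(queueCapacity))
        {
            var queryTask = Task.Factory.StartNew(() =>
            {
                try { foreach (var row in query) toConvert.Add(row); }
                finally { toConvert.CompleteAdding(); } // signals "no more rows"
            }, TaskCreationOptions.LongRunning);

            var convertTask = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var row in toConvert.GetConsumingEnumerable())
                        toInsert.Add(convert(row));
                }
                finally { toInsert.CompleteAdding(); } // signals "no more documents"
            }, TaskCreationOptions.LongRunning);

            var insertTask = Task.Factory.StartNew(() =>
            {
                foreach (var doc in toInsert.GetConsumingEnumerable())
                    insert(doc);
            }, TaskCreationOptions.LongRunning);

            Task.WaitAll(queryTask, convertTask, insertTask);
        }
    }
}
```

The bounded queues make a fast stage block instead of buffering millions of objects ahead of a slow one. Error handling is deliberately left out: if a downstream stage fails, an upstream stage can block on a full queue, so a production version would also need cancellation.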




Conversion – Auto Batching


          • Returning millions of rows in one query is clearly not
            going to work well. We need to batch the source
            queries and iterate until 0 rows are returned in the
            batch.
          • Querying batches out of SQL Server was very
            inconsistent. With no other load on the server,
            a batch could take anywhere from 45 seconds to over
            10 minutes.
          • Instead of making each batch a fixed number of rows,
            we had logic that timed how long the previous batch
            took. Based on trial and error, a 1-minute batch time
            became the target. The code would adjust the number
            of rows based on the previous query’s number of
            rows and the query’s time.
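
One plausible way to express the batch-size adjustment is to scale the previous row count by the ratio of target time to actual time. The slides do not give the exact formula, so the proportional rule, the clamping bounds, and the names `BatchSizer` and `NextBatchSize` below are all assumptions.

```csharp
using System;

public static class BatchSizer
{
    // Scales the previous row count by (target time / actual time) and
    // clamps the result to [minRows, maxRows].
    public static int NextBatchSize(int previousRows, TimeSpan previousElapsed,
                                    TimeSpan target, int minRows, int maxRows)
    {
        if (previousRows <= 0 || previousElapsed <= TimeSpan.Zero)
            return minRows; // nothing to extrapolate from

        double ratio = target.TotalSeconds / previousElapsed.TotalSeconds;
        double next = Math.Round(previousRows * ratio);
        return (int)Math.Max(minRows, Math.Min(maxRows, next));
    }
}
```

With a 1-minute target, a batch of 10,000 rows that took 2 minutes would be followed by a batch of 5,000 rows. The clamp keeps one unusually fast or slow query from swinging the next batch to an extreme size.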

Conversion - Incremental


        • Converting the data is still a time-consuming process. When we
          deploy code that uses MongoDB, all the data needs to be
          converted. Deployments generally take less than an hour. The “20
          hours of downtime” discussion is not a great conversation to have
          with stakeholders.
        • The answer: pre-convert the data! When we deploy, convert only
          the last 24 hours of data, which may only take minutes.
        • Surveys have a ModifiedOn date field. Using this is the key to
          converting! We did a lot of work and testing to make sure this
          field was always updated when a change was made.
        • Surveys are never deleted. A delete flips a deleted flag on the row.
          This allowed us to not worry about incrementally tracking deletes.
        • A command line switch allowed us to specify the start date of the
          conversion.
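
The start-date switch and the ModifiedOn filter could be wired together along these lines. The switch name `--since`, the class `IncrementalConversion`, and the shape of the SQL are illustrative assumptions. Only the ModifiedOn field and the Survey table come from the slides.

```csharp
using System;
using System.Globalization;

public static class IncrementalConversion
{
    // Reads the value following a hypothetical "--since" switch.
    // Returns null when the switch is absent, meaning "convert everything".
    public static DateTime? ParseSince(string[] args)
    {
        for (int i = 0; i < args.Length - 1; i++)
        {
            if (args[i] == "--since")
            {
                return DateTime.Parse(args[i + 1], CultureInfo.InvariantCulture,
                    DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
            }
        }
        return null;
    }

    // Builds the source query text. The @since parameter would be bound
    // by the caller, and the SELECT list is a placeholder.
    public static string BuildQuery(DateTime? since)
    {
        const string select = "SELECT * FROM Survey";
        return since.HasValue ? select + " WHERE ModifiedOn >= @since" : select;
    }
}
```

A full pre-conversion would run without the switch. The deployment-day run would pass a start date about 24 hours back, so only recently modified surveys are converted again.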


Deployment Lessons


          • Practice makes perfect. We took stories over 3 sprints
            (each sprint is 3 weeks) to prepare for the conversion.
          • Always explicitly set your oplog size! The defaults
            created a 40 GB oplog on the production servers. Since
            MongoDB uses memory-mapped files, that 40 GB oplog
            was loaded into RAM. The servers have 48 GB of RAM.
            We resized to a more sane 3 GB.
          • If you have profiling turned on, you can’t fsyncLock
            the server. We didn’t know this, and it immediately
            broke the backup scripts the first night. I filed a
            ticket with 10gen for this, and the documentation now
            reflects it.


Using MongoDB as a .NET Developer


          • Since most users run MongoDB on Linux, I was
            concerned about reliability and performance running
            on Windows. I’m happy to say that MongoDB works
            very well on Windows and we’ve had no issues.
          • The MongoDB .NET Driver is excellent. It allows raw
            BsonDocument access, or can map documents to your
            objects. It has very good LINQ support, and is
            constantly improving its API.
          • Guids are the primary key for most structures.
            Working with them is very inconvenient in the shell.
            In fact, without the “UUID” helper from the C#
            driver’s git repo, it would be nearly impossible to use
            the shell to work with Guids.
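
To show why Guids are awkward in the shell, here is a small sketch that converts between a .NET Guid and the base64 payload printed inside BinData(3, "..."). It assumes the documents were written with the C# driver's legacy Guid representation, which to my knowledge stores the bytes in Guid.ToByteArray() order. `LegacyGuid` is a made-up helper name, not part of the driver.

```csharp
using System;

public static class LegacyGuid
{
    // Guid -> base64 payload, using .NET's native Guid.ToByteArray() byte order.
    public static string ToBase64(Guid id)
    {
        return Convert.ToBase64String(id.ToByteArray());
    }

    // Base64 payload (the string inside BinData(3, "...")) -> Guid.
    public static Guid FromBase64(string payload)
    {
        return new Guid(Convert.FromBase64String(payload));
    }
}
```

Guid.ToByteArray() reorders the first three groups of the Guid, so the bytes behind the base64 string do not follow the Guid's printed hex order. That is part of why a helper is needed to read or build these values by hand.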

Wrap Up



          • MongoDB was a game changer for TeacherTrack.
            Think. In. Documents.
          • 10gen is a great company to work with. We
            depend on MongoDB, and knowing that the
            people behind MongoDB were available to us
            was a huge plus.
          • Pre-conversion and incremental conversion are
            the keys to minimizing deployment time when
            working with a large set of data.
          • Most importantly, this was all made possible
            because of very talented team members at TNTP.
            You guys rock!

Questions
Slides will be made available on my blog, located at
http://architectryan.com/
