- 1. From SQL Server to MongoDB
Ryan Hoffman, Senior Software Architect
@tekmaven
http://architectryan.com
- 2. TNTP + TeacherTrack
• TNTP is a national nonprofit committed to ending the
injustice of educational inequality. Founded by
teachers in 1997, TNTP works with schools, districts
and states to provide excellent teachers to the students
who need them most and advance policies and
practices that ensure effective teaching in every
classroom.
• TeacherTrack is a web-based applicant tracking and
teacher evaluation system. TNTP recruits teachers for
districts nationwide, including in New Orleans,
Philadelphia, and New York City, with TeacherTrack.
© TNTP 2012 2
- 3. TeacherTrack Technology
• .NET 4.0
o ASP.NET Web Forms
o ASP.NET MVC
o WCF
o WF4
• NHibernate ORM for SQL Server
• MongoDB .NET Driver
• NServiceBus
• Lucene.NET
• Much, much more…
- 4. Survey Templates
• TeacherTrack uses a flexible data structure called a
Survey to store a majority of data. A survey works
very similarly to the conceptual model of a
SurveyMonkey survey.
• A Survey Template is a “master” survey from which
blank survey instances are created. A Survey
Template consists of some header data (a string key, an
ID, and the site it is for) and an array of
questions.
• Each question contains the question text, as well as
properties that govern how the question is rendered
(for example if it is a text box or a drop down).
- 5. Surveys
• A blank survey is instantiated from the survey
template. It contains header data that associates that
survey to a user, and contains an array of responses.
• Each response contains the entire set of data from the
question.
o If the original survey template is changed, we will
always be able to load the original questions the
survey was filled out with.
o It also allows for rendering a survey without
needing to load a template.
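The template-to-instance step described above can be sketched as follows. This is a minimal sketch using anonymous types; the member names are illustrative stand-ins, not TeacherTrack's actual classes:

```csharp
using System;
using System.Linq;

// Illustrative template; the Key/Title/Questions shape is an assumption.
var template = new
{
    Key = "registering",
    Title = "Registering",
    Questions = new[]
    {
        new { Text = "What is your first name?", ControlType = "TextBox" },
    },
};

// Each blank response snapshots its question's text and rendering info,
// so later edits to the template never change an already-created survey.
var responses = template.Questions
    .Select(q => new
    {
        Id = Guid.NewGuid(),
        Value = "",
        QuestionText = q.Text,
        QuestionControlType = q.ControlType,
    })
    .ToList();

Console.WriteLine(responses.Count);   // one blank response per template question
```

Because the response carries its question along, rendering a filled-out survey needs no join back to the template.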
- 7. Storing Surveys and Survey Object
One table for Surveys and another for Responses:
• 1 row in the Survey table.
• 1 row per response in the Response table.
A survey with 20 responses would be stored in 21 rows.

class Survey {
    Guid Id { get; set; }
    Guid AccountId { get; set; }
    string Title { get; set; }
    List<Response> Responses { get; set; }
}

class Response {
    Guid Id { get; set; }
    string Value { get; set; }
    string QuestionText { get; set; }
    string QuestionTitle { get; set; }
    ElementTypes QuestionElementType { get; set; }
    ControlTypes QuestionControlType { get; set; }
    string Watermark { get; set; }
    // Additional fields omitted for brevity
}
- 9. SQL Server Challenges
• Performance!
• Joining between the two tables was slow! We had >1 million
surveys and >16 million responses before converting to
MongoDB.
• Actual query time in the application could easily be >200ms
for one survey.
• There were existing pages in the application where we could
easily need to load over 20 surveys. 10 second page load
times are not fun to work with.
• Iterative Development
• When ALTER TABLE statements take 20 minutes to run,
deployment scripts that were not designed with this in mind
break and time out.
- 11. Why TNTP selected MongoDB
• Performance, durability, and scaling.
o Document databases allow for a richer schema.
o Replica sets are elegant, easy to set up, and reliable.
o Auto-sharding is a great future option to scale.
• 10gen rocks.
o Training. Switching from an RDBMS to a document database
is a big paradigm shift. 10gen’s Developer and Administrator
training did a great job giving key team members the skills to
make this possible.
o Great support options. TNTP uses MMS to get insight into their
MongoDB servers, and we love that 10gen can proactively
reach out to us based on server telemetry.
o Great people. From day one at training, I met many 10gen
employees, including people responsible for the Windows
version. This type of access and interaction cannot be
overstated.
- 12. Survey Documents in MongoDB
• Surveys are a great match for MongoDB.
• The number of responses never changes after a survey is instantiated,
making it an ideal candidate for being an embedded array in the survey
document.
• <10ms query times!
{
"_id" : BinData(3,"vD+ifVfvS0qlk5vN8OPQOQ=="),
"AccountId" : BinData(3,"B1giiULLskSEG7rYmdqBUA=="),
"Title" : "Registering",
"Responses" : [
{
"_id" : BinData(3,"UvqabcPS1UGZipKODPKgGA=="),
"Value" : "Ryan",
"QuestionText" : "What is your first name?",
"QuestionElementType" : 1,
"QuestionControlType" : 1
}
]
}
- 13. Conversion
Query SQL Server → Convert to BSON → Insert into Mongo
- 14. Conversion - Multithreading
• The original proof of concept was single threaded. It took over
two days to convert the data. When we refactored to a
multithreaded model, conversion took less than 20 hours.
• Each of the three parts of the conversion runs in its own thread.
• A queue between each thread allows the threads to pass data
along.
The query thread adds objects to a conversion queue for the
conversion thread.
Similarly, the conversion thread adds converted objects to the
insert queue for the insert thread.
System.Collections.Concurrent.BlockingCollection<T>
made this very easy.
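A minimal sketch of that three-stage hand-off, with toy stand-ins for each stage (strings for SQL rows, ToUpper() for BSON conversion, a List for the Mongo insert) and Tasks instead of raw threads, for brevity:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Two BlockingCollection queues connect the three pipeline stages.
var conversionQueue = new BlockingCollection<string>(boundedCapacity: 100);
var insertQueue = new BlockingCollection<string>(boundedCapacity: 100);
var inserted = new List<string>();

var query = Task.Run(() =>
{
    foreach (var row in new[] { "survey-1", "survey-2", "survey-3" })
        conversionQueue.Add(row);           // stand-in for a SQL batch read
    conversionQueue.CompleteAdding();       // tells the next stage no more is coming
});

var convert = Task.Run(() =>
{
    foreach (var row in conversionQueue.GetConsumingEnumerable())
        insertQueue.Add(row.ToUpper());     // stand-in for converting a row to BSON
    insertQueue.CompleteAdding();
});

var insert = Task.Run(() =>
{
    foreach (var doc in insertQueue.GetConsumingEnumerable())
        inserted.Add(doc);                  // stand-in for a Mongo insert
});

Task.WaitAll(query, convert, insert);
Console.WriteLine(string.Join(",", inserted)); // SURVEY-1,SURVEY-2,SURVEY-3
```

The bounded capacity gives back-pressure for free: a fast query stage simply blocks on Add until the slower stages catch up.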
- 15. Conversion – Auto Batching
• Returning millions of rows in one query is clearly not
going to work well. We need to batch the source
queries and iterate until 0 rows are returned in the
batch.
• Querying batches out of SQL Server was very
inconsistent. With no other load on the server, batches
would take anywhere from 45 seconds to over 10 minutes.
• Instead of making each batch a fixed number of rows,
we had logic that timed how long the previous batch
took. Based on trial and error, a 1 minute batch time
became the target. The code adjusted the next batch's
row count based on the previous query's row count and
elapsed time.
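That timing feedback loop might look like the sketch below. The function name, bounds, and linear scaling rule are assumptions, not the actual conversion code:

```csharp
using System;

// Hypothetical batch sizer: scale the previous row count so the next
// batch targets roughly one minute of query time, within sane bounds.
static int NextBatchSize(int previousRows, double previousSeconds,
                         double targetSeconds = 60, int min = 100, int max = 50_000)
{
    if (previousSeconds <= 0) return previousRows;   // no timing data yet, keep the size
    var scaled = (int)(previousRows * (targetSeconds / previousSeconds));
    return Math.Clamp(scaled, min, max);             // never shrink or grow without limit
}

Console.WriteLine(NextBatchSize(10_000, 120));  // 5000: last batch took 2x the target, so halve
Console.WriteLine(NextBatchSize(1_000, 30));    // 2000: last batch ran 2x too fast, so double
```

Clamping matters with an inconsistent source: one pathologically slow batch should not collapse the batch size to a handful of rows.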
- 16. Conversion - Incremental
• Converting the data is still a time consuming process. When we
deploy code that uses MongoDB, all the data needs to be
converted. Deployments generally take less than an hour. The “20
hours of downtime” discussion is not a great conversation to have
with stakeholders.
• The answer: pre-convert the data! When we deploy, convert only
the last 24 hours of data, which may only take minutes.
• Surveys have a ModifiedOn date field. This field is the key to
incremental conversion! We did a lot of work and testing to make
sure it was always updated when a change was made.
• Surveys are never deleted. A delete flips a deleted flag on the row.
This allowed us to not worry about incrementally tracking deletes.
• A command line switch allowed us to specify the start date of the
conversion.
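The incremental pass boils down to a WHERE ModifiedOn >= @cutoff filter on the source query. A sketch over in-memory stand-in rows (the ModifiedOn field is from the deck; everything else here is illustrative):

```csharp
using System;
using System.Linq;

// Start date for the conversion, e.g. from the command line switch.
var cutoff = DateTime.UtcNow.AddHours(-24);

var rows = new[]
{
    (Id: 1, ModifiedOn: DateTime.UtcNow.AddHours(-1)),   // changed today -> re-convert
    (Id: 2, ModifiedOn: DateTime.UtcNow.AddDays(-30)),   // already converted -> skip
    (Id: 3, ModifiedOn: DateTime.UtcNow.AddHours(-2)),   // changed today -> re-convert
};

// Equivalent in spirit to: SELECT ... FROM Survey WHERE ModifiedOn >= @cutoff
var toConvert = rows.Where(r => r.ModifiedOn >= cutoff).Select(r => r.Id).ToList();
Console.WriteLine(string.Join(",", toConvert));   // 1,3
```

Because deletes only flip a flag (and bump ModifiedOn like any other change), the same filter picks them up too; no separate delete tracking is needed.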
- 17. Deployment Lessons
• Practice makes perfect. We took stories over 3 sprints
(each sprint is 3 weeks) to prepare for the conversion.
• Always explicitly set your oplog size! The defaults
created a 40 GB oplog on the production servers. Since
MongoDB uses memory-mapped files, that 40 GB oplog
was loaded into RAM. The servers have 48 GB of RAM.
We resized to a more sane 3 GB.
• If you have profiling turned on, you can’t fsyncLock
the server. We didn’t know this, and it immediately
broke the backup scripts the first night. I filed a
ticket with 10gen for this, and the documentation now
reflects it.
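For example, the oplog size can be set explicitly when starting mongod (the --oplogSize option takes megabytes); the replica-set name below is a placeholder:

```shell
# Cap the oplog at ~3 GB instead of letting mongod size it from free disk.
mongod --replSet rs0 --oplogSize 3072
```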
- 18. Using MongoDB as a .NET Developer
• Since most users run MongoDB on Linux, I was
concerned about reliability and performance running
on Windows. I’m happy to say that MongoDB works
very well on Windows and we’ve had no issues.
• The MongoDB .NET Driver is excellent. It allows raw
BsonDocument access, or can map documents to your
objects. It has very good LINQ support, and is
constantly improving its API.
• Guids are the primary key for most structures.
Working with them is very inconvenient in the shell.
In fact, without the “UUID” helper from the C#
driver’s git repo, it would be nearly impossible to use
the shell to work with Guids.
- 19. Wrap Up
• MongoDB was a game changer for TeacherTrack.
Think. In. Documents.
• 10gen is a great company to work with. We are
depending on MongoDB, and knowing that the
people behind MongoDB were available to us
was a huge plus.
• Pre-conversion and incremental conversion are
the keys to minimizing deployment time when
working with a large set of data.
• Most importantly, this was all made possible
because of very talented team members at TNTP.
You guys rock!