2. Jeff Lemmerman
B.S. Physics, B.S. Astrophysics
University of Minnesota
MS Software Engineering
University of Minnesota
Sr. Software Engineer – Medtronic (2006-Present)
3. Matt Chimento
B.S. Computer Engineering
Kettering University
Master of Business Administration (2014)
University of Minnesota Carlson School of Management
Prin. Test Engineer – Medtronic (2006-Present)
4. What are the “handcuffs”?
Every application requires verification and validation
+
5. What if you make a change?
Every change requires re-validation
+
6. Are all changes equal?
Critical changes may even require FDA approval
+
7. What does it all mean?
High cost of collecting, curating, and maintaining data
=
8. Why MongoDB?
Strong user community
10gen enterprise support
C# driver
Performance
Flexibility, but…
noSQL doesn’t mean no schema
Here’s why:
18. Gaps
Enterprise acceptance of “new” approach
Integration with off-the-shelf reporting and analytics
User interface for managing the database cluster
Developer familiarity with JSON and MongoDB
LabVIEW to JSON
Released to open-source community
21 CFR Part 11 Compliance
Important to design systems which are modular to limit scope of changes
Once a system is validated, large barriers to change
We try to develop using Agile Methodology but succumb to waterfall methodology to release a system
Leads to bad habits:
Still using databases designed a long time ago
Very generic schema designs
Other handcuffs:
Audit tracing, 21 CFR Part 11, and Authentication
Ultimately leads to lots of test and paperwork
What happens when we want to fix a bug or make an update?
Adding a new table to a database = re-validation
More test and more paperwork
What if we find a critical bug or want to change to a new software entirely?
More paperwork and more time
High cost
Risk aversion means direct access to databases is not usually encouraged… loaders, batch processes, API..
We researched other key-value stores, structured text datatypes in SQL, and noSQL databases
We have over 20,000 measurement channels in our test labs alone.
Life-test systems have been collecting data 24/7 for over 30 years
Data acquisition rates exceeding 1 kHz on multiple channels
Fire hose: we are much heavier on writes than reads, and we don't want test systems limited by waiting on file I/O or remote DB calls
Test Engineers: data comes from everything from simple RS232 and files to fully automated systems
Convert some data to summary database
Rest of data gets stored in raw files on server
In long term testing, however, raw data is what matters
Mission: reduce the burden of collecting, curating, and analyzing data and generating knowledge.
Domain specific entities and their relationships (experiments, batteries, test systems, measurements)
Science is becoming increasingly data intensive -> automated data collection, model comparison, prediction
Time series data (sensors)
Systems generating discrete data sources
Introduces reporting and analytics tools needing to know how to find results (file paths, different database schemas…)
Streaming sensor data from 10,000 sensors to SQL DB
400 million rows, partitioning
Could implement a results repository with a data adapter for each data source, but still may not have all info needed to get results. Give me all the results for this component? Look in each result repository and merge results together…all at reporting time!
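The adapter-per-source idea above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the implementation from the talk: the adapter classes, field names, and records are all hypothetical, and in-memory lists stand in for the real file and database sources. It shows why the merge cost lands on every report: each query has to visit every source.

```python
from typing import Dict, List, Protocol


class ResultAdapter(Protocol):
    """Hypothetical interface: look up results for one component."""

    def results_for(self, component_id: str) -> List[Dict]: ...


class FileAdapter:
    """Stand-in for a source that keeps its results in raw files."""

    def __init__(self, rows: List[Dict]) -> None:
        self._rows = rows

    def results_for(self, component_id: str) -> List[Dict]:
        # This source calls the component a "serial" and the value a "reading".
        return [
            {"component": r["serial"], "value": r["reading"], "source": "files"}
            for r in self._rows
            if r["serial"] == component_id
        ]


class SqlAdapter:
    """Stand-in for a source that keeps its results in a SQL database."""

    def __init__(self, rows: List[Dict]) -> None:
        self._rows = rows

    def results_for(self, component_id: str) -> List[Dict]:
        # A different schema for the same kind of information.
        return [
            {"component": r["component_id"], "value": r["result"], "source": "sql"}
            for r in self._rows
            if r["component_id"] == component_id
        ]


def all_results(adapters: List[ResultAdapter], component_id: str) -> List[Dict]:
    """Query every source and merge the answers, all at reporting time."""
    merged: List[Dict] = []
    for adapter in adapters:
        merged.extend(adapter.results_for(component_id))
    return merged
```

Every new data source needs another adapter, and every report pays for querying all of them.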
Reporting and analysis off a single data warehouse
Building data adapters can be done
Design of central DB? Table per result type? What if I want to add a source? New table? Change a result -> schema change.
What about generic columns?
What about text data types JSON/XML in relational model?
Clients just need to be able to make HTTP POST requests
Non-Windows clients, no database drivers
Send results to multiple databases
Change databases
Use of JSON in request body integrates well with Mongo -> if no transform, serialize to BSON
In some ways simpler for us:
No DELETE, No PUT..
ExpiresAfter: only save data for 2 years (?)
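As a rough illustration of how little a client needs, the sketch below builds an HTTP POST with a JSON body using only the Python standard library. The endpoint URL and the field names are assumptions made for the example; the source does not give the real service address or document shape.

```python
import json
import urllib.request

# Hypothetical endpoint; the real service URL is not given in the source.
RESULTS_URL = "http://localhost:8080/results"


def build_result_request(result: dict, url: str = RESULTS_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one result as JSON."""
    body = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names are illustrative only.
request = build_result_request(
    {"experimentNumber": "EXP-001", "serialNumber": "SN-42", "voltage": 3.21}
)
# To actually send it: urllib.request.urlopen(request)
```

No database driver is involved on the client side, so the same request could come from any platform that can speak HTTP.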
Could implement repository methods for common queries:
GetResultsByExperimentNumber()
GetResultsBySerialNumber()
GetResultsByDateTimeRange()
Deserialize mongo documents to “Model” classes with .GetCollection<T>
Use LINQ to implement queries
Notice C# driver support for replica sets
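The repository methods named above could look something like the following. The talk's version is built on the C# driver (.GetCollection<T> and LINQ); this is only a Python analog for illustration, with an in-memory list standing in for the Mongo collection and list comprehensions standing in for LINQ queries. The Result class and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Result:
    """Hypothetical "Model" class that stored documents are deserialized into."""

    experiment_number: str
    serial_number: str
    timestamp: datetime
    value: float


class ResultRepository:
    """Wraps the common queries so callers never touch raw documents."""

    def __init__(self, documents: List[Dict]) -> None:
        # Deserialize each raw document into a typed model object.
        self._results = [Result(**doc) for doc in documents]

    def get_results_by_experiment_number(self, experiment_number: str) -> List[Result]:
        return [r for r in self._results if r.experiment_number == experiment_number]

    def get_results_by_serial_number(self, serial_number: str) -> List[Result]:
        return [r for r in self._results if r.serial_number == serial_number]

    def get_results_by_date_time_range(self, start: datetime, end: datetime) -> List[Result]:
        # Inclusive start, exclusive end.
        return [r for r in self._results if start <= r.timestamp < end]
```

Keeping the queries behind named methods means reporting code depends on the repository, not on how or where the documents are stored.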