Storing the Family Tree with MongoDB
We’re going to talk about
MongoDB Intro & Fundamentals
MongoDB for Genealogy data
Scaling MongoDB for all the generations
The Family Tree
Storing a graph in MongoDB
Steve                  @sp
15+ years building the internet
Father, husband, skateboarder, genealogist at ❤

Chief Solutions Architect @
responsible for drivers, integrations, web & docs
Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic

Well Funded: Sequoia, Union Square, Flybridge
Introduction to MongoDB
A bit of
history
1974
The relational database is created
1979   1994   1995
Computers in 1995
100 MHz Pentium
10BASE-T
16 MB RAM
200 MB HD
Cloud in 1995
(Windows 95 cloud wallpaper)
Cell Phones in 2012
Dual core 1.5 GHz
802.11n (300+ Mbps)
1 GB RAM
64 GB Solid State
MongoDB
Application
Document Oriented
    { author : "steve",
      date : new Date(),
      text : "About MongoDB...",
      tags : ["tech", "database"]}
High Performance
Fully Consistent
Horizontally Scalable
MongoDB philosophy
 Keep functionality when we can (key/value
 stores are great, but we need more)
 Non-relational (no joins) makes scaling
 horizontally practical
 Document data models are good
 Database technology should run anywhere
 virtualized, cloud, metal, etc
Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files
i.e. read-through write-through
memory caching.
Database Landscape
[Chart: Scalability & Performance (vertical axis) against Depth of Functionality (horizontal axis), plotting MemCache, MongoDB and RDBMS]
“MongoDB has the best features of key/value stores, document databases and relational databases in one.”
         John Nunemaker
Relational made normalized
     data look like this
User:      Name, Email Address
Article:   Name, Slug, Publish date, Text
Category:  Name, Url
Tag:       Name, Url
Comment:   Comment, Date, Author
Document databases make
normalized data look like this
User:      Name, Email Address
Article:   Name, Slug, Publish date, Text, Author
    Comment[]:   Comment, Date, Author
    Tag[]:       Value
    Category[]:  Value
But we’ve been using
a relational database
    for 40 years!
How do people store
documents in real life?
Think about a
doctor’s office
There are two ways they
could organize their files
1. Each document type
        in its own drawer
[Figure: filing cabinet with one drawer per document type: MRIs, X-rays, Lab, Invoices, History, Medications, Lab, Forms, plus an Index]
2. Group related records


    Patient 1   Patient 2   Patient 3   ...




    Vendor 1    Vendor 2    Vendor 3
Databases work the same way
Relational: each record type in its own table (Category, Article, User, Tag, Comment)
Document: related records grouped together (Patient 1, Vendor 1; an Article with embedded Comment[], Tag[] and Category[], next to User)
Terminology
 RDBMS                 Mongo
Table, View   ➜   Collection
Row           ➜   Document
Index         ➜   Index
Join          ➜   Embedded Document
Foreign Key   ➜   Reference
Partition     ➜   Shard
Why MongoDB
                   My Top 10 Reasons

10. Great developer experience
 9. Speaks your language
 8. Scale horizontally
 7. Fully consistent data w/atomic operations
 6. Memory caching integrated
 5. Open source
 4. Flexible, rich & structured data format not just K:V
 3. Ludicrously fast (without going plaid)
 2. Simplify infrastructure & application
 1. It’s web scale
MongoDB
Use Cases
CMS / Blog
Needs:
• Business needed modern data store for rapid development and
  scale

Solution:
• Use PHP & MongoDB

Results:
• Real time statistics
• All data, images, etc stored together
  easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth
Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver

Solution:
• Use MongoDB instead of Oracle

Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
Customer Analytics
Problem:
• Deal with massive data volume across all customer sites

Solution:
• Use MongoDB to replace Google Analytics / Omniture options

Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100 million)
• MongoDB for archive (2+ billion)
Results:
• No more alter table statements taking over 2 months to run
• Sharding fixed vertical scale problem
• Very happily looking at other places to use MongoDB
Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
E-commerce
Problem:
• Multi-vertical E-commerce impossible to model (efficiently) in
  RDBMS

Solution:
• Switched from MySQL to MongoDB

Results:
•   Massive simplification of code base
•   Rapidly build, halving time to market (and cost)
•   Eliminated need for external caching system
•   50x+ performance improvement over MySQL
Tons more
   MongoDB casts a wide net

  people keep coming up with
 new and brilliant ways to use it
In Good Company




   and 1000s more
MongoDB
Start with an object
(or array, hash, dict, etc.)

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ]
}
Inserting the record
    Initial Data Load

> db.places.insert(place1)
Querying
{
    name : "10gen HQ",
 address : "134 5th Avenue 3rd Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ]
}

> db.places.findOne({ zip: "10011",
            tags: "awesome" })

> db.places.find({tags: "business" })
Nested Documents
  { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "roger",
    date : "Sat Apr 24 2011 19:47:11",
    text : "About MongoDB...",
    tags : [ "tech", "databases" ],
    comments : [
      {
        author : "Fred",
        date : "Sat Apr 25 2010 20:51:03",
        text : "Best Post Ever!"
      }
    ]
  }
Object ID
> db.places.insert(place1)

object(MongoId)#4 (1) {
  ["$id"]=> string(24) "4e9cc76a4a1817fd21000000"
}

   4e9cc76a4a1817fd21000000
   |------||----||--||----|
     ts  mac pid inc
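
The field layout in the diagram can be checked by slicing the 24-character hex string. This is a minimal sketch in plain JavaScript, assuming the layout shown on the slide (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter). `splitObjectId` is a hypothetical helper written for this illustration, not a driver API.

```javascript
// Hypothetical helper: split a 24-char ObjectId hex string into the
// four fields shown on the slide (ts, mac, pid, inc).
function splitObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
  return {
    seconds: seconds,
    created: new Date(seconds * 1000),
    machine: hex.slice(8, 14),
    pid: hex.slice(14, 18),
    counter: parseInt(hex.slice(18, 24), 16)
  };
}

const parts = splitObjectId("4e9cc76a4a1817fd21000000");
console.log(parts.created.toISOString()); // 2011-10-18T00:25:14.000Z
```

Because the timestamp leads the id, sorting by _id roughly follows insertion time.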
A More Complex Document

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ],
 latlong : [40.0,72.0],
    tips : [ { user : "ryan",
               time : "6/26/2011",
                tip : "stop by for office hours"},
             {.....}]
}
Indexing & Adv Querying
// Index nested documents
db.posts.ensureIndex({ "comments.author":1 })
db.posts.find({'comments.author':'Fred'})

// Regular Expressions
db.posts.find({'comments.author': /^Fr/})

// Index on tags (multi-key index)
db.posts.ensureIndex({ tags: 1})
db.posts.find( { tags: 'tech' } )

// geospatial index
db.posts.ensureIndex({ "author.location": "2d" })
db.posts.find({"author.location":{$near:[22,42]}})
Updating
> db.places.update(
  {name : "10gen HQ"},
  { $push :
       { tips :
           { user : "nosh",
             time : "7/14/2011",
              tip : "Office hours are great!"
           }
       }
  }
)

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ],
 latlong : [40.0,72.0],
    tips : [ { user : "ryan",
               time : "6/26/2011",
                tip : "stop by for office hours on Wednesdays from 4-6pm"},
             { user : "nosh",
               time : "7/14/2011",
                tip : "Office hours are great!"}
           ]
}
Atomic Operations
$set   $unset       $rename

   $push     $pop     $pull

 $addToSet          $inc
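
To make the difference between some of these operators concrete, here is a toy in-memory sketch of what `$set`, `$push`, `$addToSet` and `$pull` do to a single document. `applyUpdate` is a hypothetical function written for this illustration; it handles only top-level fields and primitive values, and it says nothing about how the server actually applies updates.

```javascript
// Toy, in-memory illustration of a few update operators on one document.
// Top-level fields and primitive values only; not the server's implementation.
function applyUpdate(doc, update) {
  const out = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, value] of Object.entries(update.$set || {})) {
    out[field] = value;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    out[field] = (out[field] || []).concat([value]);
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = out[field] || [];
    out[field] = current.includes(value) ? current : current.concat([value]);
  }
  for (const [field, value] of Object.entries(update.$pull || {})) {
    out[field] = (out[field] || []).filter((item) => item !== value);
  }
  return out;
}

const place = { name: "10gen HQ", tags: ["business", "awesome"] };

// $push appends even when the value is already present.
const pushed = applyUpdate(place, { $push: { tags: "awesome" } });
// $addToSet leaves the array alone when the value is already present.
const added = applyUpdate(place, { $addToSet: { tags: "awesome" } });
// $pull removes every matching value.
const pulled = applyUpdate(place, { $pull: { tags: "business" } });

console.log(pushed.tags, added.tags, pulled.tags);
```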
Cursors
$cursor = $c->find(array("foo" => "bar"));

foreach ($cursor as $id => $value) {
   echo "$id: ";
   var_dump( $value );
}

$a = iterator_to_array($cursor);
Paging
page_num = 3;
results_per_page = 10;

cursor = db.collection.find()
  .sort({ "ts" : -1 })
  .skip(page_num * results_per_page)
  .limit(results_per_page);
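
The sort / skip / limit chain above can be mimicked on a plain array to see which documents a given page returns. This is an in-memory sketch only; `pageOf` is a hypothetical helper, and it assumes page numbers are zero-based, which is what `skip(page_num * results_per_page)` implies.

```javascript
// In-memory stand-in for find().sort({ ts: -1 }).skip(n).limit(m).
// pageNum is zero-based here, so page 0 holds the newest results.
function pageOf(docs, pageNum, resultsPerPage) {
  return docs
    .slice()                          // do not mutate the input
    .sort((a, b) => b.ts - a.ts)      // newest first, like sort({ ts: -1 })
    .slice(pageNum * resultsPerPage, (pageNum + 1) * resultsPerPage);
}

const docs = [];
for (let ts = 1; ts <= 45; ts++) docs.push({ ts: ts });

const page3 = pageOf(docs, 3, 10).map((d) => d.ts);
console.log(page3); // [ 15, 14, 13, 12, 11, 10, 9, 8, 7, 6 ]
```

With `page_num = 3` the query skips 30 documents, so it returns the fourth page when counting from one.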
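GridFS
Storing Files
Under 16 MB (fits in a single document)
Storing Big Files
Over 16 MB: split into chunks by GridFS, each chunk stored as its own document (the default chunk size is far smaller than 16 MB)
Storing Big Files
Works with replicated and sharded systems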
A better network FS
GridFS files are seamlessly sharded & replicated.
No OS constraints...
No file size limits
No naming constraints
No folder limits
Standard across different OSs
MongoDB automatically generates the MD5 hash of
the file
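
The chunking and MD5 bookkeeping described above can be pictured with a small Node.js sketch. It is an illustration only, not the driver's implementation: `toChunks` is a hypothetical function, the field names (`files_id`, `n`, `data`, `length`, `chunkSize`, `md5`) follow the GridFS collections, and the tiny chunk size is chosen just to keep the demo readable.

```javascript
const crypto = require("crypto");

// Cut a file into fixed-size pieces and record an MD5 of the whole file.
function toChunks(fileId, buffer, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId,                                   // points back at the file document
      n: n,                                               // chunk sequence number
      data: buffer.subarray(offset, offset + chunkSize)
    });
  }
  const md5 = crypto.createHash("md5").update(buffer).digest("hex");
  return {
    file: { _id: fileId, length: buffer.length, chunkSize: chunkSize, md5: md5 },
    chunks: chunks
  };
}

const original = Buffer.from("hello world, hello genealogy");
const stored = toChunks("scan-1", original, 8);
console.log(stored.chunks.length, stored.file.md5);

// Reading the file back is a matter of concatenating chunks in order of n.
const restored = Buffer.concat(stored.chunks.map((c) => c.data));
```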
MongoDB for
 Genealogy
   Data
Types of genealogy data
Events (birth, death, etc)
Official records
Census
Names
Relationships
Photographs
Diaries & letters
Ship passenger list
Occupation
and more
Challenges of
           genealogy data
Lots of possible data points... need flexible schema
Multiple versions of same data point
(3 different dates for death date, 4 variations on
name).
Data related to records
Multiple versions of same nodes
(intelligent nondestructive merge needed)
Need to have meta data associated
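
One way to picture a nondestructive merge is to union the variants from both records instead of letting one overwrite the other. The sketch below is an assumption-heavy illustration: `mergeIndividuals` and the `merged_from` field are hypothetical names, and only name parts are merged.

```javascript
// Hypothetical merge: keep every distinct variant from both records
// instead of overwriting one with the other.
function asArray(value) {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

function unionValues(a, b) {
  return Array.from(new Set(asArray(a).concat(asArray(b))));
}

function mergeIndividuals(left, right) {
  const merged = { name: {} };
  for (const part of ["first", "middle", "last"]) {
    merged.name[part] = unionValues((left.name || {})[part], (right.name || {})[part]);
  }
  // Keep both source ids so the merge can be audited or undone later.
  merged.merged_from = [left._id, right._id];
  return merged;
}

const recordA = { _id: 1, name: { first: ["john"], middle: "peter", last: ["smith"] } };
const recordB = { _id: 2, name: { first: ["johannes", "john"], last: ["sandvik"] } };
const mergedPerson = mergeIndividuals(recordA, recordB);
console.log(JSON.stringify(mergedPerson.name));
// {"first":["john","johannes"],"middle":["peter"],"last":["smith","sandvik"]}
```

Neither input record is modified, so the original contributions stay available.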
Genealogy is changing
Recognize
0   @I2@ INDI
1   NAME Charles Phillip /Ingalls/
1   SEX M
1   BIRT
2   DATE 10 JAN 1836
2   PLAC Cuba, Allegheny, NY
1   DEAT
2   DATE 08 JUN 1902
2   PLAC De Smet, Kingsbury, Dakota Territory
1   FAMC @F2@
1   FAMS @F3@
0   @I3@ INDI
1   NAME Caroline Lake /Quiner/
1   SEX F
1   BIRT
2   DATE 12 DEC 1839
GEDCOM
File format, not a database
Handles the great variety of data well
Doesn’t really scale beyond a local user.
Doesn’t provide good mechanism for storing
external documents (birth certificates, etc).
Built to solve problem of sharing data
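
Since GEDCOM nests data by level number, getting it into document form mostly means turning levels into children. The following is a minimal sketch of that idea, assuming the simple `LEVEL [@XREF@] TAG [VALUE]` line shape seen in the sample; `parseGedcom` is a hypothetical function and ignores the rest of the format (continuation lines, character sets, validation).

```javascript
// Minimal GEDCOM line parser: "LEVEL [@XREF@] TAG [VALUE]".
// Builds a nested tree from the level numbers.
function parseGedcom(text) {
  const roots = [];
  const stack = []; // stack[level] is the most recent node at that level
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const match = /^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$/.exec(line);
    if (!match) continue; // skip lines that do not fit the pattern
    const level = Number(match[1]);
    const node = { tag: match[3], value: match[4] || "", children: [] };
    if (match[2]) node.xref = match[2];
    if (level === 0) {
      roots.push(node);
    } else if (stack[level - 1]) {
      stack[level - 1].children.push(node);
    }
    stack[level] = node;
    stack.length = level + 1; // forget deeper levels from earlier lines
  }
  return roots;
}

const sample = [
  "0   @I2@ INDI",
  "1   NAME Charles Phillip /Ingalls/",
  "1   SEX M",
  "1   BIRT",
  "2   DATE 10 JAN 1836",
  "2   PLAC Cuba, Allegheny, NY"
].join("\n");

const person = parseGedcom(sample)[0];
const birth = person.children.find((c) => c.tag === "BIRT");
console.log(person.xref, birth.children[0].value); // @I2@ 10 JAN 1836
```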
Genealogy &
              MongoDB

Genealogy is anything but rigid and fixed
Flexible schema fits genealogy data well
Packaging things together makes sense
Relating records doesn’t require a relational
database
Individual
  • AFN
  • Modification Date
  Name
    • First[]
    • Middle[]
    • Last[]
  Events[]
    • type
    • date
    • contributor[]
    • record[]
    Location
      • city
      • state
      • county
      • country
      • coordinates[]
User
  • Name
  • Email Address
  • Password
  • Individual_id
Record
  • contributor
  • type
  • thumbnail
  • content
  • description
  • tags[]
Individual
individual = {
  _id : ObjectId("4f2978dfaa999d9db02618ce"),
  AFN : '1XYK-KQJ',
  name: {
     first: ['john', 'johannes'],
     middle: 'peter',
     last: ['smith', 'sandvik']
   }
}

db.individual.find(
{ "name.first" : "john", "name.middle" : "peter" })
Events
events : {
   death : {
    date : ISODate('1989-07-14'),
    location : {
      city: 'pensacola',
      state: 'fl',
      county: 'escambia',
      country: 'usa',
      coordinates : [30.26,87.12]},
    contributor : ObjectId("4eeac...691")}}

db.individual.find(
{ "events.death.date" : ISODate('1989-07-14') })

db.individual.find(
{ "events.death.location.coordinates" : { $near : [30,90] } })
Duplicate Events
events : {
  birth : [ {
      date : ISODate('1928-04-06'),
      location : {
        city: 'brattleboro',
        state: 'vt',
        county: 'windham',
        country: 'usa',
        coordinates : [42.51,72.34]},
      contributor : ObjectId("4ee...00000"),
      records: ObjectId("4ed8a...7b000000")
    },
    {
      date : ISODate('1928-04-16'),
      location : {
        city: 'brattleboro',
        state: 'vt',
        county: 'windham',
        country: 'usa',
        coordinates : [42.51,72.34]},
      contributor : ObjectId("4ee...37bb"),
      records: ObjectId("4eea...0000c8")
    }]
}
Duplicate Events
events : {
  birth : [ { date : ISODate('1928-04-06')},
          { date : ISODate('1928-04-16')}]
}

db.individual.find(
{ "events.birth.date" : ISODate('1928-04-16') })

Same query works!
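
The same query keeps working because a dotted path fans out over any arrays it meets: the document matches if any element along the path holds the wanted value. The sketch below imitates that behaviour in memory. `valuesAt` and `matches` are hypothetical helpers, dates are simplified to strings, and `events` is assumed to be an embedded document keyed by event type.

```javascript
// Collect every value reachable by a dotted path, fanning out over arrays
// the way a dotted query path does. In-memory illustration only.
function valuesAt(value, parts) {
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAt(item, parts));
  }
  if (parts.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[parts[0]], parts.slice(1));
}

function matches(doc, path, wanted) {
  return valuesAt(doc, path.split(".")).some((v) => v === wanted);
}

const single = { events: { birth: { date: "1928-04-16" } } };
const multiple = {
  events: { birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }
};

console.log(matches(single, "events.birth.date", "1928-04-16"));   // true
console.log(matches(multiple, "events.birth.date", "1928-04-16")); // true
```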
Multiple Events
marriage : [{
  date : ISODate('1939-08-11'),
  end_date : ISODate('1940-02-19'),
  to : ObjectId("4f297978aa999d9db02618cf"),
  location : {
    city: 'raleigh',
    state: 'nc',
    county: 'wake',
    country: 'usa',
    coordinates : [35.49,78.38]},
  contributor : ObjectId("4eeac...91537bb")},
{
  date : ISODate('1944-04-19'),
  to : ObjectId("4f2978dfaa999d9db02618ce"),
  location : {
    city: 'atlanta',
    state: 'ga',
    county: 'fulton',
    country: 'usa',
    coordinates : [33.45,84.23]},
  contributor : ObjectId("4eeb...37bb")}]
All together
individual = {
   _id : ObjectId("4f2978dfaa999d9db02618ce"),
   AFN : '1XYK-KQJ',
   name: {
      first: ['john', 'johannes'],
      middle: 'peter',
      last: ['smith', 'sandvik']
   },
   events : {
      birth : [
         {
             date : ISODate('1928-04-06'),
             location : {
                city: 'brattleboro',
                state: 'vt',
                county: 'windham',
                country: 'usa',
                coordinates : [42.51,72.34]
             },
             contributor : ObjectId("4eeabc958b691537bb000000"),
             records: ObjectId("4ed8aea7d8562f7d7b000000")
         },
         {
             date : ISODate('1928-04-16'),
             location : {
                city: 'brattleboro',
                ...
Records
record1 = {
   _id : ObjectId("4ed8aea7d8562f7d7b000000"),
   contributor : ObjectId("4eeab...1537bb"),
   type : 'birth certificate',
   thumbnail : BinData(0,"/9j/4AAQSkZJ...."),
   content : BinData(0,"j6b/Id11lWqs..."),
   tags : ['NY', 'certified'],
   description : "John's birth certificate"
}
Users
user = {
  _id : ObjectId("4eeabc958b691537bb000000"),
  username : 'spf13',
  email_address : 'genealogy@spf13.com',
  password : 'a.long.passphrase18',
  individual_id : ObjectId("4f2f...0ce")
}
Scaling MongoDB for all the generations
Replica Sets
[Diagram: three example replica sets: a Primary with two Secondaries; a Primary with one Secondary and an Arbiter; a Primary with four Secondaries]
Sharding
[Diagram: three App Servers, each with a MongoS; three ConfigD servers; four shards, each made up of three MongoD processes]
The Family
 Tree
It’s not a tree at all,
  It’s really a graph
     ... and an odd one at that
It would be easy if it
always looked like this
All sorts of mess
Step & adopted relationships
Duplicate nodes
Lots of missing nodes
Divorces and re-marriages
Multiple names for the same person
Multiple dates for the same event
How to make
sense of it all
Storing a graph in MongoDB
Graphs are important




Without them we couldn’t store family relationships
Trees / graphs
        in MongoDB
Since MongoDB data structures are
essentially objects, there is a good degree of
flexibility here.
Think of how you would structure them in
your application
Trees / graphs
        in MongoDB
Each node is stored as a document

Contains references to related nodes

What is “related” depends on your
application
References vs
         Relation
MongoDB uses references
Unlike foreign keys, references don’t
enforce integrity
Reference is really just a reference
For many applications a reference is
sufficient
Simple relationship
{   _id:   "a" } { _id: "b" } { _id: "c" } { _id: "d" }
{   _id:   "e", parents: ["a", "b" ]}
{   _id:   "f", parents: ["c", "d" ]}
{   _id:   "g", parents: ["e", "f" ]}

//find all direct descendants of b:
> db.family.find({ parents : 'b' })

//find all ancestors of g:
var g = db.family.findOne({ _id: 'g' });
ancestorFind = function(child) {
  var rv = [];
  if ( ! child.parents) return rv;
  var parents = db.family.find( { _id : { $in : child.parents } } ).toArray();
  rv = rv.concat(parents);
  for (var i in parents) {
    rv = rv.concat(ancestorFind(parents[i]));
  }
  return rv;
}
ancestorFind(g);

//find all descendants of b:
var b = db.family.find({ _id: 'b' }).toArray();
descendantsFind = function(par) {
  var rv = [];
  for (var i in par) {
    var k = db.family.find( { parents : par[i]._id } ).toArray();
    rv = rv.concat(k);
    rv = rv.concat(descendantsFind(k));
  }
  return rv;
}
descendantsFind(b);

•Easy to access nodes in either direction
•Good for trees / graphs
•Can grab large sets
•Minimum amount of maintenance
•Balanced
•Implied relationships
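
The parent-reference traversal on this slide can also be tried without a database. The sketch below keeps the same seven nodes in a `Map` and walks the references in both directions; `ancestorIds` and `descendantIds` are hypothetical names, and the `seen` set is an addition that guards against duplicates when two branches share an ancestor.

```javascript
// In-memory version of the parent-reference model: each node lists only
// its parents; ancestors and descendants are found by walking references.
const family = new Map([
  ["a", { _id: "a" }], ["b", { _id: "b" }],
  ["c", { _id: "c" }], ["d", { _id: "d" }],
  ["e", { _id: "e", parents: ["a", "b"] }],
  ["f", { _id: "f", parents: ["c", "d"] }],
  ["g", { _id: "g", parents: ["e", "f"] }]
]);

function ancestorIds(id, seen = new Set()) {
  const node = family.get(id);
  for (const parentId of (node && node.parents) || []) {
    if (!seen.has(parentId)) {      // skip nodes already visited
      seen.add(parentId);
      ancestorIds(parentId, seen);
    }
  }
  return seen;
}

function descendantIds(id, seen = new Set()) {
  for (const node of family.values()) {
    if ((node.parents || []).includes(id) && !seen.has(node._id)) {
      seen.add(node._id);
      descendantIds(node._id, seen);
    }
  }
  return seen;
}

console.log(Array.from(ancestorIds("g")).sort());   // [ 'a', 'b', 'c', 'd', 'e', 'f' ]
console.log(Array.from(descendantIds("b")).sort()); // [ 'e', 'g' ]
```

Each level of the walk corresponds to one more round of queries against the collection, which is the cost this model trades for minimal maintenance.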
Bi-directional
 {   _id:   "a", children: ["e"] }
 {   _id:   "b", children: ["e"] }
 {   _id:   "c", children: ["f"] }
 {   _id:   "d", children: ["f"] }
 {   _id:   "e", children: ["g"], parents: ["a", "b" ]}
 {   _id:   "f", children: ["g"], parents: ["c", "d" ]}
 {   _id:   "g", children: [] , parents: ["e", "f"] }


•Doesn’t really add much beyond the first example
•More maintenance
•Duplication of each relationship
•Only real advantage is ability to grab all related
nodes (both directions) with one query.
Array of Ancestors
{   _id:   "a" }
{   _id:   "b" }
{   _id:   "c" }
{   _id:   "d" }
{   _id:   "e", ancestors: [ "a", "b" ], parents: ["a", "b" ]}
{   _id:   "f", ancestors: [ "c", "d" ], parents: ["c", "d" ]}
{   _id:   "g", ancestors: [ "a", "b", "c", "d", "e", "f" ], parents: ["e", "f"] }

//find all descendants of b:
> db.tree.find({ ancestors: 'b' })
//find all direct descendants of b:
> db.tree.find({ parents: 'b' })
//find all ancestors of g:
> g = db.tree.findOne( { _id: 'g' } )
> db.tree.find( { _id: { $in : g.ancestors } } )

Great for small trees (or subsets).
Could be used to store X generations of ancestors
Optimized for retrieving entire tree
Uses implied relationships
No help on specifics... is this person my grandson?
Easier retrieval at expense of costlier maintenance
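
The "costlier maintenance" is the work of keeping each ancestors array correct. A minimal in-memory sketch of the insert path follows, assuming parents are always added before their children; `addNode` is a hypothetical helper, and the sorted array is only there to make the output stable.

```javascript
// Sketch of the bookkeeping the ancestor-array model needs on insert:
// a new node's ancestors are its parents plus all of their ancestors.
const tree = new Map();

function addNode(id, parents = []) {
  const ancestors = new Set(parents);
  for (const parentId of parents) {
    const parent = tree.get(parentId);
    for (const ancestorId of (parent && parent.ancestors) || []) {
      ancestors.add(ancestorId);
    }
  }
  const node = { _id: id, parents: parents, ancestors: Array.from(ancestors).sort() };
  tree.set(id, node);
  return node;
}

["a", "b", "c", "d"].forEach((id) => addNode(id));
addNode("e", ["a", "b"]);
addNode("f", ["c", "d"]);
const nodeG = addNode("g", ["e", "f"]);
console.log(nodeG.ancestors); // [ 'a', 'b', 'c', 'd', 'e', 'f' ]

// Descendants of b: a single pass with no recursion, like find({ ancestors: 'b' }).
const descendantsOfB = Array.from(tree.values())
  .filter((n) => n.ancestors.includes("b"))
  .map((n) => n._id);
console.log(descendantsOfB); // [ 'e', 'g' ]
```

Inserts are cheap in this sketch. The expensive case is discovering a new parent for an existing node, because every descendant's array then has to be rewritten.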
Relations (basic)
{   _id     : "b",
    relations : [
       {
         id      : "a",
         relation : "parent"},
       {
         id      : "c",
         relation : "grandparent"},
       {
         id      : "d",
         relation : "parent"}]}
Relations (detailed)
{   _id     : "b",
    relations : [
       {
         id      : "a",
         relation : "parent",
         type      : "mother",
         subtype : "biological" },
       {
         id      : "c",
         relation : "parent",
         type      : "father",
         subtype : "adopted"},
       {
         id      : "d",
         relation : "parent",
         type      : "father",
         subtype : "biological"}]}
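
With the detailed form, questions such as "who are the biological parents?" become a filter over the embedded array. The sketch below shows the idea in plain JavaScript; `relatedIds` is a hypothetical helper that requires every criterion to hold on the same array element, which is the behaviour `$elemMatch` gives a real query.

```javascript
// Filter an embedded relations array by its descriptive fields.
const relative = {
  _id: "b",
  relations: [
    { id: "a", relation: "parent", type: "mother", subtype: "biological" },
    { id: "c", relation: "parent", type: "father", subtype: "adopted" },
    { id: "d", relation: "parent", type: "father", subtype: "biological" }
  ]
};

// Hypothetical helper: ids of relations matching every field in `criteria`.
function relatedIds(doc, criteria) {
  return doc.relations
    .filter((r) => Object.keys(criteria).every((key) => r[key] === criteria[key]))
    .map((r) => r.id);
}

console.log(relatedIds(relative, { relation: "parent", subtype: "biological" })); // [ 'a', 'd' ]
console.log(relatedIds(relative, { type: "father" }));                            // [ 'c', 'd' ]
```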
Shouldn’t I store my
family tree in a graph
     database?
   They are built to store trees after all
Graphs are great at
traversing deep in a tree

              • Is this node my
                relative?


              • Retrieve my paternal
                great, great, great,
                great grandpa
Unfortunately that’s not
how we commonly work
Typically we are working with a node and
its immediate neighbors
The significant majority of our operations
aren’t traversing

If those operations are
important, perhaps a
hybrid graph & document
solution makes sense
http://spf13.com
                           http://github.com/s
                           @spf13




Question
    download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com
MongoDB for Genealogy

MongoDB for Genealogy

  • 1. Storing the Family Tree with
  • 2. We’re going to talk about MongoDB Intro & Fundamentals MongoDB for Genealogy data Scaling MongoDB for all the generations The Family Tree Storing a graph in MongoDB
  • 3. Steve @sp A 15+ years building the internet Father, husband, skateboarder, genealogist at ❤ Chief Solutions Architect @ responsible for drivers, integrations, web & docs
  • 4. Company behind MongoDB Offices in NYC, Palo Alto, London & Dublin 100+ employees Support, consulting, training Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic Well Funded: Sequoia, Union Square, Flybridge
  • 5. Introduction to MongoDB
  • 10. 1979
  • 11. 1979 1994
  • 12. 1979 1994 1995
  • 13. Computers in 1995 100 mhz Pentium 10 base T 16 MB ram 200 MB HD
  • 14. Cloud in 1995 (Windows 95 cloud wallpaper)
  • 15. Cell Phones in 2012 Dual core 1.5Ghz 802.11n (300+ Mbps) 1 GB ram 64 GB Solid State
  • 16. MongoDB Application: Document Oriented ({ author : “steve”, date : new Date(), text : “About MongoDB...”, tags : [“tech”, “database”]}), High Performance, Fully Consistent, Horizontally Scalable
  • 17. MongoDB philosophy Keep functionality when we can (key/value stores are great, but we need more) Non-relational (no joins) makes scaling horizontally practical Document data models are good Database technology should run anywhere virtualized, cloud, metal, etc
  • 18. Under the hood Written in C++ Runs nearly everywhere Data serialized to BSON Extensive use of memory-mapped files i.e. read-through write-through memory caching.
  • 19. Database Landscape Scalability & Performance MemCache MongoDB RDBMS Depth of Functionality
  • 20. “MongoDB has the best features of key/value stores, document databases and relational databases in one.” John Nunemaker
  • 21. Relational made normalized data look like this Category • Name • Url Article User • Name Tag • Name • Slug • Name • Email Address • Publish date • Url • Text Comment • Comment • Date • Author
  • 22. Document databases make normalized data look like this Article • Name • Slug • Publish date User • Text • Name • Author • Email Address Comment[] • Comment • Date • Author Tag[] • Value Category[] • Value
  • 23. But we’ve been using a relational database for 40 years!
  • 24. How do people store documents in real life?
  • 25. Think about a doctor’s office. There are two ways they could organize their files
  • 26. Each document type in its own drawer MRIs X-rays Lab Invoices Index History Medications Lab Forms
  • 29. 2. Group related records Patient 1 Patient 2 Patient 3 ... Vendor 1 Vendor 2 Vendor 3
  • 30. 2. Group related records Patient 1 Patient 3 ... Patient 2 Vendor 1 Vendor 2 Vendor 3
  • 31. Databases work the same way. Relational: Category (Name, Url), Article (Name, Slug, Publish date, Text), User (Name, Email Address), Tag (Name, Url), Comment (Comment, Date, Author). Document: Patient 1, Vendor 1; Article (Name, Slug, Publish date, Text, Author, Comment[] (Comment, Date, Author), Tag[] (Value), Category[] (Value)), User (Name, Email Address)
  • 32. Terminology (RDBMS ➜ Mongo): Table, View ➜ Collection; Row ➜ Document; Index ➜ Index; Join ➜ Embedded Document; Foreign Key ➜ Reference; Partition ➜ Shard
  • 33. Why MongoDB: My Top 10 Reasons 10. Great developer experience 9. Speaks your language 8. Scale horizontally 7. Fully consistent data w/atomic operations 6. Memory caching integrated 5. Open source 4. Flexible, rich & structured data format not just K:V 3. Ludicrously fast (without going plaid) 2. Simplify infrastructure & application 1. It’s web scale
  • 36. CMS / Blog Needs: • Business needed modern data store for rapid development and scale Solution: • Use PHP & MongoDB Results: • Real time statistics • All data, images, etc stored together easy access, easy deployment, easy high availability • No need for complex migrations • Enabled very rapid development and growth
  • 37. Photo Meta-Data Problem: • Business needed more flexibility than Oracle could deliver Solution: • Use MongoDB instead of Oracle Results: • Developed application in one sprint cycle • 500% cost reduction compared to Oracle • 900% performance improvement compared to Oracle
  • 38. Customer Analytics Problem: • Deal with massive data volume across all customer sites Solution: • Use MongoDB to replace Google Analytics / Omniture options Results: • Less than one week to build prototype and prove business case • Rapid deployment of new features
  • 39. Archiving Why MongoDB: • Existing application built on MySQL • Lots of friction with RDBMS based archive storage • Needed more scalable archive storage backend Solution: • Keep MySQL for active data (100mil) • MongoDB for archive (2+ billion) Results: • No more alter table statements taking over 2 months to run • Sharding fixed vertical scale problem • Very happily looking at other places to use MongoDB
  • 40. Online Dictionary Problem: • MySQL could not scale to handle their 5B+ documents Solution: • Switched from MySQL to MongoDB Results: • Massive simplification of code base • Eliminated need for external caching system • 20x performance improvement over MySQL
  • 41. E-commerce Problem: • Multi-vertical E-commerce impossible to model (efficiently) in RDBMS Solution: • Switched from MySQL to MongoDB Results: • Massive simplification of code base • Rapidly build, halving time to market (and cost) • Eliminated need for external caching system • 50x+ performance improvement over MySQL
  • 42. Tons more: MongoDB casts a wide net, and people keep coming up with new and brilliant ways to use it
  • 43. In Good Company and 1000s more
  • 45. Start with an object (or array, hash, dict, etc.) place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] }
  • 46. Inserting the record Initial Data Load > db.places.insert(place1)
  • 47. Querying { name : "10gen HQ", address : "134 5th Avenue 3rd Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] } > db.places.findOne({ zip: "10011", tags: "awesome" }) > db.places.find({tags: "business" })
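Both queries on this slide depend on how a plain value is compared with an array field such as tags: the document matches when any element of the array equals the value. The rule can be sketched in plain JavaScript, with no database involved. The helpers `matches` and `find` and the second sample record are made up for illustration, and the sketch handles only top-level equality conditions; it is not how MongoDB itself evaluates queries.

```javascript
// Hypothetical helper: true when every condition in the query holds for the document.
// A condition on an array field holds if any element equals the wanted value.
function matches(doc, query) {
  return Object.entries(query).every(([field, wanted]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

// Hypothetical stand-in for a collection scan.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const places = [
  { name: "10gen HQ", city: "New York", zip: "10011", tags: ["business", "awesome"] },
  { name: "Corner Cafe", city: "New York", zip: "10011", tags: ["food"] }, // invented sample record
];

const awesomeIn10011 = find(places, { zip: "10011", tags: "awesome" });
const businessPlaces = find(places, { tags: "business" });
console.log(awesomeIn10011.map((p) => p.name), businessPlaces.map((p) => p.name));
```

Both conditions in the first query must hold at once, so the second record is excluded even though its zip matches.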
  • 48. Nested Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Apr 24 2011 19:47:11", text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [ { author : "Fred", date : "Sat Apr 25 2010 20:51:03", text : "Best Post Ever!" } ] }
  • 49. Object ID > db.places.insert(place1) object(MongoId)#4 (1) { ["$id"]=> string(24) "4e9cc76a4a1817fd21000000" } 4e9cc76a4a1817fd21000000 |------||----||--||----| ts mac pid inc
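The ts / mac / pid / inc breakdown on the Object ID slide can be reproduced by slicing the 24 hex characters. This is only a sketch: `parseObjectId` is a hypothetical helper, not a driver API, and it assumes the field layout drawn on the slide (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter), which newer server versions may not follow.

```javascript
// Split a 24-character hex ObjectId into the four fields labelled on the slide.
// Assumption: timestamp (4 bytes), machine (3), pid (2), counter (3).
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
  return {
    timestamp: new Date(seconds * 1000),
    machine: hex.slice(8, 14),
    pid: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("4e9cc76a4a1817fd21000000");
console.log(parts.timestamp.toISOString(), parts.machine, parts.pid, parts.counter);
```

Because the leading bytes are a timestamp, ids generated later sort after ids generated earlier, at least to one-second resolution.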
  • 50. A More Complex Document place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ], latlong : [40.0,72.0], tips : [ { user : "ryan", time : 6/26/2011, tip : "stop by for office hours"}, {.....}] }
  • 51. Indexing & Adv Querying // Index nested documents db.posts.ensureIndex({ "comments.author":1 }) db.posts.find({'comments.author':'Fred'}) // Regular Expressions db.posts.find({'comments.author': /^Fr/}) // Index on tags (multi-key index) db.posts.ensureIndex({ tags: 1}) db.posts.find( { tags: 'tech' } ) // geospatial index db.posts.ensureIndex({ "author.location": "2d" }) db.posts.find({"author.location":{$near:[22,42]}})
  • 52. Updating
    > db.places.update( {name : "10gen HQ"}, { $push : { tips : { user : "nosh", time : 6/26/2011, tip : "Office hours are great!" } } } )
    place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ], latlong : [40.0,72.0], tips : [ { user : "ryan", time : 6/26/2011, tip : "stop by for office hours on Wednesdays from 4-6pm"}, { user : "nosh", time : 7/14/2011, tip : "Office hours are great!"} ] }
  • 54. Atomic Operations $set $unset $rename $push $pop $pull $addToSet $inc
  • 55. Cursors $cursor = $c->find(array("foo" => "bar")); foreach ($cursor as $id => $value) { echo "$id: "; var_dump( $value ); } $a = iterator_to_array($cursor);
  • 56. Paging page_num = 3; results_per_page = 10; cursor = db.collection.find() .sort({ "ts" : -1 }) .skip(page_num * results_per_page) .limit(results_per_page);
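A rough in-memory equivalent of the paging slide, to make the arithmetic visible. `getPage` and the sample documents are made up for illustration. Note that with `skip(page_num * results_per_page)` the page number is effectively zero-based, so `page_num = 3` returns the 31st through 40th results.

```javascript
// In-memory stand-in for find().sort({ ts: -1 }).skip(n).limit(m).
function getPage(docs, pageNum, resultsPerPage) {
  return docs
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.ts - a.ts) // newest first, like sort({ ts: -1 })
    .slice(pageNum * resultsPerPage, (pageNum + 1) * resultsPerPage);
}

// 45 hypothetical documents with ts values 0..44.
const docs = Array.from({ length: 45 }, (_, i) => ({ _id: i, ts: i }));

const page = getPage(docs, 3, 10);
console.log(page.map((d) => d.ts));
```

Large skip values still force the server to walk past the skipped documents, so for deep paging a range condition on the sort key is often a better fit.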
  • 59. Storing Big Files Files larger than the 16 MB document limit are split into smaller chunks (256 KB each by default)
  • 60. Storing Big Files Works with replicated and sharded systems
  • 61. A better network FS GridFS files are seamlessly sharded & replicated. No OS constraints... No file size limits No naming constraints No folder limits Standard across different OSs MongoDB automatically generates the MD5 hash of the file
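To make the chunking idea concrete, here is a minimal sketch of splitting a file into numbered chunks and recording an MD5 for the whole file. It is not the GridFS implementation: `toChunks`, `fromChunks`, the tiny chunk size and the sample bytes are all assumptions chosen for illustration.

```javascript
const crypto = require("crypto");

// Split a buffer into fixed-size chunks and keep one metadata record per file,
// loosely mirroring the files/chunks split described on the slides.
function toChunks(filename, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({ n: n, data: data.subarray(offset, offset + chunkSize) });
  }
  const file = {
    filename: filename,
    length: data.length,
    chunkSize: chunkSize,
    md5: crypto.createHash("md5").update(data).digest("hex"),
  };
  return { file: file, chunks: chunks };
}

// Reassemble by concatenating chunks in order of their sequence number n.
function fromChunks(chunks) {
  return Buffer.concat(
    chunks.slice().sort((a, b) => a.n - b.n).map((c) => c.data)
  );
}

const original = Buffer.from("birth certificate scan bytes go here");
const stored = toChunks("certificate.jpg", original, 8);
console.log(stored.file, stored.chunks.length);
```

Since each chunk is an ordinary document, chunks can be replicated and sharded like any other data, which is what makes the "network FS" comparison work.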
  • 63. Types of genealogy data • Events (birth, death, etc) • Names • Relationships • Occupation • Photographs • Diaries & letters • Official records • Ship passenger lists • Census • and more
  • 64. Challenges of genealogy data Lots of possible data points... need flexible schema Multiple versions of same data point (3 different dates for death date, 4 variations on name). Data related to records Multiple versions of same nodes (intelligent nondestructive merge needed) Need to have meta data associated
  • 66. Recognize
    0 @I2@ INDI
    1 NAME Charles Phillip /Ingalls/
    1 SEX M
    1 BIRT
    2 DATE 10 JAN 1836
    2 PLAC Cuba, Allegheny, NY
    1 DEAT
    2 DATE 08 JUN 1902
    2 PLAC De Smet, Kingsbury, Dakota Territory
    1 FAMC @F2@
    1 FAMS @F3@
    0 @I3@ INDI
    1 NAME Caroline Lake /Quiner/
    1 SEX F
    1 BIRT
    2 DATE 12 DEC 1839
  • 67. GEDCOM File format, not a database Handles the great variety of data well Doesn’t really scale beyond a local user. Doesn’t provide good mechanism for storing external documents (birth certificates, etc). Built to solve problem of sharing data
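Since GEDCOM is a line-oriented format where the leading number gives the nesting level, turning a record into a nested structure (and from there into a document) is fairly mechanical. The parser below is a small sketch under that assumption. `parseGedcom` is a hypothetical helper that handles only the `LEVEL [@XREF@] TAG [VALUE]` shape seen in the sample record and ignores everything else in the format.

```javascript
// Parse GEDCOM lines of the form "LEVEL [@XREF@] TAG [VALUE]" into a nested tree.
function parseGedcom(text) {
  const root = { level: -1, children: [] };
  const stack = [root];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const m = /^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$/.exec(line);
    if (!m) continue; // skip lines this sketch does not understand
    const node = {
      level: Number(m[1]),
      xref: m[2] || null,
      tag: m[3],
      value: m[4] || "",
      children: [],
    };
    // Pop back up to this line's parent (the nearest entry with a lower level).
    while (stack[stack.length - 1].level >= node.level) stack.pop();
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

const sample = [
  "0 @I2@ INDI",
  "1 NAME Charles Phillip /Ingalls/",
  "1 SEX M",
  "1 BIRT",
  "2 DATE 10 JAN 1836",
  "2 PLAC Cuba, Allegheny, NY",
  "1 FAMC @F2@",
].join("\n");

const records = parseGedcom(sample);
console.log(JSON.stringify(records, null, 2));
```

The nested result is already close to a document shape, which is part of why a document store is a comfortable target for imported GEDCOM data.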
  • 68. Genealogy & MongoDB Genealogy is anything but rigid and fixed Flexible schema fits genealogy data well Packaging things together makes sense Relating records doesn’t require a relational database
  • 69. Individual • AFN • Modification Date • Name (First[], Middle[], Last[]) • Events[] (type, date, contributor[], record[]) • Location (city, state, county, country)
  • 70. Individual • AFN • Modification Date • Name (First[], Middle[], Last[]) • Events[] (type, date, contributor[], record[]) • Location (city, state, county, country, coordinates[]); User • Name • Email Address • Password • Individual_id; Record • contributor • type • thumbnail • content • description • tags[]
  • 71. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] } }
  • 72. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] } } db.individual.find({ "name.first" : 'john', "name.middle" : 'peter' })
  • 73. Events events : [ { death : { date : ISODate('1989-07-14'), location : { city: 'pensacola', state: 'fl', county: 'escambia', country: 'usa', coordinates : [30.26,87.12]}, contributor : ObjectId("4eeac...691")} } ]
  • 74. Events events : [ { death : { date : ISODate('1989-07-14'), location : { city: 'pensacola', state: 'fl', county: 'escambia', country: 'usa', coordinates : [30.26,87.12]}, contributor : ObjectId("4eeac...691")} } ] db.individual.find({ "events.death.date" : ISODate('1989-07-14') }) db.individual.find({ "events.death.location.coordinates" : { $near : [30,90] } })
  • 75. Duplicate Events events : [ { birth : [ { date : ISODate('1928-04-06'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34]}, contributor : ObjectId("4ee...00000"), records: ObjectId("4ed8a...7b000000") },
  • 76. Duplicate Events county: 'windham', country: 'usa', coordinates : [42.51,72.34]}, contributor : ObjectId("4ee...00000"), records: ObjectId("4ed8a...7b000000") }, { date : ISODate('1928-04-16'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34]}, contributor : ObjectId("4ee...37bb"), records: ObjectId("4eea...0000c8") } ] } ]
  • 77. Duplicate Events events : [ { birth : [ { date : ISODate('1928-04-06')}, { date : ISODate('1928-04-16')} ] } ] db.individual.find({ "events.birth.date" : ISODate('1928-04-16') }) Same Query Works!!
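The reason the same query keeps working is that a dotted path fans out over any arrays it meets on the way. The sketch below approximates that behaviour in plain JavaScript. `valuesAt` and `matches` are hypothetical helpers, dates are plain strings rather than ISODate values, and the sample documents assume events are stored as an array of embedded documents.

```javascript
// Collect every value reachable by a dotted path, fanning out over arrays
// along the way (an approximation of how dotted query paths treat arrays).
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      const candidates = Array.isArray(item) ? item : [item];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  // A final array value matches if any of its elements matches.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

function matches(doc, path, wanted) {
  return valuesAt(doc, path).some((v) => v === wanted);
}

// One recorded birth date versus two conflicting ones (hypothetical data).
const single = { events: [{ birth: { date: "1928-04-16" } }] };
const multiple = {
  events: [{ birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }],
};

console.log(matches(single, "events.birth.date", "1928-04-16"));
console.log(matches(multiple, "events.birth.date", "1928-04-16"));
```

Both calls use the same path, so application code does not need to know in advance whether a given person has one recorded date or several.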
  • 78. Multiple Events marriage : [{ date : ISODate('1939-08-11'), end_date : ISODate('1940-02-19'), to : ObjectId("4f297978aa999d9db02618cf"), location : { city: 'raleigh', state: 'nc', county: 'wake', country: 'usa', coordinates : [35.49,78.38]}, contributor : ObjectId("4eeac...91537bb")}, { date : ISODate('1944-04-19'), to : ObjectId("4f2978dfaa999d9db02618ce"), location : {
  • 79. Multiple Events marriage : [{ date : ISODate('1939-08-11'), end_date : ISODate('1940-02-19'), to : ObjectId("4f297978aa999d9db02618cf"), location : { city: 'raleigh', state: 'nc', county: 'wake', country: 'usa', coordinates : [35.49,78.38]}, contributor : ObjectId("4eeac...91537bb")}, { date : ISODate('1944-04-19'), to : ObjectId("4f2978dfaa999d9db02618ce"), location : { city: 'atlanta', state: 'ga', county: 'fulton', country: 'usa', coordinates : [33.45,84.23]}, contributor : ObjectId("4eeb...37bb")}]
  • 80. All together individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] }, events : [ { birth : [ { date : ISODate('1928-04-06'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4eeabc958b691537bb000000"), records: ObjectId("4ed8aea7d8562f7d7b000000") }, { date : ISODate('1928-04-16'), location : { city: 'brattleboro',
  • 81. Records record1 = { _id : ObjectId("4ed8aea7d8562f7d7b"), contributor : ObjectId("4eeab...1537bb"), type : 'birth certificate', thumbnail : BinData(0,"/9j/4AAQSkZJ...."), content : BinData(0,"j6b/Id11lWqs..."), tags : ['NY', 'certified'], description : "John's birth certificate" }
  • 82. Users user = { _id : ObjectId("4eeabc958b691537bb"), username : 'spf13', email_address : 'genealogy@spf13.com', password : 'a.long.passphrase18', individual_id : ObjectId("4f2f...0ce"), }
  • 83. Scaling MongoDB for all the generations
  • 84. Replica Sets [Diagram: three replica sets, each with one Primary and several Secondaries; one set also includes an Arbiter]
  • 85. Sharding [Diagram: three App Servers, each with a MongoS; three ConfigD; twelve MongoD]
  • 87. It’s not a tree at all, It’s really a graph ... and an odd one at that
  • 88. It would be easy if it always looked like this
  • 90. All sorts of mess Step & adopted relationships Duplicate nodes Lots of missing nodes Divorces and re-marriages Multiple names for the same person Multiple dates for the same event
  • 91. How to make sense of it all
  • 93. Graphs are important Without them we couldn’t store family relationships
  • 94. Trees / graphs in MongoDB Since MongoDB data structures are essentially objects, there is a good degree of flexibility here. Think of how you would structure them in your application
  • 95. Trees / graphs in MongoDB Each node is stored as a document Contains references to related nodes What is “related” depends on your application
  • 96. References vs Relation MongoDB uses references Unlike foreign keys, references don’t enforce integrity Reference is really just a reference For many applications a reference is sufficient
  • 97. Simple relationship
    { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" } { _id: "e", parents: ["a", "b" ]} { _id: "f", parents: ["c", "d" ]} { _id: "g", parents: ["e", "f" ]}
    • Easy to access nodes in either direction • Good for trees / graphs • Can grab large sets • Minimum amount of maintenance • Balanced • Implied relationships
    //find all direct descendants of b: > db.family.find({ parents : 'b' })
    //find all ancestors of g: var g = db.family.findOne({_id: 'g'}); ancestorFind = function(child) { var rv = []; if (!child.parents) return rv; var parents = db.family.find({ _id : { $in : child.parents } }).toArray(); rv = rv.concat(parents); for (var i in parents) { rv = rv.concat(ancestorFind(parents[i])); } return rv; } ancestorFind(g);
    //find all descendants of b: var b = db.family.find({_id: 'b'}).toArray(); descendantsFind = function(par) { var rv = []; for (var i in par) { var k = db.family.find({ parents : par[i]._id }).toArray(); rv = rv.concat(k); rv = rv.concat(descendantsFind(k)); } return rv; } descendantsFind(b);
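The recursive lookups on the "Simple relationship" slide can be tried without a database. The sketch below replaces the collection with an in-memory array and adds a `seen` set so that a person reachable by two lines of descent is reported once. `ancestorIds` and `descendantIds` are illustrative names, not part of the original deck.

```javascript
// In-memory stand-in for the family collection from the slide.
const family = [
  { _id: "a" }, { _id: "b" }, { _id: "c" }, { _id: "d" },
  { _id: "e", parents: ["a", "b"] },
  { _id: "f", parents: ["c", "d"] },
  { _id: "g", parents: ["e", "f"] },
];
const byId = new Map(family.map((doc) => [doc._id, doc]));

// Walk upward through parents; `seen` prevents repeats when lines of descent rejoin.
function ancestorIds(id, seen = new Set()) {
  for (const parentId of (byId.get(id) || {}).parents || []) {
    if (!seen.has(parentId)) {
      seen.add(parentId);
      ancestorIds(parentId, seen);
    }
  }
  return seen;
}

// Walk downward: a child is any document listing `id` among its parents.
function descendantIds(id, seen = new Set()) {
  for (const doc of family) {
    if ((doc.parents || []).includes(id) && !seen.has(doc._id)) {
      seen.add(doc._id);
      descendantIds(doc._id, seen);
    }
  }
  return seen;
}

console.log([...ancestorIds("g")].sort());
console.log([...descendantIds("b")].sort());
```

Against a real collection each recursion level costs at least one query, which is the trade-off this pattern makes in exchange for minimal maintenance.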
  • 101. Bi-directional { _id: "a", children: ["e"] } { _id: "b", children: ["e"] } { _id: "c", children: ["f"] } { _id: "d", children: ["f"] } { _id: "e", children: ["g"], parents: ["a", "b" ]} { _id: "f", children: ["g"], parents: ["c", "d" ]} { _id: "g", children: [] , parents: ["e", "f"] } •Doesn’t really add much beyond the first example •More maintenance •Duplication of each relationship •Only real advantage is ability to grab all related nodes (both directions) with one query.
  • 102. Array of Ancestors
    { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" } { _id: "e", ancestors: [ "a", "b" ], parents: ["a", "b" ]} { _id: "f", ancestors: [ "c", "d" ], parents: ["c", "d" ]} { _id: "g", ancestors: [ "a", "b", "c", "d", "e", "f" ], parents: ["e", "f"] }
    • Great for small trees (or subsets) • Could be used to store X generations of ancestors • Optimized for retrieving entire tree • Uses implied relationships • No help on specifics... is this person my grandson? • Easier retrieval at expense of costlier maintenance
    //find all descendants of b: > db.tree.find({ ancestors: 'b' })
    //find all direct descendants of b: > db.tree.find({ parents: 'b' })
    //find all ancestors of g: > g = db.tree.findOne({ _id: 'g' }) > db.tree.find({ _id: { $in : g.ancestors } })
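The "costlier maintenance" is mostly the work of filling in the ancestors array whenever a node is added. One way to sketch it, assuming parents are always inserted before their children: a new node's ancestors are its parents plus each parent's own ancestors. `makeNode` and the in-memory `tree` map are hypothetical stand-ins for inserts into a collection.

```javascript
// Build a node whose ancestors array is derived from its parents:
// ancestors = parents + every parent's own ancestors (deduplicated).
function makeNode(id, parentIds, byId) {
  const ancestors = new Set(parentIds);
  for (const parentId of parentIds) {
    const parent = byId.get(parentId);
    for (const a of (parent && parent.ancestors) || []) ancestors.add(a);
  }
  const node = { _id: id };
  if (parentIds.length > 0) {
    node.parents = parentIds;
    node.ancestors = [...ancestors].sort();
  }
  byId.set(id, node);
  return node;
}

const tree = new Map();
for (const id of ["a", "b", "c", "d"]) makeNode(id, [], tree);
makeNode("e", ["a", "b"], tree);
makeNode("f", ["c", "d"], tree);
const g = makeNode("g", ["e", "f"], tree);
console.log(g.ancestors);

// "Find all descendants of b" becomes a single membership check per document.
const descendantsOfB = [...tree.values()]
  .filter((doc) => (doc.ancestors || []).includes("b"))
  .map((doc) => doc._id);
console.log(descendantsOfB);
```

The harder case, which this sketch does not cover, is attaching a newly discovered parent to an existing person: every descendant's ancestors array then has to be rewritten.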
  • 104. Relations (basic) { _id : "b", relations : [ { id : "a", relation : "parent"}, { id : "c", relation : "grandparent"}, { id : "d", relation : "parent"}]}
  • 105. Relations (detailed) { _id : "b", relations : [ { id : "a", relation : "parent", type : "mother", subtype : "biological" }, { id : "c", relation : "parent", type : "father", subtype : "adopted"}, { id : "d", relation : "parent", type : "father", subtype : "biological"}]}
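With typed relations stored on the node, questions like "who are the biological parents?" become a filter over one document's relations array. The sketch below does that in memory. `relatedIds` is a made-up helper, and in a real query something along the lines of `$elemMatch` on the relations array would probably play this role.

```javascript
// The detailed relations document from the slide.
const person = {
  _id: "b",
  relations: [
    { id: "a", relation: "parent", type: "mother", subtype: "biological" },
    { id: "c", relation: "parent", type: "father", subtype: "adopted" },
    { id: "d", relation: "parent", type: "father", subtype: "biological" },
  ],
};

// Return ids of relations whose fields all equal the given criteria.
function relatedIds(doc, criteria) {
  return (doc.relations || [])
    .filter((rel) => Object.keys(criteria).every((k) => rel[k] === criteria[k]))
    .map((rel) => rel.id);
}

console.log(relatedIds(person, { relation: "parent", subtype: "biological" }));
console.log(relatedIds(person, { type: "father" }));
```

Keeping type and subtype on the edge rather than on the person is what lets one individual be a biological father in one relationship and an adoptive father in another.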
  • 106. Shouldn’t I store my family tree in a graph database? They are built to store trees after all
  • 107. Graphs are great at traversing deep in a tree • Is this node my relative? • Retrieve my paternal great, great, great, great grandpa
  • 110. Unfortunately that’s not how we commonly work Typically we are working with a node and its immediate neighbors The significant majority of our operations aren’t traversing If those operations are important, perhaps a hybrid graph & document solution makes sense
  • 111. http://spf13.com http://github.com/s @spf13 Question download at mongodb.org We’re hiring!! Contact us at jobs@10gen.com

Editor's notes

  9. Remember in 1995 there were around 10,000 websites. Mosaic, Lynx, Mozilla (pre-Netscape) and IE 2.0 were the only web browsers. Apache (Dec ’95), Java (’96), PHP (June ’95), and .NET didn’t exist yet. Linux just barely (1.0 in ’94).
  16. By reducing the transactional semantics the db provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
  96. Store an array of the ids of the ancestors of a given document