Modern Database Systems
@spf13 AKA Steve Francia
Chief Evangelist @
responsible for drivers,
integrations, web & docs
What’s the Point?
๏   Goal: Discover & identify ideal
    storage solution for our needs
๏   History is important
๏   Many options today
๏   Document databases are good
    for Genealogy
History of the World
Over 5500 years ago
2 People
1804
1 Billion People
1927
2 Billion People
World Population Growth
(last ~200 years, in billions)
1804: 1
1927: 2
1960: 3
1974: 4
1987: 5
1999: 6
2012: 7
Really Big Data
In the last 50 years...

over 4% of the world's people
were born...

in less than 1% of the time
History of Databases
1970
๏ Oracle creates the relational database
๏ Everyone happily uses it for the next 43 years
What really happened
Let’s start at the beginning
It’s a story about...
Storing & Retrieving Information
Even today we still use the same mediums for data storage
With the advent of the computer, things really took off
1960 : DBMS Emerges
๏   Ordered set of fixed length fields
๏   Low level pointer operations (flat
    files)
๏   Most popular was IMS (created at
    IBM)
๏   Shockingly still in use today at IBM &
    American Airlines
Lots of Problems
๏   Complex and inflexible
๏   User had to know physical structure of the
    DB in order to query for information
๏   Adding a field to the DB required rewriting
    the underlying access/modification scheme
๏   Records isolated (no relations)
๏   Emphasis on records to be processed, not
    overall structure
1970 : Relational DB
๏   Edgar Frank “Ted” Codd
๏   Relational Database
    theory
๏   Codd’s 13 rules
    (aka 12 rules)
3 HUGE Advantages
๏   Data independence from hardware
    and storage implementation
๏   Ability to process more than one
    record at a time with a single
    operation
๏   Establishing a relationship
    between records
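A minimal sketch of the last two advantages, written in plain JavaScript rather than SQL. The tables and field names (people, cities, city_id) are made up for illustration and are not from the talk.

```javascript
// Two hypothetical "tables" held as arrays of flat records.
const people = [
  { id: 1, name: "john", city_id: 10 },
  { id: 2, name: "mary", city_id: 20 },
  { id: 3, name: "peter", city_id: 10 },
];
const cities = [
  { id: 10, city: "pensacola" },
  { id: 20, city: "brattleboro" },
];

// One set-oriented operation touches every matching record at once,
// instead of walking pointers record by record.
const inPensacola = people.filter((p) => p.city_id === 10).map((p) => p.name);

// A shared key relates records in one table to records in another.
const joined = people.map((p) => ({
  name: p.name,
  city: cities.find((c) => c.id === p.city_id).city,
}));

console.log(inPensacola, joined);
```

The caller states what it wants (everyone in city 10, every person with their city) and never refers to how the records are physically stored.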
IBM vs Codd
๏ IBM bets on IMS
๏ Codd bets on relational DB
๏ Eventually 2 relational prototypes emerge
Ingres
๏ Built at UC Berkeley
๏ Uses QUEL
๏ Inspires Sybase & MSSQL
System R
๏   Built at IBM
๏   Leads to SEQUEL... later SQL
๏   Evolved into SQL/DS which
    evolved into DB2
๏   Project concludes that relational
    model is viable
Oracle
๏   Larry Ellison watches IBM
๏   Starts Relational Software Inc.
๏   Oracle 1st commercial RDBMS
    released in 1979
๏   Beats IBM by 2 years to market
Entity Relationship
๏   Proposed by Peter
    Chen in 1976
๏   Focuses on data use
    and not logical table
    structure
1980s
๏ RDBMS dominates
๏ Some fields (medicine, physics, multimedia) need more than RDBMS offers
๏ Object Databases emerge
Object Databases
๏   Inspired by Entity Relationship
๏   More flexible than relational permits
๏   Tightly coupled with OO
    programming language (C++, later
    Java)
๏   Full object: data & methods stored
1990s
๏ Internet emerges
๏ Data demand spikes
๏ Databases used for archiving historical data
Early 2000s
๏ Internet booms
๏ RDBMS fails to scale
๏ In desperation we take a step backwards
MemcacheD
๏ 1 dimensional
๏ No persistence
๏ No ACI or D
๏ but...
... FAST
2005 ish
๏   Relational + MemcacheD
    broken (and we didn’t know it)
๏   Scale redefined with high
    volume & social
๏   Infrastructure reinvented with
    cloud computing & SSDs
Alternatives Emerge

๏ Dynamo / Key Value
๏ Document
๏ Graph
Modern Data Storage
A lot going on
Easiest to define databases in
broad terms
• What is a record?
 (data model)
• CAP : CA, AP, CP ?
 (infrastructure model)
Data Storage Structure
1D: a single Key maps to a single Value
2D: each record is a flat row of Key / Value pairs
nD: each Key maps to one or more Value(s), which can themselves hold nested Key / Value(s) pairs
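The three shapes can be sketched as plain JavaScript values. This is only an illustration: the sample records and the depth helper are assumptions, and nesting depth is used as a rough stand-in for "dimension", not a formal definition.

```javascript
// 1D: a key maps to one opaque value.
const keyValue = new Map([["user:1", "john smith"]]);

// 2D: every record is a flat row of key/value pairs.
const relationalRows = [
  { id: 1, first: "john", last: "smith" },
  { id: 2, first: "mary", last: "jones" },
];

// nD: values may themselves be lists or nested records.
const documentRecord = {
  id: 1,
  name: { first: ["john", "johannes"], last: ["smith", "sandvik"] },
};

// Hypothetical helper: how many levels of nesting does a value have?
function depth(value) {
  if (value === null || typeof value !== "object") return 0;
  const children = Array.isArray(value) ? value : Object.values(value);
  return 1 + Math.max(0, ...children.map(depth));
}

console.log(
  depth(keyValue.get("user:1")),
  depth(relationalRows[0]),
  depth(documentRecord)
);
```

A flat row never nests deeper than one level, while a document record can nest as deeply as the data requires.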
Database structure
1D: Key Value, Dynamo, Graph
2D: Relational
nD: Document
CAP Theorem
[Diagram: triangle with corners Availability, Partitioning and Consistency]
CAP Theorem
[Diagram: an App and two Nodes, with the connection marked by an x]
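CAP Theorem
[Diagram: triangle with corners Availability, Partition Tolerant and Consistency]
๏ Availability + Partition Tolerant (edge labeled "Inconsistent"): Dynamo, Key Value, NoSQLs
๏ Availability + Consistency (edge labeled "Intolerant"): RDBMS
๏ Partition Tolerant + Consistency (edge labeled "Unavailable"): MongoDB, BigTable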
Key Value
๏ 1 Dimensional storage (tuple)
๏ Query key only
๏ Bucket index (range) on keys
๏ Records cannot be updated, only replaced
๏ Often MultiMaster... meaning availability over consistency
๏ Partitioning easy thanks to single value

Cassandra, Redis, MemcacheD, Riak, DynamoDB
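A rough sketch of what "query key only" and "only replaced" mean in practice. KeyValueStore is a hypothetical in-memory class written for this example; it is not the API of any of the products listed above.

```javascript
// Hypothetical in-memory key-value store; not a real product's API.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  // put always replaces the whole value; there is no partial update.
  put(key, value) {
    this.data.set(key, value);
  }
  // Lookups are by key only; the value is opaque to the store.
  get(key) {
    return this.data.get(key);
  }
  // Range scan over keys in sorted order (start inclusive, end exclusive).
  range(start, end) {
    return [...this.data.keys()]
      .filter((k) => k >= start && k < end)
      .sort()
      .map((k) => [k, this.data.get(k)]);
  }
}

const store = new KeyValueStore();
store.put("person:001", { first: "john" });
store.put("person:002", { first: "mary" });
store.put("place:001", { city: "pensacola" });
// Changing one field means writing a complete new value.
store.put("person:001", { first: "johannes" });

console.log(store.get("person:001"), store.range("person:", "person;"));
```

Because every operation names a single key, any one key can live on any node, which is why partitioning this model is comparatively easy.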
Relational
๏ 2 Dimensional storage (map)
๏ Query any field
๏ BTree Indexes
๏ Single master meaning consistency > availability
๏ Partitioning hard due to transactions & joins

Oracle, MSSQL, MySQL, PostgreSQL, DB2
Document
๏ n Dimensional storage (hash w/ nesting)
๏ Query any field at any level
๏ BTree Indexes
๏ Single master meaning consistency > availability
๏ Partitioning easy thanks to richer data model

MongoDB, CouchDB, RethinkDB
Graph
 ๏   1 Dimensional storage... but grouped to appear
     2D
 ๏   Differentiated by indexes
 ๏   Large indexes cover many relationships
 ๏   Query time depends on # records returned,
     not distance to get them
 ๏   Doesn’t require traversing to determine
     relationship

Neo4j, about 20 more... nobody talks much about
MongoDB for Genealogy
Right Data Model
Types of genealogy data
๏ Events (birth, death, etc)
๏ Official records
๏ Census
๏ Names
๏ Relationships
๏ Photographs
๏ Diaries & letters
๏ Ship passenger list
๏ Occupation
๏ and more
Challenges of genealogy data
๏ Lots of possible data points... need flexible schema
๏ Multiple versions of same data point (3 different dates for death date, 4 variations on name)
๏ Lots of data associated with physical records
๏ Multiple versions of same nodes (intelligent nondestructive merge needed)
๏ Need to have metadata associated
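One possible reading of "nondestructive merge" is sketched below: when two copies of the same individual are combined, every distinct claim is kept along with who contributed it. The function name, the field names and the contributors "alice" and "bob" are assumptions for illustration, not the talk's implementation.

```javascript
// Merge two versions of the same individual without discarding either
// contributor's claims. Field names are illustrative only.
function mergeIndividuals(a, b) {
  const merged = { ...a, events: { ...a.events } };
  for (const [type, versions] of Object.entries(b.events)) {
    const existing = merged.events[type] || [];
    // Keep a version only if an identical claim is not already present.
    const fresh = versions.filter(
      (v) =>
        !existing.some(
          (e) => e.date === v.date && e.contributor === v.contributor
        )
    );
    merged.events[type] = [...existing, ...fresh];
  }
  return merged;
}

const recordA = {
  afn: "1XYK-KQJ",
  events: { birth: [{ date: "1928-04-06", contributor: "alice" }] },
};
const recordB = {
  afn: "1XYK-KQJ",
  events: {
    birth: [
      { date: "1928-04-06", contributor: "alice" },
      { date: "1928-04-16", contributor: "bob" },
    ],
    death: [{ date: "1989-07-14", contributor: "bob" }],
  },
};

const merged = mergeIndividuals(recordA, recordB);
console.log(JSON.stringify(merged, null, 2));
```

Both birth dates survive the merge, and neither input record is modified, so conflicting evidence can be reviewed later rather than silently overwritten.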
[Diagram: data model]
Individual: AFN, Modification Date
Events[]: type, date, contributor[], record[]
Name: First[], Middle[], Last[]
Location: city, state, county, country, coordinates[]
Record: contributor, type, thumbnail, content, description, tags[]
User: Name, Email Address, Password, Individual_id
Individual
individual = {
   _id : ObjectId("4f2978dfaa999d9db02618ce"),
   AFN : '1XYK-KQJ',
   name: {
      first: ['john', 'johannes'],
      middle: 'peter',
      last: ['smith', 'sandvik']
    }
}


// Dotted field names must be quoted
db.individual.find(
{'name.first' : 'john', 'name.middle' : 'peter'})
Individual.Events
events : [
    { death : {
       date : ISODate('1989-07-14'),
       location : {
           city: 'pensacola',
           state: 'fl',
           county: 'escambia',
           country: 'usa',
           coordinates : [30.26,87.12]},
       contributor : ObjectId("4eeac...691")}}]

db.individual.find(
{'events.death.date' : ISODate('1989-07-14')})

// $near needs a geospatial index on the coordinates field
db.individual.find(
{'events.death.location.coordinates' : { $near:[30,90]}})
Event Versions
events : [
   { birth : [ {
        date : ISODate('1928-04-06'),
        location : {
           city: 'brattleboro',
           state: 'vt',
           county: 'windham',
           country: 'usa',
           coordinates : [42.51,72.34]},
        contributor : ObjectId("4ee...00000"),
        records: ObjectId("4ed8a...7b000000")
   },
   {
        date : ISODate('1928-04-16'),
        location : {
           city: 'brattleboro',
           state: 'vt',
           county: 'windham',
           country: 'usa',
           coordinates : [42.51,72.34]},
        contributor : ObjectId("4ee...37bb"),
        records: ObjectId("4eea...0000c8")
    }] }
]
Query with Versioned Events
events : [
   { birth : [
      { date : ISODate('1928-04-06')},
      { date : ISODate('1928-04-16')}
   ] }
]

db.individual.find(
{'events.birth.date' : ISODate('1928-04-16')})
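Why a single dotted path finds either version of the birth date can be sketched in plain JavaScript. This is a simplified model of equality matching over nested arrays, not MongoDB's actual query engine; valuesAt and matches are hypothetical helpers, and dates are plain strings here instead of ISODate values.

```javascript
// Collect every value reachable at a dotted path, descending into arrays.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // An array-valued field matches if any of its elements matches.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// A document matches when, for every path, some reachable value equals the target.
function matches(doc, query) {
  return Object.entries(query).every(([path, target]) =>
    valuesAt(doc, path).includes(target)
  );
}

const individual = {
  name: { first: ["john", "johannes"], middle: "peter" },
  events: [{ birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }],
};

console.log(matches(individual, { "events.birth.date": "1928-04-16" }));
console.log(matches(individual, { "name.first": "john", "name.middle": "peter" }));
```

Storing every version in an array therefore costs nothing at query time: the same path expression reaches all of them.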
Records
record1 = {
    _id : ObjectId("4ed8aea7d8562f7d7b"),
    contributor : ObjectId("4eeab...1537bb"),
    type : 'birth certificate',
    thumbnail : BinData(0,"/9j/4AAQSkZJ...."),
    content : BinData(0,"j6b/Id11lWqs..."),
    tags : ['NY', 'certified'],
    description : "John's birth certificate"
}
Right Scale
MongoDB: Scale built in
๏   Intelligent replication
๏   Automatic partitioning of data
    (user configurable)
๏   Horizontal Scale
๏   Targeted Queries
๏   Parallel Processing
Intelligent Replication
[Diagram: Node 1 (Secondary) and Node 2 (Secondary) share a Heartbeat; Replication links connect Node 3 (Primary) to each Secondary]
Scalable Architecture
[Diagram: three App Servers, three Mongos, three Config Servers and three Shards]
High Availability in Shards
[Diagram: each Shard is either a single Mongod or a set of one Primary and two Secondaries]
Targeted Requests
[Diagram: numbered steps 1 to 4; a request passes through Mongos to a single one of three Shards and back]
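Parallel processing
[Diagram: numbered steps 1 to 6; Mongos fans a request out to all three Shards, steps 2, 3 and 4 happen on every Shard, and steps 5 and 6 happen at Mongos]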
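The difference between a targeted request and parallel processing can be sketched as a routing decision. The chunk ranges, shard names and the choice of last name as shard key are all assumptions for illustration; a real cluster keeps this mapping on its config servers.

```javascript
// Hypothetical chunk map: each shard owns a range of the shard key (last name).
const chunks = [
  { shard: "shardA", min: "a", max: "i" },
  { shard: "shardB", min: "i", max: "r" },
  { shard: "shardC", min: "r", max: "{" }, // "{" sorts just after "z"
];

// Decide which shards a query must visit.
function route(query) {
  if (typeof query.last === "string") {
    // Targeted: the shard key is known, so only the owning shard is contacted.
    const owner = chunks.find(
      (c) => query.last >= c.min && query.last < c.max
    );
    return owner ? [owner.shard] : [];
  }
  // No shard key: fan out to every shard and merge the partial results.
  return chunks.map((c) => c.shard);
}

console.log(route({ last: "smith" }));
console.log(route({ first: "john" }));
```

A query that includes the shard key touches one shard; any other query is sent to all shards, which work on it in parallel.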
Right Feature Set
Broad Feature Set
๏   Rich query language
๏   Native support for over 12 languages
๏   GeoSpatial
๏   Text search
๏   Aggregation & MapReduce
๏   GridFS
    (distributed & replicated file storage)
๏   Integration with Hadoop, Solr & more
Last Year I presented on Graph in MongoDB
http://j.mp/XvJ3dl
FamilySearch presented in December 2012
http://j.mp/X03TXp
http://spf13.com
            http://github.com/spf13
            @spf13



Questions?
download at mongodb.org

How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Kürzlich hochgeladen (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Modern Database Systems (for Genealogy)

  • 1. Modern Database Systems
  • 2. @spf13 AKA Steve Francia Chief Evangelist @ responsible for drivers, integrations, web & docs
  • 3. What’s the Point? ๏ Goal: Discover & identify ideal storage solution for our needs ๏ History is important ๏ Many options today ๏ Document databases are good for Genealogy
  • 5. Over 5500 years ago 2 People
  • 9. World Population Growth (last ~200 years, in billions): 1 (1804), 2 (1927), 3 (1960), 4 (1974), 5 (1987), 6 (1999), 7 (2012)
  • 10. Really Big Data In the last 50 years... over 4 % of the world's people were born... in less than 1 % of the time
  • 12. 1970 ๏ Oracle creates the relational database ๏ Everyone happily uses it for the next 43 years
  • 14. Let’s start at the beginning
  • 15. It’s a story about... Storing & Retrieving Information
  • 20. Even today we still use the same mediums for data storage
  • 23. With the advent of the computer things really took off
  • 24. 1960 : DBMS Emerges ๏ Ordered set of fixed length fields ๏ Low level pointer operations (flat files) ๏ Most popular was IMS (created at IBM) ๏ Shockingly still in use today at IBM & American Airlines
  • 25. Lots of Problems ๏ Complex and inflexible ๏ User had to know physical structure of the DB in order to query for information ๏ Adding a field to the DB required rewriting the underlying access/modification scheme ๏ Records isolated (no relations) ๏ Emphasis on records to be processed, not overall structure
  • 26. 1970 : Relational DB ๏ Edgar Frank “Ted” Codd ๏ Relational Database theory ๏ Codd’s 13 rules (aka 12 rules)
  • 27. 3 HUGE Advantages ๏ Data independence from hardware and storage implementation ๏ Ability to process more than one record at a time with a single operation ๏ Establishing a relationship between records
  • 28. IBM vs Codd ๏ IBM bet on IMS ๏ Codd bets on relational DB ๏ Eventually 2 relational prototypes emerge
  • 29. Ingres ๏ Built at UC Berkeley ๏ Uses QUEL ๏ Inspires Sybase & MSSQL
  • 30. System R ๏ Built at IBM ๏ Leads to SEQUEL... later SQL ๏ Evolved into SQL/DS which evolved into DB2 ๏ Project concludes that relational model is viable
  • 31. Oracle ๏ Larry Ellison watches IBM ๏ Starts Relational Software Inc. ๏ Oracle 1st commercial RDBMS released in 1979 ๏ Beats IBM by 2 years to market
  • 32. Entity Relationship ๏ Proposed by Peter Chen in 1976 ๏ Focuses on data use and not logical table structure
  • 33. 1980s ๏ RDBMS dominates ๏ Some fields (medicine, physics, multimedia) need more than RDBMS offers ๏ Object Databases emerge
  • 34. Object Databases ๏ Inspired by Entity Relationship ๏ More flexible than relational permits ๏ Tightly coupled with OO programming language (c++, later Java) ๏ Full object: data & methods stored
  • 35. 1990s ๏ Internet emerges ๏ Data demand spikes ๏ Databases used for archiving historical data
  • 36. Early 2000s ๏ Internet booms ๏ RDBMS fails to scale ๏ In desperation we take a step backwards
  • 37. MemcacheD ๏ 1 dimensional ๏ No persistence ๏ No ACI or D ๏ but...
  • 39. 2005 ish ๏ Relational + MemcacheD broken (and we didn’t know it) ๏ Scale redefined with high volume & social ๏ Infrastructure reinvented with cloud computing & SSDs
  • 40. Alternatives Emerge ๏ Dynamo / Key Value ๏ Document ๏ Graph
  • 41. Modern Data Storage
  • 42. A lot going on Easiest to define databases in broad terms • What is a record? (data model) • CAP : CA, AP, CP ? (infrastructure model)
  • 43. Data Storage Structure (diagram comparing 1D, 2D and nD layouts of keys and values)
  • 44. Database structure 1D 2D nD Key Value Relational Document Dynamo Graph
  • 45. CAP Theorem Availability Partitioning Consistency
  • 47. CAP Theorem (triangle diagram) ๏ Corners: Availability, Consistency, Partition Tolerant ๏ Edge labels: Inconsistent, Intolerant, Unavailable ๏ Systems placed on the diagram: Dynamo, Key Value NoSQLs, RDBMS, MongoDB, BigTable
  • 48. Key Value ๏ 1 Dimensional storage (tuple) ๏ Query key only ๏ Bucket index (range) on keys ๏ Records cannot be updated, only value replaced ๏ Often MultiMaster... meaning availability over consistency ๏ Partitioning easy thanks to single value ๏ Examples: Cassandra, Redis, MemcacheD, Riak, DynamoDB
  • 49. Relational ๏ 2 Dimensional storage (map) ๏ Query any field ๏ BTree Indexes ๏ Single master meaning consistency > availability ๏ Partitioning hard due to transactions & joins ๏ Examples: Oracle, MSSQL, MySQL, PostgreSQL, DB2
  • 50. Document ๏ n Dimensional storage (hash w/ nesting) ๏ Query any field at any level ๏ BTree Indexes ๏ Single master meaning consistency > availability ๏ Partitioning easy thanks to richer data model ๏ Examples: MongoDB, CouchDB, RethinkDB
  • 51. Graph ๏ 1 Dimensional storage... but grouped to appear 2D ๏ Differentiated by indexes ๏ Large indexes cover many relationships ๏ Query time depends on # records returned, not distance to get them ๏ Doesn’t require traversing to determine relationship Neo4j, about 20 more... nobody talks much about
  • 53. Right Data Model
  • 54. Types of genealogy data ๏ Events (birth, death, etc) ๏ Official records ๏ Census ๏ Names ๏ Relationships ๏ Photographs ๏ Diaries & letters ๏ Ship passenger list ๏ Occupation ๏ and more
  • 55. Challenges of genealogy data ๏ Lots of possible data points... need flexible schema ๏ Multiple versions of same data point (3 different dates for death date, 4 variations on name). ๏ Lots of data associated with physical records ๏ Multiple versions of same nodes (intelligent nondestructive merge needed) ๏ Need to have meta data associated
  • 56. Data model (diagram) ๏ Individual: AFN, Modification Date ๏ User: Name, Email Address, Password, Individual_id ๏ Events[]: type, date, contributor[], record[] ๏ Name: First[], Middle[], Last[] ๏ Location: city, state, county, country, coordinates[] ๏ Record: contributor, type, thumbnail, content, description, tags[]
  • 57. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name : { first : ['john', 'johannes'], middle : 'peter', last : ['smith', 'sandvik'] } } db.individual.find( { 'name.first' : 'john', 'name.middle' : 'peter' } )
  • 58. Individual.Events events : { death : { date : ISODate('1989-07-14'), location : { city : 'pensacola', state : 'fl', county : 'escambia', country : 'usa', coordinates : [30.26,87.12] }, contributor : ObjectId("4eeac...691") } } db.individual.find( { 'events.death.date' : ISODate('1989-07-14') } ) db.individual.find( { 'events.death.location.coordinates' : { $near : [30,90] } } )
  • 59. Event Versions events : { birth : [ { date : ISODate('1928-04-06'), location : { city : 'brattleboro', state : 'vt', county : 'windham', country : 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4ee...00000"), records : ObjectId("4ed8a...7b000000") }, { date : ISODate('1928-04-16'), location : { city : 'brattleboro', state : 'vt', county : 'windham', country : 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4ee...37bb"), records : ObjectId("4eea...0000c8") } ] }
  • 60. Query with Versioned Events events : { birth : [ { date : ISODate('1928-04-06') }, { date : ISODate('1928-04-16') } ] } db.individual.find( { 'events.birth.date' : ISODate('1928-04-16') } )
  • 61. Records record1 = { _id : ObjectId("4ed8aea7d8562f7d7b"), contributor : ObjectId("4eeab...1537bb"), type : 'birth certificate', thumbnail : BinData(0,"/9j/4AAQSkZJ...."), content : BinData(0,"j6b/Id11lWqs..."), tags : ['NY', 'certified'], description : "John's birth certificate" }
  • 63. MongoDB: Scale built in ๏ Intelligent replication ๏ Automatic partitioning of data (user configurable) ๏ Horizontal Scale ๏ Targeted Queries ๏ Parallel Processing
  • 64. Intelligent Replication (diagram): Node 1 (Secondary), Node 2 (Secondary), Node 3 (Primary); labels: Heartbeat, Replication, Replication
  • 65. Scalable Architecture (diagram): three App Servers, each with a Mongos; three Config Servers; three Shards, each showing Node 1 and a Secondary
  • 66. High Availability in Shards (diagram): Shard = Mongod, or Shard = Primary, Secondary, Secondary
  • 67. Targeted Requests (diagram): Mongos in front of three Shards, request steps numbered 1 to 4
  • 68. Parallel processing (diagram): Mongos in front of three Shards, request steps numbered 1 to 6, with steps 2, 3 and 4 repeated on every Shard
  • 70. Broad Feature Set ๏ Rich query language ๏ Native support for over 12 languages ๏ GeoSpatial ๏ Text search ๏ Aggregation & MapReduce ๏ GridFS (distributed & replicated file storage) ๏ Integration with Hadoop, Solr & more
  • 71. Last Year I presented on Graph in MongoDB http://j.mp/XvJ3dl
  • 76. http://spf13.com http://github.com/spf13 @spf13 Questions? download at mongodb.org
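
The versioned-event query on slide 60 relies on a dotted field path reaching into an array of candidate values. The following is a minimal in-memory sketch of that idea in plain JavaScript, not MongoDB's implementation. The helper names `valuesAt` and `find`, the second sample record, and the use of ISO date strings in place of `ISODate` are all assumptions made for illustration.

```javascript
// In-memory sketch only: shows why one dotted path can match any of
// several stored versions of the same event. Not MongoDB code.

const individuals = [
  {
    AFN: "1XYK-KQJ",
    name: { first: ["john", "johannes"], middle: "peter", last: ["smith", "sandvik"] },
    // Two competing versions of the birth event, as on the slides.
    events: { birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] },
  },
  {
    AFN: "HYPOTHETICAL-2", // made-up record, present only for contrast
    name: { first: ["mary"], middle: "ann", last: ["jones"] },
    events: { birth: [{ date: "1931-01-02" }] },
  },
];

// Collect every value reachable through a dotted path, fanning out
// across arrays met along the way.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // Flatten a final array so ["john", "johannes"] can match "john".
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// Keep documents where every path in the query has at least one
// stored value equal to the wanted value.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([path, wanted]) =>
      valuesAt(doc, path).includes(wanted)
    )
  );
}

const matches = find(individuals, {
  "events.birth.date": "1928-04-16",
  "name.first": "john",
});
console.log(matches.map((d) => d.AFN));
```

Because every version of a data point stays in the document, a query for either birth date finds the same individual, which is the nondestructive behaviour the genealogy slides ask for.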