MongoDB is a document-oriented NoSQL database with dynamic schemas and full index support. Documents (roughly equivalent to rows in SQL) are organized into collections (roughly equivalent to tables). MongoDB supports dynamic queries and stores documents, including binary data, as BSON (Binary JSON). The mongo shell provides an interactive JavaScript interface for working with MongoDB databases and collections.
no SQL, no problem development with mongodb and Sinatra
1. no SQL, no problem
development with mongodb and Sinatra
Sam Beam
Onset Corps
2. A Brief History of
Relational Databases*
(*as if you need it)
• “A Relational Model of Data for
Large Shared Databanks” - Edgar
F. Codd, 1970 -- IBM
• “Standard” SQL - Structured
Query Language
• ACID (atomicity, consistency,
isolation, durability)
photo: osti.gov
3. The Relational Data Model
• An engine based on “rules” and “facts”
• Consistency/Isolation self-enforced
• ACID
5. SQL
• History
• 1974 - IBM Research “SEQUEL”
Ingres, UC Berkeley
• 1979 - 1983 First IBM releases
7. SQL
• Purpose
SELECT name FROM emp
WHERE salary > 55000
AND dept = 'sales'
• Simple, set-based, declarative syntax
brochure: 1980-85, computerhistory.org
photo: DEC microcomputer 1983 (128kB RAM), computerweekly.com
8. SQL
• “Standards”
• ANSI SQL-92
• no existing RDBMS in full compliance
photo: IBM Starlink Workstations, 1983, computerweekly.com
9. SQL
• Extensions
• Procedural languages (PL/SQL, T-SQL, PL/pgSQL, etc.)
• Storage type extensions (BLOB, XML, Java)
photo: IBM AS/400, computerweekly.com
10. SQL Extensions
SQL/XML
Store documents as CLOB or mapped to columns
Tools for
‣Annotating
‣Indexing
‣Searching (XPath)
‣Validating (DTD)
XML documents directly
SELECT XMLElement
(name emp,
XMLForest(last_name || ',' ||
first_name AS fullname, salary) )
FROM emp;
<emp>
<fullname>Tiger, Scott</fullname>
<salary>10000</salary>
</emp>
<emp>
<fullname>Smith, John</fullname>
<salary>12000</salary>
</emp>
12. SQL Extensions
Example of specialized, sparse data
Geospatial/Vector
Several choices - all XML
• Geography Markup Language (GML)
• Keyhole Markup Language
• GPS eXchange Format (GPX)
• Vector Markup Language (VML)
• Scalable Vector Graphics (SVG)
• LandXML
• X3D
• VRML
13. SQL Extensions
Example of specialized, sparse data
Geospatial/Vector
SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) as track_name,
mdsys.sdo_geometry(CASE WHEN (SELECT COUNT(*)
FROM TABLE(XMLSequence(EXTRACT(VALUE(allSegments),
'trk/trkseg',
'xmlns="http://www.topografix.com/GPX/1/1"')))) > 1
THEN 3006
ELSE 3002
END,
8307,
NULL,
GetElemInfoFromXML(EXTRACT(VALUE(allSegments),'trk/trkseg','xmlns="http://www.topografix.com/GPX/1/1"')),
CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://
when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://
when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="h
1/1"'))
end ordinate
FROM (select level as rin from dual connect by level < 4) r,
TABLE(XMLSequence(EXTRACT(VALUE(allSegments),'trk/trkseg/trkpt','xmlns="http://www.topogr
) as mdsys.sdo_ordinate_array
)) as geom
FROM GPX2 g,
TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1,
TABLE(XMLSequence(EXTRACT(VALUE(t1),'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']','xmlns="http://
1/1"'))) allSegments;
15. Typical RDBMS Schema
Issues
• Conflicts
• Downtime
• Deployment
• Scaling
“For purposes of flexibility, the Magento database heavily utilizes an Entity-Attribute-Value (EAV) data model. As is often the case, the cost of flexibility is complexity - Magento is no exception. The process of manipulating data in Magento is often more “involved” than that typically experienced using traditional relational tables.”
http://www.magentocommerce.com/wiki/development/
image credit: www.magentocommerce.com
16. Entity-Attribute-Value
When you have
Unknown Unknowns
A “thing” and “properties” engine
(instead of “rules” and “facts”)
Often found in:
•e-commerce
•medical records
•event logging
•science
image: Yale Univ. School of Medicine, senselab.med.yale.edu
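A minimal sketch of the “thing and properties” idea, assuming a tiny in-memory table rather than a real RDBMS. The rows, attribute names, and the pivot/unpivot helpers are hypothetical and exist only for illustration. It shows the pivot between the physical schema (one row per entity/attribute/value triple) and the logical schema (one record per entity), and why every value ends up as an untyped string.

```ruby
# Physical schema: one row per (entity, attribute, value) triple.
# All rows and attribute names are made up for illustration.
EAV_ROWS = [
  { entity: 1, attribute: "name",  value: "Widget" },
  { entity: 1, attribute: "price", value: "9.99" },
  { entity: 2, attribute: "name",  value: "Gadget" },
  { entity: 2, attribute: "color", value: "red" }
].freeze

# Pivot: fold the triples into one logical record per entity.
# Note that this walks every row (a full scan) and that all values
# stay strings (no type control).
def pivot(rows)
  rows.each_with_object({}) do |row, records|
    (records[row[:entity]] ||= {})[row[:attribute]] = row[:value]
  end
end

# Unpivot: explode logical records back into triples.
def unpivot(records)
  records.flat_map do |entity, attrs|
    attrs.map { |attribute, value| { entity: entity, attribute: attribute, value: value } }
  end
end

RECORDS = pivot(EAV_ROWS)
puts RECORDS.inspect
```

Each entity carries only the attributes it actually has, which is the flexibility EAV buys; the cost is that every read and write has to go through a pivot like this one.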
18. Logical Progression
<sarcasm>
new datatype “xml_universe”
XML serialized object
DTD containing all the known attributes of
"everything"
Schema:
CREATE TABLE everything (
xml_universe NOT NULL
);
</sarcasm>
20. Enter the Dragon
“NoSQL”
NoSQL is a movement promoting a
loosely defined class of non-relational
data stores that break with a long history
of relational databases.
-- Wikipedia
22. “NoSQL”
• Many techniques
• Many weapons
• Many use cases
30. Paradigm Change
CAP Theorem
“One can only have two of Consistency, Availability, and
tolerance to network Partitions at the same time”
If the network is broken, your database won’t work.
The network is going to break.
37. The Candidates
Name            Storage Type      License      Implementation
Cassandra       ColumnFamily *    Apache 2.0   Java
CouchDB         Document          Apache 2.0   Erlang
HBase           ColumnFamily *    Apache 2.0   Java
Redis           Key/Value         BSD          C
Tokyo Cabinet   Key/Value         LGPL         C
Voldemort       Key/Value         Apache 2.0   Java
Memcached       Key/Value         BSD          C
MongoDB         Document (BSON)   AGPL 3.0     C++
39. • Document-oriented
• Dynamic queries
• Full dynamic index support
• Efficient binary large-object storage
• Built for speed
• Replication and Auto-failover
40. Installation
‣ Download source or binary for OS X, Linux, Windows
http://www.mongodb.org/
‣ Make data directory
$ mkdir /some/path/mongodb
‣ Run!
$ bin/mongod --dbpath=/some/path/mongodb
43. Documents
‣Always contains key _id
‣Creating Relationships:
subdocument, shared key, or DBRef
‣Native storage and transfer : BSON
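The three ways of creating relationships can be sketched with plain Ruby hashes, assuming an in-memory stand-in for the database. The collection names, ids, fields, and the dereference/author_name helpers are hypothetical; only the "$ref" and "$id" keys follow the DBRef convention.

```ruby
# In-memory stand-in for a database: collection name => { _id => document }.
# Collection names, ids, and fields are hypothetical.
DB = {
  "authors" => { 1 => { "_id" => 1, "name" => "Joe Example" } },
  "posts"   => {}
}

# 1. Subdocument: the related data is embedded in the parent document.
DB["posts"][10] = {
  "_id" => 10, "title" => "Embedded",
  "author" => { "name" => "Joe Example" }
}

# 2. Shared key: the post stores the author's _id; the application joins.
DB["posts"][11] = { "_id" => 11, "title" => "Shared key", "author_id" => 1 }

# 3. DBRef: a small subdocument naming the target collection and _id.
DB["posts"][12] = {
  "_id" => 12, "title" => "DBRef",
  "author" => { "$ref" => "authors", "$id" => 1 }
}

# Follow a DBRef-style reference by hand.
def dereference(db, ref)
  db.fetch(ref["$ref"]).fetch(ref["$id"])
end

# Resolve the author's name whichever relationship style the post uses.
def author_name(db, post)
  author = post["author"]
  if author && author.key?("$ref")
    dereference(db, author)["name"]
  elsif author
    author["name"]
  else
    db["authors"].fetch(post["author_id"])["name"]
  end
end

DB["posts"].each_value { |post| puts "#{post['title']}: #{author_name(DB, post)}" }
```

Embedding needs no second lookup; the shared key and DBRef styles both cost an extra fetch, with the DBRef also recording which collection to look in.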
44. JSON
A collection of name/value pairs.
[object, record, struct, dictionary, hash table, keyed list, associative array]
An ordered list of values.
[array, vector, list, sequence]
http://json.org/
45. BSON
BSON is a binary encoded serialization
of JSON-like documents.
http://bsonspec.org/
http://www.mongodb.org/display/DOCS/BSON
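To make “binary encoded serialization” concrete, here is a hand-rolled sketch that encodes a document whose values are all strings, following the layout described at bsonspec.org. It is not the driver's encoder; the helper names are hypothetical, and every other BSON type is left out.

```ruby
# Encode one string element: type byte 0x02, key as a C string,
# int32 length (including the trailing NUL), the bytes, then NUL.
def bson_string_element(key, value)
  bytes = value.encode("UTF-8").b
  "\x02".b + key.b + "\x00".b +
    [bytes.bytesize + 1].pack("l<") + bytes + "\x00".b
end

# Encode a document: int32 total size, the elements, then a NUL terminator.
# The size counts its own 4 bytes and the terminator, hence "+ 5".
def bson_document(hash)
  body = hash.map { |key, value| bson_string_element(key.to_s, value) }.join.b
  [body.bytesize + 5].pack("l<") + body + "\x00".b
end

HELLO_BSON = bson_document("hello" => "world")
puts HELLO_BSON.bytesize                     # total size in bytes
puts HELLO_BSON.unpack("C*").map { |b| format("%02x", b) }.join(" ")
```

The length prefixes are what let a reader skip over elements without parsing them, which plain JSON text cannot offer.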
46. JSON/BSON Example
{
author : "Joe Example",
created : Date('03-28-2010'),
title : "My latest blog post",
tags : [ "example", "joe", "testing"],
comments : [
{ author : 'jim', comment : 'I disagree' },
{ author : 'nancy', comment : 'Good post' }
]
}
48. mongo shell
$ mongo
MongoDB shell version: 1.5.0-pre-
url: test
connecting to: test
type "help" for help
> show dbs
admin
shorty
test
> use test
switched to db test
> show collections
foo
fs.chunks
fs.files
system.indexes
>
71. and Ruby
GridFS
Store large files
Transparently chunks
Incremental delivery (video streaming)
72. and Ruby
GridFS
@grid = Grid.new(@db)
# Saving IO data and including the optional filename
image = File.open("kitty.jpg")
file_id = @grid.put(image, :filename => "kitty.jpg")
....
@grid = Grid.new(@db)
# writing file with given _id to HTTP
if img = @grid.get(Mongo::ObjectID::from_string(params[:file_id]))
headers 'Content-Type' => img.content_type
img.read
end
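What “transparently chunks” means can be sketched without a server. TinyGrid below is a hypothetical in-memory stand-in, not the driver's Grid class: its files and chunks attributes loosely mirror the fs.files and fs.chunks collections, and the 256k default follows the chunk size given in the speaker notes.

```ruby
CHUNK_SIZE = 256 * 1024 # 256k per chunk, per the speaker notes

# Hypothetical in-memory stand-in for the fs.files / fs.chunks collections.
class TinyGrid
  attr_reader :files, :chunks

  def initialize(chunk_size = CHUNK_SIZE)
    @chunk_size = chunk_size
    @files = {}
    @chunks = []
    @next_id = 0
  end

  # Split the data into numbered chunks and record file-level metadata.
  def put(data, filename:)
    data = data.b
    file_id = (@next_id += 1)
    @files[file_id] = { filename: filename, length: data.bytesize, chunk_size: @chunk_size }
    offset = 0
    n = 0
    while offset < data.bytesize
      @chunks << { files_id: file_id, n: n, data: data.byteslice(offset, @chunk_size) }
      offset += @chunk_size
      n += 1
    end
    file_id
  end

  # Yield chunks in order, so a caller can deliver them incrementally.
  def each_chunk(file_id)
    return enum_for(:each_chunk, file_id) unless block_given?

    @chunks.select { |chunk| chunk[:files_id] == file_id }
           .sort_by { |chunk| chunk[:n] }
           .each { |chunk| yield chunk[:data] }
  end

  # Reassemble the whole file from its chunks.
  def get(file_id)
    each_chunk(file_id).to_a.join.b
  end
end

grid = TinyGrid.new(4) # tiny chunk size so the split is visible
id = grid.put("hello world", filename: "greeting.txt")
grid.each_chunk(id) { |part| puts part.inspect }
```

Because each chunk is its own record, a reader can start sending the first chunk before the rest has been fetched, which is the basis for incremental delivery.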
73. and Ruby
Other interesting things
Capped collections (think memcached)
Multikeys and Full-text search
Auto-sharding
Replica Sets
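The FIFO behaviour of a capped collection can be sketched in a few lines. CappedList is a hypothetical in-memory class, and it caps by document count for simplicity, which is an assumption of this sketch: a real capped collection is bounded by its size on disk.

```ruby
# Hypothetical in-memory sketch of a capped collection's FIFO behaviour.
# Capping by document count is a simplification made for this example.
class CappedList
  def initialize(max_docs)
    raise ArgumentError, "max_docs must be positive" unless max_docs.positive?

    @max_docs = max_docs
    @docs = []
  end

  # Append a document; once full, the oldest documents fall off the front.
  def insert(doc)
    @docs.push(doc)
    @docs.shift while @docs.size > @max_docs
    self
  end

  # Documents in insertion order, oldest first.
  def to_a
    @docs.dup
  end
end

log = CappedList.new(3)
%w[boot login query logout].each { |event| log.insert("event" => event) }
puts log.to_a.inspect
```

This is why the notes pair capped collections with logging and caches: old entries age out on their own, with no cleanup job.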
74. and Ruby
Eventual Consistency
can my use case tolerate
• stale reads?
• reading values out of order?
• not reading my own writes?
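The questions above can be tried out on a toy model. LaggyReplicaPair is a hypothetical in-memory class in which writes go to a primary and reach the replica only when replicate! is called; it illustrates a stale read and a client failing to read its own write, and makes no claim about how MongoDB replication is implemented.

```ruby
# Hypothetical toy model: one primary, one replica that lags behind it.
class LaggyReplicaPair
  def initialize
    @primary = {}
    @replica = {}
    @pending = [] # writes accepted by the primary, not yet replicated
  end

  def write(key, value)
    @primary[key] = value
    @pending << [key, value]
  end

  def read_primary(key)
    @primary[key]
  end

  def read_replica(key)
    @replica[key]
  end

  # Apply the next `count` pending writes to the replica, in order.
  def replicate!(count = @pending.size)
    @pending.shift(count).each { |key, value| @replica[key] = value }
  end
end

pair = LaggyReplicaPair.new
pair.write("balance", 100)
pair.replicate!
pair.write("balance", 80)
puts pair.read_replica("balance") # stale: the replica has not caught up
pair.replicate!
puts pair.read_replica("balance") # converged once no new updates arrive
```

If the application reads from the replica right after its own write, it sees the old value; whether that is acceptable is exactly the use-case question the slide asks.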
E.F. Codd asserted that, mathematically, no commercial database conformed to his true Relational model
Predicates, predicate variables, relations, tuples, superkeys, finite projections
Atomicity, Consistency, Isolation, Durability
IBM’s first SQL release: System R
paper: 1974
Bring attention to EAV box!
originated with the concept of "association lists" AKA key/value pairs
A “simple” way to attach arbitrary attributes and values to records in a normalized RDBMS
"Physical schema" (actual storage structure) is radically different from the "logical schema", the way users and applications see it.
PIVOTING: Converting logical schema to/from physical schema
Note: full scan, no type control
combination - Facebook “Hive”, based on Hadoop, with “QL”
mostly Hadoop, true
A column family is a container for columns, analogous to the table in a relational system.
CouchDB - REST interface, JSON response
Redis - in-memory, journaled changes to data stored to disk
Tokyo Cabinet - update of GDBM
Hbase - the Hadoop Database, modeled on Google BigTable
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999.
assigning specific documents to “thing” and “fred”
Using DBRef to create a reference between collections
Showing that the $ref object in fred is tied to the actual record in things collection
find always returns a Mongo::Cursor that is iterable / The query doesn’t get run until you actually attempt to retrieve data from a cursor. / Cursors have a to_a method.
find_one returns a single object
where uses any valid JS expression
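The lazy behaviour of find described in these notes can be imitated in plain Ruby. SketchCursor, sketch_find, and sketch_find_one are hypothetical names, not the driver's API; the sketch assumes an in-memory array of documents and equality-only selectors, and counts how often the "query" actually runs.

```ruby
# Hypothetical stand-in for a driver cursor: nothing is evaluated until
# the caller starts iterating.
class SketchCursor
  include Enumerable

  attr_reader :executions

  def initialize(documents, selector)
    @documents = documents
    @selector = selector
    @executions = 0
  end

  # Running the query happens here, on first retrieval, not in the constructor.
  def each
    return enum_for(:each) unless block_given?

    @executions += 1
    @documents.each do |doc|
      yield doc if @selector.all? { |key, value| doc[key] == value }
    end
  end
end

# Returns a cursor; no matching is done yet.
def sketch_find(documents, selector = {})
  SketchCursor.new(documents, selector)
end

# Returns a single document (or nil) rather than a cursor.
def sketch_find_one(documents, selector = {})
  sketch_find(documents, selector).first
end

COMMENTS = [
  { "author" => "jim",   "comment" => "I disagree" },
  { "author" => "nancy", "comment" => "Good post" }
].freeze

cursor = sketch_find(COMMENTS, "author" => "jim")
puts cursor.executions # nothing has run yet
puts cursor.to_a.inspect
puts cursor.executions
```

Including Enumerable is what gives the sketch its to_a, matching the note that cursors can be turned into arrays.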
MapReduce - patented software framework introduced by Google to support distributed computing on large data sets on clusters of computers.
CouchDB - lacks native conditionals, but uses JavaScript anyway
is SQL that much better?
specify the map and reduce functions in JavaScript, as strings
reduce receives array of values for each element emitted by map
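The shape of a map/reduce job can be sketched in plain Ruby. In MongoDB the two functions are JavaScript passed as strings; here Ruby lambdas stand in for them, and the map_reduce helper, the sample posts, and the emit callback are hypothetical. The sketch calls reduce once per key, which is a simplification.

```ruby
# Run map over every document, group emitted values by key,
# then hand each key's array of values to reduce.
def map_reduce(documents, map, reduce)
  emitted = Hash.new { |hash, key| hash[key] = [] }
  emit = ->(key, value) { emitted[key] << value }
  documents.each { |doc| map.call(doc, emit) }
  emitted.each_with_object({}) do |(key, values), out|
    out[key] = reduce.call(key, values)
  end
end

# map: emit (tag, 1) for every tag on a post.
TAG_MAP = ->(doc, emit) { doc["tags"].each { |tag| emit.call(tag, 1) } }

# reduce: receives the array of values emitted for one key.
TAG_REDUCE = ->(_key, values) { values.sum }

POSTS = [
  { "title" => "first",  "tags" => %w[example joe] },
  { "title" => "second", "tags" => %w[example testing] }
].freeze

TAG_COUNTS = map_reduce(POSTS, TAG_MAP, TAG_REDUCE)
puts TAG_COUNTS.inspect
```

The SQL counterpart would be a GROUP BY with COUNT, which is the comparison the “is SQL that much better?” note invites.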
chunks = 256k, auto-sharded
GridFilesystem - emulates a filesystem - write, open, close, delete, etc
GridFS saves whatever metadata - GridFS is a specification for mapping chunks->files
CC: FIFO - Logging, caches, auto-archiving
MK: auto-index on any array values
FTS : split text into array, use MK - no native stemming, bulk index (yet)
AS : beta. Uses router (godfather), config servers (consigliere), mongod instances (map/reduce recommended)
RP: master/slave - to be replaced by Replica Sets in 1.6 - Eventual Consistency
Amazon popularized the concept of “Eventual Consistency”. Their definition is: the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.
MongoRecord = 10gen’s Original OM, ActiveRecord-ish, works w/ Rails
mongomapper - datamapper-ish v0.7 - in production
Candy - Candy's goal is to provide the simplest possible object persistence for the MongoDB database. By "simple" we mean "nearly invisible." Candy doesn't try to mirror ActiveRecord or DataMapper.
(alpha 0.2)