1. It's not "Never SQL"
NOSQL is simply…
Not Only SQL
NOSQL no-seek-wool n. Describes ongoing
trend where developers increasingly opt for
non-relational databases to help solve their
problems, in an effort to use the right tool for
the right job
4. Trend 2: Connectedness
GGG
Ontologies
RDFa
Folksonomies
Information connectivity
Tagging
Wikis
UGC
Blogs
Feeds
Hypertext
Text
Documents
5. Trend 3: Semi-structured information
• Individualisation of content
  – 1970s salary lists: every element has exactly one job
  – 2000s salary lists: we need many job columns!
• All-encompassing "entire world views"
• Store more data about each entity
• Trend accelerated by the decentralization of content generation
  – Age of participation ("web 2.0")
12. Key-Value Stores
• "Dynamo: Amazon's Highly Available Key-Value Store" (2007)
• Data model:
  – Global key-value mapping
  – Big scalable HashMap
  – Highly fault tolerant (typically)
• Examples:
  – Riak, Redis, Voldemort
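The "big scalable HashMap" idea can be illustrated with a minimal in-memory sketch. Everything here is hypothetical (`TinyKVStore`, `fail_partition`, the partition and replica counts are made up for illustration) and does not reflect how Riak, Redis or Voldemort are actually implemented; it only shows a global key-to-value mapping spread over partitions, with each value copied to more than one partition so a single failure does not lose it.

```python
import hashlib


class TinyKVStore:
    """Toy key-value store (illustrative only): opaque values, hashed partitions."""

    def __init__(self, partitions=4, replicas=2):
        self.partitions = [dict() for _ in range(partitions)]
        self.replicas = replicas

    def _owners(self, key):
        # A stable hash, so the same key always maps to the same partitions.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.partitions)
        return [(first + i) % len(self.partitions) for i in range(self.replicas)]

    def put(self, key, value):
        # Write the value to every replica that owns the key.
        for idx in self._owners(key):
            self.partitions[idx][key] = value

    def get(self, key, default=None):
        # Read from the first replica that still holds the key.
        for idx in self._owners(key):
            if key in self.partitions[idx]:
                return self.partitions[idx][key]
        return default

    def fail_partition(self, idx):
        # Simulate losing one node by wiping its data.
        self.partitions[idx] = {}
```

The store never looks inside a value, which is the source of both the simplicity and the "simplistic data model" weakness: any query other than lookup by key has to be done by the application.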
13. Pros and Cons
• Strengths:
  – Simple data model
  – Great at scaling out horizontally
    • Scalable
    • Available
• Weaknesses:
  – Simplistic data model
  – Poor for complex data
14. Column Family (BigTable)
• Google's "Bigtable: A Distributed Storage System for Structured Data" (2006)
• Data model:
  – A big table, with column families
  – Map-reduce for querying/processing
• Examples:
  – HBase, HyperTable, Cassandra
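As a rough illustration of "a big table, with column families", the sketch below models a table as a sparse map from row key to column family to column qualifier to timestamped values. The class and method names (`TinyColumnFamilyTable`, `put`, `get`, `scan`) and the sample rows are assumptions made for this example, not the API of HBase, HyperTable or Cassandra.

```python
from collections import defaultdict


class TinyColumnFamilyTable:
    """Toy column-family table (illustrative only)."""

    def __init__(self, families):
        # Column families are fixed up front; qualifiers inside them are free-form.
        self.families = set(families)
        # row key -> family -> qualifier -> list of (timestamp, value)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family].setdefault(qualifier, []).append((ts, value))

    def get(self, row, family, qualifier):
        # Return the most recent version of a cell, or None if it was never written.
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def scan(self, start, stop):
        # Rows are visited in sorted key order, which is what makes range scans cheap.
        for row in sorted(self.rows):
            if start <= row < stop:
                yield row
```

Because each row only stores the cells that were actually written, two rows in the same table can carry completely different columns, which is why this model suits semi-structured data.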
15. Pros and Cons
• Strengths:
  – Data model supports semi-structured data
  – Naturally indexed (columns)
  – Good at scaling out horizontally
• Weaknesses:
  – Unsuited for interconnected data
16. Document Databases
• Data model:
  – Collections of documents
  – A document is a key-value collection
  – Index-centric, lots of map-reduce
• Examples:
  – CouchDB, MongoDB
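The document model above can be sketched in a few lines, assuming a hypothetical `TinyCollection` class (not the CouchDB or MongoDB API). Each document is a plain key-value collection, documents in the same collection need not share fields, and a secondary index maps a field value to the ids of the documents that contain it. Indexed values are assumed to be hashable.

```python
import itertools


class TinyCollection:
    """Toy document collection with optional secondary indexes (illustrative only)."""

    def __init__(self):
        self._docs = {}      # document id -> document
        self._indexes = {}   # field name -> {field value -> set of document ids}
        self._ids = itertools.count(1)

    def insert(self, doc):
        doc_id = next(self._ids)
        self._docs[doc_id] = dict(doc)
        # Keep every existing index up to date with the new document.
        for field, index in self._indexes.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(doc_id)
        return doc_id

    def create_index(self, field):
        # Build the index over documents that are already stored.
        index = {}
        for doc_id, doc in self._docs.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(doc_id)
        self._indexes[field] = index

    def find(self, field, value):
        if field in self._indexes:
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        # Without an index, every document has to be inspected.
        return [doc for doc in self._docs.values() if doc.get(field) == value]
```

The fallback branch in `find` shows why these stores are described as index-centric: a query on an unindexed field degrades to a full scan.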
17. Pros and Cons
• Strengths:
  – Simple, powerful data model (just like SVN!)
  – Good scaling (especially if sharding supported)
• Weaknesses:
  – Unsuited for interconnected data
  – Query model limited to keys (and indexes)
    • Map reduce for larger queries
18. Graph Databases
• Data model:
  – Nodes with properties
  – Named relationships with properties
  – Hypergraph, sometimes
• Examples:
  – Neo4j (of course), Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
19. Pros and Cons
• Strengths:
  – Powerful data model
  – Fast
    • For connected data, can be many orders of magnitude faster than RDBMS
• Weaknesses:
  – Sharding
    • Though they can scale reasonably well
    • And for some domains you can shard too!
21. Disclaimer
• I don't hold any sort of copyright on any of the content used, including the photos, logos, text and trademarks. They all belong to their respective individuals and companies.
• I am not responsible for, and expressly disclaim all liability for, damages of any kind arising out of use of, reference to, or reliance on any information contained within these slides.
UGC = User Generated Content.
GGG = Giant Global Graph (what the web will become).
Ontologies are the structural frameworks for organizing information and are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture as a form of knowledge representation about the world or some part of it. The creation of domain ontologies is also fundamental to the definition and use of an enterprise architecture framework.
A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content; this practice is also known as collaborative tagging, social classification, social indexing, and social tagging. Folksonomy, a term coined by Thomas Vander Wal, is a portmanteau of folk and taxonomy.
RDFa (Resource Description Framework in Attributes) is a W3C Recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents. The RDF data-model mapping enables its use for embedding RDF subject-predicate-object expressions within XHTML documents; it also enables the extraction of RDF model triples by compliant user agents.
This is strictly about connected data: joins kill performance there. No bashing of RDBMS performance for tabular transaction processing. The green line denotes the "zone of SQL adequacy".
Fowler points out that KV/Column/Document stores are all aggregates: they're different from graphs because they enforce structure at design time, as an aggregate of data. An aggregate is a clump of data that can be co-located on a cluster instance and which is accessed together: "a fundamental unit of storage which is a rich structure of closely related data: for key-value stores it's the value, for document stores it's the document, and for column-family stores it's the column family. In DDD terms, this group of data is an aggregate."
History: Amazon decided that they always wanted the shopping basket to be available, but couldn't take a chance on an RDBMS, so they built their own. It was a big risk, but the data model is simple and the computing science underpinning it is well known (e.g. consistent hashing, Bloom filters for sensible replication).
+ Massive read/write scale
- Simplistic data model moves heavy lifting into the app tier (e.g. map reduce)
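Consistent hashing, mentioned above as one of the well-known techniques, can be sketched as a hash ring. This is a generic textbook-style illustration, not Dynamo's actual implementation; `HashRing`, `node_for` and the choice of 64 virtual nodes per server are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    # Map any string to a large integer position on the ring.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node appears at several positions so load spreads more evenly.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def remove(self, node):
        self._positions = [p for p in self._positions if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def node_for(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # Walk clockwise to the first position at or after the key, wrapping around.
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]
```

The useful property is that removing a node only reassigns the keys that node owned; every other key keeps its owner, so a failure does not reshuffle the whole data set.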
MongoDB has a reputation for taking liberties with durability to get speed. CouchDB has good multi-master replication, a heritage from Lotus Notes.
People talk about Codd's relational model being mature because it was proposed in 1969, which makes it 42 years old. Euler's graph theory was proposed in 1736, which makes it 275 years old.
Can't easily shard graphs like documents or KV stores. This means that a high-performance graph database is limited to the data set size that a single machine can handle. Replicas can speed things up (and improve availability), but the data set is still limited to a single machine's disk/memory. Some domains can shard easily (e.g. geo, most web apps) using a consistent routing approach and cache sharding; we'll cover that later.