NOSQL
Eric Marshall
April 7th, 2016
For LOPSA NJ
I’M ERIC
I work at Airisdata and we are hiring!
http://airisdata.com
WHAT’S WRONG WITH
RELATIONAL DATABASES?
Nothing :)
Google & Amazon (followed by web tech)
Higher Performance
Larger Scale
Lower Cost
New Capabilities
BUT FIRST A POOR
METAPHOR
Cars!
What leads to better performance?
• Bigger engine, remove excess weight/features
• Better controls/steering/braking
WAIT, I WANT MORE
PERFORMANCE!
We can go faster!
BUT YOU CAN’T MOVE
IKEA FURNITURE?!?
Feature loss?
Is it a car?
WELL, WE CAN SOLVE
THAT PROBLEM
Also, it has a very powerful engine
THE CHALLENGE OF
PERFORMANCE
<add wisdom>
A long-winded way of saying that ‘NoSQL’ is a poor label
SO WHAT IS NOSQL
‘UM, NON-RELATIONAL’
No good definitions to be found
For me:
• Scales horizontally
• Forgoes the ‘old school’ SQL relations, concurrency, etc.
• “exactly like SQL (except where it’s not)”
• Trades in or reimagines most SQL features for ‘something else’
• Developer friendly/developer driven
• Schema loose / semi-structured
• Usually open source and usually associated with web infrastructure
• Ignoring older non-relational databases of the past
• Scales horizontally (usually) – did I mention that?
• Can be ‘glued’ to other data stores
Don’t like mine? Create your own definition :)
SIDEBAR OBSERVATION
ON SOFTWARE TEAMS
Software teams tied to a large central relational
database (think 1990s/2000s)
• A large relational database ‘glues’ teams and apps together,
which leads to complex databases and dbadmins
Vs.
Software teams using NoSQL
• Independent except at the edges (input/logs &
output/reports)
FOWLER’S IMPEDANCE
MISMATCH
Java objects
vs.
rows in tables
What I have called Fowler’s Impedance is mentioned in his and Sadalage’s book NoSQL Distilled
Most NoSQL beasties can store data in more interesting ways
CAP
Here because management loves to chat endlessly about
it.
C is for Consistency
• “This is equivalent to requiring requests of the distributed shared
memory to act as if they were executing on a single node, responding
to operations one at a time.”
• Most systems are not (exactly)
A is for Availability
• “For a distributed system to be continuously available, every request
received by a non-failing node in the system must result in a
response. …even when severe network failures occur, every request
must terminate.”
• I think everyone here understands this one ;)
P is for Partition Tolerance
• “In order to model partition tolerance, the network will be allowed to
lose arbitrarily many messages sent from one node to another.”
Quotes from “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant
Web Services” by Gilbert and Lynch
YOU CAN HAVE TWO
Consistency
• The system may shut down or take a day to answer, but you will have
the correct answer.
Availability
• The system will always answer; you might get your checking balance
from last year instead of today’s balance, but you will get an answer.
Like asking a research group or asking folks in the pub.
Can’t have both :(
One can accept the write not knowing if all the servers
are up OR you can refuse until you know all the servers
are up. Partition Tolerance is mandatory in distributed
systems!
https://codahale.com/you-cant-sacrifice-partition-tolerance/
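The accept-or-refuse choice above can be sketched as a toy decision function. This is only an illustration of the trade-off, not how any real database decides; `handle_write` and its parameters are hypothetical names.

```python
def handle_write(reachable: int, total: int, prefer: str) -> str:
    """Decide what to do with a write when some servers may be unreachable.

    prefer="consistency": refuse unless every server is known to be up.
    prefer="availability": accept anyway and reconcile later.
    (Both names are assumptions made for this sketch.)
    """
    if prefer == "consistency" and reachable < total:
        return "refused"
    return "accepted"


# During a partition (2 of 3 servers reachable) the two preferences diverge.
print(handle_write(2, 3, "consistency"))   # refused
print(handle_write(2, 3, "availability"))  # accepted
```

With all servers reachable, both preferences accept the write; the difference only shows up when part of the network is down.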
ONLY TWO, THE FINE
PRINT
Only two at any moment in time :)
For some systems you can choose different pairs for
each operation (Cassandra, Riak).
WHY WOULD ANYONE
BE INCONSISTENT?
Speed while highly concurrent
• “good now is better than perfect later”
• i.e. don’t block
Handling “partition cases”, i.e. part of the system/network
is down!
DB CHEMISTRY – MORE
BUZZ
Is it ACID or BASE?
Atomicity, Consistency, Isolation, Durability
Basically Available, Soft-state, Eventually consistent
See “The Transaction Concept: Virtues and Limitations” by Jim Gray
HOW TO DISTRIBUTE
THE DATA?
Option 1: shard
Option 2: replicate
Option 3: do both!
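A minimal sketch of the three options, assuming a fixed list of servers and simple modulo hashing. The node names and the helpers `shard_for` and `replicas_for` are hypothetical; real systems use more careful placement schemes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical servers


def _bucket(key: str) -> int:
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % len(NODES)


def shard_for(key: str) -> str:
    """Option 1 (shard): each key lives on exactly one node."""
    return NODES[_bucket(key)]


def replicas_for(key: str, copies: int = 3) -> list:
    """Option 3 (both): shard first, then copy to the next nodes in the list."""
    start = _bucket(key)
    return [NODES[(start + i) % len(NODES)] for i in range(copies)]


# Option 2 (replicate only) is the degenerate case: every node holds every key.
print(shard_for("user:42"), replicas_for("user:42"))
```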
WHAT WOULD LINNAEUS
SAY?
Key-Value
https://en.wikipedia.org/wiki/Linnaean_taxonomy
Graph DB
Document
Columnar (aka BigTab
Disclaimer: heavy overlap
COLUMNAR STORES
Inspired by Google’s Bigtable
Funky row/column setups
COLUMNAR EXAMPLES
http://db-engines.com/en/ranking_trend/wide+column+store
KEY-VALUE STORES
Designed for
• Speed (even memory-only)
• High load
• Global data model of key-values (surprise!)
• Ring partition and replication
KEY VALUE EXAMPLES
http://db-engines.com/en/ranking_trend/key-value+store
DOCUMENT STORES
Similar to key-value but the value is a document!
Document is stored in JSON (or similar)
Flexible schema
Some support keys/references/indices
{
  "date": [2016, 4, 1],
  "booktitle": "The Hitchhiker's Guide to the Galaxy",
  "author": "Douglas Adams"
}
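To make "the value is a document" concrete, here is a toy in-memory sketch. It is not any particular product's API; `put` and `find` are hypothetical helpers, and the scan in `find` stands in for what a real store would do with an index.

```python
import json

store = {}  # key -> JSON text, like a key-value store whose values are documents


def put(key: str, document: dict) -> None:
    """Serialize the document and file it under its key."""
    store[key] = json.dumps(document)


def find(field: str, value) -> list:
    """Return keys of documents whose top-level field equals value (full scan)."""
    return [k for k, raw in store.items() if json.loads(raw).get(field) == value]


put("book-1", {"date": [2016, 4, 1],
               "booktitle": "The Hitchhiker's Guide to the Galaxy",
               "author": "Douglas Adams"})
put("book-2", {"booktitle": "NoSQL Distilled"})  # flexible schema: no author, no date

print(find("author", "Douglas Adams"))
```

The second document has different fields from the first, which is the flexible-schema point: nothing had to be declared up front.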
DOCUMENT EXAMPLES
http://db-engines.com/en/ranking_trend/document+store
GRAPH DATABASES
Remember your data structures class in college?
Edges and vertices – both can hold data
Reduces tough SQL queries to simple graph queries
Easier to model – ‘matches the whiteboard’
Relationships between vertices are first class
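A small sketch of why relationship queries get simpler: a "friends of friends" question that would take self-joins in SQL becomes a short traversal. The data, the `GRAPH` layout, and `within_hops` are made up for illustration and do not reflect any graph database's query language.

```python
from collections import deque

# vertex -> {neighbor: data stored on the edge}
GRAPH = {
    "alice": {"bob": {"since": 2010}, "carol": {"since": 2014}},
    "bob": {"dave": {"since": 2012}},
    "carol": {"dave": {"since": 2015}},
    "dave": {},
}


def within_hops(start: str, max_hops: int) -> set:
    """Breadth-first walk: every vertex reachable in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if distance[vertex] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in GRAPH[vertex]:
            if neighbor not in distance:
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return {v for v, d in distance.items() if d > 0}


print(sorted(within_hops("alice", 2)))
```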
GRAPH DB EXAMPLES
http://db-engines.com/en/ranking_trend/graph+dbms
HBASE NoSQL on top of Hadoop
SITS ON TOP OF HDFS
Name nodes
Data nodes
Replication
And the rest of that whole megillah
Column-oriented
Handles ‘wide’ ‘sparse’ tables well
Fault tolerant
Supports Java, REST, Avro and Thrift
All operations are atomic at the row level (via write-ahead logs)
KIND OF SQL
Key – values
Keys are arbitrary strings
Values are an entire row of data
No joins
Apache Phoenix
• JDBC interface
COLUMN FAMILIES
Column’s full name = family name & column qualifier
Each column family’s performance is configured independently!
REGIONS
Looks like shards – different key ranges per box, no overlap
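The "different key ranges per box, no overlap" idea can be sketched as a sorted list of range start keys plus a binary search. The split points and box names below are invented for the example; this is a sketch of range partitioning in general, not HBase's actual lookup code.

```python
import bisect

# Hypothetical split points: region i holds row keys in [start_i, start_{i+1}).
REGION_STARTS = ["", "g", "n", "t"]
REGION_SERVERS = ["box-1", "box-2", "box-3", "box-4"]


def server_for(row_key: str) -> str:
    """Find the one region whose key range contains row_key."""
    index = bisect.bisect_right(REGION_STARTS, row_key) - 1
    return REGION_SERVERS[index]


for key in ("apple", "g", "melon", "zebra"):
    print(key, "->", server_for(key))
```

Because the ranges are sorted and do not overlap, every row key maps to exactly one box, and neighboring keys tend to land on the same box.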
CASSANDRA Tunable NoSQL
CAP WITH QUORUMS
KNOB TWEAKING
Symmetric / peer to peer
Linearly scalable
Replication
Eventual consistency
Partitioning
CAP WITH QUORUMS
KNOB TWEAKING
Some systems choose per event!
Three knobs:
• replication amount,
• how many successful writes == “your write to the database is done!”,
• how many successful reads out of a full set == “here is your data”
The higher the values, the longer the wait...
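The three knobs are often written N (replicas), W (write acknowledgements) and R (read responses). A common rule of thumb from Dynamo-style systems, which is not stated on the slide and is added here as an assumption, is that reads are guaranteed to see the latest acknowledged write when R + W > N. The function names below are hypothetical.

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    return r + w > n


def write_is_done(acks: int, w: int) -> bool:
    """The client is told 'done' once W replicas have acknowledged."""
    return acks >= w


print(quorum_overlaps(n=3, w=2, r=2))  # overlapping quorums
print(quorum_overlaps(n=3, w=1, r=1))  # fast, but a read may miss the newest write
```

Raising W or R buys stronger guarantees at the cost of waiting on more replicas, which is the "longer the wait" trade-off.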
GOSSIP AND PEERING
Who’s up?
Passing requests
Handling missing nodes
DATA
ColumnFamilies
Keys and Values
Speed via appending data and timestamps
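A toy sketch of "speed via appending data and timestamps": writes never update in place, they only append, and a read picks the version with the newest timestamp. The class and method names are hypothetical, the counter stands in for a real clock, and compaction and on-disk layout are left out entirely.

```python
import itertools


class AppendOnlyStore:
    def __init__(self):
        self._log = []                    # writes are only ever appended
        self._clock = itertools.count(1)  # stand-in for a real timestamp source

    def put(self, key, value, timestamp=None):
        ts = next(self._clock) if timestamp is None else timestamp
        self._log.append((key, ts, value))

    def get(self, key):
        """Return the value with the highest timestamp, or None if absent."""
        versions = [(ts, value) for k, ts, value in self._log if k == key]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]


store = AppendOnlyStore()
store.put("balance", 100)
store.put("balance", 250)
store.put("balance", 5, timestamp=0)  # a late-arriving, older write
print(store.get("balance"))
```

The stale write is still appended cheaply, but it loses at read time because its timestamp is older.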
KEY VALUE DYNAMO
Replication
REST/Protocol Buffers for queries
Tunable consistency
RIAK
simple interface, high write-availability, linear scaling
REST API via HTTP – PUT, GET, DELETE, POST, etc.
Or Protobufs for quicker serialized data
‘hundreds of nodes’
DISTRIBUTED
Consistent hashing, vector clocks, sloppy quorums, virtual nodes (not machines
but lightweight processes; more like having eggs in many baskets, which makes it
easier to give the eggs to other folks during a failure), hinted handoff (“please pass along”),
replication.
Request -> riak
             |
    <- ask other nodes ->
    |                   |
virt node ->       virt node ->
    |                   |
data store         data store
And then return answers back up the stack
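Consistent hashing with virtual nodes can be sketched as a sorted ring of hash points. This is a generic textbook version under stated assumptions, not Riak's actual implementation; `Ring`, `owner`, and the `server#i` naming of virtual nodes are invented for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, servers, vnodes_per_server=8):
        # Each server claims several points ("virtual nodes") on the ring.
        self._points = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes_per_server)
        )
        self._hashes = [h for h, _ in self._points]

    def owner(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[index][1]


full = Ring(["a", "b", "c"])
degraded = Ring(["a", "b"])  # server "c" has failed
moved = [k for k in map(str, range(100)) if full.owner(k) != degraded.owner(k)]
print(len(moved), "of 100 keys changed owner")
```

Only keys that belonged to the failed server change owner, and they are spread over the survivors rather than dumped on one neighbor, which is the "eggs in many baskets" point.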
SERVERS?
“just add more” servers
Ring architecture – all nodes are peers
gossip protocols
KEYS AND BUCKETS
Riak can create them automatically (and return the key to
you)
http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true
^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream
^ better for huge sets of data
You can store your code in a bucket!
LINKS
curl blah -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'
Link walking
^ can create other structures
HOMEWORK AND OTHER
READINGS
GENERAL
Brewer’s conjecture
• https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
Vogels’ thoughts on eventual consistency
• http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Old school techniques for “almost perfect” systems: “The
Transaction Concept: Virtues and Limitations” by Jim Gray
• http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
ACID defined: Haerder and Reuter, “Principles of Transaction-Oriented Database Recovery”
• http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83.p
All your base: Dan Pritchett, “BASE: An Acid Alternative”
• http://queue.acm.org/detail.cfm?id=1394128
NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER
READINGS CONT’D
Google’s Bigtable
• http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER
READINGS CONT’D
• A Little Riak Book by Eric Redmond
– http://www.littleriakbook.com/
• Nice video on system details on Safari
by Justin Sheehy
– https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook
– http://www.riakhandbook.com/
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber and Eifrem
• Mostly about Neo4j, uses Cypher throughout
RELATIONAL DATABASE
EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms

Editor's Notes

  1. What improves auto performance? A bigger engine and less weight, which lead to better brakes, steering, etc. (better tools to manage the power) and better safety systems. That in turn leads to a vehicle that requires more support. A Formula 1 car uses 18,000 liters of air per minute (you use 25 liters of air per minute to move a bicycle).