Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
When to NoSQL and When 
to Know SQL 
Simon Elliston Ball 
Head of Big Data 
@sireb 
#noSQLknowSQL 
http://nosqlknowsql.io
what is NoSQL? 
SQL 
Not only SQL 
Many many things 
NoSQL 
No, SQL
before SQL 
files 
multi-value 
ur… hash maps?
after SQL 
everything is relational 
ORMs fill in the other data structures 
scale up rules 
data first design
and now NoSQL 
datastores that suit applications 
polyglot persistence: the right tools 
scale out rules 
APIs not EDWs
why should you care? 
data growth 
rapid development 
fewer migration headaches… maybe 
machine learning 
social
big bucks. 
O’Reilly 2013 Data Science Salary Survey
So many NoSQLs…
document databases
document databases 
rapid development 
JSON docs 
complex, variable models 
known access pattern
document databases 
learn a very new query language 
denormalize 
document 
form 
joins? JUST DON’T 
http://www.sarahmei.c...
document vs SQL 
what can SQL do? 
query all the angles 
sure, you can use blobs… 
… but you can’t get into them
documents in SQL 
SQL xml fields 
mapping xquery paths is painful 
native JSON 
but still structured
query everything: search 
class of database database 
full-text indexing
you know google, right… 
range query 
span query 
keyword query
you know the score 
scores 
"query": { 
"function_score": { 
"query": { 
"match": { "title": "NoSQL"} 
}, 
"functions": [ ...
SQL knows the score too 
scores 
declare @origin float = 0; 
declare @delay_weeks float = 4; 
SELECT TOP 10 * FROM ( 
SELE...
you know google, right… 
more like this: instant tf-idf 
{ 
"more_like_this" : { 
"fields" : ["name.first", "name.last"], ...
Facets
Facets
Facets 
SQL: 
x lots 
SELECT a.name, count(p.id) FROM 
people p 
JOIN industry a on a.id = p.industry_id 
JOIN people_keyw...
Facets 
Elastic search: 
{ 
"query": { 
“query_string": { 
“default_field”: “content”, 
“query”: “keywords” 
} 
}, 
"facet...
logs 
untyped free-text documents 
timestamped 
semi-structured 
discovery 
aggregation and statistics
key: value 
close to your programming model 
distributed map | list | set 
keys can be objects
SQL extensions 
hash types 
hstore
SQL and polymorphism 
inheritance 
ORMs hide the horror
turning round the rows 
columnar databases 
physical layout matters
turning round the rows 
key value type 
1 A Home 
2 B Work 
3 C Work 
4 D Work 
Row storage 
00001 1 A Home 00002 2 B Work...
teaching an old SQL new tricks 
MySQL InfoBright 
SQL Server Columnar Indexes 
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_c...
column for hadoop and other animals 
ORC files 
Parquet 
http://parquet.io
column families 
wide column databases
millions of columns 
eventually consistent 
CQL 
http://cassandra.apache.org/ 
http://www.datastax.com/ 
set | list | map ...
cell level security 
SQL: so many views, so much confusion 
accumulo https://accumulo.apache.org/
time 
retrieving time series and graphs 
window functions 
SELECT business_date, ticker, 
close, 
close / 
LAG(close,1) OV...
time 
Open TSDB 
Influx DB 
key:sub-key:metric 
key:* 
key:*:metric
Queues
queues in SQL 
CREATE procedure [dbo].[Dequeue] 
AS 
set nocount on 
declare @BatchSize int 
set @BatchSize = 10 
declare ...
queues in SQL 
index fragmentation is a problem 
but built in logs of a sort
message queues 
specialised apis 
capabilities like fan-out 
routing 
acknowledgement
relationships count 
Graph databases
relationships count 
trees and hierarchies 
overloaded relationships 
fancy algorithms
hierarchies with SQL 
adjacency lists 
CONSTRAIN fk_parent_id_id 
FOREIGN KEY parent_id REFERENCES 
some_table.id 
materia...
Consolidation 
OrientDB 
ArangoDB 
http://www.orientechnologies.com/ 
SQLs with NoSQL 
Postgres 
MySQL 
https://www.arango...
Big Data
SQL on Hadoop
More than SQL 
Shark 
Drill 
Cascading 
Map Reduce
System issues, 
Speed issues, 
Soft issues
the ACID, BASE litmus 
Atomic 
Consistent 
Isolated 
Durable 
Basically Available 
Soft-state 
Eventually consistent 
what...
CAP it all 
Consistency 
Partition Availability
write fast, ask questions later 
SQL writes cost a lot 
mainly write workload: NoSQL 
low latency write workload: NoSQL
is it web scale? 
most NoSQL scales well 
but clusters still need management 
are you facebook? one machine is easier than...
who is going to use it? 
analysts: they want SQL 
developers: they want applications 
data scientists: they want access
choose the 
right tool 
Photo: http://www.homespothq.com/
Thank you! 
Simon Elliston Ball 
simon@simonellistonball.com 
@sireb 
#noSQLknowSQL 
http://nosqlknowsql.io
Questions 
Simon Elliston Ball 
simon@simonellistonball.com 
@sireb 
http://nosqlknowsql.io
When to no sql and when to know sql   javaone
When to no sql and when to know sql   javaone
Nächste SlideShare
Wird geladen in …5
×

When to no sql and when to know sql javaone

623 Aufrufe

Veröffentlicht am

An introduction to the different types of NoSQL and some guidance on when to choose them, and when to use plain old SQL. Focuses on developer productivity, intuitive code, and system issues including scaling and usage patterns. As delivered at JavaOne 2014 in San Francisco

Veröffentlicht in: Technologie
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

When to no sql and when to know sql javaone

  1. 1. When to NoSQL and When to Know SQL Simon Elliston Ball Head of Big Data @sireb #noSQLknowSQL http://nosqlknowsql.io
  2. 2. what is NoSQL? SQL Not only SQL Many many things NoSQL No, SQL
  3. 3. before SQL files multi-value ur… hash maps?
  4. 4. after SQL everything is relational ORMs fill in the other data structures scale up rules data first design
  5. 5. and now NoSQL datastores that suit applications polyglot persistence: the right tools scale out rules APIs not EDWs
  6. 6. why should you care? data growth rapid development fewer migration headaches… maybe machine learning social
  7. 7. big bucks. O’Reilly 2013 Data Science Salary Survey
  8. 8. So many NoSQLs…
  9. 9. document databases
  10. 10. document databases rapid development JSON docs complex, variable models known access pattern
  11. 11. document databases learn a very new query language denormalize document form joins? JUST DON’T http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
  12. 12. document vs SQL what can SQL do? query all the angles sure, you can use blobs… … but you can’t get into them
  13. 13. documents in SQL SQL xml fields mapping xquery paths is painful native JSON but still structured
  14. 14. query everything: search class of database database full-text indexing
  15. 15. you know google, right… range query span query keyword query
  16. 16. you know the score scores "query": { "function_score": { "query": { "match": { "title": "NoSQL"} }, "functions": [ "boost": 1, "gauss": { "timestamp": { "scale": "4w" } }, "script_score" : { "script" : "_score * doc['important_document'].value ? 2 : 1" } ], "score_mode": "sum" } }
  17. 17. SQL knows the score too scores declare @origin float = 0; declare @delay_weeks float = 4; SELECT TOP 10 * FROM ( SELECT title, score * CASE WHEN p.important = 1 THEN 2.0 WHEN p.important = 0 THEN 1.0 END * exp(-power(timestamp-@origin,2)/(2*@delay*7*24*3600)) + 1 AS score FROM posts p WHERE title LIKE '%NoSQL%' ) as found ORDER BY score
  18. 18. you know google, right… more like this: instant tf-idf { "more_like_this" : { "fields" : ["name.first", "name.last"], "like_text" : "text like this one", "min_term_freq" : 1, "max_query_terms" : 12 } }
  19. 19. Facets
  20. 20. Facets
  21. 21. Facets SQL: x lots SELECT a.name, count(p.id) FROM people p JOIN industry a on a.id = p.industry_id JOIN people_keywords pk on pk.person_id = p.id JOIN keywords k on k.id = pk.keyword_id WHERE CONTAINS(p.description, 'NoSQL') OR k.name = 'NoSQL' ... GROUP BY a.name SELECT a.name, count(p.id) FROM people p JOIN area a on a.id = p.area_id JOIN people_keywords pk on pk.person_id = p.id JOIN keywords k on k.id = pk.keyword_id WHERE CONTAINS(p.description, 'NoSQL') OR k.name = 'NoSQL' ... GROUP BY a.name
  22. 22. Facets Elastic search: { "query": { “query_string": { “default_field”: “content”, “query”: “keywords” } }, "facets": { “myTerms": { "terms": { "field" : "lang", "all_terms" : true } } } }
  23. 23. logs untyped free-text documents timestamped semi-structured discovery aggregation and statistics
  24. 24. key: value close to your programming model distributed map | list | set keys can be objects
  25. 25. SQL extensions hash types hstore
  26. 26. SQL and polymorphism inheritance ORMs hide the horror
  27. 27. turning round the rows columnar databases physical layout matters
  28. 28. turning round the rows key value type 1 A Home 2 B Work 3 C Work 4 D Work Row storage 00001 1 A Home 00002 2 B Work 00003 3 C Work … Column storage A B C D Home Work Work Work …
  29. 29. teaching an old SQL new tricks MySQL InfoBright SQL Server Columnar Indexes CREATE NONCLUSTERED COLUMNSTORE INDEX idx_col ON Orders (OrderDate, DueDate, ShipDate) Great for your data warehouse, but no use for OLTP
  30. 30. column for hadoop and other animals ORC files Parquet http://parquet.io
  31. 31. column families wide column databases
  32. 32. millions of columns eventually consistent CQL http://cassandra.apache.org/ http://www.datastax.com/ set | list | map types
  33. 33. cell level security SQL: so many views, so much confusion accumulo https://accumulo.apache.org/
  34. 34. time retrieving time series and graphs window functions SELECT business_date, ticker, close, close / LAG(close,1) OVER (PARTITION BY ticker ORDER BY business_date ASC) - 1 AS ret FROM sp500
  35. 35. time Open TSDB Influx DB key:sub-key:metric key:* key:*:metric
  36. 36. Queues
  37. 37. queues in SQL CREATE procedure [dbo].[Dequeue] AS set nocount on declare @BatchSize int set @BatchSize = 10 declare @Batch table (QueueID int, QueueDateTime datetime, Title nvarchar(255)) begin tran insert into @Batch select Top (@BatchSize) QueueID, QueueDateTime, Title from QueueMeta WITH (UPDLOCK, HOLDLOCK) where Status = 0 order by QueueDateTime ASC declare @ItemsToUpdate int set @ItemsToUpdate = @@ROWCOUNT update QueueMeta SET Status = 1 WHERE QueueID IN (select QueueID from @Batch) AND Status = 0 if @@ROWCOUNT = @ItemsToUpdate begin commit tran select b.*, q.TextData from @Batch b inner join QueueData q on q.QueueID = b.QueueID print 'SUCCESS' end else begin rollback tran print 'FAILED' end
  38. 38. queues in SQL index fragmentation is a problem but built in logs of a sort
  39. 39. message queues specialised apis capabilities like fan-out routing acknowledgement
  40. 40. relationships count Graph databases
  41. 41. relationships count trees and hierarchies overloaded relationships fancy algorithms
  42. 42. hierarchies with SQL adjacency lists CONSTRAIN fk_parent_id_id FOREIGN KEY parent_id REFERENCES some_table.id materialised path nested sets (MPTT) path = 1.2.23.55.786.33425 Node Left Right Depth A 1 22 1 B 2 9 2 C 10 21 2 D 3 4 3 E 5 6 3 F 7 8 3 G 11 18 3 H 19 20 3 I 12 13 4 J 14 15 4 K 16 17 4
  43. 43. Consolidation OrientDB ArangoDB http://www.orientechnologies.com/ SQLs with NoSQL Postgres MySQL https://www.arangodb.org/
  44. 44. Big Data
  45. 45. SQL on Hadoop
  46. 46. More than SQL Shark Drill Cascading Map Reduce
  47. 47. System issues, Speed issues, Soft issues
  48. 48. the ACID, BASE litmus Atomic Consistent Isolated Durable Basically Available Soft-state Eventually consistent what matters to you?
  49. 49. CAP it all Consistency Partition Availability
  50. 50. write fast, ask questions later SQL writes cost a lot mainly write workload: NoSQL low latency write workload: NoSQL
  51. 51. is it web scale? most NoSQL scales well but clusters still need management are you facebook? one machine is easier than n can ops handle it? app developers make bad admins
  52. 52. who is going to use it? analysts: they want SQL developers: they want applications data scientists: they want access
  53. 53. choose the right tool Photo: http://www.homespothq.com/
  54. 54. Thank you! Simon Elliston Ball simon@simonellistonball.com @sireb #noSQLknowSQL http://nosqlknowsql.io
  55. 55. Questions Simon Elliston Ball simon@simonellistonball.com @sireb http://nosqlknowsql.io

×