When to NoSQL and When ! 
to Know SQL 
Simon Elliston Ball 
Solutions Engineer - Hortonworks 
! 
@sireb 
! 
#noSQLknowSQL ...
what is NoSQL? 
SQL 
Not only SQL 
Many many things 
NoSQL 
No, SQL
before SQL 
files 
multi-value 
ur… hash maps?
after SQL 
everything is relational 
ORMs fill in the other data structures 
scale up rules 
data first design
and now NoSQL 
datastores that suit applications 
polyglot persistence: the right tools 
scale out rules 
APIs not EDWs
why should you care? 
data growth 
rapid development 
fewer migration headaches… maybe 
machine learning 
social
big bucks. 
number of tools. Median base salary is constant at $100k for those 
using up to 10 tools, but increases with n...
So many NoSQLs…
document databases
document databases 
rapid development 
JSON docs 
complex, variable models 
known access pattern
document databases 
learn a very new query language 
denormalize 
document form 
joins? JUST DON’T 
http://www.sarahmei.co...
document vs SQL 
what can SQL do? 
query all the angles 
sure, you can use blobs… 
… but you can’t get into them
documents in SQL 
SQL xml fields 
mapping xquery paths is painful 
native JSON 
but still structured
query everything: search 
class of database database 
full-text indexing
you know google, right… 
range query 
span query 
keyword query
you know the score 
scores 
"query": { 
"function_score": { 
"query": { 
"match": { "title": "NoSQL"} 
}, 
"functions": [ ...
SQL knows the score too 
scores 
declare @origin float = 0; 
declare @delay_weeks float = 4; 
! 
SELECT TOP 10 * FROM ( 
S...
you know google, right… 
more like this: instant tf-idf 
{ 
"more_like_this" : { 
"fields" : ["name.first", "name.last"], ...
Facets
Facets
Facets 
SQL: 
x lots 
SELECT a.name, count(p.id) FROM 
people p 
JOIN industry a on a.id = p.industry_id 
JOIN people_keyw...
Facets 
Elastic search: 
{ 
"query": { 
“query_string": { 
“default_field”: “content”, 
“query”: “keywords” 
} 
}, 
"facet...
logs 
untyped free-text documents 
timestamped 
semi-structured 
discovery 
aggregation and statistics
key: value 
close to your programming model 
distributed map | list | set 
keys can be objects
SQL extensions 
hash types 
hstore
SQL and polymorphism 
inheritance 
ORMs hide the horror
turning round the rows 
columnar databases 
physical layout matters
turning round the rows 
key! value type 
1 A Home 
2 B Work 
3 C Work 
4 D Work 
Row storage 
00001 1 A Home 00002 2 B Wor...
teaching an old SQL new tricks 
MySQL InfoBright 
SQL Server Columnar Indexes 
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_c...
column for hadoop and other animals 
ORC files 
Parquet 
http://parquet.io
column families 
wide column databases
millions of columns 
eventually consistent 
CQL 
http://cassandra.apache.org/ 
http://www.datastax.com/ 
set | list | map ...
cell level security 
SQL: so many views, so much confusion 
accumulo https://accumulo.apache.org/
Time series
time 
retrieving time series and graphs 
window functions 
SELECT business_date, ticker, 
close, 
close / 
LAG(close,1) OV...
time 
Open TSDB 
Influx DB 
key:sub-key:metric 
key:* 
key:*:metric
Queues
queues in SQL 
CREATE procedure [dbo].[Dequeue] 
AS 
! 
set nocount on 
! 
declare @BatchSize int 
set @BatchSize = 10 
! ...
queues in SQL 
index fragmentation is a problem 
but built in logs of a sort
message queues 
specialised apis 
capabilities like fan-out 
routing 
acknowledgement
Kappa Architecture
relationships count 
Graph databases
relationships count 
trees and hierarchies 
overloaded relationships 
fancy algorithms
hierarchies with SQL 
adjacency lists 
CONSTRAIN fk_parent_id_id 
FOREIGN KEY parent_id REFERENCES some_table.id 
material...
Consolidation 
OrientDB 
http://www.orientechnologies.com/ 
https://www.arangodb.org/ 
ArangoDB 
SQLs with NoSQL 
Postgres...
Big Data
SQL on Hadoop
More than SQL 
Shark 
Drill 
Cascading 
Map Reduce
System issues, ! 
Speed issues,! 
Soft issues
the ACID, BASE litmus 
Atomic 
Consistent 
Isolated 
Durable 
Basically Available 
Soft-state 
Eventually consistent 
what...
CAP it all 
Consistency 
Partition Availability
write fast, ask questions later 
SQL writes cost a lot 
mainly write workload: NoSQL 
low latency write workload: NoSQL
is it web scale? 
most NoSQL scales well 
but clusters still need management 
are you facebook? one machine is easier than...
who is going to use it? 
analysts: they want SQL 
developers: they want applications 
data scientists: they want access
choose the ! 
right tool 
Photo: http://www.homespothq.com/
Thank you! 
Simon Elliston Ball 
simon@simonellistonball.com 
! 
@sireb 
#noSQLknowSQL 
http://nosqlknowsql.io
Questions 
Simon Elliston Ball 
simon@simonellistonball.com 
! 
@sireb 
http://nosqlknowsql.io
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barcelona 2014
Nächste SlideShare
Wird geladen in …5
×

Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barcelona 2014

795 Aufrufe

Veröffentlicht am

Simon Elliston Ball – When to NoSQL and When to Know SQL

With NoSQL, NewSQL and plain old SQL, there are so many tools around it’s not always clear which is the right one for the job.This is a look at a series of NoSQL technologies, comparing them against traditional SQL technology. I’ll compare real use cases and show how they are solved with both NoSQL options, and traditional SQL servers, and then see who wins. We’ll look at some code and architecture examples that fit a variety of NoSQL techniques, and some where SQL is a better answer. We’ll see some big data problems, little data problems, and a bunch of new and old database technologies to find whatever it takes to solve the problem.By the end you’ll hopefully know more NoSQL, and maybe even have a few new tricks with SQL, and what’s more how to choose the right tool for the job.

Veröffentlicht in: Daten & Analysen
0 Kommentare
0 Gefällt mir
Statistik
Notizen
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

Keine Downloads
Aufrufe
Aufrufe insgesamt
795
Auf SlideShare
0
Aus Einbettungen
0
Anzahl an Einbettungen
2
Aktionen
Geteilt
0
Downloads
23
Kommentare
0
Gefällt mir
0
Einbettungen 0
Keine Einbettungen

Keine Notizen für die Folie

Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barcelona 2014

  1. 1. When to NoSQL and When ! to Know SQL Simon Elliston Ball Solutions Engineer - Hortonworks ! @sireb ! #noSQLknowSQL http://nosqlknowsql.io
  2. 2. what is NoSQL? SQL Not only SQL Many many things NoSQL No, SQL
  3. 3. before SQL files multi-value ur… hash maps?
  4. 4. after SQL everything is relational ORMs fill in the other data structures scale up rules data first design
  5. 5. and now NoSQL datastores that suit applications polyglot persistence: the right tools scale out rules APIs not EDWs
  6. 6. why should you care? data growth rapid development fewer migration headaches… maybe machine learning social
  7. 7. big bucks. number of tools. Median base salary is constant at $100k for those using up to 10 tools, but increases with new tools after that.9 O’Reilly 2013 Data Science Salary Survey Given the two patterns we have just examined—the relationships be‐tween cluster tools and respondents’ overall tool counts, and between
  8. 8. So many NoSQLs…
  9. 9. document databases
  10. 10. document databases rapid development JSON docs complex, variable models known access pattern
  11. 11. document databases learn a very new query language denormalize document form joins? JUST DON’T http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
  12. 12. document vs SQL what can SQL do? query all the angles sure, you can use blobs… … but you can’t get into them
  13. 13. documents in SQL SQL xml fields mapping xquery paths is painful native JSON but still structured
  14. 14. query everything: search class of database database full-text indexing
  15. 15. you know google, right… range query span query keyword query
  16. 16. you know the score scores "query": { "function_score": { "query": { "match": { "title": "NoSQL"} }, "functions": [ "boost": 1, "gauss": { "timestamp": { "scale": "4w" } }, "script_score" : { "script" : "_score * doc['important_document'].value ? 2 : 1" } ], "score_mode": "sum" } }
  17. 17. SQL knows the score too scores declare @origin float = 0; declare @delay_weeks float = 4; ! SELECT TOP 10 * FROM ( SELECT title, score * CASE WHEN p.important = 1 THEN 2.0 WHEN p.important = 0 THEN 1.0 END * exp(-power(timestamp-@origin,2)/(2*@delay*7*24*3600)) + 1 AS score FROM posts p WHERE title LIKE '%NoSQL%' ) as found ORDER BY score
  18. 18. you know google, right… more like this: instant tf-idf { "more_like_this" : { "fields" : ["name.first", "name.last"], "like_text" : "text like this one", "min_term_freq" : 1, "max_query_terms" : 12 } }
  19. 19. Facets
  20. 20. Facets
  21. 21. Facets SQL: x lots SELECT a.name, count(p.id) FROM people p JOIN industry a on a.id = p.industry_id JOIN people_keywords pk on pk.person_id = p.id JOIN keywords k on k.id = pk.keyword_id WHERE CONTAINS(p.description, 'NoSQL') OR k.name = 'NoSQL' ... GROUP BY a.name SELECT a.name, count(p.id) FROM people p JOIN area a on a.id = p.area_id JOIN people_keywords pk on pk.person_id = p.id JOIN keywords k on k.id = pk.keyword_id WHERE CONTAINS(p.description, 'NoSQL') OR k.name = 'NoSQL' ... GROUP BY a.name
  22. 22. Facets Elastic search: { "query": { “query_string": { “default_field”: “content”, “query”: “keywords” } }, "facets": { “myTerms": { "terms": { "field" : "lang", "all_terms" : true } } } }
  23. 23. logs untyped free-text documents timestamped semi-structured discovery aggregation and statistics
  24. 24. key: value close to your programming model distributed map | list | set keys can be objects
  25. 25. SQL extensions hash types hstore
  26. 26. SQL and polymorphism inheritance ORMs hide the horror
  27. 27. turning round the rows columnar databases physical layout matters
  28. 28. turning round the rows key! value type 1 A Home 2 B Work 3 C Work 4 D Work Row storage 00001 1 A Home 00002 2 B Work 00003 3 C Work … Column storage A B C D Home Work Work Work …
  29. 29. teaching an old SQL new tricks MySQL InfoBright SQL Server Columnar Indexes CREATE NONCLUSTERED COLUMNSTORE INDEX idx_col ON Orders (OrderDate, DueDate, ShipDate) Great for your data warehouse, but no use for OLTP
  30. 30. column for hadoop and other animals ORC files Parquet http://parquet.io
  31. 31. column families wide column databases
  32. 32. millions of columns eventually consistent CQL http://cassandra.apache.org/ http://www.datastax.com/ set | list | map types
  33. 33. cell level security SQL: so many views, so much confusion accumulo https://accumulo.apache.org/
  34. 34. Time series
  35. 35. time retrieving time series and graphs window functions SELECT business_date, ticker, close, close / LAG(close,1) OVER (PARTITION BY ticker ORDER BY business_date ASC) - 1 AS ret FROM sp500
  36. 36. time Open TSDB Influx DB key:sub-key:metric key:* key:*:metric
  37. 37. Queues
  38. 38. queues in SQL CREATE procedure [dbo].[Dequeue] AS ! set nocount on ! declare @BatchSize int set @BatchSize = 10 ! declare @Batch table (QueueID int, QueueDateTime datetime, Title nvarchar(255)) ! begin tran ! insert into @Batch select Top (@BatchSize) QueueID, QueueDateTime, Title from QueueMeta WITH (UPDLOCK, HOLDLOCK) where Status = 0 order by QueueDateTime ASC ! declare @ItemsToUpdate int set @ItemsToUpdate = @@ROWCOUNT ! update QueueMeta SET Status = 1 WHERE QueueID IN (select QueueID from @Batch) AND Status = 0 ! if @@ROWCOUNT = @ItemsToUpdate begin commit tran select b.*, q.TextData from @Batch b inner join QueueData q on q.QueueID = b.QueueID print 'SUCCESS' end else begin rollback tran print 'FAILED' end
  39. 39. queues in SQL index fragmentation is a problem but built in logs of a sort
  40. 40. message queues specialised apis capabilities like fan-out routing acknowledgement
  41. 41. Kappa Architecture
  42. 42. relationships count Graph databases
  43. 43. relationships count trees and hierarchies overloaded relationships fancy algorithms
  44. 44. hierarchies with SQL adjacency lists CONSTRAIN fk_parent_id_id FOREIGN KEY parent_id REFERENCES some_table.id materialised path nested sets (MPTT) path = 1.2.23.55.786.33425 Node Left Right Depth A 1 22 1 B 2 9 2 C 10 21 2 D 3 4 3 E 5 6 3 F 7 8 3 G 11 18 3 H 19 20 3 I 12 13 4 J 14 15 4 K 16 17 4
  45. 45. Consolidation OrientDB http://www.orientechnologies.com/ https://www.arangodb.org/ ArangoDB SQLs with NoSQL Postgres MySQL
  46. 46. Big Data
  47. 47. SQL on Hadoop
  48. 48. More than SQL Shark Drill Cascading Map Reduce
  49. 49. System issues, ! Speed issues,! Soft issues
  50. 50. the ACID, BASE litmus Atomic Consistent Isolated Durable Basically Available Soft-state Eventually consistent what matters to you?
  51. 51. CAP it all Consistency Partition Availability
  52. 52. write fast, ask questions later SQL writes cost a lot mainly write workload: NoSQL low latency write workload: NoSQL
  53. 53. is it web scale? most NoSQL scales well but clusters still need management are you facebook? one machine is easier than n can ops handle it? app developers make bad admins
  54. 54. who is going to use it? analysts: they want SQL developers: they want applications data scientists: they want access
  55. 55. choose the ! right tool Photo: http://www.homespothq.com/
  56. 56. Thank you! Simon Elliston Ball simon@simonellistonball.com ! @sireb #noSQLknowSQL http://nosqlknowsql.io
  57. 57. Questions Simon Elliston Ball simon@simonellistonball.com ! @sireb http://nosqlknowsql.io

×