My name is Irakli. Let me give you some background about myself and how I tricked the conference organizers into thinking that I was qualified to talk today. :)

I am a director of engineering at National Public Radio, which is a fancy way of saying: I lead the software team that is responsible for the code behind npr.org, the NPR API and the NPR mobile apps.

Prior to joining NPR, I spent several years developing open-source products for the online publishing industry. Some of these products are now used by news organizations like The Nation, The New Republic, Thomson Reuters and Al Jazeera.

I have been using document-based [or so-called NoSQL] databases, on and off, for almost a year now and have enjoyed the experience a lot! Because I enjoyed it so much, I wanted to share my story at this conference. I contacted the organizers and they kindly agreed [I hope they will not regret it by the time we are done :)].

So here it is: one guy's story of falling in love with document databases and why he thinks they have a significant role in online publishing, specifically.
  




One of the main reasons why I love document databases is that they are a truly disruptive technology. When we say "disruptive technology" we mean something so innovative that it helps create a fundamentally new value network, thus altering the existing market and disrupting legacy technologies in the market.

The innovation of disruptive technologies is not just an incremental progression over existing capabilities. Rather, it is a fundamentally re-thought, novel approach to solving hard problems.

For instance, there are many good SQL databases, both open-source and commercial. And everybody has their favorite: some like SQL server X's simplicity, others love the power of database Y, etc. But fundamentally SQL is one way to model data and solve data-warehousing problems. It has its time-proven advantages, as well as some significant shortcomings.

Document databases are an architecturally different approach to solving data problems. They are not a drop-in replacement for, or an incremental improvement over, SQL. They do have their own shortcomings, but they also allow solving problems that were either very hard or impossible to solve with traditional, SQL-oriented databases.
  
	
  




Traditional SQL database theory has a strong emphasis on ACID compliance. You probably remember that ACID stands for: Atomicity, Consistency, Isolation and Durability.

The Consistency property ensures that no database transaction violates the referential integrity rules defined in the database schema.

Isolation is a requirement that asserts that, given concurrent access to data, parallel operations cannot access data that is being modified by another transaction, but have to wait until that transaction completes. Isolation is commonly implemented with pessimistic locking.

The Isolation and Consistency requirements of ACID compliance constitute a fundamental problem for a system's scalability.
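
To make the cost of isolation concrete, here is a minimal sketch of pessimistic locking in Python. The `LockedAccount` class and the `run_concurrent_deposits` helper are hypothetical names invented for this illustration, not part of any database; the point is only that every writer has to wait its turn on the same lock, which is where the scalability problem comes from.

```python
import threading


class LockedAccount:
    """A record guarded by a pessimistic lock: one writer at a time."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: int) -> None:
        # Every writer must acquire the lock first, so concurrent
        # "transactions" queue up behind one another.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_concurrent_deposits(account: LockedAccount, workers: int, per_worker: int) -> int:
    """Start several writer threads and return the final balance."""

    def work() -> None:
        for _ in range(per_worker):
            account.deposit(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance
```

The result is always correct, but the writers are effectively serialized: adding more workers adds more waiting, not more throughput.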
  	
  




To put it in the words of Werner Vogels, CTO of Amazon and one of the foremost experts in the field of distributed computing:

"If you're concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given."

ACID compliance is all about various processes [and nodes] in the system checking with each other to keep data consistent across the entire system. Therefore, the question is not so much how well implemented master-slave or master-master replication in your database is; the bigger challenge is the architectural constraint that ACID compliance imposes on scalability.
  




How important is scalability for a Web system? Is it something that matters just for Amazon, Facebook, Google and the like?

The Internet is an incredibly fast-growing medium. It took radio 38 years after its introduction to reach 50 MM users; it took television 13 years; the Internet did it in just 4, and it has been growing exponentially ever since.
  




In a report published in June this year, Cisco forecast that global IP traffic will quadruple by 2015. That means: more users, a larger amount of content, more types of content, more sources of content and more real-time content. In this context, by "real-time content" I mean things like check-ins, coverage of live events and citizen journalism during breaking news.

Now, most of us in the content-production industry believe that having more traffic and more content is good news. Scratch that: it's great news! As a matter of fact, the Internet community has gotten so obsessed with the amount of website traffic that it is often used as the most significant measure of a website's success or failure.

So: more traffic is good news… unless you are the developer responsible for making sure the website is still up and running when traffic quadruples.
  




We started the scalability discussion by mentioning the scalability limitations that the ACID-compliance requirement imposes. This constraint is actually a specific case of a more generic theorem called Brewer's Theorem, or the CAP Theorem.

The theorem was formulated as a conjecture by a UC Berkeley professor, Eric Brewer, in 2000. Two years later, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's conjecture.

The CAP Theorem states that, when designing distributed software systems, there are three properties that are commonly desired:
1.  Consistency
2.  Availability and
3.  Partition Tolerance

The theorem proves that it is impossible to achieve all three at the same time[1].

Even though the names sound intuitive, it is probably worthwhile to clarify what Gilbert and Lynch meant by each of the definitions in CAP, since there are multiple confusing (and sometimes contradictory) definitions floating around the web.
  
	
  
	
  




Consistency basically stands for the requirement that all nodes in a distributed system must see the same data all the time (a subset of ACID compliance).
Availability means that every request should succeed in receiving a response. The system as a whole should be highly available.
Partition Tolerance, in a distributed system, means that the system should allow some fault tolerance. When some nodes crash or some communication links fail, it is important that the system still performs as expected.
  




Let's look at some popular distributed data storage systems that you are probably familiar with and see which bucket they fall into in the CAP spectrum.

Relational databases, LDAP directory servers and xFS file systems are all examples of consistent and available distributed systems. They are consistent because they provide ACID compliance. They are not partition-tolerant because they do not have a quorum system for removing unreachable nodes from the system.
  




MongoDB, Terrastore, Redis and BigTable all guarantee consistency, and they use a quorum for partition tolerance, but they forfeit Availability.
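
As a rough illustration of that trade, the sketch below shows a majority-quorum check in Python. It is a toy under stated assumptions: `has_quorum` and `write` are hypothetical helpers, and this is not how any of the systems named above actually implement their quorum logic. It only shows why a consistency-first system has to turn requests away when too few nodes are reachable.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True when a strict majority of the cluster is reachable."""
    return reachable_nodes > cluster_size // 2


def write(value: str, reachable_nodes: int, cluster_size: int) -> str:
    """Accept a write only if a majority of nodes can acknowledge it."""
    # A consistency-first system refuses the write rather than risk
    # two halves of a partitioned cluster diverging.
    if not has_quorum(reachable_nodes, cluster_size):
        raise RuntimeError("no quorum: write rejected to preserve consistency")
    return f"stored {value!r} on {reachable_nodes} of {cluster_size} nodes"
```

With five nodes, a partition that leaves only two reachable makes `write` raise: the data stays consistent, but that side of the partition is unavailable for writes.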
  	
  




Domain Name Service (yep, the one that drives all internet traffic), CouchDB, Riak and Cassandra are all examples of Available and Partition-tolerant distributed systems. They do not guarantee consistency. Rather, they provide a promise of something known as "eventual consistency".

For any given request, you may receive a value that is globally stale (system-wide) and definitely not isolated per ACID-compliance requirements, but eventually all nodes will sync up.

Not running an agreement-based algorithm, which is what Amazon's Werner Vogels was preaching, is exactly the sacrifice that systems like CouchDB and DNS make to provide extreme scalability and fault tolerance.
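
The "stale now, in sync later" behaviour can be sketched in a few lines of Python. This is a deliberately simplified model, assuming a plain "highest version wins" rule; the `Replica` class and `sync` function are hypothetical, and real systems use more elaborate conflict handling than this.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A node that stores a (version, value) pair per key."""

    data: dict = field(default_factory=dict)

    def write(self, key: str, value: str, version: int) -> None:
        # Writes are accepted locally, without asking other nodes first.
        self.data[key] = (version, value)

    def read(self, key: str):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a: Replica, b: Replica) -> None:
    """Background sync pass: both replicas keep the higher version of each key."""
    for key in set(a.data) | set(b.data):
        newest = max(
            (replica.data[key] for replica in (a, b) if key in replica.data),
            key=lambda entry: entry[0],
        )
        a.data[key] = newest
        b.data[key] = newest
```

If a headline is updated on one replica only, a read from the other replica keeps returning the old headline until `sync` runs; after that, both agree. No request ever had to wait for the two nodes to reach agreement.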
  




In his 2000 keynote at the ACM Symposium on Principles of Distributed Computing (the same one where he formulated the CAP theorem), Dr. Brewer also presented a definition he called BASE.

BASE stands for: Basically Available, Soft-state, Eventual consistency.

He formulated and used the BASE principles to demonstrate the trade-offs and differences from ACID-compliant systems.
  




ACID-compliant systems have the following traits: consistency, isolation, focus on commit, nested transactions, pessimistic locking, and typically they are fixed-schema-based and therefore inflexible to evolve.
  




In contrast, BASE systems exhibit: weak consistency, availability prioritized above all else, a best-effort approach to conflict resolution and optimistic locking. Systems with the BASE philosophy consider approximate responses to be OK, and are architecturally simpler, faster and evolve flexibly, since they are typically schema-less.
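
Optimistic locking is easy to show in miniature. The sketch below assumes an in-memory store where every document carries a revision number and a write succeeds only if the writer has seen the latest revision. `DocumentStore` and `ConflictError` are hypothetical names for this illustration, and the integer revisions are a simplification rather than any particular database's format.

```python
from typing import Dict, Tuple


class ConflictError(Exception):
    """Raised when a writer's revision is out of date."""


class DocumentStore:
    """Toy store: nobody takes a lock, stale writers are simply rejected."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}

    def get(self, doc_id: str) -> Tuple[int, dict]:
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id: str, body: dict, expected_rev: int = 0) -> int:
        # Compare the revision the writer read with the stored one.
        current_rev = self._docs.get(doc_id, (0, {}))[0]
        if expected_rev != current_rev:
            raise ConflictError(f"expected rev {expected_rev}, found {current_rev}")
        new_rev = current_rev + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev
```

Two editors who both read revision 1 can both try to save. The first save succeeds and produces revision 2; the second gets a `ConflictError` and has to re-read and retry. Readers and writers never block each other, which is the contrast with the pessimistic approach.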
  
	
  




CouchDB is not a "better MySQL" or a "simpler Oracle". It is really good at availability and partition tolerance and has many traits making it a better tool for some of the problems traditionally solved with relational databases. But one thing it is not: it is not a drop-in replacement for SQL databases.

There are tradeoffs when choosing a document database, and specifically CouchDB. The most obvious and honestly "scary" tradeoff is forfeiting Consistency.

We as computer scientists were trained long and hard that data must be consistent, models must be normalized, referential integrity must be maintained, etc. How can we even dream about forfeiting consistency, even for scalability and fault tolerance?
  
	
  
	
  




The reality, however, is that there are systems engineering problems where strict data consistency is crucial, but there are many where it is not. If you are building stock trading software, you should probably use a data store that guarantees consistency. Financial systems in general require a high level of consistency, but it is not a given for just any system. Anybody who has built a real-life, high-throughput system knows that in many cases you end up de-normalizing the data model to allow for better performance. It is similar to forfeiting consistency in the CAP model.

With a document-based database like Couch, some of your requests may occasionally return slightly stale data. Additionally, data in document format is often highly de-normalized and less referentially consistent than data in a fully normalized, relational database.

However, if you are building a news publishing website, none of this is unheard of. High-traffic news websites have been de-normalizing data and implementing aggressive caching for years. This is neither new nor radical. On the contrary, instead of home-cooked and half-baked proprietary solutions, we can now use a standard, open-source, highly optimized, well-tested solution like CouchDB.

Personally, I think it's a pretty good deal.
  




At this point, I've spent a good portion of this presentation explaining the scalability profile of CouchDB (and similar systems) and discussed how the improvements are not quantitative but fundamentally qualitative. We have also talked about the tradeoffs that the increased availability imposes.

Let's forget about scalability for now, however, and talk about other characteristics of CouchDB as a document storage engine. After all, CouchDB is not the only document database, and there are document databases that do guarantee data consistency, so forfeiting consistency is actually a trait of AP systems (in the CAP model), not of document databases in general.

An important trait of document databases, however, is that they are schema-less. There is no pre-defined, strict schema, no table structures or rigid relationships between document types. Document types live in a free world and evolve very flexibly.
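
To picture what schema-less means in practice, here is a small Python sketch with plain dictionaries standing in for JSON documents. The field names (`type`, `byline`, `duration_seconds`, `corrections`) and the helper functions are made up for this example; nothing here is a real CouchDB call.

```python
import json

# Documents of different logical types stored side by side. Nothing
# declares up front which fields a "story" or an "audio" item must have.
documents = [
    {"_id": "story-1", "type": "story", "title": "Election night", "byline": "A. Reporter"},
    {"_id": "audio-7", "type": "audio", "title": "Morning show", "duration_seconds": 1800},
    # A later story gains a new field without any migration of older ones.
    {"_id": "story-2", "type": "story", "title": "Follow-up", "byline": "B. Editor",
     "corrections": ["Fixed a date"]},
]


def titles_of(doc_type: str, docs: list) -> list:
    """Application code, not the store, decides what a 'story' looks like."""
    return [doc["title"] for doc in docs if doc.get("type") == doc_type]


def round_trip(doc: dict) -> dict:
    """Each document serializes on its own; no shared table definition is needed."""
    return json.loads(json.dumps(doc))
```

Adding the `corrections` field to one story required no change to the other documents, which is the flexibility the schema-less model buys. The price is that the application has to cope with documents that lack a field.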
  
	
  
	
  




OK, this is by far one of my ugliest slides. What you see here is a rough ER diagram generated from a fresh, vanilla installation of a popular open-source content management system: Drupal. There are 72 tables on this diagram.

Some of you may be familiar with Drupal. It is highly extensible (and generally really awesome), but it does not do much out of the box. So when we used Drupal for creating websites like those of The Nation or The New Republic, we installed dozens of additional Drupal modules and wrote a bunch on top ourselves. Meaning: we added even more tables. And you can clearly see how unreadable this schema already is. Obviously we never even tried to visualize the entire data model on any real projects, because it would have been useless.
  




The same data model in a document-based database would look like this: (see slide)

I know, I know! I am exaggerating; obviously we would have more than one logical type of document even in a document database. But schema-less modeling means that at the physical level it is just one document type, so what you see here is really not that far from reality as far as actual data storage goes. Most things above and beyond that are really part of the application logic and business rules.

Since my presentation is one of the last ones at this conference, I am sure you have already listened to presenters who went into great detail about data modeling in CouchDB, and I am sure they are much bigger experts on the subject than I am. So I will spare you the experience.

Suffice it to say that embedding documents greatly simplifies data models. Think about just the number of so-called "mapping" tables that relational systems need to model things like many-to-many relationships.
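
A small side-by-side sketch, in Python, of the mapping-table point. Both shapes are invented for illustration (the `stories`, `tags` and `story_tags` structures and the two helper functions are hypothetical); the relational side is emulated with dictionaries rather than a real SQL engine.

```python
# Relational shape: three "tables", one of them a pure mapping table.
stories = {1: {"title": "Election night"}}
tags = {10: {"name": "politics"}, 11: {"name": "live"}}
story_tags = [(1, 10), (1, 11)]  # exists only to express the many-to-many link


def tags_for_story_relational(story_id: int) -> list:
    """Emulates the join through the mapping table."""
    return [tags[tag_id]["name"] for sid, tag_id in story_tags if sid == story_id]


# Document shape: the tags are embedded in the story itself.
story_document = {
    "_id": "story-1",
    "title": "Election night",
    "tags": ["politics", "live"],
}


def tags_for_story_document(doc: dict) -> list:
    """No join and no mapping table: the data travels with the document."""
    return list(doc["tags"])
```

Both functions return the same list of tag names. The embedded version trades away the single authoritative `tags` table: renaming a tag now means touching every document that carries it.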
  
	
  
Also, in the case of online publishing specifically, most business objects are… well, documents, so having a storage engine that operates in terms of documents is extremely natural and enjoyable. There's much less discrepancy between the physical and logical models. Things, in most cases, just make sense and fall in line naturally.
  




Another important, stark difference between relational databases and CouchDB
is the absence of a query language. Like most other things about CouchDB, it’s
pretty “scary” for newcomers. So much so that some other document databases
have actually opted to implement dynamic, SQL-like ad hoc querying (MongoDB,
for instance), and I know a lot of people who appreciate that.

In contrast, CouchDB uses Map/Reduce, first filtering the data with a Map
function and then (optionally) grouping it with a Reduce function. The
documents, the results of a map function and the results of a reduce function
are all stored in B-trees (the secret sauce of CouchDB’s performance). Whereas
in a relational database you would normalize the data and then index some of
its columns, most things in Couch are a B-tree index to begin with.
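
The Map/Reduce flow described above can be emulated in a few lines of Python. Real CouchDB views are written as JavaScript functions and stored in design documents; the sketch below is only an analogy, with hypothetical names (`map_story_by_section`, `reduce_count`, `build_view`) and a sorted list standing in for the B-tree.

```python
from collections import defaultdict


def map_story_by_section(doc: dict):
    """Map step: emit (key, value) pairs for the documents we care about."""
    if doc.get("type") == "story":
        yield doc["section"], 1


def reduce_count(values: list) -> int:
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def build_view(docs: list, map_fn, reduce_fn=None) -> list:
    """Run map over every document, keep rows sorted by key, optionally reduce."""
    rows = sorted((key, value) for doc in docs for key, value in map_fn(doc))
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return [(key, reduce_fn(values)) for key, values in sorted(grouped.items())]
```

Given two politics stories, one music story and one audio item, the map-only view holds three rows sorted by section, and adding `reduce_count` collapses them into one count per section. Because rows are kept in key order, a listing such as "all stories in politics" becomes a range scan rather than a query to be planned.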

This has significant consequences and, much like in the case of forfeiting data
consistency, there are some real trade-offs to be made. While Map/Reduce is
very powerful, you will find some queries that you could run in SQL that are
either impossible to model with a View or are too expensive or too slow.
Also, Views are not as dynamic as SQL queries. They are built incrementally,
and a complete rebuild of one in a large database is an expensive operation.
As such, it really pays off to think carefully, at the early stages of system
design, about the Views that a system will be using.
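
The difference between an incremental update and a full rebuild can be seen in a toy model. The `IncrementalCountView` class below is a hypothetical Python sketch, not CouchDB's actual view engine; it counts documents per key and tracks how many documents it has had to process, as a stand-in for indexing cost.

```python
class IncrementalCountView:
    """Keeps a per-key count and re-processes only documents that changed."""

    def __init__(self, key_fn) -> None:
        self._key_fn = key_fn
        self._key_by_doc = {}     # doc id -> key it last contributed to
        self.counts = {}          # key -> number of documents
        self.docs_processed = 0   # rough measure of indexing work done

    def update(self, doc: dict) -> None:
        """Apply one changed document: remove its old row, add its new one."""
        doc_id = doc["_id"]
        old_key = self._key_by_doc.pop(doc_id, None)
        if old_key is not None:
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]
        new_key = self._key_fn(doc)
        if new_key is not None:
            self._key_by_doc[doc_id] = new_key
            self.counts[new_key] = self.counts.get(new_key, 0) + 1
        self.docs_processed += 1

    def rebuild(self, docs: list) -> None:
        """Throw the index away and process every document again."""
        self._key_by_doc.clear()
        self.counts.clear()
        for doc in docs:
            self.update(doc)
```

Changing one document costs one unit of work, while changing the view definition forces `rebuild`, whose cost grows with the size of the database. That asymmetry is why view design is worth settling early.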




The good news is: in online publishing most user-facing content is a document type, a listing of documents or an aggregation, exactly the things that document-based databases and CouchDB's Views are highly optimized for.

As a matter of fact, at NPR, to withstand the millions of unique users that the main website gets, our legacy system uses an architecture with very similar constraints. It has content objects that are serialized XML, XML lists of content objects, and aggregations also represented in an XML format. While in the back end we do use an SQL database, the front-end architecture has made many architectural decisions similar to those made in CouchDB.

Yes, the legacy system uses XML instead of JSON… I know, I know! But we have been running our systems for a long while, so some of it pre-dates the time when JSON got all sexy and trendy :)
  




To summarize, AP-style (as defined by the CAP model) document databases exhibit the following traits, important for online publishing systems that get significant traffic and have real-time content streams:
-  High availability
-  Partition Tolerance
-  Schema-less architecture
-  Document-oriented storage
-  Index-based, semi-dynamic querying like that in CouchDB Views.

The benefit from each one of these features is the result of a tradeoff. For teams architecting systems and implementing document databases, it is crucial to understand and appreciate the tradeoffs made. That said, document databases are disruptive, the benefits they provide are real, and ignoring them rather than augmenting traditional, relational storage systems with document-based ones would be a mistake.
  




Thank you for your attention.
  	
  




                                                  23	
  

Weitere ähnliche Inhalte

Andere mochten auch (6)

Invito ricerca e_imprese_280610
Invito ricerca e_imprese_280610Invito ricerca e_imprese_280610
Invito ricerca e_imprese_280610
 
Amonestracion 4 por pagina
Amonestracion 4 por paginaAmonestracion 4 por pagina
Amonestracion 4 por pagina
 
Question 1
Question 1Question 1
Question 1
 
CRISE - WEBINAIRE 2012 - Michel Tousignant - Le suicide en milieu autochtone:...
CRISE - WEBINAIRE 2012 - Michel Tousignant - Le suicide en milieu autochtone:...CRISE - WEBINAIRE 2012 - Michel Tousignant - Le suicide en milieu autochtone:...
CRISE - WEBINAIRE 2012 - Michel Tousignant - Le suicide en milieu autochtone:...
 
Oxford DrupalCamp 2012 - The things we found in your website
Oxford DrupalCamp 2012 - The things we found in your websiteOxford DrupalCamp 2012 - The things we found in your website
Oxford DrupalCamp 2012 - The things we found in your website
 
Microservices In Practice
Microservices In PracticeMicroservices In Practice
Microservices In Practice
 

Ähnlich wie Document Databases In Online Publishing

Top data center trends and predictions to watch for in 2016.
Top data center trends and predictions to watch for in 2016.Top data center trends and predictions to watch for in 2016.
Top data center trends and predictions to watch for in 2016.Swaroopanand Laxmikruppaneth
 
IT Performance Management Handbook for CIOs
IT Performance Management Handbook for CIOsIT Performance Management Handbook for CIOs
IT Performance Management Handbook for CIOsVikram Ramesh
 
How the Journey to Modern Data Management is Paved with an Inclusive Edge-to-...
How the Journey to Modern Data Management is Paved with an Inclusive Edge-to-...How the Journey to Modern Data Management is Paved with an Inclusive Edge-to-...
How the Journey to Modern Data Management is Paved with an Inclusive Edge-to-...Dana Gardner
 
Moving enterprise IT to the cloud
Moving enterprise IT to the cloudMoving enterprise IT to the cloud
Moving enterprise IT to the cloudJan Wiersma
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data CentersGina Buck
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedTyler Treat
 
2016 - 10 questions you should answer before building a new microservice
2016 - 10 questions you should answer before building a new microservice2016 - 10 questions you should answer before building a new microservice
2016 - 10 questions you should answer before building a new microservicedevopsdaysaustin
 
A Tale of Contemporary Software
A Tale of Contemporary SoftwareA Tale of Contemporary Software
A Tale of Contemporary SoftwareYun Zhi Lin
 
Telecom Clouds crossing borders, Chet Golding, Zefflin Systems
Telecom Clouds crossing borders, Chet Golding, Zefflin SystemsTelecom Clouds crossing borders, Chet Golding, Zefflin Systems
Telecom Clouds crossing borders, Chet Golding, Zefflin SystemsSriram Subramanian
 
Building a Platform for the People - IBM's Open Cloud Architecture Summit - A...
Building a Platform for the People - IBM's Open Cloud Architecture Summit - A...Building a Platform for the People - IBM's Open Cloud Architecture Summit - A...
Building a Platform for the People - IBM's Open Cloud Architecture Summit - A...Chip Childers
 
CAPSTONE PROJECT LITERATURE REVIEW ASSIGNMENT 1CAPSTONE PROJEC
CAPSTONE PROJECT LITERATURE REVIEW ASSIGNMENT 1CAPSTONE PROJECCAPSTONE PROJECT LITERATURE REVIEW ASSIGNMENT 1CAPSTONE PROJEC
CAPSTONE PROJECT LITERATURE REVIEW ASSIGNMENT 1CAPSTONE PROJECTawnaDelatorrejs
 
Graph databases and OrientDB
Graph databases and OrientDBGraph databases and OrientDB
Graph databases and OrientDBAhsan Bilal
 
Reactive Architecture
Reactive ArchitectureReactive Architecture
Reactive ArchitectureKnoldus Inc.
 
2021-10-14 The Critical Role of Security in DevOps.pdf
2021-10-14 The Critical Role of Security in DevOps.pdf2021-10-14 The Critical Role of Security in DevOps.pdf
2021-10-14 The Critical Role of Security in DevOps.pdfSavinder Puri
 
Mooc And Document Orientated Nosql Database
Mooc And Document Orientated Nosql DatabaseMooc And Document Orientated Nosql Database
Mooc And Document Orientated Nosql DatabaseKaren Oliver
 
Introduction to cloud computing - za garage talks
Introduction to cloud computing -  za garage talksIntroduction to cloud computing -  za garage talks
Introduction to cloud computing - za garage talksVijay Rayapati
 

Document Databases In Online Publishing

  • 1. My name is Irakli. Let me give you some background about myself and how I tricked the conference organizers into thinking that I was qualified to talk today. :) I am a director of engineering at National Public Radio, which is a fancy way of saying that I lead the software team responsible for the code behind npr.org, the NPR API and the NPR mobile apps. Prior to joining NPR, I spent several years developing open-source products for the online publishing industry. Some of these products are now used by news organizations like The Nation, The New Republic, Thomson Reuters and Al Jazeera. I have been using document-based (so-called NoSQL) databases, on and off, for almost a year now and have enjoyed the experience a lot! Because I enjoyed it so much, I wanted to share my story at this conference. I contacted the organizers and they kindly agreed (I hope they will not regret it by the time we are done :)). So here it is: one guy’s story of falling in love with document databases and why he thinks they have a significant role in online publishing, specifically.
  • 2. One of the main reasons I love document databases is that they are a truly disruptive technology. By “disruptive technology” we mean something so innovative that it helps create a fundamentally new value network, altering the existing market and displacing legacy technologies in it. The innovation of disruptive technologies is not just an incremental progression over existing capabilities. Rather, it is a fundamentally rethought, novel approach to solving hard problems. For instance, there are many good SQL databases, both open-source and commercial, and everybody has their favorite: some like SQL server X’s simplicity, others love the power of database Y, and so on. But fundamentally SQL is one way to model data and solve data-warehousing problems. It has its time-proven advantages, as well as some significant shortcomings. Document databases are an architecturally different approach to solving data problems. They are not a drop-in replacement for, or an incremental improvement over, SQL. They have their own shortcomings, but they also allow solving problems that were either very hard or impossible to solve with traditional, SQL-oriented databases.
  • 3. Traditional SQL database theory puts a strong emphasis on ACID compliance. You probably remember that ACID stands for Atomicity, Consistency, Isolation and Durability. The Consistency property ensures that no database transaction violates the referential integrity rules defined in the database schema. Isolation is the requirement that, given concurrent access to data, parallel operations cannot access data that is being modified by another transaction, but have to wait until that transaction completes. Isolation is commonly implemented with pessimistic locking. The Isolation and Consistency requirements of ACID compliance constitute a fundamental problem for a system’s scalability.
  • 4. To put it in the words of Werner Vogels, CTO of Amazon and one of the foremost experts in the field of distributed computing: “If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given.” ACID compliance is all about the various processes (and nodes) in the system checking with each other to keep data consistent across the entire system. So the issue is not so much how well master-slave or master-master replication is implemented in your database; the bigger challenge is the architectural constraint that ACID compliance imposes on scalability.
  • 5. How important is scalability for a Web system? Is it something that matters just for Amazon, Facebook, Google and the like? The Internet is an incredibly fast-growing medium. It took radio 38 years after its introduction to reach 50 million users and it took television 13 years; the Internet did it in just 4 and has been growing exponentially ever since.
  • 6. In a report published in June of this year, Cisco forecast that global IP traffic will quadruple by 2015. That means more users, a larger amount of content, more types of content, more sources of content and more real-time content. In this context, by “real-time content” I mean things like check-ins, coverage of live events and citizen journalism during breaking news. Now, most of us in the content-production industry believe that having more traffic and more content is good news. Scratch that: it’s great news! As a matter of fact, the Internet community has gotten so obsessed with the amount of website traffic that it is often used as the most significant measure of a website’s success or failure. So more traffic is good news… unless you are the developer responsible for making sure the website is still up and running when traffic quadruples.
  • 7. We started the scalability discussion by mentioning the scalability limitations that the ACID-compliance requirement imposes. This constraint is actually a specific case of a more generic theorem called Brewer’s Theorem, or the CAP Theorem. The theorem was formulated as a conjecture by UC Berkeley professor Eric Brewer in 2000. Two years later, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer’s conjecture. The CAP Theorem states that, when designing distributed software systems, there are three properties that are commonly desired: (1) Consistency, (2) Availability and (3) Partition Tolerance. The theorem proves that it is impossible to achieve all three at the same time [1]. Even though the names sound intuitive, it is probably worthwhile to clarify what Gilbert and Lynch meant by each of the terms in CAP, since there are multiple confusing (and sometimes contradictory) definitions floating around the web.
  • 8. Consistency basically stands for the requirement that all nodes in a distributed system must see the same data at all times (a subset of ACID compliance). Availability means that every request should receive a response; the system as a whole should be highly available. Partition Tolerance means that a distributed system should tolerate some faults: when some nodes crash or some communication links fail, it is important that the system still performs as expected.
  • 9. Let’s look at some popular distributed data storage systems that you are probably familiar with and see which bucket they fall into in the CAP spectrum. Relational databases, LDAP directory servers and xFS file systems are all examples of consistent and available distributed systems. They are consistent because they provide ACID compliance. They are not partition-tolerant because they do not have a quorum system for removing unreachable nodes from the system.
  • 10. MongoDB, Terrastore, Redis and BigTable all guarantee consistency and use a quorum for partition tolerance, but they forfeit Availability.
  • 11. The Domain Name Service (yep, the one that drives all Internet traffic), CouchDB, Riak and Cassandra are all examples of Available and Partition-tolerant distributed systems. They do not guarantee consistency. Rather, they provide a promise of something known as “eventual consistency”. For any given request, you may receive a value that is stale system-wide and definitely not isolated per ACID-compliance requirements, but eventually all nodes will sync up. Not running an agreement-based algorithm, which is what Amazon’s Werner Vogels was preaching, is exactly the choice that systems like CouchDB and DNS make, sacrificing strict consistency to provide extreme scalability and fault tolerance.
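The “stale now, in sync later” behavior can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the `Replica` class, the `sync` function and the integer version counter are hypothetical names made up for illustration, and concurrent conflicting writes are ignored. It is not how CouchDB, Riak or DNS are actually implemented.

```python
# Toy model of eventual consistency (hypothetical names, not a real database API).
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding its own copy of the data as key -> (version, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        # Accept the write locally without asking any other node for agreement.
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)

    def read(self, key):
        # May return a stale value if a newer write has not been synced here yet.
        return self.data.get(key, (0, None))[1]


def sync(a, b):
    """Replication pass: both replicas keep the higher-versioned entry per key."""
    for key in set(a.data) | set(b.data):
        newest = max(a.data.get(key, (0, None)), b.data.get(key, (0, None)),
                     key=lambda entry: entry[0])
        a.data[key] = newest
        b.data[key] = newest


east, west = Replica("east"), Replica("west")
east.write("headline", "Draft")
sync(east, west)
east.write("headline", "Final")   # west has not heard about this write yet

stale = west.read("headline")     # an older value, but answered immediately
sync(east, west)                  # replication catches up later
fresh = west.read("headline")
print(stale, "->", fresh)
```

Neither replica ever waits for the other before answering, which is the point: availability is kept at the price of occasionally serving an older value.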
  • 12. In his 2000 keynote at the ACM Symposium on Principles of Distributed Computing (the same one where he formulated the CAP theorem), Dr. Brewer also presented a new term he called BASE. BASE stands for Basically Available, Soft state, Eventual consistency. He formulated the BASE principles and used them to demonstrate the trade-offs and differences relative to ACID-compliant systems.
  • 13. ACID-compliant systems have the following traits: consistency, isolation, a focus on commit, nested transactions and pessimistic locking. They are typically based on a fixed schema and are therefore inflexible to evolve.
  • 14. In contrast, BASE systems exhibit weak consistency, availability prioritized above all else, a best-effort approach to conflict resolution and optimistic locking. Systems built on the BASE philosophy consider approximate responses to be OK, are architecturally simpler and faster, and evolve flexibly, since they are typically schema-less.
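Optimistic locking is easier to see in code than in a list of traits. The sketch below is a minimal in-memory illustration: `DocumentStore`, `ConflictError` and the integer revision counter are hypothetical names chosen for this example. CouchDB exposes a comparable idea through a per-document revision field and rejects updates made against an outdated revision, but its revision identifiers and HTTP API look different from this toy.

```python
# Minimal sketch of optimistic locking with revisions (hypothetical API).
class ConflictError(Exception):
    """Raised when a writer's revision is out of date."""


class DocumentStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=0):
        # No lock is held between read and write; we only check at write time
        # that nobody else changed the document in the meantime.
        current_rev = self._docs.get(doc_id, (0, None))[0]
        if expected_rev != current_rev:
            raise ConflictError(
                f"{doc_id}: expected rev {expected_rev}, store has {current_rev}")
        self._docs[doc_id] = (current_rev + 1, dict(body))
        return current_rev + 1


store = DocumentStore()
store.put("story-1", {"title": "Draft"})

rev_a, _ = store.get("story-1")   # editor A reads revision 1
rev_b, _ = store.get("story-1")   # editor B reads revision 1 as well

store.put("story-1", {"title": "A's edit"}, expected_rev=rev_a)  # accepted
try:
    store.put("story-1", {"title": "B's edit"}, expected_rev=rev_b)
    conflict = False
except ConflictError:
    conflict = True               # B must re-read and retry
```

Compared with pessimistic locking, readers and writers never block each other; the cost is that the losing writer has to detect the conflict and resolve it in application code.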
  • 15. CouchDB is not a “better MySQL” or a “simpler Oracle”. It is really good at availability and partition tolerance and has many traits that make it a better tool for some of the problems traditionally solved with relational databases. But one thing it is not: a drop-in replacement for SQL databases. There are tradeoffs when choosing a document database, and CouchDB specifically. The most obvious and honestly “scary” tradeoff is forfeiting Consistency. We as computer scientists were trained long and hard that data must be consistent, models must be normalized, referential integrity must be maintained, etc. How can we even dream about forfeiting consistency, even for scalability and fault tolerance?
  • 16. The reality, however, is that there are systems engineering problems where strict data consistency is crucial, but there are many where it is not. If you are building stock trading software, you should probably use a data store that guarantees consistency. Financial systems in general require a high level of consistency, but that is not a given for just any system. Anybody who has built a real-life, high-throughput system knows that in many cases you end up de-normalizing the data model to allow for better performance. It is similar to forfeiting consistency in the CAP model. With a document-based database like Couch, some of your requests may occasionally return slightly stale data. Additionally, data in document format is often highly de-normalized and less referentially consistent than data in a fully normalized relational database. However, if you are building a news publishing website, none of this is unheard of. High-traffic news websites have been de-normalizing data and implementing aggressive caching for years. This is neither new nor radical. On the contrary, instead of home-cooked, half-baked proprietary solutions, we can now use a standard, open-source, highly optimized, well-tested solution like CouchDB. Personally, I think it’s a pretty good deal.
  • 17. At this point, I’ve spent a good portion of this presentation explaining the scalability profile of CouchDB (and similar systems) and discussing how the improvements are not quantitative but fundamentally qualitative. We have also talked about the tradeoffs that the increased availability imposes. Let’s forget about scalability for now, however, and talk about other characteristics of CouchDB as a document storage engine. After all, CouchDB is not the only document database, and there are document databases that do guarantee data consistency, so forfeiting consistency is actually a trait of AP systems (in the CAP model), not of document databases in general. An important trait of document databases, however, is that they are schema-less. There is no pre-defined, strict schema, no table structures or rigid relationships between document types. Document types live in a free world and evolve very flexibly.
  • 18. OK, this is by far one of my ugliest slides. What you see here is a rough ER diagram generated from a fresh, vanilla installation of a popular open-source content management system: Drupal. There are 72 tables in this diagram. Some of you may be familiar with Drupal. It is highly extensible (and generally really awesome), but it does not do much out of the box. So when we used Drupal to create websites like those of The Nation or The New Republic, we installed dozens of additional Drupal modules and wrote a bunch more ourselves. Meaning: we added even more tables. And you can clearly see how unreadable this schema already is. Obviously, we never even tried to visualize the entire data model on any real project, because it would have been useless.
  • 19. The same data model in a document-based database would look like this: (see slide) I know, I know! I am exaggerating. Obviously we would have more than one logical type of document even in a document database, but schema-less modeling means that at the physical level it is just one document type, so what you see here is really not that far from reality as far as actual data storage goes. Most things above and beyond that are really part of the application logic and business rules. Since my presentation is one of the last ones at this conference, I am sure you have already listened to presenters who went into great detail about data modeling in CouchDB, and I am sure they are much bigger experts on the subject than I am. So I will spare you the experience. Suffice it to say that embedding documents greatly simplifies data models. Think about just the number of so-called “mapping” tables that relational systems need to model things like many-to-many relationships. Also, in the case of online publishing specifically, most business objects are… well, documents, so having a storage engine that operates in terms of documents is extremely natural and enjoyable. There is much less discrepancy between the physical and logical models. Things, in most cases, just make sense and fall in line naturally.
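To make the “mapping tables versus embedding” point concrete, here is a small sketch that folds a relational-style many-to-many relationship into a single self-contained document. All table names, field names and sample rows are invented for illustration; in particular, the `type` field is just one common convention for telling logical document types apart inside a single physical store, not something the database requires.

```python
# Sketch: collapsing a many-to-many mapping table into an embedded list.
import json

# Relational shape: two entity "tables" plus a mapping table between them.
stories = [{"id": 1, "title": "Election Night"}]
tags = [{"id": 10, "name": "politics"}, {"id": 11, "name": "live"}]
story_tags = [{"story_id": 1, "tag_id": 10}, {"story_id": 1, "tag_id": 11}]


def to_document(story):
    """Fold the joined rows into one document with the tags embedded."""
    tag_ids = {row["tag_id"] for row in story_tags
               if row["story_id"] == story["id"]}
    return {
        "_id": f"story-{story['id']}",
        "type": "story",  # logical type, enforced only by application code
        "title": story["title"],
        "tags": sorted(t["name"] for t in tags if t["id"] in tag_ids),
    }


doc = to_document(stories[0])
print(json.dumps(doc, indent=2))
```

Reading the story now takes one lookup instead of a three-way join. The tradeoff, as discussed earlier, is de-normalization: renaming a tag means touching every document that embeds it.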
  • 20. Another important, stark difference between relational databases and CouchDB is the absence of a query language. Like most other things about CouchDB, this is pretty “scary” for newcomers. So much so that some other document databases have opted to offer ad hoc, dynamic queries that feel closer to SQL (MongoDB, for instance), and I know a lot of people who appreciate that. In contrast, CouchDB uses Map/Reduce: first filtering the data with a Map function and then (optionally) grouping it with a Reduce function. The documents, the results of map functions and the results of reduce functions are all stored in B-trees (the secret sauce of CouchDB’s performance). Whereas in a relational database you would normalize the data and then index some of its columns, most things in Couch are a B-tree index to begin with. This has significant consequences and, much like forfeiting data consistency, involves real trade-offs. While Map/Reduce is very powerful, you will obviously find some queries that you could run in SQL that are either impossible to model with a View or are too expensive or too slow. Also, Views are not as dynamic as SQL queries. They are built incrementally, and a complete rebuild of one in a large database is an expensive operation. As such, it really pays off to think carefully, at the early stages of system design, about the Views that a system will be using.
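Here is a rough sketch of the Map/Reduce flow described above. It is plain Python that imitates the idea and is not CouchDB's API: real CouchDB view functions are typically written in JavaScript and call `emit(key, value)`. The document fields and the function names (`map_by_section`, `build_view`, `reduce_count`, `query_grouped`) are assumptions made up for illustration, and a sorted list stands in for the B-tree index.

```python
from itertools import groupby

# Hypothetical documents; only some of them are articles.
docs = [
    {"_id": "a1", "type": "article", "section": "news"},
    {"_id": "a2", "type": "article", "section": "arts"},
    {"_id": "a3", "type": "article", "section": "news"},
    {"_id": "c1", "type": "comment", "article": "a1"},
]


def map_by_section(doc):
    """Map step: emit (key, value) rows only for the documents we care about."""
    if doc.get("type") == "article":
        yield doc["section"], 1


def build_view(documents, map_fn):
    """Run the map function over every document and keep the rows sorted by key.

    The sorted list is a stand-in for the key-ordered index a view keeps.
    """
    rows = [row for doc in documents for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


def reduce_count(values):
    """Reduce step: collapse all values that share a key into one number."""
    return sum(values)


def query_grouped(view_rows, reduce_fn):
    """Group the sorted rows by key and reduce each group."""
    return {
        key: reduce_fn([value for _, value in group])
        for key, group in groupby(view_rows, key=lambda row: row[0])
    }


view = build_view(docs, map_by_section)
articles_per_section = query_grouped(view, reduce_count)
```

Note that the question being asked ("how many articles per section?") is fixed when the view is defined. Asking a different question means defining and building a different view, which is the "not as dynamic as SQL" trade-off mentioned above.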
  • 21. The good news is: in online publishing most user-facing content is either a document type, a listing of documents or an aggregation, exactly the things that document-based databases and CouchDB’s Views are highly optimized for. As a matter of fact, at NPR, to withstand the millions of unique users that the main website gets, our legacy system uses an architecture with very similar constraints. It has content objects that are serialized XML, XML lists of content objects, and aggregations also represented in XML. While we do use an SQL database in the back-end, the front-end architecture has made many decisions similar to those made in CouchDB. Yes, the legacy system uses XML instead of JSON… I know, I know! But we have been running our systems for a long while, so some of it pre-dates the time when JSON got all sexy and trendy :)
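As an illustration of why a "listing of documents" suits a precomputed index, here is a small Python sketch. It assumes a hypothetical index keyed by `(section, published)`. The sample documents, field names and the `list_section` helper are invented for this example, and a sorted list with binary search stands in for a real key-ordered index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical article documents.
docs = [
    {"_id": "a1", "type": "article", "section": "news", "published": "2011-05-01"},
    {"_id": "a2", "type": "article", "section": "arts", "published": "2011-05-02"},
    {"_id": "a3", "type": "article", "section": "news", "published": "2011-05-03"},
]

# Precomputed index: rows of ((section, published), doc_id), sorted by key.
index = sorted(
    ((d["section"], d["published"]), d["_id"]) for d in docs if d["type"] == "article"
)
keys = [key for key, _ in index]


def list_section(section, start="", end="\uffff"):
    """Return the ids of a section's articles whose date falls in [start, end].

    A listing is a contiguous slice of the sorted index, so it can be found
    with two binary searches instead of a scan over all documents.
    """
    lo = bisect_left(keys, (section, start))
    hi = bisect_right(keys, (section, end))
    return [doc_id for _, doc_id in index[lo:hi]]


news_listing = list_section("news")
recent_news = list_section("news", start="2011-05-02")
```

The listing page for a section then amounts to reading one contiguous range of the index, which is the kind of access pattern I mean when I say these systems are optimized for listings.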
  • 22. To summarize, AP-style (as defined by the CAP model) document databases exhibit the following traits, important for online publishing systems that get significant traffic and have real-time content streams: high availability; partition tolerance; schema-less architecture; document-oriented storage; index-based, semi-dynamic querying like that in CouchDB Views. The benefit of each of these features is the result of a tradeoff. For teams architecting systems and implementing document databases, it is crucial to understand and appreciate the tradeoffs made. That said, document databases are disruptive and the benefits they provide are real. Ignoring them, and not augmenting traditional relational storage systems with document-based ones, would be a mistake.
  • 23. Thank you for your attention.