SlideShare ist ein Scribd-Unternehmen logo
1 von 99
Downloaden Sie, um offline zu lesen
Copyright	
  Third	
  Nature,	
  Inc.	
  
Following	
  Google	
  
Or	
  
Don’t	
  Follow	
  the	
  Followers,	
  
Follow	
  the	
  Leaders	
  
Or	
  
The	
  problem	
  probably	
  isn’t	
  the	
  
database,	
  the	
  problem	
  is	
  
probably	
  you	
  
May,	
  2014	
  
Mark	
  Madsen	
  
www.ThirdNature.net	
  
@markmadsen	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
	
  A	
  Quick	
  Intro	
  
I	
  may	
  or	
  may	
  not	
  be	
  
qualified	
  to	
  make	
  the	
  
outlandish	
  statements	
  in	
  
this	
  presentaDon.	
  I	
  have,	
  
however,	
  made	
  almost	
  
every	
  mistake	
  menDoned,	
  
and	
  one	
  learns	
  by	
  making	
  
mistakes.	
  
You	
  might	
  say	
  that’s	
  my	
  
singular	
  skill.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
A	
  BRIEF	
  HISTORY	
  OF	
  DATA	
  
STORAGE	
  AND	
  RETRIEVAL	
  
History	
  isn’t	
  taught	
  in	
  most	
  university	
  science	
  curricula	
  	
  
(probably	
  because	
  it’s	
  a	
  rabbit	
  hole)	
  	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Databases:	
  the	
  problem	
  statements	
  over	
  Gme	
  
“InformaDon	
  has	
  become	
  a	
  form	
  of	
  garbage,	
  not	
  only	
  
incapable	
  of	
  answering	
  the	
  most	
  fundamental	
  human	
  
quesDons	
  but	
  barely	
  useful	
  in	
  providing	
  coherent	
  
direcDon	
  to	
  the	
  soluDon	
  of	
  even	
  mundane	
  problems.”	
  –	
  	
  
Neil	
  Postman,	
  1985	
  
"We	
  have	
  reason	
  to	
  fear	
  that	
  the	
  mulDtude	
  of	
  books	
  
which	
  grows	
  every	
  day	
  in	
  a	
  prodigious	
  fashion	
  will	
  make	
  
the	
  following	
  centuries	
  fall	
  into	
  a	
  state	
  as	
  barbarous	
  as	
  
that	
  of	
  the	
  centuries	
  that	
  followed	
  the	
  fall	
  of	
  the	
  Roman	
  
Empire.”	
  –	
  Adrien	
  Baillet,	
  1685	
  
”…so	
  many	
  books	
  that	
  we	
  do	
  not	
  even	
  have	
  Dme	
  to	
  read	
  
the	
  Dtles."	
  –	
  Anton	
  Francesco	
  Doni,	
  1550	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  origin	
  of	
  informaGon	
  management	
  problems	
  
For	
  ~5000	
  years	
  we	
  used	
  
counters	
  of	
  various	
  types,	
  
eventually	
  developing	
  
wriDng	
  to	
  cope	
  with	
  
civilizaDon’s	
  needs.	
  
WriDng	
  is	
  more	
  efficient	
  
than	
  counters	
  you	
  can	
  lose.	
  
Sumerian	
  bulla	
  envelope	
  with	
  tokens.	
  
The	
  beta	
  period.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
InformaGon	
  Technology	
  v1.0:	
  Clay	
  Tech,	
  ~3000	
  bce	
  
The	
  first	
  informaGon	
  explosion	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
That	
  explosion	
  led	
  to	
  the	
  first	
  metadata	
  
“tags”
Small piles in baskets are easy to tag and search
Copyright	
  Third	
  Nature,	
  Inc.	
  
Metadata	
  v1.0:	
  tablets	
  about	
  tablets	
  
When	
  there	
  are	
  enough	
  of	
  
these	
  lying	
  around	
  you	
  need	
  to	
  
work	
  on	
  organizaDon	
  of	
  the	
  
collecDon	
  by	
  categorizaDons,	
  
aka	
  “taxonomy”,	
  “schema”	
  
Like	
  working	
  out	
  what	
  tables	
  
are	
  in	
  a	
  database,	
  or	
  what	
  files	
  
are	
  stored	
  in	
  HDFS.	
  
Babylonian	
  library	
  catalog	
  ~2000bce	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Metadata	
  v1.1:	
  tablets	
  about	
  what’s	
  in	
  tablets	
  
When literacy rates are
higher and people need
to communicate more
effectively, you need to
invent mechanisms to
cope, like dictionaries.
Now we’re worried about
what’s inside the
documents, not where
they are placed.
Synonym list, Ashurbanipal,
~900 bce
Copyright	
  Third	
  Nature,	
  Inc.	
  
Clay	
  Tech	
  has	
  some	
  familiar	
  limitaGons	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
InformaGon	
  Management	
  v2.0	
  Paper	
  Tech*	
  
Lighter,	
  denser,	
  faster	
  storage	
  media	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
More	
  informaGon	
  =	
  need	
  for	
  new	
  metadata	
  
techniques:	
  content	
  tagging,	
  author	
  catalogs	
  
The	
  first	
  real	
  library	
  ~300BC	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Discovery	
  of	
  one	
  tradeoff	
  between	
  clay	
  and	
  paper…	
  
Recorded	
  informaDon	
  creates	
  permanence	
  and	
  instability	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
You	
  can’t	
  have	
  disconGnuous	
  reading*	
  unGl	
  you	
  
have	
  a	
  random	
  access	
  technology.	
  
*Indexing	
  and	
  encyclopedias	
  are	
  hard	
  in	
  linear	
  scrolls.	
  Hello	
  ISAM	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Paper	
  Tech	
  v2.1:	
  increased	
  storage	
  density,	
  smaller	
  
form	
  factor,	
  durability,	
  high	
  res	
  RGB	
  graphics	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Paper	
  Tech	
  v2.2	
  
The	
  change	
  in	
  prinDng	
  
over	
  Dme	
  accelerates.	
  
Block	
  prinDng	
  replaced	
  by	
  
movable	
  metal	
  type.	
  
The	
  job	
  of	
  producDon	
  is	
  
faster	
  and	
  cheaper.	
  
CommodiDzaDon	
  changes	
  
the	
  landscape	
  over	
  the	
  
next	
  200	
  years.	
  
The	
  printed	
  becomes	
  
more	
  important	
  than	
  the	
  
printer.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  Elizabethan	
  Era	
  
ProducDon:	
  prinDng	
  presses	
  
Data	
  management	
  tech:	
  
▪  Perfect	
  copies	
  
▪  Topical	
  catalogs	
  
▪  Font	
  standardizaDon	
  
▪  Taxonomy	
  ascends	
  
InformaDon	
  explosion:	
  	
  
▪  8M	
  books	
  in	
  1500	
  
▪  200M	
  by	
  1600	
  
▪  CommodiDzaDon	
  
▪  Overload	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Be^er	
  embedded	
  metadata:	
  Gtle	
  page,	
  colophon,	
  ToC	
  
The	
  Georgian	
  Era:	
  The	
  Explosion	
  of	
  Natural	
  Philosophy	
  
Sharing knowledge in a larger community required common language, structure
Copyright	
  Third	
  Nature,	
  Inc.	
  
Linnaeus	
  
Top	
  down	
  orientaDon	
  
StaDc	
  structure	
  
DescripDve	
  rather	
  than	
  
explanatory	
  
Taxonomic	
  classificaHon	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Bobom	
  up	
  orientaDon	
  
Flexible	
  structure	
  
Explanatory,	
  descripDve	
  
Faceted	
  classificaHon	
  
Buffon	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
vs	
  
vs	
  
SQL 	
  NoSQL	
  
Think about why that happened
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  Victorian	
  Era	
  
The	
  powered	
  prinDng	
  
informaDon	
  explosion:	
  
▪  Card	
  catalogs,	
  cross-­‐
referencing,	
  random	
  
access	
  metadata	
  
▪  Universal	
  classificaDon	
  
▪  Extended	
  informaDon	
  
management	
  debates	
  
▪  Trading	
  effort	
  and	
  
flexibility	
  for	
  storage	
  
and	
  retrieval	
  
▪  Stereotyping	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Melvil	
  Dewey	
  
Dewey	
  Decimal	
  System	
  
Top	
  down	
  orientaDon	
  
StaDc	
  structure	
  
DescripDve	
  rather	
  than	
  
explanatory	
  
Taxonomic	
  classificaHon	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Cuber	
  Expansive	
  
ClassificaDon	
  System	
  
(~1882)	
  
Bobom	
  up	
  orientaDon	
  
More	
  flexible	
  structure	
  
Explanatory,	
  descripDve	
  
Faceted	
  classificaHon	
  
Charles	
  Ammi	
  Cu^er	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
vs	
  
SQL 	
  NoSQL	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
So	
  why	
  did	
  Linnaeus	
  and	
  Dewey	
  win?	
  
Good	
  enough	
  
wins	
  the	
  day	
  
	
  PragmaJsm	
  
It	
  wasn’t	
  solving	
  
the	
  problem	
  you	
  
thought	
  it	
  was.	
  	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Summarizing	
  
Thousands	
  of	
  years	
  of	
  thought	
  have	
  been	
  put	
  into	
  
principles	
  of	
  organizaDon	
  and	
  use.	
  The	
  abstract	
  	
  paberns	
  
are	
  the	
  same,	
  only	
  the	
  implementaDon	
  changed.	
  
▪  Clay:	
  tablets	
  about	
  tablets,	
  tablets	
  about	
  what’s	
  in	
  tablets,	
  
100X	
  increase	
  in	
  data	
  density	
  over	
  counDng	
  tech	
  
▪  Scrolls:	
  scrolls	
  about	
  scrolls,	
  scrolls	
  about	
  what’s	
  in	
  scrolls,	
  
prepended/appended	
  navigaDon,	
  >100X	
  increase	
  in	
  density	
  
▪  Books:	
  books	
  about	
  books,	
  books	
  about	
  what’s	
  in	
  books,	
  
embedded	
  internal	
  navigaDon,	
  >1000X	
  increase	
  in	
  density	
  
▪  DigiDzed	
  data:	
  similar,	
  far	
  denser,	
  and	
  different	
  because	
  it	
  
isn’t	
  locked	
  into	
  physical	
  forms	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
History	
  is	
  always	
  the	
  same	
  
Every	
  technology	
  is	
  a	
  trade:	
  
▪  Top	
  down	
  vs.	
  bobom	
  up	
  
▪  Authority	
  vs.	
  anarchy	
  
▪  Bureaucracy	
  vs.	
  autonomy	
  
▪  Control	
  vs.	
  creaDvity	
  
▪  Hierarchy	
  vs.	
  network	
  
▪  Dynamic	
  vs.	
  staDc	
  
▪  Power	
  vs.	
  ease	
  
▪  Work	
  up	
  front	
  vs.	
  postponed	
  
In every choice, something is lost and something is gained.
Copyright	
  Third	
  Nature,	
  Inc.	
  
What	
  lessons	
  does	
  this	
  history	
  teach	
  us?	
  
1.  InformaDon	
  requires	
  organizing	
  principles.	
  
2.  Differences	
  in	
  scale	
  require	
  different	
  principles.	
  
3.  There	
  are	
  mulDple	
  levels	
  of	
  informaDon	
  
architecture	
  and	
  principles	
  of	
  organizaDon.	
  	
  
4.  At	
  a	
  key	
  point	
  in	
  the	
  adopDon	
  cycle,	
  emphasis	
  
shins	
  from	
  collecDon	
  and	
  management	
  of	
  
informaDon	
  to	
  disseminaDon	
  and	
  use.	
  
	
  First	
  we	
  record,	
  then	
  we	
  use	
  and	
  share.	
  
Like	
  transacDon	
  processing,	
  query	
  &	
  analysis.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
InformaGon	
  management	
  through	
  human	
  history	
  
always	
  follows	
  the	
  same	
  pa^ern	
  
New	
  technology	
  development	
  
creates	
  
New	
  methods	
  to	
  cope	
  
creates	
  
New	
  informaGon	
  scale	
  and	
  availability	
  
creates…	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Big	
  Data	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
DEALING	
  WITH	
  BIG:	
  
SOME	
  SCALING	
  HISTORY	
  
"The	
  most	
  amazing	
  achievement	
  of	
  the	
  computer	
  
sonware	
  industry	
  is	
  its	
  conDnuing	
  cancellaDon	
  of	
  
the	
  steady	
  and	
  staggering	
  gains	
  made	
  by	
  the	
  
computer	
  hardware	
  industry.“	
  -­‐	
  Henry	
  Peteroski	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Why	
  doesn’t	
  your	
  database	
  scale?	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Hipster	
  bullshit	
  
I	
  can’t	
  get	
  MySQL	
  to	
  scale	
  
therefore	
  
RelaDonal	
  databases	
  don’t	
  scale	
  
therefore	
  
We	
  must	
  use	
  NoSQL*	
  for	
  query	
  too	
  
*including	
  Hadoop	
  and	
  related	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
GeneraGon	
  and	
  collecGon	
  always	
  come	
  first	
  
UnDl	
  there’s	
  enough	
  informaDon,	
  general	
  problems	
  of	
  
organizaDon	
  and	
  informaDon	
  architecture	
  don’t	
  appear.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  early	
  days:	
  client/server	
  as	
  the	
  starGng	
  point	
  
We	
  had	
  transacDon	
  processing	
  against	
  the	
  DB	
  all	
  on	
  
the	
  same	
  machine.	
  Then	
  on	
  two	
  separate	
  machines.	
  
Application
Copyright	
  Third	
  Nature,	
  Inc.	
  
Scaling	
  client/server	
  
We	
  added	
  app	
  servers	
  and	
  pooled	
  connecDons.	
  
Client
App
serverClient
Client
Copyright	
  Third	
  Nature,	
  Inc.	
  
Scaling	
  client/server	
  
Then	
  threw	
  money	
  at	
  the	
  problem	
  in	
  the	
  form	
  of	
  
hardware	
  (made	
  the	
  database	
  bigger).	
  
Client
App
serverClient
Client
Copyright	
  Third	
  Nature,	
  Inc.	
  
Web	
  apps	
  were	
  a	
  huge	
  increase	
  in	
  concurrency	
  
Architecture	
  changed	
  to	
  reflect	
  new	
  stateless	
  model.	
  
We	
  had	
  scalability	
  and	
  availability	
  problems.	
  
webapp
webapp
Loadbalancer
Loadbalancer
service
service
service
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  traffic?	
  
Keep	
  adding	
  hardware,	
  make	
  the	
  DB	
  bigger.	
  
Limits	
  reached,	
  performance,	
  scalability	
  and	
  
availability	
  problems.	
  
webapp
webapp
Loadbalancer
Loadbalancer
service
service
service
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  traffic?	
  
Read-­‐only	
  replicas	
  will	
  save	
  the	
  day!	
  
SDll	
  have	
  scalability	
  and	
  availability	
  problems.	
  
And	
  now	
  operaDonal	
  overhead	
  and	
  problems.	
  
webapp
webapp
Loadbalancer
Loadbalancer
service
service
service
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  traffic?	
  
Sharding	
  seems	
  a	
  fine	
  thing.	
  But	
  it’s	
  one	
  leber	
  from…	
  
Scaling	
  and	
  perf	
  beber,	
  overhead	
  and	
  operaDonal	
  
complexity	
  high	
  and	
  worsening.	
  	
  
webapp
webapp
Loadbalancer
Loadbalancer
service
service
service
Loadbalancer
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  traffic?	
  
Let’s	
  cache	
  data	
  at	
  the	
  service	
  Der!	
  
Performance	
  beber,	
  overhead	
  and	
  operaDonal	
  
complexity	
  higher.	
  	
  
webapp
webapp
Loadbalancer
Loadbalancer
service
service
service
Loadbalancer
cachecachecache
Copyright	
  Third	
  Nature,	
  Inc.	
  
What	
  are	
  the	
  problems	
  now?	
  
1.  More	
  hardware,	
  more	
  things	
  to	
  break	
  
2.  More	
  management	
  and	
  administraDon	
  
3.  More	
  sonware	
  complexity	
  
4.  Increasing	
  distance	
  for	
  data	
  to	
  travel	
  =	
  latency	
  
5.  Data	
  administraDon	
  difficult	
  to	
  impossible	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Problem	
  solved?	
  
Distributed	
  NoSQL	
  DB	
  (handles	
  cache,	
  load	
  balance,	
  
data	
  distribuDon).	
  Similar	
  performance,	
  simpler	
  
scaling,	
  reduced	
  operaDonal	
  problems,	
  simpler	
  
applicaDon	
  architecture.	
  Finished!	
  	
  
webapp
webapp
Loadbalancer
Loadbalancer
service
service
service
Copyright	
  Third	
  Nature,	
  Inc.	
  
TANSTAAFL	
  
When	
  replacing	
  the	
  old	
  
with	
  the	
  new	
  (or	
  ignoring	
  
the	
  new	
  over	
  the	
  old)	
  you	
  
always	
  make	
  tradeoffs,	
  
and	
  usually	
  you	
  won’t	
  see	
  
them	
  for	
  a	
  long	
  Dme.	
  
Technologies	
  are	
  not	
  
perfect	
  replacements	
  for	
  
one	
  another.	
  Onen	
  not	
  
beber,	
  only	
  different.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Not	
  finished:	
  remember	
  the	
  cycle	
  of	
  history…	
  
The	
  biggest	
  hole	
  in	
  the	
  prior	
  secDon	
  on	
  scaling	
  is	
  
that	
  we	
  scaled	
  OLTP,	
  what	
  about	
  OLAP?	
  
Queries	
  <>	
  transacDons.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Solving	
  query	
  problems	
  
Aggregate	
  or	
  low	
  selecDvity	
  queries	
  were	
  a	
  problem	
  
early	
  on,	
  when	
  people	
  wanted	
  to	
  use	
  the	
  data.	
  
Every	
  report	
  or	
  query	
  is	
  a	
  program.	
  
OLTP
Report
app
Report
app
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  data	
  volume	
  
Make	
  it	
  faster	
  by	
  throwing	
  money	
  at	
  hardware	
  
(sound	
  familiar?)	
  
OLTP
Report
app
Report
app
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  data	
  volume	
  
Replicas:	
  split	
  the	
  workload	
  and	
  tune	
  the	
  systems	
  
based	
  on	
  their	
  workload.	
  
Report
app
OLTP Report
app
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  data	
  volume	
  breaks	
  the	
  old	
  model	
  
Devise	
  a	
  new	
  architecture.	
  	
  
ReschemaDze	
  the	
  database,	
  eliminate	
  cyclic	
  joins,	
  
selecDve	
  denormalizaDon,	
  query	
  generators.	
  But	
  it	
  
takes	
  bulk	
  processing	
  to	
  reschemaDze	
  the	
  data.	
  
Query
tools
OLTP
OLTP
OLTP
ETL
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  data	
  volume	
  
Improve	
  response	
  Dme	
  with	
  caching	
  in	
  the	
  query	
  
tools,	
  and	
  by	
  using	
  MOLAP	
  tools	
  that	
  map	
  into	
  cache	
  
or	
  memory.	
  Like	
  mainframe	
  reporDng.	
  Or…	
  
Query
tools
OLTP
OLTP
OLTP
ETL
cache
Query
tools
cache
Copyright	
  Third	
  Nature,	
  Inc.	
  
Increasing	
  data	
  volume	
  
Parallel	
  processing	
  for	
  ETL.	
  Distributed	
  query	
  
databases	
  for	
  fine	
  grained	
  high	
  volume	
  parallelism.	
  
ETLETLETL
OLTP
OLTP
OLTP
Query
tools
cache
Query
tools
cache
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  architecture	
  looks	
  familiar	
  
Two	
  workloads,	
  two	
  not	
  dissimilar	
  architectures:	
  
▪  Load-­‐balanced	
  front	
  ends	
  
▪  Distributed	
  caching	
  layers	
  
▪  Scalable	
  distributed	
  parallel	
  databases	
  
But	
  the	
  nature	
  of	
  the	
  OLTP	
  and	
  OLAP	
  workloads	
  is	
  
very	
  different.	
  Forcing	
  them	
  into	
  one	
  plaworm	
  is	
  
almost	
  impossible	
  for	
  data	
  architecture	
  reasons	
  
and	
  parDcularly	
  at	
  scale*	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
DATA	
  PERSISTENCE	
  AND	
  STORES	
  
Why	
  would	
  digital	
  data	
  be	
  any	
  different	
  than	
  clay	
  or	
  
scrolls	
  or	
  books?	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
“Big	
  data	
  is	
  unprecedented.”	
  
-­‐	
  Anyone	
  involved	
  with	
  big	
  data	
  in	
  even	
  the	
  
most	
  barely	
  percepHble	
  way	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
There’s	
  a	
  difference	
  
between	
  having	
  no	
  past	
  
and	
  acDvely	
  rejecDng	
  it.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
MultiValue, Hierarchical
PICK
IMS
IDS
ADABAS
Relational
CODASYL
System R (SEQUEL)
SQL/DS
INGRES (QUEL)
Mimer
Oracle
RDBMS, SQL standard
DB2
Teradata
Informix
Sybase
Postgres
OODBMS, ORDBMS
Versant
Objectivity
Gemstone
Informix*
Oracle*
MPP Query, NoSQL
Netezza
Paraccel
Vertica
MongoDB
CouchBase
Riak
Cassandra
NewSQL
SciDB
MonetDB
NuoDB
CitusDB
1960s 1980s 2000s
1970s 1990s 2010s
Spanner
F1
Copyright	
  Third	
  Nature,	
  Inc.	
  
A	
  history	
  of	
  databases	
  in	
  No-­‐taGon	
  
1970:	
  NoSQL	
  =	
  We	
  have	
  no	
  SQL	
  	
  
1980:	
  NoSQL	
  =	
  Know	
  SQL	
  	
  
2000:	
  NoSQL	
  =	
  No	
  SQL!	
  	
  
2005:	
  NoSQL	
  =	
  Not	
  only	
  SQL	
  	
  
2013:	
  NoSQL	
  =	
  No,	
  SQL!	
  
(R)DB(MS)	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Tradeoffs?	
  
“Query	
  opHmizaHon	
  
is	
  not	
  rocket	
  science.	
  
When	
  you	
  flunk	
  out	
  of	
  
query	
  opHmizaHon,	
  
we	
  make	
  you	
  go	
  build	
  
rockets.”	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Joins.	
  Wait,	
  there’s	
  more	
  than	
  one?	
  
Wait, there’s more than one?
Copyright	
  Third	
  Nature,	
  Inc.	
  
In	
  NoSQL	
  Land,	
  OpGmizer	
  is	
  You!	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Services	
  provided	
  
Standard	
  API/query	
  layer*	
  
TransacDon	
  /	
  consistency	
  
Query	
  opDmizaDon	
  
Data	
  navigaDon,	
  joins	
  
Data	
  access	
  
Storage	
  management	
  
Database
Database
Tradeoffs:	
  In	
  NoSQL,	
  the	
  DBMS	
  is	
  You	
  Too	
  
SQL	
  database	
   NoSQL	
  database	
  
Application Application
Anything	
  not	
  done	
  by	
  the	
  DB	
  becomes	
  a	
  developer’s	
  task.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  relaDonal	
  database	
  is	
  the	
  franchise	
  technology	
  for	
  storing	
  and	
  
retrieving	
  data,	
  but…	
  
1.  Global,	
  staDc	
  schema	
  model	
  
2.  No	
  rich	
  typing	
  system	
  
3.  No	
  management	
  of	
  natural	
  ordering	
  in	
  data	
  
4.  Many	
  are	
  not	
  a	
  good	
  fit	
  for	
  network	
  parallel	
  compuDng,	
  aka	
  cloud	
  
5.  Limited	
  API	
  in	
  atomic	
  SQL	
  statement	
  syntax	
  	
  &	
  simple	
  result	
  set	
  return	
  
6.  Poor	
  developer	
  support	
  (in	
  languages,	
  in	
  IDEs,	
  in	
  processes)	
  
RelaGonal:	
  a	
  good	
  conceptual	
  model,	
  but	
  a	
  
prematurely	
  standardized	
  implementaGon	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  relaDonal	
  database	
  is	
  the	
  franchise	
  technology	
  for	
  storing	
  and	
  
retrieving	
  data,	
  but…	
  
1.  Global,	
  staDc	
  schema	
  model	
  
2.  No	
  rich	
  typing	
  system	
  
3.  No	
  management	
  of	
  natural	
  ordering	
  in	
  data	
  
4.  Many	
  are	
  not	
  a	
  good	
  fit	
  for	
  network	
  parallel	
  compuDng,	
  aka	
  cloud	
  
5.  Limited	
  API	
  in	
  atomic	
  SQL	
  statement	
  syntax	
  	
  &	
  simple	
  result	
  set	
  return	
  
6.  Poor	
  developer	
  support	
  
RelaGonal:	
  a	
  good	
  conceptual	
  model,	
  but	
  a	
  
prematurely	
  standardized	
  implementaGon	
  
What	
  I	
  did	
  not	
  list:	
  
Scalability	
  and	
  performance	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
BIGNESS	
  AND	
  SCALABILITY	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Technology	
  Capability	
  and	
  Data	
  Volume	
  
Source: Noumenal, Inc.
Copyright	
  Third	
  Nature,	
  Inc.	
  
You	
  can	
  make	
  a	
  database	
  emulate	
  a	
  KVS	
  
If	
  you	
  map	
  the	
  shared	
  event	
  fields	
  to	
  fixed	
  columns	
  and	
  
an	
  event	
  type,	
  then	
  use	
  a	
  varchar	
  or	
  clob	
  payload	
  
column,	
  you	
  can	
  store	
  arbitrary	
  events	
  in	
  a	
  database	
  and	
  
query	
  them	
  via	
  SQL	
  and	
  views	
  (or	
  column	
  funcDons),	
  and	
  
do	
  it	
  all	
  in	
  a	
  single	
  table.	
  
Arbitrary data parsed
at query time using
native DB features like
regex or UDF
Data common to all
events in the database
Type code used to
differentiate event
payload formats.
Data	
  Warehouse	
  +	
  
Behavioral	
  
Singularity
Data	
  Warehouse	
  
Low End Enterprise-class System
Contextual-Complex Analytics
Deep, Seasonal, Consumable Data Sets
Production Data Warehousing
Large Concurrent User-base
+ +
150+
concurrent users
500+
concurrent users
Enterprise-class System
5-10
concurrent users
Structure the Unstructured
Detect Patterns
Commodity Hardware System
6+PB 40+PB 20+PB
Analyze	
  &	
  Report	
  
Discover	
  &	
  Explore	
  
Data	
  Plagorms	
  
Thanks to eBay for these case slides.
Start_dt	
   Guid	
   Sess_id	
   Page_id	
   Soj	
  
2011-­‐10-­‐18	
   1234	
   1	
   15	
   Language=en&	
  
source=hp&	
  
itm=i1,i2,i3,i4,i5	
  
SELECT start_dt, guid, sess_id, page_id,
NVL(e.soj, ‘itm’) AS item_list
FROM event e
WHERE e.start_dt = ‘2011-10-18’
AND e.page_id = 3286 /* Search Results */
Start_dt	
   Guid	
   Sess_id	
   Page_id	
   Item_list	
  
2011-­‐10-­‐18	
   1234	
   1	
   15	
   i1,i2,i3,i4,i5	
  
How	
  to	
  use	
  a	
  database	
  for	
  semi-­‐structured	
  data	
  
First the user-defined function (UDF)
Thanks to eBay for these case slides.
WITH event (start_dt, item_list) AS (<previous SQL>)
SELECT
start_dt,
item_id, /* Individual Item */
count(*)
FROM TABLE ( /* Normalize comma delimited list */
normalize_list( start_dt, item_list, ‘,’)
RETURNS(start_dt, idx, item_id)
)
GROUP BY 1, 2
ORDER BY 3 DESC
Start_dt	
   Item_id	
   Count(*)	
  
2011-­‐10-­‐18	
   i1	
   555	
  
2011-­‐10-­‐18	
   i2	
   444	
  
2011-­‐10-­‐18	
   i3	
   333	
  
2011-­‐10-­‐18	
   i4	
   222	
  
2011-­‐10-­‐18	
   i5	
   111	
  
*syntax simplified
How	
  to	
  use	
  a	
  database	
  for	
  semi-­‐structured	
  data	
  
Then the table function (standard ANSI SQL)
Thanks to eBay for these case slides.
Copyright	
  Third	
  Nature,	
  Inc.	
  
Pricing	
  and	
  performance:	
  Hadoop	
  is	
  a	
  storage	
  
and	
  processing	
  play,	
  not	
  	
  a	
  database	
  play*	
  
Source: Venturebeat
With	
  big	
  data	
  systems,	
  
the	
  cost	
  of	
  storing	
  data	
  is	
  
an	
  order	
  of	
  magnitude	
  
lower	
  than	
  with	
  
databases	
  today	
  (but	
  not	
  
the	
  cost	
  or	
  ability	
  to	
  
query	
  it	
  back	
  out).	
  
Processing	
  data	
  at	
  scale	
  
is	
  at	
  least	
  an	
  order	
  of	
  
magnitude	
  cheaper	
  too.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Bigness:	
  most	
  people	
  do	
  not	
  need	
  special	
  technology	
  Numberofpeople
The distribution of data
size is about normal, yet
these guys set the tone of
the market today.
Bigness of data
Copyright	
  Third	
  Nature,	
  Inc.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
AnalyGcs:	
  This	
  is	
  really	
  raw	
  data	
  under	
  storage	
  Numberofjobs
Microsoft study of 174,000 analytic
jobs in their cluster: median size ???
Bigness of data
Copyright	
  Third	
  Nature,	
  Inc.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Working	
  data	
  for	
  analyGcs	
  most	
  olen	
  not	
  big	
  Numberofjobs
14 GB
Smallness of data
Copyright	
  Third	
  Nature,	
  Inc.	
  
© Third Nature Inc.© Third Nature Inc.
What	
  makes	
  data	
  “big”?	
  
Very	
  large	
  amounts	
  
Hierarchical	
  structures	
  
Nested	
  structures	
  
Encoded	
  values	
  
Non-­‐standard	
  (for	
  a	
  database)	
  
types	
  
Time	
  series	
  
Deep	
  structure	
  
Human	
  authored	
  text	
  
“big”	
  is	
  beMer	
  off	
  being	
  defined	
  as	
  “complex”	
  or	
  “hard	
  to	
  manage”	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Web	
  tracking	
  data	
  has	
  a	
  nested	
  structure	
  
USER_ID 301212631165031
SESSION_ID 590387153892659
VISIT_DATE 1/10/2010	
  0:00
SESSION_START_DATE 1:41:44	
  AM
PAGE_VIEW_DATE 1/10/2010	
  9:59
DESTINATION_URL
hbps://www.phisherking.com/gins/store/LogonForm?
mmc=link-­‐src-­‐email-­‐_-­‐m100109-­‐_-­‐44IOJ1-­‐_-­‐
shop&langId=-­‐1&storeId=1055&URL=BECGinListItemDisplay
REFERRAL_NAME Direct
REFERRAL_URL -­‐
PAGE_ID PROD_24259_CARD
REL_PRODUCTS PROD_24654_CARD,	
  PROD_3648_FLOWERS
SITE_LOCATION_NAME VALENTINE'S	
  DAY	
  MICROSITE
SITE_LOCATION_ID SHOP-­‐BY-­‐HOLIDAY	
  VALENTINES	
  DAY
IP_ADDRESS 67.189.110.179
BROWSER_OS_NAME
MOZILLA/4.0	
  (COMPATIBLE;	
  MSIE	
  7.0;	
  AOL	
  9.0;	
  WINDOWS	
  NT	
  
5.1;	
  TRIDENT/4.0;	
  GTB6;	
  .NET	
  CLR	
  1.1.4322)
“unstructured” data
embedded in the
logged message:
complex strings
…but it’s easily unpacked into tuples
© Third Nature Inc.© Third Nature Inc.
Common	
  Names	
  
▪  Aspirin,	
  Acetylsalicylic	
  acid,	
  Excedrin	
  
Structural	
  formulas	
  
▪  Commonly	
  used	
  to	
  communicate	
  between	
  chemists	
  
SystemaDc	
  nomenclatures:	
  
▪  Mass	
  formula:	
  C9H8O4	
  
▪  SMILES:	
  OC(=O)C1=C(C=CC=C1)OC(=O)C	
  
▪  InChI:	
  1/C9H8O4/c1-­‐6(10)13-­‐8-­‐5-­‐3-­‐2-­‐4-­‐7(8)9(11)12/h2-­‐5H,1H3,(H,
11,12)	
  
▪  IUPAC:	
  pyrido[1",2":1',2']imidazo[4',5':5,6]pyrazino[2,3-­‐b]phenazine	
  
They	
  all	
  refer	
  to	
  the	
  same	
  thing.	
  
If	
  you	
  process	
  them	
  in	
  a	
  database,	
  how	
  do	
  you	
  store	
  them?	
  
All	
  of	
  these	
  things	
  are	
  “unstructured”	
  data	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Unstructured	
  data	
  isn’t	
  
really	
  unstructured.	
  
The	
  problem	
  is	
  that	
  this	
  
data	
  is	
  unmodeled.	
  
The	
  real	
  challenge	
  is	
  
complexity.	
  
We	
  usually	
  mean	
  text,	
  not	
  data	
  structures…	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Pa^erns	
  emerge	
  from	
  lots	
  of	
  event	
  data	
  
Paberns	
  emerge	
  from	
  
the	
  underlying	
  structure	
  
of	
  the	
  enHre	
  dataset.	
  
The	
  paberns	
  are	
  more	
  
interesDng	
  than	
  sums	
  
and	
  counts	
  of	
  the	
  events.	
  
Web	
  paths:	
  clicks	
  in	
  a	
  
session	
  as	
  network	
  node	
  
traversal.	
  
Email:	
  traffic	
  analysis	
  
producing	
  a	
  network	
  
The event stream is a source for analysis, generating
another set of data that is the source for different analysis.
Copyright	
  Third	
  Nature,	
  Inc.	
  
BIGNESS	
  AND	
  DATA	
  
COMPUTATIONAL	
  WORKLOADS	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Not	
  finished:	
  remember	
  the	
  cycle	
  of	
  history…	
  
The	
  biggest	
  hole	
  in	
  the	
  prior	
  secDons	
  is	
  that	
  we	
  
scaled	
  OLTP	
  and	
  OLAP	
  but	
  what	
  about	
  analyDcs?	
  
Queries	
  <>	
  transacDons	
  <>	
  computaDons	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  three	
  way	
  workload	
  break	
  
1.  OperaDonal:	
  OLTP	
  systems	
  
2.  AnalyDc:	
  OLAP	
  systems	
  
3.  ScienDfic:	
  ComputaDonal	
  systems	
  
Unit	
  of	
  focus:	
  
1.  TransacDon	
  
2.  Query	
  
3.  ComputaDon	
  
Different	
  problems	
  require	
  different	
  plaworms	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  holy	
  grail	
  of	
  databases	
  under	
  current	
  market	
  hype	
  
We’re	
  talking	
  mostly	
  about	
  
computaGon	
  over	
  data	
  when	
  
we	
  talk	
  about	
  “big	
  data”	
  and	
  
analyGcs.	
  
The	
  goal	
  is	
  combining	
  data	
  
storage,	
  retrieval	
  and	
  analysis	
  
into	
  one	
  system,	
  a	
  potenJal	
  
mismatch	
  for	
  both	
  relaJonal	
  
and	
  nosql.
Copyright	
  Third	
  Nature,	
  Inc.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
A	
  Simple	
  Division	
  of	
  the	
  AnalyGc	
  Problem	
  Space	
  Computation
LittleLots
Data volume
Little Lots
Big analytics, little data
Specialized computing,
modeling problems:
supercomputing, GPUs
Big analytics, big data
Complex math over large
data volumes requires non-
relational shared nothing
architectures
Little analytics, little data
The entry point; SAS,
SMP databases, even
OLAP cubes can work
Little analytics, big data
The BI/DW space, for the
most part, done in
databases mostly
Copyright	
  Third	
  Nature,	
  Inc.	
  
No	
  technical	
  soluGon	
  fits	
  all	
  three	
  axes	
  
Big	
  
ComputaGon	
  
Parallel	
  DSLs,	
  
HPC,	
  GPU,	
  
MapReduce	
  
Concurrency	
  
Parallel	
  DBs,	
  
nosql	
  stores*	
  
Volume	
  
Parallel	
  DBs,	
  
MapReduce,	
  
BSP	
  	
  
Some work along two axes,
but most have a primary focus
on one core component
Copyright	
  Third	
  Nature,	
  Inc.	
  
Data	
  infrastructure	
  is	
  a	
  plagorm	
  
▪  Any	
  data	
  –	
  structures,	
  forms	
  
▪  Any	
  latency	
  –in	
  moDon,	
  at	
  rest	
  
▪  Any	
  process	
  –	
  query,	
  algorithm,	
  engine,	
  
transformaDon	
  
▪  Any	
  access	
  –	
  SQL,	
  API,	
  queue,	
  file	
  movement	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Hadoop	
  &	
  NoSQL	
  AdopGon	
  
Some	
  people	
  can’t	
  resist	
  
ge„ng	
  the	
  next	
  new	
  thing	
  
because	
  it’s	
  new.	
  
Many	
  organizaDons	
  are	
  like	
  
this,	
  promoDng	
  a	
  soluDon	
  and	
  
hunDng	
  for	
  the	
  problem	
  that	
  
matches	
  it.	
  
Beber	
  to	
  ask	
  “What	
  is	
  the	
  
problem	
  for	
  which	
  this	
  
technology	
  is	
  the	
  answer?”	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
NoSQL	
  Will	
  “Fail”	
  
Unless	
  some	
  mathemaDcally	
  derived	
  data	
  model	
  is	
  
developed,	
  and	
  a	
  query	
  language	
  using	
  it	
  is	
  created.	
  
Otherwise	
  there’s	
  no	
  standard	
  interfacing	
  model,	
  
no	
  interoperability,	
  no	
  chance	
  for	
  a	
  tool	
  ecosystem	
  
to	
  evolve	
  over	
  top	
  of	
  the	
  plaworm	
  –	
  because	
  there	
  
are	
  no	
  uniform	
  plaworm	
  boundaries.	
  
“One	
  logical	
  interface,	
  many	
  physical	
  
implementaDons”	
  is	
  a	
  key	
  reason	
  why	
  SQL	
  won	
  the	
  
database	
  wars.	
  This	
  creates	
  an	
  ecosystem.	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Maybe…	
  
These aren’t the
databases we’re
looking for.
We need one that
speaks pig latin.
Copyright	
  Third	
  Nature,	
  Inc.	
  
What’s	
  wrong	
  with	
  BASE?	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Google	
  F1:	
  Another	
  EvoluGon	
  
Distributed	
  SQL	
  database	
  
ACID	
  compliance,	
  2PC	
  and	
  row-­‐
level	
  locking	
  (!)	
  
Transparent	
  data	
  distribuDon	
  
Synchronous	
  replicaDon	
  across	
  
data	
  centers	
  
Table	
  interleaving	
  (hierarchies)	
  
Queryable	
  protobufs	
  
MapReduce	
  access	
  to	
  underlying	
  
data	
  
Average	
  user-­‐facing	
  latency	
  of	
  
~200ms	
  with	
  small	
  deviaDon	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
The	
  big	
  data	
  revoluGon,	
  more	
  of	
  an	
  evoluGon	
  
Be	
  pragmaGc,	
  not	
  dogmaGc	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
Conclusion	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
References	
  (things	
  worth	
  reading	
  on	
  the	
  way	
  home)	
  A	
  relaDonal	
  model	
  for	
  large	
  shared	
  data	
  banks,	
  CommunicaDons	
  of	
  the	
  ACM,	
  June,	
  1970,	
  
hbp://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf	
  
Column-­‐Oriented	
  Database	
  Systems,	
  Stavros	
  Harizopoulos,	
  Daniel	
  Abadi,	
  Peter	
  Boncz,	
  VLDB	
  2009	
  Tutorial	
  
hbp://cs-­‐www.cs.yale.edu/homes/dna/talks/Column_Store_Tutorial_VLDB09.pdf	
  
Nobody	
  ever	
  got	
  fired	
  for	
  using	
  Hadoop	
  on	
  a	
  cluster,	
  1st	
  InternaDonal	
  Workshop	
  on	
  Hot	
  Topics	
  in	
  Cloud	
  Data	
  
ProcessingApril	
  10,	
  2012,	
  Bern,	
  Switzerland.	
  
A	
  co-­‐RelaDonal	
  Model	
  of	
  Data	
  for	
  Large	
  Shared	
  Data	
  Banks,	
  ACM	
  Queue,	
  2012,	
  
hbp://queue.acm.org/detail.cfm?id=1961297	
  
A	
  query	
  language	
  for	
  mulDdimensional	
  arrays:	
  design,	
  implementaDon	
  and	
  opDmizaDon	
  techniques,	
  SIGMOD,	
  
1996	
  
ProbabilisDcally	
  Bounded	
  Staleness	
  for	
  PracDcal	
  ParDal	
  Quorums,	
  Proceedings	
  of	
  the	
  VLDB	
  Endowment,	
  Vol.	
  5,	
  
No.	
  8,	
  hbp://vldb.org/pvldb/vol5/p776_peterbailis_vldb2012.pdf	
  
“Amorphous	
  Data-­‐parallelism	
  in	
  Irregular	
  Algorithms”,	
  Keshav	
  Pingali	
  et	
  al	
  
MapReduce:	
  Simplified	
  Data	
  Processing	
  on	
  Large	
  Clusters,	
  
hbp://staDc.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/
mapreduce-­‐osdi04.pdf	
  
Dremel:	
  InteracDve	
  Analysis	
  of	
  Web-­‐Scale	
  Datasets,	
  Proceedings	
  of	
  the	
  VLDB	
  Endowment,	
  Vol.	
  3,	
  No.	
  1,	
  2010	
  
hbp://staDc.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/
36632.pdf	
  
Spanner:	
  Google’s	
  Globally-­‐Distributed	
  Database,	
  SIGMOD,	
  May,	
  2012,	
  
hbp://staDc.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/es//archive/
spanner-­‐osdi2012.pdf	
  
F1:	
  A	
  Distributed	
  SQL	
  Database	
  That	
  Scales,	
  Proceedings	
  of	
  the	
  VLDB	
  Endowment,	
  Vol.	
  6,	
  No.	
  11,	
  2013,	
  
hbp://staDc.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/
archive/41344.pdf	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
CC	
  Image	
  A^ribuGons	
  
Thanks	
  to	
  the	
  people	
  who	
  supplied	
  the	
  creaDve	
  commons	
  licensed	
  images	
  used	
  in	
  this	
  presentaDon:	
  
shady_puppy_sales.jpg	
  -­‐	
  hbp://www.flickr.com/photos/brizzlebornandbred/5001120150	
  
cuneiform_proto_3000bc.jpg	
  -­‐	
  hbp://www.flickr.com/photos/takomabibelot/3124619443/	
  
cuneiform_undo.jpg	
  -­‐	
  hbp://www.flickr.com/photos/charlesDlford/2552654321/	
  
scroll_kerouac.jpg	
  -­‐	
  hbp://www.flickr.com/photos/ari/93966538/	
  
House	
  on	
  fire	
  -­‐	
  hbp://flickr.com/photos/oldonliner/1485881035/	
  
Manuscripts	
  on	
  shelf	
  -­‐	
  hbp://flickr.com/photos/peterkaminski/1688635/	
  
manuscript_illum.jpg	
  -­‐	
  hbp://www.flickr.com/photos/diorama_sky/2975796332/	
  
manuscript_page.jpg	
  -­‐	
  hbp://www.flickr.com/photos/calliope/306564541/	
  
subway	
  dc	
  metro	
  	
  -­‐	
  hbp://flickr.com/photos/musaeum/509899161/	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
About	
  the	
  Presenter	
  
Mark	
  Madsen	
  is	
  president	
  of	
  Third	
  
Nature,	
  a	
  research	
  and	
  consulDng	
  firm	
  
focused	
  on	
  building	
  the	
  infrastructure	
  for	
  
analyDcs,	
  evidence-­‐based	
  management,	
  
business	
  intelligence	
  and	
  data	
  
management.	
  Mark	
  is	
  an	
  award-­‐winning	
  
author,	
  architect	
  and	
  CTO	
  whose	
  work	
  
has	
  been	
  featured	
  in	
  numerous	
  industry	
  
publicaDons.	
  Over	
  the	
  past	
  ten	
  years	
  
Mark	
  received	
  awards	
  for	
  his	
  work	
  from	
  
the	
  American	
  ProducDvity	
  &	
  Quality	
  
Center,	
  TDWI,	
  and	
  the	
  Smithsonian	
  
InsDtute.	
  He	
  is	
  an	
  internaDonal	
  speaker,	
  a	
  
contributor	
  at	
  Forbes	
  Online	
  and	
  
InformaDon	
  Management.	
  For	
  more	
  
informaDon	
  or	
  to	
  contact	
  Mark,	
  follow	
  
@markmadsen	
  on	
  Twiber	
  or	
  visit	
  	
  hbp://
ThirdNature.net	
  	
  
Copyright	
  Third	
  Nature,	
  Inc.	
  
About	
  Third	
  Nature	
  
Third Nature is a research and consulting firm focused on new and
emerging technology and practices in business intelligence, analytics and
performance management. If your question is related to BI, analytics,
information strategy and data then you‘re at the right place.
Our goal is to help companies take advantage of information-driven
management practices and applications. We offer education, consulting
and research services to support business and IT organizations as well as
technology vendors.
We fill the gap between what the industry analyst firms cover and what IT
needs. We specialize in product and technology analysis, so we look at
emerging technologies and markets, evaluating technology and hw it is
applied rather than vendor market positions.

Weitere ähnliche Inhalte

Andere mochten auch

Capítulo piloto de Desenfocados
Capítulo piloto de DesenfocadosCapítulo piloto de Desenfocados
Capítulo piloto de DesenfocadosRipi86
 
Newsletter SMS Varanasi 2014
Newsletter SMS Varanasi 2014Newsletter SMS Varanasi 2014
Newsletter SMS Varanasi 2014smsvaranasi
 
CIF - Classificação Internacional de Funcionalidade
CIF - Classificação Internacional de FuncionalidadeCIF - Classificação Internacional de Funcionalidade
CIF - Classificação Internacional de FuncionalidadeEduardo Santana Cordeiro
 
Desfaces educativos
Desfaces educativosDesfaces educativos
Desfaces educativosJesus Perez
 
PrimeVision E-mercials
PrimeVision E-mercialsPrimeVision E-mercials
PrimeVision E-mercialsAlex Coroneos
 
Conservación del sonido
Conservación del sonidoConservación del sonido
Conservación del sonidoJack Ahm Peace
 
Web 2.0 in und für Bibliotheken
Web 2.0 in und für BibliothekenWeb 2.0 in und für Bibliotheken
Web 2.0 in und für BibliothekenChristian Hauschke
 
Public Value Management, een nieuw sturingsparadigma?
Public Value Management, een nieuw sturingsparadigma?Public Value Management, een nieuw sturingsparadigma?
Public Value Management, een nieuw sturingsparadigma?Rob Janssens
 
AñO 3 Nº 12 Diciembre 1990
AñO 3 Nº 12 Diciembre 1990AñO 3 Nº 12 Diciembre 1990
AñO 3 Nº 12 Diciembre 1990maranchon
 
Todo acerca de tour & go
Todo acerca de tour  & goTodo acerca de tour  & go
Todo acerca de tour & godouglasmotilon
 
Rutas de iglesias singulares en la Comunidad de Madrid
Rutas de iglesias singulares en la Comunidad de MadridRutas de iglesias singulares en la Comunidad de Madrid
Rutas de iglesias singulares en la Comunidad de MadridLa Gatera de la Villa
 
Harbor Research - The Internet of Things Meets the Internet of People
Harbor Research - The Internet of Things Meets the Internet of PeopleHarbor Research - The Internet of Things Meets the Internet of People
Harbor Research - The Internet of Things Meets the Internet of PeopleHarbor Research
 
Democracia Sexual Por Eric Fassin
Democracia Sexual Por Eric FassinDemocracia Sexual Por Eric Fassin
Democracia Sexual Por Eric FassinEsteban Galvan
 
Bank of America Merill Lynch - Banking & Insurance CEO Conference, Christian ...
Bank of America Merill Lynch - Banking & Insurance CEO Conference, Christian ...Bank of America Merill Lynch - Banking & Insurance CEO Conference, Christian ...
Bank of America Merill Lynch - Banking & Insurance CEO Conference, Christian ...Nordea Bank
 
Historia de pokemon
Historia de pokemonHistoria de pokemon
Historia de pokemoncpfontes
 
Video Games And Operant Conditioning
Video Games And Operant ConditioningVideo Games And Operant Conditioning
Video Games And Operant ConditioningMarti McCaleb
 
Mitología griega
Mitología griegaMitología griega
Mitología griegafrancimanz
 
Alan chalmers ¿Qué es esa cosa llamada ciencia?
Alan chalmers ¿Qué es esa cosa llamada ciencia?Alan chalmers ¿Qué es esa cosa llamada ciencia?
Alan chalmers ¿Qué es esa cosa llamada ciencia?abelantonioo
 

Andere mochten auch (20)

Capítulo piloto de Desenfocados
Capítulo piloto de DesenfocadosCapítulo piloto de Desenfocados
Capítulo piloto de Desenfocados
 
Tips RGE
Tips RGETips RGE
Tips RGE
 
Newsletter SMS Varanasi 2014
Newsletter SMS Varanasi 2014Newsletter SMS Varanasi 2014
Newsletter SMS Varanasi 2014
 
CIF - Classificação Internacional de Funcionalidade
CIF - Classificação Internacional de FuncionalidadeCIF - Classificação Internacional de Funcionalidade
CIF - Classificação Internacional de Funcionalidade
 
Desfaces educativos
Desfaces educativosDesfaces educativos
Desfaces educativos
 
PrimeVision E-mercials
PrimeVision E-mercialsPrimeVision E-mercials
PrimeVision E-mercials
 
Conservación del sonido
Conservación del sonidoConservación del sonido
Conservación del sonido
 
Web 2.0 in und für Bibliotheken
Web 2.0 in und für BibliothekenWeb 2.0 in und für Bibliotheken
Web 2.0 in und für Bibliotheken
 
Public Value Management, een nieuw sturingsparadigma?
Public Value Management, een nieuw sturingsparadigma?Public Value Management, een nieuw sturingsparadigma?
Public Value Management, een nieuw sturingsparadigma?
 
AñO 3 Nº 12 Diciembre 1990
AñO 3 Nº 12 Diciembre 1990AñO 3 Nº 12 Diciembre 1990
AñO 3 Nº 12 Diciembre 1990
 
Todo acerca de tour & go
Todo acerca de tour  & goTodo acerca de tour  & go
Todo acerca de tour & go
 
Onlinekommunikation – K2-Tagungsbroschüre 16. Juni 2011
Onlinekommunikation – K2-Tagungsbroschüre 16. Juni 2011Onlinekommunikation – K2-Tagungsbroschüre 16. Juni 2011
Onlinekommunikation – K2-Tagungsbroschüre 16. Juni 2011
 
Rutas de iglesias singulares en la Comunidad de Madrid
Rutas de iglesias singulares en la Comunidad de MadridRutas de iglesias singulares en la Comunidad de Madrid
Rutas de iglesias singulares en la Comunidad de Madrid
 
Harbor Research - The Internet of Things Meets the Internet of People
Harbor Research - The Internet of Things Meets the Internet of PeopleHarbor Research - The Internet of Things Meets the Internet of People
Harbor Research - The Internet of Things Meets the Internet of People
 
Democracia Sexual Por Eric Fassin
Democracia Sexual Por Eric FassinDemocracia Sexual Por Eric Fassin
Democracia Sexual Por Eric Fassin
 
Bank of America Merill Lynch - Banking & Insurance CEO Conference, Christian ...
Bank of America Merill Lynch - Banking & Insurance CEO Conference, Christian ...Bank of America Merill Lynch - Banking & Insurance CEO Conference, Christian ...
Bank of America Merill Lynch - Banking & Insurance CEO Conference, Christian ...
 
Historia de pokemon
Historia de pokemonHistoria de pokemon
Historia de pokemon
 
Video Games And Operant Conditioning
Video Games And Operant ConditioningVideo Games And Operant Conditioning
Video Games And Operant Conditioning
 
Mitología griega
Mitología griegaMitología griega
Mitología griega
 
Alan chalmers ¿Qué es esa cosa llamada ciencia?
Alan chalmers ¿Qué es esa cosa llamada ciencia?Alan chalmers ¿Qué es esa cosa llamada ciencia?
Alan chalmers ¿Qué es esa cosa llamada ciencia?
 

Ähnlich wie Don't follow the followers

First, Firster, Firstest: Three lessons from history on information overload
First, Firster, Firstest: Three lessons from history on information overloadFirst, Firster, Firstest: Three lessons from history on information overload
First, Firster, Firstest: Three lessons from history on information overloadmark madsen
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogiesmark madsen
 
Data Integration Lecture
Data Integration LectureData Integration Lecture
Data Integration LectureSUNY Oneonta
 
VINT Symposium 2012: Recorded Future | David Weinberger
VINT Symposium 2012: Recorded Future | David WeinbergerVINT Symposium 2012: Recorded Future | David Weinberger
VINT Symposium 2012: Recorded Future | David WeinbergerVINTlabs | The Sogeti Trendlab
 
How Can You Write Essay. Online assignment writing service.
How Can You Write Essay. Online assignment writing service.How Can You Write Essay. Online assignment writing service.
How Can You Write Essay. Online assignment writing service.Kara Flores
 
Grab a bucket! It's raining data!
Grab a bucket! It's raining data!Grab a bucket! It's raining data!
Grab a bucket! It's raining data!Dorothea Salo
 
Week 8 Presentation Angela Wade
Week 8 Presentation Angela WadeWeek 8 Presentation Angela Wade
Week 8 Presentation Angela Wadeangewade
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Jabes 2009 - Conférence inaugurale "Un avenir sans livres pour les bibliothèq...
Jabes 2009 - Conférence inaugurale "Un avenir sans livres pour les bibliothèq...Jabes 2009 - Conférence inaugurale "Un avenir sans livres pour les bibliothèq...
Jabes 2009 - Conférence inaugurale "Un avenir sans livres pour les bibliothèq...ABES
 
The Object Orientation of Teams
The Object Orientation of TeamsThe Object Orientation of Teams
The Object Orientation of TeamsLisa Welchman
 
Wie Schreibe Ich Einen Wissenschaftlichen Essay
Wie Schreibe Ich Einen Wissenschaftlichen EssayWie Schreibe Ich Einen Wissenschaftlichen Essay
Wie Schreibe Ich Einen Wissenschaftlichen EssayTanya Collins
 
Derrick De K Brainframes Of Web 2.0
Derrick De K Brainframes Of Web 2.0Derrick De K Brainframes Of Web 2.0
Derrick De K Brainframes Of Web 2.0New Media Days
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadKelly Technologies
 
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...Duncan Hull
 
Data Harmonisation for Ethical Collaborative Research: The ResearchSpace Project
Data Harmonisation for Ethical Collaborative Research:The ResearchSpace ProjectData Harmonisation for Ethical Collaborative Research:The ResearchSpace Project
Data Harmonisation for Ethical Collaborative Research: The ResearchSpace ProjectDominic Oldman
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 

Ähnlich wie Don't follow the followers (20)

First, Firster, Firstest: Three lessons from history on information overload
First, Firster, Firstest: Three lessons from history on information overloadFirst, Firster, Firstest: Three lessons from history on information overload
First, Firster, Firstest: Three lessons from history on information overload
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogies
 
Nosql public
Nosql publicNosql public
Nosql public
 
Data Integration Lecture
Data Integration LectureData Integration Lecture
Data Integration Lecture
 
So what if it's a bubble?
So what if it's a bubble?So what if it's a bubble?
So what if it's a bubble?
 
VINT Symposium 2012: Recorded Future | David Weinberger
VINT Symposium 2012: Recorded Future | David WeinbergerVINT Symposium 2012: Recorded Future | David Weinberger
VINT Symposium 2012: Recorded Future | David Weinberger
 
How Can You Write Essay. Online assignment writing service.
How Can You Write Essay. Online assignment writing service.How Can You Write Essay. Online assignment writing service.
How Can You Write Essay. Online assignment writing service.
 
Grab a bucket! It's raining data!
Grab a bucket! It's raining data!Grab a bucket! It's raining data!
Grab a bucket! It's raining data!
 
Week 8 Presentation Angela Wade
Week 8 Presentation Angela WadeWeek 8 Presentation Angela Wade
Week 8 Presentation Angela Wade
 
When?
When?When?
When?
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Jabes 2009 - Conférence inaugurale "Un avenir sans livres pour les bibliothèq...
Jabes 2009 - Conférence inaugurale "Un avenir sans livres pour les bibliothèq...Jabes 2009 - Conférence inaugurale "Un avenir sans livres pour les bibliothèq...
Jabes 2009 - Conférence inaugurale "Un avenir sans livres pour les bibliothèq...
 
The Object Orientation of Teams
The Object Orientation of TeamsThe Object Orientation of Teams
The Object Orientation of Teams
 
Resource Description Pres and Paper
Resource Description Pres and PaperResource Description Pres and Paper
Resource Description Pres and Paper
 
Wie Schreibe Ich Einen Wissenschaftlichen Essay
Wie Schreibe Ich Einen Wissenschaftlichen EssayWie Schreibe Ich Einen Wissenschaftlichen Essay
Wie Schreibe Ich Einen Wissenschaftlichen Essay
 
Derrick De K Brainframes Of Web 2.0
Derrick De K Brainframes Of Web 2.0Derrick De K Brainframes Of Web 2.0
Derrick De K Brainframes Of Web 2.0
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
 
Data Harmonisation for Ethical Collaborative Research: The ResearchSpace Project
Data Harmonisation for Ethical Collaborative Research:The ResearchSpace ProjectData Harmonisation for Ethical Collaborative Research:The ResearchSpace Project
Data Harmonisation for Ethical Collaborative Research: The ResearchSpace Project
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 

Mehr von mark madsen

Data Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of PeopleData Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of Peoplemark madsen
 
Solve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for HumansSolve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for Humansmark madsen
 
The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Managementmark madsen
 
Operationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the EnterpriseOperationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the Enterprisemark madsen
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019mark madsen
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)mark madsen
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018mark madsen
 
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou RangeA Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Rangemark madsen
 
How to understand trends in the data & software market
How to understand trends in the data & software marketHow to understand trends in the data & software market
How to understand trends in the data & software marketmark madsen
 
Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...mark madsen
 
Assumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesAssumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesmark madsen
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehousemark madsen
 
A Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing CustomersA Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing Customersmark madsen
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?mark madsen
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionmark madsen
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 
Briefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analyticsBriefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analyticsmark madsen
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except usmark madsen
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)mark madsen
 
On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)mark madsen
 

Mehr von mark madsen (20)

Data Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of PeopleData Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of People
 
Solve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for HumansSolve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for Humans
 
The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Management
 
Operationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the EnterpriseOperationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the Enterprise
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018
 
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou RangeA Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
 
How to understand trends in the data & software market
How to understand trends in the data & software marketHow to understand trends in the data & software market
How to understand trends in the data & software market
 
Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...Pay no attention to the man behind the curtain - the unseen work behind data ...
Pay no attention to the man behind the curtain - the unseen work behind data ...
 
Assumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesAssumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slides
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehouse
 
A Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing CustomersA Pragmatic Approach to Analyzing Customers
A Pragmatic Approach to Analyzing Customers
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collection
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Briefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analyticsBriefing Room analyst comments - streaming analytics
Briefing Room analyst comments - streaming analytics
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except us
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)
 
On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)On the edge: analytics for the modern enterprise (analyst comments)
On the edge: analytics for the modern enterprise (analyst comments)
 

Kürzlich hochgeladen

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 

Kürzlich hochgeladen (20)

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

Don't follow the followers

  • 1. Copyright  Third  Nature,  Inc.   Following  Google   Or   Don’t  Follow  the  Followers,   Follow  the  Leaders   Or   The  problem  probably  isn’t  the   database,  the  problem  is   probably  you   May,  2014   Mark  Madsen   www.ThirdNature.net   @markmadsen  
  • 2. Copyright  Third  Nature,  Inc.    A  Quick  Intro   I  may  or  may  not  be   qualified  to  make  the   outlandish  statements  in   this  presentaDon.  I  have,   however,  made  almost   every  mistake  menDoned,   and  one  learns  by  making   mistakes.   You  might  say  that’s  my   singular  skill.  
  • 3. Copyright  Third  Nature,  Inc.   A  BRIEF  HISTORY  OF  DATA   STORAGE  AND  RETRIEVAL   History  isn’t  taught  in  most  university  science  curricula     (probably  because  it’s  a  rabbit  hole)    
  • 4. Copyright  Third  Nature,  Inc.   Databases:  the  problem  statements  over  Gme   “InformaDon  has  become  a  form  of  garbage,  not  only   incapable  of  answering  the  most  fundamental  human   quesDons  but  barely  useful  in  providing  coherent   direcDon  to  the  soluDon  of  even  mundane  problems.”  –     Neil  Postman,  1985   "We  have  reason  to  fear  that  the  mulDtude  of  books   which  grows  every  day  in  a  prodigious  fashion  will  make   the  following  centuries  fall  into  a  state  as  barbarous  as   that  of  the  centuries  that  followed  the  fall  of  the  Roman   Empire.”  –  Adrien  Baillet,  1685   ”…so  many  books  that  we  do  not  even  have  Dme  to  read   the  Dtles."  –  Anton  Francesco  Doni,  1550  
  • 5. Copyright  Third  Nature,  Inc.   The  origin  of  informaGon  management  problems   For  ~5000  years  we  used   counters  of  various  types,   eventually  developing   wriDng  to  cope  with   civilizaDon’s  needs.   WriDng  is  more  efficient   than  counters  you  can  lose.   Sumerian  bulla  envelope  with  tokens.   The  beta  period.  
  • 6. Copyright  Third  Nature,  Inc.   InformaGon  Technology  v1.0:  Clay  Tech,  ~3000  bce   The  first  informaGon  explosion  
  • 7. Copyright  Third  Nature,  Inc.   That  explosion  led  to  the  first  metadata   “tags” Small piles in baskets are easy to tag and search
  • 8. Copyright  Third  Nature,  Inc.   Metadata  v1.0:  tablets  about  tablets   When  there  are  enough  of   these  lying  around  you  need  to   work  on  organizaDon  of  the   collecDon  by  categorizaDons,   aka  “taxonomy”,  “schema”   Like  working  out  what  tables   are  in  a  database,  or  what  files   are  stored  in  HDFS.   Babylonian  library  catalog  ~2000bce  
  • 9. Copyright  Third  Nature,  Inc.   Metadata  v1.1:  tablets  about  what’s  in  tablets   When literacy rates are higher and people need to communicate more effectively, you need to invent mechanisms to cope, like dictionaries. Now we’re worried about what’s inside the documents, not where they are placed. Synonym list, Ashurbanipal, ~900 bce
  • 10. Copyright  Third  Nature,  Inc.   Clay  Tech  has  some  familiar  limitaGons  
  • 11. Copyright  Third  Nature,  Inc.   InformaGon  Management  v2.0  Paper  Tech*   Lighter,  denser,  faster  storage  media  
  • 12. Copyright  Third  Nature,  Inc.   More  informaGon  =  need  for  new  metadata   techniques:  content  tagging,  author  catalogs   The  first  real  library  ~300BC  
  • 13. Copyright  Third  Nature,  Inc.   Discovery  of  one  tradeoff  between  clay  and  paper…   Recorded  informaDon  creates  permanence  and  instability  
  • 14. Copyright  Third  Nature,  Inc.   You  can’t  have  disconGnuous  reading*  unGl  you   have  a  random  access  technology.   *Indexing  and  encyclopedias  are  hard  in  linear  scrolls.  Hello  ISAM  
  • 15. Copyright  Third  Nature,  Inc.   Paper  Tech  v2.1:  increased  storage  density,  smaller   form  factor,  durability,  high  res  RGB  graphics  
  • 16. Copyright  Third  Nature,  Inc.   Paper  Tech  v2.2   The  change  in  prinDng   over  Dme  accelerates.   Block  prinDng  replaced  by   movable  metal  type.   The  job  of  producDon  is   faster  and  cheaper.   CommodiDzaDon  changes   the  landscape  over  the   next  200  years.   The  printed  becomes   more  important  than  the   printer.  
  • 17. Copyright  Third  Nature,  Inc.   The  Elizabethan  Era   ProducDon:  prinDng  presses   Data  management  tech:   ▪  Perfect  copies   ▪  Topical  catalogs   ▪  Font  standardizaDon   ▪  Taxonomy  ascends   InformaDon  explosion:     ▪  8M  books  in  1500   ▪  200M  by  1600   ▪  CommodiDzaDon   ▪  Overload  
  • 18. Copyright  Third  Nature,  Inc.   Be^er  embedded  metadata:  Gtle  page,  colophon,  ToC  
  • 19. The  Georgian  Era:  The  Explosion  of  Natural  Philosophy   Sharing knowledge in a larger community required common language, structure
  • 20. Copyright  Third  Nature,  Inc.   Linnaeus   Top  down  orientaDon   StaDc  structure   DescripDve  rather  than   explanatory   Taxonomic  classificaHon  
  • 21. Copyright  Third  Nature,  Inc.   Bobom  up  orientaDon   Flexible  structure   Explanatory,  descripDve   Faceted  classificaHon   Buffon  
  • 22. Copyright  Third  Nature,  Inc.   vs   vs   SQL  NoSQL   Think about why that happened
  • 23. Copyright  Third  Nature,  Inc.   The  Victorian  Era   The  powered  prinDng   informaDon  explosion:   ▪  Card  catalogs,  cross-­‐ referencing,  random   access  metadata   ▪  Universal  classificaDon   ▪  Extended  informaDon   management  debates   ▪  Trading  effort  and   flexibility  for  storage   and  retrieval   ▪  Stereotyping  
  • 24. Copyright  Third  Nature,  Inc.   Melvil  Dewey   Dewey  Decimal  System   Top  down  orientaDon   StaDc  structure   DescripDve  rather  than   explanatory   Taxonomic  classificaHon  
  • 25. Copyright  Third  Nature,  Inc.   Cuber  Expansive   ClassificaDon  System   (~1882)   Bobom  up  orientaDon   More  flexible  structure   Explanatory,  descripDve   Faceted  classificaHon   Charles  Ammi  Cu^er  
  • 26. Copyright  Third  Nature,  Inc.   vs   SQL  NoSQL  
  • 27. Copyright  Third  Nature,  Inc.   So  why  did  Linnaeus  and  Dewey  win?   Good  enough   wins  the  day    PragmaJsm   It  wasn’t  solving   the  problem  you   thought  it  was.    
  • 28. Copyright  Third  Nature,  Inc.   Summarizing   Thousands  of  years  of  thought  have  been  put  into   principles  of  organizaDon  and  use.  The  abstract    paberns   are  the  same,  only  the  implementaDon  changed.   ▪  Clay:  tablets  about  tablets,  tablets  about  what’s  in  tablets,   100X  increase  in  data  density  over  counDng  tech   ▪  Scrolls:  scrolls  about  scrolls,  scrolls  about  what’s  in  scrolls,   prepended/appended  navigaDon,  >100X  increase  in  density   ▪  Books:  books  about  books,  books  about  what’s  in  books,   embedded  internal  navigaDon,  >1000X  increase  in  density   ▪  DigiDzed  data:  similar,  far  denser,  and  different  because  it   isn’t  locked  into  physical  forms  
  • 29. Copyright  Third  Nature,  Inc.   History  is  always  the  same   Every  technology  is  a  trade:   ▪  Top  down  vs.  bobom  up   ▪  Authority  vs.  anarchy   ▪  Bureaucracy  vs.  autonomy   ▪  Control  vs.  creaDvity   ▪  Hierarchy  vs.  network   ▪  Dynamic  vs.  staDc   ▪  Power  vs.  ease   ▪  Work  up  front  vs.  postponed   In every choice, something is lost and something is gained.
  • 30. Copyright  Third  Nature,  Inc.   What  lessons  does  this  history  teach  us?   1.  InformaDon  requires  organizing  principles.   2.  Differences  in  scale  require  different  principles.   3.  There  are  mulDple  levels  of  informaDon   architecture  and  principles  of  organizaDon.     4.  At  a  key  point  in  the  adopDon  cycle,  emphasis   shins  from  collecDon  and  management  of   informaDon  to  disseminaDon  and  use.    First  we  record,  then  we  use  and  share.   Like  transacDon  processing,  query  &  analysis.  
  • 31. Copyright  Third  Nature,  Inc.   InformaGon  management  through  human  history   always  follows  the  same  pa^ern   New  technology  development   creates   New  methods  to  cope   creates   New  informaGon  scale  and  availability   creates…  
  • 32. Copyright  Third  Nature,  Inc.   Big  Data  
  • 33. Copyright  Third  Nature,  Inc.   DEALING  WITH  BIG:   SOME  SCALING  HISTORY   "The  most  amazing  achievement  of  the  computer   sonware  industry  is  its  conDnuing  cancellaDon  of   the  steady  and  staggering  gains  made  by  the   computer  hardware  industry.“  -­‐  Henry  Peteroski  
  • 34. Copyright  Third  Nature,  Inc.   Why  doesn’t  your  database  scale?  
  • 35. Copyright  Third  Nature,  Inc.   Hipster  bullshit   I  can’t  get  MySQL  to  scale   therefore   RelaDonal  databases  don’t  scale   therefore   We  must  use  NoSQL*  for  query  too   *including  Hadoop  and  related  
  • 36. Copyright  Third  Nature,  Inc.   GeneraGon  and  collecGon  always  come  first   UnDl  there’s  enough  informaDon,  general  problems  of   organizaDon  and  informaDon  architecture  don’t  appear.  
  • 37. Copyright  Third  Nature,  Inc.   The  early  days:  client/server  as  the  starGng  point   We  had  transacDon  processing  against  the  DB  all  on   the  same  machine.  Then  on  two  separate  machines.   Application
  • 38. Copyright  Third  Nature,  Inc.   Scaling  client/server   We  added  app  servers  and  pooled  connecDons.   Client App serverClient Client
  • 39. Copyright  Third  Nature,  Inc.   Scaling  client/server   Then  threw  money  at  the  problem  in  the  form  of   hardware  (made  the  database  bigger).   Client App serverClient Client
  • 40. Copyright  Third  Nature,  Inc.   Web  apps  were  a  huge  increase  in  concurrency   Architecture  changed  to  reflect  new  stateless  model.   We  had  scalability  and  availability  problems.   webapp webapp Loadbalancer Loadbalancer service service service
  • 41. Copyright  Third  Nature,  Inc.   Increasing  traffic?   Keep  adding  hardware,  make  the  DB  bigger.   Limits  reached,  performance,  scalability  and   availability  problems.   webapp webapp Loadbalancer Loadbalancer service service service
  • 42. Copyright  Third  Nature,  Inc.   Increasing  traffic?   Read-­‐only  replicas  will  save  the  day!   SDll  have  scalability  and  availability  problems.   And  now  operaDonal  overhead  and  problems.   webapp webapp Loadbalancer Loadbalancer service service service
  • 43. Copyright  Third  Nature,  Inc.   Increasing  traffic?   Sharding  seems  a  fine  thing.  But  it’s  one  leber  from…   Scaling  and  perf  beber,  overhead  and  operaDonal   complexity  high  and  worsening.     webapp webapp Loadbalancer Loadbalancer service service service Loadbalancer
  • 44. Copyright  Third  Nature,  Inc.   Increasing  traffic?   Let’s  cache  data  at  the  service  Der!   Performance  beber,  overhead  and  operaDonal   complexity  higher.     webapp webapp Loadbalancer Loadbalancer service service service Loadbalancer cachecachecache
  • 45. Copyright  Third  Nature,  Inc.   What  are  the  problems  now?   1.  More  hardware,  more  things  to  break   2.  More  management  and  administraDon   3.  More  sonware  complexity   4.  Increasing  distance  for  data  to  travel  =  latency   5.  Data  administraDon  difficult  to  impossible  
  • 46. Copyright  Third  Nature,  Inc.   Problem  solved?   Distributed  NoSQL  DB  (handles  cache,  load  balance,   data  distribuDon).  Similar  performance,  simpler   scaling,  reduced  operaDonal  problems,  simpler   applicaDon  architecture.  Finished!     webapp webapp Loadbalancer Loadbalancer service service service
  • 47. Copyright  Third  Nature,  Inc.   TANSTAAFL   When  replacing  the  old   with  the  new  (or  ignoring   the  new  over  the  old)  you   always  make  tradeoffs,   and  usually  you  won’t  see   them  for  a  long  Dme.   Technologies  are  not   perfect  replacements  for   one  another.  Onen  not   beber,  only  different.  
  • 48. Copyright  Third  Nature,  Inc.   Not  finished:  remember  the  cycle  of  history…   The  biggest  hole  in  the  prior  secDon  on  scaling  is   that  we  scaled  OLTP,  what  about  OLAP?   Queries  <>  transacDons.  
  • 49. Copyright  Third  Nature,  Inc.   Solving  query  problems   Aggregate  or  low  selecDvity  queries  were  a  problem   early  on,  when  people  wanted  to  use  the  data.   Every  report  or  query  is  a  program.   OLTP Report app Report app
  • 50. Copyright  Third  Nature,  Inc.   Increasing  data  volume   Make  it  faster  by  throwing  money  at  hardware   (sound  familiar?)   OLTP Report app Report app
  • 51. Copyright  Third  Nature,  Inc.   Increasing  data  volume   Replicas:  split  the  workload  and  tune  the  systems   based  on  their  workload.   Report app OLTP Report app
  • 52. Copyright  Third  Nature,  Inc.   Increasing  data  volume  breaks  the  old  model   Devise  a  new  architecture.     ReschemaDze  the  database,  eliminate  cyclic  joins,   selecDve  denormalizaDon,  query  generators.  But  it   takes  bulk  processing  to  reschemaDze  the  data.   Query tools OLTP OLTP OLTP ETL
  • 53. Copyright  Third  Nature,  Inc.   Increasing  data  volume   Improve  response  Dme  with  caching  in  the  query   tools,  and  by  using  MOLAP  tools  that  map  into  cache   or  memory.  Like  mainframe  reporDng.  Or…   Query tools OLTP OLTP OLTP ETL cache Query tools cache
  • 54. Copyright  Third  Nature,  Inc.   Increasing  data  volume   Parallel  processing  for  ETL.  Distributed  query   databases  for  fine  grained  high  volume  parallelism.   ETLETLETL OLTP OLTP OLTP Query tools cache Query tools cache
  • 55. Copyright  Third  Nature,  Inc.   The  architecture  looks  familiar   Two  workloads,  two  not  dissimilar  architectures:   ▪  Load-­‐balanced  front  ends   ▪  Distributed  caching  layers   ▪  Scalable  distributed  parallel  databases   But  the  nature  of  the  OLTP  and  OLAP  workloads  is   very  different.  Forcing  them  into  one  plaworm  is   almost  impossible  for  data  architecture  reasons   and  parDcularly  at  scale*  
  • 56. Copyright  Third  Nature,  Inc.   DATA  PERSISTENCE  AND  STORES   Why  would  digital  data  be  any  different  than  clay  or   scrolls  or  books?  
  • 57. Copyright  Third  Nature,  Inc.   “Big  data  is  unprecedented.”   -­‐  Anyone  involved  with  big  data  in  even  the   most  barely  percepHble  way  
  • 58. Copyright  Third  Nature,  Inc.   There’s  a  difference   between  having  no  past   and  acDvely  rejecDng  it.  
  • 59. Copyright  Third  Nature,  Inc.   MultiValue, Hierarchical PICK IMS IDS ADABAS Relational CODASYL System R (SEQUEL) SQL/DS INGRES (QUEL) Mimer Oracle RDBMS, SQL standard DB2 Teradata Informix Sybase Postgres OODBMS, ORDBMS Versant Objectivity Gemstone Informix* Oracle* MPP Query, NoSQL Netezza Paraccel Vertica MongoDB CouchBase Riak Cassandra NewSQL SciDB MonetDB NuoDB CitusDB 1960s 1980s 2000s 1970s 1990s 2010s Spanner F1
  • 60. Copyright  Third  Nature,  Inc.   A  history  of  databases  in  No-­‐taGon   1970:  NoSQL  =  We  have  no  SQL     1980:  NoSQL  =  Know  SQL     2000:  NoSQL  =  No  SQL!     2005:  NoSQL  =  Not  only  SQL     2013:  NoSQL  =  No,  SQL!   (R)DB(MS)  
  • 61. Copyright  Third  Nature,  Inc.   Tradeoffs?   “Query  opHmizaHon   is  not  rocket  science.   When  you  flunk  out  of   query  opHmizaHon,   we  make  you  go  build   rockets.”  
  • 62. Copyright  Third  Nature,  Inc.   Joins.  Wait,  there’s  more  than  one?   Wait, there’s more than one?
  • 63. Copyright  Third  Nature,  Inc.   In  NoSQL  Land,  OpGmizer  is  You!  
  • 64. Copyright  Third  Nature,  Inc.   Services  provided   Standard  API/query  layer*   TransacDon  /  consistency   Query  opDmizaDon   Data  navigaDon,  joins   Data  access   Storage  management   Database Database Tradeoffs:  In  NoSQL,  the  DBMS  is  You  Too   SQL  database   NoSQL  database   Application Application Anything  not  done  by  the  DB  becomes  a  developer’s  task.  
  • 65. Copyright  Third  Nature,  Inc.   The  relaDonal  database  is  the  franchise  technology  for  storing  and   retrieving  data,  but…   1.  Global,  staDc  schema  model   2.  No  rich  typing  system   3.  No  management  of  natural  ordering  in  data   4.  Many  are  not  a  good  fit  for  network  parallel  compuDng,  aka  cloud   5.  Limited  API  in  atomic  SQL  statement  syntax    &  simple  result  set  return   6.  Poor  developer  support  (in  languages,  in  IDEs,  in  processes)   RelaGonal:  a  good  conceptual  model,  but  a   prematurely  standardized  implementaGon  
  • 66. Copyright  Third  Nature,  Inc.   The  relaDonal  database  is  the  franchise  technology  for  storing  and   retrieving  data,  but…   1.  Global,  staDc  schema  model   2.  No  rich  typing  system   3.  No  management  of  natural  ordering  in  data   4.  Many  are  not  a  good  fit  for  network  parallel  compuDng,  aka  cloud   5.  Limited  API  in  atomic  SQL  statement  syntax    &  simple  result  set  return   6.  Poor  developer  support   RelaGonal:  a  good  conceptual  model,  but  a   prematurely  standardized  implementaGon   What  I  did  not  list:   Scalability  and  performance  
  • 67. Copyright  Third  Nature,  Inc.   BIGNESS  AND  SCALABILITY  
  • 68. Copyright  Third  Nature,  Inc.   Technology  Capability  and  Data  Volume   Source: Noumenal, Inc.
  • 69. Copyright  Third  Nature,  Inc.   You  can  make  a  database  emulate  a  KVS   If  you  map  the  shared  event  fields  to  fixed  columns  and   an  event  type,  then  use  a  varchar  or  clob  payload   column,  you  can  store  arbitrary  events  in  a  database  and   query  them  via  SQL  and  views  (or  column  funcDons),  and   do  it  all  in  a  single  table.   Arbitrary data parsed at query time using native DB features like regex or UDF Data common to all events in the database Type code used to differentiate event payload formats.
  • 70. Data  Warehouse  +   Behavioral   Singularity Data  Warehouse   Low End Enterprise-class System Contextual-Complex Analytics Deep, Seasonal, Consumable Data Sets Production Data Warehousing Large Concurrent User-base + + 150+ concurrent users 500+ concurrent users Enterprise-class System 5-10 concurrent users Structure the Unstructured Detect Patterns Commodity Hardware System 6+PB 40+PB 20+PB Analyze  &  Report   Discover  &  Explore   Data  Plagorms   Thanks to eBay for these case slides.
  • 71. Start_dt   Guid   Sess_id   Page_id   Soj   2011-­‐10-­‐18   1234   1   15   Language=en&   source=hp&   itm=i1,i2,i3,i4,i5   SELECT start_dt, guid, sess_id, page_id, NVL(e.soj, ‘itm’) AS item_list FROM event e WHERE e.start_dt = ‘2011-10-18’ AND e.page_id = 3286 /* Search Results */ Start_dt   Guid   Sess_id   Page_id   Item_list   2011-­‐10-­‐18   1234   1   15   i1,i2,i3,i4,i5   How  to  use  a  database  for  semi-­‐structured  data   First the user-defined function (UDF) Thanks to eBay for these case slides.
  • 72. WITH event (start_dt, item_list) AS (<previous SQL>) SELECT start_dt, item_id, /* Individual Item */ count(*) FROM TABLE ( /* Normalize comma delimited list */ normalize_list( start_dt, item_list, ‘,’) RETURNS(start_dt, idx, item_id) ) GROUP BY 1, 2 ORDER BY 3 DESC Start_dt   Item_id   Count(*)   2011-­‐10-­‐18   i1   555   2011-­‐10-­‐18   i2   444   2011-­‐10-­‐18   i3   333   2011-­‐10-­‐18   i4   222   2011-­‐10-­‐18   i5   111   *syntax simplified How  to  use  a  database  for  semi-­‐structured  data   Then the table function (standard ANSI SQL) Thanks to eBay for these case slides.
  • 73. Copyright  Third  Nature,  Inc.   Pricing  and  performance:  Hadoop  is  a  storage   and  processing  play,  not    a  database  play*   Source: Venturebeat With  big  data  systems,   the  cost  of  storing  data  is   an  order  of  magnitude   lower  than  with   databases  today  (but  not   the  cost  or  ability  to   query  it  back  out).   Processing  data  at  scale   is  at  least  an  order  of   magnitude  cheaper  too.   Copyright  Third  Nature,  Inc.  
  • 74. Copyright  Third  Nature,  Inc.   Bigness:  most  people  do  not  need  special  technology  Numberofpeople The distribution of data size is about normal, yet these guys set the tone of the market today. Bigness of data Copyright  Third  Nature,  Inc.  
  • 75. Copyright  Third  Nature,  Inc.   AnalyGcs:  This  is  really  raw  data  under  storage  Numberofjobs Microsoft study of 174,000 analytic jobs in their cluster: median size ??? Bigness of data Copyright  Third  Nature,  Inc.  
  • 76. Copyright  Third  Nature,  Inc.   Working  data  for  analyGcs  most  olen  not  big  Numberofjobs 14 GB Smallness of data Copyright  Third  Nature,  Inc.  
  • 77. © Third Nature Inc.© Third Nature Inc. What  makes  data  “big”?   Very  large  amounts   Hierarchical  structures   Nested  structures   Encoded  values   Non-­‐standard  (for  a  database)   types   Time  series   Deep  structure   Human  authored  text   “big”  is  beMer  off  being  defined  as  “complex”  or  “hard  to  manage”   Copyright  Third  Nature,  Inc.  
  • 78. Copyright  Third  Nature,  Inc.   Web  tracking  data  has  a  nested  structure   USER_ID 301212631165031 SESSION_ID 590387153892659 VISIT_DATE 1/10/2010  0:00 SESSION_START_DATE 1:41:44  AM PAGE_VIEW_DATE 1/10/2010  9:59 DESTINATION_URL hbps://www.phisherking.com/gins/store/LogonForm? mmc=link-­‐src-­‐email-­‐_-­‐m100109-­‐_-­‐44IOJ1-­‐_-­‐ shop&langId=-­‐1&storeId=1055&URL=BECGinListItemDisplay REFERRAL_NAME Direct REFERRAL_URL -­‐ PAGE_ID PROD_24259_CARD REL_PRODUCTS PROD_24654_CARD,  PROD_3648_FLOWERS SITE_LOCATION_NAME VALENTINE'S  DAY  MICROSITE SITE_LOCATION_ID SHOP-­‐BY-­‐HOLIDAY  VALENTINES  DAY IP_ADDRESS 67.189.110.179 BROWSER_OS_NAME MOZILLA/4.0  (COMPATIBLE;  MSIE  7.0;  AOL  9.0;  WINDOWS  NT   5.1;  TRIDENT/4.0;  GTB6;  .NET  CLR  1.1.4322) “unstructured” data embedded in the logged message: complex strings …but it’s easily unpacked into tuples
  • 79. © Third Nature Inc.© Third Nature Inc. Common  Names   ▪  Aspirin,  Acetylsalicylic  acid,  Excedrin   Structural  formulas   ▪  Commonly  used  to  communicate  between  chemists   SystemaDc  nomenclatures:   ▪  Mass  formula:  C9H8O4   ▪  SMILES:  OC(=O)C1=C(C=CC=C1)OC(=O)C   ▪  InChI:  1/C9H8O4/c1-­‐6(10)13-­‐8-­‐5-­‐3-­‐2-­‐4-­‐7(8)9(11)12/h2-­‐5H,1H3,(H, 11,12)   ▪  IUPAC:  pyrido[1",2":1',2']imidazo[4',5':5,6]pyrazino[2,3-­‐b]phenazine   They  all  refer  to  the  same  thing.   If  you  process  them  in  a  database,  how  do  you  store  them?   All  of  these  things  are  “unstructured”  data  
  • 80. Copyright  Third  Nature,  Inc.   Unstructured  data  isn’t   really  unstructured.   The  problem  is  that  this   data  is  unmodeled.   The  real  challenge  is   complexity.   We  usually  mean  text,  not  data  structures…  
  • 81. Copyright  Third  Nature,  Inc.   Pa^erns  emerge  from  lots  of  event  data   Paberns  emerge  from   the  underlying  structure   of  the  enHre  dataset.   The  paberns  are  more   interesDng  than  sums   and  counts  of  the  events.   Web  paths:  clicks  in  a   session  as  network  node   traversal.   Email:  traffic  analysis   producing  a  network   The event stream is a source for analysis, generating another set of data that is the source for different analysis.
  • 82. Copyright  Third  Nature,  Inc.   BIGNESS  AND  DATA   COMPUTATIONAL  WORKLOADS  
  • 83. Copyright  Third  Nature,  Inc.   Not  finished:  remember  the  cycle  of  history…   The  biggest  hole  in  the  prior  secDons  is  that  we   scaled  OLTP  and  OLAP  but  what  about  analyDcs?   Queries  <>  transacDons  <>  computaDons  
  • 84. Copyright  Third  Nature,  Inc.   The  three  way  workload  break   1.  OperaDonal:  OLTP  systems   2.  AnalyDc:  OLAP  systems   3.  ScienDfic:  ComputaDonal  systems   Unit  of  focus:   1.  TransacDon   2.  Query   3.  ComputaDon   Different  problems  require  different  plaworms  
  • 85. Copyright  Third  Nature,  Inc.   The  holy  grail  of  databases  under  current  market  hype   We’re  talking  mostly  about   computaGon  over  data  when   we  talk  about  “big  data”  and   analyGcs.   The  goal  is  combining  data   storage,  retrieval  and  analysis   into  one  system,  a  potenJal   mismatch  for  both  relaJonal   and  nosql. Copyright  Third  Nature,  Inc.  
  • 86. Copyright  Third  Nature,  Inc.   A  Simple  Division  of  the  AnalyGc  Problem  Space  Computation LittleLots Data volume Little Lots Big analytics, little data Specialized computing, modeling problems: supercomputing, GPUs Big analytics, big data Complex math over large data volumes requires non- relational shared nothing architectures Little analytics, little data The entry point; SAS, SMP databases, even OLAP cubes can work Little analytics, big data The BI/DW space, for the most part, done in databases mostly
  • 87. Copyright  Third  Nature,  Inc.   No  technical  soluGon  fits  all  three  axes   Big   ComputaGon   Parallel  DSLs,   HPC,  GPU,   MapReduce   Concurrency   Parallel  DBs,   nosql  stores*   Volume   Parallel  DBs,   MapReduce,   BSP     Some work along two axes, but most have a primary focus on one core component
  • 88. Copyright  Third  Nature,  Inc.   Data  infrastructure  is  a  plagorm   ▪  Any  data  –  structures,  forms   ▪  Any  latency  –in  moDon,  at  rest   ▪  Any  process  –  query,  algorithm,  engine,   transformaDon   ▪  Any  access  –  SQL,  API,  queue,  file  movement  
  • 89. Copyright  Third  Nature,  Inc.   Hadoop  &  NoSQL  AdopGon   Some  people  can’t  resist   ge„ng  the  next  new  thing   because  it’s  new.   Many  organizaDons  are  like   this,  promoDng  a  soluDon  and   hunDng  for  the  problem  that   matches  it.   Beber  to  ask  “What  is  the   problem  for  which  this   technology  is  the  answer?”  
  • 90. Copyright  Third  Nature,  Inc.   NoSQL  Will  “Fail”   Unless  some  mathemaDcally  derived  data  model  is   developed,  and  a  query  language  using  it  is  created.   Otherwise  there’s  no  standard  interfacing  model,   no  interoperability,  no  chance  for  a  tool  ecosystem   to  evolve  over  top  of  the  plaworm  –  because  there   are  no  uniform  plaworm  boundaries.   “One  logical  interface,  many  physical   implementaDons”  is  a  key  reason  why  SQL  won  the   database  wars.  This  creates  an  ecosystem.  
  • 91. Copyright  Third  Nature,  Inc.   Maybe…   These aren’t the databases we’re looking for. We need one that speaks pig latin.
  • 92. Copyright  Third  Nature,  Inc.   What’s  wrong  with  BASE?  
  • 93. Copyright  Third  Nature,  Inc.   Google  F1:  Another  EvoluGon   Distributed  SQL  database   ACID  compliance,  2PC  and  row-­‐ level  locking  (!)   Transparent  data  distribuDon   Synchronous  replicaDon  across   data  centers   Table  interleaving  (hierarchies)   Queryable  protobufs   MapReduce  access  to  underlying   data   Average  user-­‐facing  latency  of   ~200ms  with  small  deviaDon  
  • 94. Copyright  Third  Nature,  Inc.   The  big  data  revoluGon,  more  of  an  evoluGon   Be  pragmaGc,  not  dogmaGc  
  • 95. Copyright  Third  Nature,  Inc.   Conclusion  
  • 96. Copyright  Third  Nature,  Inc.   References  (things  worth  reading  on  the  way  home)  A  relaDonal  model  for  large  shared  data  banks,  CommunicaDons  of  the  ACM,  June,  1970,   hbp://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf   Column-­‐Oriented  Database  Systems,  Stavros  Harizopoulos,  Daniel  Abadi,  Peter  Boncz,  VLDB  2009  Tutorial   hbp://cs-­‐www.cs.yale.edu/homes/dna/talks/Column_Store_Tutorial_VLDB09.pdf   Nobody  ever  got  fired  for  using  Hadoop  on  a  cluster,  1st  InternaDonal  Workshop  on  Hot  Topics  in  Cloud  Data   ProcessingApril  10,  2012,  Bern,  Switzerland.   A  co-­‐RelaDonal  Model  of  Data  for  Large  Shared  Data  Banks,  ACM  Queue,  2012,   hbp://queue.acm.org/detail.cfm?id=1961297   A  query  language  for  mulDdimensional  arrays:  design,  implementaDon  and  opDmizaDon  techniques,  SIGMOD,   1996   ProbabilisDcally  Bounded  Staleness  for  PracDcal  ParDal  Quorums,  Proceedings  of  the  VLDB  Endowment,  Vol.  5,   No.  8,  hbp://vldb.org/pvldb/vol5/p776_peterbailis_vldb2012.pdf   “Amorphous  Data-­‐parallelism  in  Irregular  Algorithms”,  Keshav  Pingali  et  al   MapReduce:  Simplified  Data  Processing  on  Large  Clusters,   hbp://staDc.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/ mapreduce-­‐osdi04.pdf   Dremel:  InteracDve  Analysis  of  Web-­‐Scale  Datasets,  Proceedings  of  the  VLDB  Endowment,  Vol.  3,  No.  1,  2010   hbp://staDc.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/ 36632.pdf   Spanner:  Google’s  Globally-­‐Distributed  Database,  SIGMOD,  May,  2012,   hbp://staDc.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/es//archive/ spanner-­‐osdi2012.pdf   F1:  A  Distributed  SQL  Database  That  Scales,  Proceedings  of  the  VLDB  Endowment,  Vol.  6,  No.  11,  2013,   hbp://staDc.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/ archive/41344.pdf  
  • 97. Copyright  Third  Nature,  Inc.   CC  Image  A^ribuGons   Thanks  to  the  people  who  supplied  the  creaDve  commons  licensed  images  used  in  this  presentaDon:   shady_puppy_sales.jpg  -­‐  hbp://www.flickr.com/photos/brizzlebornandbred/5001120150   cuneiform_proto_3000bc.jpg  -­‐  hbp://www.flickr.com/photos/takomabibelot/3124619443/   cuneiform_undo.jpg  -­‐  hbp://www.flickr.com/photos/charlesDlford/2552654321/   scroll_kerouac.jpg  -­‐  hbp://www.flickr.com/photos/ari/93966538/   House  on  fire  -­‐  hbp://flickr.com/photos/oldonliner/1485881035/   Manuscripts  on  shelf  -­‐  hbp://flickr.com/photos/peterkaminski/1688635/   manuscript_illum.jpg  -­‐  hbp://www.flickr.com/photos/diorama_sky/2975796332/   manuscript_page.jpg  -­‐  hbp://www.flickr.com/photos/calliope/306564541/   subway  dc  metro    -­‐  hbp://flickr.com/photos/musaeum/509899161/  
  • 98. Copyright  Third  Nature,  Inc.   About  the  Presenter   Mark  Madsen  is  president  of  Third   Nature,  a  research  and  consulDng  firm   focused  on  building  the  infrastructure  for   analyDcs,  evidence-­‐based  management,   business  intelligence  and  data   management.  Mark  is  an  award-­‐winning   author,  architect  and  CTO  whose  work   has  been  featured  in  numerous  industry   publicaDons.  Over  the  past  ten  years   Mark  received  awards  for  his  work  from   the  American  ProducDvity  &  Quality   Center,  TDWI,  and  the  Smithsonian   InsDtute.  He  is  an  internaDonal  speaker,  a   contributor  at  Forbes  Online  and   InformaDon  Management.  For  more   informaDon  or  to  contact  Mark,  follow   @markmadsen  on  Twiber  or  visit    hbp:// ThirdNature.net    
  • 99. Copyright  Third  Nature,  Inc.   About  Third  Nature   Third Nature is a research and consulting firm focused on new and emerging technology and practices in business intelligence, analytics and performance management. If your question is related to BI, analytics, information strategy and data then you‘re at the right place. Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and hw it is applied rather than vendor market positions.