NoSQL & BigData
Why Every NoSQL Deployment Should be Paired with Hadoop

Tugdual Grall
Couchbase
@tgrall
About Me
• Tugdual “Tug” Grall
  - Couchbase: Technical Evangelist
  - eXo: CTO
  - Oracle: Developer/Product Manager, mainly Java/SOA
  - Developer in consulting firms
• Web
  - @tgrall
  - http://blog.grallandco.com
  - tgrall
  - NantesJUG co-founder
  - Pet project: http://www.resultri.com
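Big Data
High Data Variety and Velocity
[Chart: data volume in trillions of gigabytes (zettabytes), y-axis from 0 to 2.00, for the years 2000, 2006, and 2011. Series: Structured Data; Unstructured and Semi-Structured Data (text, log files, click streams, blogs, tweets, audio, video, etc.).]
Source: IDC 2011 Digital Universe Study (http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm)
More Flexible Data Model Required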
• Usually, when people talk about Big Data, they mean capturing huge amounts of data and analyzing it. That sense of Big Data is certainly a big trend.
• But Big Data also affects operational databases in a big way, for a different set of reasons.
• Two aspects of Big Data are pushing people toward NoSQL technologies.
• The first is that the vast majority of the increase in data is unstructured or semi-structured: user-generated content such as consumer recommendations, and machine-generated data such as log files and website click data. Relational databases aren’t well suited to storing this type of data, while NoSQL technologies such as document-oriented databases are well suited to it.
• The second is that application developers are finding new types of data they want to store all the time. It might be new information they want to store in a user’s account profile, new logging information, etc. The point is that what developers want to store is changing very rapidly, and the amount of data they want to store is increasing very rapidly. The result is that developers want a very flexible data model that they can evolve quickly.
• Relational databases have fixed schemas that often take weeks or months to change. NoSQL databases, on the other hand, are schema-less. As a result, you can far more easily add new types of data and iterate quickly on your application.
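The schema-flexibility point in these notes can be illustrated with a small sketch. It is not Couchbase code: a plain Python dict stands in for a document store, and the document IDs and field names (`user::1001`, `last_login_device`) are made up for the example.

```python
import json

# A plain dict stands in for a document database; keys are hypothetical document IDs.
bucket = {}

# Version 1 of the application stores only a name and an email.
bucket["user::1001"] = json.dumps({"name": "Ada", "email": "ada@example.com"})

# Version 2 starts recording a new attribute. No schema change or migration
# is needed: the new document simply carries the extra field.
bucket["user::1002"] = json.dumps(
    {"name": "Linus", "email": "linus@example.com", "last_login_device": "tablet"}
)

def last_device(doc_id):
    """Read a field that only newer documents have, with a default for older ones."""
    doc = json.loads(bucket[doc_id])
    return doc.get("last_login_device", "unknown")

old_device = last_device("user::1001")
new_device = last_device("user::1002")
```

The trade-off is that the application, rather than the database, has to cope with documents that lack the newer field, which is what the default in `last_device` does.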
Operational vs. Analytic Databases
• Analytic databases: get insights from data (Cloudera, Hortonworks)
• Real-time, interactive databases (NoSQL): fast access to data (Couchbase, Mongo)

• There are two types of databases. Each is focused on a very different problem.
• Analytic databases were referred to in the past as OLAP databases. They are focused on looking through every record in a huge database to answer a question or gain an insight about the data it contains. These analyses are batch processes that access every piece of data in the database, are very read-heavy, and produce results in seconds, minutes, or sometimes days. For analytic databases, “real time” means an analysis takes a few seconds to run.
• Real-time, interactive databases are often referred to as operational databases. They store a lot of data, but usually much less than an analytic database.
• They must provide access to individual records in milliseconds so that users of an application get good response times.
• Since the requirements of each type of database are very different, their architectures and capabilities are very different as well.
• When I refer to NoSQL in this presentation, I am referring to real-time, interactive databases. This is the type of NoSQL database Couchbase provides.
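The difference between the two access patterns described in these notes can be sketched in a few lines. This is an illustration only: a Python dict stands in for the data store, and the order records and function names are hypothetical.

```python
# Hypothetical order records; a dict keyed by ID stands in for the data store.
orders = {
    "order::1": {"customer": "alice", "total": 30.0},
    "order::2": {"customer": "bob", "total": 12.5},
    "order::3": {"customer": "alice", "total": 7.5},
}

def get_order(order_id):
    """Operational access: fetch one record by key."""
    return orders[order_id]

def revenue_per_customer():
    """Analytic access: scan every record to answer a question about all of them."""
    totals = {}
    for order in orders.values():
        totals[order["customer"]] = totals.get(order["customer"], 0.0) + order["total"]
    return totals

one_order = get_order("order::2")
report = revenue_per_customer()
```

The key lookup touches one record no matter how large the data set grows, while the report has to read all of it, which is why the two workloads tend to end up on different systems.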
[Chart: survey responses]
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• Performance challenges: 29%
• Cost: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase Survey, December 2011, n = 1351.
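NoSQL catalog
[Table: products grouped by data model. The slide also places them on a second axis, Cache (memory only) vs. Database (memory/disk).]
• Key-Value: Memcached, Coherence, Membase
• Data Structure: Redis
• Document: Couchbase, MongoDB
• Column: Cassandra, HBase
• Graph: Neo4j, InfiniteGraph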
Use Cases
Key-Value
• Session Management
• User Profile/Preferences
• Shopping Cart

Document
• Event Logging
• Content Management
• Web Analytics
• E-Commerce Application

Column
• Event Logging
• Content Management
• Counters

Graph
• Connected Data / Social Networks
• Routing, Dispatch
• Recommendations based on Social Graph
Hadoop
What is Hadoop?
• Highly scalable
• Unstructured data
• Open source
• Big Data Operating System
• Changing the World One Petabyte at a Time

What is Hadoop? (diagram sequence)
• Simplest unit of compute and storage [diagram labels: Disks, CPU, Application, Data]
• And when it grows? [diagram labels: Application, Data]
• And when it grows more?
• NoSQL to the rescue [diagram labels: Application, Data]
• Hadoop is a different paradigm [diagram labels: Application, Data]
Hadoop is not a “NoSQL database” but rather a set of tools for working with Big Data:
the ultimate Swiss Army knife for dealing with VERY, VERY large volumes of data.

Oozie: Workflow, coordination
Sqoop: Data connector to import/export data
Hive: SQL-like interface
Pig: High-level programming language
Mahout: Machine learning library
Whirr: Hadoop management tools for cloud services
Flume: Log aggregator
MapReduce: Framework to process large volumes of data
HBase: Key-value data store
ZooKeeper: Centralized configuration management
HDFS: Distributed file system
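To make the MapReduce entry in this list more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using the classic word-count example. It does not use the Hadoop API; the function names and sample input are assumptions made for illustration, and a real job would run the map and reduce steps in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

lines = ["big data big clusters", "data moves to hadoop"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each call to `map_phase` only sees its own line and each call to `reduce_phase` only sees one key, both steps can be spread over many machines without coordination between them.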
Hadoop and NoSQL
Ad and offer targeting
40 milliseconds to respond with the decision.
[Diagram: numbered data flows]
1. events
2. profiles, campaigns
3. profiles, real-time campaign statistics
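Moving Parts
[Diagram: Ad Targeting Platform, Couchbase Server Cluster, and Hadoop Cluster. Logs reach the Hadoop Cluster through a Flume flow; “sqoop import” and “sqoop export” connect the Couchbase Server Cluster and the Hadoop Cluster.]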
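The three numbered flows of the ad-targeting slides (events, computed profiles, lookup at decision time) can be sketched end to end in one process. This is an illustration under stated assumptions, not the presenter's implementation: a batch function stands in for the Hadoop job, a dict stands in for the Couchbase cluster, and the event fields and ad names are invented.

```python
from collections import Counter, defaultdict

# Step 1: raw click events, as they might be collected from log files.
# The field names and values are made up for this illustration.
events = [
    {"user": "u1", "category": "sports"},
    {"user": "u1", "category": "sports"},
    {"user": "u1", "category": "travel"},
    {"user": "u2", "category": "music"},
]

def build_profiles(event_log):
    """Batch step (the analytic side): reduce all events to one profile per user."""
    per_user = defaultdict(Counter)
    for event in event_log:
        per_user[event["user"]][event["category"]] += 1
    return {
        user: {"top_category": counts.most_common(1)[0][0]}
        for user, counts in per_user.items()
    }

# Step 2: load the computed profiles into the operational store
# (a dict stands in for the key-value database).
profile_store = build_profiles(events)

def choose_ad(user_id):
    """Step 3: a single key lookup at request time decides which ad to serve."""
    profile = profile_store.get(user_id, {"top_category": "general"})
    return "ad-for-" + profile["top_category"]
```

The expensive work happens in the batch step, so the request-time path is only a key lookup, which is what makes a response budget in the tens of milliseconds plausible.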
Content & Recommendation Targeting
[Diagram: Content Oriented Site and Legacy Relational Database, with numbered data flows]
1. events
2. user profiles
3. make recommendations
Moving Parts
In order to keep up with changing needs for richer, more targeted content that is delivered to larger and larger audiences very quickly, the data behind content-driven sites is shifting to Couchbase.

Hadoop excels at complex analytics, which may involve multiple processing steps that incorporate a number of different data sources.

[Diagram: Content Driven Web Site, Original RDBMS, Couchbase Server Cluster, and Hadoop Cluster. Logs reach the Hadoop Cluster through a Flume flow; “sqoop import” and “sqoop export” links connect the Hadoop Cluster to the data stores.]
Sqoop: What is this?
What is Sqoop?
Sqoop is a tool designed to transfer data between Hadoop and relational databases.
You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

sqoop.apache.org
What is Sqoop? (diagram sequence)
• Traditional ETL [diagram labels: Data, T, Application, Data]
• A different paradigm [diagram labels: Application, Data, Data]
• A very scalable different paradigm [diagram labels: Application and Data repeated three times, plus Data]
• Where did the Transform go? [diagram labels: T repeated across the nodes, Application, Data]
What is Sqoop?
• Sqoop: “SQL-Hadoop”
  - Default connection is via JDBC
• Lots of custom connectors
  - Couchbase, VoltDB, Vertica
  - Teradata, Netezza
  - Oracle, MySQL, Postgres
Sqoop: Import

sqoop import --connect jdbc:mysql://rdbms1.demo.com/CRM \
  --table customers
Sqoop: Export

sqoop export --connect jdbc:mysql://rdbms1.demo.com/ANALYTICS \
  --table sales \
  --export-dir /user/hive/warehouse/zip_profits \
  --input-fields-terminated-by '\0001'
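The field terminator in the export command refers to Hive's default field delimiter, the Ctrl-A character (octal 001). As a small illustration of what a row in such a file looks like to a program, the sketch below splits one line on that character. The row contents and the column meanings (a zip code and a profit figure) are assumptions based only on the directory name in the command.

```python
# Hive's default field delimiter is the Ctrl-A character (octal 001, hex 0x01).
FIELD_DELIMITER = "\x01"

# A made-up row as it might appear in a Hive warehouse file:
# a zip code and a profit figure (the column meanings are assumptions).
raw_row = "94043" + FIELD_DELIMITER + "1520.75"

def parse_row(line):
    """Split one delimited line into its two fields."""
    zip_code, profit = line.split(FIELD_DELIMITER)
    return zip_code, float(profit)

parsed = parse_row(raw_row)
```

Since the delimiter is a non-printing character, the file looks like run-together text in an editor, which is why the export has to be told explicitly how the fields are separated.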
Sqoop: Import

sqoop import --connect http://localhost:8091/pools \
  --table DUMP
Sqoop: Import
[Diagram: the Sqoop Client reads Metadata and launches a MapReduce Job; each of its Map tasks writes to HDFS.]
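The diagram shows an import being carried out by several map tasks in parallel. One way such work can be divided, sketched here as an assumption rather than as Sqoop's actual code, is to cut a numeric key range into contiguous slices, one per map task. The function name and the sample range are hypothetical.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive range [min_id, max_id] into contiguous chunks,
    one per map task, so each task reads a separate slice of the data."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        # The first `extra` slices get one additional row each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than rows: skip empty slices
        ranges.append((start, start + size - 1))
        start += size
    return ranges

slices = split_ranges(1, 10, 3)
```

Each slice can then be read and written independently, so adding map tasks shortens the transfer as long as the source can serve the parallel reads.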
Sqoop: Export

sqoop export --connect http://localhost:8091/pools \
  --table DUMP \
  --export-dir /user/hive/profiles/recommendation \
  --username social
Sqoop: Export
[Diagram: the Sqoop Client reads Metadata and launches a MapReduce Job; each of its Map tasks reads from HDFS.]
Demonstration

Q&A

More Related Content

What's hot

Denodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and RoadmapDenodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and RoadmapDenodo
 
Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Aravindharamanan S
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeeling Cheung
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...Denodo
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019DataKitchen
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data LakeCaserta
 
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesData Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesDenodo
 
Accelerating Time to Research Using CloudBank
Accelerating Time to Research Using CloudBankAccelerating Time to Research Using CloudBank
Accelerating Time to Research Using CloudBankSanjay Padhi, Ph.D
 
Decision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDecision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDLT Solutions
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance 5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance Qubole
 
The Rise of Logical Data Architecture - Breaking the Data Gravity Notion (Mid...
The Rise of Logical Data Architecture - Breaking the Data Gravity Notion (Mid...The Rise of Logical Data Architecture - Breaking the Data Gravity Notion (Mid...
The Rise of Logical Data Architecture - Breaking the Data Gravity Notion (Mid...Denodo
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overviewNitesh Ghosh
 
Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Denodo
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle TechnologiesOleksii Movchaniuk
 

What's hot (20)

Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and RoadmapDenodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap
 
Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
Analyst Keynote: Delivering Faster Insights with a Logical Data Fabric in a H...
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data Lake
 
Destroying Data Silos
Destroying Data SilosDestroying Data Silos
Destroying Data Silos
 
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data LakesData Ninja Webinar Series: Realizing the Promise of Data Lakes
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
 
Taming Big Data With Modern Software Architecture
Taming Big Data  With Modern Software ArchitectureTaming Big Data  With Modern Software Architecture
Taming Big Data With Modern Software Architecture
 
Accelerating Time to Research Using CloudBank
Accelerating Time to Research Using CloudBankAccelerating Time to Research Using CloudBank
Accelerating Time to Research Using CloudBank
 
Decision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDecision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great Data
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance 5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance
 
The Rise of Logical Data Architecture - Breaking the Data Gravity Notion (Mid...
The Rise of Logical Data Architecture - Breaking the Data Gravity Notion (Mid...The Rise of Logical Data Architecture - Breaking the Data Gravity Notion (Mid...
The Rise of Logical Data Architecture - Breaking the Data Gravity Notion (Mid...
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
 
Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0Technical Demonstration - Denodo Platform 7.0
Technical Demonstration - Denodo Platform 7.0
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle Technologies
 

Viewers also liked

Christmas CTF 보안대회 수상팀 문제풀이서(팀명:구운순살치즈치킨)
Christmas CTF 보안대회 수상팀 문제풀이서(팀명:구운순살치즈치킨)Christmas CTF 보안대회 수상팀 문제풀이서(팀명:구운순살치즈치킨)
Christmas CTF 보안대회 수상팀 문제풀이서(팀명:구운순살치즈치킨)NAVER D2
 
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - HTML, Android Animation
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - HTML, Android Animation[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - HTML, Android Animation
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - HTML, Android AnimationNAVER D2
 
2015 한양대학교 프로그래밍 경시대회 - beginner division
2015 한양대학교 프로그래밍 경시대회 - beginner division2015 한양대학교 프로그래밍 경시대회 - beginner division
2015 한양대학교 프로그래밍 경시대회 - beginner divisionNAVER D2
 
2015 한양대학교 프로그래밍 경시대회 - advanced division
2015 한양대학교 프로그래밍 경시대회 - advanced division2015 한양대학교 프로그래밍 경시대회 - advanced division
2015 한양대학교 프로그래밍 경시대회 - advanced divisionNAVER D2
 
한양대학교 ALOHA - 봄내전대회_C언어반
 한양대학교 ALOHA - 봄내전대회_C언어반 한양대학교 ALOHA - 봄내전대회_C언어반
한양대학교 ALOHA - 봄내전대회_C언어반NAVER D2
 
한양대학교 ALOHA - 봄내전대회_알고리즘반
한양대학교 ALOHA - 봄내전대회_알고리즘반한양대학교 ALOHA - 봄내전대회_알고리즘반
한양대학교 ALOHA - 봄내전대회_알고리즘반NAVER D2
 
[D2CAMPUS] Algorithm tips - ALGOS
[D2CAMPUS] Algorithm tips - ALGOS[D2CAMPUS] Algorithm tips - ALGOS
[D2CAMPUS] Algorithm tips - ALGOSNAVER D2
 
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - java OOM, Reference API
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - java OOM, Reference API[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - java OOM, Reference API
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - java OOM, Reference APINAVER D2
 
Memcached의 확장성 개선
Memcached의 확장성 개선Memcached의 확장성 개선
Memcached의 확장성 개선NAVER D2
 
[D2 CAMPUS] 분야별 모임 '보안' 발표자료
[D2 CAMPUS] 분야별 모임 '보안' 발표자료[D2 CAMPUS] 분야별 모임 '보안' 발표자료
[D2 CAMPUS] 분야별 모임 '보안' 발표자료NAVER D2
 
swig를 이용한 C++ 랩핑
swig를 이용한 C++ 랩핑swig를 이용한 C++ 랩핑
swig를 이용한 C++ 랩핑NAVER D2
 
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - OkHttp
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - OkHttp[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - OkHttp
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - OkHttpNAVER D2
 
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - Http Request
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - Http Request[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - Http Request
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - Http RequestNAVER D2
 
Django에서 websocket을 사용하는 방법
Django에서 websocket을 사용하는 방법Django에서 websocket을 사용하는 방법
Django에서 websocket을 사용하는 방법NAVER D2
 
[D2 CAMPUS] 2016 한양대학교 프로그래밍 경시대회 문제
[D2 CAMPUS] 2016 한양대학교 프로그래밍 경시대회 문제[D2 CAMPUS] 2016 한양대학교 프로그래밍 경시대회 문제
[D2 CAMPUS] 2016 한양대학교 프로그래밍 경시대회 문제NAVER D2
 
[D2 CAMPUS] 숭실대 SCCC 프로그래밍 경시대회 문제
[D2 CAMPUS] 숭실대 SCCC 프로그래밍 경시대회 문제[D2 CAMPUS] 숭실대 SCCC 프로그래밍 경시대회 문제
[D2 CAMPUS] 숭실대 SCCC 프로그래밍 경시대회 문제NAVER D2
 
파이어베이스 네이버 밋업발표
파이어베이스 네이버 밋업발표파이어베이스 네이버 밋업발표
파이어베이스 네이버 밋업발표NAVER D2
 
[D2 CAMPUS] 부산대 Alcall 프로그래밍 경시대회 문제
[D2 CAMPUS] 부산대 Alcall 프로그래밍 경시대회 문제[D2 CAMPUS] 부산대 Alcall 프로그래밍 경시대회 문제
[D2 CAMPUS] 부산대 Alcall 프로그래밍 경시대회 문제NAVER D2
 
[D2 CAMPUS] 부산대 Alcall 프로그래밍 경시대회 문제 풀이
[D2 CAMPUS] 부산대 Alcall 프로그래밍 경시대회 문제 풀이[D2 CAMPUS] 부산대 Alcall 프로그래밍 경시대회 문제 풀이
[D2 CAMPUS] 부산대 Alcall 프로그래밍 경시대회 문제 풀이NAVER D2
 
개알못의 오픈소스이야기 - 이상준님
개알못의 오픈소스이야기 - 이상준님개알못의 오픈소스이야기 - 이상준님
개알못의 오픈소스이야기 - 이상준님NAVER D2
 

Viewers also liked (20)

Christmas CTF 보안대회 수상팀 문제풀이서(팀명:구운순살치즈치킨)
Christmas CTF 보안대회 수상팀 문제풀이서(팀명:구운순살치즈치킨)Christmas CTF 보안대회 수상팀 문제풀이서(팀명:구운순살치즈치킨)
Christmas CTF 보안대회 수상팀 문제풀이서(팀명:구운순살치즈치킨)
 
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - HTML, Android Animation
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - HTML, Android Animation[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - HTML, Android Animation
[D2 CAMPUS] 안드로이드 오픈소스 스터디자료 - HTML, Android Animation
 
2015 한양대학교 프로그래밍 경시대회 - beginner division
2015 한양대학교 프로그래밍 경시대회 - beginner division2015 한양대학교 프로그래밍 경시대회 - beginner division
2015 한양대학교 프로그래밍 경시대회 - beginner division
 
2015 한양대학교 프로그래밍 경시대회 - advanced division







(Tugdual grall) no sql-hadoop

  • 1. NoSQL & BigData: Why Every NoSQL Deployment Should be Paired with Hadoop. Tugdual Grall, Couchbase, @tgrall
  • 2. About Me
    • Tugdual "Tug" Grall
      - Couchbase: Technical Evangelist
      - eXo: CTO
      - Oracle: Developer/Product Manager, mainly Java/SOA
      - Developer in consulting firms
    • Web: @tgrall, http://blog.grallandco.com, tgrall
    • NantesJUG co-founder
    • Pet project: http://www.resultri.com
  • 3. Big Data: High Data Variety and Velocity
    [Chart: data volume in trillions of gigabytes (zettabytes), 2000 to 2011. Source: IDC 2011 Digital Universe Study (http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm)]
    More Flexible Data Model Required
    • Usually, when people talk about Big Data, they mean capturing huge amounts of data and analyzing it. That is certainly a big trend.
    • But Big Data also affects operational databases in a big way, for a different set of reasons.
    • Two aspects of Big Data are pushing people toward NoSQL technologies.
    • The first is that the vast majority of the increase in data is unstructured or semi-structured: user-generated content such as consumer recommendations, and machine-generated data such as log files and website click data. Relational databases aren't well suited to storing this type of data, while NoSQL technologies such as document-oriented databases are well suited to it.
    • The second is that application developers keep finding new types of data they want to store: new information in a user's account profile, new logging information, and so on. What developers want to store is changing very rapidly, and the amount of data is increasing very rapidly, so they want a flexible data model that they can evolve quickly.
    • Relational databases have fixed schemas that often take weeks or months to change. NoSQL databases, on the other hand, are schema-less, so you can far more easily add new types of data and iterate quickly on your application.
  • 5. Big Data: High Data Variety and Velocity. [Chart build: the "Structured Data" series is labeled.]
  • 6. Big Data: High Data Variety and Velocity. [Chart build: the series are labeled "Structured Data" and "Unstructured and Semi-Structured Data" (text, log files, click streams, blogs, tweets, audio, video, etc.).]
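The schema-less point made in the notes for slide 3 can be illustrated with a small sketch. It is not Couchbase or MongoDB client code: the `bucket` dictionary, the `upsert`/`get` helpers, and the `user::N` key format are all hypothetical stand-ins for a document store, chosen only to show that new fields can appear without a schema change.

```python
import json

# Hypothetical in-memory "bucket": document ID -> JSON text.
# A real document database would persist and distribute this.
bucket = {}

def upsert(doc_id, doc):
    """Store a document as JSON; no schema is declared anywhere."""
    bucket[doc_id] = json.dumps(doc)

def get(doc_id):
    """Fetch a document by ID and decode it."""
    return json.loads(bucket[doc_id])

# Version 1 of the application stores a minimal profile.
upsert("user::1", {"name": "Ada", "email": "ada@example.com"})

# Version 2 adds new fields; older documents are left untouched.
upsert("user::2", {
    "name": "Linus",
    "email": "linus@example.com",
    "preferences": {"newsletter": True},
    "last_login": "2013-10-14T09:30:00Z",
})

def wants_newsletter(doc_id):
    """Readers tolerate missing fields instead of relying on a migration."""
    return get(doc_id).get("preferences", {}).get("newsletter", False)
```

The trade-off is that the application, not the database, becomes responsible for handling documents of different shapes, which is why `wants_newsletter` supplies defaults.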
  • 7. Operational vs. Analytic Databases
    [Diagram: Analytic databases: get insights from data (Cloudera, Hortonworks). Real-time, interactive databases (NoSQL): fast access to data (Couchbase, Mongo).]
    • There are two types of databases. Each is focused on a very different problem.
    • Analytic databases were referred to in the past as OLAP databases. They are focused on looking through every record in a huge database to answer a question or gain an insight about the data it contains. These analyses are batch processes that access every piece of data in the database, are very read-heavy, and produce results in seconds, minutes, or sometimes days. For analytic databases, "real time" means an analysis takes a few seconds to run.
    • Real-time interactive databases are often referred to as operational databases. They store a lot of data, but usually much less than an analytic database.
    • They must provide access to individual records in milliseconds so that users of an application get good response times.
    • Since the requirements of the two types are very different, their architectures and capabilities are very different as well.
    • When I refer to NoSQL in this presentation, I am referring to real-time, interactive databases. This is the type of NoSQL database Couchbase provides.
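The difference between the two access patterns described on slide 7 can be sketched in a few lines. This is only an illustration on an in-memory dictionary: the `profiles` data set and both function names are hypothetical, and no real database is involved.

```python
from collections import Counter

# Hypothetical data set: user ID -> profile record.
profiles = {
    f"user::{i}": {"country": "FR" if i % 3 == 0 else "US", "visits": i}
    for i in range(1, 10)
}

def operational_read(user_id):
    """Operational access: fetch one record by key (a single hash lookup)."""
    return profiles[user_id]

def analytic_query():
    """Analytic access: scan every record to answer one question."""
    visits_by_country = Counter()
    for record in profiles.values():
        visits_by_country[record["country"]] += record["visits"]
    return dict(visits_by_country)
```

The first function touches one record regardless of data size, which is what makes millisecond responses feasible; the second touches all of them, so its cost grows with the data set and it is usually run as a batch job.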
  • 8. Lack of flexibility/rigid schemas: 49%. Inability to scale out data: 35%. Performance challenges: 29%. Cost: 16%. All of these: 12%. Other: 11%. Source: Couchbase Survey, December 2011, n = 1351.
  • 9. NoSQL catalog. Tiers: Cache (memory only); Database (memory/disk). Key-Value: Memcached, Coherence, Membase. Data Structure: Redis. Document: Couchbase, MongoDB. Column: Cassandra, HBase. Graph: Neo4j, InfiniteGraph.
  • 10. Use Cases. Key Value: Session Management; User Profile/Preferences; Shopping Cart. Document: Event Logging; Content Management; Web Analytics; E-Commerce Application. Columns: Event Logging; Content Management; Counters. Graph: Connected Data / Social Networks; Routing, Dispatch; Recommendations based on Social Graph.
  • 12. What is Hadoop? • Highly scalable • Unstructured data • Open source • Big Data Operating System • Changing the World One Petabyte at a Time
  • 13. What is Hadoop? • Simplest unit of compute and storage [Diagram: Disks, CPU, Application, Data]
  • 14. What is Hadoop? • And when it grows? [Diagram: Application, Data]
  • 15. What is Hadoop? • And when it grows more?
  • 16. What is Hadoop? • NoSQL to the rescue [Diagram: Application, Data]
  • 17. What is Hadoop? • Hadoop is a different paradigm [Diagram: Application, Data]
  • 18. Hadoop is not a "NoSQL database" but rather a set of tools for working with Big Data: the ultimate Swiss Army knife for dealing with very, very large volumes of data. Oozie: workflow, coordination. Sqoop: data connector to import/export data. Hive: SQL-like interface. Pig: high-level programming language. Mahout: machine learning library. Whirr: Hadoop management tools for cloud services. Flume: aggregator. MapReduce: framework to process large volumes of data. HBase: key-value data store. Zookeeper: centralized configuration management. HDFS: distributed file system.
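To make the "MapReduce: framework to process large volumes of data" entry concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It does not use the Hadoop API; the function names are hypothetical, and on a real cluster the framework runs the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["big data", "Big Hadoop"])` groups the two occurrences of "big" under one key before summing them.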
  • 20. Ad and offer targeting: 40 milliseconds to respond with the decision. [Diagram steps: 1. events; 2. profiles, campaigns; 3. profiles, real-time campaign statistics]
  • 22. Moving Parts. [Diagram: Ad Targeting Platform; Couchbase Server Cluster; Logs (flume flow); Hadoop Cluster; sqoop import; sqoop export]
  • 23. Content & Recommendation Targeting. [Diagram: Content Oriented Site; Legacy Relational Database. Steps: 1. events; 2. user profiles; 3. make recommendations]
  • 25. Moving Parts. To keep up with changing needs for richer, more targeted content delivered very quickly to larger and larger audiences, the data behind content-driven sites is shifting to Couchbase. Hadoop excels at complex analytics, which may involve multiple processing steps that incorporate a number of different data sources. [Diagram: Content Driven Web Site; Original RDBMS; Couchbase Server Cluster; Logs (flume flow); Hadoop Cluster; sqoop import; sqoop export]
  • 26. Sqoop: What is this?
  • 27. What is Sqoop? Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS. (Source: sqoop.apache.org)
  • 28. What is Sqoop? • Traditional ETL [Diagram: Application, Data, T, Data]
  • 29. What is Sqoop? • A different paradigm [Diagram: Application, Data, Data]
  • 30. What is Sqoop? • A very scalable different paradigm [Diagram: Application, Data (x3), Data]
  • 31. What is Sqoop? • Where did the Transform go? [Diagram: many T steps, Application, Data]
  • 32. What is Sqoop? • Sqoop: "SQL-Hadoop" - Default connection is via JDBC • Lots of custom connectors - Couchbase, VoltDB, Vertica - Teradata, Netezza - Oracle, MySQL, Postgres
  • 34. Sqoop: Import sqoop import --connect jdbc:mysql://rdbms1.demo.com/CRM --table customers
  • 36. Sqoop: Export sqoop export --connect jdbc:mysql://rdbms1.demo.com/ANALYTICS --table sales --export-dir /user/hive/warehouse/zip_profits --input-fields-terminated-by '\0001'
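The field delimiter in the export command above is the Ctrl-A control character (octal 001), which is Hive's default field separator for text tables. The sketch below shows what one such record looks like and how it splits into fields. The column values (a zip code and a profit figure) are assumptions made for illustration; the slide does not show the layout of the zip_profits table.

```python
# Ctrl-A, the character the export command refers to in octal notation.
FIELD_DELIMITER = "\x01"


def format_record(fields):
    """Join field values into one delimited text line, as found in the export directory."""
    return FIELD_DELIMITER.join(str(field) for field in fields)


def parse_record(line):
    """Split one delimited text line back into its string fields."""
    return line.rstrip("\n").split(FIELD_DELIMITER)


# Hypothetical row: a zip code and a profit value.
example_line = format_record(["94105", 1234.5]) + "\n"
example_fields = parse_record(example_line)
```

Because Ctrl-A is unlikely to appear inside ordinary text values, it avoids the quoting problems that comma- or tab-delimited files can run into.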
  • 38. Sqoop: Import sqoop import --connect http://localhost:8091/pools --table DUMP
  • 39. Sqoop: Import [Diagram: Sqoop Client; Metadata; launches a MapReduce job whose Map tasks each write to HDFS]
  • 42. Sqoop: Export sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir /user/hive/profiles/recommendation --username social
  • 46. NoSQL & BigData Why Every NoSQL Deployment Should be Paired with Hadoop Tugdual Grall Couchbase @tgrall Q&A