Friday, April 26, 13
Introduction to Map Reduce with Couchbase
Tugdual Grall / @tgrall
NoSQL Matters ‘13 - Cologne - April 25th 2013

About Me
• Tugdual “Tug” Grall
  - Couchbase
    • Technical Evangelist
  - eXo
    • CTO
  - Oracle
    • Developer/Product Manager
    • Mainly Java/SOA
  - Developer in consulting firms
• Web
  - @tgrall
  - http://blog.grallandco.com
  - tgrall
• NantesJUG co-founder
• Pet Project: http://www.resultri.com

What’s the Problem?
Lots of Data
• Big Data
• SaaS/Cloud Computing
• Big Users

Solution
Distribute:
• the data
• the processing of the data

Map Reduce
MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.
http://research.google.com/archive/mapreduce.html

In detail
• The developer specifies two methods:
  - map (in_key, in_value) -> list(out_key, intermediate_value)
    • Processes input data
    • Produces key/value pairs
  - reduce (out_key, list(intermediate_value)) -> list(out_value)
    • Combines all intermediate values for a particular key
    • Produces a set of merged output values

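The two signatures above can be sketched in a few lines of plain JavaScript. This is only an illustration of the programming model, not Google's or Couchbase's implementation: the word-count task, the `mapReduce` driver and the input documents `d1` and `d2` are assumptions made for the example, and everything runs in a single process.

```javascript
// map(in_key, in_value) -> list of [out_key, intermediate_value]
function map(docId, text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0)
    .map((word) => [word, 1]);
}

// reduce(out_key, list(intermediate_value)) -> merged output value
function reduce(word, counts) {
  return counts.reduce((total, n) => total + n, 0);
}

// Hypothetical single-process driver: map every input, group the
// intermediate pairs by key (the "shuffle"), then reduce each group.
function mapReduce(inputs) {
  const groups = new Map();
  for (const [inKey, inValue] of Object.entries(inputs)) {
    for (const [outKey, outValue] of map(inKey, inValue)) {
      if (!groups.has(outKey)) groups.set(outKey, []);
      groups.get(outKey).push(outValue);
    }
  }
  const result = {};
  for (const [outKey, values] of groups) {
    result[outKey] = reduce(outKey, values);
  }
  return result;
}

// Made-up input documents.
const wordCounts = mapReduce({
  d1: "the quick brown fox",
  d2: "the lazy dog and the fox",
});
console.log(wordCounts);
```

In a real cluster the map calls run on the machines that hold the data and only the grouped intermediate values travel over the network; the driver here collapses that into one loop.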
Execution
[figure]

Most common use case
[figure, © Yahoo Inc.]

What about Couchbase?

Couchbase Open Source Project
• Leading NoSQL database project focused on distributed database technology and its surrounding ecosystem
• Supports both key-value and document-oriented use cases
• All components are available under the Apache License 2.0
• Available as packaged software in both Enterprise and Community editions

Couchbase Server Core Principles
• Easy Scalability: grow the cluster without application changes and without downtime, with a single click
• Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
• Flexible Data Model: JSON document model with no fixed schema

Additional Couchbase Server Features
• Built-in clustering (all nodes equal)
• Data replication with auto-failover
• Zero-downtime maintenance
• Built-in managed cache
• Append-only storage layer
• Online compaction
• Monitoring and admin API & UI
• SDKs for a variety of languages

Couchbase Server 2.0 Architecture
[Architecture diagram; recoverable labels:]
• Data Manager
  - Query API (port 8092), Query Engine
  - Moxi (port 11211, Memcapable 1.0)
  - Memcached with the Couchbase EP Engine (port 11210, Memcapable 2.0)
  - Storage interface, New Persistence Layer
• Cluster Manager (Erlang/OTP)
  - REST management API/Web UI (HTTP, port 8091)
  - Erlang port mapper (port 4369)
  - Distributed Erlang (ports 21100 - 21199)
  - Heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager (some run on each node, others one per cluster)

Couchbase Server 2.0 Architecture
[Same diagram, annotated by implementation language:]
• Server/Cluster Management & Communication (Erlang)
• RAM Cache, Indexing & Persistence Management (C & V8): object-level cache, disk persistence
• Reference: The Unreasonable Effectiveness of C, by Damien Katz

Basic Operation
• Docs are distributed evenly across servers
• Each server stores both active and replica docs
  - Only one server is active for a given doc at a time
• Client library provides the app with a simple interface to the database
• Cluster map tells the client which server a doc is on
  - App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access the same document at the same time
[Diagram: a Couchbase Server cluster of three servers, each holding active and replica docs; two app servers, each with a Couchbase client library and a cluster map, send READ/WRITE/UPDATE requests to the active copies. User configured replica count = 1.]

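As a rough sketch of what the cluster map does for the client library: a document key is hashed to a vBucket, and the map says which server currently holds the active copy of that vBucket. The vBucket count, the server names and the `hashKey` function below are assumptions made for illustration; they are not the values or the hash function Couchbase actually uses.

```javascript
// Assumption for the sketch: 8 vBuckets. Real clusters use many more.
const NUM_VBUCKETS = 8;

// Hypothetical cluster map: vBucket index -> server with the active copy.
const clusterMap = [
  "server1", "server2", "server3", "server1",
  "server2", "server3", "server1", "server2",
];

// Simple deterministic string hash (illustrative only).
function hashKey(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// What the client library does before every get/set: key -> vBucket -> server.
function serverFor(key) {
  const vbucket = hashKey(key) % NUM_VBUCKETS;
  return { vbucket, server: clusterMap[vbucket] };
}

const first = serverFor("my-key");
const second = serverFor("my-key");
console.log(first, second);
```

Because the mapping is deterministic, every app server that holds the same cluster map sends requests for a given key to the same server, which is why the application itself never needs to know where a document lives.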
How to access the data?

Couchbase.get("my-key");

Look at a document
Key -> JSON object (“document”):
{
    "string" : "string",
    "string" : value,
    "string" : { "string" : "string",
                 "string" : value },
    "string" : [ array ]
}
• How to find a document based on its attributes?
  - get employee by email
  - get products by type
  - ...
• You need to look “into” the document/value

How to?
Create an index!

Create the index
Document:
{
  "name": "Aventinus",
  "abv": 8.2,
  "ibu": 0,
  "srm": 0,
  "upc": 0,
  "type": "beer",
  "brewery_id": "110f1f2012",
  "updated": "2010-07-22 20:00:20",
  "description": "Dark-ruby, ... Weizenbock",
  "category": "German Ale"
}
Metadata:
{
  "id": "110f37fa30",
  "rev": "1-000000000",
  "expiration": 0,
  "flags": 0,
  "type": "json"
}
Index built from the documents:
Key          Value
Aventinus    8.2
Avenue Ale   4.1
...          ...

Concrete Example
• This map function:
  - receives the document and its metadata
  - as a developer, you just have to emit the key and value (K,V)

Map Function
[code screenshot]

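A minimal sketch of such a map function, written so that it also runs outside the server. In a Couchbase view only the `function (doc, meta) { ... }` part is written by the developer and `emit` is supplied by the view engine; the `emit` stand-in, the `type: "user"` field and the sample documents below are assumptions made for this example.

```javascript
// Stand-in for the emit() that the view engine provides; here it just
// collects rows so the sketch can run anywhere (assumption).
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ key, value, id: currentId });
}

// The map function itself: index user documents by email.
// The "type" field is a hypothetical convention for this example.
function map(doc, meta) {
  if (doc.type === "user" && doc.email) {
    emit(doc.email, null);
  }
}

// Hypothetical documents, keyed by document id.
const docs = {
  "u::5": { type: "user", email: "math@couchbase.com" },
  "u::1": { type: "user", email: "abba@couchbase.com" },
  "b::1": { type: "beer", name: "Aventinus" },
};

// Call map once per document, then sort the rows by key like an index.
for (const [id, doc] of Object.entries(docs)) {
  currentId = id;
  map(doc, { id });
}
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

The beer document produces no row because it fails the guard, so the resulting index only contains the documents that have the attribute being indexed.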
Index used in the query examples:
doc.email               meta.id
abba@couchbase.com      u::1
beta@couchbase.com      u::7
jasdeep@couchbase.com   u::2
math@couchbase.com      u::5
matt@couchbase.com      u::6
yeti@couchbase.com      u::4
zorro@couchbase.com     u::3

?startkey="b1"&endkey="zz"
?startkey="bz"&endkey="zn"
Pulls the index keys within the UTF-8 range specified by startkey and endkey.

?key="math@couchbase.com"
Matches a single index key.

?keys=["math@couchbase.com", "yeti@couchbase.com"]
Queries multiple keys in the set (array notation).

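The three query styles can be mimicked over an in-memory copy of the index table. This is a sketch under simplifying assumptions: the helper names `range`, `byKey` and `byKeys` are made up, keys are compared with plain JavaScript string comparison rather than the server's collation rules, and the end of the range is treated as inclusive.

```javascript
// Index rows from the table above, already sorted by key.
const index = [
  { key: "abba@couchbase.com", id: "u::1" },
  { key: "beta@couchbase.com", id: "u::7" },
  { key: "jasdeep@couchbase.com", id: "u::2" },
  { key: "math@couchbase.com", id: "u::5" },
  { key: "matt@couchbase.com", id: "u::6" },
  { key: "yeti@couchbase.com", id: "u::4" },
  { key: "zorro@couchbase.com", id: "u::3" },
];

// ?startkey=...&endkey=... : range scan over the sorted keys.
function range(startkey, endkey) {
  return index.filter((row) => row.key >= startkey && row.key <= endkey);
}

// ?key=... : exact match on a single index key.
function byKey(key) {
  return index.filter((row) => row.key === key);
}

// ?keys=[...] : exact match on any key in the set.
function byKeys(keys) {
  return index.filter((row) => keys.includes(row.key));
}

const ids = (matched) => matched.map((row) => row.id);

console.log(ids(range("b1", "zz"))); // beta through zorro
console.log(ids(range("bz", "zn"))); // jasdeep through yeti
console.log(ids(byKey("math@couchbase.com")));
console.log(ids(byKeys(["math@couchbase.com", "yeti@couchbase.com"])));
```

Note that startkey and endkey do not have to be keys that exist in the index; they only mark the boundaries of the scan, which is why "b1" and "zz" work as range limits.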
How does it work?

Indexing and Querying
• Indexing work is distributed amongst nodes
• Large data sets are possible
• The effort is parallelized
• Each node has the index for the data stored on it
• Queries combine the results from the required nodes
[Diagram: a Couchbase Server cluster of three servers, each holding active and replica docs; two app servers, each with a Couchbase client library and a cluster map, send a query to the cluster. User configured replica count = 1.]

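The "each node indexes its own data, queries combine the results" idea can be sketched as a scatter-gather step. The partitioning of rows across `server1` to `server3` and the function names below are assumptions for illustration; the real server also handles replicas, failures and streaming of results, which this sketch ignores.

```javascript
// Hypothetical per-node index partitions: each node only indexes
// the documents it stores, and keeps its rows sorted by key.
const nodeIndexes = {
  server1: [
    { key: "abba@couchbase.com", id: "u::1" },
    { key: "math@couchbase.com", id: "u::5" },
  ],
  server2: [
    { key: "beta@couchbase.com", id: "u::7" },
    { key: "zorro@couchbase.com", id: "u::3" },
  ],
  server3: [
    { key: "jasdeep@couchbase.com", id: "u::2" },
    { key: "matt@couchbase.com", id: "u::6" },
    { key: "yeti@couchbase.com", id: "u::4" },
  ],
};

// Scatter: every node answers the range query from its local index.
function queryNode(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

// Gather: combine the partial results into one sorted result set.
function queryCluster(startkey, endkey) {
  const partials = Object.values(nodeIndexes).map((rows) =>
    queryNode(rows, startkey, endkey)
  );
  return partials
    .flat()
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const merged = queryCluster("b", "n");
console.log(merged.map((row) => row.id));
```

Each node only scans its own partition, so the indexing and query work grows with the data per node rather than with the total data set.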
Couchbase Server 2.0: Views
• Views can cover a few different use cases
  - Primary index
  - Simple secondary indexes (the most common)
  - Complex secondary, tertiary and composite indexes
  - Aggregation functions (reduction)
    • Example: count the number of “North American Ales”
  - Organizing related data
• Built using Map/Reduce
  - Map function creates a matrix from document fields
  - Reduce function summarizes (reduces) information

Distributed Index Build Phase
• Optimized for lookups, in-order access and aggregations
• All view reads are served from disk (different performance profile)
• A view is built against every document on every node
  - This is why you should group views in a design document
• Automatically kept up to date
  - “Incremental” Map Reduce

Dynamic Range Queries with Optional Aggregation
• Efficiently fetch a row or a group of related rows
• Queries use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries
[Diagram: three servers, each holding active docs and replica docs]
?startkey="J"&endkey="K"
{"rows":[{"key":"Juneau","value":null}]}

Append Only Index
• Disk activity is slow
• Updating disk blocks is very slow
• Appending new data to the end of the current file is fast
• Overhead of reverse reading is small
• Because existing blocks are not re-used, this can lead to fragmentation
  - Couchbase will compact the index automatically
[Diagram: Doc -> View Processor -> Disk; changed documents are appended after the original data]

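The trade-off described above can be sketched with an in-memory array standing in for the index file. This is a toy model under stated assumptions: the class name `AppendOnlyIndex` and its methods are hypothetical, and Couchbase's actual on-disk format is a B-tree file, not a flat log.

```javascript
class AppendOnlyIndex {
  constructor() {
    this.log = []; // stands in for the index file on disk
  }

  // Writes never overwrite an existing entry; they always append.
  put(key, value) {
    this.log.push({ key, value });
  }

  // Reads scan backwards from the end, so the newest entry wins.
  get(key) {
    for (let i = this.log.length - 1; i >= 0; i--) {
      if (this.log[i].key === key) return this.log[i].value;
    }
    return undefined;
  }

  // Compaction rewrites the log, keeping only the newest entry per key.
  compact() {
    const latest = new Map();
    for (const entry of this.log) latest.set(entry.key, entry.value);
    this.log = [...latest].map(([key, value]) => ({ key, value }));
  }
}

const idx = new AppendOnlyIndex();
idx.put("Aventinus", 8.2);
idx.put("Avenue Ale", 4.1);
idx.put("Aventinus", 8.5); // hypothetical changed document, appended
const sizeBefore = idx.log.length;
idx.compact();
const sizeAfter = idx.log.length;
console.log(sizeBefore, sizeAfter, idx.get("Aventinus"));
```

The stale entry for the changed document is the fragmentation the slide mentions: it costs space until compaction rewrites the file, but it never costs an in-place disk update.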
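Adding a new Document
[B-tree diagram; each node shows its key range and its reduction (a count)]
Keys: A B C D F G H I K L N O Q R
Before: root A-R (14)
  A-H (7): A-C (3), D-F (2), G-H (2)
  I-R (7): I-L (3), N-R (4)
After adding the new key M: new reductions M-R (5) and I-R (8), and a new root A-R (15)
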
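The numbers in the diagram can be reproduced with a small tree in which every node caches a count and an insert rebuilds only the nodes on the path to the changed leaf. This is a simplified sketch, not the server's B-tree code: the helper names are made up, nodes never split, and the "new root" is just a new JavaScript object rather than a block appended to a file.

```javascript
// Leaves hold sorted keys; every node caches a reduction (here a count).
function leaf(keys) {
  return { keys, count: keys.length };
}

function inner(children) {
  const count = children.reduce((sum, child) => sum + child.count, 0);
  return { children, count };
}

function lastKey(node) {
  return node.keys
    ? node.keys[node.keys.length - 1]
    : lastKey(node.children[node.children.length - 1]);
}

// Returns a NEW root. Only nodes on the path to the changed leaf are
// rebuilt; untouched subtrees are shared with the old tree.
function insert(node, key) {
  if (node.keys) {
    return leaf([...node.keys, key].sort());
  }
  // Descend into the first child whose range can hold the key.
  let target = node.children.findIndex((child) => lastKey(child) >= key);
  if (target === -1) target = node.children.length - 1;
  const children = node.children.slice();
  children[target] = insert(children[target], key);
  return inner(children);
}

// The tree from the diagram, before the insert.
const oldRoot = inner([
  inner([leaf(["A", "B", "C"]), leaf(["D", "F"]), leaf(["G", "H"])]),
  inner([leaf(["I", "K", "L"]), leaf(["N", "O", "Q", "R"])]),
]);

const newRoot = insert(oldRoot, "M");
console.log(oldRoot.count, newRoot.count);
```

Because the old root is left intact, a reader that started before the insert still sees a consistent tree, and a count over the whole range can be answered from the root's cached value without visiting the leaves.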
What about Reduce?
• Out-of-the-box functions:
  - _count()
  - _sum()
  - _stats()
• Create your own if needed
function(key, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls: add them up
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    // values come straight from the map function: count them
    return values.length;
  }
}

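To see why the `rereduce` flag matters, the custom count above can be run by hand in two passes. This is a sketch: the function is given the name `countReduce` so it can be called directly, and the seven mapped values and the way they are split into two chunks are made up for the example.

```javascript
// The custom count from the slide, named so it can be called directly.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls
    let result = 0;
    for (let i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  }
  // values come straight from the map function
  return values.length;
}

// Seven hypothetical values emitted by a map function.
const mapped = [1, 1, 1, 1, 1, 1, 1];

// First pass: two chunks are reduced independently (rereduce = false).
const chunkA = countReduce(null, mapped.slice(0, 3), false);
const chunkB = countReduce(null, mapped.slice(3), false);

// Second pass: the partial results are combined (rereduce = true).
const total = countReduce(null, [chunkA, chunkB], true);
console.log(chunkA, chunkB, total);
```

Without the `rereduce` branch the second pass would return 2 (the number of partial results) instead of 7, which is the classic mistake when writing a custom reduce.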
Reduce Function
• Receives a key and arrays of values as parameters
• Written in JavaScript
• Called after the map function
• Used to reduce the map results to single values
• Used with grouping
• Can be skipped when querying (reduce=false)
  - the same index is reused

Reduce in Action
• Map() result
Key                      Value
Belgian-Style Dubbel     1
Belgian-Style Dubbel     1
Belgian-Style Dubbel     1
Belgian-Style Pale Ale   1
Belgian-Style White      1
Belgian-Style White      1
...                      ...
• Reduce(): _count()
• Result
Key                      Value
Belgian-Style Dubbel     3
Belgian-Style Pale Ale   1
Belgian-Style White      2

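The step from the map result to the grouped result can be sketched as a group-by-key count. The `groupCount` helper is a hypothetical stand-in for what a grouped query with the built-in `_count` produces, and the input contains only the six rows listed in the table (the elided rows are left out).

```javascript
// Rows emitted by the map function: [key, value].
const mapResult = [
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Pale Ale", 1],
  ["Belgian-Style White", 1],
  ["Belgian-Style White", 1],
];

// Group the rows by key and count the rows in each group.
function groupCount(rows) {
  const counts = new Map();
  for (const [key] of rows) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()];
}

const grouped = groupCount(mapResult);
console.log(grouped);
```

Counting rows ignores the emitted value entirely, which is why the map function could just as well emit null when only a count is needed.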
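How to use it?
• Use the client SDK to call the view:
// View, Query, ComplexKey, ViewResponse and ViewRow come from
// com.couchbase.client.protocol.views (Couchbase Java SDK 1.x)
View view = client.getView("beer", "by_name");
Query query = new Query();
query.setIncludeDocs(true)                              // also fetch the full documents
     .setLimit(20)                                      // return at most 20 rows
     .setRangeStart(ComplexKey.of(startKey))            // first key of the range
     .setRangeEnd(ComplexKey.of(startKey + "\uefff"));  // last key with the same prefix
ViewResponse result = client.query(view, query);

for (ViewRow row : result) {
    // ...
}
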
Demonstration

Hadoop & Couchbase (Hadoop ≠ Couchbase)
Hadoop:
• Deals with “Big Data”
• “More” is better than “Faster”
• Batch oriented
• Usually used to “extract/transform” data
• Fully distributed
  - Map, Shuffle, Reduce
Couchbase:
• Distributed
• Executed where the document is
• Deals with “indexing” data
• As fast as possible
• Used to query the data in the database

Map Reduce in Couchbase
• As in many other NoSQL databases, it is used for queries!
• Indexes are distributed on each node of the cluster
• Indexes are updated incrementally
• Write your map and reduce functions in JavaScript

Thank you!
tug@couchbase.com
@tgrall
Get Couchbase Server at http://www.couchbase.com/download
Friday, April 26, 13
Friday, April 26, 13

Weitere ähnliche Inhalte

Was ist angesagt?

Habits of Effective Sqoop Users
Habits of Effective Sqoop UsersHabits of Effective Sqoop Users
Habits of Effective Sqoop UsersKathleen Ting
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use caseDavin Abraham
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Helena Edelson
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka Dori Waldman
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale Hakka Labs
 
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...Michael Stack
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSteve Loughran
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons Provectus
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesHBaseCon
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKSkills Matter
 
Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database huguk
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Edureka!
 

Was ist angesagt? (20)

Habits of Effective Sqoop Users
Habits of Effective Sqoop UsersHabits of Effective Sqoop Users
Habits of Effective Sqoop Users
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
 
Sqoop
SqoopSqoop
Sqoop
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
 
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Introduction to sqoop
Introduction to sqoopIntroduction to sqoop
Introduction to sqoop
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
 
Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
Sqoop tutorial
Sqoop tutorialSqoop tutorial
Sqoop tutorial
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 

Andere mochten auch

Mongodb sharding
Mongodb shardingMongodb sharding
Mongodb shardingxiangrong
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLRyu Kobayashi
 
OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialSteven Francia
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDBMongoDB
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jDebanjan Mahata
 
MySQL Sharding: Tools and Best Practices for Horizontal Scaling
MySQL Sharding: Tools and Best Practices for Horizontal ScalingMySQL Sharding: Tools and Best Practices for Horizontal Scaling
MySQL Sharding: Tools and Best Practices for Horizontal ScalingMats Kindahl
 
MongoDB Sharding
MongoDB ShardingMongoDB Sharding
MongoDB ShardingRob Walters
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented DatabasesFabio Fumarola
 

Andere mochten auch (11)

Nosql
NosqlNosql
Nosql
 
MapReduce and NoSQL
MapReduce and NoSQLMapReduce and NoSQL
MapReduce and NoSQL
 
Mongodb sharding
Mongodb shardingMongodb sharding
Mongodb sharding
 
Developers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQLDevelopers summit cassandraで見るNoSQL
Developers summit cassandraで見るNoSQL
 
OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB Tutorial
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 
Mongo db
Mongo dbMongo db
Mongo db
 
MySQL Sharding: Tools and Best Practices for Horizontal Scaling
MySQL Sharding: Tools and Best Practices for Horizontal ScalingMySQL Sharding: Tools and Best Practices for Horizontal Scaling
MySQL Sharding: Tools and Best Practices for Horizontal Scaling
 
MongoDB Sharding
MongoDB ShardingMongoDB Sharding
MongoDB Sharding
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 

Ähnlich wie Introduction to Map Reduce with Couchbase

Introduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseIntroduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseTugdual Grall
 
Softshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseSoftshake 2013: Introduction to NoSQL with Couchbase
Softshake 2013: Introduction to NoSQL with CouchbaseTugdual Grall
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101MongoDB
 
Elastic search and Symfony3 - A practical approach
Elastic search and Symfony3 - A practical approachElastic search and Symfony3 - A practical approach
Elastic search and Symfony3 - A practical approachSymfonyMu
 
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Timothy Spann
 
The Convergence of HPC and Deep Learning
The Convergence of HPC and Deep LearningThe Convergence of HPC and Deep Learning
The Convergence of HPC and Deep Learninginside-BigData.com
 
nuclio Overview October 2017
nuclio Overview October 2017nuclio Overview October 2017
nuclio Overview October 2017iguazio
 
Introduction to ClustrixDB
Introduction to ClustrixDBIntroduction to ClustrixDB
Introduction to ClustrixDBI Goo Lee
 
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & Packer
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & PackerLAMP Stack (Reloaded) - Infrastructure as Code with Terraform & Packer
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & PackerJan-Christoph Küster
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsDataStax Academy
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...HostedbyConfluent
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Provectus
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)iguazio - nuclio overview to CNCF (Sep 25th 2017)
iguazio - nuclio overview to CNCF (Sep 25th 2017)Eran Duchan
 
Couchbase overview033113long
Couchbase overview033113longCouchbase overview033113long
Couchbase overview033113longJeff Harris
 
Couchbase overview033113long
Couchbase overview033113longCouchbase overview033113long
Couchbase overview033113longJeff Harris
 
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
 Openstack - An introduction/Installation - Presented at Dr Dobb's conference... Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...Rahul Krishna Upadhyaya
 
Storage Is Not Virtualized Enough - part 1
Storage Is Not Virtualized Enough - part 1Storage Is Not Virtualized Enough - part 1
Storage Is Not Virtualized Enough - part 1Zhipeng Huang
 


Introduction to Map Reduce with Couchbase

  • 2. Introduction to Map Reduce with Couchbase. Tugdual Grall / @tgrall. NoSQL Matters '13 - Cologne - April 25th 2013. Friday, April 26, 13
  • 3. About Me • Tugdual "Tug" Grall - Couchbase • Technical Evangelist - eXo • CTO - Oracle • Developer/Product Manager • Mainly Java/SOA - Developer in consulting firms • Web • @tgrall • http://blog.grallandco.com • tgrall • NantesJUG co-founder • Pet Project: • http://www.resultri.com
  • 4. What's the Problem? Lots of Data: Big Data, SaaS/Cloud Computing, Big Users
  • 5. Solution. Distribute: • the data • the processing of the data
  • 6. Map Reduce. MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers. http://research.google.com/archive/mapreduce.html
  • 7. In detail • The developer specifies 2 methods: - map (in_key, in_value) -> list(out_key, intermediate_value) • Processes input data • Produces key/value pairs - reduce (out_key, list(intermediate_value)) -> list(out_value) • Combines all intermediate values for a particular key • Produces a set of merged output values
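The two signatures above can be illustrated with word counting, run in a single process. This is only a sketch: `map`, `reduce` and `runMapReduce` are names chosen for the illustration, not part of Google's implementation or of Couchbase, and a real framework would spread the three phases across machines.

```javascript
// Word count in the MapReduce style, run in one process.
// All function names here are illustrative assumptions.

// map(in_key, in_value) -> list of [out_key, intermediate_value]
function map(docId, text) {
  return text
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0)
    .map((word) => [word, 1]);
}

// reduce(out_key, list(intermediate_value)) -> merged output value
function reduce(word, counts) {
  return counts.reduce((total, n) => total + n, 0);
}

function runMapReduce(inputs) {
  // Map phase: every input record is processed independently.
  const pairs = Object.entries(inputs).flatMap(([key, value]) => map(key, value));

  // Shuffle phase: group the intermediate values by key.
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce phase: one call per distinct key.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

const wordCounts = runMapReduce({
  doc1: "the quick brown fox",
  doc2: "the lazy dog and the fox",
});
console.log(wordCounts);
```

Because each `map` call sees only one record and each `reduce` call sees only one key, both phases can be parallelized without coordination beyond the shuffle.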
  • 9. Most common use case (image © Yahoo inc.)
  • 11. Couchbase Open Source Project • Leading NoSQL database project focused on distributed database technology and surrounding ecosystem • Supports both key-value and document-oriented use cases • All components are available under the Apache 2.0 Public License • Obtained as packaged software in both enterprise and community editions
  • 12. Couchbase Server Core Principles • Easy Scalability: grow cluster without application changes, without downtime, with a single click • Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput • Always On 24x365: no downtime for software upgrades, hardware maintenance, etc. • Flexible Data Model: JSON document model with no fixed schema
  • 13. Additional Couchbase Server Features • Built-in clustering: all nodes equal • Data replication with auto-failover • Zero-downtime maintenance • Built-in managed cache • Append-only storage layer • Online compaction • Monitoring and admin API & UI • SDKs for a variety of languages
  • 14. Couchbase Server 2.0 Architecture [diagram] • Data Manager: Query Engine with Query API (port 8092); Couchbase EP Engine behind a storage interface (port 11210, Memcapable 2.0); Moxi (port 11211, Memcapable 1.0); Memcached; New Persistence Layer • Cluster Manager (Erlang/OTP): REST management API/Web UI over HTTP (port 8091); Erlang port mapper (port 4369); Distributed Erlang (ports 21100 - 21199); heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager (some run on each node, others once per cluster)
  • 15. Couchbase Server 2.0 Architecture [same diagram, annotated] • Cluster Manager side: Server/Cluster Management & Communication (Erlang) • Data Manager side (object-level cache, disk persistence, query engine): RAM Cache, Indexing & Persistence Management (C & V8) • Reference: "The Unreasonable Effectiveness of C" by Damien Katz
  • 16. Couchbase Server Cluster: Basic Operation • Docs are distributed evenly across servers • Each server stores both active and replica docs (for a given doc, only one server is active at a time) • Client library provides the app with a simple interface to the database • Cluster map tells the client library which server a doc is on (the app never needs to know) • App reads, writes and updates docs • Multiple app servers can access the same document at the same time [Diagram: two app servers, each with a Couchbase client library and cluster map, send READ/WRITE/UPDATE requests to three servers holding active and replica docs; user-configured replica count = 1]
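As a rough sketch of how a cluster map can hide document placement from the application, the code below hashes a key to a partition and looks up which server holds the active and replica copies. Everything here is an assumption made for illustration: the hash function, the partition count of 8, the server names and the round-robin layout are not Couchbase's actual scheme.

```javascript
// Toy cluster map: key -> partition -> server.
// Hash, partition count and layout are illustrative assumptions only.
const NUM_PARTITIONS = 8;
const SERVERS = ["server1", "server2", "server3"];

// Simple deterministic string hash, kept unsigned with >>> 0.
function hashKey(key) {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = ((hash * 33) ^ key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// For each partition, record the server with the active copy and the
// server with the replica (replica count = 1).
function buildClusterMap(servers, numPartitions) {
  const map = [];
  for (let p = 0; p < numPartitions; p++) {
    map.push({
      active: servers[p % servers.length],
      replica: servers[(p + 1) % servers.length],
    });
  }
  return map;
}

// What a client library would do before sending a request.
function locate(clusterMap, key) {
  const partition = hashKey(key) % clusterMap.length;
  return { partition, ...clusterMap[partition] };
}

const clusterMap = buildClusterMap(SERVERS, NUM_PARTITIONS);
const location = locate(clusterMap, "u::1");
console.log(location);
```

The application only ever calls something like `locate` indirectly through the client library, so adding a server means publishing a new map rather than changing application code.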
  • 17. How to access the data?
  • 19. Look at a document • Key -> JSON object ("document"): { "string" : "string", "string" : value, "string" : { "string" : "string", "string" : value }, "string" : [ array ] } • How to find a document based on its attributes? - get employee by email - get products by type - ... • You need to look "into" the document/value
  • 20. Create an index! How to?
  • 21. Create the index • Document: { "name": "Aventinus", "abv": 8.2, "ibu": 0, "srm": 0, "upc": 0, "type": "beer", "brewery_id": "110f1f2012", "updated": "2010-07-22 20:00:20", "description": "Dark-ruby, ... Weizenbock", "category": "German Ale" } • Metadata: { "id": "110f37fa30", "rev": "1-000000000", "expiration": 0, "flags": 0, "type": "json" } • Resulting index (Key -> Value): Aventinus -> 8.2; Avenue Ale -> 4.1; ...
  • 22. Concrete Example • The map function: - receives the document and its metadata - as a developer, you just have to emit the key and value (K, V)
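A map function of the kind described above might look like the sketch below, which builds the name -> abv index from the beer documents. In Couchbase the view engine supplies `emit`; here a stand-in `emit` and a tiny harness are defined so the snippet runs on its own. The function name `mapBeerByName`, the id `avenue_ale` and the brewery document are hypothetical.

```javascript
// Stand-in for the emit() that the view engine provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The map function: receives the document and its metadata, emits (K, V).
function mapBeerByName(doc, meta) {
  if (meta.type === "json" && doc.type === "beer" && doc.name) {
    emit(doc.name, doc.abv);
  }
}

// Tiny harness: a few (document, metadata) pairs standing in for a bucket.
const bucket = [
  { meta: { id: "110f37fa30", type: "json" }, doc: { name: "Aventinus", abv: 8.2, type: "beer" } },
  { meta: { id: "avenue_ale", type: "json" }, doc: { name: "Avenue Ale", abv: 4.1, type: "beer" } },
  { meta: { id: "110f1f2012", type: "json" }, doc: { name: "Some Brewery", type: "brewery" } },
];

for (const { doc, meta } of bucket) {
  mapBeerByName(doc, meta);
}

// An index is kept ordered by key.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

The `doc.type` check keeps non-beer documents out of the index, which matters because every document in the bucket is passed to every map function.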
  • 24. Index (doc.email -> meta.id): abba@couchbase.com -> u::1; beta@couchbase.com -> u::7; jasdeep@couchbase.com -> u::2; math@couchbase.com -> u::5; matt@couchbase.com -> u::6; yeti@couchbase.com -> u::4; zorro@couchbase.com -> u::3 • ?startkey="b1"&endkey="zz" pulls the index keys within the UTF-8 range specified by startkey and endkey • ?startkey="bz"&endkey="zn" does the same for a narrower range
  • 25. Same index • ?key="math@couchbase.com" matches a single index key
  • 26. Same index • ?keys=["math@couchbase.com", "yeti@couchbase.com"] queries multiple keys in the set (array notation)
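The three query styles on these slides can be mimicked over an in-memory copy of the email index. This is a sketch only: `queryRange`, `queryKey` and `queryKeys` are hypothetical helpers, and plain JavaScript string comparison is used as an approximation of the server's key collation.

```javascript
// In-memory copy of the email index, already ordered by key.
const index = [
  { key: "abba@couchbase.com", id: "u::1" },
  { key: "beta@couchbase.com", id: "u::7" },
  { key: "jasdeep@couchbase.com", id: "u::2" },
  { key: "math@couchbase.com", id: "u::5" },
  { key: "matt@couchbase.com", id: "u::6" },
  { key: "yeti@couchbase.com", id: "u::4" },
  { key: "zorro@couchbase.com", id: "u::3" },
];

// ?startkey=...&endkey=... : every row whose key falls inside the range.
function queryRange(startkey, endkey) {
  return index.filter((row) => row.key >= startkey && row.key <= endkey);
}

// ?key=... : rows whose key matches exactly.
function queryKey(key) {
  return index.filter((row) => row.key === key);
}

// ?keys=[...] : one exact-match lookup per requested key.
function queryKeys(keys) {
  return keys.flatMap((key) => queryKey(key));
}

const wideRange = queryRange("b1", "zz");
const narrowRange = queryRange("bz", "zn");
const single = queryKey("math@couchbase.com");
const multiple = queryKeys(["math@couchbase.com", "yeti@couchbase.com"]);
console.log(wideRange.length, narrowRange.map((r) => r.id), single, multiple);
```

Note that the range bounds do not have to be keys that exist in the index: "bz" sorts after "beta@..." and "zn" sorts before "zorro@...", so both of those rows fall outside the narrower range.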
  • 27. How does it work?
  • 28. Couchbase Server Cluster: Indexing and Querying • Indexing work is distributed amongst nodes • Large data sets possible • Parallelizes the effort • Each node has the index for the data stored on it • Queries combine the results from the required nodes [Diagram: app servers with Couchbase client library and cluster map send a query to three servers, each holding active and replica docs; user-configured replica count = 1]
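The idea that each node indexes only its own data and that a query combines per-node results can be sketched as a scatter/gather over three in-memory "nodes". The node layout, the shortened keys and the helper names are assumptions for the illustration; a real cluster does this over the network.

```javascript
// Each node holds an index only for the documents stored on it.
const nodes = [
  { name: "server1", index: [{ key: "abba", id: "u::1" }, { key: "math", id: "u::5" }] },
  { name: "server2", index: [{ key: "beta", id: "u::7" }, { key: "yeti", id: "u::4" }] },
  { name: "server3", index: [{ key: "jasdeep", id: "u::2" }, { key: "zorro", id: "u::3" }] },
];

// Scatter: every node answers the range query from its local index.
function queryNode(node, startkey, endkey) {
  return node.index.filter((row) => row.key >= startkey && row.key <= endkey);
}

// Gather: merge the partial results into one key-ordered result.
function queryCluster(startkey, endkey) {
  return nodes
    .flatMap((node) => queryNode(node, startkey, endkey))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const clusterRows = queryCluster("b", "n");
console.log(clusterRows);
```

Index building scales with the number of nodes because no node ever maps a document it does not store; the cost moves to query time, where results have to be merged.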
  • 29. Couchbase Server 2.0: Views • Views can cover a few different use cases - Primary index - Simple secondary indexes (the most common) - Complex secondary, tertiary and composite indexes - Aggregation functions (reduction), for example counting the number of "North American Ales" - Organizing related data • Built using Map/Reduce - Map function creates a matrix from document fields - Reduce function summarizes (reduces) information
  • 30. Distributed Index Build Phase • Optimized for lookups, in-order access and aggregations • All view reads come from disk (different performance profile) • A view is built against every document on every node - This is why you should group views in a design document • Automatically kept up to date - "Incremental" Map Reduce
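One way to picture "incremental" Map Reduce is an index that remembers which rows each document produced, so that a change to one document re-runs the map function for that document only. `IncrementalIndex` is a hypothetical class written for this sketch and says nothing about how Couchbase stores its views; `emit` is passed in as a third argument here purely to keep the snippet self-contained.

```javascript
// Incremental view maintenance: only changed documents are re-mapped.
class IncrementalIndex {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rowsByDoc = new Map(); // document id -> rows emitted for that document
    this.mapCalls = 0;
  }

  // Called for each created or updated document.
  update(id, doc) {
    const emitted = [];
    this.mapFn(doc, { id }, (key, value) => emitted.push({ key, value, id }));
    this.mapCalls++;
    this.rowsByDoc.set(id, emitted); // replaces rows from the previous version
  }

  // Called for each deleted document.
  remove(id) {
    this.rowsByDoc.delete(id);
  }

  // The index contents, ordered by key.
  rows() {
    return [...this.rowsByDoc.values()]
      .flat()
      .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }
}

const byName = new IncrementalIndex((doc, meta, emit) => {
  if (doc.type === "beer") emit(doc.name, doc.abv);
});

byName.update("b1", { type: "beer", name: "Aventinus", abv: 8.2 });
byName.update("b2", { type: "beer", name: "Avenue Ale", abv: 4.1 });
byName.update("b2", { type: "beer", name: "Avenue Ale", abv: 4.3 }); // only b2 is re-mapped
console.log(byName.rows(), byName.mapCalls);
```

Three map calls cover two inserts and one update; a full rebuild after the update would have needed to map every document again.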
  • 31. Dynamic Range Queries with Optional Aggregation • Efficiently fetch a row or a group of related rows • Queries use cached values from B-tree inner nodes when possible • Take advantage of in-order tree traversal with group_level queries • Example: ?startkey="J"&endkey="K" returns {"rows":[{"key":"Juneau","value":null}]} [Diagram: active and replica docs on Server 1, Server 2 and Server 3]
  • 32. Append Only Index • Disk activity is slow • Updating disk blocks is very slow • Appending new data to the end of the current file is fast • Overhead of reverse reading is small • Because existing blocks are not re-used, this can lead to fragmentation - Couchbase will compact the index automatically [Diagram: changed documents pass through the view processor and are appended on disk after the original data]
  • 33. Adding a new Document [B-tree diagram] • Leaf keys: A B C D F G H I K L N O Q R • Before: leaf ranges A-C 3, D-F 2, G-H 2, I-L 3, N-R 4; inner nodes A-H 7 and I-R 7; root A-R 14 • After adding the new key M: new reductions M-R 5 and I-R 8, new root A-R 15
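The tree on this slide can be reproduced with a small sketch in which every node caches the count of the keys below it and an insert copies only the nodes on the path from the leaf to the root, leaving the old nodes untouched (as an append-only file would). The node shape and helper names are assumptions, and node splitting and rebalancing are deliberately left out.

```javascript
// Leaf: { keys, count }. Inner node: { children, count }.
// `count` is the cached reduction (_count) of everything below the node.
function leaf(keys) {
  return { keys, count: keys.length };
}
function inner(children) {
  return { children, count: children.reduce((sum, c) => sum + c.count, 0) };
}
function firstKey(node) {
  return node.keys ? node.keys[0] : firstKey(node.children[0]);
}
function lastKey(node) {
  return node.keys
    ? node.keys[node.keys.length - 1]
    : lastKey(node.children[node.children.length - 1]);
}

// Returns a NEW root. Only nodes on the insertion path are re-created;
// all other subtrees are shared with the old tree. No splitting is done.
function insert(node, key) {
  if (node.keys) {
    return leaf([...node.keys, key].sort());
  }
  let target = node.children.findIndex((child) => lastKey(child) >= key);
  if (target === -1) target = node.children.length - 1;
  const children = node.children.slice();
  children[target] = insert(children[target], key);
  return inner(children);
}

function label(node) {
  return `${firstKey(node)}-${lastKey(node)} ${node.count}`;
}

const oldRoot = inner([
  inner([leaf(["A", "B", "C"]), leaf(["D", "F"]), leaf(["G", "H"])]),
  inner([leaf(["I", "K", "L"]), leaf(["N", "O", "Q", "R"])]),
]);
const newRoot = insert(oldRoot, "M");

console.log(label(oldRoot), "->", label(newRoot));
```

Because counts are cached in inner nodes, a count over a whole subtree can be answered without visiting its leaves, and the old root remains a consistent snapshot until compaction discards it.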
  • 34. What about Reduce? • Out of the box functions: - _count() - _sum() - _stats() • Create your own if needed: function(key, values, rereduce) { if (rereduce) { var result = 0; for (var i = 0; i < values.length; i++) { result += values[i]; } return result; } else { return values.length; } }
  • 35. Reduce Function • Takes keys and arrays of values as parameters • Written in JavaScript • Called after the map function • Used to reduce the result of a map to single values • Used with grouping • Can be ignored when querying - reuse the index
  • 36. Reduce in Action • Map() result (Key -> Value): Belgian-Style Dubbel -> 1; Belgian-Style Dubbel -> 1; Belgian-Style Dubbel -> 1; Belgian-Style Pale Ale -> 1; Belgian-Style White -> 1; Belgian-Style White -> 1; ... • Reduce(): _count() • Result (Key -> Value): Belgian-Style Dubbel -> 3; Belgian-Style Pale Ale -> 1; Belgian-Style White -> 2
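The grouped count above, and the role of the `rereduce` flag, can be tried out with a few lines of plain JavaScript. `countReduce` mirrors the custom reduce shown on the slides; `groupedReduce` is a hypothetical stand-in for what the view engine does with grouping enabled, and passing the plain key as the first argument is a simplification.

```javascript
// Counts rows on the first pass and sums partial counts on rereduce.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    return values.reduce((total, n) => total + n, 0);
  }
  return values.length;
}

const mapResult = [
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Dubbel", 1],
  ["Belgian-Style Pale Ale", 1],
  ["Belgian-Style White", 1],
  ["Belgian-Style White", 1],
];

// Grouping: one reduce call per distinct key.
function groupedReduce(rows, reduceFn) {
  const groups = new Map();
  for (const [key, value] of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups].map(([key, values]) => ({
    key,
    value: reduceFn(key, values, false),
  }));
}

const grouped = groupedReduce(mapResult, countReduce);

// Rereduce: reduce two partitions separately, then combine the partials.
const partA = countReduce(null, mapResult.slice(0, 4).map(([, v]) => v), false);
const partB = countReduce(null, mapResult.slice(4).map(([, v]) => v), false);
const total = countReduce(null, [partA, partB], true);

console.log(grouped, total);
```

The rereduce branch is what lets partial results from different parts of the index be combined: summing the partial counts gives the same answer as counting all rows at once, whereas counting the partials would not.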
  • 37. How to use it? • Use the client SDK to call the view: View view = client.getView("beer", "by_name"); Query query = new Query(); query.setIncludeDocs(true).setLimit(20).setRangeStart(ComplexKey.of(startKey)).setRangeEnd(ComplexKey.of(startKey + "\uefff")); ViewResponse result = client.query(view, query); for (ViewRow row : result) { .... }
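The same prefix query can also be expressed as a request to the view REST endpoint on the Query API port (8092). The sketch below only builds the URL and makes no network call; the host `localhost`, the bucket name `beer-sample` and the helper `buildViewUrl` are assumptions for illustration, while the design document `beer` and view `by_name` come from the SDK call above.

```javascript
// Builds a view query URL of the form
//   http://host:8092/<bucket>/_design/<ddoc>/_view/<view>?<params>
// Host and bucket below are placeholder assumptions.
function buildViewUrl({ host, bucket, designDoc, view, params }) {
  // Key parameters are sent as JSON values, so strings keep their quotes.
  const jsonParams = ["startkey", "endkey", "key", "keys"];
  const query = Object.entries(params)
    .map(([name, value]) => {
      const encoded = jsonParams.includes(name) ? JSON.stringify(value) : String(value);
      return `${name}=${encodeURIComponent(encoded)}`;
    })
    .join("&");
  return `http://${host}:8092/${bucket}/_design/${designDoc}/_view/${view}?${query}`;
}

const startKey = "A";
const url = buildViewUrl({
  host: "localhost",
  bucket: "beer-sample",
  designDoc: "beer",
  view: "by_name",
  // "\uefff" sorts after ordinary characters, so the range covers
  // every key that starts with startKey.
  params: { startkey: startKey, endkey: startKey + "\uefff", limit: 20 },
});
console.log(url);
```

Appending a high code point to the start key is a common way to turn a range query into a "starts with" query, which is what the Java snippet does with `setRangeStart` and `setRangeEnd`.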
  • 39. Hadoop ≠ Couchbase • Hadoop: - Deals with "Big Data" - "More" is better than "Faster" - Batch oriented - Usually used to "extract/transform" data - Fully distributed: Map, Shuffle, Reduce • Couchbase: - Distributed - Executed where the document is - Deals with "indexing" data - As fast as possible - Used to query the data in the database
  • 40. Map Reduce in Couchbase • Like in many other NoSQL databases: used for queries! • Indexes are distributed on each node of the cluster • Indexes are updated incrementally • Write your Map Reduce in JavaScript
  • 41. Thank you! tug@couchbase.com @tgrall Get Couchbase Server at http://www.couchbase.com/download