Mongo-Hadoop Integration
Justin Lee
Software Engineer @ MongoDB
We will cover:
•what it is
•how it works
•a tour of what it can do
A quick briefing on what Mongo
and Hadoop are all about:
(Q+A at the end)
document-oriented database with
dynamic schema
stores data in JSON-like documents:
{
  _id : "kosmo kramer",
  age : 42,
  location : {
    state : "NY",
    zip : "10024"
  },
  favorite_colors : ["red", "green"]
}
each document can have a different structure
values can be simple (strings, ints) or nested documents
mongodb scales horizontally via
sharding to handle lots of data and load
Java-based framework for MapReduce
Excels at batch processing on large data sets
by taking advantage of parallelism
MapReduce was created by Google (described in a white paper)
and implemented in open source by Hadoop
Mongo-Hadoop Connector - Why
Lots of people using Hadoop and Mongo
separately but need integration
Custom import/export scripts often
used to get data in+out
Scalability and flexibility with changes in
Hadoop or MongoDB configurations
Need to process data across multiple sources
custom scripts slow, fragile
Mongo-Hadoop Connector
Turn MongoDB into a Hadoop-enabled filesystem:
use as the input or output for Hadoop
input data (MongoDB collection or .BSON file) → Hadoop Cluster → output results (MongoDB collection or .BSON file)
BSON file support is new in 1.1; .bson files are the output of mongodump
Mongo-Hadoop Connector
Benefits + Features
Takes advantage of full multi-core
parallelism to process data in Mongo
Full integration with Hadoop and JVM ecosystems
Can be used with Amazon Elastic MapReduce
Can read and write backup files from local
filesystem, HDFS, or S3
Mongo-Hadoop Connector
Vanilla Java MapReduce
write MapReduce code in
ruby
or if you don’t want to use Java,
support for Hadoop Streaming.
Benefits + Features
can write your own language binding
Mongo-Hadoop Connector
Support for Pig
high-level scripting language for data analysis and
building MapReduce workflows
Support for Hive
SQL-like language for ad-hoc queries + analysis of data sets on
Hadoop-compatible file systems
Benefits + Features
Mongo-Hadoop Connector
How it works:
Adapter examines the MongoDB input collection and
calculates a set of splits from the data
Each split gets assigned to a node in Hadoop cluster
In parallel, Hadoop nodes pull data for splits from
MongoDB (or BSON) and process them locally
Hadoop merges results and streams output back to
MongoDB or BSON
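For example, a job driver might wire the connector up roughly like this (a minimal sketch using the standard Hadoop Job API; the driver, mapper, reducer, and key class names are illustrative, and the mongo.* properties are the same ones shown on the configuration slides later):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.BSONWritable;

public class EnronJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the connector at the input and output collections.
        conf.set("mongo.input.uri", "mongodb://my-db:27017/enron.messages");
        conf.set("mongo.output.uri", "mongodb://my-db:27017/enron.results_out");

        Job job = Job.getInstance(conf, "enron sender/recipient counts");
        job.setJarByClass(EnronJobDriver.class);

        // The connector computes the input splits and writes results back to MongoDB.
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);

        job.setMapperClass(EnronMapper.class);     // Mapper shown in Example 1 below
        job.setReducerClass(EnronReducer.class);   // Reducer shown in Example 1 below
        job.setMapOutputKeyClass(MailPair.class);  // custom {from, to} key
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(BSONWritable.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}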
Tour of Mongo-Hadoop, by Example
- Using Java MapReduce with Mongo-Hadoop
- Using Hadoop Streaming
- Pig and Hive with Mongo-Hadoop
- Elastic MapReduce + BSON
{
"_id" : ObjectId("4f2ad4c4d1e2d3f15a000000"),
"body" : "Here is our forecastnn ",
"filename" : "1.",
"headers" : {
"From" : "phillip.allen@enron.com",
"Subject" : "Forecast Info",
"X-bcc" : "",
"To" : "tim.belden@enron.com",
"X-Origin" : "Allen-P",
"X-From" : "Phillip K Allen",
"Date" : "Mon, 14 May 2001 16:39:00 -0700 (PDT)",
"X-To" : "Tim Belden ",
"Message-ID" : "<18782981.1075855378110.JavaMail.evans@thyme>",
"Content-Type" : "text/plain; charset=us-ascii",
"Mime-Version" : "1.0"
}
}
Input Data: Enron e-mail corpus (501k records, 1.75Gb)
each document is one email (headers.From is the sender, headers.To lists the recipients)
{"_id": {"t":"bob@enron.com", "f":"alice@enron.com"}, "count" : 14}
{"_id": {"t":"bob@enron.com", "f":"eve@enron.com"}, "count" : 9}
{"_id": {"t":"alice@enron.com", "f":"charlie@enron.com"}, "count" : 99}
{"_id": {"t":"charlie@enron.com", "f":"bob@enron.com"}, "count" : 48}
{"_id": {"t":"eve@enron.com", "f":"charlie@enron.com"}, "count" : 20}
Let’s use Hadoop to build a graph of (senders → recipients)
and the count of messages exchanged between each pair
sample, simplified data: nodes are people (bob, alice, eve, charlie); edges/arrows show the # of msgs from A to B (14, 99, 9, 48, 20)
Example 1 - Java MapReduce
mongodb document passed into
Hadoop MapReduce
Map phase - each input doc gets
passed through a Mapper function
@Override
public void map(NullWritable key, BSONObject val, final Context context){
    BSONObject headers = (BSONObject)val.get("headers");
    if(headers.containsKey("From") && headers.containsKey("To")){
        String from = (String)headers.get("From");
        String to = (String)headers.get("To");
        String[] recips = to.split(",");
        for(int i = 0; i < recips.length; i++){
            String recip = recips[i].trim();
            context.write(new MailPair(from, recip), new IntWritable(1));
        }
    }
}
input value doc from mongo. connector will handle translation into
BSONObject for you
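The MailPair key used above isn't defined on the slides; a minimal sketch of what such a class might look like, assuming the usual Hadoop WritableComparable contract:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key: the (sender, recipient) pair.
public class MailPair implements WritableComparable<MailPair> {
    String from;
    String to;

    public MailPair() { }  // no-arg constructor required for Hadoop deserialization

    public MailPair(String from, String to) {
        this.from = from;
        this.to = to;
    }

    public void readFields(DataInput in) throws IOException {
        from = in.readUTF();
        to = in.readUTF();
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(from);
        out.writeUTF(to);
    }

    public int compareTo(MailPair o) {
        int cmp = from.compareTo(o.from);
        return cmp != 0 ? cmp : to.compareTo(o.to);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MailPair && compareTo((MailPair) o) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * from.hashCode() + to.hashCode();
    }
}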
Example 1 - Java MapReduce (cont)
Reduce phase - outputs of Map are grouped together by key and passed to the Reducer:
the {to, from} key, and the list of all the values collected under that key

public void reduce( final MailPair pKey,
                    final Iterable<IntWritable> pValues,
                    final Context pContext ){
    int sum = 0;
    for ( final IntWritable value : pValues ){
        sum += value.get();
    }
    BSONObject outDoc = new BasicDBObjectBuilder().start()
                            .add( "f" , pKey.from )
                            .add( "t" , pKey.to )
                            .get();
    BSONWritable pkeyOut = new BSONWritable(outDoc);
    pContext.write( pkeyOut, new IntWritable(sum) );
}

output written back to MongoDB
Example 1 - Java MapReduce (cont)
mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
mongo.input.uri=mongodb://my-db:27017/enron.messages
Read from MongoDB
Read from BSON
mongo.job.input.format=com.mongodb.hadoop.BSONFileInputFormat
mapred.input.dir=file:///tmp/messages.bson
hdfs:///tmp/messages.bson
s3:///tmp/messages.bson
Example 1 - Java MapReduce (cont)
mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
mongo.output.uri=mongodb://my-db:27017/enron.results_out
Write output to MongoDB
Write output to BSON
mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
mapred.output.dir=file:///tmp/results.bson
hdfs:///tmp/results.bson
s3:///tmp/results.bson
Results : Output Data
mongos> db.results_out.find({"_id.t": /^kenneth.lay/})
{ "_id" : { "t" : "kenneth.lay@enron.com",
"f" : "15126-1267@m2.innovyx.com" }, "count" : 1 }
{ "_id" : { "t" : "kenneth.lay@enron.com",
"f" : "2586207@www4.imakenews.com" }, "count" : 1 }
{ "_id" : { "t" : "kenneth.lay@enron.com",
"f" : "40enron@enron.com" }, "count" : 2 }
{ "_id" : { "t" : "kenneth.lay@enron.com",
"f" : "a..davis@enron.com" }, "count" : 2 }
{ "_id" : { "t" : "kenneth.lay@enron.com",
"f" : "a..hughes@enron.com" }, "count" : 4 }
{ "_id" : { "t" : "kenneth.lay@enron.com",
"f" : "a..lindholm@enron.com" }, "count" : 1 }
{ "_id" : { "t" : "kenneth.lay@enron.com",
"f" : "a..schroeder@enron.com" }, "count" : 1 }
...
has more
Example 2 - Hadoop Streaming
Let’s do the same Enron MapReduce job with
Python instead of Java
$ pip install pymongo_hadoop
Example 2 - Hadoop Streaming (cont)
Hadoop passes data to an external process
via STDOUT/STDIN
Hadoop (JVM) → STDIN → Python / Ruby / JS interpreter → STDOUT → Hadoop (JVM); each map(k, v) call flows through the external process
def mapper(documents):
. . .
Example 2 - Hadoop Streaming (cont)
from pymongo_hadoop import BSONMapper
def mapper(documents):
i = 0
for doc in documents:
i = i + 1
from_field = doc['headers']['From']
to_field = doc['headers']['To']
recips = [x.strip() for x in to_field.split(',')]
for r in recips:
yield {'_id': {'f':from_field, 't':r}, 'count': 1}
BSONMapper(mapper)
print >> sys.stderr, "Done Mapping."
BSONMapper is the pymongo_hadoop layer that translates BSON between Hadoop Streaming and your Python code
Example 2 - Hadoop Streaming (cont)
from pymongo_hadoop import BSONReducer
def reducer(key, values):
print >> sys.stderr, "Processing from/to %s" % str(key)
_count = 0
for v in values:
_count += v['count']
return {'_id': key, 'count': _count}
BSONReducer(reducer)
Surviving Hadoop:
making MapReduce easier
Pig + Hive
writing m/r jobs from scratch can be clunky and cumbersome
Example 3 - Mongo-Hadoop and Pig
Let’s do the same thing yet again, but this
time using Pig
Pig is a powerful language that can generate
sophisticated MapReduce workflows from simple
scripts
Can perform JOIN, GROUP, and execute
user-defined functions (UDFs)
Example 3 - Mongo-Hadoop and Pig (cont)
Pig directives for loading data:
BSONLoader and MongoLoader
Writing data out
BSONStorage and MongoInsertStorage
data = LOAD 'mongodb://localhost:27017/db.collection'
using com.mongodb.hadoop.pig.MongoLoader;
STORE records INTO 'file:///output.bson'
using com.mongodb.hadoop.pig.BSONStorage;
Pig has its own special datatypes:
Bags, Maps, and Tuples
Mongo-Hadoop Connector intelligently
converts between Pig datatypes and
MongoDB datatypes
Example 3 - Mongo-Hadoop and Pig (cont)
bags -> arrays
maps -> objects
raw = LOAD 'hdfs:///messages.bson'
using com.mongodb.hadoop.pig.BSONLoader('','headers:[]') ;
send_recip = FOREACH raw GENERATE $0#'From' as from, $0#'To' as to;
send_recip_filtered = FILTER send_recip BY to IS NOT NULL;
send_recip_split = FOREACH send_recip_filtered GENERATE
from as from, TRIM(FLATTEN(TOKENIZE(to))) as to;
send_recip_grouped = GROUP send_recip_split BY (from, to);
send_recip_counted = FOREACH send_recip_grouped GENERATE
group, COUNT($1) as count;
STORE send_recip_counted INTO 'file:///enron_results.bson'
using com.mongodb.hadoop.pig.BSONStorage;
Example 3 - Mongo-Hadoop and Pig (cont)
Hive with Mongo-Hadoop
Similar idea to Pig - process your data without
needing to write MapReduce code from
scratch
...but with SQL as the language of choice
Hive with Mongo-Hadoop
Sample Data:
db.users
db.users.find()
{ "_id": 1, "name": "Tom", "age": 28 }
{ "_id": 2, "name": "Alice", "age": 18 }
{ "_id": 3, "name": "Bob", "age": 29 }
{ "_id": 101, "name": "Scott", "age": 10 }
{ "_id": 104, "name": "Jesse", "age": 52 }
{ "_id": 110, "name": "Mike", "age": 32 }
...
first, declare the collection to be accessible in Hive:

CREATE TABLE mongo_users (id int, name string, age int)
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
WITH SERDEPROPERTIES( "mongo.columns.mapping" = "_id,name,age" )
TBLPROPERTIES ( "mongo.uri" = "mongodb://localhost:27017/test.users");
Hive with Mongo-Hadoop
...then you can run SQL on it, like a table:

SELECT name, age FROM mongo_users WHERE id > 100;

you can use GROUP BY:

SELECT age, COUNT(*) FROM mongo_users WHERE id > 100 GROUP BY age;

or JOIN multiple tables/collections together:

SELECT * FROM mongo_users T1
JOIN user_emails T2
ON T1.id = T2.id;
subset of SQL
Write the output of queries back into new tables:
INSERT OVERWRITE TABLE old_users SELECT id,name,age
FROM mongo_users WHERE age > 100 ;
Dropping a table in Hive deletes the underlying collection in MongoDB:

DROP TABLE mongo_users;

Declare the table as EXTERNAL (CREATE EXTERNAL TABLE ...) to keep the underlying collection when the Hive table is dropped.
Usage with Amazon Elastic MapReduce
Run mongo-hadoop jobs without
needing to set up or manage your
own Hadoop cluster.
Pig, Hive, and streaming work on EMR, too!
Logs get captured into S3 files
Usage with Amazon Elastic MapReduce
First, make a “bootstrap” script that
fetches dependencies (mongo-hadoop
jar and java drivers)
#!/bin/sh
wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.12.2/mongo-java-driver-2.12.2.jar
wget -P /home/hadoop/lib https://s3.amazonaws.com/mongo-hadoop-code/mongo-hadoop-core_1.1.2-1.1.0.jar
this will get executed on each node in
the cluster that EMR builds for us.
working on updating hadoop artifacts in maven
Example 4 - Usage with Amazon Elastic MapReduce
Put the bootstrap script, and all your code,
into an S3 bucket where Amazon can see it.
s3cp ./bootstrap.sh s3://$S3_BUCKET/bootstrap.sh
s3mod s3://$S3_BUCKET/bootstrap.sh public-read
s3cp $HERE/../enron/target/enron-example.jar s3://$S3_BUCKET/enron-example.jar
s3mod s3://$S3_BUCKET/enron-example.jar public-read
Example 4 - Usage with Amazon Elastic MapReduce
...then launch the job from the command line, pointing to your S3 locations.
Control the type and number of instances in the cluster with --instance-type and --num-instances.

$ elastic-mapreduce --create --jobflow ENRON000
--instance-type m1.xlarge
--num-instances 5
--bootstrap-action s3://$S3_BUCKET/bootstrap.sh
--log-uri s3://$S3_BUCKET/enron_logs
--jar s3://$S3_BUCKET/enron-example.jar
--arg -D --arg mongo.job.input.format=com.mongodb.hadoop.BSONFileInputFormat
--arg -D --arg mapred.input.dir=s3n://mongo-test-data/messages.bson
--arg -D --arg mapred.output.dir=s3n://$S3_BUCKET/BSON_OUT
--arg -D --arg mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
# (any additional parameters here)
Example 4 - Usage with Amazon Elastic MapReduce
Easy to kick off a Hadoop job, without needing
to manage a Hadoop cluster
Pig, Hive, and streaming work on EMR, too!
Logs get captured into S3 files
Example 5 - New Feature: MongoUpdateWritable
In previous examples, we wrote job output data by inserting into a new collection...
...but we can also modify an existing output collection.
Works by applying MongoDB update modifiers:
$push, $pull, $addToSet, $inc, $set, etc.
Can be used to do incremental MapReduce or
"join" two collections
Example 5 - MongoUpdateWritable
For example,
let’s say we have two collections.
log events:
{
  "_id": ObjectId("51b792d381c3e67b0a18d678"),
  "sensor_id": ObjectId("51b792d381c3e67b0a18d4a1"),
  "value": 3328.5895416489802,
  "timestamp": ISODate("2013-05-18T13:11:38.709-0400"),
  "loc": [-175.13, 51.658]
}

sensors:
{
  "_id": ObjectId("51b792d381c3e67b0a18d0ed"),
  "name": "730LsRkX",
  "type": "pressure",
  "owner": "steve"
}

sensor_id in a log event refers to which sensor logged the event
For each owner, we want to calculate how many events
were recorded for each type of sensor that logged it.
Plain english:
Bob’s sensors for temperature have stored 1300 readings
Bob’s sensors for pressure have stored 400 readings
Alice’s sensors for humidity have stored 600 readings
Alice’s sensors for temperature have stored 700 readings
etc...
We'll do this in two stages.

Stage 1 - MapReduce on the sensors collection: read from MongoDB, insert() new records into a Results collection.
map: for each sensor, emit {key: owner+type, value: _id} (the sensor's owner and type)
reduce: group data from map() under each key, output {key: owner+type, val: [list of _ids]}

After stage one, the output docs look like this (the sensors array is the list of IDs of sensors with this owner and type):
{
  "_id": "alice pressure",
  "sensors": [
    ObjectId("51b792d381c3e67b0a18d475"),
    ObjectId("51b792d381c3e67b0a18d16d"),
    ObjectId("51b792d381c3e67b0a18d2bf"),
    …
  ]
}
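A rough sketch of what the stage-1 Mapper and Reducer could look like (class names and key formatting are assumptions, not code from the slides):

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;
import org.bson.BasicBSONObject;
import org.bson.types.BasicBSONList;
import org.bson.types.ObjectId;
import com.mongodb.hadoop.io.BSONWritable;

// Stage 1 map: key each sensor document by "owner type", value is its _id.
class SensorMapper extends Mapper<Object, BSONObject, Text, Text> {
    @Override
    public void map(Object key, BSONObject val, Context context)
            throws IOException, InterruptedException {
        String group = val.get("owner") + " " + val.get("type");
        context.write(new Text(group), new Text(val.get("_id").toString()));
    }
}

// Stage 1 reduce: collect all sensor _ids under each owner/type group
// and write one result document per group back to MongoDB.
class SensorReducer extends Reducer<Text, Text, NullWritable, BSONWritable> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        BasicBSONList sensors = new BasicBSONList();
        for (Text id : values) {
            sensors.add(new ObjectId(id.toString()));
        }
        BSONObject outDoc = new BasicBSONObject("_id", key.toString())
                                .append("sensors", sensors);
        context.write(NullWritable.get(), new BSONWritable(outDoc));
    }
}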
Now we just need to count the total # of log events recorded for
any sensors that appear in the list for each owner/type group.
Stage 2 - MapReduce on the log events collection: read from MongoDB, update() existing records in the Results collection.
map: for each log event, emit {key: sensor_id, value: 1}
reduce: group data from map() under each key; for each value in that key:
update({sensors: key}, {$inc : {logs_count:1}})
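Inside the stage-2 reducer, the query and update documents for that update() could be built along these lines (a sketch; sensorId and count stand for the reducer's key and summed value, names chosen here for illustration):

// Match result docs whose "sensors" array contains this sensor's _id...
BSONObject query = new BasicBSONObject("sensors", sensorId);
// ...and increment their log counter by the number of events counted for it.
BSONObject update = new BasicBSONObject("$inc", new BasicBSONObject("logs_count", count));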
context.write(null,
    new MongoUpdateWritable(
        query,    // which documents to modify
        update,   // how to modify ($inc)
        true,     // upsert
        false));  // multi
Example - MongoUpdateWritable
Result after stage 2
{
  "_id": "1UoTcvnCTz temp",
  "sensors": [
    ObjectId("51b792d381c3e67b0a18d475"),
    ObjectId("51b792d381c3e67b0a18d16d"),
    ObjectId("51b792d381c3e67b0a18d2bf"),
    …
  ],
  "logs_count": 1050616
}
now populated with correct count
New Features in v1.2 and beyond
Continually improving Hive support
Performance Improvements - Lazy BSON
Support for multi-collection input sources
API for adding
custom splitter implementations
and more
primarily focusing on hive but pig is next
maven central
Recap
Mongo-Hadoop - use Hadoop to do massive computations
on big data sets stored in MongoDB/BSON
Tools and APIs make it easier:
Streaming, Pig, Hive, EMR, etc.
MongoDB becomes a Hadoop-enabled filesystem
Questions?
Examples can be found on github:
https://github.com/mongodb/mongo-hadoop/tree/master/examples
MongoDB World
New York City, June 23-25
Save 25% with 25JustinLee
Register at world.mongodb.com

Weitere ähnliche Inhalte

Was ist angesagt?

MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know Norberto Leite
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework MongoDB
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationMongoDB
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsMongoDB
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...Gianfranco Palumbo
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesMongoDB
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaMongoDB
 
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkMongoDB
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation Amit Ghosh
 
Webinar: Transitioning from SQL to MongoDB
Webinar: Transitioning from SQL to MongoDBWebinar: Transitioning from SQL to MongoDB
Webinar: Transitioning from SQL to MongoDBMongoDB
 
MongoDB - Back to Basics - La tua prima Applicazione
MongoDB - Back to Basics - La tua prima ApplicazioneMongoDB - Back to Basics - La tua prima Applicazione
MongoDB - Back to Basics - La tua prima ApplicazioneMassimo Brignoli
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for AnalyticsMongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBantoinegirbal
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkMongoDB
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
 

Was ist angesagt? (20)

MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB Application
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
 
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
 
Webinar: Transitioning from SQL to MongoDB
Webinar: Transitioning from SQL to MongoDBWebinar: Transitioning from SQL to MongoDB
Webinar: Transitioning from SQL to MongoDB
 
MongoDB - Back to Basics - La tua prima Applicazione
MongoDB - Back to Basics - La tua prima ApplicazioneMongoDB - Back to Basics - La tua prima Applicazione
MongoDB - Back to Basics - La tua prima Applicazione
 
MongoDB for Analytics
MongoDB for AnalyticsMongoDB for Analytics
MongoDB for Analytics
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
 
MongoDB 3.2 - Analytics
MongoDB 3.2  - AnalyticsMongoDB 3.2  - Analytics
MongoDB 3.2 - Analytics
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
Indexing
IndexingIndexing
Indexing
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 

Andere mochten auch

Migration from SQL to MongoDB - A Case Study at TheKnot.com
Migration from SQL to MongoDB - A Case Study at TheKnot.com Migration from SQL to MongoDB - A Case Study at TheKnot.com
Migration from SQL to MongoDB - A Case Study at TheKnot.com MongoDB
 
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionApache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionJoão Gabriel Lima
 
Migrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMigrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMongoDB
 
Transitioning from SQL to MongoDB
Transitioning from SQL to MongoDBTransitioning from SQL to MongoDB
Transitioning from SQL to MongoDBMongoDB
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBMongoDB
 

Andere mochten auch (6)

Migration from SQL to MongoDB - A Case Study at TheKnot.com
Migration from SQL to MongoDB - A Case Study at TheKnot.com Migration from SQL to MongoDB - A Case Study at TheKnot.com
Migration from SQL to MongoDB - A Case Study at TheKnot.com
 
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionApache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
 
Migrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDBMigrating from RDBMS to MongoDB
Migrating from RDBMS to MongoDB
 
Spark and MongoDB
Spark and MongoDBSpark and MongoDB
Spark and MongoDB
 
Transitioning from SQL to MongoDB
Transitioning from SQL to MongoDBTransitioning from SQL to MongoDB
Transitioning from SQL to MongoDB
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 

Ähnlich wie Hadoop - MongoDB Webinar June 2014

Barcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop PresentationBarcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop PresentationNorberto Leite
 
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDaysConexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDaysCAPSiDE
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorHenrik Ingo
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
 
Spray Json and MongoDB Queries: Insights and Simple Tricks.
Spray Json and MongoDB Queries: Insights and Simple Tricks.Spray Json and MongoDB Queries: Insights and Simple Tricks.
Spray Json and MongoDB Queries: Insights and Simple Tricks.Andrii Lashchenko
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongoMichael Bright
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and PythonMike Bright
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkCaserta
 
Hadoop_Pennonsoft
Hadoop_PennonsoftHadoop_Pennonsoft
Hadoop_PennonsoftPennonSoft
 
Spring Data MongoDB 介紹
Spring Data MongoDB 介紹Spring Data MongoDB 介紹
Spring Data MongoDB 介紹Kuo-Chun Su
 
REST Web API with MongoDB
REST Web API with MongoDBREST Web API with MongoDB
REST Web API with MongoDBMongoDB
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Spring Data, Jongo & Co.
Spring Data, Jongo & Co.Spring Data, Jongo & Co.
Spring Data, Jongo & Co.Tobias Trelle
 
D3 Mapping Visualization
D3 Mapping VisualizationD3 Mapping Visualization
D3 Mapping VisualizationSudhir Chowbina
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourPeter Friese
 
Webinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsWebinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsMongoDB
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...Inhacking
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Аліна Шепшелей
 

Ähnlich wie Hadoop - MongoDB Webinar June 2014 (20)

Barcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop PresentationBarcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop Presentation
 
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDaysConexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
Conexión de MongoDB con Hadoop - Luis Alberto Giménez - CAPSiDE #DevOSSAzureDays
 
hadoop
hadoophadoop
hadoop
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop Connector
 
Mongo db dla administratora
Mongo db dla administratoraMongo db dla administratora
Mongo db dla administratora
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
 
Spray Json and MongoDB Queries: Insights and Simple Tricks.
Spray Json and MongoDB Queries: Insights and Simple Tricks.Spray Json and MongoDB Queries: Insights and Simple Tricks.
Spray Json and MongoDB Queries: Insights and Simple Tricks.
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Hadoop_Pennonsoft
Hadoop_PennonsoftHadoop_Pennonsoft
Hadoop_Pennonsoft
 
Spring Data MongoDB 介紹
Spring Data MongoDB 介紹Spring Data MongoDB 介紹
Spring Data MongoDB 介紹
 
REST Web API with MongoDB
REST Web API with MongoDBREST Web API with MongoDB
REST Web API with MongoDB
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Spring Data, Jongo & Co.
Spring Data, Jongo & Co.Spring Data, Jongo & Co.
Spring Data, Jongo & Co.
 
D3 Mapping Visualization
D3 Mapping VisualizationD3 Mapping Visualization
D3 Mapping Visualization
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 Hour
 
Webinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation OptionsWebinar: Data Processing and Aggregation Options
Webinar: Data Processing and Aggregation Options
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
 

Mehr von MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

Mehr von MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Kürzlich hochgeladen

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Kürzlich hochgeladen (20)

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Hadoop - MongoDB Webinar June 2014

  • 2. We will cover: •what it is •how it works •a tour of what it can do A quick briefing on what Mongo and Hadoop are all about: (Q+A at the end)
  • 3. document-oriented database with dynamic schema stores data in JSON-like documents: { _id : “kosmo kramer”, age : 42, location : { state : ”NY”, zip : ”10024” }, favorite_colors : [“red”, “green”] } different structure in each document values can be simple like strings and ints or nested documents
  • 4. mongodb scales horizontally via sharding to handle lots of data and load app
  • 5. Java-based framework for MapReduce Excels at batch processing on large data sets by taking advantage of parallelism map reduce created by google (white paper) implemented in open source by hadoop
  • 6. Mongo-Hadoop Connector - Why Lots of people using Hadoop and Mongo separately but need integration Custom import/export scripts often used to get data in+out Scalability and flexibility with changes in Hadoop or MongoDB configurations Need to process data across multiple sources custom scripts slow, fragile
  • 7. Mongo-Hadoop Connector Turn MongoDB into a Hadoop-enabled filesystem: use as the input or output for Hadoop .BSON -or- input data .BSON -or- Hadoop Cluster output results bson file new in 1.1 bson is the output of mongodump
  • 8. Mongo-Hadoop Connector Benefits + Features Takes advantage of full multi-core parallelism to process data in Mongo Full integration with Hadoop and JVM ecosystems Can be used with Amazon Elastic MapReduce Can read and write backup files from local filesystem, HDFS, or S3
  • 9. Mongo-Hadoop Connector Vanilla Java MapReduce write MapReduce code in ruby or if you don’t want to use Java, support for Hadoop Streaming. Benefits + Features can write your own language binding
  • 10. Mongo-Hadoop Connector Support for Pig high-level scripting language for data analysis and building MapReduce workflows Support for Hive SQL-like language for ad-hoc queries + analysis of data sets on Hadoop-compatible file systems Benefits + Features
  • 11. Mongo-Hadoop Connector How it works: Adapter examines the MongoDB input collection and calculates a set of splits from the data Each split gets assigned to a node in Hadoop cluster In parallel, Hadoop nodes pull data for splits from MongoDB (or BSON) and process them locally Hadoop merges results and streams output back to MongoDB or BSON
  • 12. Tour of Mongo-Hadoop, by Example - Using Java MapReduce with Mongo-Hadoop - Using Hadoop Streaming - Pig and Hive with Mongo-Hadoop - Elastic MapReduce + BSON
  • 13. { "_id" : ObjectId("4f2ad4c4d1e2d3f15a000000"), "body" : "Here is our forecastnn ", "filename" : "1.", "headers" : { "From" : "phillip.allen@enron.com", "Subject" : "Forecast Info", "X-bcc" : "", "To" : "tim.belden@enron.com", "X-Origin" : "Allen-P", "X-From" : "Phillip K Allen", "Date" : "Mon, 14 May 2001 16:39:00 -0700 (PDT)", "X-To" : "Tim Belden ", "Message-ID" : "<18782981.1075855378110.JavaMail.evans@thyme>", "Content-Type" : "text/plain; charset=us-ascii", "Mime-Version" : "1.0" } } Input Data: Enron e-mail corpus (501k records, 1.75Gb) each document is one email sender recipients
  • 14. {"_id": {"t":"bob@enron.com", "f":"alice@enron.com"}, "count" : 14} {"_id": {"t":"bob@enron.com", "f":"eve@enron.com"}, "count" : 9} {"_id": {"t":"alice@enron.com", "f":"charlie@enron.com"}, "count" : 99} {"_id": {"t":"charlie@enron.com", "f":"bob@enron.com"}, "count" : 48} {"_id": {"t":"eve@enron.com", "f":"charlie@enron.com"}, "count" : 20} Let’s use Hadoop to build a graph of (senders → recipients) and the count of messages exchanged between each pair bob alice eve charlie 14 99 9 48 20 sample, simplified data nodes are people. edges/arrows # of msgs from A to B
  • 15. Example 1 - Java MapReduce
Map phase - each input document gets passed through a Mapper function. The input value is a document from MongoDB; the connector handles the translation into a BSONObject for you.

@Override
public void map(NullWritable key, BSONObject val, final Context context)
        throws IOException, InterruptedException {
    BSONObject headers = (BSONObject) val.get("headers");
    if (headers.containsKey("From") && headers.containsKey("To")) {
        String from = (String) headers.get("From");
        String to = (String) headers.get("To");
        String[] recips = to.split(",");
        for (int i = 0; i < recips.length; i++) {
            String recip = recips[i].trim();
            context.write(new MailPair(from, recip), new IntWritable(1));
        }
    }
}
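The Mapper writes a custom composite key, MailPair. As a rough sketch (the actual class ships with the mongo-hadoop examples and may differ in detail), any such key has to implement Hadoop's WritableComparable so it can be serialized in the shuffle and sorted/grouped before the Reducer runs:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Sketch only: the field names mirror the map()/reduce() code on these slides,
// not necessarily the shipped example class.
public class MailPair implements WritableComparable<MailPair> {
    String from;
    String to;

    public MailPair() { }                                   // Hadoop needs a no-arg constructor
    public MailPair(String from, String to) { this.from = from; this.to = to; }

    public void write(DataOutput out) throws IOException {  // serialize for the shuffle
        out.writeUTF(from);
        out.writeUTF(to);
    }

    public void readFields(DataInput in) throws IOException {
        from = in.readUTF();
        to = in.readUTF();
    }

    public int compareTo(MailPair o) {                      // defines sort/grouping order
        int cmp = from.compareTo(o.from);
        return cmp != 0 ? cmp : to.compareTo(o.to);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MailPair
                && from.equals(((MailPair) o).from)
                && to.equals(((MailPair) o).to);
    }

    @Override
    public int hashCode() { return from.hashCode() * 31 + to.hashCode(); }
}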
  • 16. output written back to MongoDB Example 1 - Java MapReduce (cont) Reduce phase - outputs of Map are grouped together by key and passed to Reducer the {to, from} key list of all the values collected under the key        public  void  reduce(  final  MailPair  pKey,                                                final  Iterable<IntWritable>  pValues,                                                final  Context  pContext  ){                int  sum  =  0;                for  (  final  IntWritable  value  :  pValues  ){                        sum  +=  value.get();                }                BSONObject  outDoc  =  new  BasicDBObjectBuilder().start()                                                        .add(  "f"  ,  pKey.from)                            .add(  "t"  ,  pKey.to  )                            .get();                BSONWritable  pkeyOut  =  new  BSONWritable(outDoc);                pContext.write(  pkeyOut,  new  IntWritable(sum)  );        }
  • 17. Example 1 - Java MapReduce (cont)
Read from MongoDB:
mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
mongo.input.uri=mongodb://my-db:27017/enron.messages
Read from BSON:
mongo.job.input.format=com.mongodb.hadoop.BSONFileInputFormat
mapred.input.dir=file:///tmp/messages.bson (or hdfs:///tmp/messages.bson, s3:///tmp/messages.bson)
  • 18. Example 1 - Java MapReduce (cont)
Write output to MongoDB:
mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
mongo.output.uri=mongodb://my-db:27017/enron.results_out
Write output to BSON:
mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
mapred.output.dir=file:///tmp/results.bson (or hdfs:///tmp/results.bson, s3:///tmp/results.bson)
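The same properties can be set from a job driver instead of (or in addition to) -D flags on the command line. A minimal driver sketch follows; the class names EnronDriver, EnronMapper, and EnronReducer are illustrative placeholders for the map()/reduce() code shown in Example 1, not the names used in the shipped example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.BSONWritable;

public class EnronDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // same properties as the two slides above: read enron.messages,
        // write the {from, to, count} results to enron.results_out
        conf.set("mongo.input.uri", "mongodb://my-db:27017/enron.messages");
        conf.set("mongo.output.uri", "mongodb://my-db:27017/enron.results_out");

        Job job = Job.getInstance(conf, "enron message counts");
        job.setJarByClass(EnronDriver.class);
        job.setMapperClass(EnronMapper.class);     // the map() shown in Example 1
        job.setReducerClass(EnronReducer.class);   // the reduce() shown in Example 1
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        job.setMapOutputKeyClass(MailPair.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(BSONWritable.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}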
  • 19. Results : Output Data mongos> db.results_out.find({"_id.t": /^kenneth.lay/}) { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "15126-1267@m2.innovyx.com" }, "count" : 1 } { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "2586207@www4.imakenews.com" }, "count" : 1 } { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "40enron@enron.com" }, "count" : 2 } { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..davis@enron.com" }, "count" : 2 } { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..hughes@enron.com" }, "count" : 4 } { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..lindholm@enron.com" }, "count" : 1 } { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..schroeder@enron.com" }, "count" : 1 } ... has more
  • 20. Example 2 - Hadoop Streaming Let’s do the same Enron MapReduce job with Python instead of Java $ pip install pymongo_hadoop
  • 21. Example 2 - Hadoop Streaming (cont)
(diagram) Hadoop (JVM) passes data to an external process via STDIN/STDOUT: each map(k, v) call is piped from the JVM to a Python / Ruby / JS interpreter, which runs your script's def mapper(documents): ... and writes its results back on STDOUT.
  • 22. Example 2 - Hadoop Streaming (cont)

import sys
from pymongo_hadoop import BSONMapper

def mapper(documents):
    i = 0
    for doc in documents:
        i = i + 1
        from_field = doc['headers']['From']
        to_field = doc['headers']['To']
        recips = [x.strip() for x in to_field.split(',')]
        for r in recips:
            yield {'_id': {'f': from_field, 't': r}, 'count': 1}

BSONMapper(mapper)
print >> sys.stderr, "Done Mapping."

BSONMapper is the pymongo_hadoop layer that handles BSON encoding/decoding between your Python code and Hadoop Streaming's STDIN/STDOUT.
  • 23. Example 2 - Hadoop Streaming (cont)

import sys
from pymongo_hadoop import BSONReducer

def reducer(key, values):
    print >> sys.stderr, "Processing from/to %s" % str(key)
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key, 'count': _count}

BSONReducer(reducer)
  • 24. Surviving Hadoop: making MapReduce easier Pig + Hive writing m/r jobs from scratch can be clunky and cumbersome
  • 25. Example 3 - Mongo-Hadoop and Pig Let’s do the same thing yet again, but this time using Pig Pig is a powerful language that can generate sophisticated MapReduce workflows from simple scripts Can perform JOIN, GROUP, and execute user-defined functions (UDFs)
  • 27. Example 3 - Mongo-Hadoop and Pig (cont)
Pig directives for loading data: BSONLoader and MongoLoader.
data = LOAD 'mongodb://localhost:27017/db.collection'
    using com.mongodb.hadoop.pig.MongoLoader;
Writing data out: BSONStorage and MongoInsertStorage.
STORE records INTO 'file:///output.bson'
    using com.mongodb.hadoop.pig.BSONStorage;
  • 27. Pig has its own special datatypes: Bags, Maps, and Tuples Mongo-Hadoop Connector intelligently converts between Pig datatypes and MongoDB datatypes Example 3 - Mongo-Hadoop and Pig (cont) bags -> arrays maps -> objects
  • 28. Example 3 - Mongo-Hadoop and Pig (cont)

raw = LOAD 'hdfs:///messages.bson'
    using com.mongodb.hadoop.pig.BSONLoader('', 'headers:[]');
send_recip = FOREACH raw GENERATE $0#'From' as from, $0#'To' as to;
send_recip_filtered = FILTER send_recip BY to IS NOT NULL;
send_recip_split = FOREACH send_recip_filtered
    GENERATE from as from, TRIM(FLATTEN(TOKENIZE(to))) as to;
send_recip_grouped = GROUP send_recip_split BY (from, to);
send_recip_counted = FOREACH send_recip_grouped GENERATE group, COUNT($1) as count;
STORE send_recip_counted INTO 'file:///enron_results.bson'
    using com.mongodb.hadoop.pig.BSONStorage;
  • 29. Hive with Mongo-Hadoop Similar idea to Pig - process your data without needing to write MapReduce code from scratch ...but with SQL as the language of choice
  • 30. Hive with Mongo-Hadoop
Sample Data: db.users

db.users.find()
{ "_id": 1, "name": "Tom", "age": 28 }
{ "_id": 2, "name": "Alice", "age": 18 }
{ "_id": 3, "name": "Bob", "age": 29 }
{ "_id": 101, "name": "Scott", "age": 10 }
{ "_id": 104, "name": "Jesse", "age": 52 }
{ "_id": 110, "name": "Mike", "age": 32 }
...

First, declare the collection to be accessible in Hive:

CREATE TABLE mongo_users (id int, name string, age int)
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
WITH SERDEPROPERTIES( "mongo.columns.mapping" = "_id,name,age" )
TBLPROPERTIES ( "mongo.uri" = "mongodb://localhost:27017/test.users");
  • 31. Hive with Mongo-Hadoop
...then you can run SQL on it, like a table (Hive supports a subset of SQL):

SELECT name, age FROM mongo_users WHERE id > 100;

You can use GROUP BY:

SELECT age, COUNT(*) FROM mongo_users WHERE id > 100 GROUP BY age;

or JOIN multiple tables/collections together:

SELECT * FROM mongo_users T1 JOIN user_emails T2 ON (T1.id = T2.id);
  • 32. Write the output of queries back into new tables:

INSERT OVERWRITE TABLE old_users
SELECT id, name, age FROM mongo_users WHERE age > 100;

Dropping a table in Hive also deletes the underlying collection in MongoDB:

DROP TABLE mongo_users;

Declare the table as EXTERNAL if you want DROP TABLE to leave the MongoDB collection in place.
  • 33. Usage with Amazon Elastic MapReduce Run mongo-hadoop jobs without needing to set up or manage your own Hadoop cluster. Pig, Hive, and streaming work on EMR, too! Logs get captured into S3 files
  • 34. Usage with Amazon Elastic MapReduce
First, make a "bootstrap" script that fetches dependencies (the mongo-hadoop jar and the Java driver). It will get executed on each node in the cluster that EMR builds for us. (The Hadoop artifacts in Maven are being updated.)

#!/bin/sh
wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.12.2/mongo-java-driver-2.12.2.jar
wget -P /home/hadoop/lib https://s3.amazonaws.com/mongo-hadoop-code/mongo-hadoop-core_1.1.2-1.1.0.jar
  • 35. Example 4 - Usage with Amazon Elastic MapReduce
Put the bootstrap script, and all your code, into an S3 bucket where Amazon can see it.

s3cp ./bootstrap.sh s3://$S3_BUCKET/bootstrap.sh
s3mod s3://$S3_BUCKET/bootstrap.sh public-read
s3cp $HERE/../enron/target/enron-example.jar s3://$S3_BUCKET/enron-example.jar
s3mod s3://$S3_BUCKET/enron-example.jar public-read
  • 36. Example 4 - Usage with Amazon Elastic MapReduce
...then launch the job from the command line, pointing to your S3 locations. Control the type and number of instances in the cluster:

$ elastic-mapreduce --create --jobflow ENRON000 \
    --instance-type m1.xlarge \
    --num-instances 5 \
    --bootstrap-action s3://$S3_BUCKET/bootstrap.sh \
    --log-uri s3://$S3_BUCKET/enron_logs \
    --jar s3://$S3_BUCKET/enron-example.jar \
    --arg -D --arg mongo.job.input.format=com.mongodb.hadoop.BSONFileInputFormat \
    --arg -D --arg mapred.input.dir=s3n://mongo-test-data/messages.bson \
    --arg -D --arg mapred.output.dir=s3n://$S3_BUCKET/BSON_OUT \
    --arg -D --arg mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
    # (any additional parameters here)
  • 37. Example 4 - Usage with Amazon Elastic MapReduce Easy to kick off a Hadoop job, without needing to manage a Hadoop cluster Pig, Hive, and streaming work on EMR, too! Logs get captured into S3 files
  • 38. Example 5 - New Feature: MongoUpdateWritable
In previous examples, we wrote job output data by inserting into a new collection... but we can also modify an existing output collection. Works by applying MongoDB update modifiers: $push, $pull, $addToSet, $inc, $set, etc. Can be used to do incremental MapReduce or to "join" two collections.
  • 39. Example 5 - MongoUpdateWritable
For example, let's say we have two collections: log events and sensors. Each log event refers (via sensor_id) to the sensor that logged it.

A log event:
{
    "_id": ObjectId("51b792d381c3e67b0a18d678"),
    "sensor_id": ObjectId("51b792d381c3e67b0a18d4a1"),
    "value": 3328.5895416489802,
    "timestamp": ISODate("2013-05-18T13:11:38.709-0400"),
    "loc": [-175.13, 51.658]
}

A sensor:
{
    "_id": ObjectId("51b792d381c3e67b0a18d0ed"),
    "name": "730LsRkX",
    "type": "pressure",
    "owner": "steve"
}

For each owner, we want to calculate how many events were recorded for each type of sensor that logged it.
  • 40. For each owner, we want to calculate how many events were recorded for each type of sensor that logged it. In plain English: Bob’s sensors for temperature have stored 1300 readings; Bob’s sensors for pressure have stored 400 readings; Alice’s sensors for humidity have stored 600 readings; Alice’s sensors for temperature have stored 700 readings; etc.
  • 41. Stage 1 - MapReduce on the sensors collection (we do this in two stages)
(diagram) Read the sensors collection from MongoDB. For each sensor, map() emits {key: owner+type, value: _id}. The data from map() is grouped under each key, and the output {key: owner+type, val: [list of _ids]} is insert()ed as new records into a results collection in MongoDB.
  • 42. After stage one, the output docs look like this - the _id is the sensor’s owner and type, and "sensors" is the list of IDs of sensors with that owner and type:

{
    "_id": "alice pressure",
    "sensors": [
        ObjectId("51b792d381c3e67b0a18d475"),
        ObjectId("51b792d381c3e67b0a18d16d"),
        ObjectId("51b792d381c3e67b0a18d2bf"),
        ...
    ]
}

Now we just need to count the total # of log events recorded for any sensors that appear in the list for each owner/type group.
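A rough sketch of what the stage-1 map/reduce could look like, producing result documents shaped like the one above. This is illustrative only - the class names and field handling are assumptions, not the verbatim example code:

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;
import org.bson.BasicBSONObject;
import org.bson.types.BasicBSONList;
import org.bson.types.ObjectId;
import com.mongodb.hadoop.io.BSONWritable;

// Stage 1 mapper: key each sensor by "owner type"; the value is the sensor's _id.
class SensorGroupMapper extends Mapper<Object, BSONObject, Text, Text> {
    @Override
    protected void map(Object key, BSONObject sensor, Context context)
            throws IOException, InterruptedException {
        String group = sensor.get("owner") + " " + sensor.get("type");
        context.write(new Text(group), new Text(sensor.get("_id").toString()));
    }
}

// Stage 1 reducer: collect the sensor _ids for each owner/type group into one document.
class SensorGroupReducer extends Reducer<Text, Text, NullWritable, BSONWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> ids, Context context)
            throws IOException, InterruptedException {
        BasicBSONList sensors = new BasicBSONList();
        for (Text id : ids) {
            sensors.add(new ObjectId(id.toString()));
        }
        BSONObject doc = new BasicBSONObject("_id", key.toString())
                .append("sensors", sensors);
        context.write(NullWritable.get(), new BSONWritable(doc));
    }
}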
  • 43. Stage 2 - MapReduce on the log events collection
(diagram) Read the log events collection from MongoDB. For each log event, map() emits {key: sensor_id, value: 1}. The data from map() is grouped under each key, and for each value under that key we update() the existing records in the results collection: update({sensors: key}, {$inc: {logs_count: 1}}).

context.write(null,
    new MongoUpdateWritable(
        query,   // which documents to modify
        update,  // how to modify ($inc)
        true,    // upsert
        false)); // multi
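Filling that snippet out a little - a minimal sketch of the stage-2 reduce(), assuming the mapper emits (sensor_id, 1) as described above. Summing first and applying a single $inc of the total is equivalent to incrementing once per value. The exact BSON types for query/update are assumptions; the MongoUpdateWritable argument order mirrors the snippet above. (Text, IntWritable, BasicBSONObject, ObjectId, and MongoUpdateWritable come from org.apache.hadoop.io, org.bson, and com.mongodb.hadoop.io respectively.)

@Override
public void reduce(Text sensorId, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable c : counts) {   // total events seen for this sensor
        sum += c.get();
    }
    // which documents to modify: any result doc whose "sensors" array contains this id
    BasicBSONObject query = new BasicBSONObject("sensors", new ObjectId(sensorId.toString()));
    // how to modify: bump logs_count by the number of events logged by this sensor
    BasicBSONObject update = new BasicBSONObject("$inc", new BasicBSONObject("logs_count", sum));
    context.write(null, new MongoUpdateWritable(query, update, true, false));
}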
  • 44. Example - MongoUpdateWritable
Result after stage 2 - now populated with the correct count:

{
    "_id": "1UoTcvnCTz temp",
    "sensors": [
        ObjectId("51b792d381c3e67b0a18d475"),
        ObjectId("51b792d381c3e67b0a18d16d"),
        ObjectId("51b792d381c3e67b0a18d2bf"),
        ...
    ],
    "logs_count": 1050616
}
  • 45. New Features in v1.2 and beyond: continually improving Hive support (Hive is the primary focus, Pig is next); performance improvements (lazy BSON); support for multi-collection input sources; an API for adding custom splitter implementations; artifacts published to Maven Central; and more.
  • 46. Recap Mongo-Hadoop - use Hadoop to do massive computations on big data sets stored in MongoDB/BSON Tools and APIs make it easier: Streaming, Pig, Hive, EMR, etc. MongoDB becomes a Hadoop-enabled filesystem
  • 48. MongoDB World New York City, June 23-25 Save 25% with discount code 25JustinLee Register at world.mongodb.com