Codemotion Milano 2013

Data Processing and
Aggregation
Massimo Brignoli
Solutions Architect, MongoDB Inc.

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Who Am I?
• Solutions Architect/Evangelist at MongoDB Inc.
• 20 years of experience in databases
• Former MySQL employee
• Previous life: web, web, web
Big Data

What is Big Data?
• Big Data is like teenage sex:
• everyone talks about it
• nobody really knows how to do it

• everyone thinks everyone else is doing it
• so everyone claims they are doing it…

Understanding Big Data – It’s Not Very “Big”

64% - Ingest diverse, new data in real-time
15% - More than 100TB of data
20% - Less than 100TB (average of all? <20TB)
From Big Data Executive Summary – 50+ top executives from Government and F500 firms
For over a decade

Big Data == Custom Software
Lots of Great Innovations Since 1970
Including the Relational Database
RDBMS Makes Development Hard

[Diagram labels: Code, XML Config, DB Schema, Application, Object Relational Mapping, Relational Database]
And Even Harder To Iterate
[Diagram labels: New Table, New Column, New Table, Name, Pet, Phone, New Column, 3 months later…, Email]
From Complexity to Simplicity
MongoDB

RDBMS

{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits : [
    { type : "Health", plan : "PPO Plus" },
    { type : "Dental", plan : "Standard" }
  ]
}
In the past few years, open source software has emerged, enabling the rest of us to handle Big Data
Use Popular, Well-Known Technologies

Source: Silicon Angle, 2012
Enterprise Big Data Stack

[Diagram: layered stack]
Applications: CRM, ERP, Collaboration, Mobile, BI
Data Management
  Online Data: RDBMS, RDBMS
  Offline Data: Hadoop, EDW
Infrastructure: OS & Virtualization, Compute, Storage, Network
Security & Auditing
Management & Monitoring
Consideration – Online vs. Offline
Online

• Real-time
• Low-latency
• High availability

vs.

Offline

• Long-running
• High-Latency
• Availability is lower priority
How MongoDB Meets Our Requirements
• MongoDB is an operational database
• MongoDB provides high performance for storage and retrieval at large scale
• MongoDB has a robust query interface permitting intelligent operations
• MongoDB is not a data processing engine, but provides processing functionality
MongoDB data processing options
http://www.flickr.com/photos/torek/4444673930/
Getting Example Data

The “hello world” of MapReduce is counting words in a paragraph of text.
Let’s try something a little more interesting…
What is the most popular pub name?
Open Street Map Data
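#!/usr/bin/env python
# Data Source
# http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]
import re
import sys
from imposm.parser import OSMParser
import pymongo

# Assumption: the slide does not show how these two names are set up.
# The database and collection follow the demo.pubs URI used later in the deck.
collection = pymongo.MongoClient().demo.pubs
node_points = {}  # coordinates of unnamed nodes, keyed by OSM id


class Handler(object):
    def nodes(self, nodes):
        if not nodes:
            return
        docs = []
        for node in nodes:
            osm_id, doc, (lon, lat) = node
            if "name" not in doc:
                node_points[osm_id] = (lon, lat)
                continue
            # Normalise the name: title-case it, drop a leading "The ", use "&" for "And".
            # str.lstrip("The ") strips characters rather than a prefix, so slice instead.
            name = doc["name"].title()
            if name.startswith("The "):
                name = name[len("The "):]
            doc["name"] = name.replace("And", "&")
            doc["_id"] = osm_id
            doc["location"] = {"type": "Point", "coordinates": [lon, lat]}
            docs.append(doc)
        if docs:  # insert_many rejects an empty list
            collection.insert_many(docs)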
Example Pub Data
{
  "_id" : 451152,
  "amenity" : "pub",
  "name" : "The Dignity",
  "addr:housenumber" : "363",
  "addr:street" : "Regents Park Road",
  "addr:city" : "London",
  "addr:postcode" : "N3 1DH",
  "toilets" : "yes",
  "toilets:access" : "customers",
  "location" : {
    "type" : "Point",
    "coordinates" : [-0.1945732, 51.6008172]
  }
}
MapReduce

MongoDB MapReduce
[Diagram: MongoDB with map, reduce and finalize stages]
Map Function
> var map = function() {
    emit(this.name, 1);
  }
Reduce Function
> var reduce = function (key, values) {
    var sum = 0;
    values.forEach( function (val) {sum += val;} );
    return sum;
  }
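As a rough illustration of what the map and reduce functions compute, the sketch below emulates the two steps in plain Python. It does not talk to MongoDB; the helper names (`map_pub`, `reduce_counts`, `map_reduce`) and the sample documents are hypothetical. Unlike MongoDB, this sketch also calls the reducer for keys that were emitted only once, which gives the same totals here.

```python
from collections import defaultdict


def map_pub(doc):
    """Mirror of the map function: emit (name, 1) for one pub document."""
    yield doc["name"], 1


def reduce_counts(key, values):
    """Mirror of the reduce function: sum the emitted values for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Group the emitted values by key, then reduce each group."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    return {key: reduce_fn(key, values) for key, values in emitted.items()}


# Hypothetical sample documents, not taken from the real data set.
sample_pubs = [
    {"name": "The Red Lion"},
    {"name": "The Crown"},
    {"name": "The Red Lion"},
]
pub_names = map_reduce(sample_pubs, map_pub, reduce_counts)
print(pub_names)
```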
Results
> db.pub_names.find().sort({value: -1}).limit(10)
{ "_id" : "The Red Lion", "value" : 407 }
{ "_id" : "The Royal Oak", "value" : 328 }
{ "_id" : "The Crown", "value" : 242 }
{ "_id" : "The White Hart", "value" : 214 }
{ "_id" : "The White Horse", "value" : 200 }
{ "_id" : "The New Inn", "value" : 187 }
{ "_id" : "The Plough", "value" : 185 }
{ "_id" : "The Rose & Crown", "value" : 164 }
{ "_id" : "The Wheatsheaf", "value" : 147 }
{ "_id" : "The Swan", "value" : 140 }

Pub Names in the Center of London
> db.pubs.mapReduce(map, reduce, { out: "pub_names",
    query: {
      location: {
        $within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] }
      }}
  })

{
  "result" : "pub_names",
  "timeMillis" : 116,
  "counts" : {
    "input" : 643,
    "emit" : 643,
    "reduce" : 54,
    "output" : 537
  },
  "ok" : 1
}
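The radius in the query, 2 / 3959, is a distance of 2 miles divided by the Earth's radius in miles, which gives an angle in radians. The sketch below shows, in plain Python, the kind of test such a spherical circle implies, using the haversine formula. It is only an approximation of the idea, not MongoDB's implementation, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # the value used in the query above


def central_angle(lon1, lat1, lon2, lat2):
    """Great-circle angle in radians between two lon/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


def within_center_sphere(point, center, radius_radians):
    """True if point ([lon, lat]) lies within radius_radians of center."""
    return central_angle(point[0], point[1], center[0], center[1]) <= radius_radians


center = [-0.12, 51.516]
radius = 2 / EARTH_RADIUS_MILES  # 2 miles expressed in radians

# The example pub from earlier in the deck sits several miles north-west of the centre.
print(within_center_sphere([-0.1945732, 51.6008172], center, radius))
```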
Results
> db.pub_names.find().sort({value: -1}).limit(10)
{ "_id" : "All Bar One", "value" : 11 }
{ "_id" : "The Slug & Lettuce", "value" : 7 }
{ "_id" : "The Coach & Horses", "value" : 6 }
{ "_id" : "The Green Man", "value" : 5 }
{ "_id" : "The Kings Arms", "value" : 5 }
{ "_id" : "The Red Lion", "value" : 5 }
{ "_id" : "Corney & Barrow", "value" : 4 }
{ "_id" : "O'Neills", "value" : 4 }
{ "_id" : "Pitcher & Piano", "value" : 4 }
{ "_id" : "The Crown", "value" : 4 }
MongoDB MapReduce
• Real-time
• Output directly to document or collection
• Runs inside MongoDB on local data

− Adds load to your DB
− In JavaScript – debugging can be a challenge
− Translating in and out of C++
Aggregation Framework

Aggregation Framework
[Diagram: MongoDB with a pipeline of operators op1, op2, … opN]
Aggregation Framework in 60 Seconds
Aggregation Framework Operators
• $project
• $match
• $limit

• $skip
• $sort
• $unwind
• $group

$match
• Filter documents
• Uses existing query syntax
• If using $geoNear it has to be first in pipeline

• $where is not supported

Matching Field Values
Input documents:
{
  "_id" : 271421,
  "amenity" : "pub",
  "name" : "Sir Walter Tyrrell",
  "location" : {
    "type" : "Point",
    "coordinates" : [
      -1.6192422,
      50.9131996
    ]
  }
}
{
  "_id" : 271466,
  "amenity" : "pub",
  "name" : "The Red Lion",
  "location" : {
    "type" : "Point",
    "coordinates" : [
      -1.5494749,
      50.7837119
    ]
  }
}

Stage:
{ "$match": {
  "name": "The Red Lion"
}}

Output:
{
  "_id" : 271466,
  "amenity" : "pub",
  "name" : "The Red Lion",
  "location" : {
    "type" : "Point",
    "coordinates" : [
      -1.5494749,
      50.7837119
    ]
  }
}
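To make the filtering step concrete, here is a small plain-Python sketch of an equality-only $match. It is an illustration under simplifying assumptions (no query operators, no nested fields), and the `match` helper is hypothetical rather than part of any MongoDB driver.

```python
def match(docs, criteria):
    """Keep documents whose fields equal every value in criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Trimmed copies of the two documents shown on the slide.
pubs = [
    {"_id": 271421, "amenity": "pub", "name": "Sir Walter Tyrrell"},
    {"_id": 271466, "amenity": "pub", "name": "The Red Lion"},
]
matched = match(pubs, {"name": "The Red Lion"})
print(matched)
```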
$project
• Reshape documents
• Include, exclude or rename fields
• Inject computed fields

• Create sub-document fields

Including and Excluding Fields
Input document:
{
  "_id" : 271466,
  "amenity" : "pub",
  "name" : "The Red Lion",
  "location" : {
    "type" : "Point",
    "coordinates" : [
      -1.5494749,
      50.7837119
    ]
  }
}

Stage:
{ "$project": {
  "_id": 0,
  "amenity": 1,
  "name": 1
}}

Output:
{
  "amenity" : "pub",
  "name" : "The Red Lion"
}
Reformatting Documents
Input document:
{
  "_id" : 271466,
  "amenity" : "pub",
  "name" : "The Red Lion",
  "location" : {
    "type" : "Point",
    "coordinates" : [
      -1.5494749,
      50.7837119
    ]
  }
}

Stage:
{ "$project": {
  "_id": 0,
  "name": 1,
  "meta": {
    "type": "$amenity"}
}}

Output:
{
  "name" : "The Red Lion",
  "meta" : {
    "type" : "pub"
  }}
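The two $project slides can be sketched in plain Python as follows. This is a deliberately small emulation, assuming only three rules: `1` keeps a top-level field, `"_id": 0` drops the id, and a `"$field"` string (or a sub-document of such strings) copies a top-level field under a new name. The `project` and `resolve` helpers are hypothetical and cover far less than the real operator.

```python
_MISSING = object()  # marker for "field not present in the document"


def resolve(doc, expr):
    """Evaluate a "$field" reference, or a sub-document made of references."""
    if isinstance(expr, dict):
        resolved = {key: resolve(doc, sub) for key, sub in expr.items()}
        return {k: v for k, v in resolved.items() if v is not _MISSING}
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:], _MISSING)
    raise ValueError("unsupported expression: %r" % (expr,))


def project(doc, spec):
    """Apply an inclusion-style projection to one document."""
    out = {}
    if spec.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]  # _id is kept unless explicitly set to 0
    for field, rule in spec.items():
        if field == "_id":
            continue
        value = doc.get(field, _MISSING) if rule == 1 else resolve(doc, rule)
        if value is not _MISSING:
            out[field] = value
    return out


pub = {"_id": 271466, "amenity": "pub", "name": "The Red Lion"}
print(project(pub, {"_id": 0, "amenity": 1, "name": 1}))
print(project(pub, {"_id": 0, "name": 1, "meta": {"type": "$amenity"}}))
```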
Dealing with Arrays
Input document:
{
  "_id" : 271466,
  "amenity" : "pub",
  "name" : "The Red Lion",
  "facilities" : [
    "toilets",
    "food"
  ]
}

Stages:
{ "$project": {
  "_id": 0,
  "name": 1,
  "facility": "$facilities"
}},
{"$unwind": "$facility"}

Output:
{ "name" : "The Red Lion",
  "facility" : "toilets" },
{ "name" : "The Red Lion",
  "facility" : "food" }
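As a sketch of what $unwind does to an array field, the plain-Python generator below emits one copy of each document per array element. The `unwind` helper is hypothetical, takes the field name without the leading `$`, and skips documents where the field is missing or empty.

```python
def unwind(docs, field):
    """Yield one copy of each document per element of the array in `field`."""
    for doc in docs:
        for item in doc.get(field, []):
            copy = dict(doc)  # shallow copy so the input is left untouched
            copy[field] = item
            yield copy


# A document as it would look once the array is available under "facility".
projected = [{"name": "The Red Lion", "facility": ["toilets", "food"]}]
unwound = list(unwind(projected, "facility"))
print(unwound)
```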
$group
• Group documents by an ID
• Field reference, object, constant
• Other output fields are computed

$max, $min, $avg, $sum
$addToSet, $push, $first, $last
• Processes all data in memory
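A stage such as `{ $group: { _id: "$name", value: { $sum: 1 } } }` groups by a field reference and computes one output field per group. The plain-Python sketch below emulates just that one case (grouping on a top-level field and summing a constant). The `group_sum` helper and the sample documents are hypothetical.

```python
from collections import defaultdict


def group_sum(docs, key_field, amount=1):
    """Group documents by key_field and add `amount` per document."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc.get(key_field)] += amount
    return [{"_id": key, "value": total} for key, total in totals.items()]


# Hypothetical sample documents.
sample_pubs = [
    {"name": "All Bar One"},
    {"name": "The Crown"},
    {"name": "All Bar One"},
]
grouped = group_sum(sample_pubs, "name")
print(grouped)
```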
Back to the pub!

http://www.offwestend.com/index.php/theatres/pastshows/71
Popular Pub Names
> var popular_pub_names = [
    { $match : { location:
        { $within: { $centerSphere:
            [[-0.12, 51.516], 2 / 3959] }}}
    },
    { $group :
        { _id: "$name",
          value: {$sum: 1} }
    },
    { $sort : {value: -1} },
    { $limit : 10 }
  ]
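For readers working from Python, the same pipeline can be written as a list of dictionaries. The sketch below only builds the pipeline and does not connect to a server. The function name is hypothetical, and it uses `$geoWithin`, the operator name that has replaced `$within` since MongoDB 2.4, which is a substitution rather than what the slide shows.

```python
def popular_pub_names_pipeline(lon, lat, radius_radians, limit=10):
    """Build the match, group, sort, limit pipeline as plain Python data."""
    return [
        {"$match": {"location": {"$geoWithin": {
            "$centerSphere": [[lon, lat], radius_radians]}}}},
        {"$group": {"_id": "$name", "value": {"$sum": 1}}},
        {"$sort": {"value": -1}},
        {"$limit": limit},
    ]


pipeline = popular_pub_names_pipeline(-0.12, 51.516, 2 / 3959.0)

# With PyMongo installed and a server holding the demo.pubs collection
# (both assumptions, not exercised here), the pipeline could be run as:
#   from pymongo import MongoClient
#   for row in MongoClient().demo.pubs.aggregate(pipeline):
#       print(row)
```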
Results
> db.pubs.aggregate(popular_pub_names)
{
  "result" : [
    { "_id" : "All Bar One", "value" : 11 },
    { "_id" : "The Slug & Lettuce", "value" : 7 },
    { "_id" : "The Coach & Horses", "value" : 6 },
    { "_id" : "The Green Man", "value" : 5 },
    { "_id" : "The Kings Arms", "value" : 5 },
    { "_id" : "The Red Lion", "value" : 5 },
    { "_id" : "Corney & Barrow", "value" : 4 },
    { "_id" : "O'Neills", "value" : 4 },
    { "_id" : "Pitcher & Piano", "value" : 4 },
    { "_id" : "The Crown", "value" : 4 }
  ],
  "ok" : 1
}
Aggregation Framework Benefits
• Real-time
• Simple yet powerful interface
• Declared in JSON, executes in C++

• Runs inside MongoDB on local data

− Adds load to your DB
− Limited Operators
− Data output is limited

Analyzing MongoDB Data in External Systems
MongoDB with Hadoop
[Diagram labels: MongoDB]
MongoDB with Hadoop
[Diagram labels: MongoDB, warehouse]
MongoDB with Hadoop
[Diagram labels: ETL, MongoDB]
Map Pub Names in Python
#!/usr/bin/env python
import sys

from pymongo_hadoop import BSONMapper


def mapper(documents):
    # get_bounds() and get_geo() are helpers that are not shown on the slide.
    bounds = get_bounds()  # ~2 mile polygon
    for doc in documents:
        geo = get_geo(doc["location"])  # Convert the geo type
        if not geo:
            continue
        if bounds.intersects(geo):
            yield {'_id': doc['name'], 'count': 1}


BSONMapper(mapper)
sys.stderr.write("Done Mapping.\n")
Reduce Pub Names in Python
#!/usr/bin/env python

from pymongo_hadoop import BSONReducer


def reducer(key, values):
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key, 'value': _count}


BSONReducer(reducer)
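Before submitting a job, the reducer logic can be checked locally without Hadoop. The sketch below repeats the reducer from the slide and emulates the step in between, sorting the mapper output by key and grouping it, with `itertools.groupby`. The `run_locally` helper and the sample mapper output are hypothetical, and `pymongo_hadoop` is intentionally not imported.

```python
from itertools import groupby
from operator import itemgetter


def reducer(key, values):
    """Same logic as the reducer on the slide."""
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key, 'value': _count}


def run_locally(mapped_docs):
    """Sort mapper output by key, group it, and reduce each group."""
    ordered = sorted(mapped_docs, key=itemgetter('_id'))
    return [
        reducer(key, list(group))
        for key, group in groupby(ordered, key=itemgetter('_id'))
    ]


# Hypothetical mapper output.
mapped = [
    {'_id': 'The Crown', 'count': 1},
    {'_id': 'All Bar One', 'count': 1},
    {'_id': 'The Crown', 'count': 1},
]
results = run_locally(mapped)
print(results)
```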
Execute MapReduce
hadoop jar target/mongo-hadoop-streaming-assembly-1.1.0-rc0.jar \
  -mapper examples/pub/map.py \
  -reducer examples/pub/reduce.py \
  -mongo mongodb://127.0.0.1/demo.pubs \
  -outputURI mongodb://127.0.0.1/demo.pub_names
Popular Pub Names Nearby
> db.pub_names.find().sort({value: -1}).limit(10)
{ "_id" : "All Bar One", "value" : 11 }
{ "_id" : "The Slug & Lettuce", "value" : 7 }
{ "_id" : "The Coach & Horses", "value" : 6 }
{ "_id" : "The Kings Arms", "value" : 5 }
{ "_id" : "Corney & Barrow", "value" : 4 }
{ "_id" : "O'Neills", "value" : 4 }
{ "_id" : "Pitcher & Piano", "value" : 4 }
{ "_id" : "The Crown", "value" : 4 }
{ "_id" : "The George", "value" : 4 }
{ "_id" : "The Green Man", "value" : 4 }
MongoDB and Hadoop
• Away from data store
• Can leverage existing data processing infrastructure
• Can horizontally scale your data processing
- Offline batch processing
- Requires synchronisation between store & processor
- Infrastructure is much more complex
The Future of Big Data and MongoDB
What is Big Data?
Big Data today will be normal tomorrow
Exponential Data Growth
[Chart: Billions of URLs indexed by Google; y-axis 0 to 10000, x-axis 2000 to 2012]
MongoDB enables you to scale big
MongoDB is evolving so you can process the big
Data Processing with MongoDB
• Process in MongoDB using Map/Reduce
• Process in MongoDB using Aggregation Framework
• Process outside MongoDB using Hadoop and other external tools
Questions?

Codemotion Milano

Thanks!
Massimo Brignoli
Solutions Architect, MongoDB Inc.


Weitere ähnliche Inhalte

Was ist angesagt?

Web Scraper Shibuya.pm tech talk #8
Web Scraper Shibuya.pm tech talk #8Web Scraper Shibuya.pm tech talk #8
Web Scraper Shibuya.pm tech talk #8Tatsuhiko Miyagawa
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaMongoDB
 
Learn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBLearn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBMarakana Inc.
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBMongoDB
 
Apache CouchDB talk at Ontario GNU Linux Fest
Apache CouchDB talk at Ontario GNU Linux FestApache CouchDB talk at Ontario GNU Linux Fest
Apache CouchDB talk at Ontario GNU Linux FestMyles Braithwaite
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBBradley Holt
 
Java development with MongoDB
Java development with MongoDBJava development with MongoDB
Java development with MongoDBJames Williams
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in DocumentsMongoDB
 
Honing headers for highly hardened highspeed hypertext
Honing headers for highly hardened highspeed hypertextHoning headers for highly hardened highspeed hypertext
Honing headers for highly hardened highspeed hypertextFastly
 
Going on an HTTP Diet: Front-End Web Performance
Going on an HTTP Diet: Front-End Web PerformanceGoing on an HTTP Diet: Front-End Web Performance
Going on an HTTP Diet: Front-End Web PerformanceAdam Norwood
 
Emerging threats jonkman_sans_cti_summit_2015
Emerging threats jonkman_sans_cti_summit_2015Emerging threats jonkman_sans_cti_summit_2015
Emerging threats jonkman_sans_cti_summit_2015Emerging Threats
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB
 
How to make Ajax work for you
How to make Ajax work for youHow to make Ajax work for you
How to make Ajax work for youSimon Willison
 
Active Https Cookie Stealing
Active Https Cookie StealingActive Https Cookie Stealing
Active Https Cookie StealingSecurityTube.Net
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know Norberto Leite
 
My First Cluster with MongoDB Atlas
My First Cluster with MongoDB AtlasMy First Cluster with MongoDB Atlas
My First Cluster with MongoDB AtlasJay Gordon
 

Was ist angesagt? (19)

Analyse Yourself
Analyse YourselfAnalyse Yourself
Analyse Yourself
 
Web Scraper Shibuya.pm tech talk #8
Web Scraper Shibuya.pm tech talk #8Web Scraper Shibuya.pm tech talk #8
Web Scraper Shibuya.pm tech talk #8
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
 
Learn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBLearn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDB
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
 
Apache CouchDB talk at Ontario GNU Linux Fest
Apache CouchDB talk at Ontario GNU Linux FestApache CouchDB talk at Ontario GNU Linux Fest
Apache CouchDB talk at Ontario GNU Linux Fest
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDB
 
Nodejs meetup-12-2-2015
Nodejs meetup-12-2-2015Nodejs meetup-12-2-2015
Nodejs meetup-12-2-2015
 
Java development with MongoDB
Java development with MongoDBJava development with MongoDB
Java development with MongoDB
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in Documents
 
Honing headers for highly hardened highspeed hypertext
Honing headers for highly hardened highspeed hypertextHoning headers for highly hardened highspeed hypertext
Honing headers for highly hardened highspeed hypertext
 
Going on an HTTP Diet: Front-End Web Performance
Going on an HTTP Diet: Front-End Web PerformanceGoing on an HTTP Diet: Front-End Web Performance
Going on an HTTP Diet: Front-End Web Performance
 
Emerging threats jonkman_sans_cti_summit_2015
Emerging threats jonkman_sans_cti_summit_2015Emerging threats jonkman_sans_cti_summit_2015
Emerging threats jonkman_sans_cti_summit_2015
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
 
How to make Ajax work for you
How to make Ajax work for youHow to make Ajax work for you
How to make Ajax work for you
 
Active Https Cookie Stealing
Active Https Cookie StealingActive Https Cookie Stealing
Active Https Cookie Stealing
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know
 
My First Cluster with MongoDB Atlas
My First Cluster with MongoDB AtlasMy First Cluster with MongoDB Atlas
My First Cluster with MongoDB Atlas
 
Python and MongoDB
Python and MongoDBPython and MongoDB
Python and MongoDB
 

Andere mochten auch

Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in PracticeNavneet kumar
 
MongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionSteve Loughran
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeMongoDB
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneMongoDB
 
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionGuido Schmutz
 

Andere mochten auch (12)

Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in Practice
 
MongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data Lake
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova Generazione
 
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big Compute
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
 

Ähnlich wie Past, Present and Future of Data Processing in Apache Hadoop

Mongo db first steps with csharp
Mongo db first steps with csharpMongo db first steps with csharp
Mongo db first steps with csharpSerdar Buyuktemiz
 
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Codemotion
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB
 
Practical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.jsPractical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.jsasync_io
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB at Gilt Groupe
MongoDB at Gilt GroupeMongoDB at Gilt Groupe
MongoDB at Gilt GroupeMongoDB
 
MongoDB and Ruby on Rails
MongoDB and Ruby on RailsMongoDB and Ruby on Rails
MongoDB and Ruby on Railsrfischer20
 
How sitecore depends on mongo db for scalability and performance, and what it...
Past, Present and Future of Data Processing in Apache Hadoop

  • 1. Codemotion Milano 2013 Data Processing and Aggregation Massimo Brignoli Solutions Architect, MongoDB Inc. Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 2. Who Am I? • Solutions Architect/Evangelist at MongoDB Inc. • 20 years of experience in databases • Former MySQL employee • Previous life: web, web, web
  • 3. Big Data
  • 4. What is Big Data? • Big Data is like teenage sex: • everyone talks about it • nobody really knows how to do it • everyone thinks everyone else is doing it • so everyone claims they are doing it…
  • 5. Understanding Big Data: It’s Not Very “Big” • 64%: Ingest diverse, new data in real-time • 15%: More than 100TB of data • 20%: Less than 100TB (average of all? <20TB) • Source: Big Data Executive Summary, 50+ top executives from Government and F500 firms
  • 6. For over a decade Big Data == Custom Software
  • 7. Lots of Great Innovations Since 1970
  • 9. RDBMS Makes Development Hard (diagram labels: Application, Code, XML Config, DB Schema, Object Relational Mapping, Relational Database)
  • 10. And Even Harder To Iterate (diagram labels: New Table, New Column, Name, Pet, Phone, Email; "3 months later…")
  • 11. From Complexity to Simplicity (RDBMS vs. MongoDB)
      {
        _id : ObjectId("4c4ba5e5e8aabf3"),
        employee_name : "Dunham, Justin",
        department : "Marketing",
        title : "Product Manager, Web",
        report_up : "Neray, Graham",
        pay_band : "C",
        benefits : [
          { type : "Health", plan : "PPO Plus" },
          { type : "Dental", plan : "Standard" }
        ]
      }
  • 12. In the past few years, open source software has emerged, enabling the rest of us to handle Big Data
  • 13. Use Popular, Well-Known Technologies Source: Silicon Angle, 2012
  • 14. Enterprise Big Data Stack (diagram) • Applications: CRM, ERP, Collaboration, Mobile, BI • Data Management: Online Data (RDBMS), Offline Data (Hadoop), EDW • Infrastructure: OS & Virtualization, Compute, Storage, Network • Security & Auditing • Management & Monitoring
  • 15. Consideration: Online vs. Offline • Online: real-time, low latency, high availability • Offline: long-running, high latency, availability is a lower priority
  • 16. How MongoDB Meets Our Requirements • MongoDB is an operational database • MongoDB provides high performance for storage and retrieval at large scale • MongoDB has a robust query interface permitting intelligent operations • MongoDB is not a data processing engine, but provides processing functionality
  • 17. MongoDB data processing options (photo: http://www.flickr.com/photos/torek/4444673930/)
  • 18. Getting Example Data
  • 19. The “hello world” of MapReduce is counting words in a paragraph of text. Let’s try something a little more interesting…
  • 20. What is the most popular pub name?
  • 21. Open Street Map Data
      #!/usr/bin/env python
      # Data Source
      # http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]
      import re
      import sys
      from imposm.parser import OSMParser
      import pymongo

      # node_points and collection are not shown on the slide; they must be
      # defined elsewhere in the script before the handler runs.
      class Handler(object):
          def nodes(self, nodes):
              if not nodes:
                  return
              docs = []
              for node in nodes:
                  osm_id, doc, (lon, lat) = node
                  if "name" not in doc:
                      # Unnamed node: remember only its coordinates.
                      node_points[osm_id] = (lon, lat)
                      continue
                  # Note: str.lstrip("The ") strips any leading characters from
                  # the set {T, h, e, space}, not the literal prefix "The ".
                  doc["name"] = doc["name"].title().lstrip("The ").replace("And", "&")
                  doc["_id"] = osm_id
                  # GeoJSON points are stored as [longitude, latitude].
                  doc["location"] = {"type": "Point", "coordinates": [lon, lat]}
                  docs.append(doc)
              if docs:  # skip the insert when no named nodes were found
                  collection.insert(docs)
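The name clean-up step in the import script can be sketched on its own. The version below is a hypothetical variant, not the author's code: `normalize_name` is an illustrative helper that removes a literal leading "The " and abbreviates "And" only when it is a whole word. The last line shows how `str.lstrip` behaves when given a multi-character argument.

```python
def normalize_name(raw):
    """Title-case a pub name, drop a leading "The ", abbreviate "And" to "&".

    Hypothetical helper for illustration; not part of the original script.
    """
    name = raw.title()
    if name.startswith("The "):
        name = name[len("The "):]
    # Replace "And" only as a whole word, so "Andover Arms" is left intact.
    return " ".join("&" if word == "And" else word for word in name.split())


cleaned = normalize_name("the rose and crown")
print(cleaned)

# str.lstrip treats its argument as a set of characters to strip,
# so more than the prefix can disappear.
stripped = "The Three Tuns".lstrip("The ")
print(stripped)
```

Whether the "The" prefix should be removed at all is a modelling choice; the sketch only illustrates the difference between prefix removal and character-set stripping.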
  • 22. Example Pub Data
      {
        "_id" : 451152,
        "amenity" : "pub",
        "name" : "The Dignity",
        "addr:housenumber" : "363",
        "addr:street" : "Regents Park Road",
        "addr:city" : "London",
        "addr:postcode" : "N3 1DH",
        "toilets" : "yes",
        "toilets:access" : "customers",
        "location" : {
          "type" : "Point",
          "coordinates" : [-0.1945732, 51.6008172]
        }
      }
  • 23. MapReduce
  • 24. MongoDB MapReduce (diagram: map → reduce → finalize, running inside MongoDB)
  • 25. Map Function
      > var map = function() {
          emit(this.name, 1);
        }
  • 26. Reduce Function
      > var reduce = function (key, values) {
          var sum = 0;
          values.forEach( function (val) { sum += val; } );
          return sum;
        }
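To make the data flow of the map and reduce functions concrete, here is a minimal in-memory sketch in Python. It is only a stand-in for illustration, not MongoDB's engine: `map_reduce`, `map_pub`, `reduce_count` and the three sample documents are all hypothetical names and data.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Tiny in-memory stand-in for a map/reduce run (illustration only)."""
    groups = defaultdict(list)
    for doc in docs:
        # Map phase: each document may emit any number of (key, value) pairs.
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Reduce phase: collapse the values collected for each key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_pub(doc):
    # Mirrors emit(this.name, 1) from the map function.
    yield doc["name"], 1


def reduce_count(key, values):
    # Mirrors the summing loop in the reduce function.
    return sum(values)


pubs = [
    {"name": "The Red Lion"},
    {"name": "The Crown"},
    {"name": "The Red Lion"},
]
counts = map_reduce(pubs, map_pub, reduce_count)
print(counts)
```

Because the reduce step is a plain sum, it gives the same answer whether it sees all values for a key at once or partial sums of them. That property matters in MongoDB, which may call reduce more than once for the same key.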
  • 27. Results
      > db.pub_names.find().sort({value: -1}).limit(10)
      { "_id" : "The Red Lion", "value" : 407 }
      { "_id" : "The Royal Oak", "value" : 328 }
      { "_id" : "The Crown", "value" : 242 }
      { "_id" : "The White Hart", "value" : 214 }
      { "_id" : "The White Horse", "value" : 200 }
      { "_id" : "The New Inn", "value" : 187 }
      { "_id" : "The Plough", "value" : 185 }
      { "_id" : "The Rose & Crown", "value" : 164 }
      { "_id" : "The Wheatsheaf", "value" : 147 }
      { "_id" : "The Swan", "value" : 140 }
  • 29. Pub Names in the Center of London
    > db.pubs.mapReduce(map, reduce, {
        out: "pub_names",
        query: { location: { $within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] } } }
      })
    {
      "result" : "pub_names",
      "timeMillis" : 116,
      "counts" : { "input" : 643, "emit" : 643, "reduce" : 54, "output" : 537 },
      "ok" : 1
    }
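The `2 / 3959` in the query above is a radius in radians: 2 miles divided by an Earth radius of roughly 3959 miles. As a rough illustration of what a `$centerSphere` test checks, here is a sketch in Python using the haversine formula. The function names and the `near` point are hypothetical; only the centre, the constant 3959, and the coordinates of The Dignity come from the slides.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # same constant as in the slide's query


def miles_to_radians(miles):
    """Convert a surface distance in miles to an angle in radians."""
    return miles / EARTH_RADIUS_MILES


def angular_distance(lon1, lat1, lon2, lat2):
    """Great-circle angle in radians between two lon/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


def within_center_sphere(point, center, radius_radians):
    """True if `point` (lon, lat) lies within the spherical cap around `center`."""
    return angular_distance(point[0], point[1], center[0], center[1]) <= radius_radians


center = (-0.12, 51.516)            # centre used in the slide's query
radius = miles_to_radians(2)        # 2 miles, as in the slide
near = (-0.13, 51.52)               # hypothetical point close to the centre
dignity = (-0.1945732, 51.6008172)  # "The Dignity" from the example pub data

print(within_center_sphere(near, center, radius))
print(within_center_sphere(dignity, center, radius))
```

The database evaluates this with its geospatial machinery rather than per-document Python, so the sketch only illustrates the geometry.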
  • 30. Results
    > db.pub_names.find().sort({value: -1}).limit(10)
    { "_id" : "All Bar One", "value" : 11 }
    { "_id" : "The Slug & Lettuce", "value" : 7 }
    { "_id" : "The Coach & Horses", "value" : 6 }
    { "_id" : "The Green Man", "value" : 5 }
    { "_id" : "The Kings Arms", "value" : 5 }
    { "_id" : "The Red Lion", "value" : 5 }
    { "_id" : "Corney & Barrow", "value" : 4 }
    { "_id" : "O'Neills", "value" : 4 }
    { "_id" : "Pitcher & Piano", "value" : 4 }
    { "_id" : "The Crown", "value" : 4 }
  • 31. MongoDB MapReduce
    • Real-time
    • Output directly to document or collection
    • Runs inside MongoDB on local data
    − Adds load to your DB
    − Written in JavaScript: debugging can be a challenge
    − Data is translated in and out of C++
  • 32. Aggregation Framework
  • 33. Aggregation Framework • A pipeline of operators runs inside MongoDB: op1, op2, …, opN
  • 34. Aggregation Framework in 60 Seconds
  • 35. Aggregation Framework Operators • $project • $match • $limit • $skip • $sort • $unwind • $group
  • 36. $match • Filters documents • Uses the existing query syntax • If $geoNear is used, it has to be the first stage in the pipeline • $where is not supported
  • 37. Matching Field Values
    Input documents:
    { "_id" : 271421, "amenity" : "pub", "name" : "Sir Walter Tyrrell", "location" : { "type" : "Point", "coordinates" : [ -1.6192422, 50.9131996 ] } }
    { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } }
    Stage:
    { "$match": { "name": "The Red Lion" }}
    Output:
    { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } }
  • 38. $project • Reshape documents • Include, exclude or rename fields • Inject computed fields • Create sub-document fields
  • 39. Including and Excluding Fields
    Input document:
    { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } }
    Stage:
    { "$project": { "_id": 0, "amenity": 1, "name": 1 }}
    Output:
    { "amenity" : "pub", "name" : "The Red Lion" }
  • 40. Reformatting Documents
    Input document:
    { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } }
    Stage:
    { "$project": { "_id": 0, "name": 1, "meta": { "type": "$amenity" } }}
    Output:
    { "name" : "The Red Lion", "meta" : { "type" : "pub" } }
  • 41. Dealing with Arrays
    Input document:
    { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "facilities" : [ "toilets", "food" ] }
    Stages (the $project on the slide repeats the previous slide's projection; to produce the output shown, the array has to be projected as "facility" before it is unwound):
    { "$project": { "_id": 0, "name": 1, "facility": "$facilities" }}
    { "$unwind": "$facility" }
    Output:
    { "name" : "The Red Lion", "facility" : "toilets" }
    { "name" : "The Red Lion", "facility" : "food" }
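The effect of `$unwind` on an array field can be sketched in plain Python. This is an illustration of the behaviour only, not how the server implements it; the `unwind` helper is a hypothetical name, and the input assumes the array has already been projected into a field called `facility`.

```python
def unwind(docs, field):
    """Yield one copy of each document per element of the array in `field`.

    Documents where the field is missing or the array is empty produce no output.
    """
    for doc in docs:
        for item in doc.get(field, []):
            flat = dict(doc)    # shallow copy so the input is left untouched
            flat[field] = item  # replace the array with a single element
            yield flat


projected = [{"name": "The Red Lion", "facility": ["toilets", "food"]}]
unwound = list(unwind(projected, "facility"))
for doc in unwound:
    print(doc)
```

Each output document carries a single `facility` value, which is what makes a later `$group` on that field possible.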
  • 42. $group
    • Group documents by an ID: a field reference, an object, or a constant
    • Other output fields are computed: $max, $min, $avg, $sum, $addToSet, $push, $first, $last
    • Processes all data in memory
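To make the accumulators concrete, the following plain-Python sketch groups documents by one field while applying the equivalents of `{$sum: 1}` and `$addToSet`. The helper name, the `city` field, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_count_and_collect(docs, key_field, set_field):
    """Group by `key_field`; count documents and collect distinct `set_field` values."""
    groups = defaultdict(lambda: {"count": 0, "values": set()})
    for doc in docs:
        group = groups[doc[key_field]]
        group["count"] += 1                  # equivalent of {$sum: 1}
        group["values"].add(doc[set_field])  # equivalent of {$addToSet: "$<set_field>"}
    # Sort the collected values only to make the printed result stable.
    return {
        key: {"count": g["count"], "values": sorted(g["values"])}
        for key, g in groups.items()
    }


# Hypothetical sample documents.
sample = [
    {"name": "The Red Lion", "city": "London"},
    {"name": "The Red Lion", "city": "Leeds"},
    {"name": "The Red Lion", "city": "London"},
    {"name": "The Crown", "city": "London"},
]
grouped = group_count_and_collect(sample, "name", "city")
print(grouped)
```

Like the stage it imitates, this sketch keeps every group in memory until all input has been read.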
  • 43. Back to the pub! • http://www.offwestend.com/index.php/theatres/pastshows/71
  • 44. Popular Pub Names
    > var popular_pub_names = [
        { $match : { location: { $within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] } } } },
        { $group : { _id: "$name", value: { $sum: 1 } } },
        { $sort : { value: -1 } },
        { $limit : 10 }
      ]
  • 45. Results
    > db.pubs.aggregate(popular_pub_names)
    {
      "result" : [
        { "_id" : "All Bar One", "value" : 11 },
        { "_id" : "The Slug & Lettuce", "value" : 7 },
        { "_id" : "The Coach & Horses", "value" : 6 },
        { "_id" : "The Green Man", "value" : 5 },
        { "_id" : "The Kings Arms", "value" : 5 },
        { "_id" : "The Red Lion", "value" : 5 },
        { "_id" : "Corney & Barrow", "value" : 4 },
        { "_id" : "O'Neills", "value" : 4 },
        { "_id" : "Pitcher & Piano", "value" : 4 },
        { "_id" : "The Crown", "value" : 4 }
      ],
      "ok" : 1
    }
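From a Python driver the same pipeline is simply a list of dictionaries. The sketch below only builds that list; the function name and its parameters are hypothetical, and it uses `$geoWithin`, the operator name that replaced the deprecated `$within` from MongoDB 2.4 onward, rather than the operator shown on the slide.

```python
EARTH_RADIUS_MILES = 3959.0  # same constant as in the slide's query


def popular_names_pipeline(center, miles, limit=10):
    """Build the match / group / sort / limit pipeline as plain Python data."""
    radius_radians = miles / EARTH_RADIUS_MILES
    return [
        # Keep only pubs inside the spherical cap around `center`.
        {"$match": {"location": {"$geoWithin": {
            "$centerSphere": [list(center), radius_radians]}}}},
        # Count documents per pub name.
        {"$group": {"_id": "$name", "value": {"$sum": 1}}},
        # Most frequent names first.
        {"$sort": {"value": -1}},
        {"$limit": limit},
    ]


pipeline = popular_names_pipeline((-0.12, 51.516), 2)
# With a live connection (not set up here), a pymongo collection would run it
# as: db.pubs.aggregate(pipeline)
print(pipeline)
```

Keeping the pipeline as data makes it easy to inspect or unit-test before it is sent to the server.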
  • 46. Aggregation Framework Benefits
    • Real-time
    • Simple yet powerful interface
    • Declared in JSON, executes in C++
    • Runs inside MongoDB on local data
    − Adds load to your DB
    − Limited operators
    − Data output is limited
  • 47. Analyzing MongoDB Data in External Systems
  • 48. MongoDB with Hadoop • MongoDB
  • 49. MongoDB with Hadoop • MongoDB • warehouse
  • 50. MongoDB with Hadoop • ETL • MongoDB
  • 51. Map Pub Names in Python
    #!/usr/bin/env python
    import sys

    from pymongo_hadoop import BSONMapper

    # get_bounds() and get_geo() are helpers that are not shown on this slide.
    def mapper(documents):
        bounds = get_bounds()  # ~2 mile polygon
        for doc in documents:
            geo = get_geo(doc["location"])  # Convert the geo type
            if not geo:
                continue
            if bounds.intersects(geo):
                yield {'_id': doc['name'], 'count': 1}

    BSONMapper(mapper)
    print >> sys.stderr, "Done Mapping."  # Python 2 print syntax, as on the slide
  • 52. Reduce Pub Names in Python
    #!/usr/bin/env python
    from pymongo_hadoop import BSONReducer

    def reducer(key, values):
        _count = 0
        for v in values:
            _count += v['count']
        return {'_id': key, 'value': _count}

    BSONReducer(reducer)
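The reducer logic can be tried without a Hadoop cluster by feeding it hand-made mapper output. The sketch below is one possible local harness, assuming nothing about `pymongo_hadoop` itself: `run_locally` and the sample records are hypothetical, and sorting plus `itertools.groupby` stands in for Hadoop's shuffle phase.

```python
from itertools import groupby
from operator import itemgetter


def reducer(key, values):
    """Same logic as the slide's reducer: sum the 'count' of each mapped record."""
    count = 0
    for v in values:
        count += v["count"]
    return {"_id": key, "value": count}


def run_locally(mapped):
    """Stand-in for the shuffle phase: sort by key, group, then reduce each group."""
    ordered = sorted(mapped, key=itemgetter("_id"))
    return [
        reducer(key, list(group))
        for key, group in groupby(ordered, key=itemgetter("_id"))
    ]


# Hypothetical mapper output, shaped like the documents the mapper yields.
mapped_records = [
    {"_id": "The Crown", "count": 1},
    {"_id": "All Bar One", "count": 1},
    {"_id": "The Crown", "count": 1},
]
reduced = run_locally(mapped_records)
print(reduced)
```

A harness like this is mainly useful for checking the counting logic before submitting the streaming job.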
  • 53. Execute MapReduce
    hadoop jar target/mongo-hadoop-streaming-assembly-1.1.0-rc0.jar \
        -mapper examples/pub/map.py \
        -reducer examples/pub/reduce.py \
        -mongo mongodb://127.0.0.1/demo.pubs \
        -outputURI mongodb://127.0.0.1/demo.pub_names
  • 54. Popular Pub Names Nearby
    > db.pub_names.find().sort({value: -1}).limit(10)
    { "_id" : "All Bar One", "value" : 11 }
    { "_id" : "The Slug & Lettuce", "value" : 7 }
    { "_id" : "The Coach & Horses", "value" : 6 }
    { "_id" : "The Kings Arms", "value" : 5 }
    { "_id" : "Corney & Barrow", "value" : 4 }
    { "_id" : "O'Neills", "value" : 4 }
    { "_id" : "Pitcher & Piano", "value" : 4 }
    { "_id" : "The Crown", "value" : 4 }
    { "_id" : "The George", "value" : 4 }
    { "_id" : "The Green Man", "value" : 4 }
  • 55. MongoDB and Hadoop
    • Processing happens away from the data store
    • Can leverage existing data processing infrastructure
    • Can horizontally scale your data processing
    − Offline batch processing
    − Requires synchronisation between store and processor
    − Infrastructure is much more complex
  • 56. The Future of Big Data and MongoDB
  • 57. What is Big Data? Big Data today will be normal tomorrow
  • 58. Exponential Data Growth [Chart: billions of URLs indexed by Google, 2000 to 2012]
  • 59. MongoDB enables you to scale big
  • 60. MongoDB is evolving so you can process the big
  • 61. Data Processing with MongoDB • Process in MongoDB using Map/Reduce • Process in MongoDB using Aggregation Framework • Process outside MongoDB using Hadoop and other external tools
  • 62. Questions?
  • 63. Codemotion Milano Thanks! Massimo Brignoli, Solutions Architect, MongoDB Inc.

Editor's notes

  1. IBM designed IMS with Rockwell and Caterpillar starting in 1966 for the Apollo program. IMS's challenge was to inventory the very large bill of materials (BOM) for the Saturn V moon rocket and Apollo space vehicle.
  2. This is helpful because as much as 95% of enterprise information is unstructured, and doesn’t fit neatly into tidy rows and columns. NoSQL and Hadoop allow for dynamic schema.
  3. The industry is talking about Hadoop and MongoDB for Big Data. So should you.
  4. This is where MongoDB fits into the existing enterprise IT stack. MongoDB is an operational data store used for online data, in the same way that Oracle is an operational data store. It supports applications that ingest, store, manage and even analyze data in real time, whereas Hadoop and data warehouses are used for offline, batch analytical workloads.
  5. Another common use case we see is data warehousing. Again, the connector allows you to use existing libraries via Hadoop.
  6. The third most common use case is an ETL (extract, transform, load) function, which then puts the aggregated data into MongoDB for further analysis.