Principal Solutions Architect, MongoDB, Inc.
Asya Kamsky
Data Processing and
Aggregation Options
#BigDataCamp @MongoDB @asya999
Applications and data
Store
Process
Data Processing and Aggregation Options in MongoDB / Asya Kamsky
Big Data
Big Data in MongoDB
• An ideal operational database
• High performance for storage and retrieval at large scale
• Robust query interface for intelligent operations
MongoDB data processing options
Big Data in MongoDB
• Pre-aggregate in MongoDB for real-time queries
• Process in MongoDB using Aggregation Framework
• Process in MongoDB using Map/Reduce
• Process outside MongoDB using Hadoop and other external tools
Aggregation Framework
• Declared in JSON, executes in C++
• Flexible, functional, and simple
• Plays nice with sharding
Pipeline
ps ax | grep mongod | head -n 1
Piping command line operations
Pipeline
$match | $group | $sort
Piping aggregation operations
Stream of documents → Result document
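The piping idea above can be sketched in plain JavaScript: each stage takes a list of documents and returns a new list, and the pipeline feeds one stage's output into the next. This is only an illustration that runs without a server; `runPipeline`, `match`, `sortBy` and `limit` are hypothetical helper names, not MongoDB APIs.

```javascript
// A pipeline is a list of stages; each stage maps an array of documents to a new array.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical stage builders, loosely standing in for $match, $sort and $limit.
const match = (predicate) => (docs) => docs.filter(predicate);
const sortBy = (field) => (docs) =>
  [...docs].sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0));
const limit = (n) => (docs) => docs.slice(0, n);

const cities = [
  { city: "SAN FRANCISCO", state: "CA" },
  { city: "NEW YORK", state: "NY" },
  { city: "PALO ALTO", state: "CA" },
];

// Keep California cities, order them by name, and take the first one.
const firstCaCity = runPipeline(cities, [
  match((doc) => doc.state === "CA"),
  sortBy("city"),
  limit(1),
]);
console.log(firstCaCity);
```

As with shell pipes, the order of stages matters: filtering early keeps later stages working on fewer documents.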
Pipeline Operators
• $match
• $project
• $group
• $unwind
• $sort/$skip/$limit
• $redact
• $geoNear
• $out
$match
• Filter documents
• Uses existing query syntax
• 2.4 added support for geospatial operations
• 2.6 added support for full text search indexes
{ $match : { state : "NY" } }
{
  city: "SAN FRANCISCO",
  loc: [-122.4614, 37.781],
  state: "CA"
}
{
  city: "NEW YORK",
  loc: [ -73.989, 40.731],
  state: "NY"
}
{
  city: "PALO ALTO",
  loc: [ -122.127, 37.418],
  state: "CA"
}
{ $match : { loc : { $geoWithin :
  { $centerSphere : [ [ -122.4, 37.79 ], 20/3959 ] } } } }
{
  city: "SAN FRANCISCO",
  loc: [-122.4614, 37.781],
  state: "CA"
}
{
  city: "NEW YORK",
  loc: [ -73.989, 40.731],
  state: "NY"
}
{
  city: "PALO ALTO",
  loc: [ -122.127, 37.418],
  state: "CA"
}
$project
• Reshape documents
• Include, exclude or rename fields
• Inject computed fields
• Create sub-document fields
Selecting and Excluding Fields
{
  _id: "94105",
  city: "SAN FRANCISCO",
  loc: [-122.3892, 37.7864],
  state: "CA"
}
$project: { _id: 0, loc: 1, state: 1 }
{
  loc: [-122.3892, 37.7864],
  state: "CA"
}
Renaming and Computing Fields
{
  _id: "94105",
  city: "SAN FRANCISCO",
  loc: [-122.3892, 37.7864],
  state: "CA"
}
$project: { zip: "$_id", cityState: { $concat: [ "$city", ", ", "$state" ] }, _id: 0 }
{
  zip: "94105",
  cityState: "SAN FRANCISCO, CA"
}
(cityState is the new field; $concat is the operation.)
Renaming and Computing Fields
{
  _id : 6694,
  cname : "Cust#000060209",
  status : "F",
  totalprice : 123350.97,
  orderdate : ISODate("2012-09-01T13:11:31Z"),
  lineitems: [
    { ... },
    { ... },
    { ... }
  ]
}
$project : { dt: { y : { "$year" : "$orderdate" },
                   m : { "$month" : "$orderdate" },
                   d : { "$dayOfMonth" : "$orderdate" } },
             totalprice : 1, status : 1, _id : 0 }
{
  dt : {
    y : 2012,
    m : 9,
    d : 1
  },
  totalprice: 123350.97,
  status: "F"
}
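To make the date projection concrete, here is a small plain-JavaScript sketch of what that `$project` stage computes for the order document above. It is an emulation, not server code: `projectOrder` is a hypothetical name, and the elided `lineitems` entries are represented by an empty array. The UTC getters are used because `$year`, `$month` and `$dayOfMonth` evaluate dates in UTC by default.

```javascript
// Emulates: dt built from $year/$month/$dayOfMonth, totalprice and status kept, _id dropped.
function projectOrder(order) {
  const date = order.orderdate;
  return {
    dt: {
      y: date.getUTCFullYear(),
      m: date.getUTCMonth() + 1, // JavaScript months are 0-based, $month is 1-based
      d: date.getUTCDate(),
    },
    totalprice: order.totalprice,
    status: order.status,
  };
}

const order = {
  _id: 6694,
  cname: "Cust#000060209",
  status: "F",
  totalprice: 123350.97,
  orderdate: new Date("2012-09-01T13:11:31Z"),
  lineitems: [], // placeholder: the slide elides the line items
};

const projectedOrder = projectOrder(order);
console.log(projectedOrder);
```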
$group
• Group documents by an ID
– Field reference, object, constant
• Other output fields are computed
– $max, $min, $avg, $sum
– $addToSet, $push
– $first, $last
• Processes all data in memory
– can utilize external disk-based sort in 2.6
Find the smallest cities within twenty miles of San Francisco
{ _id: "94306",
  city: "PALO ALTO",
  loc: [ -122.127, 37.418],
  pop: 24309 }
{ _id: "10280",
  city: "NEW YORK",
  loc: [ -74.016, 40.710],
  pop: 5574 }
{ _id: "94124",
  city: "SAN FRANCISCO",
  loc: [-122.388, 37.73],
  pop: 27239 }
{ $match : { loc :
    { $geoWithin :
      { $centerSphere : [
          [ -122.4, 37.79 ],
          20/3959
        ]
      } } } },
{ $group : {
    _id : "$city",
    pop : { $sum : "$pop" }
  }
},
{ $sort : { "pop" : 1 } },
{ $limit : 3 }
{
  _id: "STINSON BEACH",
  pop: 630
}
{
  _id: "WOODACRE",
  pop: 1524
}
{
  _id: "BOLINAS",
  pop: 1555
}
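The four stages above can be traced without a server using the sketch below. It is an in-memory emulation under stated assumptions, not MongoDB's implementation: the `$centerSphere` test is approximated with the haversine formula, the radius is expressed in radians as 20 miles over an Earth radius of 3959 miles (as on the slide), and `centralAngle` and `smallestCitiesNear` are hypothetical names. Only the three sample documents shown above are used, so the result differs from the slide's full data set.

```javascript
// Central angle in radians between two [longitude, latitude] points (haversine formula).
function centralAngle([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(a));
}

function smallestCitiesNear(docs, center, radiusRadians, count) {
  // $match with $geoWithin / $centerSphere
  const nearby = docs.filter((doc) => centralAngle(center, doc.loc) <= radiusRadians);
  // $group by city, summing pop
  const popByCity = new Map();
  for (const doc of nearby) {
    popByCity.set(doc.city, (popByCity.get(doc.city) || 0) + doc.pop);
  }
  // $sort ascending by pop, then $limit
  return [...popByCity]
    .map(([city, pop]) => ({ _id: city, pop }))
    .sort((a, b) => a.pop - b.pop)
    .slice(0, count);
}

const zipDocs = [
  { _id: "94306", city: "PALO ALTO", loc: [-122.127, 37.418], pop: 24309 },
  { _id: "10280", city: "NEW YORK", loc: [-74.016, 40.71], pop: 5574 },
  { _id: "94124", city: "SAN FRANCISCO", loc: [-122.388, 37.73], pop: 27239 },
];

const smallestNearSf = smallestCitiesNear(zipDocs, [-122.4, 37.79], 20 / 3959, 3);
console.log(smallestNearSf);
```

With these three documents only the San Francisco zip code falls inside the 20-mile circle; Palo Alto is roughly 30 miles from the chosen center.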
$unwind
• Operate on an array field
• Yield new documents for each array element
– Array replaced by element value
– Missing/empty fields → no output
– Non-array fields → error
• Pipe to $group to aggregate array values
$unwind
{
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ]
}
{ $unwind: "$subjects" }
{
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  subjects: "Long Island"
}
{
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  subjects: "New York"
}
{
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  subjects: "1920s"
}
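A minimal plain-JavaScript sketch of the `$unwind` behavior described in this section, assuming the rules listed on the slide (missing or empty arrays produce no output, non-array values raise an error). `unwind` here is a hypothetical helper, not a driver or server function.

```javascript
// Emits one copy of each document per array element, replacing the array with the element.
function unwind(docs, field) {
  const output = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) continue; // missing field: no output
    if (!Array.isArray(value)) {
      throw new TypeError(`$unwind: value at "${field}" is not an array`);
    }
    for (const element of value) {
      // An empty array never enters this loop, so it also yields no output.
      output.push({ ...doc, [field]: element });
    }
  }
  return output;
}

const gatsby = {
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  subjects: ["Long Island", "New York", "1920s"],
};

const unwoundBooks = unwind([gatsby], "subjects");
console.log(unwoundBooks);
```

Each output document keeps the other fields unchanged, which is what makes a following `$group` on the unwound field possible.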
2.6 Improvements
• Returns a cursor (not a document)
– just like a regular find
• New stages
– $redact
– $out
• New operators:
– set expression operators.
– $let and $map operators to allow for the use of variables.
– $literal operator and $size operator
– $cond expression object
• Integrated $text search
• Performance improvements, "explain" and more
Advantages
• Runs on the server
– Uses indexes
– Uses shards
• Simple to build complex pipelines
• Easy to use from any driver
• Faster than other options
Limitations
• Pipeline operator memory limits
– 10% of total system RAM in 2.4 and earlier
– 100MB in 2.6 but can use disk for external sort
• Some data types not allowed
– Code, CodeWithScope, etc.
• Result size limited (in 2.4 and earlier)
– 2.6 returns a cursor or can direct output to a new collection
– No result size limit!
MapReduce
• Versatile, powerful
• Intended for complex data analysis
• Overkill for simple aggregations
MapReduce
[Diagram: worker threads call the mapper over the data set; workers then call Reduce() to produce the output]
Our Example Data
{
  _id: 375,
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  available: true,
  pages: 218,
  chapters: 9,
  subjects: [
    "Long Island",
    "New York",
    "1920s"
  ],
  language: "English"
}
MapReduce
// map: emit one partial result per book, keyed by language
function map() {
  var key = this.language;
  emit( key, { totalPages : this.pages, numBooks : 1 } );
}
// reduce: sum the partial results for one key
function reduce(key, values) {
  var result = { numBooks : 0, totalPages : 0 };
  values.forEach(function (value) {
    result.numBooks += value.numBooks;
    result.totalPages += value.totalPages;
  });
  return result;
}
// finalize: turn the totals into the average number of pages per book
function finalize( key, value ) {
  if ( value.numBooks != 0 )
    return value.totalPages / value.numBooks;
}
db.books.mapReduce(
  map, reduce, { finalize : finalize, out : { inline : 1 } } )
"results" : [
  {
    "_id" : "English",
    "value" : 653
  },
  {
    "_id" : "Russian",
    "value" : 1440
  }
]
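The flow of map, reduce and finalize can be followed without a server using the in-memory sketch below. It is a simplified stand-in, not how MongoDB executes the job: `runMapReduce` is a hypothetical helper, `emit` is passed to the map function as an argument instead of being a global, and the two books other than The Great Gatsby are made up for illustration, so the averages differ from the slide's results.

```javascript
// Same logic as the slide's functions, except that emit is passed in explicitly.
function mapBook(emit) {
  emit(this.language, { totalPages: this.pages, numBooks: 1 });
}

function reduceBooks(key, values) {
  const result = { numBooks: 0, totalPages: 0 };
  values.forEach((value) => {
    result.numBooks += value.numBooks;
    result.totalPages += value.totalPages;
  });
  return result;
}

function finalizeBooks(key, value) {
  if (value.numBooks !== 0) return value.totalPages / value.numBooks;
}

// Hypothetical in-memory runner: group emitted values by key, reduce, then finalize.
function runMapReduce(docs, map, reduce, finalize) {
  const emittedByKey = new Map();
  const emit = (key, value) => {
    if (!emittedByKey.has(key)) emittedByKey.set(key, []);
    emittedByKey.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // the document is bound to `this`
  const results = [];
  for (const [key, values] of emittedByKey) {
    // Like MongoDB, skip reduce when a key has only a single emitted value.
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    results.push({ _id: key, value: finalize(key, reduced) });
  }
  return results;
}

const sampleBooks = [
  { title: "The Great Gatsby", language: "English", pages: 218 },
  { title: "Hypothetical Book A", language: "English", pages: 300 }, // made-up document
  { title: "Hypothetical Book B", language: "Russian", pages: 1440 }, // made-up document
];

const averagePages = runMapReduce(sampleBooks, mapBook, reduceBooks, finalizeBooks);
console.log(averagePages);
```

Because reduce may be skipped for single-value keys, the value emitted by map needs the same shape as the value returned by reduce, which is why both carry `totalPages` and `numBooks`.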
Advantages
• Map and reduce code can be arbitrarily complex
– JavaScript, helper functions
• Results can be saved into a new collection
– replace, merge or re-reduce
• Incremental MapReduce
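Re-reducing and incremental MapReduce rely on the reduce function accepting its own output as input. The sketch below illustrates that property in plain JavaScript with the page-count reducer from the earlier example; `reducePartials` is a hypothetical name and the page counts other than 218 are made up. On the server, the `out: { reduce: <collection> }` option applies the same idea to results already stored in a collection.

```javascript
// Sums partial results; the output has the same shape as each input value.
function reducePartials(key, values) {
  const result = { numBooks: 0, totalPages: 0 };
  values.forEach((value) => {
    result.numBooks += value.numBooks;
    result.totalPages += value.totalPages;
  });
  return result;
}

// First run: two books already processed and their reduced result stored.
const firstBatch = [
  { totalPages: 218, numBooks: 1 },
  { totalPages: 300, numBooks: 1 }, // made-up value
];
const storedResult = reducePartials("English", firstBatch);

// Later run: only the newly added book is mapped and reduced...
const newBatch = [{ totalPages: 182, numBooks: 1 }]; // made-up value
const newResult = reducePartials("English", newBatch);

// ...and then re-reduced with the stored result instead of reprocessing everything.
const mergedResult = reducePartials("English", [storedResult, newResult]);

// Reprocessing all books from scratch gives the same totals.
const fullResult = reducePartials("English", [...firstBatch, ...newBatch]);
console.log(mergedResult, fullResult);
```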
Limitations
• Implemented with JavaScript
– Single-threaded
• Slower than Aggregation Framework
– Batch, not real time
• Harder to understand, implement, debug...
Analyzing MongoDB Data in External Systems
Hadoop
Framework that allows for the distributed processing of large data sets across clusters of computers
Hadoop MongoDB Connector
• MongoDB or BSON files as input/output
• Source data can be filtered with queries
• Hadoop Streaming support
– For jobs written in Python, Ruby, Node.js
• Supports Hadoop tools such as Pig and Hive
Processing Big Data
• Data broken up into smaller pieces
• Process data across multiple nodes
[Diagram: the data set is split into pieces that are processed in parallel on multiple Hadoop nodes]
Input splits on Non-sharded Systems
[Diagram labels: Total Dataset, Hadoop nodes, Single Map Reduce]
Advantages
• Processing decoupled from data store
• Parallel processing
• Leverage existing infrastructure
• Java has rich set of data processing libraries
– And other languages if using Hadoop Streaming
Disadvantages
• Batch processing
• Requires synchronization between data store and processor
• Adds complexity to infrastructure
Storm
Storm MongoDB connector
• Spout for MongoDB oplog or capped collections
– Filtering capabilities
– Threaded and non-blocking
• Output to new or existing documents
– Insert/update bolt
Aggregating MongoDB’s Data Processing Options
Internal Tools
• Storing pre-aggregated data
– An exercise in schema design
• Aggregation Framework
• MapReduce
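As one possible reading of "storing pre-aggregated data", the sketch below keeps a single document per page per day and increments counters in it as events arrive, so that reads need no aggregation at query time. The schema, the `recordHit` helper and the sample events are all hypothetical; in MongoDB the same effect is typically achieved with an upsert that applies `$inc` to the counter fields.

```javascript
// Stand-in for a collection of pre-aggregated documents, keyed by _id.
const dailyStats = new Map();

// Records one page view by incrementing the daily total and the hourly counter.
function recordHit(page, when) {
  const day = when.toISOString().slice(0, 10); // e.g. "2014-06-14" (UTC)
  const id = `${page}/${day}`;
  if (!dailyStats.has(id)) {
    dailyStats.set(id, { _id: id, total: 0, hourly: {} }); // plays the role of an upsert
  }
  const statsForDay = dailyStats.get(id);
  const hour = when.getUTCHours();
  statsForDay.total += 1; // comparable to $inc on "total"
  statsForDay.hourly[hour] = (statsForDay.hourly[hour] || 0) + 1; // comparable to $inc on "hourly.<hour>"
}

recordHit("/index.html", new Date("2014-06-14T13:05:00Z"));
recordHit("/index.html", new Date("2014-06-14T13:45:00Z"));
recordHit("/index.html", new Date("2014-06-14T17:10:00Z"));

// A real-time query is now a single lookup by _id.
const indexStats = dailyStats.get("/index.html/2014-06-14");
console.log(indexStats);
```

The trade-off is that the questions to be answered must be known when the schema is designed, which is why the slide calls it an exercise in schema design.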
External Tools
Questions?
Principal Solutions Architect, MongoDB Inc.
Asya Kamsky
Thank You
#BigDataCamp @MongoDB @asya999

Introduction to MongoDB and WorkshopIntroduction to MongoDB and Workshop
Introduction to MongoDB and Workshop
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Presentation
PresentationPresentation
Presentation
 
EinfĂźhrung in MongoDB
EinfĂźhrung in MongoDBEinfĂźhrung in MongoDB
EinfĂźhrung in MongoDB
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling database
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
 
Querying mongo db
Querying mongo dbQuerying mongo db
Querying mongo db
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
OSCON 2011 CouchApps
OSCON 2011 CouchAppsOSCON 2011 CouchApps
OSCON 2011 CouchApps
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
MongoDB Meetup
MongoDB MeetupMongoDB Meetup
MongoDB Meetup
 

2014 bigdatacamp asya_kamsky

  • 1. Data Processing and Aggregation Options. Asya Kamsky, Principal Solutions Architect, MongoDB, Inc. #BigDataCamp @MongoDB @asya999
  • 2. Applications and data: Store, Process. Data Processing and Aggregation Options in MongoDB / Asya Kamsky
  • 3. Big Data
  • 4. Big Data in MongoDB
  • 5. Big Data in MongoDB • An ideal operational database • High performance for storage and retrieval at large scale • Robust query interface for intelligent operations
  • 6. MongoDB data processing options
  • 7. Big Data in MongoDB • Pre-aggregate in MongoDB for real-time queries • Process in MongoDB using Aggregation Framework • Process in MongoDB using Map/Reduce • Process outside MongoDB using Hadoop and other external tools
  • 8. Aggregation Framework
  • 9. Aggregation Framework • Declared in JSON, executes in C++
  • 10. Aggregation Framework • Declared in JSON, executes in C++ • Flexible, functional, and simple
  • 11. Aggregation Framework • Declared in JSON, executes in C++ • Flexible, functional, and simple • Plays nice with sharding
  • 12. Pipeline: piping command line operations. ps ax | grep mongod | head -n 1
  • 13. Pipeline: piping aggregation operations. $match | $group | $sort (stream of documents → result document)
  • 14. Pipeline Operators • $match • $project • $group • $unwind • $sort/$skip/$limit • $redact • $geoNear • $out
  • 15. $match • Filter documents • Uses existing query syntax • 2.4 added support for geospatial operations • 2.6 added support for full text search indexes
  • 16. { $match : { state : "NY" } } Documents: { city: "SAN FRANCISCO", loc: [ -122.4614, 37.781 ], state: "CA" } { city: "NEW YORK", loc: [ -73.989, 40.731 ], state: "NY" } { city: "PALO ALTO", loc: [ -122.127, 37.418 ], state: "CA" }
  • 17. { $match : { loc : { $geoWithin : { $centerSphere : [ [ -122.4, 37.79 ], 20/3959 ] } } } } Documents: { city: "SAN FRANCISCO", loc: [ -122.4614, 37.781 ], state: "CA" } { city: "NEW YORK", loc: [ -73.989, 40.731 ], state: "NY" } { city: "PALO ALTO", loc: [ -122.127, 37.418 ], state: "CA" }
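The two $match filters on slides 16 and 17 can be approximated in plain JavaScript, without a MongoDB server. This is only a sketch: the haversine helper `centralAngle` is a hypothetical name introduced here, and it stands in for what `$centerSphere` does on the server rather than reproducing its implementation. The radius `20 / 3959` is twenty miles divided by the Earth's radius in miles, which gives an angle in radians.

```javascript
// Earth radius in miles, matching the 3959 used on the slide.
const EARTH_RADIUS_MILES = 3959;

// Hypothetical helper: great-circle angle in radians between two
// [longitude, latitude] points, computed with the haversine formula.
function centralAngle([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(a));
}

// The three documents shown on the slides.
const cities = [
  { city: "SAN FRANCISCO", loc: [-122.4614, 37.781], state: "CA" },
  { city: "NEW YORK", loc: [-73.989, 40.731], state: "NY" },
  { city: "PALO ALTO", loc: [-122.127, 37.418], state: "CA" },
];

// Rough equivalent of { $match: { state: "NY" } }.
const inNewYork = cities.filter((doc) => doc.state === "NY");

// Rough equivalent of the $geoWithin / $centerSphere filter.
const center = [-122.4, 37.79];
const radius = 20 / EARTH_RADIUS_MILES; // radians
const nearby = cities.filter((doc) => centralAngle(doc.loc, center) <= radius);

console.log(inNewYork.map((doc) => doc.city));
console.log(nearby.map((doc) => doc.city));
```

In this sketch Palo Alto falls outside the circle because its latitude alone is more than twenty miles south of the center point.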
  • 18. $project • Reshape documents • Include, exclude or rename fields • Inject computed fields • Create sub-document fields
  • 19. Selecting and Excluding Fields. $project: { _id: 0, loc: 1, state: 1 } Input: { _id: "94105", city: "SAN FRANCISCO", loc: [ -122.3892, 37.7864 ], state: "CA" } Output: { loc: [ -122.3892, 37.7864 ], state: "CA" }
  • 20. Renaming and Computing Fields. $project: { zip: "$_id", cityState: { $concat: [ "$city", ", ", "$state" ] }, _id: 0 } Input: { _id: "94105", city: "SAN FRANCISCO", loc: [ -122.3892, 37.7864 ], state: "CA" } Output: { zip: "94105", cityState: "SAN FRANCISCO, CA" }
  • 21. Renaming and Computing Fields (same example as slide 20, with the callouts "New Field" and "Operation")
  • 22. Renaming and Computing Fields. $project : { dt: { y : { "$year" : "$orderdate" }, m : { "$month" : "$orderdate" }, d : { "$dayOfMonth" : "$orderdate" } }, totalprice : 1, status : 1, _id : 0 } Input: { _id : 6694, cname : "Cust#000060209", status : "F", totalprice : 123350.97, orderdate : ISODate("2012-09-01T13:11:31Z"), lineitems: [ { ... }, { ... }, { ... } ] } Output: { dt : { y : 2012, m : 9, d : 1 }, totalprice: 123350.97, status: "F" }
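To make the reshaping on slides 20 and 22 concrete, here is a minimal in-memory sketch in plain JavaScript. The function names `projectZip` and `projectOrder` are hypothetical and exist only for this illustration; on the server the same result comes from the `$project` stages shown on the slides. The `lineitems` array is left out of the sample order because the slide elides its contents.

```javascript
// Input documents copied from the slides (lineitems omitted).
const zipDoc = { _id: "94105", city: "SAN FRANCISCO", loc: [-122.3892, 37.7864], state: "CA" };
const orderDoc = {
  _id: 6694,
  cname: "Cust#000060209",
  status: "F",
  totalprice: 123350.97,
  orderdate: new Date("2012-09-01T13:11:31Z"),
};

// Mirrors { zip: "$_id", cityState: { $concat: ["$city", ", ", "$state"] }, _id: 0 }:
// _id is renamed to zip, and cityState is computed from two existing fields.
function projectZip(doc) {
  return { zip: doc._id, cityState: doc.city + ", " + doc.state };
}

// Mirrors the date projection. The UTC getters are used because
// $year, $month and $dayOfMonth read the date in UTC by default.
function projectOrder(doc) {
  return {
    dt: {
      y: doc.orderdate.getUTCFullYear(),
      m: doc.orderdate.getUTCMonth() + 1, // JavaScript months are zero-based
      d: doc.orderdate.getUTCDate(),
    },
    totalprice: doc.totalprice,
    status: doc.status,
  };
}

console.log(projectZip(zipDoc));
console.log(projectOrder(orderDoc));
```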
  • 23. $group • Group documents by an ID – Field reference, object, constant • Other output fields are computed – $max, $min, $avg, $sum – $addToSet, $push – $first, $last • Processes all data in memory – can utilize external disk-based sort in 2.6
  • 24. Find the smallest cities within twenty miles of San Francisco. Documents: { _id: "94306", city: "PALO ALTO", loc: [ -122.127, 37.418 ], pop: 24309 } { _id: "10280", city: "NEW YORK", loc: [ -74.016, 40.710 ], pop: 5574 } { _id: "94124", city: "SAN FRANCISCO", loc: [ -122.388, 37.73 ], pop: 27239 }
  • 25. Find the smallest cities within twenty miles of San Francisco. Pipeline: { $match : { loc : { $geoWithin : { $centerSphere : [ [ -122.4, 37.79 ], 20/3959 ] } } } }, { $group : { _id : "$city", pop : { $sum : "$pop" } } }, { $sort : { "pop" : 1 } }, { $limit : 3 } Input: { _id: "94306", city: "PALO ALTO", loc: [ -122.127, 37.418 ], pop: 24309 } { _id: "10280", city: "NEW YORK", loc: [ -74.016, 40.710 ], pop: 5574 } { _id: "94124", city: "SAN FRANCISCO", loc: [ -122.388, 37.73 ], pop: 27239 } Output: { _id: "WOODACRE", pop: 1524 } { _id: "STINSON BEACH", pop: 630 } { _id: "BOLINAS", pop: 1555 }
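The $group, $sort and $limit stages of the pipeline on slide 25 can be sketched in plain JavaScript as below. The documents are invented for illustration (the city names and populations are not from the deck), and `groupByCity` is a hypothetical helper that imitates `{ $group: { _id: "$city", pop: { $sum: "$pop" } } }` in memory. The point is that several zip-code documents for one city collapse into a single summed result.

```javascript
// Hypothetical zip-code documents, assumed to have already passed $match.
const zips = [
  { _id: "z1", city: "ALPHA", pop: 400 },
  { _id: "z2", city: "BETA", pop: 1500 },
  { _id: "z3", city: "ALPHA", pop: 300 },
  { _id: "z4", city: "GAMMA", pop: 900 },
  { _id: "z5", city: "DELTA", pop: 5000 },
];

// Imitates { $group: { _id: "$city", pop: { $sum: "$pop" } } }:
// one output document per distinct city, with populations added up.
function groupByCity(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.city, (totals.get(doc.city) || 0) + doc.pop);
  }
  return [...totals].map(([city, pop]) => ({ _id: city, pop }));
}

const smallest = groupByCity(zips)
  .sort((a, b) => a.pop - b.pop) // { $sort: { pop: 1 } }, ascending
  .slice(0, 3); // { $limit: 3 }

console.log(smallest);
```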
  • 26. $unwind • Operate on an array field • Yield new documents for each array element – Array replaced by element value – Missing/empty fields → no output – Non-array fields → error • Pipe to $group to aggregate array values
  • 27. $unwind. { $unwind: "$subjects" } Input: { title: "The Great Gatsby", ISBN: "9781857150193", subjects: [ "Long Island", "New York", "1920s" ] } Output: { title: "The Great Gatsby", ISBN: "9781857150193", subjects: "Long Island" } { title: "The Great Gatsby", ISBN: "9781857150193", subjects: "New York" } { title: "The Great Gatsby", ISBN: "9781857150193", subjects: "1920s" }
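The $unwind rules listed on slide 26 can be written out as a small plain-JavaScript function. This is an in-memory sketch only: `unwind` is a hypothetical helper, and it follows the behavior described on the slide (missing or empty field gives no output, a non-array field is an error) rather than any particular server version.

```javascript
// Hypothetical helper imitating { $unwind: "$<field>" } as described on slide 26.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing field: the document produces no output.
    if (value === undefined || value === null) return [];
    // Non-array field: an error, per the slide.
    if (!Array.isArray(value)) {
      throw new Error("field '" + field + "' is not an array");
    }
    // One copy of the document per element; the array is replaced by the element.
    // An empty array yields no copies, so it also produces no output.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

// The book document from slide 27.
const book = {
  title: "The Great Gatsby",
  ISBN: "9781857150193",
  subjects: ["Long Island", "New York", "1920s"],
};

const unwound = unwind([book], "subjects");
console.log(unwound);
```

Feeding the unwound documents into a grouping step keyed on `subjects` is what the slide means by piping to $group to aggregate array values.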
  • 28. 2.6 Improvements • Returns a cursor (not a document) – just like a regular find • New stages – $redact – $out • New operators – set expression operators – $let and $map operators to allow for the use of variables – $literal operator and $size operator – $cond expression object • Integrated $text search • Performance improvements, "explain" and more
  • 29. Advantages • Runs on the server – Uses indexes – Uses shards • Simple to build complex pipelines • Easy to use from any driver • Faster than other options
  • 30. Limitations • Pipeline operator memory limits – 10% of total system RAM in 2.4 and earlier – 100MB in 2.6 but can use disk for external sort • Some data types not allowed – Code, CodeWithScope, etc. • Result size limited (in 2.4 and earlier) – 2.6 returns a cursor or directs output to a new collection, so there is no result size limit
  • 31. MapReduce
  • 32. MapReduce • Versatile, powerful
  • 33. MapReduce • Versatile, powerful • Intended for complex data analysis
  • 34. MapReduce • Versatile, powerful • Intended for complex data analysis • Overkill for simple aggregations
  • 35. MapReduce: Data Set → Worker thread calls mapper
  • 36. MapReduce: Data Set → Worker thread calls mapper → Workers call Reduce() → Output
  • 37. Our Example Data: { _id: 375, title: "The Great Gatsby", ISBN: "9781857150193", available: true, pages: 218, chapters: 9, subjects: [ "Long Island", "New York", "1920s" ], language: "English" }
  • 38. MapReduce db.books.mapReduce( map, reduce, {finalize: finalize, out: { inline : 1} } ) function map() { var key = this.language; emit ( key, { totalPages : this.pages, numBooks : 1 } ); }
  • 39. MapReduce db.books.mapReduce( map, reduce, {finalize: finalize, out: { inline : 1} } ) function reduce(key, values) { var result = { numBooks : 0, totalPages : 0}; values.forEach(function (value) { result.numBooks += value.numBooks; result.totalPages += value.totalPages; }); return result; }
  • 40. MapReduce db.books.mapReduce( map, reduce, {finalize: finalize, out: { inline : 1} } ) function finalize( key, value ) { if ( value.numBooks != 0 ) return value.totalPages / value.numBooks; }
  • 41. MapReduce db.books.mapReduce( map, reduce, {finalize: finalize, out: { inline : 1} } ) function finalize( key, value ) { if ( value.numBooks != 0 ) return value.totalPages / value.numBooks; }
  • 42. MapReduce db.books.mapReduce( map, reduce, {finalize: finalize, out: { inline : 1} } ) "results" : [ { "_id" : "English", "value" : 653 }, { "_id" : "Russian", "value" : 1440 } ]
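How the three functions from the preceding slides fit together can be sketched without a server. The following is a simplified stand-in for what `mapReduce` does, assuming a single in-memory pass; `simulateMapReduce`, the global `emit` stand-in, and the two extra book documents are hypothetical additions for illustration. Only "The Great Gatsby" comes from the slides, so the numbers below do not match slide 42.

```javascript
// Hypothetical sample data: only the first document appears in the slides.
var books = [
  { title: "The Great Gatsby",    language: "English", pages: 218 },
  { title: "Hypothetical Book A", language: "English", pages: 382 },
  { title: "Hypothetical Book B", language: "Russian", pages: 1000 }
];

// Stand-in for the shell's emit(): collect values per key.
var emitted = {};
function emit(key, value) {
  if (!Object.prototype.hasOwnProperty.call(emitted, key)) emitted[key] = [];
  emitted[key].push(value);
}

// map, reduce and finalize as shown on the slides.
function map() {
  emit(this.language, { totalPages: this.pages, numBooks: 1 });
}
function reduce(key, values) {
  var result = { numBooks: 0, totalPages: 0 };
  values.forEach(function (value) {
    result.numBooks += value.numBooks;
    result.totalPages += value.totalPages;
  });
  return result;
}
function finalize(key, value) {
  if (value.numBooks != 0) return value.totalPages / value.numBooks;
}

// Simplified driver: map every document, reduce per key, then finalize.
function simulateMapReduce(docs) {
  emitted = {};
  docs.forEach(function (doc) { map.call(doc); });   // `this` is the current document
  return Object.keys(emitted).map(function (key) {
    var values = emitted[key];
    // reduce is only needed when a key has more than one value
    var reduced = values.length > 1 ? reduce(key, values) : values[0];
    return { _id: key, value: finalize(key, reduced) };
  });
}

var results = simulateMapReduce(books);
// [ { _id: "English", value: 300 }, { _id: "Russian", value: 1000 } ]
```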
  • 43. Advantages • Map and reduce code can be arbitrarily complex – JavaScript, helper functions • Results can be saved into a new collection – replace, merge or re-reduce • Incremental MapReduce
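Re-reduce and incremental MapReduce depend on the reduce function accepting its own output as input. A small check of that property, using the `reduce` from the earlier slide and hypothetical page counts, might look like this:

```javascript
// reduce as shown on the earlier slide
function reduce(key, values) {
  var result = { numBooks: 0, totalPages: 0 };
  values.forEach(function (value) {
    result.numBooks += value.numBooks;
    result.totalPages += value.totalPages;
  });
  return result;
}

// Hypothetical emitted values for one key
var a = { totalPages: 218, numBooks: 1 };
var b = { totalPages: 382, numBooks: 1 };
var c = { totalPages: 500, numBooks: 1 };

// Reducing everything at once...
var allAtOnce = reduce("English", [a, b, c]);
// ...must give the same answer as reducing a partial result again.
var inTwoSteps = reduce("English", [reduce("English", [a, b]), c]);
// both are { numBooks: 3, totalPages: 1100 }
```

This is also why the deck's example keeps running totals in `reduce` and computes the average only in `finalize`: an average of averages would not survive a re-reduce.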
  • 44. Limitations • Implemented with JavaScript – Single-threaded • Slower than Aggregation Framework – Batch, not real time • Harder to understand, implement, debug...
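For comparison, the average-pages-per-language calculation from the MapReduce example can be sketched as an aggregation pipeline. This is one plausible equivalent rather than something shown in the deck; the output field name `avgPages` and the wrapper function are hypothetical.

```javascript
// One $group stage replaces map, reduce and finalize:
// group by language and let $avg compute the mean page count.
var pipeline = [
  { $group: { _id: "$language", avgPages: { $avg: "$pages" } } }
];

// `db` is the shell's database handle (or any object with the same shape).
function averagePagesByLanguage(db) {
  return db.books.aggregate(pipeline);
}
```

The result documents carry the average in `avgPages` instead of the `value` field that `mapReduce` produces.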
  • 45. Analyzing MongoDB Data in External Systems
  • 46. Hadoop Framework that allows for the distributed processing of large data sets across clusters of computers
  • 47. Hadoop MongoDB Connector • MongoDB or BSON files as input/output • Source data can be filtered with queries • Hadoop Streaming support – For jobs written in Python, Ruby, Node.js • Supports Hadoop tools such as Pig and Hive
  • 48. Processing Big Data • Data broken up into smaller pieces • Process data across multiple nodes [Diagram: multiple Hadoop nodes]
  • 49. Input splits on Non-sharded Systems [Diagram labels: Total Dataset, Single Map Reduce, multiple Hadoop nodes]
  • 50. Advantages • Processing decoupled from data store • Parallel processing • Leverage existing infrastructure • Java has rich set of data processing libraries – And other languages if using Hadoop Streaming Disadvantages • Batch processing • Requires synchronization between data store and processor • Adds complexity to infrastructure
  • 51. Storm
  • 52. Storm
  • 53. Storm MongoDB connector • Spout for MongoDB oplog or capped collections – Filtering capabilities – Threaded and non-blocking • Output to new or existing documents – Insert/update bolt
  • 55. Internal Tools • Storing pre-aggregated data – An exercise in schema design • Aggregation Framework • MapReduce
  • 56. External Tools
  • 58. Principal Solutions Architect, MongoDB Inc. Asya Kamsky Thank You #BigDataCamp @MongoDB @asya999

Editor's Notes

  1. "h" : { "$hour" : "$time" }, "m" : { "$minute" : "$time" }, "s" : { "$second" : "$time" },
  2. { $match : { loc : { $geoWithin: { $centerSphere : [ [ -122.4, 37.79 ], 20/3959 ] } } } } { city: "PALO ALTO", loc: [ -122.127, 37.418 ], state: "CA" }
  3. 2.4 will improve somewhat
  4. Distributed, real-time computation system.
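The `20/3959` in the `$geoWithin` note above is a radius in radians: `$centerSphere` takes a distance divided by the sphere's radius, and 3959 is approximately the Earth's radius in miles. A minimal sketch of that conversion, with `milesToRadians` as a hypothetical helper name:

```javascript
// Approximate Earth radius in miles, as used in the note.
var EARTH_RADIUS_MILES = 3959;

// $centerSphere expects its radius in radians: distance / sphere radius.
function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}

// The $match stage from the note, with the 20-mile radius spelled out.
var stage = {
  $match: {
    loc: {
      $geoWithin: {
        $centerSphere: [ [ -122.4, 37.79 ], milesToRadians(20) ]
      }
    }
  }
};
```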