#MongoDB 
Using MongoDB and Hadoop 
Together For Success 
Buzz Moschetti 
buzz.moschetti@mongodb.com 
Enterprise Architect, MongoDB
Who is your Presenter? 
• Yes, I use “Buzz” on my business cards 
• Former Investment Bank Chief Architect at 
JPMorganChase and Bear Stearns before that 
• Over 25 years of designing and building systems 
• Big and small 
• Super-specialized to broadly useful in any vertical 
• “Traditional” to completely disruptive 
• Advocate of language leverage and strong factoring 
• Still programming – using emacs, of course
Agenda 
• (Occasionally) Brutal Truths about Big Data 
• The Key To Success in Large Scale Data Management 
• Review of Directed Content Business Architecture 
• Technical Implementation Examples 
• Recommendation Capability 
• Realtime Trade / Position Risk 
• Q & A
Truths 
• Clear definition of Big Data still maturing 
• Efficiently operationalizing Big Data is non-trivial 
• Developing, debugging, understanding MapReduce 
• Cluster monitoring & management, job scheduling/recovery 
• If you thought regular ETL Hell was bad…. 
• Big Data is not about math/set accuracy 
• The last 25,000 items in a 25,497,612 set “don’t matter” 
• Big Data questions are best asked periodically 
• “Are we there yet?” 
• Realtime means … realtime
It’s About The Functions, not the Terms 
DON’T ASK: 
• Is this an operations or an analytics problem? 
• Is this online or offline? 
• What query language should we use? 
• What is my integration strategy across tools? 
ASK INSTEAD: 
• Am I incrementally addressing data (esp. writes)? 
• Am I computing a precise answer or a trend? 
• Do I need to operate on this data in realtime? 
• What is my holistic architecture?
Success in Big Data: MongoDB + Hadoop 
• Efficient Operationalization 
• Robust data movements 
• Clarity and fidelity of data movements 
• Designing for change 
• Analysis Feedback 
• Data computed in Hadoop integrated back into 
MongoDB
What We’re Going to “Build” today 
Realtime Directed Content System 
• Based on what users click, “recommended” 
content is returned in addition to the target 
• The example is sector (manufacturing, financial 
services, retail) neutral 
• System dynamically updates behavior in response 
to user activity
The Participants and Their Roles 
Directed Content System, surrounded by: 
• Customers 
• Analysts / Data Scientists: operate on data to identify trends and develop tag domains 
• Content Creators: generate and tag content from a known domain of tags 
• Management / Strategy: make decisions based on trends and other summarized data 
• Developers / ProdOps: bring it all together (apps, SDLC, integration, etc.)
Priority #1: Maximizing User value 
Considerations/Requirements 
Maximize realtime user value and experience 
Provide management reporting and trend analysis 
Engineer for Day 2 agility on recommendation engine 
Provide scrubbed click history for customer 
Permit low-cost horizontal scaling 
Minimize technical integration 
Minimize technical footprint 
Use conventional and/or approved tools 
Provide a RESTful service layer 
…..
The Architecture 
App(s) MongoDB Hadoop MapReduce
Complementary Strengths 
App(s) + MongoDB: 
• Standard design paradigm (objects, tools, 3rd party products, IDEs, test drivers, skill pool, etc.) 
• Language flexibility (Java, C#, C++, Python, Scala, …) 
• Webscale deployment model 
• appservers, DMZ, monitoring 
• High performance rich shape CRUD 
Hadoop MapReduce: 
• MapReduce design paradigm 
• Node deployment model 
• Very large set operations 
• Computationally intensive, longer duration 
• Read-dominated workload
“Legacy” Approach: Somewhat unidirectional 
ETL 
App(s) MongoDB Hadoop MapReduce 
• Extract data from MongoDB and other 
sources nightly (or weekly) 
• Generate reports for people to read 
• Same pains as existing ETL: 
reconciliation, transformation, change 
management …
Somewhat better approach 
ETL 
App(s) MongoDB Hadoop MapReduce 
ETL 
• Extract data from MongoDB and other 
sources nightly (or weekly) 
• Generate reports for people to read 
• Move important summary data back to 
MongoDB for consumption by apps 
• Still in ETL-dominated landscape
…but the overall problem remains: 
• How do we integrate and operate, in realtime, on both 
periodically generated data and current realtime 
data? 
• Lackluster integration between OLTP and Hadoop 
• It’s not just about the database: you need a 
realtime profile and profile update function
The legacy problem in pseudocode 
onContentClick() { 
    String[] tags = content.getTags(); 
    Resource[] r = f1(database, tags); 
} 
• Realtime intraday state not well-handled 
• Baselining is a different problem than click 
handling
The Right Approach 
• Users have a specific Profile entity 
• The Profile captures trend analytics as baselining 
information 
• The Profile has per-tag “counters” that are updated with 
each interaction / click 
• Counters plus baselining are passed to fetch function 
• The fetch function itself could be dynamic!
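A minimal in-memory sketch of the Profile idea described above, assuming a per-tag structure with a baseline (written by the periodic batch job) and a click history whose size acts as the "counter". The names `ProfileSketch`, `TagState`, `recordClick`, and `counter` are hypothetical and not from the talk; persistence to MongoDB is deliberately left out.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory Profile: per-tag baseline plus click history. */
class ProfileSketch {

    /** State kept for one tag. */
    static class TagState {
        // Trend data produced by the periodic batch job (assumed to be integers).
        final List<Integer> baseline = new ArrayList<>();
        // One entry appended per click.
        final List<Instant> hist = new ArrayList<>();
    }

    final String user;
    final Map<String, TagState> tags = new HashMap<>();

    ProfileSketch(String user) {
        this.user = user;
    }

    /** Record one click; a tag seen for the first time gets an empty state. */
    void recordClick(String tag, Instant when) {
        tags.computeIfAbsent(tag, t -> new TagState()).hist.add(when);
    }

    /** The per-tag "counter": number of clicks currently held in the history. */
    int counter(String tag) {
        TagState state = tags.get(tag);
        return state == null ? 0 : state.hist.size();
    }

    public static void main(String[] args) {
        ProfileSketch bob = new ProfileSketch("Bob");
        Instant now = Instant.now();
        bob.recordClick("PETS", now);
        bob.recordClick("PETS", now.plusSeconds(5));
        bob.recordClick("SPICE", now.plusSeconds(120));
        System.out.println("PETS=" + bob.counter("PETS") + " SPICE=" + bob.counter("SPICE"));
    }
}
```

Keeping the counters and the baseline on the same entity is what lets a fetch function receive both in a single read.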
24 hours in the life of The System 
• Assume some content has been created and tagged 
• Two systematized tags: Pets & PowerTools
Monday, 1:30AM EST 
App(s) MongoDB Hadoop MapReduce 
• Fetch all user Profiles from MongoDB; load into Hadoop 
• Or skip if using the MongoDB-Hadoop connector!
MongoDB-Hadoop MapReduce Example 
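import java.io.IOException; 
import java.util.Date; 
import java.util.List; 
import java.util.Set; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.mapreduce.Mapper; 
import org.bson.BSONObject; 

public class ProfileMapper 
        extends Mapper<Object, BSONObject, IntWritable, IntWritable> { 
    @Override 
    public void map(final Object pKey, 
                    final BSONObject pValue, 
                    final Context pContext) 
            throws IOException, InterruptedException { 
        // Each input value is one user Profile document. 
        String user = (String) pValue.get("user");        // read for illustration; unused below 
        Date d1 = (Date) pValue.get("lastUpdate");        // read for illustration; unused below 
        BSONObject tags = (BSONObject) pValue.get("tags"); 
        Set<String> keys = tags.keySet(); 
        int count = 0; 
        for (String tag : keys) { 
            // Total the click-history entries across all of the user's tags. 
            BSONObject tagData = (BSONObject) tags.get(tag); 
            count += ((List<?>) tagData.get("hist")).size(); 
        } 
        // Guard against a Profile with no tags. 
        int avg = keys.isEmpty() ? 0 : count / keys.size(); 
        pContext.write(new IntWritable(count), new IntWritable(avg)); 
    } 
} 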
MongoDB-Hadoop v1 (today) 
Hadoop 
MR Mapper 
v1 
MongoDB-Hadoop 
ü V1 adapter draws data directly from MongoDB 
ü No ETL, scripts, change management, etc. 
ü Storage optimized: NO data copies
MongoDB-Hadoop v2 (soon) 
Hadoop 
MR Mapper 
HDFS 
ü V2 flows data directly into HDFS via a special 
MongoDB secondary 
ü No ETL, scripts, change management, etc. 
ü Data is copied – but still one data fabric 
ü Realtime data with snapshotting as an option
Monday, 1:45AM EST 
App(s) MongoDB Hadoop MapReduce 
• Grind through all content data and user Profile data to produce: 
• Tags based on feature extraction (vs. creator-applied tags) 
• Trend baseline per user for tags Pets and PowerTools 
• Load Profiles with new baseline back into MongoDB
Monday, 8AM EST 
App(s) MongoDB Hadoop MapReduce 
• User Bob logs in and Profile retrieved from MongoDB 
• Bob clicks on Content X which is already tagged as “Pets” 
• Bob has clicked on Pets tagged content many times 
• Adjust Profile for tag “Pets” and save back to MongoDB 
• Analysis = f(Profile) 
• Analysis can be “anything”; it is simply a result. It could trigger 
an ad, a compliance alert, etc.
Monday, 8:02AM EST 
App(s) MongoDB Hadoop MapReduce 
• Bob clicks on Content Y which is already tagged as “Spices” 
• Spice is a new tag type for Bob 
• Adjust Profile for tag “Spices” and save back to MongoDB 
• Analysis = f(profile)
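The 8:00 click (existing tag) and the 8:02 click (new tag) can share one update path. As a rough sketch, assuming the Profile keeps clicks under `tags.<TAG>.hist` and that the app appends with MongoDB's `$push` operator, the update document could be built as below. `$push` adds the array when the field is absent, so a first-time tag such as Spices needs no special case. The class and method names are hypothetical, and plain `java.util.Map` is used here instead of a driver type so the sketch stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the "record a click" update document. */
class ClickUpdateSketch {

    /** Builds {"$push": {"tags.<TAG>.hist": {"ts": ..., "url": ...}}}. */
    static Map<String, Object> pushClick(String tag, long tsMillis, String url) {
        Map<String, Object> click = new LinkedHashMap<>();
        click.put("ts", tsMillis);
        click.put("url", url);

        // Dotted path into the embedded per-tag document.
        Map<String, Object> target = new LinkedHashMap<>();
        target.put("tags." + tag + ".hist", click);

        Map<String, Object> update = new LinkedHashMap<>();
        update.put("$push", target);
        return update;
    }

    public static void main(String[] args) {
        // Same call shape for a known tag and for a brand-new one.
        System.out.println(pushClick("PETS", System.currentTimeMillis(), "url2"));
        System.out.println(pushClick("SPICE", System.currentTimeMillis(), "url3"));
    }
}
```

With a real driver, this map would be converted to the driver's document type and sent as the update half of an update-one call filtered on the user.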
Profile in Detail 
{ 
  user: "Bob", 
  personalData: { 
    zip: "10024", 
    gender: "M" 
  }, 
  tags: { 
    PETS: { algo: "A4", 
            baseline: [0,0,10,4,1322,44,23, … ], 
            hist: [ 
              { ts: datetime1, url: url1 }, 
              { ts: datetime2, url: url2 } // 100 more 
            ]}, 
    SPICE: { hist: [ 
              { ts: datetime3, url: url3 } 
            ]} 
  } 
} 
Tag-based algorithm detail 
getRecommendedContent(profile, ["PETS", other]) { 
    if algo for a tag available { 
        filter = algo(profile, tag); 
    } 
    fetch N recommendations (filter); 
} 

A4(profile, tag) { 
    weight = get tag ("PETS") global weighting; 
    adjustForPersonalBaseline(weight, "PETS" baseline); 
    if "PETS" clicked more than 2 times in past 10 mins 
        then weight += 10; 
    if "PETS" clicked more than 10 times in past 2 days 
        then weight += 3; 

    return new filter({"PETS", weight}, globals) 
} 
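The two time-window rules in A4 can be sketched in Java as follows. This covers only the click-count boosts; the global weighting is passed in as a number, and `adjustForPersonalBaseline` is left out because the slides do not say how the baseline is applied. `A4Sketch`, `countSince`, and `weight` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the A4 click-window boosts. */
class A4Sketch {

    /** Counts clicks that fall strictly after (now - window). */
    static int countSince(List<Instant> hist, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        int n = 0;
        for (Instant ts : hist) {
            if (ts.isAfter(cutoff)) {
                n++;
            }
        }
        return n;
    }

    /** Starts from the tag's global weighting and applies both boosts. */
    static double weight(double globalWeight, List<Instant> hist, Instant now) {
        double weight = globalWeight; // personal-baseline adjustment omitted (unspecified)
        if (countSince(hist, now, Duration.ofMinutes(10)) > 2) {
            weight += 10; // burst of very recent interest
        }
        if (countSince(hist, now, Duration.ofDays(2)) > 10) {
            weight += 3;  // sustained interest over two days
        }
        return weight;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<Instant> hist = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            hist.add(now.minusSeconds(60L * i)); // three clicks in the last three minutes
        }
        System.out.println(weight(1.0, hist, now));
    }
}
```

The resulting weight would then feed the filter passed to the fetch step.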
Tuesday, 1AM EST 
App(s) MongoDB Hadoop MapReduce 
• Fetch all user Profiles from MongoDB; load into Hadoop 
• Or skip if using the MongoDB-Hadoop connector!
Tuesday, 1:30AM EST 
App(s) MongoDB Hadoop MapReduce 
• Grind through all content data and user profile data to produce: 
• Tags based on feature extraction (vs. creator-applied tags) 
• Trend baseline for Pets and PowerTools and Spice 
• Data can be specific to individual or by group 
• Load new baselines back into MongoDB
New Profile in Detail 
{ 
  user: "Bob", 
  personalData: { 
    zip: "10024", 
    gender: "M" 
  }, 
  tags: { 
    PETS: { algo: "A4", 
            baseline: [0,4,10,4,1322,44,23, … ], 
            hist: [ 
              { ts: datetime1, url: url1 }, 
              { ts: datetime2, url: url2 } // 100 more 
            ]}, 
    SPICE: { baseline: [1], 
             hist: [ 
              { ts: datetime3, url: url3 } 
            ]} 
  } 
} 
Tuesday, 1:35AM EST 
App(s) MongoDB Hadoop MapReduce 
• Perform maintenance on user Profiles 
• Click history trimming (variety of algorithms) 
• “Dead tag” removal 
• Update of auxiliary reference data
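Two of the maintenance steps above, history trimming and dead-tag removal, might look like this in plain Java. The policies are assumptions chosen for illustration: trimming keeps the newest N entries of an oldest-first history, and a tag counts as "dead" when it has no clicks or its newest click is older than a cutoff. The talk mentions a variety of trimming algorithms without naming one, and all names here are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Profile maintenance helpers. */
class ProfileMaintenanceSketch {

    /** Keeps only the newest maxEntries clicks; assumes hist is ordered oldest-first. */
    static List<Instant> trim(List<Instant> hist, int maxEntries) {
        int from = Math.max(0, hist.size() - maxEntries);
        return new ArrayList<>(hist.subList(from, hist.size()));
    }

    /** Removes tags with no clicks, or whose newest click is before the cutoff. */
    static void removeDeadTags(Map<String, List<Instant>> tags, Instant cutoff) {
        tags.values().removeIf(
            hist -> hist.isEmpty() || hist.get(hist.size() - 1).isBefore(cutoff));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Map<String, List<Instant>> tags = new HashMap<>();
        tags.put("PETS", List.of(now.minusSeconds(300), now.minusSeconds(60)));
        tags.put("POWERTOOLS", List.of(now.minusSeconds(90L * 24 * 3600)));

        removeDeadTags(tags, now.minusSeconds(30L * 24 * 3600)); // 30-day cutoff
        tags.replaceAll((tag, hist) -> trim(hist, 1));
        System.out.println(tags.keySet());
    }
}
```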
New Profile in Detail 
{ 
  user: "Bob", 
  personalData: { 
    zip: "10022", 
    gender: "M" 
  }, 
  tags: { 
    PETS: { algo: "A4", 
            baseline: [ 1322,44,23, … ], 
            hist: [ 
              { ts: datetime1, url: url1 } // 50 more 
            ]}, 
    SPICE: { algo: "Z1", 
             baseline: [1], 
             hist: [ 
              { ts: datetime3, url: url3 } 
            ]} 
  } 
} 
Feel free to run the baselining more frequently 
App(s) MongoDB Hadoop MapReduce 
… but avoid “Are We There Yet?”
Nearterm / Realtime Questions & Actions 
With respect to the Customer: 
• What has Bob done over the past 24 hours? 
• Given an input, make a logic decision in 100ms or less 
With respect to the Provider: 
• What are all current users doing or looking at? 
• Can we nearterm correlate single events to shifts in behavior?
Longterm/ Not Realtime Questions & Actions 
With respect to the Customer: 
• Any way to explain historic performance / actions? 
• What are recommendations for the future? 
With respect to the Provider: 
• Can we correlate multiple events from multiple sources 
over a long period of time to identify trends? 
• What is my entire customer base doing over 2 years? 
• Show me a time vs. aggregate tag hit chart 
• Slice and dice and aggregate tags vs. XYZ 
• What tags are trending up or down?
Another Example: Realtime Risk 
[Diagram. Components: Applications, Trade Processing, Risk Service, Risk Calculation (Spark), Analysis/Reporting (Impala), other HDFS data. Flows: log trade activities, query trades, query risk, risk params, admin.]
Recording a trade 
1. Bank makes a trade 
2. Trade sent to Trade Processing 
3. Trade Processing writes trade to MongoDB 
4. Realtime replicate trade to Hadoop/HDFS 
Non-functional notes: 
• High volume of data ingestion (10,000s or more 
events per second) 
• Durable storage of trade data 
• Store trade events across all asset classes 
Querying deal / trade / event data 
1. Query on deal attributes (id, counterparty, asset 
class, termination date, notional amount, book) 
2. MongoDB performs index-optimized query and 
Trade Processing assembles Deal/Trade/Event data 
into response packet 
3. Return response packet to caller 
Non-functional notes: 
• System can support very high volume (10,000s 
or more queries per second) 
• Millisecond response times 
Updating intra-day risk data 
1. Mirror of trade data already stored in HDFS 
Trade data partitioned into time windows 
2. Signal/timer kicks off a “run” 
3. Spark ingests new partition of trade data as RDD 
and calculates and merges risk data based on 
latest trade data 
4. Risk data written directly to MongoDB and indexed 
and available for online queries / aggregations / 
applications logic 
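The window-by-window merge in steps 1 to 3 can be illustrated without Spark. The sketch below is plain Java, not the Spark job from the talk, and it uses summed notional per book as a stand-in for "risk" purely for illustration; the real calculation is not described in the slides. `IntradayRiskSketch`, `Trade`, `windowOf`, and `mergeWindow` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of merging one time window of trades into running risk. */
class IntradayRiskSketch {

    /** Minimal trade record: only the fields this sketch needs. */
    static class Trade {
        final String book;
        final double notional;
        final long tsMillis;

        Trade(String book, double notional, long tsMillis) {
            this.book = book;
            this.notional = notional;
            this.tsMillis = tsMillis;
        }
    }

    /** Maps a timestamp to its window index (fixed-size windows). */
    static long windowOf(long tsMillis, long windowMillis) {
        return tsMillis / windowMillis;
    }

    /** Folds the trades belonging to one window into the running per-book totals. */
    static void mergeWindow(Map<String, Double> riskByBook, List<Trade> trades,
                            long window, long windowMillis) {
        for (Trade t : trades) {
            if (windowOf(t.tsMillis, windowMillis) == window) {
                // Placeholder "risk": accumulate notional per book.
                riskByBook.merge(t.book, t.notional, Double::sum);
            }
        }
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
            new Trade("FX", 100.0, 1_000L),
            new Trade("FX", 50.0, 1_500L),
            new Trade("RATES", 10.0, 61_000L));
        Map<String, Double> risk = new HashMap<>();
        mergeWindow(risk, trades, 0L, 60_000L); // first "run": window 0
        mergeWindow(risk, trades, 1L, 60_000L); // next "run": window 1
        System.out.println(risk);
    }
}
```

Each run touches only the newest partition, which is why the merged result can be written back and queried while earlier windows stay untouched.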
Querying detail & aggregated risk on demand 
1. Applications can use full MongoDB query API to 
access risk data and trade data 
2. Risk data can be indexed on multiple fields for fast 
access by multiple dimensions 
3. Hadoop jobs periodically apply incremental 
updates to risk data with no down time 
4. Interpolated / matrix risk can be computed on-the-fly 
Non-functional notes 
• System can support very high volume (10,000s 
or more queries per second) 
• Millisecond response times 
Trade Analytics & Reporting 
1. Impala provides full SQL access to all content in 
Hadoop 
2. Dashboards and Reporting frameworks deliver 
periodic information to consumers 
3. Breadth of data discovery / ad-hoc analysis tools 
can be brought to bear on all data in Hadoop 
Non-functional notes: 
• Lower query frequency 
• Full SQL query flexibility 
• Most queries / analysis yield value accessing large 
volumes of data (e.g. all events in the last 30 days 
– or 30 months) 
The Key To Success: It is One System 
MongoDB 
App(s) 
Hadoop 
MapReduce
Q&A 
buzz.moschetti@mongodb.com
#MongoDB 
Thank You 
Buzz Moschetti 
buzz.moschetti@mongodb.com

MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Kürzlich hochgeladen

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Using MongoDB + Hadoop Together

  • 1. #MongoDB Using MongoDB and Hadoop Together For Success Buzz Moschetti buzz.moschetti@mongodb.com Enterprise Architect, MongoDB
  • 2. Who is your Presenter? • Yes, I use “Buzz” on my business cards • Former Investment Bank Chief Architect at JPMorganChase and Bear Stearns before that • Over 25 years of designing and building systems • Big and small • Super-specialized to broadly useful in any vertical • “Traditional” to completely disruptive • Advocate of language leverage and strong factoring • Still programming – using emacs, of course
  • 3. Agenda • (Occasionally) Brutal Truths about Big Data • The Key To Success in Large Scale Data Management • Review of Directed Content Business Architecture • Technical Implementation Examples • Recommendation Capability • Realtime Trade / Position Risk • Q & A
  • 4. Truths • Clear definition of Big Data still maturing • Efficiently operationalizing Big Data is non-trivial • Developing, debugging, understanding MapReduce • Cluster monitoring & management, job scheduling/recovery • If you thought regular ETL Hell was bad…. • Big Data is not about math/set accuracy • The last 25000 items in a 25,497,612 set “don’t matter” • Big Data questions are best asked periodically • “Are we there yet?” • Realtime means … realtime
  • 5. It’s About The Functions, not the Terms DON’T ASK: • Is this an operations or an analytics problem? • Is this online or offline? • What query language should we use? • What is my integration strategy across tools? ASK INSTEAD: • Am I incrementally addressing data (esp. writes)? • Am I computing a precise answer or a trend? • Do I need to operate on this data in realtime? • What is my holistic architecture?
  • 6. Success in Big Data: MongoDB + Hadoop • Efficient Operationalization • Robust data movements • Clarity and fidelity of data movements • Designing for change • Analysis Feedback • Data computed in Hadoop integrated back into MongoDB
  • 7. What We’re Going to “Build” today Realtime Directed Content System • Based on what users click, “recommended” content is returned in addition to the target • The example is sector (manufacturing, financial services, retail) neutral • System dynamically updates behavior in response to user activity
  • 8. The Participants and Their Roles Directed Content System Customers Analysts/ Data Scientists Content Creators Management/ Strategy Operate on data to identify trends and develop tag domains Generate and tag content from a known domain of tags Make decisions based on trends and other summarized data Developers/ ProdOps Bring it all together: apps, SDLC, integration, etc.
  • 9. Priority #1: Maximizing User value Considerations/Requirements Maximize realtime user value and experience Provide management reporting and trend analysis Engineer for Day 2 agility on recommendation engine Provide scrubbed click history for customer Permit low-cost horizontal scaling Minimize technical integration Minimize technical footprint Use conventional and/or approved tools Provide a RESTful service layer …..
  • 10. The Architecture App(s) MongoDB Hadoop MapReduce
  • 11. Complementary Strengths App(s) MongoDB Hadoop MapReduce • Standard design paradigm (objects, tools, 3rd party products, IDEs, test drivers, skill pool, etc.) • Language flexibility (Java, C#, C++, Python, Scala, …) • Webscale deployment model • appservers, DMZ, monitoring • High performance rich shape CRUD • MapReduce design paradigm • Node deployment model • Very large set operations • Computationally intensive, longer duration • Read-dominated workload
  • 12. “Legacy” Approach: Somewhat unidirectional ETL App(s) MongoDB Hadoop MapReduce • Extract data from MongoDB and other sources nightly (or weekly) • Generate reports for people to read • Same pains as existing ETL: reconciliation, transformation, change management …
  • 13. Somewhat better approach ETL App(s) MongoDB Hadoop MapReduce ETL • Extract data from MongoDB and other sources nightly (or weekly) • Generate reports for people to read • Move important summary data back to MongoDB for consumption by apps • Still in ETL-dominated landscape
  • 14. …but the overall problem remains: • How to realtime integrate and operate upon both periodically generated data and realtime current data? • Lackluster integration between OLTP and Hadoop • It’s not just about the database: you need a realtime profile and profile update function
  • 15. The legacy problem in pseudocode
        onContentClick() {
            String[] tags = content.getTags();
            Resource[] r = f1(database, tags);
        }
    • Realtime intraday state not well-handled • Baselining is a different problem than click handling
  • 16. The Right Approach • Users have a specific Profile entity • The Profile captures trend analytics as baselining information • The Profile has per-tag “counters” that are updated with each interaction / click • Counters plus baselining are passed to fetch function • The fetch function itself could be dynamic!
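    The Profile idea above can be sketched in plain Java. This is only an illustration under assumptions: the class name `ProfileSketch`, the nested `TagState`, and the methods `recordClick` and `clickCount` are hypothetical and do not come from the talk. The sketch keeps a batch-computed baseline next to a realtime click history per tag, so a fetch function could be handed both.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names here are illustrative, not from the talk.
final class ProfileSketch {
    // Per-tag state: a batch-computed baseline plus the realtime click history.
    static final class TagState {
        List<Integer> baseline = new ArrayList<>();   // assumed to be written by the periodic Hadoop job
        final List<Instant> hist = new ArrayList<>(); // appended on every click
    }

    final String user;
    final Map<String, TagState> tags = new HashMap<>();

    ProfileSketch(String user) {
        this.user = user;
    }

    // Called from the click handler: record the click under every tag on the content.
    void recordClick(List<String> contentTags, Instant ts) {
        for (String tag : contentTags) {
            tags.computeIfAbsent(tag, t -> new TagState()).hist.add(ts);
        }
    }

    // The per-tag "counter" is simply the length of the click history.
    int clickCount(String tag) {
        TagState s = tags.get(tag);
        return s == null ? 0 : s.hist.size();
    }

    public static void main(String[] args) {
        ProfileSketch bob = new ProfileSketch("Bob");
        bob.recordClick(List.of("PETS"), Instant.parse("2014-06-02T12:00:00Z"));
        bob.recordClick(List.of("PETS", "SPICE"), Instant.parse("2014-06-02T12:02:00Z"));
        System.out.println("PETS=" + bob.clickCount("PETS") + " SPICE=" + bob.clickCount("SPICE"));
    }
}
```

    In a real deployment the Profile would be a MongoDB document updated in place on each click; the in-memory map here only stands in for that document.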
  • 17. 24 hours in the life of The System • Assume some content has been created and tagged • Two systematized tags: Pets & PowerTools
  • 18. Monday, 1:30AM EST App(s) MongoDB Hadoop MapReduce • Fetch all user Profiles from MongoDB; load into Hadoop • Or skip if using the MongoDB-Hadoop connector!
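  • 19. MongoDB-Hadoop MapReduce Example
        import java.io.IOException;
        import java.util.Date;
        import java.util.List;
        import java.util.Set;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.bson.BSONObject;

        public class ProfileMapper
                extends Mapper<Object, BSONObject, IntWritable, IntWritable> {
            @Override
            public void map(final Object pKey,
                            final BSONObject pValue,
                            final Context pContext)
                    throws IOException, InterruptedException {
                String user = (String) pValue.get("user");      // read but not used below
                Date d1 = (Date) pValue.get("lastUpdate");      // read but not used below
                // "tags" is a subdocument keyed by tag name (see the Profile slides).
                BSONObject tags = (BSONObject) pValue.get("tags");
                Set<String> keys = tags.keySet();
                int count = 0;
                for (String tag : keys) {
                    BSONObject entry = (BSONObject) tags.get(tag);
                    count += ((List<?>) entry.get("hist")).size();
                }
                // Guard against profiles that have no tags yet.
                int avg = keys.isEmpty() ? 0 : count / keys.size();
                pContext.write(new IntWritable(count), new IntWritable(avg));
            }
        }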
  • 20. MongoDB-Hadoop v1 (today) Hadoop MR Mapper v1 MongoDB-Hadoop • V1 adapter draws data directly from MongoDB • No ETL, scripts, change management, etc. • Storage optimized: NO data copies
  • 21. MongoDB-Hadoop v2 (soon) Hadoop MR Mapper HDFS • V2 flows data directly into HDFS via a special MongoDB secondary • No ETL, scripts, change management, etc. • Data is copied – but still one data fabric • Realtime data with snapshotting as an option
  • 22. Monday, 1:45AM EST App(s) MongoDB Hadoop MapReduce • Grind through all content data and user Profile data to produce: • Tags based on feature extraction (vs. creator-applied tags) • Trend baseline per user for tags Pets and PowerTools • Load Profiles with new baseline back into MongoDB
  • 23. Monday, 8AM EST App(s) MongoDB Hadoop MapReduce • User Bob logs in and Profile retrieved from MongoDB • Bob clicks on Content X which is already tagged as “Pets” • Bob has clicked on Pets tagged content many times • Adjust Profile for tag “Pets” and save back to MongoDB • Analysis = f(Profile) • Analysis can be “anything”; it is simply a result. It could trigger an ad, a compliance alert, etc.
  • 24. Monday, 8:02AM EST App(s) MongoDB Hadoop MapReduce • Bob clicks on Content Y which is already tagged as “Spices” • Spice is a new tag type for Bob • Adjust Profile for tag “Spices” and save back to MongoDB • Analysis = f(profile)
  • 25. Profile in Detail
        {
          user: "Bob",
          personalData: {
            zip: "10024",
            gender: "M"
          },
          tags: {
            PETS: { algo: "A4",
                    baseline: [0,0,10,4,1322,44,23, … ],
                    hist: [
                      { ts: datetime1, url: url1 },
                      { ts: datetime2, url: url2 } // 100 more
                    ]},
            SPICE: { hist: [
                      { ts: datetime3, url: url3 }
                    ]}
          }
        }
  • 26. Tag-based algorithm detail
        getRecommendedContent(profile, ["PETS", other]) {
            if algo for a tag available {
                filter = algo(profile, tag);
            }
            fetch N recommendations (filter);
        }

        A4(profile, tag) {
            weight = get tag ("PETS") global weighting;
            adjustForPersonalBaseline(weight, "PETS" baseline);
            if "PETS" clicked more than 2 times in past 10 mins
                then weight += 10;
            if "PETS" clicked more than 10 times in past 2 days
                then weight += 3;

            return new filter({"PETS", weight}, globals)
        }
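    The two recency rules in A4 can be written as a small runnable sketch. The thresholds (more than 2 clicks in 10 minutes adds 10, more than 10 clicks in 2 days adds 3) are from the slide; everything else is an assumption made for illustration. In particular, the slide does not say how `adjustForPersonalBaseline` works, so it is reduced here to a hypothetical additive `baselineAdjustment` parameter, and `A4Sketch`, `clicksWithin`, and `weight` are invented names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the A4 weighting rules; names and the additive
// baseline adjustment are assumptions, not the talk's implementation.
final class A4Sketch {
    // Number of clicks strictly newer than (now - window).
    static long clicksWithin(List<Instant> hist, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        return hist.stream().filter(ts -> ts.isAfter(cutoff)).count();
    }

    // Start from the global tag weight, apply the (assumed additive) personal
    // baseline adjustment, then apply the two recency boosts from the slide.
    static double weight(double globalWeight, double baselineAdjustment,
                         List<Instant> hist, Instant now) {
        double w = globalWeight + baselineAdjustment;
        if (clicksWithin(hist, now, Duration.ofMinutes(10)) > 2) {
            w += 10;
        }
        if (clicksWithin(hist, now, Duration.ofDays(2)) > 10) {
            w += 3;
        }
        return w;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-06-02T12:00:00Z");
        List<Instant> hist = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            hist.add(now.minus(Duration.ofMinutes(i))); // three clicks in the last few minutes
        }
        System.out.println(weight(1.0, 0.0, hist, now));
    }
}
```

    Both boosts can apply at once, since a burst of recent clicks also counts toward the two-day window.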
  • 27. Tuesday, 1AM EST App(s) MongoDB Hadoop MapReduce • Fetch all user Profiles from MongoDB; load into Hadoop • Or skip if using the MongoDB-Hadoop connector!
  • 28. Tuesday, 1:30AM EST App(s) MongoDB Hadoop MapReduce • Grind through all content data and user profile data to produce: • Tags based on feature extraction (vs. creator-applied tags) • Trend baseline for Pets and PowerTools and Spice • Data can be specific to individual or by group • Load new baselines back into MongoDB
  • 29. New Profile in Detail
        {
          user: "Bob",
          personalData: {
            zip: "10024",
            gender: "M"
          },
          tags: {
            PETS: { algo: "A4",
                    baseline: [0,4,10,4,1322,44,23, … ],
                    hist: [
                      { ts: datetime1, url: url1 },
                      { ts: datetime2, url: url2 } // 100 more
                    ]},
            SPICE: { baseline: [1],
                     hist: [
                       { ts: datetime3, url: url3 }
                     ]}
          }
        }
  • 30. Tuesday, 1:35AM EST App(s) MongoDB Hadoop MapReduce • Perform maintenance on user Profiles • Click history trimming (variety of algorithms) • “Dead tag” removal • Update of auxiliary reference data
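    The slide leaves the trimming and dead-tag policies open ("variety of algorithms"), so the following is only one possible sketch under stated assumptions: history is trimmed by keeping the newest N clicks, and a tag counts as "dead" when its most recent click is older than a cutoff. `ProfileMaintenanceSketch`, `trim`, and `maintain` are hypothetical names, and each history is assumed to be in ascending time order.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one maintenance pass over a Profile's tags.
final class ProfileMaintenanceSketch {
    // Keep only the newest maxEntries clicks (hist assumed oldest-first).
    static List<Instant> trim(List<Instant> hist, int maxEntries) {
        int from = Math.max(0, hist.size() - maxEntries);
        return new ArrayList<>(hist.subList(from, hist.size()));
    }

    // Drop "dead" tags (no click at or after cutoff) and trim the survivors.
    static Map<String, List<Instant>> maintain(Map<String, List<Instant>> tags,
                                               int maxEntries, Instant cutoff) {
        Map<String, List<Instant>> out = new HashMap<>();
        for (Map.Entry<String, List<Instant>> e : tags.entrySet()) {
            List<Instant> hist = e.getValue();
            boolean dead = hist.isEmpty() || hist.get(hist.size() - 1).isBefore(cutoff);
            if (!dead) {
                out.put(e.getKey(), trim(hist, maxEntries));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2014-06-01T00:00:00Z");
        Map<String, List<Instant>> tags = new HashMap<>();
        tags.put("PETS", List.of(t0, t0.plusSeconds(60), t0.plusSeconds(120)));
        tags.put("POWERTOOLS", List.of(t0.minusSeconds(86_400 * 30L)));
        System.out.println(maintain(tags, 2, t0.minusSeconds(86_400)).keySet());
    }
}
```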
  • 31. New Profile in Detail
        {
          user: "Bob",
          personalData: {
            zip: "10022",
            gender: "M"
          },
          tags: {
            PETS: { algo: "A4",
                    baseline: [ 1322,44,23, … ],
                    hist: [
                      { ts: datetime1, url: url1 } // 50 more
                    ]},
            SPICE: { algo: "Z1",
                     baseline: [1],
                     hist: [
                       { ts: datetime3, url: url3 }
                     ]}
          }
        }
  • 32. Feel free to run the baselining more frequently App(s) MongoDB Hadoop MapReduce … but avoid “Are We There Yet?”
  • 33. Nearterm / Realtime Questions & Actions With respect to the Customer: • What has Bob done over the past 24 hours? • Given an input, make a logic decision in 100ms or less With respect to the Provider: • What are all current users doing or looking at? • Can we nearterm correlate single events to shifts in behavior?
  • 34. Longterm/ Not Realtime Questions & Actions With respect to the Customer: • Any way to explain historic performance / actions? • What are recommendations for the future? With respect to the Provider: • Can we correlate multiple events from multiple sources over a long period of time to identify trends? • What is my entire customer base doing over 2 years? • Show me a time vs. aggregate tag hit chart • Slice and dice and aggregate tags vs. XYZ • What tags are trending up or down?
  • 35. Another Example: Realtime Risk Applications Trade Processing Risk Risk Service Calculation (Spark) Log trade activities Query trades Query Risk Risk Params Admin Analysis/ Reporting (Impala) OTHER HDFS DATA OTHER HDFS DATA
  • 36. Recording a trade Applications Trade Processing 1. Bank makes a trade 2. Trade sent to Trade Processing 3. Trade Processing writes trade to MongoDB 4. Realtime replicate trade to Hadoop/HDFS Non-functional notes: • High volume of data ingestion (10,000s or more events per second) • Durable storage of trade data • Store trade events across all asset classes 1 2 3 4
  • 37. Querying deal / trade / event data 1. Query on deal attributes (id, counterparty, asset class, termination date, notional amount, book) 2. MongoDB performs index-optimized query and Trade Processing assembles Deal/Trade/Event data into response packet 3. Return response packet to caller Non-functional notes: • System can support very high volume (10,000s or more queries per second) • Millisecond response times Applications 1 Trade Processing 2 3
  • 38. Updating intra-day risk data 1. Mirror of trade data already stored in HDFS Trade data partitioned into time windows 2. Signal/timer kicks off a “run” 3. Spark ingests new partition of trade data as RDD and calculates and merges risk data based on latest trade data 4. Risk data written directly to MongoDB and indexed and available for online queries / aggregations / applications logic Applications Risk Service 1 Risk Calculation (Spark) 2 4 3
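    Steps 1 and 3 above (partition trades into time windows, then merge each new partition into the running risk figures) can be sketched without Spark. This is a plain-Java illustration under assumptions: the `Trade` fields, the fixed window size, and treating "risk" as summed notional per book are placeholders invented here, not the risk model or the Spark job from the talk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: Trade, the window size, and "risk = summed notional"
// are illustrative placeholders only.
final class TradeWindowSketch {
    static final class Trade {
        final String book;
        final long epochMillis;
        final double notional;

        Trade(String book, long epochMillis, double notional) {
            this.book = book;
            this.epochMillis = epochMillis;
            this.notional = notional;
        }
    }

    // Start of the fixed-size window that contains the given timestamp.
    static long windowStart(long epochMillis, long windowMillis) {
        return epochMillis - Math.floorMod(epochMillis, windowMillis);
    }

    // Step 1: partition trades into time windows, keyed by window start.
    static TreeMap<Long, List<Trade>> partition(List<Trade> trades, long windowMillis) {
        TreeMap<Long, List<Trade>> windows = new TreeMap<>();
        for (Trade t : trades) {
            long key = windowStart(t.epochMillis, windowMillis);
            windows.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        }
        return windows;
    }

    // Step 3: fold one new partition into the running per-book figures.
    static void mergeRisk(Map<String, Double> riskByBook, List<Trade> newPartition) {
        for (Trade t : newPartition) {
            riskByBook.merge(t.book, t.notional, Double::sum);
        }
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade("A", 10_000L, 100.0),
                new Trade("A", 50_000L, 50.0),
                new Trade("B", 70_000L, 25.0));
        Map<String, Double> risk = new HashMap<>();
        for (List<Trade> window : partition(trades, 60_000L).values()) {
            mergeRisk(risk, window); // each "run" processes one new window
        }
        System.out.println(risk);
    }
}
```

    Because each run touches only the newest window, the merged figures can be written back to MongoDB incrementally, which matches step 4 of the slide.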
  • 39. Querying detail & aggregated risk on demand 1. Applications can use full MongoDB query API to access risk data and trade data 2. Risk data can be indexed on multiple fields for fast access by multiple dimensions 3. Hadoop jobs periodically apply incremental updates to risk data with no down time 4. Interpolated / matrix risk can be computed on-the-fly Non-functional notes • System can support very high volume (10,000s or more queries per second) • Millisecond response times Applications 1 Risk Service 2 3
  • 40. Trade Analytics & Reporting 1. Impala provides full SQL access to all content in Hadoop 2. Dashboards and Reporting frameworks deliver periodic information to consumers 3. Breadth of data discovery / ad-hoc analysis tools can be brought to bear on all data in Hadoop Non-functional notes: • Lower query frequency • Full SQL query flexibility • Most queries / analysis yield value accessing large volumes of data (e.g. all events in the last 30 days – or 30 months) Applications Impala Dashboards Reports Ad-hoc Analysis
  • 41. The Key To Success: It is One System MongoDB App(s) Hadoop MapReduce
  • 43. #MongoDB Thank You Buzz Moschetti buzz.moschetti@mongodb.com