Big Data: Examples and Guidelines for the Enterprise Decision Maker
Solutions Architect, MongoDB
Buzz Moschetti
buzz.moschetti@mongodb.com
#MongoDB
Who is your Presenter?
• Yes, I use “Buzz” on my business cards
• Former investment bank Chief Architect at JPMorgan Chase and, before that, Bear Stearns
• Over 25 years of designing and building systems
• Big and small
• Super-specialized to broadly useful in any vertical
• “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Still programming – using emacs, of course
Agenda
• (Occasionally) Brutal Truths about Big Data
• Review of Directed Content Business Architecture
• A Simple Technical Implementation
Truths
• A clear definition of Big Data is still maturing
• Efficiently operationalizing Big Data is non-trivial
• Developing, debugging, understanding MapReduce
• Cluster monitoring & management, job scheduling/recovery
• If you thought regular ETL Hell was bad…
• Big Data is not about math/set accuracy
• The last 25,000 items in a 25,497,612 set “don’t matter”
• Big Data questions are best asked periodically
• “Are we there yet?”
• Realtime means … realtime
It’s About The Functions, not the Terms
DON’T ASK:
• Is this an operations or an analytics problem?
• Is this online or offline?
• What query language should we use?
• What is my integration strategy across tools?
ASK INSTEAD:
• Am I incrementally addressing data (esp. writes)?
• Am I computing a precise answer or a trend?
• Do I need to operate on this data in realtime?
• What is my holistic architecture?
What We’re Going to “Build” today
Realtime Directed Content System
• Based on what users click, “recommended”
content is returned in addition to the target
• The example is sector-neutral (manufacturing, financial services, retail)
• System dynamically updates behavior in
response to user activity
The Participants and Their Roles
Roles around the Directed Content System:
• Customers
• Content Creators: generate and tag content from a known domain of tags
• Management/Strategy: make decisions based on trends and other summarized data
• Analysts/Data Scientists: operate on data to identify trends and develop tag domains
• Developers/ProdOps: bring it all together: apps, SDLC, integration, etc.
Priority #1: Maximizing User Value
Considerations/Requirements
• Maximize realtime user value and experience
• Provide management reporting and trend analysis
• Engineer for Day 2 agility on the recommendation engine
• Provide scrubbed click history for customers
• Permit low-cost horizontal scaling
• Minimize technical integration
• Minimize technical footprint
• Use conventional and/or approved tools
• Provide a RESTful service layer
• …
The Architecture
[Diagram: App(s), mongoDB, and Hadoop (MapReduce)]
Complementary Strengths
mongoDB + App(s):
• Standard design paradigm (objects, tools, 3rd party products, IDEs, test drivers, skill pool, etc.)
• Language flexibility (Java, C#, C++, Python, Scala, …)
• Webscale deployment model
• appservers, DMZ, monitoring
• High performance, rich shape CRUD
Hadoop + MapReduce:
• MapReduce design paradigm
• Node deployment model
• Very large set operations
• Computationally intensive, longer duration
• Read-dominated workload
“Legacy” Approach: Somewhat unidirectional
• Extract data from mongoDB and other sources nightly (or weekly)
• Run analytics
• Generate reports for people to read
• Where’s the feedback?
Somewhat better approach
• Extract data from mongoDB and other sources nightly (or weekly)
• Run analytics
• Generate reports for people to read
• Move important summary data back to mongoDB for consumption by apps
…but the overall problem remains:
• How do we integrate periodically generated data with current realtime data, and operate on both in realtime?
• Lackluster integration between OLTP and Hadoop
• It’s not just about the database: you need a realtime profile and a profile update function
The legacy problem in pseudocode
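// Legacy click handler: a stateless lookup keyed only on the content's tags
onContentClick() {
    String[] tags = content.getTags();
    Resource[] r = f1(database, tags);   // no per-user, intraday state is consulted
}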
• Realtime intraday state not well-handled
• Baselining is a different problem than click
handling
The Right Approach
• Users have a specific Profile entity
• The Profile captures trend analytics as baselining
information
• The Profile has per-tag “counters” that are updated
with each interaction / click
• Counters plus baselining are passed to the fetch function
• The fetch function itself could be dynamic!
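The Right Approach can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the presenter's implementation: `UserProfile`, `Recommender`, `onContentClick`, and the "default" weighting are hypothetical names, the fetch function returns a simple integer weight in place of actual recommendations, and each baseline is simplified to a single number rather than the array shown in the Profile document. What it shows is the order of operations: the click updates the realtime counter first, then the counter plus the batch-produced baseline are handed to a fetch function that is looked up by name on every call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical per-user Profile: realtime counters plus periodic baselines.
class UserProfile {
    final String user;
    // Per-tag counters, incremented on every click (realtime path).
    final Map<String, Integer> counters = new HashMap<>();
    // Per-tag baselines, simplified to one number; written by the batch job.
    final Map<String, Integer> baselines = new HashMap<>();

    UserProfile(String user) { this.user = user; }

    void recordClick(String tag) {
        counters.merge(tag, 1, Integer::sum);
    }
}

class Recommender {
    // Fetch functions are looked up by name at call time, so a new one can be
    // registered or swapped in without changing the click-handling code.
    final Map<String, BiFunction<UserProfile, String, Integer>> fetchFunctions = new HashMap<>();
    String activeFunction = "default";

    Recommender() {
        // Hypothetical default: clicks on the tag minus the user's baseline.
        fetchFunctions.put("default",
            (p, tag) -> p.counters.getOrDefault(tag, 0) - p.baselines.getOrDefault(tag, 0));
    }

    // Update the counter first, then pass counters + baseline to the fetch function.
    int onContentClick(UserProfile p, String tag) {
        p.recordClick(tag);
        return fetchFunctions.get(activeFunction).apply(p, tag);
    }
}
```

Because the function is resolved per call, registering a new entry in `fetchFunctions` and switching `activeFunction` changes behavior on the very next click.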
24 hours in the life of The System
• Assume some content has been created and tagged
• Two systematized tags: Pets & PowerTools
Monday, 1:30AM EST
• Fetch all user Profiles from mongoDB; load into Hadoop
• Or skip if using the mongoDB-Hadoop connector!
mongoDB-Hadoop MapReduce Example
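import java.io.IOException;
import java.util.Date;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;

// Reads each user Profile directly from mongoDB (via the mongoDB-Hadoop
// connector) and emits (total clicks across all tags, average clicks per tag).
public class ProfileMapper
    extends Mapper<Object, BSONObject, IntWritable, IntWritable>
{
    @Override
    public void map(final Object pKey,
                    final BSONObject pValue,
                    final Context pContext)
        throws IOException, InterruptedException {
        // user and lastUpdate are read but not used in this example
        String user = (String) pValue.get("user");
        Date d1 = (Date) pValue.get("lastUpdate");

        BSONObject tags = (BSONObject) pValue.get("tags");
        if (tags == null || tags.keySet().isEmpty()) {
            return;  // no tags yet; also avoids dividing by zero below
        }
        int count = 0;
        Set<String> keys = tags.keySet();
        for (String tag : keys) {
            // each tag is a sub-document whose "hist" array holds the clicks
            List<?> hist = (List<?>) ((BSONObject) tags.get(tag)).get("hist");
            count += (hist == null) ? 0 : hist.size();
        }
        int avg = count / keys.size();
        pContext.write(new IntWritable(count), new IntWritable(avg));
    }
}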
Monday, 1:45AM EST
• Grind through all content data and user Profile data to
produce:
• Tags based on feature extraction (vs. creator-applied
tags)
• Trend baseline per user for tags Pets and PowerTools
• Load Profiles with new baseline back into mongoDB
• Or skip if using the mongoDB-Hadoop connector!
Monday, 8AM EST
• User Bob logs in and his Profile is retrieved from mongoDB
• Bob clicks on Content X which is already tagged as “Pets”
• Bob has clicked on Pets tagged content many times
• Adjust Profile for tag “Pets” and save back to mongoDB
• Analysis = f(Profile)
• Analysis can be “anything”; it is simply a result. It could trigger
an ad, a compliance alert, etc.
Monday, 8:02AM EST
• Bob clicks on Content Y which is already tagged as “Spices”
• Spice is a new tag type for Bob
• Adjust Profile for tag “Spices” and save back to mongoDB
• Analysis = f(Profile)
Profile in Detail
{
  user: "Bob",
  personalData: {
    zip: "10024",
    gender: "M"
  },
  tags: {
    PETS: { algo: "A4",
            baseline: [0,0,10,4,1322,44,23, … ],
            hist: [
              { ts: datetime1, url: url1 },
              { ts: datetime2, url: url2 }  // 100 more
            ]},
    SPICE: { hist: [
               { ts: datetime3, url: url3 }
             ]}
  }
}
Tag-based algorithm detail
getRecommendedContent(profile, tags) {      // e.g. ["PETS", other]
    for each tag in tags {
        if an algo is available for the tag {
            filters += algo(profile, tag);
        }
    }
    fetch N recommendations (filters);
}
A4(profile, tag) {                          // e.g. tag = "PETS"
    weight = global weighting for tag;
    adjustForPersonalBaseline(weight, tag baseline);
    if tag clicked more than 2 times in past 10 mins
        then weight += 10;
    if tag clicked more than 10 times in past 2 days
        then weight += 3;
    return new filter({tag, weight}, globals)
}
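A Java rendering of the A4 weighting, as a sketch only: the slide does not define the global weighting or `adjustForPersonalBaseline`, so both are collapsed into two hypothetical integer inputs (`globalWeight`, `baselineAdjustment`). The tag's click history is assumed to be a list of `Instant` values taken from the `ts` field of the `hist` entries. Building the filter and fetching N recommendations are left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical Java rendering of the A4 pseudocode; not the original implementation.
class A4 {
    // Number of clicks whose timestamp lies in the interval (now - window, now].
    static long clicksWithin(List<Instant> hist, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        return hist.stream()
                   .filter(ts -> ts.isAfter(cutoff) && !ts.isAfter(now))
                   .count();
    }

    // globalWeight and baselineAdjustment stand in for the slide's undefined
    // "global weighting" and adjustForPersonalBaseline(...) steps.
    static int weight(int globalWeight, int baselineAdjustment,
                      List<Instant> hist, Instant now) {
        int weight = globalWeight + baselineAdjustment;
        if (clicksWithin(hist, now, Duration.ofMinutes(10)) > 2) {
            weight += 10;   // short burst of interest in this tag
        }
        if (clicksWithin(hist, now, Duration.ofDays(2)) > 10) {
            weight += 3;    // sustained interest over the past two days
        }
        return weight;
    }
}
```

Under this sketch, three PETS clicks in the last ten minutes raise a global weight of 5 to 15; the +3 applies only once more than ten clicks fall inside the two-day window.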
Tuesday, 1AM EST
• Fetch all user Profiles from mongoDB; load into Hadoop
• Or skip if using the mongoDB-Hadoop connector!
Tuesday, 1:30AM EST
• Grind through all content data and user profile data to
produce:
• Tags based on feature extraction (vs. creator-applied
tags)
• Trend baseline for Pets, PowerTools, and Spice
• Data can be specific to an individual or to a group
• Load baseline back into mongoDB
• Or skip if using the mongoDB-Hadoop connector!
New Profile in Detail
{
  user: "Bob",
  personalData: {
    zip: "10024",
    gender: "M"
  },
  tags: {
    PETS: { algo: "A4",
            baseline: [0,0,10,4,1322,44,23, … ],
            hist: [
              { ts: datetime1, url: url1 },
              { ts: datetime2, url: url2 }  // 100 more
            ]},
    SPICE: { baseline: [0],
             hist: [
               { ts: datetime3, url: url3 }
             ]}
  }
}
Tuesday, 1:35AM EST
• Perform maintenance on user Profiles
• Click history trimming (variety of algorithms)
• “Dead tag” removal
• Update of auxiliary reference data
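The slide leaves the trimming algorithms open ("variety of algorithms"). The sketch below assumes two simple, hypothetical policies: keep only the newest `maxHist` clicks per tag, and treat a tag as dead once its newest click is older than `deadAfter`. Each tag's history is modeled as a list of `Instant` timestamps rather than full `{ ts, url }` entries, and `ProfileMaintenance.maintain` is an illustrative name. `maxHist` is assumed to be non-negative.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical maintenance pass over one Profile's tag histories.
class ProfileMaintenance {
    static void maintain(Map<String, List<Instant>> histByTag, int maxHist,
                         Duration deadAfter, Instant now) {
        Instant deadCutoff = now.minus(deadAfter);
        Iterator<Map.Entry<String, List<Instant>>> it = histByTag.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<Instant>> entry = it.next();
            List<Instant> hist = new ArrayList<>(entry.getValue());
            hist.sort(null);                        // oldest first (natural ordering)
            // "Dead tag": no clicks at all, or the newest click is too old.
            if (hist.isEmpty() || hist.get(hist.size() - 1).isBefore(deadCutoff)) {
                it.remove();
                continue;
            }
            // Click history trimming: keep only the newest maxHist clicks.
            if (hist.size() > maxHist) {
                hist = new ArrayList<>(hist.subList(hist.size() - maxHist, hist.size()));
            }
            entry.setValue(hist);
        }
    }
}
```

The dead-tag check runs before trimming because it only looks at the newest click, which trimming never removes.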
New Profile in Detail
{
  user: "Bob",
  personalData: {
    zip: "10022",
    gender: "M"
  },
  tags: {
    PETS: { algo: "A4",
            baseline: [ 1322,44,23, … ],
            hist: [
              { ts: datetime1, url: url1 }  // 50 more
            ]},
    SPICE: { algo: "Z1",
             baseline: [0],
             hist: [
               { ts: datetime3, url: url3 }
             ]}
  }
}
Feel free to run the baselining more frequently
… but avoid “Are We There Yet?”
Near-term / Realtime Questions & Actions
With respect to the Customer:
• What has Bob done over the past 24 hours?
• Given an input, make a logic decision in 100ms or less
With respect to the Provider:
• What are all current users doing or looking at?
• Can we correlate single events to near-term shifts in behavior?
Long-term / Not Realtime Questions & Actions
With respect to the Customer:
• Is there any way to explain historic performance/actions?
• What are recommendations for the future?
With respect to the Provider:
• Can we correlate multiple events from multiple sources
over a long period of time to identify trends?
• What is my entire customer base doing over 2 years?
• Show me a time vs. aggregate tag hit chart
• Slice and dice and aggregate tags vs. XYZ
• What tags are trending up or down?
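The long-term provider questions, such as a time vs. aggregate tag hit chart or which tags are trending up or down, reduce to rolling click events up by tag and day. The sketch below is a hypothetical single-process illustration: `TagTrends`, `Click`, and both methods are assumed names, days are bucketed in UTC, and "trending" is simply a comparison of two days. At the scale described here the same grouping would run as a periodic MapReduce job rather than in application code.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical roll-up of click events into per-tag, per-day hit counts.
class TagTrends {
    static final class Click {
        final String tag;
        final Instant ts;
        Click(String tag, Instant ts) { this.tag = tag; this.ts = ts; }
    }

    // tag -> (UTC day -> hits); TreeMap keeps days in date order for charting.
    static Map<String, TreeMap<LocalDate, Integer>> dailyHits(List<Click> clicks) {
        Map<String, TreeMap<LocalDate, Integer>> out = new TreeMap<>();
        for (Click c : clicks) {
            LocalDate day = c.ts.atZone(ZoneOffset.UTC).toLocalDate();
            out.computeIfAbsent(c.tag, t -> new TreeMap<>()).merge(day, 1, Integer::sum);
        }
        return out;
    }

    // +1 if hits rose from `earlier` to `later`, -1 if they fell, 0 if flat.
    static int trend(Map<LocalDate, Integer> hits, LocalDate earlier, LocalDate later) {
        return Integer.signum(hits.getOrDefault(later, 0) - hits.getOrDefault(earlier, 0));
    }
}
```

Comparing two single days is deliberately crude; a real trend measure would more likely compare windows or fit a slope over the per-day series.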
The Key To Success: It is One System
[Diagram: mongoDB, Hadoop (MapReduce), and App(s)]
Webex Q&A
Thank You
Buzz Moschetti
buzz.moschetti@mongodb.com
#MongoDB

MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Kürzlich hochgeladen

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Kürzlich hochgeladen (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

Big Data: Guidelines and Examples for the Enterprise Decision Maker

  • 1. Big Data: Examples and Guidelines for the Enterprise Decision Maker. Buzz Moschetti, Solutions Architect, MongoDB (buzz.moschetti@mongodb.com) #MongoDB
  • 2. Who is your Presenter? • Yes, I use “Buzz” on my business cards • Former investment bank Chief Architect at JPMorgan Chase and, before that, Bear Stearns • Over 25 years of designing and building systems • Big and small • Super-specialized to broadly useful in any vertical • “Traditional” to completely disruptive • Advocate of language leverage and strong factoring • Still programming, using emacs of course
  • 3. Agenda • (Occasionally) Brutal Truths about Big Data • Review of Directed Content Business Architecture • A Simple Technical Implementation
  • 4. Truths • Clear definition of Big Data still maturing • Efficiently operationalizing Big Data is non-trivial • Developing, debugging, understanding MapReduce • Cluster monitoring & management, job scheduling/recovery • If you thought regular ETL Hell was bad… • Big Data is not about math/set accuracy • The last 25,000 items in a 25,497,612 set “don’t matter” • Big Data questions are best asked periodically • “Are we there yet?” • Realtime means … realtime
  • 5. It’s About The Functions, not the Terms DON’T ASK: • Is this an operations or an analytics problem? • Is this online or offline? • What query language should we use? • What is my integration strategy across tools? ASK INSTEAD: • Am I incrementally addressing data (esp. writes)? • Am I computing a precise answer or a trend? • Do I need to operate on this data in realtime? • What is my holistic architecture?
  • 6. What We’re Going to “Build” Today: Realtime Directed Content System • Based on what users click, “recommended” content is returned in addition to the target • The example is sector-neutral (manufacturing, financial services, retail) • The system dynamically updates its behavior in response to user activity
  • 7. The Participants and Their Roles [Diagram: Directed Content System at the center] • Customers • Content Creators: generate and tag content from a known domain of tags • Management/Strategy: make decisions based on trends and other summarized data • Analysts/Data Scientists: operate on data to identify trends and develop tag domains • Developers/ProdOps: bring it all together (apps, SDLC, integration, etc.)
  • 8. Priority #1: Maximizing User Value. Considerations/Requirements: • Maximize realtime user value and experience • Provide management reporting and trend analysis • Engineer for Day 2 agility on the recommendation engine • Provide scrubbed click history for the customer • Permit low-cost horizontal scaling • Minimize technical integration • Minimize technical footprint • Use conventional and/or approved tools • Provide a RESTful service layer • …
  • 10. Complementary Strengths [Diagram: App(s), mongoDB, Hadoop MapReduce] App(s) and mongoDB: • Standard design paradigm (objects, tools, 3rd-party products, IDEs, test drivers, skill pool, etc.) • Language flexibility (Java, C#, C++, Python, Scala, …) • Webscale deployment model • appservers, DMZ, monitoring • High-performance rich-shape CRUD. Hadoop MapReduce: • MapReduce design paradigm • Node deployment model • Very large set operations • Computationally intensive, longer duration • Read-dominated workload
  • 11. “Legacy” Approach: Somewhat unidirectional [Diagram: App(s), mongoDB, Hadoop MapReduce] • Extract data from mongoDB and other sources nightly (or weekly) • Run analytics • Generate reports for people to read • Where’s the feedback?
  • 12. Somewhat better approach [Diagram: App(s), mongoDB, Hadoop MapReduce] • Extract data from mongoDB and other sources nightly (or weekly) • Run analytics • Generate reports for people to read • Move important summary data back to mongoDB for consumption by apps.
  • 13. …but the overall problem remains: • How do we integrate and operate on, in realtime, both periodically generated data and current realtime data? • Lackluster integration between OLTP and Hadoop • It’s not just about the database: you need a realtime profile and a profile update function
  • 14. The Legacy Problem in Pseudocode:
      onContentClick() {
          String[] tags = content.getTags();
          Resource[] r = f1(database, tags);
      }
    • Realtime intraday state is not handled well • Baselining is a different problem from click handling
  • 15. The Right Approach • Users have a specific Profile entity • The Profile captures trend analytics as baselining information • The Profile has per-tag “counters” that are updated with each interaction/click • Counters plus baselining are passed to the fetch function • The fetch function itself could be dynamic!
  • 16. 24 Hours in the Life of The System • Assume some content has been created and tagged • Two systematized tags: Pets & PowerTools
  • 17. Monday, 1:30AM EST • Fetch all user Profiles from mongoDB; load into Hadoop • Or skip if using the mongoDB-Hadoop connector! [Diagram: App(s), mongoDB, Hadoop MapReduce]
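  • 18. mongoDB-Hadoop MapReduce Example:
      import java.io.IOException;
      import java.util.List;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.bson.BSONObject;

      public class ProfileMapper extends Mapper<Object, BSONObject, IntWritable, IntWritable> {
          @Override
          public void map(final Object pKey, final BSONObject pValue, final Context pContext)
                  throws IOException, InterruptedException {
              // Per-tag data lives in the Profile's "tags" subdocument
              BSONObject tags = (BSONObject) pValue.get("tags");
              if (tags == null || tags.keySet().isEmpty()) return; // avoids divide-by-zero
              int count = 0;
              for (String tag : tags.keySet()) {
                  List<?> hist = (List<?>) ((BSONObject) tags.get(tag)).get("hist");
                  if (hist != null) count += hist.size(); // clicks recorded for this tag
              }
              int avg = count / tags.keySet().size(); // average clicks per tag
              pContext.write(new IntWritable(count), new IntWritable(avg));
          }
      }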
  • 19. Monday, 1:45AM EST • Grind through all content data and user Profile data to produce: • Tags based on feature extraction (vs. creator-applied tags) • Trend baseline per user for tags Pets and PowerTools • Load Profiles with the new baseline back into mongoDB • Or skip if using the mongoDB-Hadoop connector! [Diagram: App(s), mongoDB, Hadoop MapReduce]
  • 20. Monday, 8AM EST • User Bob logs in and his Profile is retrieved from mongoDB • Bob clicks on Content X, which is already tagged as “Pets” • Bob has clicked on Pets-tagged content many times • Adjust the Profile for tag “Pets” and save it back to mongoDB • Analysis = f(Profile) • Analysis can be “anything”; it is simply a result. It could trigger an ad, a compliance alert, etc. [Diagram: App(s), mongoDB, Hadoop MapReduce]
  • 21. Monday, 8:02AM EST • Bob clicks on Content Y, which is already tagged as “Spices” • “Spices” is a new tag for Bob • Adjust the Profile for tag “Spices” and save it back to mongoDB • Analysis = f(Profile) [Diagram: App(s), mongoDB, Hadoop MapReduce]
  • 22. Profile in Detail:
      { user: "Bob",
        personalData: { zip: "10024", gender: "M" },
        tags: {
          PETS: { algo: "A4",
                  baseline: [0,0,10,4,1322,44,23, … ],
                  hist: [ { ts: datetime1, url: url1 },
                          { ts: datetime2, url: url2 }
                          // 100 more
                        ] },
          SPICE: { hist: [ { ts: datetime3, url: url3 } ] }
        }
      }
  • 23. Tag-based Algorithm Detail:
      getRecommendedContent(profile, ["PETS", other]) {
          if an algo is available for a tag {
              filter = algo(profile, tag);
          } else {
              filter = default algo(profile, tag);
          }
          fetch N recommendations (filter);
      }

      A4(profile, tag) {
          weight = get tag ("PETS") global weighting;
          adjustForPersonalBaseline(weight, "PETS" baseline);
          if "PETS" clicked more than 2 times in past 10 mins then weight += 10;
          if "PETS" clicked more than 10 times in past 2 days then weight += 3;
          return new filter({"PETS", weight}, globals);
      }
  • 24. Tuesday, 1AM EST [Diagram: App(s), mongoDB, Hadoop MapReduce] • Fetch all user Profiles from mongoDB; load into Hadoop • Or skip if using the mongoDB-Hadoop connector!
  • 25. Tuesday, 1:30AM EST • Grind through all content data and user Profile data to produce: • Tags based on feature extraction (vs. creator-applied tags) • Trend baseline for Pets, PowerTools, and Spice • Data can be specific to an individual or to a group • Load the baseline back into mongoDB • Or skip if using the mongoDB-Hadoop connector! [Diagram: App(s), mongoDB, Hadoop MapReduce]
  • 26. New Profile in Detail:
      { user: "Bob",
        personalData: { zip: "10024", gender: "M" },
        tags: {
          PETS: { algo: "A4",
                  baseline: [0,0,10,4,1322,44,23, … ],
                  hist: [ { ts: datetime1, url: url1 },
                          { ts: datetime2, url: url2 }
                          // 100 more
                        ] },
          SPICE: { baseline: [0],
                   hist: [ { ts: datetime3, url: url3 } ] }
        }
      }
  • 27. Tuesday, 1:35AM EST • Perform maintenance on user Profiles • Click history trimming (variety of algorithms) • “Dead tag” removal • Update of auxiliary reference data [Diagram: App(s), mongoDB, Hadoop MapReduce]
  • 28. New Profile in Detail:
      { user: "Bob",
        personalData: { zip: "10022", gender: "M" },
        tags: {
          PETS: { algo: "A4",
                  baseline: [ 1322,44,23, … ],
                  hist: [ { ts: datetime1, url: url1 }
                          // 50 more
                        ] },
          SPICE: { algo: "Z1",
                   baseline: [0],
                   hist: [ { ts: datetime3, url: url3 } ] }
        }
      }
  • 29. Feel free to run the baselining more frequently … but avoid “Are We There Yet?” [Diagram: App(s), mongoDB, Hadoop MapReduce]
  • 30. Near-term/Realtime Questions & Actions. With respect to the Customer: • What has Bob done over the past 24 hours? • Given an input, make a logic decision in 100ms or less. With respect to the Provider: • What are all current users doing or looking at? • Can we correlate single events to shifts in behavior in the near term?
  • 31. Long-term/Not Realtime Questions & Actions. With respect to the Customer: • Any way to explain historic performance/actions? • What are recommendations for the future? With respect to the Provider: • Can we correlate multiple events from multiple sources over a long period of time to identify trends? • What is my entire customer base doing over 2 years? • Show me a time vs. aggregate tag hit chart • Slice and dice and aggregate tags vs. XYZ • What tags are trending up or down?
  • 32. The Key To Success: It Is One System [Diagram: App(s), mongoDB, Hadoop MapReduce]
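As a quick sense of scale for slide 4’s claim that “the last 25,000 items in a 25,497,612 set don’t matter”, the ignored items amount to roughly a tenth of one percent of the set:

```latex
\frac{25{,}000}{25{,}497{,}612} \approx 0.00098 \approx 0.098\%
```

For trend questions, an error of that size is usually well below the noise; for an end-of-day close it would not be acceptable, which is the distinction the speaker notes draw.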
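The realtime click path on slides 20 to 22 (append the click to the tag’s history, create the tag entry if it is new, then save the Profile) can be sketched in plain Java. This is a minimal in-memory sketch, not the deck’s implementation: `Click`, `TagData`, `Profile`, `onContentClick` and `clickCount` are hypothetical stand-ins for the Profile document on slide 22, and saving back to mongoDB is left out. In mongoDB itself, the append would typically be a single `$push` on the field path `tags.<TAG>.hist`, which adds the array if the field does not exist yet.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-ins for the Profile document on slide 22.
final class Click {
    final Instant ts;
    final String url;

    Click(Instant ts, String url) {
        this.ts = ts;
        this.url = url;
    }
}

final class TagData {
    String algo;                                       // null until analysts assign one
    final List<Integer> baseline = new ArrayList<>();  // refreshed by the batch run
    final List<Click> hist = new ArrayList<>();        // appended on every click
}

final class Profile {
    final String user;
    final Map<String, TagData> tags = new LinkedHashMap<>();

    Profile(String user) {
        this.user = user;
    }

    // Realtime path: record the click under every tag on the clicked content.
    // A tag seen for the first time gets a fresh entry with no algo.
    void onContentClick(List<String> contentTags, String url, Instant ts) {
        for (String tag : contentTags) {
            tags.computeIfAbsent(tag, t -> new TagData()).hist.add(new Click(ts, url));
        }
    }

    int clickCount(String tag) {
        TagData data = tags.get(tag);
        return data == null ? 0 : data.hist.size();
    }
}
```

Because a new tag is created on its first click, “Spices” needs no special setup; it simply has no `algo` until analysts assign one.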
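The dispatch in `getRecommendedContent` on slide 23, together with the speaker note that a tag without an assigned algo (SPICE) falls back to the default weighting, amounts to a lookup from the algo name stored in the Profile to a weighting function. That is also one way to read slide 15’s “the fetch function itself could be dynamic”. A minimal sketch, assuming a hypothetical `AlgoRegistry` class and simplifying each algo to a function of the tag’s click count (real algos would also take the baseline and globals):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Hypothetical registry mapping the algo name stored in a Profile tag (e.g. "A4")
// to a weighting function. Each algo is simplified to a function of the click count.
final class AlgoRegistry {
    private final Map<String, IntToDoubleFunction> algos = new HashMap<>();
    private final IntToDoubleFunction defaultAlgo;

    AlgoRegistry(IntToDoubleFunction defaultAlgo) {
        this.defaultAlgo = defaultAlgo;
    }

    // New or replacement algos can be registered without touching click handling.
    void register(String name, IntToDoubleFunction algo) {
        algos.put(name, algo);
    }

    // A tag with no algo (null) or an unregistered name falls back to the default.
    double weight(String algoName, int clickCount) {
        IntToDoubleFunction f =
            algoName == null ? defaultAlgo : algos.getOrDefault(algoName, defaultAlgo);
        return f.applyAsDouble(clickCount);
    }
}
```

Keeping the algo name in the Profile data, rather than in code, lets the nightly batch run reassign algos (as it does for SPICE on slide 28) while the realtime path stays unchanged.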
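The two A4 rules on slide 23 translate fairly directly into code once a tag’s click history is available as timestamps. In this sketch the class `A4` and the methods `weight` and `clicksWithin` are hypothetical names, and `globalWeight` and `baselineAdjust` are placeholders for “get tag global weighting” and `adjustForPersonalBaseline`, which the deck does not define. Building the final filter and fetching N recommendations are left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

final class A4 {
    // Number of clicks between (now - window) and now, both ends inclusive.
    static long clicksWithin(List<Instant> hist, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        return hist.stream()
                   .filter(ts -> !ts.isBefore(cutoff) && !ts.isAfter(now))
                   .count();
    }

    // Weighting rules from slide 23. globalWeight and baselineAdjust are hypothetical
    // placeholders for the global tag weighting and adjustForPersonalBaseline.
    static double weight(List<Instant> hist, Instant now, double globalWeight,
                         DoubleUnaryOperator baselineAdjust) {
        double w = baselineAdjust.applyAsDouble(globalWeight);
        if (clicksWithin(hist, now, Duration.ofMinutes(10)) > 2) w += 10;  // short burst
        if (clicksWithin(hist, now, Duration.ofDays(2)) > 10) w += 3;      // sustained interest
        return w;
    }
}
```

The near-term rules read only the `hist` list kept in the Profile, so they can run on every click, while `baselineAdjust` is where the periodically computed baseline would come in.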
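The deck does not say what the numbers in the `baseline` array on slide 22 represent. One plausible reading, and it is only an assumption, is a per-day click count for the tag. Under that assumption, the per-Profile part of the nightly grind on slides 19 and 25 could look like the following sketch (`Baseline.dailyCounts` is a hypothetical name, and days are bucketed in UTC).

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.List;

final class Baseline {
    // Hypothetical baseline: clicks per UTC day for the `days` days ending on endDay,
    // oldest day first. Clicks outside that range are ignored.
    static int[] dailyCounts(List<Instant> clicks, LocalDate endDay, int days) {
        int[] counts = new int[days];
        LocalDate startDay = endDay.minusDays(days - 1);
        for (Instant ts : clicks) {
            LocalDate day = ts.atZone(ZoneOffset.UTC).toLocalDate();
            long index = ChronoUnit.DAYS.between(startDay, day);
            if (index >= 0 && index < days) {
                counts[(int) index]++;
            }
        }
        return counts;
    }
}
```

Because the computation is independent per Profile, it maps naturally onto a MapReduce job in which each map call handles one Profile, much like the mapper on slide 18.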
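Slide 27 lists click history trimming and “dead tag” removal, and slide 28 shows the effect (the PETS history cut back to about 50 entries and older baseline entries chopped off). The sketch below shows one simple rule for each; the retention count, the idle threshold and the names `ProfileMaintenance`, `keepNewest` and `removeDeadTags` are illustrative assumptions, and the deck itself notes that trimming can use a variety of algorithms.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ProfileMaintenance {
    // Keep only the newest `max` entries; the list is assumed to be oldest-first.
    // Works for click histories and baseline arrays alike.
    static <T> List<T> keepNewest(List<T> items, int max) {
        int from = Math.max(0, items.size() - max);
        return new ArrayList<>(items.subList(from, items.size()));
    }

    // Hypothetical "dead tag" rule: drop a tag with no clicks, or whose newest click
    // is older than maxIdle. Each history is assumed to be oldest-first.
    static void removeDeadTags(Map<String, List<Instant>> tagHist, Instant now, Duration maxIdle) {
        Instant cutoff = now.minus(maxIdle);
        tagHist.values().removeIf(h -> h.isEmpty() || h.get(h.size() - 1).isBefore(cutoff));
    }
}
```

In mongoDB, a similar history cap can also be applied at write time by combining `$push` with `$each` and a negative `$slice`, which leaves the nightly pass to handle rules that need the whole Profile, such as dead-tag removal.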

Editor's Notes

  1. Hello, this is Buzz Moschetti; welcome to the webinar entitled “Big Data…”. If your travel plans do not include Big Data, please exit the aircraft and see a customer agent. Today we are going to explore using mongoDB and Hadoop in a well-integrated way to solve a familiar but chronically thorny problem in the directed content space. We’ll cover the agenda in just a sec, but first some logistics: the presentation audio and slides will be recorded and made available to you in about 24 hours. We have an hour set up, but I’ll use about 40 minutes of that for the presentation, with some time for questions. You can use the WebEx Q&A box to ask questions at any time, but I will wait until the end of the presentation to address them. If you have technical issues, please send a WebEx message to the participant ID’d as mongoDB webinar team; otherwise, keep your Qs focused on the content.
  2. I am a fan of presentations that are useful after the presentation so you’ll see lots of text, code, etc.
  3. Clear definition: lots of terms, such as online and analytical. We speak of the Three Vs, and that’s good, but then what? Also: a Big Data platform MAY need to perform known, tuned operations on data AND also provide a sandbox for analysis and experimentation. For these two uses, the technology and performance/flexibility tradeoffs, not to mention SDLC controls, will likely be different. Operationalizing: one-off experiments do not equal a day-to-day production environment. Math/set accuracy: if you care about today’s EOD close for EMEA, that’s not Big Data. If you care about the latest price adjustment for widget X, that’s not Big Data. Realtime means millisecond response. Not 10 seconds. Not 2 seconds.
  4. DON’T ASK: Terms like online/offline are vague. Looking at integration strategy puts you on the path to creating islands of tech. The unfortunate part is that the tech might actually be OK, but the overall solution is impaired by second-class bridging schemes.
  5. Agenda item #2: The example: A realtime directed content system using mongoDB and Hadoop.
  6. Analysts may also develop machine learning and other approaches to auto tag content.
  7. Pretty simple, eh? Let’s start with the basics.
  8. High perf CRUD includes index-optimized queries, aggregation, etc.
  9. Still very batchy
  10. Traditional approaches tend to treat the OLTP and Hadoop sides of the house separately
  11. Chicken and egg. Pets & PowerTools: both together are a little scary (ha!), but it really doesn’t matter what the tags are. Taxonomy/ontology is related to, but independent of, the Big Data machinery in play here; that is the output of the Analysts/Data Scientists. By systematized we mean tags for which we know how we want to optimize behavior. Other tags can exist that are not systematized, and we’ll see an example of that.
  12. It’s Sunday night; we’re going to do the weekly trend baselining. The mongoDB-Hadoop connector speaks MapReduce on one side and the mongoDB driver API on the other. If you’re NOT running a 1000-node Hadoop cluster, chances are you can benefit significantly from the connector to eliminate ETL and manage a single “data lake”.
  13. Serving suggestion, but the important part is that the MapReduce job gets a rich BSONObject to work with, not a String[] cracked from a CSV! Useful for development; VITAL for Day 2 agility, because rich types can flow and be addressed by name, not position.
  14. IMPORTANT: We are saving the profile back to mongoDB! This is the realtime update component.
  15. Among 1,000,000 entries bigger than this, one takes 1ms to find. Updates run at thousands per second. personalData is there to assist algos. The ActivityProfile may or may not be co-mingled with the AccountProfile. The PETS tag has been hit a lot; SPICE is new, so there is no algo yet…
  16. Pseudocode! The point is that the A4 algo can flexibly deal with near-term data stored in the Profile hist list PLUS the per-user aggregated baseline PLUS system-wide globals.
  17. SPICE is not yet systematized by analysts, so no special algo is assigned; the default algo/weighting will be used. Later, analysts can change the nightly grind run.
  18. Chopped 4 entries off the PETS baseline and trimmed the hist list. Changed zip.
  19. In our example we ran baselining nightly. Running hourly or by the minute does not add significant information value across large data sets. Maybe weekly is better? Less burden? More time to observe the effects of changes to algos? The point is that the actions you take in realtime, near-term interaction with the system are different from those computed over huge data sets over long periods of time.