SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Analyze Yourselves
Agenda
Rationale
Tools
Schema Design
Instruction Set
Ola!
Norberto Leite
Technical Evangelist
Madrid, Spain
http://www.mongodb.com/norberto
@nleite
norberto@mongodb.com
Rationale
With all this big data stuff out there …
Things to consider
•  Data Handling
–  Processing
–  Storage
•  Which schema?
•  Data types to use?
•  Visualization
–  Access to data
–  Use Data
•  Usage
–  Enrichement
–  Actualization / Updates
–  Format Changes
How we can use our day-to-day data to
experiment different "bigdata" options
And all for fun … if your that kind of person
9
Feeds
Machine Data Twitter Feed Facebook Posts
scapy Implementation
Sniffer
TwitterAPI facebook-sdk
All out/inbound traffic
for the last hours
All tweets that match a
set of terms
All my personal posts
Tools
11
Tools
•  MongoDB
–  Standard query language
–  Aggregation Framework
•  Python
–  2.7.10 (yes I'm lagging behind!)
–  scapy
–  pymongo
–  TwitterAPI
–  facebook-sdk
–  Matplotlib
–  Ipython notebook
Schema Design
13
Different Approaches
•  Raw Data Collection
–  Individual Feed Collections
–  Global Feed Collections
•  Base Structured Documents
•  Time Series Model
•  Purpose Modeling
–  Read Oriented
–  Write Oriented
Raw Collections
db.network.findOne()
{
"_id": ObjectId("55fc4faf4cc75f4fa21b2f64"),
"src": "00:11:32:34:9a:b7",
"ip": {
"frag": NumberLong("0"),
"src": "192.168.1.45",
"proto": 6,
"tos": 0,
"dst": "192.168.1.39",
"chksum": 47515,
...
}
db.fb.findOne()
{
"_id": ObjectId("55fc4fa44cc75f4fa21b2de0"),
"picture": "https://fbcdn-photos-b-
a.akamaihd.net/hphotos-ak-xpf1/v/t1.0-0/
s130x130/11938079_10153567958826624_15
15311618300487358_n.jpg?
oh=0a59f8eebaea7536939c04e178fe8f29&oe
=56A52C83&__gda__=1453828245_72225acf
102eeeb4f4f02cb09d668ab9",
"story": "Norberto Leite updated his cover
photo.",
"likes": {
"paging": {
"cursors": {
...
}
db.twitter.findOne()
{
"_id":
ObjectId("55fe4d194cc75f0157a8c8b4"),
"contributors": null,
"truncated": false,
"text": "We compared #python vs #nodejs
see results: http://t.co/WVeOGWMR5V",
"in_reply_to_status_id": null,
"id": NumberLong("64547933684644659
"favorite_count": 0,
Raw Collections
Posi%ve	
   Not	
  So	
  Much	
  
Simple	
  Approach	
   Hard	
  to	
  Maintain	
  
Fast	
  to	
  Develop	
   More	
  logic	
  on	
  the	
  App	
  Layer	
  
Direct	
  Model	
  to	
  Service	
   Dependency	
  on	
  3rd	
  Party	
  Model	
  
Simple	
  direct	
  queries	
   More	
  complicated	
  to	
  Merge	
  
Results	
  
Single Raw Collection
db.raw.find()
{
"_id": ObjectId("55fe4d194cc75f0157a8c8b4"),
"contributors": null,
"truncated": false,
"text": "We compared #python vs #nodejs - see results: http://t.co/WVeOGWMR5V",
...
}
{
{
"_id": ObjectId("55fc4fa44cc75f4fa21b2de0"),
"picture": "https://fbcdn-photos-b-a.akamaihd.net/hphotos..."
...
{
"_id": ObjectId("55fc4faf4cc75f4fa21b2f64"),
"src": "00:11:32:34:9a:b7",
"ip": {
Single Raw
Posi%ve	
   Not	
  So	
  Much	
  
Single	
  Access	
  Point	
   Even	
  Harder	
  to	
  Maintain	
  
Same	
  development	
  speed	
   Loading	
  data	
  requires	
  Codecs	
  
to	
  be	
  done	
  well	
  
Faster	
  Access	
  to	
  Result	
  Set	
   More	
  complicated	
  to	
  Filter	
  
Results	
  
Semi-structure Collection
{
"_id": ObjectId("55fea46a4cc75f1848559476"),
"feed": "network",
…
]
},
"process_date": ISODate("2015-09-20T14:19:54.945Z"),
"type": 2048
}
Semi-structure Single Collection
Posi%ve	
   Not	
  So	
  Much	
  
Single	
  Access	
  Point	
   Needs	
  modeling	
  	
  
Common	
  Structure	
  to	
  all	
  data	
  
Faster	
  Access	
  to	
  Result	
  Set	
  
Single	
  "Shardable"	
  collecDon	
  
Time Series
21
Time Series
Positive Not So Much
Size Deterministic Discards Data
In-place Updates
Fast Operations – reads and
writes
Purpose Model
Purpose Model- Fan on Write
Purpose Model – Fan On Read
Instruction Set
26
Instruction Set Available
•  Standard CRUD Operations
–  Queries
–  Updates – "$set", "$inc", "$setOnInsert", "$upsert"
•  Aggregation Framework
–  Worst name ever for a framework!
•  Grouping
•  Project
•  Unwind
Takeways
28
Takeway
•  A good schema is crucial to the performance of your
system
–  Functional
–  Logical
•  Different usage of data will shape your Schema
•  Storage Engines will also be important
–  Different storage Engines perform different according
with workload
MongoDB Days 2015
5	
  November,	
  2015	
   London	
  
https://www.mongodb.com/events/mongodb-days-uk
Obrigado!
Norberto Leite
Technical Evangelist
norberto@mongodb.com
@nleite
Analyse Yourself

Weitere ähnliche Inhalte

Was ist angesagt?

Nodejs Explained with Examples
Nodejs Explained with ExamplesNodejs Explained with Examples
Nodejs Explained with ExamplesGabriele Lana
 
연구자 및 교육자를 위한 계산 및 분석 플랫폼 설계 - PyCon KR 2015
연구자 및 교육자를 위한 계산 및 분석 플랫폼 설계 - PyCon KR 2015연구자 및 교육자를 위한 계산 및 분석 플랫폼 설계 - PyCon KR 2015
연구자 및 교육자를 위한 계산 및 분석 플랫폼 설계 - PyCon KR 2015Jeongkyu Shin
 
N hidden gems in forge (as of may '17)
N hidden gems in forge (as of may '17)N hidden gems in forge (as of may '17)
N hidden gems in forge (as of may '17)Woonsan Ko
 
"Service Worker: Let Your Web App Feel Like a Native "
"Service Worker: Let Your Web App Feel Like a Native ""Service Worker: Let Your Web App Feel Like a Native "
"Service Worker: Let Your Web App Feel Like a Native "FDConf
 
Got Logs? Get Answers with Elasticsearch ELK - PuppetConf 2014
Got Logs? Get Answers with Elasticsearch ELK - PuppetConf 2014Got Logs? Get Answers with Elasticsearch ELK - PuppetConf 2014
Got Logs? Get Answers with Elasticsearch ELK - PuppetConf 2014Puppet
 
Building Your First Application with MongoDB
Building Your First Application with MongoDBBuilding Your First Application with MongoDB
Building Your First Application with MongoDBMongoDB
 
Introduction to node.js
Introduction to node.jsIntroduction to node.js
Introduction to node.jsjacekbecela
 
3 Things Everyone Knows About Node JS That You Don't
3 Things Everyone Knows About Node JS That You Don't3 Things Everyone Knows About Node JS That You Don't
3 Things Everyone Knows About Node JS That You Don'tF5 Buddy
 
HTML5 tutorial: canvas, offfline & sockets
HTML5 tutorial: canvas, offfline & socketsHTML5 tutorial: canvas, offfline & sockets
HTML5 tutorial: canvas, offfline & socketsRemy Sharp
 
Node.js introduction
Node.js introductionNode.js introduction
Node.js introductionParth Joshi
 
Intro to Node.js (v1)
Intro to Node.js (v1)Intro to Node.js (v1)
Intro to Node.js (v1)Chris Cowan
 
An Introduction to Tornado
An Introduction to TornadoAn Introduction to Tornado
An Introduction to TornadoGavin Roy
 
Even faster django
Even faster djangoEven faster django
Even faster djangoGage Tseng
 

Was ist angesagt? (20)

Nodejs Explained with Examples
Nodejs Explained with ExamplesNodejs Explained with Examples
Nodejs Explained with Examples
 
NodeJS
NodeJSNodeJS
NodeJS
 
연구자 및 교육자를 위한 계산 및 분석 플랫폼 설계 - PyCon KR 2015
연구자 및 교육자를 위한 계산 및 분석 플랫폼 설계 - PyCon KR 2015연구자 및 교육자를 위한 계산 및 분석 플랫폼 설계 - PyCon KR 2015
연구자 및 교육자를 위한 계산 및 분석 플랫폼 설계 - PyCon KR 2015
 
Selenium&scrapy
Selenium&scrapySelenium&scrapy
Selenium&scrapy
 
hacking with node.JS
hacking with node.JShacking with node.JS
hacking with node.JS
 
N hidden gems in forge (as of may '17)
N hidden gems in forge (as of may '17)N hidden gems in forge (as of may '17)
N hidden gems in forge (as of may '17)
 
"Service Worker: Let Your Web App Feel Like a Native "
"Service Worker: Let Your Web App Feel Like a Native ""Service Worker: Let Your Web App Feel Like a Native "
"Service Worker: Let Your Web App Feel Like a Native "
 
Got Logs? Get Answers with Elasticsearch ELK - PuppetConf 2014
Got Logs? Get Answers with Elasticsearch ELK - PuppetConf 2014Got Logs? Get Answers with Elasticsearch ELK - PuppetConf 2014
Got Logs? Get Answers with Elasticsearch ELK - PuppetConf 2014
 
Scrapy workshop
Scrapy workshopScrapy workshop
Scrapy workshop
 
Building Your First Application with MongoDB
Building Your First Application with MongoDBBuilding Your First Application with MongoDB
Building Your First Application with MongoDB
 
Introduction to node.js
Introduction to node.jsIntroduction to node.js
Introduction to node.js
 
3 Things Everyone Knows About Node JS That You Don't
3 Things Everyone Knows About Node JS That You Don't3 Things Everyone Knows About Node JS That You Don't
3 Things Everyone Knows About Node JS That You Don't
 
HTML5 tutorial: canvas, offfline & sockets
HTML5 tutorial: canvas, offfline & socketsHTML5 tutorial: canvas, offfline & sockets
HTML5 tutorial: canvas, offfline & sockets
 
Dancing with websocket
Dancing with websocketDancing with websocket
Dancing with websocket
 
Node.js introduction
Node.js introductionNode.js introduction
Node.js introduction
 
webworkers
webworkerswebworkers
webworkers
 
Intro to Node.js (v1)
Intro to Node.js (v1)Intro to Node.js (v1)
Intro to Node.js (v1)
 
An Introduction to Tornado
An Introduction to TornadoAn Introduction to Tornado
An Introduction to Tornado
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Even faster django
Even faster djangoEven faster django
Even faster django
 

Andere mochten auch

How To Get Hadoop App Intelligence with Driven
How To Get Hadoop App Intelligence with DrivenHow To Get Hadoop App Intelligence with Driven
How To Get Hadoop App Intelligence with DrivenCascading
 
OPENEXPO Madrid 2015 - Advanced Applications with MongoDB
OPENEXPO Madrid 2015 - Advanced Applications with MongoDB OPENEXPO Madrid 2015 - Advanced Applications with MongoDB
OPENEXPO Madrid 2015 - Advanced Applications with MongoDB MongoDB
 
Advanced applications with MongoDB
Advanced applications with MongoDBAdvanced applications with MongoDB
Advanced applications with MongoDBNorberto Leite
 
Advanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesAdvanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesTom Schreiber
 
Data Treatment MongoDB
Data Treatment MongoDBData Treatment MongoDB
Data Treatment MongoDBNorberto Leite
 
Geospatial and MongoDB
Geospatial and MongoDBGeospatial and MongoDB
Geospatial and MongoDBNorberto Leite
 
MongoDB on Financial Services Sector
MongoDB on Financial Services SectorMongoDB on Financial Services Sector
MongoDB on Financial Services SectorNorberto Leite
 
MongoDB Certification Study Group - May 2016
MongoDB Certification Study Group - May 2016MongoDB Certification Study Group - May 2016
MongoDB Certification Study Group - May 2016Norberto Leite
 
From Monolithic to Microservices in 45 Minutes
From Monolithic to Microservices in 45 MinutesFrom Monolithic to Microservices in 45 Minutes
From Monolithic to Microservices in 45 MinutesMongoDB
 
How Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDBHow Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDBMongoDB
 
Retail Reference Architecture
Retail Reference ArchitectureRetail Reference Architecture
Retail Reference ArchitectureMongoDB
 

Andere mochten auch (14)

How To Get Hadoop App Intelligence with Driven
How To Get Hadoop App Intelligence with DrivenHow To Get Hadoop App Intelligence with Driven
How To Get Hadoop App Intelligence with Driven
 
Data Distribution Theory
Data Distribution TheoryData Distribution Theory
Data Distribution Theory
 
OPENEXPO Madrid 2015 - Advanced Applications with MongoDB
OPENEXPO Madrid 2015 - Advanced Applications with MongoDB OPENEXPO Madrid 2015 - Advanced Applications with MongoDB
OPENEXPO Madrid 2015 - Advanced Applications with MongoDB
 
Advanced applications with MongoDB
Advanced applications with MongoDBAdvanced applications with MongoDB
Advanced applications with MongoDB
 
Advanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesAdvanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation Pipelines
 
Data Treatment MongoDB
Data Treatment MongoDBData Treatment MongoDB
Data Treatment MongoDB
 
MongoDB + Spring
MongoDB + SpringMongoDB + Spring
MongoDB + Spring
 
MongoDB and Python
MongoDB and PythonMongoDB and Python
MongoDB and Python
 
Geospatial and MongoDB
Geospatial and MongoDBGeospatial and MongoDB
Geospatial and MongoDB
 
MongoDB on Financial Services Sector
MongoDB on Financial Services SectorMongoDB on Financial Services Sector
MongoDB on Financial Services Sector
 
MongoDB Certification Study Group - May 2016
MongoDB Certification Study Group - May 2016MongoDB Certification Study Group - May 2016
MongoDB Certification Study Group - May 2016
 
From Monolithic to Microservices in 45 Minutes
From Monolithic to Microservices in 45 MinutesFrom Monolithic to Microservices in 45 Minutes
From Monolithic to Microservices in 45 Minutes
 
How Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDBHow Financial Services Organizations Use MongoDB
How Financial Services Organizations Use MongoDB
 
Retail Reference Architecture
Retail Reference ArchitectureRetail Reference Architecture
Retail Reference Architecture
 

Ähnlich wie Analyse Yourself

Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Jimmy Lai
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB
 
Building your first app with MongoDB
Building your first app with MongoDBBuilding your first app with MongoDB
Building your first app with MongoDBNorberto Leite
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherMongoDB
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02BIWUG
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointJoris Poelmans
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB
 
Accra MongoDB User Group
Accra MongoDB User GroupAccra MongoDB User Group
Accra MongoDB User GroupMongoDB
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data PipelineJesus Rodriguez
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB MongoDB
 
Large scale computing
Large scale computing Large scale computing
Large scale computing Bhupesh Bansal
 
A general introduction to Spring Data / Neo4J
A general introduction to Spring Data / Neo4JA general introduction to Spring Data / Neo4J
A general introduction to Spring Data / Neo4JFlorent Biville
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science PlatformQAware GmbH
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBMongoDB
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about SparkGiivee The
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData SolutionsTravis Oliphant
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbMongoDB APAC
 
Building an enterprise Natural Language Search Engine with ElasticSearch and ...
Building an enterprise Natural Language Search Engine with ElasticSearch and ...Building an enterprise Natural Language Search Engine with ElasticSearch and ...
Building an enterprise Natural Language Search Engine with ElasticSearch and ...Debmalya Biswas
 

Ähnlich wie Analyse Yourself (20)

Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
 
Building your first app with MongoDB
Building your first app with MongoDBBuilding your first app with MongoDB
Building your first app with MongoDB
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePoint
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
 
Accra MongoDB User Group
Accra MongoDB User GroupAccra MongoDB User Group
Accra MongoDB User Group
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 
Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB Webinar: Managing Real Time Risk Analytics with MongoDB
Webinar: Managing Real Time Risk Analytics with MongoDB
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 
Neo4j in Depth
Neo4j in DepthNeo4j in Depth
Neo4j in Depth
 
A general introduction to Spring Data / Neo4J
A general introduction to Spring Data / Neo4JA general introduction to Spring Data / Neo4J
A general introduction to Spring Data / Neo4J
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science Platform
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
Publishing Linked Data using Schema.org
Publishing Linked Data using Schema.orgPublishing Linked Data using Schema.org
Publishing Linked Data using Schema.org
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Building an enterprise Natural Language Search Engine with ElasticSearch and ...
Building an enterprise Natural Language Search Engine with ElasticSearch and ...Building an enterprise Natural Language Search Engine with ElasticSearch and ...
Building an enterprise Natural Language Search Engine with ElasticSearch and ...
 

Mehr von Norberto Leite

Data Modelling for MongoDB - MongoDB.local Tel Aviv
Data Modelling for MongoDB - MongoDB.local Tel AvivData Modelling for MongoDB - MongoDB.local Tel Aviv
Data Modelling for MongoDB - MongoDB.local Tel AvivNorberto Leite
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger InternalsNorberto Leite
 
MongoDB 3.2 Feature Preview
MongoDB 3.2 Feature PreviewMongoDB 3.2 Feature Preview
MongoDB 3.2 Feature PreviewNorberto Leite
 
MongoDB: Agile Combustion Engine
MongoDB: Agile Combustion EngineMongoDB: Agile Combustion Engine
MongoDB: Agile Combustion EngineNorberto Leite
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity PlanningNorberto Leite
 
Strongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible SchemasStrongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible SchemasNorberto Leite
 
Effectively Deploying MongoDB on AEM
Effectively Deploying MongoDB on AEMEffectively Deploying MongoDB on AEM
Effectively Deploying MongoDB on AEMNorberto Leite
 
Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0Norberto Leite
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know Norberto Leite
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity PlanningNorberto Leite
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichNorberto Leite
 

Mehr von Norberto Leite (20)

Data Modelling for MongoDB - MongoDB.local Tel Aviv
Data Modelling for MongoDB - MongoDB.local Tel AvivData Modelling for MongoDB - MongoDB.local Tel Aviv
Data Modelling for MongoDB - MongoDB.local Tel Aviv
 
Avoid Query Pitfalls
Avoid Query PitfallsAvoid Query Pitfalls
Avoid Query Pitfalls
 
MongoDB and Spark
MongoDB and SparkMongoDB and Spark
MongoDB and Spark
 
Mongo db 3.4 Overview
Mongo db 3.4 OverviewMongo db 3.4 Overview
Mongo db 3.4 Overview
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
MongoDB 3.2 Feature Preview
MongoDB 3.2 Feature PreviewMongoDB 3.2 Feature Preview
MongoDB 3.2 Feature Preview
 
Mongodb Spring
Mongodb SpringMongodb Spring
Mongodb Spring
 
MongoDB on Azure
MongoDB on AzureMongoDB on Azure
MongoDB on Azure
 
MongoDB: Agile Combustion Engine
MongoDB: Agile Combustion EngineMongoDB: Agile Combustion Engine
MongoDB: Agile Combustion Engine
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Spark and MongoDB
Spark and MongoDBSpark and MongoDB
Spark and MongoDB
 
Python and MongoDB
Python and MongoDB Python and MongoDB
Python and MongoDB
 
Strongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible SchemasStrongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible Schemas
 
Effectively Deploying MongoDB on AEM
Effectively Deploying MongoDB on AEMEffectively Deploying MongoDB on AEM
Effectively Deploying MongoDB on AEM
 
MongoDB Ops Manager
MongoDB Ops ManagerMongoDB Ops Manager
MongoDB Ops Manager
 
Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
 

Kürzlich hochgeladen

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Kürzlich hochgeladen (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Analyse Yourself

  • 1.
  • 4. Ola! Norberto Leite Technical Evangelist Madrid, Spain http://www.mongodb.com/norberto @nleite norberto@mongodb.com
  • 6. With all this big data stuff out there …
  • 7. Things to consider •  Data Handling –  Processing –  Storage •  Which schema? •  Data types to use? •  Visualization –  Access to data –  Use Data •  Usage –  Enrichement –  Actualization / Updates –  Format Changes
  • 8. How we can use our day-to-day data to experiment different "bigdata" options And all for fun … if your that kind of person
  • 9. 9 Feeds Machine Data Twitter Feed Facebook Posts scapy Implementation Sniffer TwitterAPI facebook-sdk All out/inbound traffic for the last hours All tweets that match a set of terms All my personal posts
  • 10. Tools
  • 11. 11 Tools •  MongoDB –  Standard query language –  Aggregation Framework •  Python –  2.7.10 (yes I'm lagging behind!) –  scapy –  pymongo –  TwitterAPI –  facebook-sdk –  Matplotlib –  Ipython notebook
  • 13. 13 Different Approaches •  Raw Data Collection –  Individual Feed Collections –  Global Feed Collections •  Base Structured Documents •  Time Series Model •  Purpose Modeling –  Read Oriented –  Write Oriented
  • 14. Raw Collections db.network.findOne() { "_id": ObjectId("55fc4faf4cc75f4fa21b2f64"), "src": "00:11:32:34:9a:b7", "ip": { "frag": NumberLong("0"), "src": "192.168.1.45", "proto": 6, "tos": 0, "dst": "192.168.1.39", "chksum": 47515, ... } db.fb.findOne() { "_id": ObjectId("55fc4fa44cc75f4fa21b2de0"), "picture": "https://fbcdn-photos-b- a.akamaihd.net/hphotos-ak-xpf1/v/t1.0-0/ s130x130/11938079_10153567958826624_15 15311618300487358_n.jpg? oh=0a59f8eebaea7536939c04e178fe8f29&oe =56A52C83&__gda__=1453828245_72225acf 102eeeb4f4f02cb09d668ab9", "story": "Norberto Leite updated his cover photo.", "likes": { "paging": { "cursors": { ... } db.twitter.findOne() { "_id": ObjectId("55fe4d194cc75f0157a8c8b4"), "contributors": null, "truncated": false, "text": "We compared #python vs #nodejs see results: http://t.co/WVeOGWMR5V", "in_reply_to_status_id": null, "id": NumberLong("64547933684644659 "favorite_count": 0,
  • 15. Raw Collections Posi%ve   Not  So  Much   Simple  Approach   Hard  to  Maintain   Fast  to  Develop   More  logic  on  the  App  Layer   Direct  Model  to  Service   Dependency  on  3rd  Party  Model   Simple  direct  queries   More  complicated  to  Merge   Results  
  • 16. Single Raw Collection db.raw.find() { "_id": ObjectId("55fe4d194cc75f0157a8c8b4"), "contributors": null, "truncated": false, "text": "We compared #python vs #nodejs - see results: http://t.co/WVeOGWMR5V", ... } { { "_id": ObjectId("55fc4fa44cc75f4fa21b2de0"), "picture": "https://fbcdn-photos-b-a.akamaihd.net/hphotos..." ... { "_id": ObjectId("55fc4faf4cc75f4fa21b2f64"), "src": "00:11:32:34:9a:b7", "ip": {
  • 17. Single Raw Posi%ve   Not  So  Much   Single  Access  Point   Even  Harder  to  Maintain   Same  development  speed   Loading  data  requires  Codecs   to  be  done  well   Faster  Access  to  Result  Set   More  complicated  to  Filter   Results  
  • 18. Semi-structure Collection { "_id": ObjectId("55fea46a4cc75f1848559476"), "feed": "network", … ] }, "process_date": ISODate("2015-09-20T14:19:54.945Z"), "type": 2048 }
  • 19. Semi-structure Single Collection Posi%ve   Not  So  Much   Single  Access  Point   Needs  modeling     Common  Structure  to  all  data   Faster  Access  to  Result  Set   Single  "Shardable"  collecDon  
  • 21. 21 Time Series Positive Not So Much Size Deterministic Discards Data In-place Updates Fast Operations – reads and writes
  • 23. Purpose Model- Fan on Write
  • 24. Purpose Model – Fan On Read
  • 26. 26 Instruction Set Available •  Standard CRUD Operations –  Queries –  Updates – "$set", "$inc", "$setOnInsert", "$upsert" •  Aggregation Framework –  Worst name ever for a framework! •  Grouping •  Project •  Unwind
  • 28. 28 Takeway •  A good schema is crucial to the performance of your system –  Functional –  Logical •  Different usage of data will shape your Schema •  Storage Engines will also be important –  Different storage Engines perform different according with workload
  • 29. MongoDB Days 2015 5  November,  2015   London   https://www.mongodb.com/events/mongodb-days-uk