SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Shafaq Abdullah
Principal, Zenprise
   MongoDB features and Architecture
   Business Intelligence: Overview
   Model of Data using SQL vs NoSQL
   Concept of MapReduce
   Real-world use-case of Business Analytics
   Scalability
   Conclusions
   Key-value
   Column
   Document-based
   Graph
   Schema less data model
   Adhoc Query
   Scalability
   High Availability
   Speed
consistency    Availability




         Partition
        Tolerance
http://www.mongodb.org
Business + smart information =
                Business Intelligence




  Consists of
   querying,
reporting, and
 analytics for
  businesses
                                    Enable
                                  business to
                                  make smart
                                  decision to
                                   execute
   Modelization of Data in SQL
       A 1-many relation of node (id, value) with
    other nodes related by two different relations


      Node                     Relation

       id                      id_node1
       value
                               id_node2
NoSQL Modelization mapped on Relational
Database Modelization



  Node                   Relation
                         c          c
   id                   _id
   value
                        value 1
                        value2
   Using Complex Type Attributes to Model data



                      Nodes

                 _id
                 valued
                 relations []
                                value 1
                                value 2
                                 …
                                value n
   No join operation required

   Nesting of documents, resulting in
    instantaneous access to retrieve nodes/

   Supporting agile method of programming

   Schema flexible adaptive to changing
    business needs
   MapReduce
    ◦ Programming model for managing large amount of
      data in a parallel fashion

    ◦ Map : Processing of a data list to create key/value
      pairs

    ◦ Reduce: Porcess above pair to create new
      aggregated key/value pairs
map(k1, v1) = list(k2,v2)

 reduce(k1, list(v2)) = list(v3)



List : (a; 2) (a; 4)(b; 4)(b; 2)(a;1)(c;5)

Map: (a;[2, 4, 1]), (b;[4,2]), (c,[5])

Reduce: (a;7), (b;6),(c;5)
   Problem


The task is to find the 2 closest
cities in each country

Given that earth as 2D plane

The distance b/w P1 (x1,y1) and
P2 (x2,y2) is computed as
Square-Root of { (x1-x2)2 + (y1-
y2)2 }
Computations   Groups
create view city_dist as
select c1.CountryID,
       c1.CityId, c1.City,
        c2.CityId as CityId2, c2.City as City2,    dist(c1,c2) as
Dist from cities c1 inner join cities c2
where c1.CountryID = c2.CountryID
and c1.CityId < c2.CityId
       /* Calculate distance between 2 cities only once */
select city_dist.* from
( select CountryID, min(Dist) as MinDist from
city_dist where Dist > 0 /* Avoid cities which
share Latitude & Longitude */ group by
CountryID ) a inner join city_dist on
a.CountryID = city_dist.CountryID and
a.MinDist = city_dist.Dist;
Map   Reduce   Finalize
function MapCode() {
emit(this.CountryID, {
     "data":
     [ { "city": this.City, "lat": this.Latitude, "lon":
   this.Longitude } ]
});
 }
function ReduceCode(key, values) {
 var reduced = {"data":[]};
 for (var i in values) {
      var inter = values[i];
             for (var j in inter.data) {

     reduced.data.push(inter.data[j]);
     }
 } return reduced;
function Finalize(key, reduced) {
   if (reduced.data.length == 1) { return { "message" : "This Country contains only 1 City" };
   }
      var min_dist = 999999999999;
   var city1 = { "name": "" };
   var city2 = { "name": "" };
      var c1; var c2; var d;
   for (var i in reduced.data) {
           for (var j in reduced.data) {
                         if (i>=j) continue;
                         c1 = reduced.data[i];
                         c2 = reduced.data[j];
                         d = Dist(c1,c2)
                         if (d < min_dist && d > 0) {
                                      min_dist = d; city1 = c1; city2 = c2;
                         }
           }
      }
   return {"city1": city1.name, "city2": city2.name, "dist": min_dist};
 }
   Shard!
   For write intensive, increase number of
    shards

   For read intensive, increase number of
    replica-sets within shards

   Best Read performance : Data in Shard
    breadth in memory
   MongoDB has application well-suited for BI

   Schemaless Model allows high-performance

   Replication with Sharding allows Scaling out
   http://www.mongodb.org/
   http://www.jaspersoft.com/
   R. Cattell. Scalable SQL and NoSQL Data
    Stores.http://www.cattell.net/datastores/Dat
    astores.pdf
   C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R.
    Bradski, A. Y.Ng, and K. Olukotun. Map-
    reduce for machine learning on multicore. In
    NIPS, pages 281–288, 2006.
   http://www.mongovue.com/2010/11/03/yet
    -another-mongodb-map-reduce-tutorial/

Weitere ähnliche Inhalte

Ähnlich wie MongoDB MapReduce Business Intelligence

Map reduce
Map reduceMap reduce
Map reducexydii
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingcoolmirza143
 
MongoDB Distilled
MongoDB DistilledMongoDB Distilled
MongoDB Distilledb0ris_1
 
Understanding Connected Data through Visualization
Understanding Connected Data through VisualizationUnderstanding Connected Data through Visualization
Understanding Connected Data through VisualizationSebastian Müller
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Sameer Farooqui
 
Scala meetup - Intro to spark
Scala meetup - Intro to sparkScala meetup - Intro to spark
Scala meetup - Intro to sparkJavier Arrieta
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersAbhishek Singh
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkDatabricks
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAprithan
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Updatevithakur
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...Adrian Florea
 
UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
UMLtoGraphDB: Mapping Conceptual Schemas to Graph DatabasesUMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
UMLtoGraphDB: Mapping Conceptual Schemas to Graph DatabasesGwendal Daniel
 
MongoDB Live Hacking
MongoDB Live HackingMongoDB Live Hacking
MongoDB Live HackingTobias Trelle
 
“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”inventionjournals
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Spark Summit
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingNesreen K. Ahmed
 
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET Journal
 
2015 02-09 - NoSQL Vorlesung Mosbach
2015 02-09 - NoSQL Vorlesung Mosbach2015 02-09 - NoSQL Vorlesung Mosbach
2015 02-09 - NoSQL Vorlesung MosbachJohannes Hoppe
 

Ähnlich wie MongoDB MapReduce Business Intelligence (20)

Map reduce
Map reduceMap reduce
Map reduce
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
 
MongoDB Distilled
MongoDB DistilledMongoDB Distilled
MongoDB Distilled
 
Understanding Connected Data through Visualization
Understanding Connected Data through VisualizationUnderstanding Connected Data through Visualization
Understanding Connected Data through Visualization
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
 
Scala meetup - Intro to spark
Scala meetup - Intro to sparkScala meetup - Intro to spark
Scala meetup - Intro to spark
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Update
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
 
UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
UMLtoGraphDB: Mapping Conceptual Schemas to Graph DatabasesUMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
 
Big data
Big dataBig data
Big data
 
MongoDB Live Hacking
MongoDB Live HackingMongoDB Live Hacking
MongoDB Live Hacking
 
“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
 
2015 02-09 - NoSQL Vorlesung Mosbach
2015 02-09 - NoSQL Vorlesung Mosbach2015 02-09 - NoSQL Vorlesung Mosbach
2015 02-09 - NoSQL Vorlesung Mosbach
 

Kürzlich hochgeladen

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

MongoDB MapReduce Business Intelligence

  • 2. MongoDB features and Architecture  Business Intelligence: Overview  Model of Data using SQL vs NoSQL  Concept of MapReduce  Real-world use-case of Business Analytics  Scalability  Conclusions
  • 3. Key-value  Column  Document-based  Graph
  • 4. Schema less data model  Adhoc Query  Scalability  High Availability  Speed
  • 5. consistency Availability Partition Tolerance
  • 7. Business + smart information = Business Intelligence Consists of querying, reporting, and analytics for businesses Enable business to make smart decision to execute
  • 8. Modelization of Data in SQL A 1-many relation of node (id, value) with other nodes related by two different relations Node Relation id id_node1 value id_node2
  • 9. NoSQL Modelization mapped on Relational Database Modelization Node Relation c c id _id value value 1 value2
  • 10. Using Complex Type Attributes to Model data Nodes _id valued relations [] value 1 value 2 … value n
  • 11. No join operation required  Nesting of documents, resulting in instantaneous access to retrieve nodes/  Supporting agile method of programming  Schema flexible adaptive to changing business needs
  • 12. MapReduce ◦ Programming model for managing large amount of data in a parallel fashion ◦ Map : Processing of a data list to create key/value pairs ◦ Reduce: Porcess above pair to create new aggregated key/value pairs
  • 13. map(k1, v1) = list(k2,v2) reduce(k1, list(v2)) = list(v3) List : (a; 2) (a; 4)(b; 4)(b; 2)(a;1)(c;5) Map: (a;[2, 4, 1]), (b;[4,2]), (c,[5]) Reduce: (a;7), (b;6),(c;5)
  • 14.
  • 15. Problem The task is to find the 2 closest cities in each country Given that earth as 2D plane The distance b/w P1 (x1,y1) and P2 (x2,y2) is computed as Square-Root of { (x1-x2)2 + (y1- y2)2 }
  • 16.
  • 17. Computations Groups
  • 18. create view city_dist as select c1.CountryID, c1.CityId, c1.City, c2.CityId as CityId2, c2.City as City2, dist(c1,c2) as Dist from cities c1 inner join cities c2 where c1.CountryID = c2.CountryID and c1.CityId < c2.CityId /* Calculate distance between 2 cities only once */
  • 19. select city_dist.* from ( select CountryID, min(Dist) as MinDist from city_dist where Dist > 0 /* Avoid cities which share Latitude & Longitude */ group by CountryID ) a inner join city_dist on a.CountryID = city_dist.CountryID and a.MinDist = city_dist.Dist;
  • 20.
  • 21. Map Reduce Finalize
  • 22. function MapCode() { emit(this.CountryID, { "data": [ { "city": this.City, "lat": this.Latitude, "lon": this.Longitude } ] }); }
  • 23.
  • 24. function ReduceCode(key, values) { var reduced = {"data":[]}; for (var i in values) { var inter = values[i]; for (var j in inter.data) { reduced.data.push(inter.data[j]); } } return reduced;
  • 25. function Finalize(key, reduced) { if (reduced.data.length == 1) { return { "message" : "This Country contains only 1 City" }; } var min_dist = 999999999999; var city1 = { "name": "" }; var city2 = { "name": "" }; var c1; var c2; var d; for (var i in reduced.data) { for (var j in reduced.data) { if (i>=j) continue; c1 = reduced.data[i]; c2 = reduced.data[j]; d = Dist(c1,c2) if (d < min_dist && d > 0) { min_dist = d; city1 = c1; city2 = c2; } } } return {"city1": city1.name, "city2": city2.name, "dist": min_dist}; }
  • 26. Shard!  For write intensive, increase number of shards  For read intensive, increase number of replica-sets within shards  Best Read performance : Data in Shard breadth in memory
  • 27.
  • 28. MongoDB has application well-suited for BI  Schemaless Model allows high-performance  Replication with Sharding allows Scaling out
  • 29. http://www.mongodb.org/  http://www.jaspersoft.com/  R. Cattell. Scalable SQL and NoSQL Data Stores.http://www.cattell.net/datastores/Dat astores.pdf  C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y.Ng, and K. Olukotun. Map- reduce for machine learning on multicore. In NIPS, pages 281–288, 2006.  http://www.mongovue.com/2010/11/03/yet -another-mongodb-map-reduce-tutorial/