Big Data Project on Crystal Ball
Submitted by:
Sushil Sedai (984474)
Suvash Shah (984461)
Submitted to:
Prof. Prem Nair
Pair approach (Mapper) – pseudo code
method map(docid id, doc d)
    for each term w in doc d do
        total = 0;
        for each neighbor u in Neighbor(w) do
            Emit(Pair(w, u), 1);
            total++;
        Emit(Pair(w, *), total);
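The mapper pseudocode above can be sketched in plain Java without any Hadoop dependency. This is only an illustration, not the Java listing from the slides. It assumes that Neighbor(w) means every term after an occurrence of w up to, but not including, the next occurrence of w (the slides do not define it, but the sample outputs are consistent with this reading). The class name `PairMapperSketch` and the tab-separated record format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PairMapperSketch {
    // Assumed definition of Neighbor(w) for the occurrence at index i:
    // all later terms up to (not including) the next occurrence of the same term.
    static List<String> neighbors(String[] terms, int i) {
        List<String> result = new ArrayList<>();
        for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
            result.add(terms[j]);
        }
        return result;
    }

    // Mirrors map(): collects the emitted records as "w,u<TAB>count" strings
    // instead of writing them to a Hadoop context.
    static List<String> map(String line) {
        String[] terms = line.trim().split("\\s+");
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            int total = 0;
            for (String u : neighbors(terms, i)) {
                emitted.add(terms[i] + "," + u + "\t1");
                total++;
            }
            // Marginal record: how many neighbors this occurrence of w had.
            emitted.add(terms[i] + ",*\t" + total);
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (String record : map("18 29 12 34 79 18 56 12 34 92")) {
            System.out.println(record);
        }
    }
}
```

In a real job the `(w, *)` records must reach the same reducer as the `(w, u)` records and sort ahead of them, which typically calls for a custom partitioner and key comparator.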
Pair approach (Mapper) – Java Code
Pair approach (Reducer) – pseudo code
method reduce(Pair p, Iterable<Int> values)
    if p.secondValue == *
        if p.firstValue is new
            currentValue = p.firstValue;
            marginal = sum(values)
        else
            marginal += sum(values)
    else
        Emit(p, sum(values)/marginal);
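A minimal plain-Java sketch of the reducer logic above, again without Hadoop. It assumes the keys arrive sorted so that `(w, *)` comes before every `(w, u)` for the same w. The class name `PairReducerSketch` and the `out` map that stands in for the job output are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PairReducerSketch {
    private String currentValue = null; // first element of the group being processed
    private double marginal = 0;        // sum of the (w, *) counts for currentValue

    // One call per key, as in reduce(Pair p, Iterable<Int> values).
    void reduce(String w, String u, int[] values, Map<String, Double> out) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        if (u.equals("*")) {
            if (!w.equals(currentValue)) {
                currentValue = w;   // a new first element starts a new marginal
                marginal = sum;
            } else {
                marginal += sum;
            }
        } else {
            out.put("(" + w + "," + u + ")", sum / marginal); // relative frequency
        }
    }

    public static void main(String[] args) {
        PairReducerSketch reducer = new PairReducerSketch();
        Map<String, Double> out = new LinkedHashMap<>();
        // Term 92: three neighbors in the first input, none in the second.
        reducer.reduce("92", "*", new int[] {3, 0}, out);
        reducer.reduce("92", "10", new int[] {1}, out);
        reducer.reduce("92", "12", new int[] {1}, out);
        reducer.reduce("92", "34", new int[] {1}, out);
        System.out.println(out);
    }
}
```

Because `marginal` is kept between calls, the approach only works when all keys sharing a first element are handled by the same reducer instance.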
Pair approach (Reducer) – Java Code
Pair approach - input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Pair approach – Output (Reducer1)
(10,12) 0.5
(10,34) 0.5
(12,10) 0.09090909090909091
(12,18) 0.09090909090909091
(12,34) 0.36363636363636365
(12,56) 0.18181818181818182
(12,79) 0.09090909090909091
(12,92) 0.18181818181818182
(18,12) 0.25
(18,29) 0.125
(18,34) 0.25
(18,56) 0.125
(18,79) 0.125
(18,92) 0.125
(29,10) 0.06666666666666667
(29,12) 0.26666666666666666
(29,18) 0.06666666666666667
(29,34) 0.26666666666666666
(29,56) 0.13333333333333333
(29,79) 0.06666666666666667
(29,92) 0.13333333333333333
(34,10) 0.08333333333333333
(34,12) 0.25
(34,18) 0.08333333333333333
(34,29) 0.08333333333333333
(34,56) 0.25
(34,79) 0.08333333333333333
(34,92) 0.16666666666666666
(56,10) 0.1
(56,12) 0.3
(56,29) 0.1
(56,34) 0.3
(56,92) 0.2
(92,10) 0.3333333333333333
(92,12) 0.3333333333333333
(92,34) 0.3333333333333333
Pair approach – Output (Reducer2)
(79,12) 0.2
(79,18) 0.2
(79,34) 0.2
(79,56) 0.2
(79,92) 0.2
Stripe approach (Mapper) – pseudo code
method map(docid id, doc d)
    Stripe H;
    for each term w in doc d do
        clear(H);
        for each neighbor u in Neighbor(w) do
            if H.containsKey(u)
                H{u} += 1;
            else
                H.add(u, 1);
        Emit(w, H);
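The stripe mapper above can be sketched in plain Java as follows. This is an illustration rather than the slide's Hadoop code. It assumes Neighbor(w) is the run of terms after an occurrence of w up to the next occurrence of w, and it uses a hypothetical class name `StripeMapperSketch` with ordinary `java.util` maps in place of a Hadoop `Writable` stripe.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StripeMapperSketch {
    // Mirrors map(): returns one (term, stripe) record per term occurrence.
    static List<Map.Entry<String, Map<String, Integer>>> map(String line) {
        String[] terms = line.trim().split("\\s+");
        List<Map.Entry<String, Map<String, Integer>>> emitted = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            // A fresh map per occurrence plays the role of clear(H); reusing one
            // map here would alias every stored record to the same object.
            Map<String, Integer> stripe = new LinkedHashMap<>();
            for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
                stripe.merge(terms[j], 1, Integer::sum); // add u or increment its count
            }
            emitted.add(new AbstractMap.SimpleEntry<>(terms[i], stripe));
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Map<String, Integer>> record
                : map("18 29 12 34 79 18 56 12 34 92")) {
            System.out.println(record.getKey() + " " + record.getValue());
        }
    }
}
```

Compared with the pair mapper, each term occurrence produces one record instead of one record per neighbor, at the cost of a larger value.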
Stripe approach (Mapper) – Java Code
Stripe approach (Reducer) – pseudo code
method reduce(Text key, Stripes [H1, H2, …])
    H = elementWiseSum(H1, H2, …);
    total = sumValues(H);
    for each Item h in H do
        h.secondValue /= total;
    Emit(key, H);
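A small plain-Java sketch of the stripe reducer: the stripes received for one key are summed element-wise and then divided by the overall total. The class name `StripeReducerSketch` and the use of `TreeMap` (which only makes the printed order deterministic) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StripeReducerSketch {
    // Mirrors reduce(): takes all stripes for one key, returns relative frequencies.
    static Map<String, Double> reduce(List<Map<String, Integer>> stripes) {
        Map<String, Integer> summed = new TreeMap<>();
        int total = 0;
        for (Map<String, Integer> stripe : stripes) {
            for (Map.Entry<String, Integer> e : stripe.entrySet()) {
                summed.merge(e.getKey(), e.getValue(), Integer::sum); // element-wise sum
                total += e.getValue();
            }
        }
        Map<String, Double> relative = new TreeMap<>();
        for (Map.Entry<String, Integer> e : summed.entrySet()) {
            relative.put(e.getKey(), (double) e.getValue() / total);
        }
        return relative;
    }

    public static void main(String[] args) {
        // The two stripes for term 18 in "18 29 12 34 79 18 56 12 34 92".
        Map<String, Integer> first = new TreeMap<>();
        for (String u : new String[] {"29", "12", "34", "79"}) first.put(u, 1);
        Map<String, Integer> second = new TreeMap<>();
        for (String u : new String[] {"56", "12", "34", "92"}) second.put(u, 1);
        List<Map<String, Integer>> stripes = new ArrayList<>();
        stripes.add(first);
        stripes.add(second);
        System.out.println(reduce(stripes));
    }
}
```

For term 18 this gives 2/8 for neighbors 12 and 34 and 1/8 for the rest, in line with the Reducer1 output for key 18.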
Stripe approach (Reducer) – Java Code
Stripe approach – input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Stripe approach – Output (Reducer1)
10 [ (34,0.5000) (12,0.5000) ]
12 [ (56,0.1818) (92,0.1818) (34,0.3636) (18,0.0909) (79,0.0909) (10,0.0909) ]
18 [ (56,0.1250) (92,0.1250) (34,0.2500) (79,0.1250) (29,0.1250) (12,0.2500) ]
29 [ (56,0.1333) (92,0.1333) (34,0.2667) (18,0.0667) (79,0.0667) (10,0.0667) (12,0.2667) ]
34 [ (56,0.2500) (92,0.1667) (18,0.0833) (79,0.0833) (29,0.0833) (10,0.0833) (12,0.2500) ]
56 [ (92,0.2000) (34,0.3000) (29,0.1000) (10,0.1000) (12,0.3000) ]
92 [ (34,0.3333) (10,0.3333) (12,0.3333) ]
Stripe approach – Output (Reducer2)
79 [ (56,0.2000) (92,0.2000) (34,0.2000) (18,0.2000) (12,0.2000) ]
Hybrid approach (Mapper) – pseudo code
method map(docid id, doc d)
    HashMap H;
    for each term w in doc d do
        for each neighbor u in Neighbor(w) do
            if H.contains(Pair(w, u))
                H{Pair(w, u)} += 1;
            else
                H.add(Pair(w, u), 1);
    for each Pair p in H do
        Emit(p, H(p));
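The hybrid mapper combines pair counts in memory before emitting them. A minimal plain-Java sketch, assuming the same reading of Neighbor(w) as a window that ends at the next occurrence of w; the class name `HybridMapperSketch` and the `"w,u"` string key are hypothetical stand-ins for a Hadoop pair type.

```java
import java.util.Map;
import java.util.TreeMap;

class HybridMapperSketch {
    // Mirrors map(): aggregates counts per pair for one input and returns them;
    // in the pseudocode each entry is emitted once at the end of map().
    static Map<String, Integer> map(String line) {
        String[] terms = line.trim().split("\\s+");
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
                // Insert the pair with count 1 or increment the existing count.
                counts.merge(terms[i] + "," + terms[j], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        map("18 29 12 34 79 18 56 12 34 92")
                .forEach((pair, count) -> System.out.println(pair + "\t" + count));
    }
}
```

Since counts are merged before emission, a pair that occurs several times in one input produces a single record, which reduces the data shuffled to the reducers.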
Hybrid approach (Mapper) – Java Code
Hybrid approach (Reducer) – pseudo code
prev = null;
HashMap H;
method reduce(Pair p, Iterable<Int> values)
    if prev != null and p.firstValue != prev
        total = sumValues(H);
        for each item h in H
            h.secondValue /= total;
        Emit(prev, H);
        clear(H);
    end if
    prev = p.firstValue;
    H.add(p.secondValue, sum(values));
method close
    // for the last group of pairs
    total = sumValues(H);
    for each item h in H
        h.secondValue /= total;
    Emit(prev, H);
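The hybrid reducer receives pairs but emits stripes: it accumulates the counts for one first element and flushes them when the first element changes, with a final flush in close. The following plain-Java sketch assumes keys arrive grouped by their first element; the class name `HybridReducerSketch` and the `output` map that stands in for the job output are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class HybridReducerSketch {
    private String prev = null;                                  // first element of the current group
    private final Map<String, Integer> stripe = new TreeMap<>(); // counts for prev
    final Map<String, Map<String, Double>> output = new LinkedHashMap<>();

    // One call per (w, u) key, as in reduce(Pair p, Iterable<Int> values).
    void reduce(String w, String u, int[] values) {
        if (prev != null && !w.equals(prev)) {
            flush(); // first element changed: emit the finished stripe
        }
        prev = w;
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        stripe.merge(u, sum, Integer::sum);
    }

    // Mirrors close(): emits the stripe of the last group.
    void close() {
        if (prev != null) {
            flush();
        }
    }

    private void flush() {
        int total = 0;
        for (int count : stripe.values()) {
            total += count;
        }
        Map<String, Double> relative = new TreeMap<>();
        for (Map.Entry<String, Integer> e : stripe.entrySet()) {
            relative.put(e.getKey(), (double) e.getValue() / total);
        }
        output.put(prev, relative);
        stripe.clear();
    }
}
```

A caller would feed the sorted pairs through `reduce` and then call `close()`, after which `output` holds one normalized stripe per first element.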
Hybrid approach (Reducer) – Java Code
Hybrid approach - Input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Hybrid approach – Output (Reducer1)
10 (12,0.5) (34,0.5)
12 (10,0.09090909) (18,0.09090909) (34,0.36363637) (56,0.18181819) (79,0.09090909) (92,0.18181819)
18 (12,0.25) (29,0.125) (34,0.25) (56,0.125) (79,0.125) (92,0.125)
29 (10,0.06666667) (12,0.26666668) (18,0.06666667) (34,0.26666668) (56,0.13333334) (79,0.06666667) (92,0.13333334)
34 (10,0.083333336) (12,0.25) (18,0.083333336) (29,0.083333336) (56,0.25) (79,0.083333336) (92,0.16666667)
56 (10,0.1) (12,0.3) (29,0.1) (34,0.3) (92,0.2)
92 (10,0.33333334) (12,0.33333334) (34,0.33333334)
Hybrid approach – Output (Reducer2)
79 (12,0.2) (18,0.2) (34,0.2) (56,0.2) (92,0.2)
Comparison
Apache Spark
Write a Java program on Spark to calculate the total number of
students who joined MUM across the different entries. The program
should display the total number of students by country.
Spark - Java Code
Spark - input
2014 Feb Nepal 20
2014 Feb India 15
2014 Oct Italy 2
2014 July France 1
2015 Feb Nepal 10
2015 Feb India 25
2015 Oct Italy 7
Spark - Output
(France,1)
(Italy,9)
(Nepal,30)
(India,40)
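The aggregation behind this output can be illustrated without a Spark cluster. The sketch below is plain Java, not the Spark program from the slides: it sums the fourth column per country with `Map.merge`, which plays the role that `mapToPair` followed by `reduceByKey` would play on a Spark `JavaPairRDD`. The class name `StudentsByCountrySketch` and the whitespace-separated field layout (year, entry month, country, count) are assumptions taken from the sample input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StudentsByCountrySketch {
    // Sums the student count (4th field) per country (3rd field).
    static Map<String, Integer> totalByCountry(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+"); // year, entry, country, count
            totals.merge(fields[2], Integer.parseInt(fields[3]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "2014 Feb Nepal 20",
                "2014 Feb India 15",
                "2014 Oct Italy 2",
                "2014 July France 1",
                "2015 Feb Nepal 10",
                "2015 Feb India 25",
                "2015 Oct Italy 7");
        totalByCountry(input).forEach(
                (country, total) -> System.out.println("(" + country + "," + total + ")"));
    }
}
```

The totals agree with the Spark output above; only the ordering differs, because a `TreeMap` sorts by country name while Spark does not guarantee any particular order.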
Tools Used
• VMware Player Pro 7
• cloudera-quickstart-vm-5.4.0-0-vmware
• Eclipse Luna Service Release 2 (4.4.2)
• Windows 8.1
Thank You
