SlideShare ist ein Scribd-Unternehmen logo
1 von 43
An Insight into Map Reduce and related technology Renjith Peediackal 09BM8040
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Importance of understanding MR
The case for Map Reduce
Recommendation System ,[object Object],[object Object],[object Object],[object Object]
A lot more questions ,[object Object],What pages are my customer’s reading
A lot more questions contd.. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
And the typical problems with recommendation systems
Problems with popularity ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Mixing expert opinion ,[object Object],[object Object],[object Object]
Pearls of wisdom in the net
But internet data is unfriendly ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The case for a new technique ,[object Object],[object Object]
What is Data in flight? ,[object Object],[object Object],[object Object],[object Object],[object Object]
Map and reduce ,[object Object],[object Object]
Terminology explained.. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How does a map reduce programme work  Programmer has to specify two methods: Map and Reduce
map (k, v) -> <k', v'>* ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Shuffle or sort ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
–  reduce (k', <v'>*) -> <k', v'>* ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How this new way helpful for our recommendation system? ,[object Object],[object Object],[object Object],[object Object]
Maps into a denser smaller table
Fault tolerance two different types- Database school of thought
Fault tolerance two different types- MR school of thought
Hierarchy of Parallelism:  Cycle of brute force fault tolerance
Criticisms ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why it is valuable still? ,[object Object],[object Object],[object Object]
Why can’t parallel DB deliver the same? ,[object Object],[object Object],[object Object],[object Object],[object Object]
An example: Invite you to the complexity-sequential web access-based recommendation system
sequential web access-based recommendation system ,[object Object]
Recommendation ,[object Object],[object Object],[object Object],[object Object]
Some details ,[object Object],[object Object],[object Object],Session ID  Web access sequence  1  abdac  2  eaebcac  3  babfae  4  abbacfc
Details Access events can be classified into frequent and infrequent based on frequency crossing a threshold level And a tree consisting of frequent access events can be created. Length of sequence  Sequential web access pattern with support  1  a:4. b:4, c:3   2  aa:4. ab:4. oc3. ba:4. bc:3   3  aac:3, aba;4, obc:3, bac:3   4  Abac:3
 
The Map and reduce ,[object Object],[object Object],[object Object],[object Object],[object Object]
DBMS for the same case? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parallel DB vs MapReduce ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Summary of advantages of MR ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Is mapReduce the final word?
What is hadoop ,[object Object],[object Object]
Pig ,[object Object],[object Object],[object Object],[object Object]
What does hive do? ,[object Object],[object Object],[object Object],[object Object]
Hive architecture
Many more tools But

Weitere ähnliche Inhalte

Was ist angesagt?

How To Buy Data Warehouse
How To Buy Data WarehouseHow To Buy Data Warehouse
How To Buy Data Warehouse
Eric Sun
 
Using Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data ManagementUsing Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data Management
DataWorks Summit
 
Belladati Meetup Singapore Workshop
Belladati Meetup Singapore WorkshopBelladati Meetup Singapore Workshop
Belladati Meetup Singapore Workshop
belladati
 

Was ist angesagt? (20)

Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
How To Buy Data Warehouse
How To Buy Data WarehouseHow To Buy Data Warehouse
How To Buy Data Warehouse
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Using Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data ManagementUsing Hadoop as a platform for Master Data Management
Using Hadoop as a platform for Master Data Management
 
Finding business value in Big Data
Finding business value in Big DataFinding business value in Big Data
Finding business value in Big Data
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
 
Power BI Advanced Data Modeling Virtual Workshop
Power BI Advanced Data Modeling Virtual WorkshopPower BI Advanced Data Modeling Virtual Workshop
Power BI Advanced Data Modeling Virtual Workshop
 
Ibm watson
Ibm watsonIbm watson
Ibm watson
 
Building enterprise advance analytics platform
Building enterprise advance analytics platformBuilding enterprise advance analytics platform
Building enterprise advance analytics platform
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual Workshop
 
Architecting a Data Warehouse: A Case Study
Architecting a Data Warehouse: A Case StudyArchitecting a Data Warehouse: A Case Study
Architecting a Data Warehouse: A Case Study
 
Mammothdb - Public VC Pitchdeck!
Mammothdb - Public VC Pitchdeck!Mammothdb - Public VC Pitchdeck!
Mammothdb - Public VC Pitchdeck!
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
 
4870 ibm-storage-solutions-final_nov26_18_34019934_usen
4870  ibm-storage-solutions-final_nov26_18_34019934_usen4870  ibm-storage-solutions-final_nov26_18_34019934_usen
4870 ibm-storage-solutions-final_nov26_18_34019934_usen
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
Scalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowScalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and How
 
Building Modern Data Platform with AWS
Building Modern Data Platform with AWSBuilding Modern Data Platform with AWS
Building Modern Data Platform with AWS
 
Belladati Meetup Singapore Workshop
Belladati Meetup Singapore WorkshopBelladati Meetup Singapore Workshop
Belladati Meetup Singapore Workshop
 
How we Learned to Stop Worrying and Solve the Distributed Graph Problem
How we Learned to Stop Worrying and Solve the Distributed Graph ProblemHow we Learned to Stop Worrying and Solve the Distributed Graph Problem
How we Learned to Stop Worrying and Solve the Distributed Graph Problem
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 

Ähnlich wie Mr bi

The Recent Pronouncement Of The World Wide Web (Www) Had
The Recent Pronouncement Of The World Wide Web (Www) HadThe Recent Pronouncement Of The World Wide Web (Www) Had
The Recent Pronouncement Of The World Wide Web (Www) Had
Deborah Gastineau
 
Discussion post· The proper implementation of a database is es.docx
Discussion post· The proper implementation of a database is es.docxDiscussion post· The proper implementation of a database is es.docx
Discussion post· The proper implementation of a database is es.docx
madlynplamondon
 
What does Scott do?
What does Scott do?What does Scott do?
What does Scott do?
Scott Taylor
 
Business Intelligence Solution Using Search Engine
Business Intelligence Solution Using Search EngineBusiness Intelligence Solution Using Search Engine
Business Intelligence Solution Using Search Engine
ankur881120
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
Carmen Sanborn
 
locotalk-whitepaper-2016
locotalk-whitepaper-2016locotalk-whitepaper-2016
locotalk-whitepaper-2016
Anthony Wijnen
 

Ähnlich wie Mr bi (20)

The Recent Pronouncement Of The World Wide Web (Www) Had
The Recent Pronouncement Of The World Wide Web (Www) HadThe Recent Pronouncement Of The World Wide Web (Www) Had
The Recent Pronouncement Of The World Wide Web (Www) Had
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
Discussion post· The proper implementation of a database is es.docx
Discussion post· The proper implementation of a database is es.docxDiscussion post· The proper implementation of a database is es.docx
Discussion post· The proper implementation of a database is es.docx
 
How to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data ManagementHow to Radically Simplify Your Business Data Management
How to Radically Simplify Your Business Data Management
 
What does Scott do?
What does Scott do?What does Scott do?
What does Scott do?
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
TechoERP.pdf
TechoERP.pdfTechoERP.pdf
TechoERP.pdf
 
Business Intelligence Solution Using Search Engine
Business Intelligence Solution Using Search EngineBusiness Intelligence Solution Using Search Engine
Business Intelligence Solution Using Search Engine
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
Dwh faqs
Dwh faqsDwh faqs
Dwh faqs
 
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSUSING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Document Based Data Modeling Technique
Document Based Data Modeling TechniqueDocument Based Data Modeling Technique
Document Based Data Modeling Technique
 
Why Data Virtualization? An Introduction by Denodo
Why Data Virtualization? An Introduction by DenodoWhy Data Virtualization? An Introduction by Denodo
Why Data Virtualization? An Introduction by Denodo
 
Enabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesEnabling SQL Access to Data Lakes
Enabling SQL Access to Data Lakes
 
MongoDb Schema Pattern - Kalpit Pandit.pptx
MongoDb Schema Pattern - Kalpit Pandit.pptxMongoDb Schema Pattern - Kalpit Pandit.pptx
MongoDb Schema Pattern - Kalpit Pandit.pptx
 
locotalk-whitepaper-2016
locotalk-whitepaper-2016locotalk-whitepaper-2016
locotalk-whitepaper-2016
 
MongoDB and In-Memory Computing
MongoDB and In-Memory ComputingMongoDB and In-Memory Computing
MongoDB and In-Memory Computing
 
Overcoming Today's Data Challenges with MongoDB
Overcoming Today's Data Challenges with MongoDBOvercoming Today's Data Challenges with MongoDB
Overcoming Today's Data Challenges with MongoDB
 

Kürzlich hochgeladen

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Kürzlich hochgeladen (20)

Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 

Mr bi