Big Data
Prepared by:
AHMED ALTAYEB SHEIKH EDREES
FAWAZ AWAD YAHIA ABDELGADIR
Outline
1. INTRODUCTION
2. WHAT IS BIG DATA
3. BIG DATA GENERATORS
4. CHARACTERISTICS OF BIG DATA
5. BENEFITS OF BIG DATA
6. HADOOP
   – HDFS
   – MAPREDUCE
7. BI VS. BIG DATA
Introduction
What Is Big Data?
Big data refers to very large data sets that may be analyzed computationally to reveal
patterns, trends, and associations, especially relating to customer behavior and
interactions.
“Big data in general is defined as high-volume, high-velocity and high-variety information
assets that demand cost-effective, innovative forms of information processing
for enhanced insight and decision making.”
It is a technology term for data that has grown too large to be managed with
previously established methods.
Big Data Generators
This data comes from everywhere:
sensors used to gather climate information,
posts to social media sites,
digital pictures,
online shopping,
airlines,
purchase transaction records, and many more…
This data is “big data.”
Characteristics
“Big data is data characterized by three attributes: volume, velocity and variety.”
Volume
Volume is the size of the data, which determines the value and potential of the data under
consideration. The name ‘Big Data’ itself contains a term related to size, hence this
characteristic.
Variety
Data today comes in all types of formats, from structured, numeric data in traditional
databases to unstructured text documents and email, as well as stock ticker data,
financial transactions and semi-structured data.
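To make the three kinds of format concrete, the Python sketch below parses one invented record of each kind with the standard library. The sample data and field names are hypothetical; the point is only that each form needs a different kind of handling.

```python
import csv
import io
import json

# Structured: a fixed schema, every record has the same columns (like a database row).
structured = "order_id,customer,amount\n1001,alice,25.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys; fields may differ from record to record (JSON).
semi_structured = '{"order_id": 1002, "customer": "bob", "tags": ["gift"]}'
doc = json.loads(semi_structured)

# Unstructured: no schema at all; meaning has to be extracted, here by naive tokenizing.
unstructured = "Bob emailed: the gift arrived late but works great!"
tokens = unstructured.lower().split()

print(rows[0]["amount"])    # CSV values arrive as strings: 25.50
print(doc.get("tags", []))  # optional field present in this record: ['gift']
print(len(tokens))          # 9 whitespace-separated tokens
```

The structured row can be queried by column name immediately, the JSON document must be checked for optional keys, and the free text yields nothing useful until some extraction step is applied.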
Velocity
Velocity is the speed at which data is generated and processed to meet the demands and
the challenges that lie ahead in the path of growth and development.
Facebook generates 100 TB of data daily.
Twitter generates 8 TB of data daily.
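To put the velocity figures above in perspective, the Python sketch below converts a daily volume into a sustained per-second rate. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly even flow over 24 hours, which real traffic is not; the daily figures are the ones quoted on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert a daily volume in terabytes to an average rate in megabytes per second.

    Assumes decimal units and data arriving evenly across the whole day.
    """
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


for name, tb in [("Facebook", 100), ("Twitter", 8)]:
    print(f"{name}: {tb} TB/day is about {tb_per_day_to_mb_per_s(tb):,.1f} MB/s")
```

Under these assumptions, 100 TB per day works out to roughly 1,157.4 MB/s (about 1.16 GB/s) and 8 TB per day to roughly 92.6 MB/s, every second of every day.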
Benefits of Big Data
Cost Reduction from Big Data Technologies
Time Reduction from Big Data
Developing New Big Data-Based Offerings
Supporting Internal Business Decisions
Real-time big data isn’t just a process for storing petabytes or exabytes of data in
a data warehouse; it’s about the ability to make better decisions and take
meaningful actions at the right time.
What Is Hadoop?
A flexible and highly available architecture for large-scale computation and data
processing on a network of commodity hardware
A framework that allows for the distributed processing of large data sets across clusters
of commodity servers, used to:
– Store large amounts of data
– Process the large amounts of data stored
Why Hadoop?
Open source, highly reliable, distributed data processing platform
Handles large amounts of data
Stores data in native format
Delivers linear scalability at low cost
Resilient in case of infrastructure failures
Transparent application scalability
HDFS: Hadoop Distributed File System
HDFS enables Hadoop to store huge files. It’s a scalable file system
that distributes and stores data across all machines in a Hadoop cluster.
Scale-Out Architecture - Add servers to increase capacity
High Availability - Serve mission-critical workflows and applications
Fault Tolerance - Automatically and seamlessly recover from failures
Load Balancing - Place data intelligently for maximum efficiency and utilization
Tunable Replication - Multiple copies of each block of a file provide data protection and
computational performance
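How HDFS stores a large file can be sketched with a little arithmetic: the file is split into fixed-size blocks, and each block is replicated. The Python sketch below assumes the Hadoop 2.x and later default block size of 128 MB (the `dfs.blocksize` property; Hadoop 1 defaulted to 64 MB) and the default replication factor of 3 (`dfs.replication`). The function names are hypothetical helpers, not part of any Hadoop API.

```python
MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> list[int]:
    """Return the sizes of the blocks a file of file_size bytes is split into.

    Every block is block_size bytes except possibly the last, which holds only
    the remainder (HDFS does not pad the final block to full size).
    """
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def raw_storage(file_size: int, replication: int = 3) -> int:
    """Bytes of cluster disk consumed when every block is stored `replication` times."""
    return file_size * replication


size = 150 * MB
print([b // MB for b in split_into_blocks(size)])          # [128, 22]
print([b // MB for b in split_into_blocks(size, 64 * MB)])  # [64, 64, 22]
print(raw_storage(size) // MB)                              # 450
```

Tuning replication is a trade-off: a higher factor survives more simultaneous node failures and gives the scheduler more places to run a task next to its data, at the cost of proportionally more disk.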
NameNode and DataNode
DataNode - A piece of software called the DataNode runs on each slave node of the cluster;
the slave nodes make up the majority of the machines in a cluster. The NameNode decides
which DataNodes the data is placed on.
NameNode - Runs on a master node and tracks and directs the storage of the cluster.
It is a separate machine that knows which blocks make up a file, such as the original
150 MB file, and where those blocks are stored. The information stored here is called
metadata.
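The division of labor between the NameNode and the DataNodes can be illustrated with a toy metadata table. The Python sketch below is a simplification under stated assumptions: replicas are placed round-robin so no block has two copies on one node (real HDFS uses a rack-aware placement policy), and the file name, block IDs and node names are hypothetical. It also shows why replication gives fault tolerance: after a DataNode fails, the metadata still points to surviving copies.

```python
def build_metadata(file_name, num_blocks, datanodes, replication=3):
    """Toy NameNode metadata: which blocks make up a file, and where each block lives."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    block_ids = [f"{file_name}_blk{i}" for i in range(num_blocks)]
    # Round-robin placement: block i goes to nodes i, i+1, ..., i+replication-1 (mod n).
    locations = {
        blk: [datanodes[(i + k) % len(datanodes)] for k in range(replication)]
        for i, blk in enumerate(block_ids)
    }
    return {"files": {file_name: block_ids}, "locations": locations}


def surviving_replicas(metadata, failed):
    """For each block, the replicas still reachable after the given DataNodes fail."""
    return {
        blk: [node for node in nodes if node not in failed]
        for blk, nodes in metadata["locations"].items()
    }


# A 150 MB file with 128 MB blocks occupies two blocks.
meta = build_metadata("sales.csv", 2, ["dn1", "dn2", "dn3", "dn4"])
print(meta["files"]["sales.csv"])        # ['sales.csv_blk0', 'sales.csv_blk1']
print(surviving_replicas(meta, {"dn2"}))
# {'sales.csv_blk0': ['dn1', 'dn3'], 'sales.csv_blk1': ['dn3', 'dn4']}
```

Note that the DataNodes hold only block contents; without the NameNode's table a client would not know which blocks form `sales.csv` or where to fetch them, which is why the NameNode is the critical master service.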
MapReduce
MapReduce is a programming model for processing large data sets with a parallel,
distributed algorithm on a cluster.
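The classic illustration of the MapReduce model is counting words. The single-process Python sketch below simulates the three phases, map, shuffle (group by key) and reduce, on two invented input lines. In Hadoop the mappers would run in parallel on the nodes holding the input blocks and the framework itself would perform the shuffle; the function names here are hypothetical and not Hadoop's Java API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle/sort: group all intermediate values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big ideas", "data beats opinions"]
intermediate = chain.from_iterable(map_phase(line) for line in lines)
result = dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
print(result)  # {'beats': 1, 'big': 2, 'data': 2, 'ideas': 1, 'opinions': 1}
```

Because each mapper call sees only one record and each reducer call sees only one key, both phases can be spread across many machines without coordination, which is what makes the model scale out.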
Scale-out Architecture - Add servers to increase processing power
Security & Authentication - Works with HDFS security to make sure that only
approved users can operate against the data in the system
Resource Manager - Employs data locality and server resources to determine optimal
computing operations
Optimized Scheduling - Completes jobs according to prioritization
Flexibility - Procedures can be written in virtually any programming language
Resiliency & High Availability - Multiple job and task trackers ensure that jobs fail
independently and restart automatically
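The data-locality idea mentioned under Resource Manager can be sketched as a greedy scheduler: run each map task on a node that already stores the task's input block if that node has a free slot, and only otherwise ship the data to some other free node. The Python sketch below is a deliberately simplified assumption, not Hadoop's actual scheduler, which is considerably more sophisticated (for example, it also prefers a node in the same rack). Block and node names are hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Greedily assign one map task per block, preferring nodes that hold the block.

    block_locations: block ID -> list of nodes holding a replica
    free_slots:      node -> number of task slots currently free
    Returns block ID -> (node, "local" or "remote").
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    plan = {}
    for block, replicas in block_locations.items():
        local = next((n for n in replicas if slots.get(n, 0) > 0), None)
        if local is not None:
            node, kind = local, "local"
        else:
            node = next((n for n, s in slots.items() if s > 0), None)
            if node is None:
                break  # no capacity left; remaining tasks wait for a free slot
            kind = "remote"
        slots[node] -= 1
        plan[block] = (node, kind)
    return plan


locations = {"blk0": ["dn2", "dn1"], "blk1": ["dn1", "dn3"], "blk2": ["dn1"]}
plan = assign_tasks(locations, {"dn1": 1, "dn2": 1, "dn3": 0, "dn4": 1})
print(plan)
# {'blk0': ('dn2', 'local'), 'blk1': ('dn1', 'local'), 'blk2': ('dn4', 'remote')}
```

Only `blk2` has to be read over the network here, because its sole replica sits on `dn1`, which is already busy; moving computation to the data rather than data to the computation is what keeps network traffic low on large clusters.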
BI vs. Big Data