BIG DATA
WHY NOW?
The world’s information totaled over 2 Zettabytes
That’s 2 Trillion Gigabytes

By 2020, this number will be 35 Zettabytes

“The world’s data is doubling every 1.2 years”
“80% of this data is unstructured”
5V
Money
[Timeline figure, 2003 to 2013, featuring: Google File System, MapReduce, Bigtable, Amazon Dynamo, Apache Hadoop, Apache Cassandra, Dremel, Impala, Spanner (?)]
[Figure: Today. Analytics (Hadoop) and Realtime (“NoSQL”)]
THE ECOSYSTEM
Hadoop Ecosystem
Apache Hadoop is an open-source software
framework that supports running applications on
large clusters of commodity hardware.
Replication
Fault Tolerant
Commodity Hardware
Map Reduce
Map Reduce
Word Count
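
The word count job named on this slide can be sketched in a few lines. This is a minimal single-process illustration in Python, not the Hadoop Java API: the function names `map_line`, `shuffle`, `reduce_counts` and `word_count` are assumptions made up for this sketch, and the grouping step that Hadoop performs across the cluster is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key and sort the keys.

    In a real cluster the framework does this between the map and reduce phases.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the occurrence counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over the input lines (hypothetical driver)."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]))
```

Each input line is mapped independently, which is what lets a framework spread the map tasks across many machines; only the grouping by key needs data to move between them.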
World's largest biometric identity platform

2,00,00,00,00,000 (200 billion) Biometric Matches

2 PB Data

Hadoop Stack
This is just the Beginning of
“Big Data Revolution”
sameer.sawhney@gmail.com
@sameersaw at twitter

Images
Raymond Bryson
Marius B
IntelFreePress License
Pedro Moura Pinheiro

Editor's Notes

  1. We all live in the Data Age. While data storage capacity has increased, the speed at which data can be read is still very slow. The amount of publicly available data is increasing at a very fast pace. Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
  2. A colossal amount of data is being generated, and this has changed things.
  3. In the good old days, we used an RDBMS to store and process this data. We used to bring the data to the processing units, but now the data is huge. Two technologies have made this possible.
  4. Characteristics of Big Data: Gartner defined 3 Vs (Volume, Velocity and Variety); Veracity and Value bring the count to five.
     Volume: petabytes and exabytes, not terabytes. Twitter alone generates around 7 TB of data every day, Facebook 10 TB, and Google 20 PB. Tweet volume by year: 2 million tweets a day in 2009; 65 million a day in 2010; 200 million a day in 2011 (a 10-million-page book that would take 31 years to read); and in 2013, 200 million active users creating over 400 million tweets each day.
     Variety: different sources.
     Veracity: can this data be trusted?
     Value: is the data meaningful?
  5. What is the problem that the solution solves? Technology overview. Specific solution. Challenges in the current implementation/solution, if any. Advantages and disadvantages. Any alternatives to the specific solution. Way forward for the technology/solution (optional).
  6. In defining big data, it’s also important to understand the mix of unstructured and multi-structured data that comprises the volume of information. Unstructured data comes from information that is not organized or easily interpreted by traditional databases or data models, and typically, it’s text-heavy. Metadata, Twitter tweets, and other social media posts are good examples of unstructured data. Multi-structured data refers to a variety of data formats and types and can be derived from interactions between people and machines, such as web applications or social networks. A great example is web log data, which includes a combination of text and visual images along with structured data like form or transactional information.
  7. Same notes as slide 4.
  8. Same notes as slide 4.
  9. Same notes as slide 4.
  10. Pipeline: data source, data repository (where data persists), filter and transform, compute (a distributed scale-out system). MapReduce is inevitable.
     1980s: the impedance mismatch problem. Relational databases, with their rows and columns, served as an integration mechanism, and relational dominance lasted into the 2000s.
     1990s: object databases.
     2000s: big Internet sites such as Amazon and Google had to handle lots of traffic. Bigger boxes run into real limits and cost. The alternative is lots of little boxes, but SQL was designed for single-node systems. Google built Bigtable; Amazon built Dynamo.
     NoSQL movement: the term comes from Johan Oskarsson (San Francisco, London), who proposed a meetup in the late 2000s and needed a short, unique Twitter hashtag to advertise that single meeting: #nosql.
     Data models: 1. Key-value. 2. Document: JSON with no schema, with access to portions of documents. 3. Column family: a single row key has multiple column families, where each column family is an aggregate of columns that fit together. An aggregate is about storing all related items in one cluster.
  11. Same notes as slide 10, from the 1980s history onward.
  12. Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.
  13. The Mapper implementation (lines 14-26), via the map method (lines 18-25), processes one line at a time, as provided by the specified TextInputFormat (line 49). It then splits the line into tokens separated by whitespace, via the StringTokenizer, and emits a key-value pair of < <word>, 1>. The Reducer implementation (lines 28-36), via the reduce method (lines 29-35), just sums up the values, which are the occurrence counts for each key (i.e. words in this example).
  14. Same notes as slide 10, from the 1980s history onward.
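
The three aggregate data models named in the notes (key-value, document, column family) can be contrasted with a small sketch. This is an illustration using plain Python dictionaries, not the API of any real NoSQL store; the record contents and the helper names `get_value`, `get_field` and `get_family` are hypothetical.

```python
import json
from typing import Any, Dict

# Key-value: the store only sees an opaque value behind each key,
# so the whole aggregate is read or written at once.
key_value_store: Dict[str, bytes] = {
    "user:42": b'{"name": "Asha", "city": "Pune"}',
}

# Document: the value is a schemaless JSON-like structure that the
# store can look inside, so portions of a document can be retrieved.
document_store: Dict[str, Dict[str, Any]] = {
    "user:42": {"name": "Asha", "city": "Pune", "orders": [{"id": 1, "total": 250}]},
}

# Column family: one row key holds several column families, and each
# family groups the columns that are usually accessed together.
column_family_store: Dict[str, Dict[str, Dict[str, Any]]] = {
    "user:42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "orders": {"order:1": 250},
    },
}


def get_value(key: str) -> Dict[str, Any]:
    """Key-value lookup: fetch the whole blob; decoding is the client's job."""
    return json.loads(key_value_store[key])


def get_field(doc_id: str, field: str) -> Any:
    """Document lookup: fetch one portion of the document."""
    return document_store[doc_id][field]


def get_family(row_key: str, family: str) -> Dict[str, Any]:
    """Column-family lookup: fetch one family of columns for a row."""
    return column_family_store[row_key][family]


if __name__ == "__main__":
    print(get_value("user:42"))
    print(get_field("user:42", "city"))
    print(get_family("user:42", "profile"))
```

In all three models the row or document identified by `user:42` is the aggregate: the unit of related data that is stored and retrieved together.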