BIG DATA
Presented By,
R.S.M.N.PRASAD.
(pvpsit)
OUTLINE
• Introduction
• Hadoop
• MapReduce
• Hypertable
• Advantages
BIG DATA
• Data comes from everywhere: sensors used to
gather climate information, posts to social media sites,
digital pictures and videos, purchase transaction records,
and cell phone GPS signals, to name a few. This data
is called Big Data.
• Every day, we create 2.5 quintillion bytes of data (one
quintillion bytes = one billion gigabytes). About 90% of
the data in the world today has been created in
the last two years alone.
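The unit conversion above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where one gigabyte is 10^9 bytes; the constant names are illustrative only.

```python
# Assumption: decimal (SI) units, so 1 GB = 10**9 bytes and
# one quintillion = 10**18.
QUINTILLION = 10**18
GIGABYTE = 10**9

# 2.5 quintillion bytes created per day, kept as an exact integer.
daily_bytes = 25 * QUINTILLION // 10

# One quintillion bytes expressed in gigabytes.
gb_per_quintillion = QUINTILLION // GIGABYTE

# Daily volume expressed in gigabytes.
daily_gb = daily_bytes // GIGABYTE

print(gb_per_quintillion)  # 1000000000 (one billion)
print(daily_gb)            # 2500000000 (2.5 billion)
```

In these units, 2.5 quintillion bytes per day is 2.5 billion gigabytes, or 2.5 exabytes.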
IN FACT, IN A MINUTE…
• Email users send more than 204 million messages;
• Mobile Web receives 217 new users;
• Google receives over 2 million search queries;
• YouTube users upload 48 hours of new video;
• Facebook users share 684,000 pieces of content;
• Twitter users send more than 100,000 tweets;
• Consumers spend $272,000 on Web shopping;
• Apple receives around 47,000 application downloads;
• Brands receive more than 34,000 Facebook 'likes';
• Tumblr blog owners publish 27,000 new posts;
• Instagram users share 3,600 new photos;
• Flickr users, on the other hand, add 3,125 new photos;
• Foursquare users perform 2,000 check-ins;
• WordPress users publish close to 350 new blog posts.
Big Data Vectors
• High-volume:
Amount of data
• High-velocity:
Rate at which data is collected, acquired, generated, or
processed
• High-variety:
Different data types, such as audio, video, and image data
Big Data = Transactions + Interactions + Observations
What is Hadoop?
• HADOOP
Hadoop is a software framework that analyzes structured
and unstructured data and distributes applications across different
servers. The name is sometimes expanded as "High-Availability
Distributed Object-Oriented Platform", although it is not
officially an acronym.
• Basic Application of Hadoop
Hadoop is used for maintaining, scaling, error handling,
self-healing, and securing large-scale data. This data can
be structured or unstructured. When data grows this large,
traditional systems are unable to handle it.
HADOOP
THE DIFFERENT COMPONENTS ARE:
Data Access Components :- PIG, HIVE
Data Storage Components :- HBASE
Data Integration Components :- APACHE FLUME, SQOOP, CHUKWA
Data Management Components :- AMBARI, ZOOKEEPER
Data Serialization Components :- THRIFT, AVRO
Data Intelligence Components :- APACHE MAHOUT, DRILL
What does it do?
• Hadoop implements Google’s MapReduce, using
HDFS
• MapReduce divides applications into many small
blocks of work.
• HDFS creates multiple replicas of data blocks for
reliability, placing them on compute nodes
around the cluster.
• MapReduce can then process the data where it
is located.
• Hadoop's target is to run on clusters on the order
of 10,000 nodes.
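The replication and data-locality ideas above can be sketched in a few lines. This is a toy model, not HDFS code: the function names, the node names, and the round-robin placement are assumptions made for illustration, and real HDFS uses a rack-aware placement policy instead.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: simple round-robin stands in for a real placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [
            nodes[(start + k) % len(nodes)] for k in range(replication)
        ]
    return placement


def schedule_local(placement, failed=frozenset()):
    """For each block, pick a live node that already holds a replica."""
    schedule = {}
    for block_id, holders in placement.items():
        live = [node for node in holders if node not in failed]
        if not live:
            raise RuntimeError(f"block {block_id} has no live replica")
        schedule[block_id] = live[0]
    return schedule


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_replicas(len(blocks), nodes, replication=3)

# Even with node-a down, every block still has a replica to compute on.
schedule = schedule_local(placement, failed={"node-a"})
print(schedule)
```

Because each block lives on several nodes, work can be sent to whichever live node already stores the data, which is the point of processing data where it is located.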
How does MapReduce work?
• The runtime partitions the input and provides it
to different Map instances;
• Map: (key, value) → (key’, value’)
• The runtime collects the (key’, value’) pairs and
distributes them to several Reduce functions so
that each Reduce function gets the pairs with the
same key’.
• Each Reduce produces a single output file (or
none).
• Map and Reduce are user-written functions.
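The flow above can be sketched as a single-process word count. This is only an illustration of the map, shuffle, and reduce phases: `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names, and nothing here is the Hadoop API or runs on a cluster.

```python
from collections import defaultdict


def map_fn(key, value):
    """User-written Map: (document name, text) -> list of (word, 1)."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_fn(key, values):
    """User-written Reduce: (word, [1, 1, ...]) -> (word, total)."""
    return key, sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy runtime: map every record, group by key', then reduce."""
    # Map phase: each input record is handed to a Map call.
    intermediate = []
    for key, value in inputs:
        intermediate.extend(map_fn(key, value))

    # Shuffle phase: collect all values that share the same key'.
    groups = defaultdict(list)
    for out_key, out_value in intermediate:
        groups[out_key].append(out_value)

    # Reduce phase: one Reduce call per distinct key'.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


docs = [("doc1", "big data is big"), ("doc2", "data is everywhere")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)  # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

In a real cluster the Map calls run in parallel on different nodes and the shuffle moves data over the network, but the user still writes only the two functions.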
HYPERTABLE
What is it?
• Open-source Bigtable clone
• Manages massive sparse tables with timestamped cell
versions
• Single primary key index
What is it not?
• No joins
• No secondary indexes (not yet)
• No transactions (not yet)
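The data model described above (sparse rows, timestamped cell versions, one index on the row key) can be sketched as nested dictionaries. `SparseTable` and its methods are hypothetical names for illustration, not Hypertable's client API.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse table: row key -> column -> versions, newest first."""

    def __init__(self):
        # Only cells that were actually written take up space.
        self._rows = defaultdict(dict)

    def put(self, row, column, value, timestamp):
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, max_versions=1):
        """Return up to `max_versions` (timestamp, value) pairs."""
        return self._rows.get(row, {}).get(column, [])[:max_versions]

    def scan(self, start_row, end_row):
        """Yield rows with start_row <= key < end_row, in key order.

        The row key is the only index, so range scans are the only
        efficient lookup besides a direct row read.
        """
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, self._rows[row]


table = SparseTable()
table.put("com.example/index", "title", "Home", timestamp=1)
table.put("com.example/index", "title", "Welcome", timestamp=2)
table.put("org.sample/about", "lang", "en", timestamp=1)

latest = table.get("com.example/index", "title")
history = table.get("com.example/index", "title", max_versions=2)
missing = table.get("org.sample/about", "title")
com_rows = [row for row, _ in table.scan("com", "d")]
print(latest, history, missing, com_rows)
```

Finding rows by a column value would require scanning everything, which is what "no secondary indexes" means in practice.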
SCALING
TABLE: VISUAL REPRESENTATION
TABLE: ACTUAL REPRESENTATION
SYSTEM OVERVIEW
RANGE SERVER
• Manages ranges of table data
• Caches updates in memory (Cell Cache)
• Periodically spills (compacts) cached updates to disk (CellStore)
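The cache-then-spill behaviour above can be sketched as follows. This is a simplified model under stated assumptions: `RangeServerSketch` is a hypothetical class, the spill is triggered by an entry count instead of memory usage, and in-memory tuples stand in for CellStore files on disk.

```python
class RangeServerSketch:
    """Toy write path: buffer updates in memory, spill sorted runs."""

    def __init__(self, cache_limit=3):
        self.cache_limit = cache_limit  # assumption: limit counts entries
        self.cell_cache = {}            # in-memory updates: key -> value
        self.cell_stores = []           # immutable sorted runs, newest last

    def write(self, key, value):
        self.cell_cache[key] = value
        if len(self.cell_cache) >= self.cache_limit:
            self._spill()

    def _spill(self):
        # Freeze the cache as a sorted, immutable run (stands in for
        # writing a CellStore file), then start with an empty cache.
        self.cell_stores.append(tuple(sorted(self.cell_cache.items())))
        self.cell_cache = {}

    def read(self, key):
        # Newest data wins: check the cache, then runs from newest to oldest.
        if key in self.cell_cache:
            return self.cell_cache[key]
        for run in reversed(self.cell_stores):
            for stored_key, stored_value in run:
                if stored_key == key:
                    return stored_value
        return None


server = RangeServerSketch(cache_limit=2)
server.write("a", 1)
server.write("b", 2)   # cache reaches the limit and is spilled
server.write("a", 3)   # newer value for "a" stays in the cache

print(server.read("a"), server.read("b"), server.read("z"))
```

Reads have to consult both the cache and the spilled runs, which is why the optimizations on the next slide matter.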
PERFORMANCE OPTIMIZATIONS
Block Cache
• Caches CellStore blocks
• Blocks are cached uncompressed
Bloom Filter
• Avoids unnecessary disk access
• Filter by rows or rows + columns
• Configurable false positive rate
Access Groups
• Physically store co-accessed columns together
• Improves performance by minimizing I/O
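A Bloom filter with a configurable false positive rate can be sketched as below. This is a generic textbook construction, not Hypertable's implementation: the class name, the use of SHA-256, and the double-hashing scheme are assumptions chosen for illustration. Keys may be a row, or a row plus a column, matching the two filter modes listed above.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter sized from a target false positive rate."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two hash values (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent", so the disk read can be skipped.
        # True means "possibly present" and the store must be checked.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bloom = BloomFilter(expected_items=1000, false_positive_rate=0.01)
stored_keys = [f"row{i}:colA" for i in range(1000)]  # row + column keys
for key in stored_keys:
    bloom.add(key)

print(bloom.num_bits, bloom.num_hashes)
print(all(bloom.might_contain(key) for key in stored_keys))  # True
```

A lower false positive rate costs more bits per key, so the setting trades memory for fewer wasted disk reads.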
ADVANTAGES
• Flexible: Easy access to structured and unstructured
data.
• Scalable: It can store and distribute very large data sets
across hundreds of inexpensive servers that operate in parallel.
• Efficient: By distributing the data, it can process it in
parallel on the nodes where the data is located.
• Resistant to Failure: It automatically maintains
multiple copies of data and automatically redeploys
computing tasks based on failures.
QUERIES?