Big data analytics: Technology's bleeding edge
• Big Data refers to massive, often unstructured data sets that are beyond the processing capabilities of traditional data management tools.
• Big Data can occupy terabytes or even petabytes of storage in diverse formats, including text, video, sound and images.
• Traditional relational database management systems are not designed to handle such large masses of data.
• Examples: user updates on Facebook; clicks across the internet.
• Volume refers to the huge amount of data being generated every minute.
• 90% of the data we have now was created in just the past 2 years.
• By 2015, IP traffic is expected to be 4X what it is now.
• 3 billion people are expected to be online by 2015.
• Velocity refers to the SPEED at which new data is generated and moves around.
• It includes real-time systems such as online banking.
• Such systems need low response times.
• A technology called "In-Memory Analytics" is employed to deal with this data in motion.
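To make "data in motion" concrete, here is a minimal Python sketch, assuming a stream of timestamped events (for example, online-banking transactions) that arrive in time order and must be counted over a sliding window entirely in memory. The class `SlidingWindowCounter` and the 60-second window are hypothetical choices for illustration; this is not the API of any particular in-memory analytics product.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall in the last `window_seconds`, held in memory.

    Hypothetical helper for illustration; not a real library API.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()  # timestamps of events still inside the window

    def add(self, timestamp):
        # Assumes events arrive in non-decreasing timestamp order.
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now):
        # Drop events that are window_seconds old or older.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=60)
    for t in [0, 10, 20, 65, 70]:  # made-up transaction times, in seconds
        counter.add(t)
    print(counter.count(now=70))  # 3 (the events at 20, 65 and 70)
```

Because old events are evicted as new ones arrive, memory use stays proportional to the window rather than to the whole stream, which is what keeps response times low.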
• Variety refers to the various data types we can now use.
• The earlier focus was on neat, structured data kept in tables in an RDBMS.
• 80% of the data available now is unstructured.
• Data types are heterogeneous, ranging from text to video, audio and pictures.
Transform problems into possibilities
• Big Data Analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other real-time insights.
• Uses of Big Data Analytics: Google Search recommendations, Satyamev Jayate, reading genes (genomics).
Data Mining vs. Big Data Analytics
Data Mining:
• Has data constraints: data must be neat and clean.
• Elaborate ETL (extract, transform, load) is required, so insights must wait until the ETL cycle completes.
Big Data Analytics:
• Big data cannot be neat, as it is unstructured.
• Big data analytics provides real-time insights.
• Descriptive: what happened?
• Diagnostic: why did it happen?
• Predictive: what is likely to happen?
• Prescriptive: what should be done about it?
• Relational databases were unable to store and process Big Data at this scale.
• As a result, a new class of big data technology has emerged and is being used in many big data analytics environments.
• The technologies associated with big data analytics include:
  • Hadoop
  • MapReduce
  • NoSQL
• Hadoop is an open-source, Java-based programming framework for processing and storing large data sets in a distributed computing environment.
• Components of Hadoop:
  • HDFS (Hadoop Distributed File System)
  • MapReduce
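• HDFS stores data in a DISTRIBUTED, SCALABLE and FAULT-TOLERANT way.
• The NameNode holds the metadata about the data on the DataNodes: which blocks make up each file and where they are stored.
• DataNodes hold the actual data in the form of blocks; they report to the NameNode and communicate with one another, for example to replicate blocks.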
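The NameNode/DataNode split described above can be sketched as a toy in-memory model. This is a hypothetical illustration, not the Hadoop API: it assumes four DataNodes, a tiny 4-byte block size so the example stays readable (Hadoop 2 and later default to 128 MB blocks), and simple round-robin replica placement, whereas real HDFS uses rack-aware placement. The NameNode part stores only metadata; the bytes live on the DataNodes.

```python
class ToyHDFS:
    """Toy model of HDFS: NameNode-style metadata plus DataNode block storage.

    Hypothetical illustration only; not the Hadoop API.
    """

    def __init__(self, datanodes, block_size, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication factor cannot exceed the number of DataNodes")
        self.block_size = block_size
        self.replication = replication
        # DataNode name -> {block_id: bytes}; this is where the data actually lives.
        self.datanodes = {name: {} for name in datanodes}
        # NameNode metadata: file name -> list of (block_id, DataNodes holding a replica).
        self.namenode = {}
        self._next_block_id = 0

    def write(self, filename, data):
        names = list(self.datanodes)
        blocks = []
        for offset in range(0, len(data), self.block_size):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Round-robin placement on `replication` distinct DataNodes (simplified).
            targets = [names[(block_id + i) % len(names)] for i in range(self.replication)]
            for node in targets:
                self.datanodes[node][block_id] = data[offset:offset + self.block_size]
            blocks.append((block_id, targets))
        self.namenode[filename] = blocks

    def read(self, filename, failed=frozenset()):
        parts = []
        for block_id, holders in self.namenode[filename]:
            # Read from the first replica whose DataNode is still alive.
            live = [n for n in holders if n not in failed]
            if not live:
                raise OSError(f"all replicas of block {block_id} are lost")
            parts.append(self.datanodes[live[0]][block_id])
        return b"".join(parts)


if __name__ == "__main__":
    fs = ToyHDFS(["dn1", "dn2", "dn3", "dn4"], block_size=4)
    fs.write("tweets.txt", b"big data is happy data")
    print(fs.namenode["tweets.txt"])  # block ids and the DataNodes holding each one
    print(fs.read("tweets.txt", failed={"dn2"}))  # still readable with one DataNode down
```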
Hadoop vs. SQL
Hadoop:
• Data is stored as files (often compressed) spread across n commodity servers.
• Fault tolerant: if one node fails, the system still works.
SQL (relational databases):
• Data is stored in tables and columns, with relations between them.
• If any one node crashes, the system returns an error in order to maintain consistency.
Any questions?
• Copying the same file to many (even thousands of) nodes? Doesn't that seem like a waste of space?
• It is actually not a waste of storage, for 2 reasons:
  • If one node fails, the system still works and no data is lost, because other copies survive. (HDFS keeps 3 copies of each block by default, not one copy per node.)
  • The query is spread over the nodes, so parallel processing produces results faster.
    E.g. select the count of the word 'happy' on Twitter: the query is split across multiple servers by a criterion (here, months), and the partial results are consolidated.
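The 'happy' example can be sketched in Python as follows. Threads on one machine stand in for separate servers, and the sample tweets and month keys are made up; in a real cluster each partition would be processed on the node that already stores it. The function names `count_word` and `parallel_count` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_word(tweets, word):
    """Work done by one 'server': count `word` in its own partition of tweets."""
    target = word.lower()
    # Whitespace tokenisation and case folding only (a simplification).
    return sum(tweet.lower().split().count(target) for tweet in tweets)


def parallel_count(partitions, word):
    """Run count_word on every partition concurrently, then consolidate the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = {month: pool.submit(count_word, tweets, word)
                   for month, tweets in partitions.items()}
        per_month = {month: future.result() for month, future in futures.items()}
    return per_month, sum(per_month.values())


if __name__ == "__main__":
    # Made-up sample data, partitioned by month as in the slide's example.
    partitions = {
        "2015-01": ["so happy today", "not happy", "big data"],
        "2015-02": ["happy happy joy"],
        "2015-03": ["nothing to see here"],
    }
    print(parallel_count(partitions, "happy"))
    # ({'2015-01': 2, '2015-02': 2, '2015-03': 0}, 4)
```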
• MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks, as in the previous example, where the Twitter data was processed on different servers by month.
• Hadoop MapReduce is an open-source implementation of the MapReduce model.
• A job is built from 2 functions: a map function and a reduce function (in Hadoop's Java API, the Mapper and Reducer classes).
• Example: to check the popularity of words in a text, use a word count.
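• The Mapper function processes each input split and emits intermediate (key, value) pairs that become the Reducer's input:
    Mapper(filename, file_contents):
        for each word in file_contents:
            emit(word, 1)
• Between the two phases, the framework groups (shuffles and sorts) the Mapper output by key, so each Reducer call receives one word together with all of its counts.
• The Reducer function combines the values provided by the Mapper and produces the output:
    Reducer(word, values):
        sum = 0
        for each value in values:
            sum = sum + value
        emit(word, sum)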
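The pseudocode above can be run end to end with a single-process Python sketch. It makes explicit the shuffle step the framework performs between the two phases; the function names mirror the pseudocode and are not Hadoop's Java API. Splitting on whitespace is a simplifying assumption (no punctuation handling or case folding).

```python
from collections import defaultdict
from itertools import chain


def mapper(filename, file_contents):
    """Emit (word, 1) for every word in one input split."""
    for word in file_contents.split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for word, value in pairs:
        groups[word].append(value)
    return groups


def reducer(word, values):
    """Sum all counts emitted for one word."""
    total = 0
    for value in values:
        total += value
    return word, total


def word_count(files):
    # Map phase: each (filename, contents) split could run on a different node.
    intermediate = chain.from_iterable(mapper(name, text) for name, text in files.items())
    # Shuffle, then reduce each key independently (also parallelisable).
    return dict(reducer(word, values) for word, values in sorted(shuffle(intermediate).items()))


if __name__ == "__main__":
    files = {"part-1.txt": "big data big ideas", "part-2.txt": "data is the new oil"}
    print(word_count(files))
    # {'big': 2, 'data': 2, 'ideas': 1, 'is': 1, 'new': 1, 'oil': 1, 'the': 1}
```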
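Can anyone think of any disadvantages?
• When Hadoop was first developed, it had 2 major disadvantages, which have since been resolved:
• HDFS depended on a single NameNode, a single point of failure.
  Solution: Hadoop 2 introduced HDFS High Availability, in which a Standby NameNode takes over if the Active NameNode fails. (The Secondary NameNode only checkpoints the NameNode's metadata; it is not a failover node.)
• MapReduce is a Java framework and did not support SQL queries.
  Solution: Facebook developed Hive, which lets data scientists work with SQL-like queries (HiveQL) on distributed data.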
• NoSQL: "Not only SQL"
• A non-relational database management system
• Used where no fixed schema is required and data is scaled horizontally.
• 4 categories of NoSQL databases:
  • Key-value pair
  • Columnar database
  • Graph databases
  • Document databases
• KEY-VALUE PAIR
• Keys are used to get values, which are opaque data blocks
• Works like a hash map
• Tremendously fast
Drawback: no provision for content-based queries.
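A minimal sketch of a key-value store, assuming string keys and opaque byte values. Each "node" is a Python dict (a hash map), and keys are spread across nodes by hashing, one simple way to scale horizontally. The class name, the use of `zlib.crc32` and the modulo placement are illustrative assumptions, not taken from any specific product.

```python
import zlib


class ShardedKeyValueStore:
    """Toy key-value store whose keys are spread over several 'nodes' by hashing.

    Each node is a plain dict (a hash map); values are opaque bytes that the store
    never inspects. Hypothetical illustration only.
    """

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Stable hash (crc32) so the same key always lands on the same node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        # One hash computation plus one dict lookup: no scan, hence the speed.
        return self._node_for(key).get(key)

    def delete(self, key):
        self._node_for(key).pop(key, None)


if __name__ == "__main__":
    store = ShardedKeyValueStore(num_nodes=3)
    store.put("user:42:cart", b'{"items": [17, 23]}')
    print(store.get("user:42:cart"))
    # There is no way to ask "which carts contain item 17?" without
    # fetching and decoding every value yourself.
```

Plain modulo placement moves most keys when a node is added; production systems commonly use consistent hashing instead.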
• DOCUMENT DATABASE
• Again a key-value store, but the value is a document.
• Documents do not have fixed schemas.
• Documents can be nested.
• Queries can be based on content as well as keys.
• Use cases: blogging websites
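The following hypothetical sketch shows how a document store differs: values are schema-free, possibly nested dicts, and `find` queries their content by following a dotted field path (an assumed syntax for this example). The linear scan in `find` keeps the sketch short; real document databases typically add secondary indexes to avoid it.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'author.name' into nested dicts; None if absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class DocumentStore:
    """Toy document store: key -> schema-free, possibly nested document (a dict)."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        self._docs[key] = document

    def get(self, key):
        return self._docs.get(key)

    def find(self, path, value):
        """Content-based query: keys of documents whose field at `path` equals `value`."""
        return [key for key, doc in self._docs.items() if get_path(doc, path) == value]


if __name__ == "__main__":
    blog = DocumentStore()
    # Two posts with different shapes: no fixed schema is enforced.
    blog.insert("post:1", {"title": "Hello", "author": {"name": "asha"}, "tags": ["intro"]})
    blog.insert("post:2", {"title": "Hadoop 101", "author": {"name": "ravi", "twitter": "@ravi"}})
    print(blog.find("author.name", "ravi"))  # ['post:2']
```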
• COLUMNAR DATABASE
• Works on attributes (columns) rather than tuples (rows)
• The key here is the column name, and the value is that column's values stored contiguously
• Best for aggregation queries
• Typical query: SELECT (the values of 1 or 2 columns) WHERE (the same or another column's value) = some value.
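The column layout and the query pattern above can be illustrated with a small in-memory table. This is a sketch with made-up sales data, not a real columnar engine: each column is stored as one contiguous list keyed by its name, so a query reads only the columns it mentions.

```python
# The same three-row table, first row by row (made-up sales data).
rows = [
    {"region": "north", "product": "tea", "amount": 120},
    {"region": "south", "product": "coffee", "amount": 80},
    {"region": "north", "product": "coffee", "amount": 200},
]

# Column-oriented layout: key = column name, value = contiguous list of that column's values.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def select_where(columns, select_col, where_col, value):
    """SELECT select_col WHERE where_col = value, touching only the two columns involved."""
    where_values = columns[where_col]
    selected = columns[select_col]
    return [selected[i] for i, v in enumerate(where_values) if v == value]


def total(columns, col):
    """Aggregation over a single column: one contiguous scan, no other columns read."""
    return sum(columns[col])


if __name__ == "__main__":
    print(select_where(columns, "amount", "region", "north"))  # [120, 200]
    print(total(columns, "amount"))  # 400
```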
• GRAPH DATABASES
• A collection of nodes and edges
• Nodes represent data, while edges represent links between them
• Most dynamic and flexible
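A minimal graph sketch, assuming an undirected graph stored as adjacency sets; the class and method names are hypothetical. The `hops` method uses breadth-first search to count the links on the shortest path, the kind of relationship query graph databases are built for.

```python
from collections import deque


class Graph:
    """Toy graph: nodes hold data, edges are links stored as adjacency sets."""

    def __init__(self):
        self.nodes = {}  # node id -> data
        self.edges = {}  # node id -> set of neighbouring node ids

    def add_node(self, node_id, data=None):
        self.nodes[node_id] = data
        self.edges.setdefault(node_id, set())

    def add_edge(self, a, b):
        # Undirected link; adding nodes or edges never requires a schema change.
        for n in (a, b):
            if n not in self.nodes:
                self.add_node(n)
        self.edges[a].add(b)
        self.edges[b].add(a)

    def hops(self, start, goal):
        """Breadth-first search: number of links on the shortest path, or None."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == goal:
                return dist
            for nxt in self.edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None


if __name__ == "__main__":
    g = Graph()
    g.add_edge("alice", "bob")
    g.add_edge("bob", "carol")
    g.add_edge("carol", "dave")
    g.add_node("erin", {"city": "Pune"})  # made-up data on an unconnected node
    print(g.hops("alice", "dave"))  # 3
    print(g.hops("alice", "erin"))  # None
```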
• Websites:
• http://searchbusinessanalytics.techtarget.com/
  Experts sound off on big data, analytics and its tools
• http://www.ibmbigdatahub.com/infographic/four-vs-big-data
  Big data and analytics hub
• https://bigdatauniversity.com/bdu-wp/bdu-course/hadoop-fundamentals-i-version-3/
  Hadoop fundamentals
Research papers:
• Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
Data is the new oil.
Without big data analysis, companies are deaf and dumb: mere wanderers on the web, like cattle on a highway!
Thank you!
Keep dreaming BIG :D
