SlideShare ist ein Scribd-Unternehmen logo
1 von 9
Prepared By
Dr. swarnalatha K.S
Professor, Dept. of ISE
Nitte Meenakshoi Institute of Technology
Bangalore
 Big data is a class of problems that challenge existing
IT and computing technology and existing algorithms.
 Traditionally, big data is defined as a big volume of
data (in excess of 1 terabyte) generated at high velocity
with high variety and veracity. That is, big data is
identified using 4 Vs, namely, volume, velocity,
variety, and veracity which are defined as follows:
 Volume is the size of the data that an organization holds.
Typically, this can run into several petabytes (1015 bytes) or
exabytes (1016 bytes). Organizations such as telecom and banking
collect and store a large quantity of customer data. Data
collected using satellite and other machine generated data such
as data generated by health and usage monitoring systems
fitted in aircrafts,weather and rain monitoring systems can run
into several exabytes since the data is captured minute by
minute.
 Velocity is the rate at which the data is generated.
For example, AT&T customers generated more than
82 petabytes of data traffic on a daily basis (Anon,
2016).
 Variety refers to the different types of data collected.
In the case of telecom, the different data types are
voice calls, messages in different languages, video
calls, use of Apps, etc.
 Veracity of the data refers to the quality issues.
Especially in social media there could be biased or
incorrect data, which can result in wrong
inferences.
 Big data is mostly misused terminology since a large proportion of
analytics projects are not big data projects, in the sense that they
do not challenge the existing computing technology and
algorithms.
 Many companies see big data as a technology problem. Although
it is true that technology is a constraint while addressing big data
problems, a true big data problem challenges the algorithms that
we have today, that is the algorithms are inadequate to solve these
problems within a reasonable time.
 However, many problems in predictive and prescriptive analytics
belongs to the big data category.When the problem associated
with the data can challenge the existing computing technology
due to its volume, velocity, and variety, then we have a big data
problem. For example, Google processes 24 petabytes of data
every day (Mayer-Schonberger and Cukier, 2013).
 Google was the first to exploit big data for
targeted advertising using clickstream data.
Google also predicted the spread of H1N1 flu
based on the search terms entered by Google
users (Ginsberg et al., 2009). According to
Richard Kellet, Director of Marketing at SAS,
we (human) created 500 times more data in the
last 10 years than what we had done prior to
that, since the beginning of humanity (Scott,
2013).
 Every B787 flight creates half a terabyte of
machine-generated data. For large banks, the
automatic teller machine (ATM) transactions
themselves will run into several billion
transactions per month. Telecom call data, social
media data, banking and financial transactions,
machine-generated data, and healthcare data are a
fewexamples of the sources of big data.
 Alternatively, any problem that can challenge the
existing computing power, IT systems, and/or
algorithms constitutes a big data problem. Big data
problems need innovative ideas in order to handle
the 4 Vs for deriving insights.
 The volume of the data generated is increasing
every day, and most of this data is user generated,
mainly from social media platforms, or machine
generated. With the increase in Internet
penetration and autonomous data capturing, the
velocity of data is also increasing at a faster rate.
 It is estimated that 2.5 exabyte of data is created
every day; this figure is likely to double in the near
future (Mayer-Schonberger and Cukier, 2013).
 As the velocity of the data increases, traditional
models such as regression and classification
techniques may become unstable and invalid for
analysis. The variety of data is another challenge
within big data.
 Even in the case of text mining, users may use
different languages within the same sentence,
especially in a country such as India, which is
home to hundreds of languages.
 Natural language processing (NLP), which is an
essential component of big data, is challenging
when multiple languages are used in the text data.
 Innovative parallel processing capabilities such as
Apache Hadoop (that comes with the ability to
process large-scale data sets in multiple clusters
using the Hadoop Distributed File System) and
MapReduce (which enables parallel processing of
large data sets) are used by organizations to
handle big data.
 Big data technologies are still in a nascent stage
and are far from maturity. However, these big
data technologies do provide better computing
power compared to existing technologies.

Weitere ähnliche Inhalte

Was ist angesagt?

Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social mediaSupriya Radhakrishna
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big DataMatthew Dennis
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Big data ppt
Big data pptBig data ppt
Big data pptYash Raj
 
Forecast of Big Data Trends
Forecast of Big Data TrendsForecast of Big Data Trends
Forecast of Big Data TrendsIMC Institute
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big datakk1718
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Edureka!
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataRichard Vidgen
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataHari Priya
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysisPoonam Kshirsagar
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBernard Marr
 

Was ist angesagt? (20)

Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social media
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Big data mining
Big data miningBig data mining
Big data mining
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data analysis
Big data analysisBig data analysis
Big data analysis
 
Forecast of Big Data Trends
Forecast of Big Data TrendsForecast of Big Data Trends
Forecast of Big Data Trends
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
 
Big data 101
Big data 101Big data 101
Big data 101
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Big data case study collection
Big data   case study collectionBig data   case study collection
Big data case study collection
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its Challenges
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 

Ähnlich wie Bigdata analytics

Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Love Arora
 
hariri2019.pdf
hariri2019.pdfhariri2019.pdf
hariri2019.pdfAkuhuruf
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Understand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioUnderstand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioAI Publications
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.saranya270513
 
Mining Big Data to Predicting Future
Mining Big Data to Predicting FutureMining Big Data to Predicting Future
Mining Big Data to Predicting FutureIJERA Editor
 
Governing Big Data : Principles and practices
Governing Big Data : Principles and practicesGoverning Big Data : Principles and practices
Governing Big Data : Principles and practicesPiyush Malik
 
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...AnthonyOtuonye
 
INN530 - Assignment 2, Big data and cloud computing for management
INN530 - Assignment 2, Big data and cloud computing for managementINN530 - Assignment 2, Big data and cloud computing for management
INN530 - Assignment 2, Big data and cloud computing for managementSimen Smaaberg
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)Shahbaz Anjam
 
big data Big Things
big data Big Thingsbig data Big Things
big data Big Thingspateelhs
 
A SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSA SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSijistjournal
 

Ähnlich wie Bigdata analytics (20)

Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
 
hariri2019.pdf
hariri2019.pdfhariri2019.pdf
hariri2019.pdf
 
Big data
Big data Big data
Big data
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Understand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioUnderstand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present Scenario
 
GADLJRIET850691
GADLJRIET850691GADLJRIET850691
GADLJRIET850691
 
Big data survey
Big data surveyBig data survey
Big data survey
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.
 
Mining Big Data to Predicting Future
Mining Big Data to Predicting FutureMining Big Data to Predicting Future
Mining Big Data to Predicting Future
 
Governing Big Data : Principles and practices
Governing Big Data : Principles and practicesGoverning Big Data : Principles and practices
Governing Big Data : Principles and practices
 
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
Overcomming Big Data Mining Challenges for Revolutionary Breakthroughs in Com...
 
INN530 - Assignment 2, Big data and cloud computing for management
INN530 - Assignment 2, Big data and cloud computing for managementINN530 - Assignment 2, Big data and cloud computing for management
INN530 - Assignment 2, Big data and cloud computing for management
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
Big Data Challenges faced by Organizations
Big Data Challenges faced by OrganizationsBig Data Challenges faced by Organizations
Big Data Challenges faced by Organizations
 
Big Data.pdf
Big Data.pdfBig Data.pdf
Big Data.pdf
 
big data Big Things
big data Big Thingsbig data Big Things
big data Big Things
 
A SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSA SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICS
 
On Big Data
On Big DataOn Big Data
On Big Data
 

Kürzlich hochgeladen

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 

Kürzlich hochgeladen (20)

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 

Bigdata analytics

  • 1. Prepared By Dr. swarnalatha K.S Professor, Dept. of ISE Nitte Meenakshoi Institute of Technology Bangalore
  • 2.  Big data is a class of problems that challenge existing IT and computing technology and existing algorithms.  Traditionally, big data is defined as a big volume of data (in excess of 1 terabyte) generated at high velocity with high variety and veracity. That is, big data is identified using 4 Vs, namely, volume, velocity, variety, and veracity which are defined as follows:  Volume is the size of the data that an organization holds. Typically, this can run into several petabytes (1015 bytes) or exabytes (1016 bytes). Organizations such as telecom and banking collect and store a large quantity of customer data. Data collected using satellite and other machine generated data such as data generated by health and usage monitoring systems fitted in aircrafts,weather and rain monitoring systems can run into several exabytes since the data is captured minute by minute.
  • 3.  Velocity is the rate at which the data is generated. For example, AT&T customers generated more than 82 petabytes of data traffic on a daily basis (Anon, 2016).  Variety refers to the different types of data collected. In the case of telecom, the different data types are voice calls, messages in different languages, video calls, use of Apps, etc.  Veracity of the data refers to the quality issues. Especially in social media there could be biased or incorrect data, which can result in wrong inferences.
  • 4.  Big data is mostly misused terminology since a large proportion of analytics projects are not big data projects, in the sense that they do not challenge the existing computing technology and algorithms.  Many companies see big data as a technology problem. Although it is true that technology is a constraint while addressing big data problems, a true big data problem challenges the algorithms that we have today, that is the algorithms are inadequate to solve these problems within a reasonable time.  However, many problems in predictive and prescriptive analytics belongs to the big data category.When the problem associated with the data can challenge the existing computing technology due to its volume, velocity, and variety, then we have a big data problem. For example, Google processes 24 petabytes of data every day (Mayer-Schonberger and Cukier, 2013).
  • 5.  Google was the first to exploit big data for targeted advertising using clickstream data. Google also predicted the spread of H1N1 flu based on the search terms entered by Google users (Ginsberg et al., 2009). According to Richard Kellet, Director of Marketing at SAS, we (human) created 500 times more data in the last 10 years than what we had done prior to that, since the beginning of humanity (Scott, 2013).
  • 6.  Every B787 flight creates half a terabyte of machine-generated data. For large banks, the automatic teller machine (ATM) transactions themselves will run into several billion transactions per month. Telecom call data, social media data, banking and financial transactions, machine-generated data, and healthcare data are a fewexamples of the sources of big data.  Alternatively, any problem that can challenge the existing computing power, IT systems, and/or algorithms constitutes a big data problem. Big data problems need innovative ideas in order to handle the 4 Vs for deriving insights.
  • 7.  The volume of the data generated is increasing every day, and most of this data is user generated, mainly from social media platforms, or machine generated. With the increase in Internet penetration and autonomous data capturing, the velocity of data is also increasing at a faster rate.  It is estimated that 2.5 exabyte of data is created every day; this figure is likely to double in the near future (Mayer-Schonberger and Cukier, 2013).  As the velocity of the data increases, traditional models such as regression and classification techniques may become unstable and invalid for analysis. The variety of data is another challenge within big data.
  • 8.  Even in the case of text mining, users may use different languages within the same sentence, especially in a country such as India, which is home to hundreds of languages.  Natural language processing (NLP), which is an essential component of big data, is challenging when multiple languages are used in the text data.  Innovative parallel processing capabilities such as Apache Hadoop (that comes with the ability to process large-scale data sets in multiple clusters using the Hadoop Distributed File System) and MapReduce (which enables parallel processing of large data sets) are used by organizations to handle big data.
  • 9.  Big data technologies are still in a nascent stage and are far from maturity. However, these big data technologies do provide better computing power compared to existing technologies.