Big Data Analytics
What Is Big Data Analytics?
● Big Data
– Buzzword
– Two definitions:
● Data sets too large for modern relational databases
● Semi-structured/Unstructured data sets
● Analytics
– The science of measuring and discovering patterns
and trends with data
Source: http://www.socialtalent.co/blog/big-data-whats-the-big-deal
Data, Data, Everywhere...
● In 2004:
– Internet traffic: 1 Exabyte (that's 134,217,728 8 GB
flash drives)
– A lot of other media:
● Newspapers/books/magazines
● DVDs
Data, Data, Everywhere...
● Today:
– Internet traffic: 1.3 Zettabytes (that's roughly
178.67 billion 8 GB flash drives)
● 110.3 exabytes per month
– Even more media:
● Mobile devices (phones/tablets/mp3 players/etc)
● The Internet of Things
● Streaming Media
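The flash-drive comparisons on these two slides can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 EB = 2^60 bytes, 1 ZB = 2^70 bytes, 8 GB = 8 × 2^30 bytes). The slides do not state which units they use; binary units are chosen here because they reproduce the 2004 figure exactly. Under the same assumption the 1.3 ZB case comes out near 178.67 billion drives. The function name `flash_drives` is made up for illustration.

```python
from fractions import Fraction

# Assumption: binary units, since they reproduce the 2004 slide figure exactly.
EXABYTE = 2**60
ZETTABYTE = 2**70
DRIVE_BYTES = 8 * 2**30  # one "8 GB" flash drive


def flash_drives(total_bytes):
    """Return how many whole 8 GB drives fit into total_bytes."""
    return int(Fraction(total_bytes) / DRIVE_BYTES)


# Fraction(13, 10) keeps 1.3 exact and avoids floating-point rounding.
drives_2004 = flash_drives(1 * EXABYTE)
drives_today = flash_drives(Fraction(13, 10) * ZETTABYTE)

print(f"{drives_2004:,}")
print(f"{drives_today:,}")
```

Using exact fractions rather than floats matters at this scale: a rounding error in the twelfth significant digit is enough to shift the drive count by hundreds.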
The Internet of Things
● How many of you have...
– Fitness trackers?
– E-readers?
– iPods?
● Tie them to social sites (e.g., Facebook)?
The Internet of Things
● You're being tracked!
● So what?
– Marketing
– Medical
– Government
● Building a fuller picture of what's tracked.
Social Network Integration
Six Degrees of Separation
Source: http://www.83toinfinity.com
Source: http://www.math.cornell.edu/~numb3rs/blanco/social_net.jpg
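The "six degrees" idea can be made concrete as a shortest-path question on a friendship graph. The sketch below is a plain breadth-first search, one common way to count hops between two people. The graph, the names in it, and the function `degrees_of_separation` are all hypothetical and not taken from the slides.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops between start and goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


# Hypothetical friendship graph; every name here is made up.
friends = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana"],
    "dee": ["ben", "eli"],
    "eli": ["dee"],
    "fay": [],
}

print(degrees_of_separation(friends, "ana", "eli"))
```

Breadth-first search visits people in order of distance, so the first time it reaches the goal it has found the shortest chain.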
Data Storage
● Relational Databases
– Structured data
– Can scale to huge volumes of data
● Hadoop
– Semi-structured/unstructured data
– Massively parallel storage and processing
Relational Database
Source: http://www.ntu.edu.sg/home/ehchua/programming/sql/images/ManyToOne.png
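A many-to-one relationship like the one in the referenced diagram can be sketched with Python's built-in `sqlite3` module. The table and column names below (`departments`, `employees`, `department_id`) are assumptions chosen for illustration, not taken from the diagram.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE departments (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(id)
    );
""")

# Many employees point at one department.
conn.execute("INSERT INTO departments VALUES (1, 'Analytics')")
conn.executemany(
    "INSERT INTO employees (name, department_id) VALUES (?, ?)",
    [("Anna", 1), ("Peter", 1)],
)

rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees AS e
    JOIN departments AS d ON d.id = e.department_id
    ORDER BY e.name
""").fetchall()
print(rows)
```

The fixed schema is what makes the data "structured": every row has the same typed columns, and the join works because each employee row carries a valid department key.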
Unstructured Data
Source: http://storagegaga.com/2011/12/
Semi-structured
Source: http://www.stylusstudio.com/images/figures/sql_xml_xml_fragment.gif
What Solution to Pick?
● Data Volume and Speed
– Relational Databases Will Cap Out
– “Big Data” Stores Scale (For Now)
● Hadoop
● Spark
● Lucene
– Alternative Modeling Techniques
● Hyper Normalized (6-8NF)
– Inmon's Textual Disambiguation
– Anchor Modeling
– Data Vault
Hadoop
● Version 1
– Giant data store
– File distribution
– File parsing tools
– Generic security
● Version 2
– Giant data store
– Rebuilt foundation (YARN took over resource management)
– Unified security: LDAP/Kerberos support
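The slides do not name a processing model, but Hadoop is best known for MapReduce, so a toy version may help show what "massively parallel processing" means in practice. The sketch below runs in a single Python process and uses none of Hadoop's APIs. The function names and the input strings are invented for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (word, 1) pairs for one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


# Each string stands in for a file block that a separate node would hold.
blocks = ["big data big tools", "data stores scale"]
mapped = chain.from_iterable(map_phase(block) for block in blocks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each block is mapped independently and each key is reduced independently, both phases can be spread across many machines. That independence is what the pattern relies on to scale.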
Tools
● Oozie
● Hive
● NoSQL Databases
– HBase
– MongoDB
JSON
{
  "employees": [
    { "firstName": "John", "lastName": "Doe" },
    { "firstName": "Anna", "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
  ]
}
Source: http://www.w3schools.com/json/json_syntax.asp
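As a rough illustration of working with semi-structured data, the JSON document above can be read with Python's standard `json` module. The `email` field looked up at the end is hypothetical. It is included only to show that records in a semi-structured store need not all share the same fields.

```python
import json

document = """
{
  "employees": [
    { "firstName": "John",  "lastName": "Doe" },
    { "firstName": "Anna",  "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
  ]
}
"""

data = json.loads(document)

# Fields are addressed by name rather than by column position.
full_names = [f'{e["firstName"]} {e["lastName"]}' for e in data["employees"]]

# .get() tolerates fields that some records may lack (hypothetical "email").
emails = [e.get("email") for e in data["employees"]]

print(full_names)
print(emails)
```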
How to Analyze?
● Performance
● Timeliness
● Accuracy
● Feedback
“Big Data” Solutions
● Search the entire data set
● Great performance
● Highly accurate
● Integrates into Analytics tools
– Only some of the tools are able to support Hadoop,
etc.
Statistics
● Designed for all sizes of data sets
● Decreases time to results
● As accurate as needed
● Analytics tools fully support
● Most “Big Data” tools support
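One way to read "decreases time to results" and "as accurate as needed" is sampling: estimate a quantity from a random subset and pick the sample size that gives the precision required. The sketch below is a minimal illustration under that reading. The data is synthetic, and the function name `estimate_mean` and the chosen sample sizes are assumptions.

```python
import math
import random
import statistics


def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (estimate, margin), where margin is an approximate 95%
    half-width based on the normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, 1.96 * std_error


# Synthetic data standing in for a large data set (not real measurements).
data_rng = random.Random(42)
population = [data_rng.gauss(100.0, 15.0) for _ in range(200_000)]

small = estimate_mean(population, 1_000)
large = estimate_mean(population, 16_000)
true_mean = statistics.fmean(population)

print(f"n=1,000:  {small[0]:.2f} +/- {small[1]:.2f}")
print(f"n=16,000: {large[0]:.2f} +/- {large[1]:.2f}")
print(f"full scan: {true_mean:.2f}")
```

The margin shrinks with the square root of the sample size, so a sample 16 times larger buys roughly a fourfold gain in precision. That trade-off is what lets the analyst stop at "accurate enough".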
Analytics Tools
● Can access data of most sizes
– Most can handle Hadoop and some NoSQL
databases
● Built for Predictive Modeling
● Starting to handle social/network modeling
How to Get Started
● Grab some tools!
– RapidMiner (http://rapidminer.com/)
– R (http://www.r-project.org/)
– Weka (http://www.cs.waikato.ac.nz/ml/weka/)
● Grab some data!
– http://www.kdnuggets.com/datasets/index.html
– http://aws.amazon.com/publicdatasets/
– http://www.reddit.com/r/datasets
Prizes/Challenges
● Kaggle - https://www.kaggle.com/
● MIT - http://bigdata.csail.mit.edu/challenge
● Heritage Health Prize - http://www.heritagehealthprize.com/c/hhp
● Twitter - @OpenDataAlex
● LinkedIn - alexmeadows
● GitHub - dbaAlex
Questions? Comments?

Big Data Analytics - Introduction