LECTURE L18
BIG DATA AND ANALYTICS
Data Gathering
Data Gathering in US 1950+
1955-1965:
Social Security: Calculate Benefits for 15MM Recipients (62MM Now)
NASA: Calculate Real-Time Orbital Determination
IRS: Calculate / Store 55MM Records (126MM Now)
1955-1975:
Banks: Process Checks
Telecoms: Optimise Telephone Switching
Hospitals: Manage Patient Data
Airlines: Process Transaction / Data
Insurance: Optimise Insurance Policies
Retail: Track Inventory / Logistics
Credit Cards: Manage Merchant Network
Source: Mary Meeker Slide Deck 2019
Big Bangs in Data
2006
Amazon AWS
2007
Apple iPhone
Until now, a sophisticated & scalable data
storage infrastructure has been beyond
the reach of small developers. 

— Amazon S3 Launch FAQ, 2006
Why run such a sophisticated operating
system on a mobile device? Well,
because it’s got everything we need. 

— Steve Jobs, iPhone Launch, 2007
Source: Mary Meeker Slide Deck 2019
Growth of Data
Source: Mary Meeker Slide Deck 2019
Decline of Cost
Source: Mary Meeker Slide Deck 2019
Increasing Revenues
Source: Mary Meeker Slide Deck 2019
Where Does the Data Come From?
Source: Mary Meeker Slide Deck 2019
Big Data
With the computer revolution, digital data became possible
Over the years, data has grown exponentially
“Big Data” has become a platform in itself, with new possibilities
Global Data is Growing Fast
Data in Digital Universe vs. Data Storage Cost, 2010-2015
Source: Mary Meeker, KPCB
Evolution of Data Platform
Source: Mary Meeker, KPCB
Data is a New Growth Platform
The Network: Large investments in fibre optic & last-mile cable created connectivity that facilitated the early Internet growth
The Software: Optimising the network with software became far more capital efficient than additional capital expenditure buildouts, ultimately resulting in the creation of pervasive networks (Siloed DCs -> AWS) and pervasive software (Siebel -> Salesforce)
The Infrastructure: Emergence of pervasive software created the need to optimise the performance of the network and store extraordinary amounts of data at extremely low prices
The Data: Next Big Wave: Leveraging this unlimited connectivity and storage to collect / aggregate / correlate / interpret all of this data to improve people’s lives and enable enterprises to operate more efficiently
Data Generators
Source: Mary Meeker, KPCB
“Data is moving from something you use
outside the workstream to becoming a part of
the business app itself.”
— Frank Bien, CEO of Looker
Improve people’s lives and enable
enterprises to operate more efficiently
Big Data Examples
Macy's Inc. and real-time pricing
The retailer adjusts pricing in near-real time for 73 million
items, based on demand and inventory.
Source: Ten big data case studies in a nutshell
Tipp24 AG, a platform for placing bets
The company uses software to analyse billions of
transactions and hundreds of customer attributes, and to
develop predictive models that target customers and
personalise marketing messages on the fly.
Source: Ten big data case studies in a nutshell
Wal-Mart Stores Inc. and search
The mega-retailer's latest search engine for Walmart.com
includes semantic data. The platform, which was designed
in-house, relies on text analysis, machine learning and even
synonym mining to produce relevant search results.
Wal-Mart says adding semantic search has increased the share of
online shoppers who complete a purchase by 10% to 15%.
Source: Ten big data case studies in a nutshell
PredPol Inc. and repurposing
The Los Angeles and Santa Cruz police departments, a
team of educators and a company called PredPol have
taken an algorithm used to predict earthquakes, tweaked it
and started feeding it crime data.
The software can predict where crimes are likely to occur
down to 500 square feet. In LA, there's been a 33%
reduction in burglaries and 21% reduction in violent crimes
in areas where the software is being used.
Source: Ten big data case studies in a nutshell
American Express and business intelligence
AmEx started looking for indicators that could really
predict loyalty and developed sophisticated predictive
models to analyse historical transactions and 115 variables
to forecast potential churn
The company believes it can now identify 24% of Australian
accounts that will close within the next four months
Source: Ten big data case studies in a nutshell
A Bank and IBM
A large US bank uses IBM machine learning technologies
to analyse credit card transactions.
Using machine learning and stream computing to detect financial fraud
TEDxUofM - Jameson Toole - Big Data for Tomorrow
What is Big Data?
Big data is high-volume, high-velocity and/or high-variety
information assets that demand cost-effective, innovative
forms of information processing that enable enhanced
insight, decision making, and process automation.
Gartner
Big data refers to a process that is used when traditional
data mining and handling techniques cannot uncover the
insights and meaning of the underlying data. Data that is
unstructured or time sensitive or simply very large cannot
be processed by relational database engines. This type of
data requires a different processing approach called big
data, which uses massive parallelism on readily-available
hardware.
Techopedia
“Big data is the oil of the 21st century and
analytics is the combustion engine.”
—Peter Sondergaard, Gartner Research
How do you measure numbers at large scale?
What is a Yottabyte?
1,000,000,000,000,000,000,000,000 bytes (10^24)
Byte: one grain of rice
Kilobyte: handful of rice
Megabyte: big pot of rice
Gigabyte: truck full of rice
Terabyte: container ship full of rice
Petabyte: covers Manhattan
Exabyte: covers the west coast of the US
Zettabyte: fills the Pacific Ocean
Yottabyte: Earth-sized rice ball
David Wellman: What is Big Data?
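The unit ladder above can be made concrete with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where each step is a factor of 1000; the slides do not say whether decimal or binary (1024) units are meant. The names `UNITS`, `bytes_in` and `largest_unit` are illustrative only.

```python
# Assumption: decimal (SI) units, each step up the ladder is a factor of 1000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Return how many bytes make up one `unit`."""
    return 1000 ** UNITS.index(unit)


def largest_unit(n_bytes: int):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    idx = 0
    # Climb the ladder while the next unit still fits into the count.
    while idx < len(UNITS) - 1 and n_bytes >= 1000 ** (idx + 1):
        idx += 1
    return n_bytes / 1000 ** idx, UNITS[idx]


if __name__ == "__main__":
    print(bytes_in("yottabyte"))           # 1 followed by 24 zeros
    print(largest_unit(1500))              # (1.5, 'kilobyte')
    print(largest_unit(3 * 10 ** 15))      # (3.0, 'petabyte')
```

Python integers have arbitrary precision, so the yottabyte figure is exact and does not overflow.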
Big Data
Internet
Computers
Early computers
What is Big Data?
Big Data is not just about the size of
the data, it’s about the value within
the data
This value can be used for marketing,
business optimisation, getting
insights, improving health, security,
etc.
Data Analytics
Why Big Data Analytics?
Understand the data the company has
Process data to see patterns, correlations and
information that can be used to make better
decisions
Obtain insights that are otherwise not known
Data Analytics
TRADITIONAL APPROACH: Structured and Repeatable Analyses
Business users: Determine what questions to ask
IT: Structures the data to answer the question
BIG DATA APPROACH: Iterative and Exploratory Analyses
IT: Delivers a platform to enable creative discovery
Business users: Explore what questions could be asked
Tools for Data Analytics
NoSQL databases: MongoDB, Cassandra, HBase, Hypertable
Storage: S3, Hadoop Distributed File System
Servers: EC2, Google App Engine, Heroku
MapReduce: Hadoop, Hive, Pig, Cascading, S4, MapR
Processing: R, Yahoo! Pipes, Solr/Lucene, BigSheets
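The MapReduce tools listed above share one programming model: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step combines each group. The sketch below shows that model as a word count in a single Python process. It is not Hadoop's API, and all function names are made up for illustration; a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big ideas"]))
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across commodity hardware, which is the parallelism the Techopedia definition refers to.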
Two Types of Data Analysis Problems
Supervised Learning: Learn from data where we have labels
for all the data we’ve seen so far
Example: Determining Spam Emails
Unsupervised Learning: Learn from data but we don’t have any
labels
Example: Grouping Emails, AlphaZero (learns from self-play without labelled examples)
Learning is about discovering hidden patterns in data
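The spam example can be sketched as a tiny supervised learner. The code below assumes a naive Bayes word-count model with add-one smoothing, which is one common choice and not something the slides prescribe. The training messages and the label names "spam" and "ham" are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled training data: every message comes with its label.
LABELLED = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]


def train(examples):
    """Count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts = model
    vocab = set().union(*word_counts.values())
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)  # prior
        n_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(LABELLED)
    print(predict(model, "money offer now"))
    print(predict(model, "agenda for meeting"))
```

The labels are what make this supervised: without the second element of each training pair, the same messages could only be grouped, not classified.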
Clustering
One of the oldest problems in unsupervised data analysis
In clustering the goal is to group data according to similarity
Algorithms such as K-means are used for clustering
For each artefact found,
its distance north (N) and east (E)
of the marker is
recorded
That is a data set
Before the dig, a historian
said that three families
lived at the location
Similar: close in physical distance
You assign each data point to one and only one group
The groups are called clusters
Clustering is the unsupervised learning problem where
you take your data and assign each data point to exactly
one group, or cluster
Uses unlabelled data
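The dig example maps directly onto K-means with three clusters, one per family. The sketch below is a plain-Python version of the standard assign-then-update loop (Lloyd's algorithm). The artefact coordinates are invented, and seeding the centroids with the first k points is a simplifying assumption; real implementations usually pick random or spread-out starting points.

```python
import math

# Hypothetical artefact positions as (north, east) distances from the marker.
ARTEFACTS = [
    (1, 1), (10, 1), (5, 9),
    (1, 2), (2, 1),
    (10, 2), (11, 1),
    (5, 10), (6, 9),
]


def k_means(points, k, iterations=10):
    """Return (centroids, labels) after a fixed number of iterations."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


if __name__ == "__main__":
    centroids, labels = k_means(ARTEFACTS, k=3)
    print(labels)
    print(centroids)
```

Each artefact ends up with exactly one label, which matches the definition above: every data point belongs to one and only one cluster.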
We may have collected data but we don’t know what to
do with it
We might want to explore the data without a particular
end goal in mind
Perhaps the data will suggest interesting avenues for
further analysis
In this case, we say that we're performing exploratory
data analysis
Exploratory data analysis
We don’t know what we are looking for
Data point = colour of pixel and location of pixel
Dissimilarity is the distance in colour
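A short sketch of that pixel representation follows. It assumes colours are RGB triples and that "distance in colour" means Euclidean distance in RGB space; the slide does not name a colour space or metric, and the `Pixel` type is hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Pixel(NamedTuple):
    """One data point: where the pixel is and what colour it has."""
    x: int
    y: int
    rgb: Tuple[int, int, int]


def colour_dissimilarity(a: Pixel, b: Pixel) -> float:
    """Euclidean distance in RGB space; pixel location is deliberately ignored."""
    return math.dist(a.rgb, b.rgb)


if __name__ == "__main__":
    red_here = Pixel(0, 0, (255, 0, 0))
    red_there = Pixel(9, 9, (255, 0, 0))
    print(colour_dissimilarity(red_here, red_there))  # same colour, far apart
```

Two pixels of the same colour count as identical under this measure no matter how far apart they sit in the image, so clustering with it groups regions by colour alone.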
In some cases labelling is too expensive
For example, news changes every day and there is too much of it
Using Big Data to Influence People
Alexander Nix, CEO Cambridge Analytica
Ted Cruz campaign for the US Republican presidential nomination
Data Analysis as a Platform
THEN: Complex tools operated by Data Analysts; chaos of data silos across the company
NOW: Real-time data analytics platform like Looker
Customer Data as a Platform
THEN: Difficult to customise, lack of automated customer insights
NOW: Real-time intelligence that automatically tracks and analyses interaction with customers
Mapping Data as a Platform
THEN: Difficult and expensive to collect data; limited in-app digital map usage
NOW: Mapping platforms like Mapbox
Cloud Data Monitoring as a Platform
THEN: Expensive and clunky point solutions; lengthy implementation cycles; only used by System Administrators
NOW: Cloud monitoring platforms like Datadog
Next
Lecture L19 Network Platforms

 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

L18 Big Data and Analytics

  • 1. LECTURE L18 BIG DATA AND ANALYTICS
  • 3. Data Gathering in US 1950+ (timeline: 1955, 1960, 1965). Social Security: Calculate Benefits for 15MM Recipients (62MM Now). NASA: Calculate Real-Time Orbital Determination. IRS: Calculate / Store 55MM Records (126MM Now). Source: Mary Meeker Slide Deck 2019
  • 4. Data Gathering in US 1950+ (timeline: 1955, 1965, 1975). Banks: Process Checks. Telecoms: Optimise Telephone Switching. Hospitals: Manage Patient Data. Airlines: Process Transactions / Data. Insurance: Optimise Insurance Policies. Retail: Track Inventory / Logistics. Credit Cards: Manage Merchant Network. Source: Mary Meeker Slide Deck 2019
  • 5. Big Bangs in Data. 2006, Amazon AWS: “Until now, a sophisticated & scalable data storage infrastructure has been beyond the reach of small developers.” (Amazon S3 Launch FAQ, 2006). 2007, Apple iPhone: “Why run such a sophisticated operating system on a mobile device? Well, because it’s got everything we need.” (Steve Jobs, iPhone Launch, 2007). Source: Mary Meeker Slide Deck 2019
  • 6. Growth of Data Source: Mary Meeker Slide Deck 2019
  • 7. Decline of Cost Source: Mary Meeker Slide Deck 2019
  • 8. Increasing Revenues Source: Mary Meeker Slide Deck 2019
  • 9. Where Does the Data Come From? Source: Mary Meeker Slide Deck 2019
  • 10. Where Does the Data Come From? Source: Mary Meeker Slide Deck 2019
  • 11. Where Does the Data Come From? Source: Mary Meeker Slide Deck 2019
  • 13. Big Data. With the computer revolution, digital data became possible. Over the years, data has grown exponentially. “Big Data” has become a platform in itself, with new possibilities.
  • 14. Global Data is Growing Fast Data in Digital Universe vs. Data Storage Cost, 2010-2015 Source: Mary Meeker, KPCB
  • 15. Evolution of Data Platform Source: Mary Meeker, KPCB
  • 16. Data is a New Growth Platform
 The Network: Large investments in fibre optic & last-mile cable created the connectivity that facilitated early Internet growth.
 The Software: Optimising the network with software became far more capital efficient than additional capital expenditure buildouts, ultimately resulting in the creation of pervasive networks (Siloed DCs -> AWS) and pervasive software (Siebel -> Salesforce).
 The Infrastructure: The emergence of pervasive software created the need to optimise the performance of the network and store extraordinary amounts of data at extremely low prices.
 The Data: Next Big Wave: leveraging this unlimited connectivity and storage to collect / aggregate / correlate / interpret all of this data to improve people’s lives and enable enterprises to operate more efficiently.
  • 18. “Data is moving from something you use outside the workstream to becoming a part of the business app itself.” — Frank Bien, CEO of Looker
  • 19. Improve people’s lives and enable enterprises to operate more efficiently
  • 21. Big Data Examples: Macy's Inc. and real-time pricing. The retailer adjusts pricing in near-real time for 73 million items, based on demand and inventory. Source: Ten big data case studies in a nutshell
  • 22. Big Data Examples: Tipp24 AG, a platform for placing bets. The company uses software to analyse billions of transactions and hundreds of customer attributes, and to develop predictive models that target customers and personalise marketing messages on the fly. Source: Ten big data case studies in a nutshell
  • 23. Big Data Examples: Wal-Mart Stores Inc. and search. The mega-retailer's latest search engine for Walmart.com includes semantic data. A platform designed in-house relies on text analysis, machine learning and even synonym mining to produce relevant search results. Wal-Mart says adding semantic search has increased the share of online shoppers completing a purchase by 10% to 15%. Source: Ten big data case studies in a nutshell
  • 24. Big Data Examples: PredPol Inc. and repurposing. The Los Angeles and Santa Cruz police departments, a team of educators and a company called PredPol have taken an algorithm used to predict earthquakes, tweaked it and started feeding it crime data. The software can predict where crimes are likely to occur down to 500 square feet. In LA, there has been a 33% reduction in burglaries and a 21% reduction in violent crimes in areas where the software is being used. Source: Ten big data case studies in a nutshell
  • 25. Big Data Examples: American Express and business intelligence. AmEx started looking for indicators that could really predict loyalty and developed sophisticated predictive models to analyse historical transactions and 115 variables to forecast potential churn. The company believes it can now identify 24% of Australian accounts that will close within the next four months. Source: Ten big data case studies in a nutshell
  • 26. Big Data Examples: A Bank and IBM. A large US bank uses IBM machine learning technologies to analyse credit card transactions, using machine learning and stream computing to detect financial fraud.
  • 27. TEDxUofM - Jameson Toole - Big Data for Tomorrow
  • 29. What is Big Data?
  • 30. What is Big Data? Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. Source: Gartner
  • 31. What is Big Data? Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware. Source: Techopedia
  • 32. “Big data is the oil of the 21st century and analytics is the combustion engine.” —Peter Sondergaard, Gartner Research What is Big Data?
  • 33. How do you measure numbers at large scale? What is Big Data?
  • 34. What is a Yottabyte?
  • 45. What is Big Data? (David Wellman: What is Big Data?)
 Byte: one grain of rice
 Kilobyte: handful of rice
 Megabyte: big pot of rice
 Gigabyte: truck full of rice
 Terabyte: container ship full of rice
 Petabyte: covers Manhattan
 Exabyte: covers the west coast of the US
 Zettabyte: fills the Pacific Ocean
 Yottabyte: Earth-size rice ball
 Eras marked along the scale, from smallest to largest: Early computers, Computers, Internet, Big Data
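The unit ladder above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming decimal (SI) prefixes where each step is a factor of 1000; the function name `human_readable` is made up for this illustration and is not from the slides.

```python
# Decimal (SI) prefixes: each unit is 1000 times the previous one (an assumption;
# binary prefixes based on 1024 are also in common use).
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value >= 1."""
    index = 0
    scale = 1
    # Use integer arithmetic for the comparison to avoid floating-point drift.
    while index < len(UNITS) - 1 and num_bytes >= scale * 1000:
        scale *= 1000
        index += 1
    return f"{num_bytes / scale:.1f} {UNITS[index]}"


print(human_readable(10**24))       # one yottabyte
print(human_readable(5 * 10**15))   # five petabytes
```

A yottabyte is 10^24 bytes, so it sits eight factor-of-1000 steps above a single byte.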
  • 46. What is Big Data? Big Data is not just about the size of the data; it’s about the value within the data. This value can be used for marketing, business optimisation, getting insights, improving health, security etc.
  • 48. Why Big Data Analytics? Understand the data the company has. Process data to see patterns, correlations and information that can be used to make better decisions. Obtain insights that are otherwise not known.
  • 49. Data Analytics. TRADITIONAL APPROACH (Structured and Repeatable Analyses): Business users determine what questions to ask; IT structures the data to answer the question. BIG DATA APPROACH (Iterative and Exploratory Analyses): IT delivers a platform to enable creative discovery; business users explore what questions could be asked.
  • 50. Tools for Data Analytics. NoSQL databases: MongoDB, Cassandra, HBase, Hypertable. Storage: S3, Hadoop Distributed File System. Servers: EC2, Google App Engine, Heroku. MapReduce: Hadoop, Hive, Pig, Cascading, S4, MapR. Processing: R, Yahoo! Pipes, Solr/Lucene, BigSheets
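To give a feel for the MapReduce pattern named in the tool list, here is a minimal single-process sketch of a word count. It is not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample documents are made up for illustration. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key, here by summing the counts."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input documents.
documents = ["big data is big", "data is the new oil"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many workers, which is the point of the pattern.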
  • 51. Two Types of Data Analysis Problems. Supervised Learning: Learn from data where we have labels for all the data we’ve seen so far. Example: determining spam emails. Unsupervised Learning: Learn from data where we don’t have any labels. Example: grouping emails, AlphaZero (AlphaZero learns from self-play and is usually classed as reinforcement learning). Learning is about discovering hidden patterns in data.
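As a small illustration of the supervised case, the sketch below classifies an email by copying the label of its nearest labelled example (a 1-nearest-neighbour rule, chosen here for brevity). The features (number of links, number of exclamation marks) and all data values are invented for this example.

```python
import math

# Hypothetical labelled training data: (links, exclamation marks) -> label.
labelled_emails = [
    ((8, 12), "spam"),
    ((6, 9), "spam"),
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
]


def classify(features):
    """Return the label of the closest labelled example (1-nearest-neighbour)."""
    _, label = min(
        labelled_emails,
        key=lambda example: math.dist(features, example[0]),
    )
    return label


print(classify((7, 10)))
print(classify((0, 0)))
```

The labels are what make this supervised: without the "spam" / "not spam" column the same points could only be grouped, not named.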
  • 52. Clustering: One of the oldest problems in unsupervised data analysis. In clustering, the goal is to group data according to similarity. Algorithms such as K-means are used for clustering.
  • 53. Clustering: For each artefact found, the location to the north (N) and east (E) of the marker is recorded. That is a data set. Before the dig, a historian said that three families lived at the location.
  • 54. Clustering: Similar means close in physical distance. You assign each data point to one and only one group. The groups are called clusters.
  • 55. Clustering: Clustering is the unsupervised learning problem where you take your data and assign each data point to exactly one group, or cluster. It uses unlabelled data.
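The dig example can be sketched with K-means. This is a minimal pure-Python version under stated assumptions: the artefact coordinates are invented, k is set to 3 because of the historian's three families, and the starting centroids are picked with a simple farthest-point rule so the run is deterministic (standard K-means often starts from random points instead).

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (an illustrative choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Assign every point to exactly one of k clusters."""
    centroids = farthest_point_init(points, k)
    assignment = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


# Hypothetical artefact locations: (metres north, metres east) of the marker.
artefacts = [
    (1.0, 2.0), (1.5, 1.8), (0.8, 2.4),
    (10.0, 10.5), (10.4, 9.8), (9.7, 10.1),
    (1.2, 15.0), (0.9, 14.6), (1.6, 15.3),
]
centroids, assignment = kmeans(artefacts, k=3)
print(assignment)
```

Each artefact ends up with exactly one cluster index, and no labels were needed to produce the grouping.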
  • 56. Clustering: We may have collected data but not know what to do with it. We might want to explore the data without a particular end goal in mind. Perhaps the data will suggest interesting avenues for further analysis. In this case, we say that we're performing exploratory data analysis.
  • 57. Exploratory data analysis: We don’t know what we are looking for. Data point = colour of pixel and location of pixel. Dissimilarity is the distance in colour.
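The pixel example can be sketched as follows. This assumes colours are RGB triples and that "distance in colour" means Euclidean distance between them; the slide does not specify either, and the pixel values are invented.

```python
import math
from typing import NamedTuple


class Pixel(NamedTuple):
    """A data point: the pixel's colour plus its location in the image."""
    rgb: tuple  # (red, green, blue), each 0-255
    xy: tuple   # (column, row)


def colour_dissimilarity(a: Pixel, b: Pixel) -> float:
    """Distance in colour only; the pixel locations are deliberately ignored."""
    return math.dist(a.rgb, b.rgb)


# Hypothetical pixels: two sky pixels far apart, and a grass pixel next to one of them.
sky_left = Pixel(rgb=(135, 206, 235), xy=(0, 0))
sky_right = Pixel(rgb=(130, 200, 240), xy=(500, 0))
grass = Pixel(rgb=(34, 139, 34), xy=(0, 1))

print(colour_dissimilarity(sky_left, sky_right))
print(colour_dissimilarity(sky_left, grass))
```

With this measure the two sky pixels count as similar even though they are far apart in the image, while the neighbouring grass pixel does not.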
  • 58. Exploratory data analysis: In some cases labelling is too expensive. For example, news changes every day and there is too much of it to label.
  • 59. Using Big Data to Influence People
  • 60. Alexander Nix, CEO of Cambridge Analytica. Ted Cruz campaign for the US Republican presidential nomination.
  • 62. Data Analysis as a Platform. THEN: Complex tools operated by data analysts; chaos of data silos across the company. NOW: Real-time data analytics platforms like Looker.
  • 63. Customer Data as a Platform. THEN: Difficult to customise; lack of automated customer insights. NOW: Real-time intelligence that automatically tracks and analyses interactions with customers.
  • 64. Mapping Data as a Platform. THEN: Difficult and expensive to collect data; limited in-app digital map usage. NOW: Mapping platforms like Mapbox.
  • 65. Cloud Data Monitoring as a Platform. THEN: Expensive and clunky point solutions; lengthy implementation cycles; only used by system administrators. NOW: Cloud monitoring platforms like Datadog.