SlideShare ist ein Scribd-Unternehmen logo
1 von 72
Downloaden Sie, um offline zu lesen
THE AGE OF BIG DATA:
A NEW CLASS OF ECONOMIC ASSET
Asst. Prof. Natawut Nupairoj, Ph.D.
Dept. of Computing Engineering
Faculty of Engineering
Chulalongkorn University
natawut.n@chula.ac.th
@natawutn
http://natawutn.wordpress.com
http://www.slideshare.net/natawutnupairoj
“Data is a new class of economic asset,
like currency and gold”
-- World Economic Forum (2012)
WELCOME TO DATA-DRIVEN ECONOMY
In July 2014, the European Commission outlined a
new strategy on Big Data, supporting and
accelerating the transition towards a data-driven
economy in Europe
In Feb 2015, The White House appointed the first
US chief data scientist
As of today, US Government’s open data publishes
more than 180,000 datasets to the public
(our data.go.th has 593 datasets as of 8 June 2016)
Something that cannot be processed
within reasonable time using 2-3
servers
DATA CHARACTERISTICS
Source: IBM
พระราชบัญญัติ
ว่าด้วยการกระทาความผิดเกี่ยวกับคอมพิวเตอร์
พ.ศ.๒๕๕๐
IT LOG AT CHULALONGKORN UNIVERSITY
Users 40,000+
Servers = 500+
Wifi + NAT
Manual processes
Approx. traffic: 5,000 events / sec
Storage 90 days
= 39,000,000,000 events (6.5TB)
Use Graylog-2 (based on
ElasticSearch technology)
Internal External
Structured Unstructured
BIG DATA’S DRIVERS
MOBILE & DEVICES - COMPUTING EVERYWHERE
Thailand’s rate is 147% (smartphone = 49%)
Wearable devices’ shipment will be doubled in 4 years
(from 72m in 2015 to 155m in 2019)
20% will be healthcare related devices
THE INTERNET OF THINGS
Find Jeans That Fit Your Shape!LIKE A GLOVE
INTRODUCING FDA-APPROVED
INGESTIBLE SENSORS IN PILLS
http://www.forbes.com/sites/singularity/2012/08/09/no-more-skipping-your-medicine-fda-approves-first-digital-pill/
BIG DATA’S DRIVERS
USER GENERATED CONTENTS AND CROWDSOURCING
Blogging, reviewing commenting,
forum, digital video, podcasting,
mobile phone photography, social
networking, crowdsourcing, etc.
Highly influential to consumer
behavior and also enable the study of
consumer behavior
Generate lots of both structured and
unstructured data
BIG DATA’S DRIVERS
CLOUD COMPUTING
Deliver computing services over a
network
Evolution of technology, but
revolution of economy
One of Big Data accelerators:
significant big data sources and
enabling platform for big data
processing
Data Science
(Data Analytics)
Data Engineering
(Big Data)
DATA VALUE CHAIN
Source: http://steinvox.com/blog/big-data-and-analytics-the-analytics-value-chain/
DATA VALUE CHAIN
Source: http://steinvox.com/blog/big-data-and-analytics-the-analytics-value-chain/
Data Engineering
Data Science
มองโจทย์เป็นตัวตั้ง
มองข้อมูลเป็นตัวตั้ง
DATA VALUE CHAIN กับ IT LOG
Source: http://steinvox.com/blog/big-data-and-analytics-the-analytics-value-chain/
Data Engineering
Data Science
DATA ANALYTICS SIMPLIFIED
Descriptive
• “A.Natawut drinks about 1 cup of coffee a
day”
Diagnostic
• “Number of cups that A.Natawut drinks
depend on number of meetings he has each
day”
Predictive
• “Tomorrow, A.Natawut has 2 meetings, it is
very likely that A.Natawut will drink 2 cups
tomorrow”
Prescriptive
• “Inform secretary to prepare 1 cup in the
morning and one in the afternoon for
A.Natawut”
TYPES OF DATA ANALYTICS
CASE STUDY: PREDICTIVE POLICING
Being used by 60 cities in the US e.g. Atlanta, LA, etc.
Source: http://www.forbes.com/sites/ellenhuet/2015/02/11/predpol-predictive-policing
USE CASES BY SUBJECT AREAS
• Infrastructure and Information Management
• Social Listening / Customer Understanding
• Health Improvement
• Logistics and Planning
• Operation / Product Improvement
INFRASTRUCTURE AND INFORMATION
MANAGEMENT
• Bigger and Faster Data Warehouse
• Information Archival and Management
CASE STUDY:
SK TELECOM’S USAGE PATTERN ANALYSIS
Process usage data from 28 millions
subscribers: 40TB/day – 15PB total
Must process data with 530MB/sec
or 1 million records/sec
Use Hadoop, Spark, and ElasticSearch
to provide mobile usage pattern
analytics with low latency ad-hoc
query (< 2 secs)
SOCIAL LISTENING /
CUSTOMER UNDERSTANDING
• Sentimental Analysis / Social Network Trends
• Customer 720
• Customer Segmentation
• Customer Retention
• Targeted Marketing / Personalization Offering
• Click-Stream Analysis
• In-store Tracking
CASE STUDY:
AMAZON’S RECOMMENDATION ENGINE
Mine data from 152 million
customers to suggest products to
customers
Perform collaborative filtering,
click-stream analysis, historical
purchase data analytics
CASE STUDY:
UBER’S DYNAMIC PRICING FARES
Uber’s entire business model is based
on the very Big Data principle of crowd
sourcing
“dynamic pricing” fares are calculated
automatically, using GPS, street data,
demand forecast, and predictive
algorithms
Due to traffic conditions in New York on
New Year’s Eve 2011, the fare of
journey of one mile rose from $27 to
$135
CASE STUDY:
INMOBI’S TARGETED MARKETING
User behaviour changes
dramatically across work,
home, commute, and
other location contexts
Geo context targeting:
create customer micro
segmentation from
customer’s location
activities, time of day,
and app being used
CASE STUDY: MARCY’S
Mid-range to upscale
department store chain
Goal is to offer more
localized, personalized and
smarter customer
experience across all
channels
Deploy 4,000 sensors inside
768 stores to identify
customers’ in-store
locations
HEALTH IMPROVEMENT
• eHR / Care Coordination Record / Patient 360
• Text Analytics for Medical Classification
• Machine Learning for Diagnosis and Screening
• Genome Analytics / Precision Medicine
• Risk Prediction for Patient Care / Urgent Care
Management
• After-discharge monitoring
• Population Health Management / Preventive Healthcare
Behavioral trend tracking – customize fitness program setup
Food intake tracking - visual recognize food intake
Environment factor tracking – modify fitness program recommendation
LOGISTICS AND PLANNING
• Route Optimization
• Location Planning
• Crowdsourcing
• Remote-Sensing-Aided Marketing Research
• Urban Planning
CASE STUDY: PREDICTIVE POLICING
Being used by 60 cities in the US e.g. Atlanta, LA, etc.
Source: http://www.forbes.com/sites/ellenhuet/2015/02/11/predpol-predictive-policing
CASE STUDY: STARBUCKS OPERATION PLANNING
http://www.fastcompany.com/3034792/how-fast-food-chains-pick-their-next-location
CASE STUDY: FASTFOOD STORE PLANNING
http://www.fastcompany.com/3008621/tracking/github-reveals-a-formula-for-your-hacker-persona
Using social network and
POI, we can effectively
identify best store
locations
USHAHIDI
2007
Kenya
2010
Haiti
Chile
Washington DC
Russia
2011
Christchurch
Middle East
India
Japan
Australia
US
Macedonia
2012
Balkans
2014
Kenya
Stratified sampling divides
members of the population into
homogeneous subgroups to
improve effectiveness
Indonesia is a large country which
can be expensive for sampling
Use crowdsourcing + satellite
imagery + K-Mean to better
measure urbanization and lead to
optimal allocation of interviewers
to respondents
CASE STUDY: NIELSEN - GEO ANALYTICS
AND MARKETING RESEARCH
OPERATION / PRODUCT IMPROVEMENT
• New Products / New Services
• Risk Management / Fraud Detection
• Predictive Maintenance
CASE STUDY:
GE’S SMART MACHINES
GE has launched Industrial Internet
initiative
Jet engine has 20 sensors
generating 5,000 data samples per
second
Data can be used for fuel efficiency
and service improvements
“In the future it’s going to be
digital. By the time the plane lands,
we’ll know exactly what the plane
needs.”
CASE STUDY:
JP MORGAN CHASE JP Morgan Chase & Co use Big Data to
aggregate all available information
about a single customer
Data included monthly balances, credit
card transactions, credit bureau data,
demographic data
This allowed bank to offer lower
interest rates by reducing credit card
fraud
Aggregating data of 30 million
customers, they provide US economic
outlooks with “Weathering Volatility:
Big Data on the Financial Ups and
Downs of U.S. Individuals”
CASE STUDY: ALIBABA FRAUD DETECTION
Source: http://www.sciencedirect.com/science/article/pii/S2405918815000021
Machine Learning + Graph Analytics
on user behaviors and network
HOW CAN WE USE BIG DATA FOR MOF
• Bigger / Faster / More Up-to-Date Data
Warehouse
• Improve Efficiency of Fiscal Operations
o Accelerating budget disbursements
o Tax auditing
• Informing Fiscal Policy Making
o Measuring effectiveness of public investment
o Analyzing the effects of government stimulus
programs
Credit: Dr.Athiphat Muthitacharoen
TIME (IN MINUTES) TO READ 1TB OF DATA
0 20 40 60 80 100 120 140 160
Cluster
Mid-Size Server
Single PC
TYPICAL BIG DATA ARCHITECTURE
Data
Source
Data
Source
Data
Source
Data
Source
Data
Ingestion
Fast Data Path
Big Data Path
Data Stream Processors
Data Lake
(Landing Zone)
Data Refinery /
Data Analytics
DataVisualization
Traditional Data Warehouse / Reporting tools
NOSQL
Python R
Opensource software framework inspired by Google Search
Engine Architecture
Provide easy-to-program scale-out foundation for data-intensive
applications on large clusters of commodity hardware
Hadoop File System (HDFS) has been widely used
Users: Yahoo!, Facebook, Amazon, eBay, American Airline, Apple,
Google, HP, IBM, Microsoft, Netflix, New York Times, etc.
Products: IBM InfoSphere BigInsights, Google App Engine, Oracle
Big Data Appliance, Microsoft HDInsight
In-Memory Data Processing from UC Berkeley
Extend MapReduce model to support batch executions,
interactive queries, and stream processing
Support various languages (Java, Python, Scala, R) with
built-in analytic libraries (machine learning, graph
processing)
Strong and growing community
High performance, based on sorting benchmarks, Spark
is 10x – 100x faster than Hadoop
NOSQL – NOT ONLY SQL
Special DBMS for large data that
does not require relational model
e.g. unstructured data
Various types: Document Store,
Graph, Key-Value store, etc.
Products: Parquet, Cassandra,
HBASE, ElasticSearch, Accumulo,
DynamoDB, Redis, Riak, CouchDB,
MangoDB, Neo4j, etc.
Source: http://db-engines.com/en/ranking
PREDICTIVE ANALYTICS
Analyze current and historical data
to automatically find patterns
based on several techniques e.g.
statistics, modeling, machine
learning, data mining, time series
analysis, deep learning, etc.
Utilize other techniques e.g. text
analytics, image processing,
location analytics, etc.
Applications: Micro Customer
Segmentation, Sentiment Analysis,
Customer retention, Fraud
detection, etc.
Database marketing
Fraud detection
Pattern detection
Churn customer detection
Web classification
Customer Segmentation
Collaborative Filtering
OTHER ANALYTICS
Spatial Analytics
Mobility Analytics
Social Network Analytics
“Big data is about having the technology and
people with the appropriate analysis skills to
allow firms to make sense of huge volumes of
data in an affordable manner.”
Source: Forrester Research, 2012
“Data Science is a Team Sport” – DJ Patil
Domain
Knowledge
Math &
Statistics
Computer
Science
Data Scientist
Statistical ResearchData Processing
Machine Learning
WHAT IF WE CAN …
Process large-volume data very quickly e.g. Real-Time Data
Warehouse
Personalize the offering at the personal level
Use unstructured data sources e.g. text, comments, images
etc.
Find correlation or dominant factors that contribute to
changes automatically
Recognize patterns automatically from historical data to
predict the future
Digital Data Driven Organization
CS PROGRAM
Architecture Track
• Map/Reduced
• In-Memory Processing
• Cloud Computing
• Mobile and Networks
Analytics Track
• Machine Learning
• Data Mining
• Big Data Analytics
• Social Network Analysis

Weitere ähnliche Inhalte

Was ist angesagt?

Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional CareersKno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Amit Sheth
 
The NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoTThe NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoT
Prasant Misra
 
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Amit Sheth
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
Perficient, Inc.
 

Was ist angesagt? (19)

Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional CareersKno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
 
Bigdata
BigdataBigdata
Bigdata
 
AI in healthcare - Use Cases
AI in healthcare - Use Cases AI in healthcare - Use Cases
AI in healthcare - Use Cases
 
Datascienceindia article
Datascienceindia articleDatascienceindia article
Datascienceindia article
 
Big Data
Big Data Big Data
Big Data
 
Cri big data
Cri big dataCri big data
Cri big data
 
Big Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical DevicesBig Data in Healthcare and Medical Devices
Big Data in Healthcare and Medical Devices
 
The NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoTThe NEEDS vs. the WANTS in IoT
The NEEDS vs. the WANTS in IoT
 
Big Data Analytics for Smart Health Care
Big Data Analytics for Smart Health CareBig Data Analytics for Smart Health Care
Big Data Analytics for Smart Health Care
 
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
 
NUS-ISS Learning Day 2018- Integrating IoT and AI for urban health solutions
NUS-ISS Learning Day 2018- Integrating IoT and AI for urban health solutionsNUS-ISS Learning Day 2018- Integrating IoT and AI for urban health solutions
NUS-ISS Learning Day 2018- Integrating IoT and AI for urban health solutions
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 
Knowledge Will Propel Machine Understanding of Big Data
Knowledge Will Propel Machine Understanding of Big DataKnowledge Will Propel Machine Understanding of Big Data
Knowledge Will Propel Machine Understanding of Big Data
 
Big Data: Big Opportunity?
Big Data: Big Opportunity?Big Data: Big Opportunity?
Big Data: Big Opportunity?
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
 
Data science
Data scienceData science
Data science
 
AI in Healthcare: Can AI Help in Diagnosing Coronavirus
AI in Healthcare: Can AI Help in Diagnosing CoronavirusAI in Healthcare: Can AI Help in Diagnosing Coronavirus
AI in Healthcare: Can AI Help in Diagnosing Coronavirus
 
Future of RWE - Big Data and Analytics for Pharma 2017 presentation
Future of RWE - Big Data and Analytics for Pharma 2017 presentationFuture of RWE - Big Data and Analytics for Pharma 2017 presentation
Future of RWE - Big Data and Analytics for Pharma 2017 presentation
 

Ähnlich wie The Age of Big Data: A New Class of Economic Asset

Big data and health care
 Big data and health care Big data and health care
Big data and health care
cjw119
 
BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm.
maigva
 
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Amit Sheth
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor network
parry prabhu
 

Ähnlich wie The Age of Big Data: A New Class of Economic Asset (20)

Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Big data and health care
 Big data and health care Big data and health care
Big data and health care
 
Big data and health care
 Big data and health care Big data and health care
Big data and health care
 
Innovative project1
Innovative project1Innovative project1
Innovative project1
 
Data_Mining.ppt
Data_Mining.pptData_Mining.ppt
Data_Mining.ppt
 
Smart Data Module 1 introduction to big and smart data
Smart Data Module 1 introduction to big and smart dataSmart Data Module 1 introduction to big and smart data
Smart Data Module 1 introduction to big and smart data
 
Bigdata
BigdataBigdata
Bigdata
 
Bigdata Hadoop introduction
Bigdata Hadoop introductionBigdata Hadoop introduction
Bigdata Hadoop introduction
 
The Evolution of Data and New Opportunities for Analytics
The Evolution of Data and New Opportunities for AnalyticsThe Evolution of Data and New Opportunities for Analytics
The Evolution of Data and New Opportunities for Analytics
 
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la IglesiaBIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana. María de la Iglesia
 
Meeting Federal Research Requirements
Meeting Federal Research RequirementsMeeting Federal Research Requirements
Meeting Federal Research Requirements
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Your organization and Big Data: Managing access, privacy, and security
Your organization and Big Data: Managing access, privacy, and securityYour organization and Big Data: Managing access, privacy, and security
Your organization and Big Data: Managing access, privacy, and security
 
BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm.
 
Business Analytics - Big Data
Business Analytics - Big DataBusiness Analytics - Big Data
Business Analytics - Big Data
 
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
Transforming Big Data into Smart Data for Smart Energy: Deriving Value via ha...
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor network
 
Big Data - What the Heck?
Big Data - What the Heck?Big Data - What the Heck?
Big Data - What the Heck?
 
What the Heck is Big Data?
What the Heck is Big Data?What the Heck is Big Data?
What the Heck is Big Data?
 
Datapreneurs
DatapreneursDatapreneurs
Datapreneurs
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

The Age of Big Data: A New Class of Economic Asset

  • 1. THE AGE OF BIG DATA: A NEW CLASS OF ECONOMIC ASSET Asst. Prof. Natawut Nupairoj, Ph.D. Dept. of Computing Engineering Faculty of Engineering Chulalongkorn University natawut.n@chula.ac.th @natawutn http://natawutn.wordpress.com http://www.slideshare.net/natawutnupairoj
  • 2. “Data is a new class of economic asset, like currency and gold” -- World Economic Forum (2012)
  • 3. WELCOME TO DATA-DRIVEN ECONOMY In July 2014, the European Commission outlined a new strategy on Big Data, supporting and accelerating the transition towards a data-driven economy in Europe In Feb 2015, The White House appointed the first US chief data scientist As of today, US Government’s open data publishes more than 180,000 datasets to the public (our data.go.th has 593 datasets as of 8 June 2016)
  • 4. Something that cannot be processed within reasonable time using 2-3 servers
  • 5.
  • 8. IT LOG AT CHULALONGKORN UNIVERSITY Users 40,000+ Servers = 500+ Wifi + NAT Manual processes Approx. traffic: 5,000 events / sec Storage 90 days = 39,000,000,000 events (6.5TB) Use Graylog-2 (based on ElasticSearch technology)
  • 9.
  • 11. BIG DATA’S DRIVERS MOBILE & DEVICES - COMPUTING EVERYWHERE Thailand’s rate is 147% (smartphone = 49%) Wearable devices’ shipment will be doubled in 4 years (from 72m in 2015 to 155m in 2019) 20% will be healthcare related devices
  • 12. THE INTERNET OF THINGS
  • 13.
  • 14.
  • 15. Find Jeans That Fit Your Shape!LIKE A GLOVE
  • 16. INTRODUCING FDA-APPROVED INGESTIBLE SENSORS IN PILLS http://www.forbes.com/sites/singularity/2012/08/09/no-more-skipping-your-medicine-fda-approves-first-digital-pill/
  • 17. BIG DATA’S DRIVERS USER GENERATED CONTENTS AND CROWDSOURCING Blogging, reviewing commenting, forum, digital video, podcasting, mobile phone photography, social networking, crowdsourcing, etc. Highly influential to consumer behavior and also enable the study of consumer behavior Generate lots of both structured and unstructured data
  • 18. BIG DATA’S DRIVERS CLOUD COMPUTING Deliver computing services over a network Evolution of technology, but revolution of economy One of Big Data accelerators: significant big data sources and enabling platform for big data processing
  • 19. Data Science (Data Analytics) Data Engineering (Big Data)
  • 20. DATA VALUE CHAIN Source: http://steinvox.com/blog/big-data-and-analytics-the-analytics-value-chain/
  • 21. DATA VALUE CHAIN Source: http://steinvox.com/blog/big-data-and-analytics-the-analytics-value-chain/ Data Engineering Data Science มองโจทย์เป็นตัวตั้ง มองข้อมูลเป็นตัวตั้ง
  • 22. DATA VALUE CHAIN กับ IT LOG Source: http://steinvox.com/blog/big-data-and-analytics-the-analytics-value-chain/ Data Engineering Data Science
  • 23. DATA ANALYTICS SIMPLIFIED Descriptive • “A.Natawut drinks about 1 cup of coffee a day” Diagnostic • “Number of cups that A.Natawut drinks depend on number of meetings he has each day” Predictive • “Tomorrow, A.Natawut has 2 meetings, it is very likely that A.Natawut will drink 2 cups tomorrow” Prescriptive • “Inform secretary to prepare 1 cup in the morning and one in the afternoon for A.Natawut”
  • 24. TYPES OF DATA ANALYTICS
  • 25. CASE STUDY: PREDICTIVE POLICING Being used by 60 cities in the US e.g. Atlanta, LA, etc. Source: http://www.forbes.com/sites/ellenhuet/2015/02/11/predpol-predictive-policing
  • 26. USE CASES BY SUBJECT AREAS • Infrastructure and Information Management • Social Listening / Customer Understanding • Health Improvement • Logistics and Planning • Operation / Product Improvement
  • 27. INFRASTRUCTURE AND INFORMATION MANAGEMENT • Bigger and Faster Data Warehouse • Information Archival and Management
  • 28. CASE STUDY: SK TELECOM’S USAGE PATTERN ANALYSIS Process usage data from 28 millions subscribers: 40TB/day – 15PB total Must process data with 530MB/sec or 1 million records/sec Use Hadoop, Spark, and ElasticSearch to provide mobile usage pattern analytics with low latency ad-hoc query (< 2 secs)
  • 29. SOCIAL LISTENING / CUSTOMER UNDERSTANDING • Sentimental Analysis / Social Network Trends • Customer 720 • Customer Segmentation • Customer Retention • Targeted Marketing / Personalization Offering • Click-Stream Analysis • In-store Tracking
  • 30.
  • 31. CASE STUDY: AMAZON’S RECOMMENDATION ENGINE Mine data from 152 million customers to suggest products to customers Perform collaborative filtering, click-stream analysis, historical purchase data analytics
  • 32. CASE STUDY: UBER’S DYNAMIC PRICING FARES Uber’s entire business model is based on the very Big Data principle of crowd sourcing “dynamic pricing” fares are calculated automatically, using GPS, street data, demand forecast, and predictive algorithms Due to traffic conditions in New York on New Year’s Eve 2011, the fare of journey of one mile rose from $27 to $135
  • 33. CASE STUDY: INMOBI’S TARGETED MARKETING User behaviour changes dramatically across work, home, commute, and other location contexts Geo context targeting: create customer micro segmentation from customer’s location activities, time of day, and app being used
  • 34.
  • 35. CASE STUDY: MARCY’S Mid-range to upscale department store chain Goal is to offer more localized, personalized and smarter customer experience across all channels Deploy 4,000 sensors inside 768 stores to identify customers’ in-store locations
  • 36.
  • 37. HEALTH IMPROVEMENT • eHR / Care Coordination Record / Patient 360 • Text Analytics for Medical Classification • Machine Learning for Diagnosis and Screening • Genome Analytics / Precision Medicine • Risk Prediction for Patient Care / Urgent Care Management • After-discharge monitoring • Population Health Management / Preventive Healthcare
  • 38.
  • 39.
  • 40. Behavioral trend tracking – customize fitness program setup Food intake tracking - visual recognize food intake Environment factor tracking – modify fitness program recommendation
  • 41. LOGISTICS AND PLANNING • Route Optimization • Location Planning • Crowdsourcing • Remote-Sensing-Aided Marketing Research • Urban Planning
  • 42. CASE STUDY: PREDICTIVE POLICING Being used by 60 cities in the US e.g. Atlanta, LA, etc. Source: http://www.forbes.com/sites/ellenhuet/2015/02/11/predpol-predictive-policing
  • 43. CASE STUDY: STARBUCKS OPERATION PLANNING http://www.fastcompany.com/3034792/how-fast-food-chains-pick-their-next-location
  • 44. CASE STUDY: FASTFOOD STORE PLANNING http://www.fastcompany.com/3008621/tracking/github-reveals-a-formula-for-your-hacker-persona Using social network and POI, we can effectively identify best store locations
  • 46. Stratified sampling divides members of the population into homogeneous subgroups to improve effectiveness Indonesia is a large country which can be expensive for sampling Use crowdsourcing + satellite imagery + K-Mean to better measure urbanization and lead to optimal allocation of interviewers to respondents CASE STUDY: NIELSEN - GEO ANALYTICS AND MARKETING RESEARCH
  • 47.
  • 48.
  • 49. OPERATION / PRODUCT IMPROVEMENT • New Products / New Services • Risk Management / Fraud Detection • Predictive Maintenance
  • 50. CASE STUDY: GE’S SMART MACHINES GE has launched Industrial Internet initiative Jet engine has 20 sensors generating 5,000 data samples per second Data can be used for fuel efficiency and service improvements “In the future it’s going to be digital. By the time the plane lands, we’ll know exactly what the plane needs.”
  • 51. CASE STUDY: JP MORGAN CHASE JP Morgan Chase & Co use Big Data to aggregate all available information about a single customer Data included monthly balances, credit card transactions, credit bureau data, demographic data This allowed bank to offer lower interest rates by reducing credit card fraud Aggregating data of 30 million customers, they provide US economic outlooks with “Weathering Volatility: Big Data on the Financial Ups and Downs of U.S. Individuals”
  • 52. CASE STUDY: ALIBABA FRAUD DETECTION Source: http://www.sciencedirect.com/science/article/pii/S2405918815000021 Machine Learning + Graph Analytics on user behaviors and network
  • 53. HOW CAN WE USE BIG DATA FOR MOF • Bigger / Faster / More Up-to-Date Data Warehouse • Improve Efficiency of Fiscal Operations o Accelerating budget disbursements o Tax auditing • Informing Fiscal Policy Making o Measuring effectiveness of public investment o Analyzing the effects of government stimulus programs Credit: Dr.Athiphat Muthitacharoen
  • 54.
  • 55.
  • 56.
  • 57. TIME (IN MINUTES) TO READ 1TB OF DATA 0 20 40 60 80 100 120 140 160 Cluster Mid-Size Server Single PC
  • 58.
  • 59. TYPICAL BIG DATA ARCHITECTURE Data Source Data Source Data Source Data Source Data Ingestion Fast Data Path Big Data Path Data Stream Processors Data Lake (Landing Zone) Data Refinery / Data Analytics DataVisualization Traditional Data Warehouse / Reporting tools
  • 61. Opensource software framework inspired by Google Search Engine Architecture Provide easy-to-program scale-out foundation for data-intensive applications on large clusters of commodity hardware Hadoop File System (HDFS) has been widely used Users: Yahoo!, Facebook, Amazon, eBay, American Airline, Apple, Google, HP, IBM, Microsoft, Netflix, New York Times, etc. Products: IBM InfoSphere BigInsights, Google App Engine, Oracle Big Data Appliance, Microsoft HDInsight
  • 62. In-Memory Data Processing from UC Berkeley Extend MapReduce model to support batch executions, interactive queries, and stream processing Support various languages (Java, Python, Scala, R) with built-in analytic libraries (machine learning, graph processing) Strong and growing community High performance, based on sorting benchmarks, Spark is 10x – 100x faster than Hadoop
  • 63. NOSQL – NOT ONLY SQL Special DBMS for large data that does not require relational model e.g. unstructured data Various types: Document Store, Graph, Key-Value store, etc. Products: Parquet, Cassandra, HBASE, ElasticSearch, Accumulo, DynamoDB, Redis, Riak, CouchDB, MangoDB, Neo4j, etc.
  • 65. PREDICTIVE ANALYTICS Analyze current and historical data to automatically find patterns based on several techniques e.g. statistics, modeling, machine learning, data mining, time series analysis, deep learning, etc. Utilize other techniques e.g. text analytics, image processing, location analytics, etc. Applications: Micro Customer Segmentation, Sentiment Analysis, Customer retention, Fraud detection, etc.
  • 66. Database marketing Fraud detection Pattern detection Churn customer detection Web classification Customer Segmentation Collaborative Filtering
  • 67. OTHER ANALYTICS Spatial Analytics Mobility Analytics Social Network Analytics
  • 68. “Big data is about having the technology and people with the appropriate analysis skills to allow firms to make sense of huge volumes of data in an affordable manner.” Source: Forrester Research, 2012
  • 69. “Data Science is a Team Sport” – DJ Patil Domain Knowledge Math & Statistics Computer Science Data Scientist Statistical ResearchData Processing Machine Learning
  • 70. WHAT IF WE CAN … Process large-volume data very quickly e.g. Real-Time Data Warehouse Personalize the offering at the personal level Use unstructured data sources e.g. text, comments, images etc. Find correlation or dominant factors that contribute to changes automatically Recognize patterns automatically from historical data to predict the future
  • 71. Digital Data Driven Organization
  • 72. CS PROGRAM Architecture Track • Map/Reduced • In-Memory Processing • Cloud Computing • Mobile and Networks Analytics Track • Machine Learning • Data Mining • Big Data Analytics • Social Network Analysis