Applications of Big Data
By
Prashant Kumar Jadia
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
pkjadia@connect.ust.hk
What is Big Data
Various Definitions
• McKinsey
"Big data" refers to datasets whose size is beyond the ability of typical database
software tools to capture, store, manage, and analyze.
• Gartner
Big Data in general is defined as high volume, velocity and variety information
assets that demand cost-effective, innovative forms of information processing for
enhanced insight and decision making.
• O'Reilly
Big data is data that exceeds the processing capacity of conventional database
systems. The data is too big, moves too fast, or doesn't fit the strictures of your
database architectures. To gain value from this data, you must choose an
alternative way to process it.
Source: Infosys Blogs
URL: http://www.infosysblogs.com/bigdata/2013/02/what_actually_is_big_data.html
Date: 14-February-15
Four V’s of Big Data
Volume
• 300 hours of video uploaded to
YouTube every minute
• 10 billion posts on Facebook every day
• 302 million monthly active users on
Twitter
Variety
• 500 million tweets every day
• Millions of wearables and health
monitors
• Billions of photos uploaded every day
Velocity
• Spread of sensor network
• Growth in world connectivity
Veracity
• Different sources will have different
formats of data
• Health care: the same data appears in
various forms.
Figures are as of May, 2015
The Fifth ‘V’ Value
Value
• DFAS has saved approximately $4 billion in improper vendor payments
• Savings of $100 million in erroneous claims
• eCall will save around 2,500 lives every year
• Estimated savings of $450 billion in US health care if Big Data is used
Figures are as of May, 2015
History of Big Data
• First documented use of the term "Big Data"
In 1997, in a paper from NASA: "Visualization provides an interesting challenge for
computer systems: data sets are generally quite large, taxing the capacities of
main memory, local disk, and even remote disk. We call this the problem of big
data."
• 3Vs first published in 2001
Gartner analyst Doug Laney introduced the 3Vs concept in a 2001 META Group
research publication, 3D Data Management: Controlling Data Volume, Velocity,
and Variety.
• Rapid growth since 2007
- Better Internet bandwidth
- Cheaper storage
- Increased computing power
History of Big Data – Factors
contributing to Growth
Number of “Big Data” Papers published per year
Source: An overview of Big Data
Journal: The Next Wave | Vol. 20 | No. 4 | 2014
History of Big Data – Factors
contributing to Growth
Computing Cost Performance 1992-2012
Source: From exponential technologies to exponential innovation
URL: http://dupress.com/articles/from-exponential-technologies-to-exponential-innovation/
Date: 4-October-13
History of Big Data – Factors
contributing to Growth
Storage Cost Performance 1992-2012
Source: From exponential technologies to exponential innovation
URL: http://dupress.com/articles/from-exponential-technologies-to-exponential-innovation/
Date: 4-October-13
History of Big Data – Factors
contributing to Growth
Global Internet Traffic
Figures are as of May, 2015
History of Big Data – Factors
contributing to Growth
Gartner Emerging Technologies 2012
History of Big Data – Factors
contributing to Growth
Google search for Term “Big Data” – Signifying public interest
Figures are as of May, 2015
Big Data in Social Media
Recommendation Systems
Marketing
Electioneering
Influence Marketing
Credit Scoring
Candidature Check
Big Data in Social Media
The conversation Prism
• What is Social Media?
A group of Internet-based applications that
build on the ideological and technological
foundations of Web 2.0, and that allow the
creation and exchange of user-generated
content.
• Social media is much more
than Facebook and Twitter.
• There are social media platforms for
almost every sphere of life.
• How big are these platforms?
Platform   Users          Activity per day
Twitter    302 million    500 million tweets
Facebook   936 million    55 million status updates
LinkedIn   364 million
YouTube    1+ billion     432,000 hours of video
Figures are as of May, 2015
Uses of Social Media Data
• What can be mined out of this ocean of data?
The possibilities are endless.
A UN project showcased an
exciting application: discovering
the association between food
price inflation and tweets
about the price of rice.
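One simple way to look for such an association is a correlation coefficient between two time series. The sketch below computes a Pearson correlation in plain Python. The monthly figures are invented for illustration, and nothing here is taken from the UN project itself, whose method the slide does not describe.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly figures, invented for illustration only.
rice_tweets = [120, 150, 310, 420, 390, 610]            # tweets mentioning rice prices
food_price_index = [100.0, 101.5, 104.0, 107.2, 106.8, 111.0]

r = pearson(rice_tweets, food_price_index)
print(f"correlation = {r:.2f}")
```

A value near 1 would suggest the two series move together; it does not by itself show that one causes the other.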
Social Media – Recommendation
Systems
Many Types of recommendation systems
• Facebook – Recommended Friends
• LinkedIn – People You May Know
• YouTube – Videos You May Like
• Amazon – People Also Bought
• Pinterest – Board Recommendation
So, how do recommendation systems work?
Social Media – Recommendation
Systems
People / Friend Recommendation
- Use known information to predict ties
- Friends of Friends are likely to be friends
Algorithm/research area
- Community detection
- Structural Holes
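The "friends of friends" idea can be sketched as counting mutual friends: the more friends two unconnected people share, the more likely a tie. This is a minimal illustration, not the algorithm any platform actually uses; the graph, the names, and the function `recommend_friends` are hypothetical.

```python
from collections import Counter

def recommend_friends(graph, user, top_n=3):
    """Rank people who are not yet friends of `user` by number of mutual friends."""
    friends = graph.get(user, set())
    mutual_counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone already connected to the user.
            if candidate != user and candidate not in friends:
                mutual_counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical undirected friendship graph (adjacency sets).
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

print(recommend_friends(graph, "ann"))  # ['dan', 'eve']
```

Here "dan" ranks first for "ann" because they share two friends (bob and cat), while "eve" shares only one.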
Social Media – Electioneering
• What is Electioneering?
- The activity of trying to persuade people to vote
for a particular political party.
• What is Big Data’s role in it?
- Determine and target the most persuadable electoral
base
- Effectively choose marketing media for maximum
reach for every dollar spent
- Influencing the influencers
Social Media – Electioneering
• Maximizing return per dollar
– Match billing records (set-top box company) with the current voter list
– Divide a day into 96 slots (15 minutes each)
– Study the time-slot usage of the target electorate across 60 channels
– Pick the slots with maximum reach per dollar
• User Modelling
– Rate each voter on a scale of 0 – 10 for being persuadable
– Volunteers then call/visit voters with appropriate content
• Micro-targeting
– Monitor social media: Facebook, Twitter, etc.
– Micro target voters by delivering custom message to specific sub group
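Picking advertising slots by reach per dollar can be sketched as a greedy selection under a budget. This is a simple heuristic for illustration, not the method any campaign is known to have used; the field names, reach figures, and costs are all invented.

```python
SLOTS_PER_DAY = 96  # 24 hours split into 15-minute slots

def pick_slots(slots, budget):
    """Greedily buy ad slots with the best reach per dollar until the budget runs out.

    `slots` is a list of dicts with hypothetical keys: channel, slot (0..95),
    reach (target voters watching), cost (dollars, must be positive).
    """
    ranked = sorted(slots, key=lambda s: s["reach"] / s["cost"], reverse=True)
    chosen, spent = [], 0.0
    for s in ranked:
        if spent + s["cost"] <= budget:
            chosen.append(s)
            spent += s["cost"]
    return chosen, spent

# Hypothetical viewing data for the target electorate.
slots = [
    {"channel": "news",   "slot": 80, "reach": 5000, "cost": 1000},
    {"channel": "sports", "slot": 84, "reach": 9000, "cost": 3000},
    {"channel": "movies", "slot": 4,  "reach": 1200, "cost": 150},
    {"channel": "news",   "slot": 30, "reach": 2000, "cost": 800},
]

chosen, spent = pick_slots(slots, budget=2000)
for s in chosen:
    print(s["channel"], s["slot"], s["reach"] / s["cost"])
print("spent:", spent)
```

Greedy selection is not guaranteed to be optimal for every budget, but it shows why a cheap late-night slot can beat an expensive prime-time one.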
Social Media – Influence
Marketing
What is Social Influence
- Social influence occurs when one's opinions, emotions,
or behaviours are affected by others, intentionally or
unintentionally.
What is Influence Marketing
- Discovering and predicting a user's influence on
connected nodes and their ability to spread information.
Social Media – Influence
Marketing
Use Case
- Klout generates a score on a scale of 1-100 for a social
user to represent her/his ability to engage other people
and inspire social actions.
- In 2012, Cathay Pacific opened access to its SFO lounge to
Klout users
Big Data in Healthcare
Self-aware Medics
Sports and Fitness Tracking
Clinical Trials
Personalized Medicines
Genomics
Electronic Health records
Big Data in Healthcare
• Data characteristics
- 1.2 billion clinical documents are produced in the U.S.
each year. 60% are in unstructured format
- Health trackers
- Genomics
• Savings
- Can save up to $450 billion if healthcare industry uses big
data analytics and patients make the right choices.
- US Government recoveries from forfeiture, asset seizures
and fines amounting to $4.3 billion
Figures are as of May, 2015
Healthcare – Genomics
Success Story
Using Big Data and genomics to pinpoint a disease's root cause
Story
- Joshua Osborn (pictured), a 14-year-old, was admitted to hospital with a high fever
- An MRI showed brain swelling. However, all related
tests came back negative.
- Doctors decided to run an experimental DNA sequencing technology
- DNA was extracted from cerebrospinal fluid
- Within 2 days, three million DNA sequences were
obtained
- From the sequences obtained, the team subtracted all known human elements
- The 0.02 percent left over belonged to a lethal bacterium called Leptospira
- Treatment for the infection began and within weeks Joshua was back home
Underlying Big Data Technology
- SNAP, a Spark-based sequence aligner
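The "subtract everything human" step can be illustrated with a toy filter: drop any read that shares a short substring (k-mer) with a human reference and keep what remains. This is not SNAP and not the team's pipeline, which relies on real sequence alignment; the sequences and function names below are made up.

```python
def kmers(seq, k):
    """All overlapping substrings of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def subtract_human(reads, human_reference, k=4):
    """Keep only reads that share no k-mer with the human reference (toy filter)."""
    human_kmers = kmers(human_reference, k)
    return [r for r in reads if not (kmers(r, k) & human_kmers)]

# Hypothetical toy data; real reads and references are vastly larger.
human_reference = "ACGTACGTTAGC"
reads = ["ACGTAC", "GTTAGC", "TTTTGGGG", "CGTACG"]

leftover = subtract_human(reads, human_reference)
print(leftover)  # ['TTTTGGGG']

# Scale of the real case: 0.02 percent of three million sequences.
total_sequences = 3_000_000
non_human = total_sequences * 2 // 10_000
print(non_human)  # 600
```

At the scale in the story, 0.02 percent of three million sequences is about 600 sequences left to match against known pathogens.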
Big Data in Smart Cities
Smart Transport
Traffic Management
Smart Governance
Smart Energy
Smart Economy
Smart Cities – Internet of Things
What is IoT
The Internet of Things, also called the
Internet of Objects, refers to a
network between objects, such as
household appliances; usually the
network is wireless and
self-configuring.
-Wikipedia
Benefits
- Dynamic control of Life
- Improve resource utilization
- Automation support systems
- Integrating physical systems
with human society
Smart Cities – Smart Transport
Latest Use Case: eCall
- Mandatory for all vehicles
to have embedded impact
sensors
- Sensors can call
emergency services in
case of impact.
- Devices activated only on
accidents.
Savings
- Expected to reduce response time by 40-50%
- Time saved = lives saved. 2500 lives annually
Challenges
- User privacy and concerns over being tracked and monitored
Smart Cities – Smart Energy
Use Case: Time based
energy pricing
- Monitor energy usage using
smart meters
- Report usage to both customer
and energy company in real
time.
- Big data is used to predict and
calculate pricing based on
history and current utilization.
Savings and benefits
- Customers can better manage
their energy usage
- Potential to maximize saving on
energy
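A rough sketch of time-based pricing: scale a base tariff by how far current demand sits above or below the historical average for that time slot. The formula, the `sensitivity` parameter, and the clamping limits are assumptions made for illustration; real utilities use their own forecasting and tariff models.

```python
def time_based_price(base_price, history_kwh, current_kwh,
                     sensitivity=0.5, min_factor=0.5, max_factor=2.0):
    """Scale a base tariff by current demand relative to the historical average."""
    if not history_kwh:
        raise ValueError("history must not be empty")
    average = sum(history_kwh) / len(history_kwh)
    if average <= 0:
        raise ValueError("historical average must be positive")
    # Relative deviation of current demand from the historical average.
    deviation = (current_kwh - average) / average
    factor = 1.0 + sensitivity * deviation
    # Clamp so the price never swings beyond the configured limits.
    factor = max(min_factor, min(max_factor, factor))
    return base_price * factor

# Hypothetical smart-meter readings (kWh) for the same slot on earlier days.
history = [40, 50, 60]
price = time_based_price(base_price=0.10, history_kwh=history, current_kwh=75)
print(f"{price:.3f} per kWh")
```

With demand 50 percent above the historical average and a sensitivity of 0.5, the price rises by 25 percent, which gives customers a signal to shift usage to quieter periods.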
Smart Cities – Smart Energy
Use Case: IBM HyREF
- Cloud imaging technology can
track clouds
- Sensors for wind speed,
temperature and direction.
- Can predict weather 1 month in
advance at 15-minute intervals
Savings and benefits
- Can better manage variable
nature of winds
- Better forecast of energy
generation
- Enable integration of traditional
sources of power generation in
case of outage