1. D: DRIVE
How to become Data Driven?
This programme has been funded with
support from the European Commission
Module 1. Introduction to Big and Smart Data
2. Smart Data Smart Region | www.smartdata.how
This programme has been funded with support from the European Commission. The author is
solely responsible for this publication (communication) and the Commission accepts no
responsibility for any use that may be made of the information contained therein.
The objective of this module is to provide an overview of
the basic information on big and smart data.
Upon completion of this module you will:
- Comprehend the emerging role of big data
- Understand the key terms regarding big and smart data
- Know how big data can be turned into smart data
- Be able to apply the key terms regarding big and smart
data
Duration of the module: approximately 1 – 2 hours
3. (1) The emerging role of data in VET and enterprise
(2) V's of data
(3) How does big data become smart data?
– A Brief History of Data
– What is Big Data?
– Classification of Data
– Sources of Data
– The Importance of Big Data
– Turning Big Data into Value
– Smart Data Applications
– How to Start Smart?
4. A BRIEF HISTORY OF DATA
2400 BCE: The abacus, the first dedicated device constructed specifically for performing calculations, comes into use in Babylon. The first libraries also appeared around this time, representing our first attempts at mass data storage.
300 BCE: The Library of Alexandria is perhaps the largest collection of data in the ancient world, housing up to half a million scrolls and covering everything we had learned so far.
1663: In London, John Graunt carries out the first recorded experiment in statistical data analysis. By recording information about mortality, he theorized that he could design an early warning system for the bubonic plague ravaging Europe.
1926: Nikola Tesla predicted that humans would be able to access and analyse huge amounts of data in the future by using a pocket-friendly device.
1989: Possibly the first use of the term Big Data in the way it is used today. Author Erik Larson speculated on the origin of the junk mail he receives. He writes: “The keepers of big data say they are doing it for the consumer’s benefit. But data have a way of being used for purposes other than originally intended.”
1997: Michael Lesk publishes his paper How Much Information is there in the World?, theorizing that the existence of 12,000 petabytes is “perhaps not an unreasonable guess”. He also points out that even at this early point in its development, the web is increasing in size 10-fold each year.
2005: Commentators announce that we are witnessing the birth of “Web 2.0”, the user-generated web where the majority of content will be provided by users of services, rather than the service providers themselves.
2015: Google is the largest big data company in the world; it stores 10 billion gigabytes of data and processes approximately 3.5 billion requests every day.
5. Big Data is the
foundation of all of the
megatrends that are
happening today, from
social to mobile to the
cloud to gaming.
Chris Lynch
6. There are some things that are so big, that they have implications for
everyone, whether we want it or not. Big data is one of those things,
and it is completely transforming the way we do business and is
impacting most other parts of our lives.
The basic idea behind the phrase “Big Data” is that everything we do
is increasingly leaving a digital trace, which we can use and analyse.
Big Data therefore refers to our ability to make use of the
ever-increasing volumes of data.
WHAT IS BIG DATA?
“Data of a very large size, typically to
the extent that its manipulation and
management present significant
logistical challenges.”
Oxford English Dictionary, 2013
STRUCTURED
DATA
• High degree of
organization, such as a
relational database.
• It represents only 5 to
10% of all data
• Examples: Dates, phone
numbers, customer
names, transaction
information,...
UNSTRUCTURED
DATA
• Information that is
difficult to organize using
traditional mechanisms.
• It represents around 80%
of data
• Examples: Images,
reports, social media,
spreadsheets,
communications,...
SEMI-STRUCTURED
DATA
• Information that doesn’t
reside in a relational
database but that does
have some organizational
properties that make it
easier to analyze
• Examples: Websites, XML,
e-mails,...
CLASSIFICATION OF DATA
To learn more about Big Data and its
importance, complete Exercise 1 from
the Learners Workbook
7. Structured Data
Employee_ID  Employee_Name    Gender  Department  Salary_In_Euros
2365         Rajesh Kulkarni  Male    Finance     65000
3398         Pratibha Joshi   Female  Admin       65000
7465         Shushil Roy      Male    Admin       50000
7500         Shubhojit Das    Male    Finance     50000
7699         Priya Sane       Female  Finance     55000
Unstructured Data
Semi-structured Data
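To make the three categories more concrete, the following is a minimal Python sketch of how each kind of data might be handled. The structured rows are taken from the employee table above; the XML snippet and the free-text sentence are hypothetical, invented only for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields
# (rows copied from the employee table above).
structured_csv = """Employee_ID,Employee_Name,Gender,Department,Salary_In_Euros
2365,Rajesh Kulkarni,Male,Finance,65000
3398,Pratibha Joshi,Female,Admin,65000
7465,Shushil Roy,Male,Admin,50000
7500,Shubhojit Das,Male,Finance,50000
7699,Priya Sane,Female,Finance,55000
"""
rows = list(csv.DictReader(io.StringIO(structured_csv)))
finance = [int(r["Salary_In_Euros"]) for r in rows if r["Department"] == "Finance"]
finance_avg = sum(finance) / len(finance)  # easy to query because the schema is known

# Semi-structured: tags give some organization, but fields can vary per record.
# (Hypothetical XML, invented for this example.)
semi_structured_xml = """<employees>
  <employee id="2365"><name>Rajesh Kulkarni</name><skill>Audit</skill></employee>
  <employee id="3398"><name>Pratibha Joshi</name></employee>
</employees>"""
root = ET.fromstring(semi_structured_xml)
names = [e.findtext("name") for e in root.findall("employee")]
skills = [e.findtext("skill") for e in root.findall("employee")]  # None when the tag is absent

# Unstructured: free text with no schema; we have to search for what we need.
# (Hypothetical sentence, invented for this example.)
unstructured_text = "Priya from Finance said her salary is 55000 euros after the review."
numbers_in_text = re.findall(r"\d+", unstructured_text)

print(finance_avg, names, skills, numbers_in_text)
```

The structured data can be queried directly by column name, the semi-structured data needs a check for missing fields, and the unstructured text only yields values after a pattern search.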
8. SOURCES OF DATA
Big data is often boiled down to a few varieties including social data, machine data, and
transactional data.
Machine data
Transactional data
Social media data
9. EVERY MINUTE OF EVERY DAY
• WhatsApp users share 347,222 photos.
• E-mail users send 204,000,000 messages.
• YouTube users upload 4,320 minutes of new videos.
• Google receives over 4,000,000 search queries.
• Facebook users share 2,460,000 pieces of content.
• Twitter users tweet 277,000 times.
• Amazon makes $83,000 in online sales.
• Instagram users post 216,000 new photos.
• Skype users connect for 23,300 hours.
Social Media Data
Complete Exercise 2 from the Learners
Workbook to learn how much you are
worth in the social media world
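To get a feeling for the scale of the per-minute figures on this slide, they can be converted into rough daily totals. The sketch below simply multiplies by the number of minutes in a day; it assumes the rate is constant around the clock, which is a simplification and not a claim made by the slide.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

# Per-minute figures copied from the slide above.
per_minute = {
    "WhatsApp photos shared": 347_222,
    "E-mails sent": 204_000_000,
    "YouTube minutes uploaded": 4_320,
    "Google search queries": 4_000_000,
    "Facebook pieces of content": 2_460_000,
    "Twitter tweets": 277_000,
    "Instagram photos posted": 216_000,
}

def per_day(count_per_minute: int) -> int:
    """Scale a per-minute count to a full day, assuming a constant rate."""
    return count_per_minute * MINUTES_PER_DAY

daily = {name: per_day(count) for name, count in per_minute.items()}

for name, total in daily.items():
    print(f"{name}: {total:,} per day")
```

Under this assumption, 4,000,000 searches per minute correspond to roughly 5.76 billion searches per day.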
12. THE IMPORTANCE OF BIG DATA
The importance of big data does not revolve around how much data a company has but how a company utilizes the
collected data. Every company uses data in its own way; the more efficiently a company uses its data, the more
potential it has to grow. The company can take data from any source and analyze it to find answers which will enable:
Cost Savings
Time Reductions
New Product Development
Understanding the Market Conditions
Control Online Reputation
13. 5 V's OF DATA
Volume: The magnitude of the data being generated. 90% of the data in the world today has been created in the last 2 years alone.
Velocity: The speed at which data is being generated and aggregated. Literally the speed of light! Data doubles every 40 months.
Variety: The different types of data. Structured, semi-structured and unstructured data.
Veracity: The trustworthiness of the data in terms of accuracy and quality. Because of the anonymity of the Internet or possibly false identities, the reliability of data is often in question.
Value: The economic value of the data. Having access to big data is no good unless we can turn it into value.
Big Data does a pretty good job of telling us what happened, but not why it happened or what to do about it. The 5 V's represent
specific characteristics and properties that can help us understand both the challenges and advantages of big data initiatives.
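The statement that data doubles every 40 months can be turned into a quick growth calculation. The sketch below assumes steady exponential growth with a 40-month doubling period; the function name and the chosen time spans are illustrative only.

```python
DOUBLING_MONTHS = 40  # doubling period taken from the slide above

def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Factor by which the data volume grows after `months`,
    assuming it doubles once every `doubling_months`."""
    return 2 ** (months / doubling_months)

# Implied yearly growth rate under this assumption.
annual_rate = growth_factor(12) - 1

print(f"After 40 months: x{growth_factor(40):.1f}")
print(f"After 10 years:  x{growth_factor(120):.1f}")
print(f"Implied annual growth: {annual_rate:.1%}")
```

Under this assumption the volume grows about eightfold in ten years, which corresponds to roughly 23% growth per year.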
14. TURNING BIG DATA INTO VALUE
Smart data describes data that
has valid, well-defined,
meaningful information that
can expedite information
processing.
The “Datafication”
of our World:
• Activities
• Conversations
• Words
• Voice
• Social Media
• Browser logs
• Photos
• Videos
• Sensors
• Etc.
Analysing Big
Data:
• Text analytics
• Sentiment
analysis
• Face recognition
• Voice analytics
• Etc.
VOLUME
VELOCITY
VARIETY
VERACITY
The “Datafication” of our world gives us unprecedented amounts of data in terms of volume, velocity,
variety and veracity. The latest technology such as cloud computing and distributed systems together
with the latest software and analysis approaches allow us to leverage all types of data to gain insights
and add value.
VALUE
SMART
DATA
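As an illustration of how raw text can be turned into a simple, usable signal, here is a minimal sketch of lexicon-based sentiment analysis. The word lists and example reviews are hypothetical and far too small for real use; production systems typically rely on trained models or much larger lexicons.

```python
import re

# Tiny hypothetical word lists, invented for this example.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def sentiment_score(text: str) -> int:
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for w in words if w in POSITIVE)
    negatives = sum(1 for w in words if w in NEGATIVE)
    return positives - negatives

def label(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical customer comments.
reviews = [
    "Great product, I love it",
    "Terrible support and slow delivery",
    "The parcel arrived on Tuesday",
]
labels = [label(r) for r in reviews]
print(list(zip(reviews, labels)))
```

The unstructured comments go in, and a small structured result (one label per comment) comes out, which is the kind of reduction that makes data "smart".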
15. SMART DATA APPLICATIONS
• Fraud
detection/prevention
• Brand sentiment
analysis
• Real time pricing
• Product placement
• Micro-targeted
advertising
• Monitor patient visits
• Patient care and
safety
• Reduce readmittance
rates
• Smart meter-stream
analysis
• Proactive equipment
repair
• Power and
consumption matching
• Cell tower diagnostics
• Bandwidth allocation
• Proactive
maintenance
• Decreasing time to
market
• Supply planning
• Increasing product
quality
• Network intrusion
detection and
prevention
• Disease outbreak
detection
• Unsafe driving
detection and
monitoring
• Route and time
planning for public
transport
FINANCIAL SERVICES | RETAIL | TELECOM | MANUFACTURING
HEALTHCARE | UTILITIES, OIL & GAS | PUBLIC SECTOR | TRANSPORTATION
Every business in the world needs data to thrive. Data is what tells you who your
customers are and how they operate, and it’s what can guide you to new insights
and new innovations. Any business can benefit from using big data to learn more
about its strategic position and development potential, but to avoid
“drowning” in big data it is necessary to find the right area of interest first.
Find out how a big retailer used the power
of Big Data in the Learners Workbook, Exercise 3
16. HOW TO START SMART?
Even though data analysis and visualization tools have come a long way in the past decade, big data analysis still relies on human
intervention and coordination to be successful. You need to know how to ask the right questions, how to eliminate your own bias,
and how to form actionable insights rather than basic conclusions.
1. Review your
data.
• What data do you
have?
• How is it used?
• Do you have the
expertise to manage
your data?
2. Ask the right
questions.
• What data do you
have and how is it
used?
• Are you being specific
enough?
3. Draw the
conclusions.
• Could an expert help
to sense-check your
results?
• Can you validate your
hypotheses?
• What further data do
you need?
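The first step above, reviewing your data, can start with a very simple profile: how many records there are and how many values are missing in each column. The sketch below uses only the Python standard library; the customer table and the column names are hypothetical, invented for illustration.

```python
import csv
import io

# Hypothetical customer export, invented for this example.
sample = """customer_id,email,city,last_purchase
1,anna@example.com,Graz,2019-03-01
2,,Vienna,
3,carl@example.com,,2019-02-11
"""

def profile(csv_text: str) -> dict:
    """Return the row count and the number of empty values per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    missing = {
        col: sum(1 for r in rows if not (r[col] or "").strip())
        for col in columns
    }
    return {"rows": len(rows), "missing": missing}

report = profile(sample)
print(report)
```

A report like this gives a first answer to "What data do you have?" and shows where gaps might weaken later conclusions.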
17. BIG DATA CHALLENGES
It's easy to get caught up in the hype and opportunity of big data. However, one of the reasons big data is so underutilized is
because big data and big data technologies also present many challenges. One survey found that 55% of big data projects are
never completed. So what's the problem with big data?
• LACK OF TALENT
• SCALABILITY
• ACTIONABLE INSIGHTS
• DATA QUALITY
• SECURITY
• COST MANAGEMENT
18. BIG DATA PLATFORMS
A big data platform generally consists
of big data storage, servers, databases,
big data management, business
intelligence and other big data
management utilities. It also supports
custom development, querying and
integration with other systems. There
are hundreds of big data
tools and services.