The objective of this module is to provide an overview of the basic information on big data.
Upon completion of this module you will:
- Comprehend the emerging role of big data
- Understand the key terms regarding big and smart data
- Know how big data can be turned into smart data
- Be able to apply the key terms regarding big data
Duration of the module: approximately 1 – 2 hours
1. Module 1: Introduction to Big Data
This programme has been funded with support from the European Commission.
2. Module 1: Introduction to Big Data
This programme has been funded with support from the
European Commission. The author is solely responsible for
this publication (communication) and the Commission
accepts no responsibility for any use that may be made of
the information contained therein.
3. Contents
1 The emerging role of data in VET and enterprise
– A Brief History of Data
– What is Big Data?
– Sources of Data
– The Importance of Big Data
2 The V’s of data
3 How does big data become smart data?
– Turning Big Data into Value
– Smart Data Applications
– How to Start Smart?
– Big Data Challenges
4 Case study
– How AG used the Power of Big Data
4. THE EMERGING ROLE OF DATA IN VET AND ENTERPRISE
AGE FRIENDLY ECONOMY | FUTURE OPPORTUNITIES FOR SMES
1. What is Big Data?
2. Classification of Data
3. Sources of Data
4. The Importance of Big Data
5. “Big Data is the foundation of all of the megatrends that are happening today, from social to mobile to the cloud to gaming.”
Chris Lynch
6. WHAT IS BIG DATA?
“Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.”
Oxford English Dictionary, 2013
There are some things that are so big that they have implications for everyone, whether we want them to or not.
Big data is one of those things: it is completely transforming the way we do business and is affecting most other parts of our lives.
The basic idea behind the phrase “Big Data” is that everything we do increasingly leaves a digital trace, which we can use and analyse.
Big Data therefore refers to our ability to make use of the ever-increasing volumes of data.
7. CLASSIFICATION OF DATA
“Data” is defined as “the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media”, as a quick Google search will show.
“Big Data” refers to volumes of data that are:
- too large to be processed
- too copious to be analyzed by traditional tools
- not stored or managed efficiently.
However, there is also huge potential in the analysis of Big Data.
Proper management and study of data can help companies make better decisions based on usage statistics and user interests,
thereby helping their growth. Some companies have even come up with new products and services, based on feedback received
from Big Data analysis opportunities.
Classification is essential for the study of any subject. Big Data is commonly classified into three main types:
1. STRUCTURED DATA
2. UNSTRUCTURED DATA
3. SEMI-STRUCTURED DATA
8. STRUCTURED DATA (1)
Structured data refers to data that is already stored in databases in an ordered manner. It accounts for about 20% of all existing data.
There are two sources of structured data: machines and humans.
Machine-generated data covers all the data received from sensors, web logs and financial systems. This includes data from medical devices, GPS data, usage statistics captured by servers and applications, and the huge amounts of data that move through trading platforms, to name a few.
Human-generated structured data mainly includes all the data a person enters into a computer, such as their name and other personal details. When a person clicks a link on the internet, or even makes a move in a game, data is created.
Example of Structured Data
An 'Employee' table in a database is an example of Structured Data.
Employee_ID  Employee_Name    Gender  Department  Salary_In_Euros
2365         Rajesh Kulkarni  Male    Finance     65000
3398         Pratibha Joshi   Female  Admin       65000
7465         Shushil Roy      Male    Admin       50000
7500         Shubhojit Das    Male    Finance     50000
7699         Priya Sane       Female  Finance     55000
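Because structured data has fixed columns, it can be queried directly. The following is a minimal sketch, assuming the Employee rows above are loaded into an in-memory SQLite database; the function name `average_salary_by_department` is a hypothetical helper chosen for illustration and is not part of the original material.

```python
import sqlite3

# Rows copied from the 'Employee' table shown above.
EMPLOYEES = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 65000),
    (3398, "Pratibha Joshi", "Female", "Admin", 65000),
    (7465, "Shushil Roy", "Male", "Admin", 50000),
    (7500, "Shubhojit Das", "Male", "Finance", 50000),
    (7699, "Priya Sane", "Female", "Finance", 55000),
]


def average_salary_by_department(rows):
    """Load the rows into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Employee ("
            "Employee_ID INTEGER PRIMARY KEY, Employee_Name TEXT, "
            "Gender TEXT, Department TEXT, Salary_In_Euros INTEGER)"
        )
        conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT Department, AVG(Salary_In_Euros) FROM Employee "
            "GROUP BY Department ORDER BY Department"
        )
        # Each result row is (department, average), so it maps onto a dict.
        return dict(cursor.fetchall())
    finally:
        conn.close()


averages = average_salary_by_department(EMPLOYEES)
print(averages)
```

The point of the sketch is that one short query answers a business question, because every row follows the same schema.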
9. UNSTRUCTURED DATA (2)
Unstructured data is the opposite of structured data: it has no clear format in storage. It accounts for about 80% of all data.
Most of the data a person encounters belongs to this category, and until recently there was not much one could do with it except store it or analyze it manually.
Like structured data, unstructured data is classified by its source as machine-generated or human-generated.
Machine-generated unstructured data includes satellite images, scientific data from various experiments, and radar data.
Human-generated unstructured data is found in abundance across the internet, since it includes social media data, mobile data and website content. This means that the pictures we upload to our Facebook or Instagram accounts, the videos we watch on YouTube and even the text messages we send all contribute to the gigantic heap that is unstructured data.
Example of Unstructured Data: the output returned by a Google search.
10. SEMI-STRUCTURED DATA (3)
Semi-structured data appears to be unstructured at first glance, so it can be difficult to analyze.
It covers information that is not in the traditional database format of structured data but has some organizational properties that make it easier to process.
For example, NoSQL documents are considered semi-structured, since they contain keywords that can be used to process the document easily.
An email message is another example of semi-structured data. It includes well-defined data fields in the header, such as the sender, while the actual body of the message is unstructured.
If you wanted to find out who is emailing whom and when (information contained in the header), a relational database might be a good choice. But if you are more interested in the message content, big data tools, such as natural language processing, will be a better fit.
Example of Semi-structured Data: personal data stored in an XML file.
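The email example can be made concrete with a short sketch that separates the well-defined header fields from the free-text body. It uses Python's standard `email` package; the message, the addresses and the helper name `split_email` are hypothetical and chosen only for illustration.

```python
from email import message_from_string

# A made-up message: the header lines are structured, the body is free text.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 06 Mar 2017 09:30:00 +0000
Subject: Quarterly figures

Hi Bob, the numbers look great this quarter. Can we talk tomorrow?
"""


def split_email(raw):
    """Return (header fields as a dict, unstructured body text)."""
    msg = message_from_string(raw)
    header = {
        "sender": msg["From"],
        "recipient": msg["To"],
        "sent": msg["Date"],
    }
    body = msg.get_payload().strip()
    return header, body


header, body = split_email(RAW_MESSAGE)
print(header)  # fields that would fit neatly into a relational table
print(body)    # text that needs tools such as natural language processing
```

The header dictionary answers "who is emailing whom and when", while the body stays plain text that has to be interpreted separately.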
11. SOURCES OF DATA
Machine data consists of information generated by industrial equipment, real-time data from sensors that track parts and monitor machinery (often called the Internet of Things), and even web logs that track user behavior online. At arcplan client CERN, the largest particle physics research center in the world, the Large Hadron Collider (LHC) generates 40 terabytes of data every second during experiments.
Transactional data is generated in large volumes on a regular basis by large retailers and even B2B companies, since each transaction consists of one or many items, product IDs, prices, payment information, manufacturer and distributor data, and much more.
Social media data is providing companies with remarkable insights into consumer behavior and sentiment, which can be integrated with CRM data for analysis: 230 million tweets are posted on Twitter per day, 2.7 billion Likes and comments are added to Facebook every day, and 60 hours of video are uploaded to YouTube every minute (this is what we mean by velocity of data).
12. EVERY MINUTE OF EVERY DAY
Social Media Data Examples
- WhatsApp users share 347,222 photos.
- E-mail users send 204,000,000 messages.
- YouTube users upload 4,320 minutes of new videos.
- Google receives over 4,000,000 search queries.
- Facebook users share 2,460,000 pieces of content.
- Twitter users tweet 277,000 times.
- Amazon makes $83,000 in online sales.
- Instagram users post 216,000 new photos.
13. Machine Data
Machine data is everywhere. It is created by everything from planes and elevators to traffic lights and fitness-monitoring devices.
More recently, machine data has gained further attention as use of the Internet of Things, Hadoop and other big
data management technologies has grown.
Application, server and business process logs, call detail records and sensor data are prime examples of machine
data. Internet clickstream data and website activity logs also factor into discussions of machine data.
Combining machine data with other enterprise data types for analysis is expected to provide new views and
insight on business activities and operations. Machine-generated data is the lifeblood of the Internet of
Things (IoT).
Internet of Things (IoT)
Simply put, IoT is the concept of connecting any device with an on/off switch to the Internet (and/or to other devices). This includes everything from cellphones, coffee makers, washing machines, headphones and lamps to wearable devices and almost anything else you can think of. It also applies to components of machines, for example the jet engine of an airplane or the drill of an oil rig.
14. Transactional Data
Transactional data is information derived directly from transactions. Unlike other sorts of data, transactional data has a time dimension: it is timely when it is recorded and becomes less relevant over time.
Rather than being the object of the transaction, such as the product being purchased or the identity of the customer, it is reference data describing the time, place, prices, payment methods, discount values and quantities related to that particular transaction, usually at the point of sale.
Examples of transactional data:
Purchases, Returns, Invoices, Payments, Credits, Donations, Trades, Dividends, Contracts, Interest, Payroll, Lending, Reservations, Signups, Subscriptions
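As an illustration of the fields described above, a single purchase could be modelled as a small record. This is only a sketch: the class name `Transaction`, its field names, the sample values and the helper `age_in_days` are assumptions made for this example, not a standard schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """Reference data describing one purchase at the point of sale."""
    timestamp: datetime      # the time dimension of the transaction
    place: str
    unit_price_cents: int    # prices kept in cents to avoid rounding issues
    quantity: int
    discount_cents: int
    payment_method: str

    def total_cents(self) -> int:
        return self.unit_price_cents * self.quantity - self.discount_cents


def age_in_days(tx: Transaction, now: datetime) -> int:
    """How old the record is; older records tend to be less relevant."""
    return (now - tx.timestamp).days


sale = Transaction(
    timestamp=datetime(2017, 3, 1, 14, 30),
    place="Store 12, till 3",   # hypothetical location
    unit_price_cents=1999,
    quantity=2,
    discount_cents=500,
    payment_method="card",
)
print(sale.total_cents(), age_in_days(sale, datetime(2017, 3, 31, 14, 30)))
```

Note that the record describes the circumstances of the sale (when, where, how much, how paid) rather than the product or the customer themselves.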
15. THE IMPORTANCE OF BIG DATA
The importance of big data does not lie in how much data a company has but in how the company uses the data it collects. Every company uses data in its own way; the more efficiently a company uses its data, the more potential it has to grow. A company can take data from any source and analyze it to find answers that enable:
Cost Savings
Some Big Data tools can bring cost advantages when large amounts of data have to be stored, and they also help identify more efficient ways of doing business.
Time Reductions
High-speed tools and in-memory analytics make it easy to identify new sources of data, which helps businesses analyze data immediately and make quick decisions based on what they learn.
New Product Development
By knowing the trends in customer needs and satisfaction through analytics, you can create products that customers want.
Understanding the Market Conditions
By analyzing big data you can get a better understanding of current market conditions. For example, by analyzing customers’ purchasing behavior, a company can find out which products sell the most and produce products according to this trend.
Control Online Reputation
Big data tools can do sentiment analysis, so you can get feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, big data tools can help.
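To give a feel for what sentiment analysis does, here is a deliberately simple sketch that counts positive and negative words in a comment. Real tools are far more sophisticated; the word lists, the sample reviews and the function names are all assumptions made for this illustration.

```python
import re

# Tiny hand-made word lists; real systems use much larger lexicons or models.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment_score(text):
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


REVIEWS = [
    "Great service, I love this shop!",
    "Terrible delivery and poor support.",
    "The parcel arrived on Tuesday.",
]
labels = [label(review) for review in REVIEWS]
print(labels)
```

Applied to thousands of posts about a company, even a rough score like this shows whether the overall mood is improving or getting worse.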
16. THE 5 V’s OF DATA
17. THE 5 V’s OF DATA
Volume: the magnitude of the data being generated. 90% of the data in the world today has been created in the last 2 years alone.
Velocity: the speed at which data is being generated and aggregated. Literally the speed of light! Data doubles every 40 months.
Variety: the different types of data. Structured, semi-structured and unstructured data.
Veracity: the trustworthiness of the data in terms of accuracy and quality. Because of the anonymity of the Internet or possibly false identities, the reliability of data is often in question.
Value: the economic value of the data. Having access to big data is no good unless we can turn it into value.
Big Data does a pretty good job of telling us what happened, but not why it happened or what to do about it. The 5 V’s represent specific characteristics and properties that can help us understand both the challenges and advantages of big data initiatives.
18. VOLUME
Volume refers to the vast amounts of data generated every second.
Just think of all the emails, Twitter messages, photos, video clips, sensor data etc. we produce and share every second.
On Facebook alone we send 10 billion messages per day, click the "like" button 4.5 billion times and upload 350 million new pictures each and every day.
This increasingly makes data sets too large to store and analyse using traditional database technology. With big data technology we can now store and use these data sets with the help of distributed systems, where parts of the data are stored in different locations and brought together by software.
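The idea of storing parts of the data in different locations and bringing them together by software can be sketched on a single machine. In this toy example each list stands in for one storage location; the function names `partition`, `count_words` and `combine` and the sample messages are hypothetical, and a real distributed system would run the per-part work on separate machines.

```python
from collections import Counter


def partition(records, n_parts):
    """Spread records over n_parts 'locations' in round-robin order."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def count_words(part):
    """Work done locally, where one part of the data lives."""
    counts = Counter()
    for text in part:
        counts.update(text.lower().split())
    return counts


def combine(partial_counts):
    """Bring the partial results from all locations together."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


MESSAGES = ["big data is big", "smart data", "data is everywhere"]
parts = partition(MESSAGES, 2)
totals = combine(count_words(part) for part in parts)
print(totals.most_common(3))
```

No single location ever needs to hold the full data set: each part is processed where it is stored, and only the small summaries are merged.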
19. VELOCITY
Velocity refers to the speed at which new data is generated and the speed at which data moves around.
Just think of social media messages going viral in seconds, the speed at which credit card transactions are checked for fraudulent activities, or the milliseconds it takes trading systems to analyse social media networks to pick up signals that trigger decisions to buy or sell shares.
Big data technology now allows us to analyse the data while it is being generated, without ever putting it into databases.
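Analysing data while it is being generated can be illustrated with a generator that inspects each value as it arrives and keeps only a running summary. This is a sketch only: the rule "flag an amount more than three times the running average" is an assumption made for the example and is not how real fraud detection works.

```python
def flag_unusual(amounts, factor=3.0, warmup=3):
    """Yield (amount, is_unusual) for each value as it arrives.

    Only a running count and sum are kept, never the full history.
    """
    count = 0
    total = 0.0
    for amount in amounts:
        # Compare against the average of the values seen so far.
        is_unusual = count >= warmup and amount > factor * (total / count)
        yield amount, is_unusual
        count += 1
        total += amount


def card_stream():
    """Stand-in for a live feed of card payments (made-up amounts)."""
    for amount in [20.0, 25.0, 15.0, 22.0, 400.0, 18.0]:
        yield amount


flags = list(flag_unusual(card_stream()))
print([amount for amount, unusual in flags if unusual])
```

Each payment is judged the moment it appears, and nothing is written to a database before the decision is made.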
20. VARIETY
Variety refers to the different types of data we can now use. In the past we focused on structured data that neatly fits into tables or relational databases.
Today most data is unstructured and does not fit easily into tables: think of photos, video sequences or social media updates.
With big data technology we can now harness different types of data, including messages, social media conversations, photos, sensor data, video or voice recordings, and bring them together with more traditional, structured data.
21. VERACITY
Veracity refers to the trustworthiness of the data.
With many forms of big data, quality and accuracy are less controllable. Just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of the content.
Big data and analytics technology now allows us to work with this type of data. The volumes often make up for the lack of quality or accuracy.
22. VALUE
It is all well and good having access to big data, but unless we can turn it into value it is useless. So you can safely argue that 'value' is the most important V of Big Data. It is important that businesses make a business case for any attempt to collect and leverage big data. It is easy to fall into the buzz trap and embark on big data initiatives without a clear understanding of costs and benefits.
Big data can deliver value in almost any area of business or society:
- It helps companies to better understand and serve customers: examples include the recommendations made by Amazon or Netflix.
- It allows companies to optimize their processes: Uber is able to predict demand, dynamically price journeys and send the closest driver to the customer.
- It improves our health care: government agencies can now predict flu outbreaks and track them in real time, and pharmaceutical companies are able to use big data analytics to fast-track drug development.
- It helps us to improve security: government and law enforcement agencies use big data to foil terrorist attacks and detect cyber crime.
- It allows sport stars to boost their performance: sensors in balls, cameras on the pitch and GPS trackers on their clothes allow athletes to analyze and improve upon what they do.
23. HOW DOES BIG DATA BECOME SMART DATA?
1. Turning Big Data into Value
2. Smart Data Applications
3. How to Start Smart?
4. Big Data Challenges
24. SMART DATA APPLICATIONS
Sectors: FINANCIAL SERVICES; RETAIL; TELECOM; MANUFACTURING; HEALTHCARE; UTILITIES, OIL & GAS; PUBLIC SECTOR; TRANSPORTATION
Applications:
• Fraud detection/prevention
• Brand sentiment analysis
• Real-time pricing
• Product placement
• Micro-targeted advertising
• Monitor patient visits
• Patient care and safety
• Reduce readmittance rates
• Smart meter-stream analysis
• Proactive equipment repair
• Power and consumption matching
• Cell tower diagnostics
• Bandwidth allocation
• Proactive maintenance
• Decreasing time to market
• Supply planning
• Increasing product quality
• Network intrusion detection and prevention
• Disease outbreak detection
• Unsafe driving detection and monitoring
• Route and time planning for public transport
Every business in the world needs data to thrive.
Data is what tells you who your customers are and how they operate, and it can guide you to new insights and innovations, but first it is necessary to find the right area of interest.
25. HOW TO START SMART?
Even though data analysis and visualization tools have come a long way in the past decade, big data analysis still relies on human intervention and coordination to be successful. You need to know how to ask the right questions, how to eliminate your own bias, and how to form actionable insights rather than basic conclusions.
1. Review your data.
• What data do you have?
• How is it used?
• Do you have the expertise to manage your data?
2. Ask the right questions.
• What data do you have and how is it used?
• Are you being specific enough?
3. Draw the conclusions.
• Could an expert help to sense-check your results?
• Can you validate your hypotheses?
• What further data do you need?
26. BIG DATA CHALLENGES
It’s easy to get caught up in the hype and opportunity of big data. However, one of the reasons big data is so underutilized is that big data and big data technologies also present many challenges.
One survey found that 55% of big data projects are never completed. So what’s the problem with big data?
LACK OF TALENT
Successfully implementing a big data project requires a sophisticated team of developers, data scientists and analysts who also have enough domain knowledge to identify valuable insights.
SCALABILITY
Many organizations fail to take into account how quickly a big data project can grow and evolve. Big data workloads also tend to be bursty, making it difficult to allocate capacity for resources.
ACTIONABLE INSIGHTS
A key challenge for data science teams is to identify a clear business objective and the appropriate data sources to collect and analyze to meet that objective.
DATA QUALITY
Common causes of dirty data include user input errors, duplicate data and incorrect data linking.
SECURITY
Specific challenges include:
- User authentication for every team and team member accessing the data
- Restricting access based on a user’s need
- Recording data access histories and meeting other compliance regulations
- Proper use of encryption on data in transit and at rest
COST MANAGEMENT
Businesses pursuing big data projects must remember the cost of training, maintenance and expansion.
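Two of the data quality problems named above, user input errors and duplicate data, can be illustrated with a small clean-up step. This is a minimal sketch, assuming customer records with hypothetical `name` and `email` fields and treating the email address as the identifier; real projects need more careful matching rules.

```python
def normalize(record):
    """Repair common input errors: stray spaces and inconsistent casing."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def deduplicate(records):
    """Keep the first record for each normalized email address."""
    seen = set()
    clean = []
    for record in map(normalize, records):
        if record["email"] not in seen:
            seen.add(record["email"])
            clean.append(record)
    return clean


# Made-up records: the first two describe the same person.
RAW = [
    {"name": "priya  sane", "email": "Priya.Sane@example.com "},
    {"name": "Priya Sane", "email": "priya.sane@example.com"},
    {"name": "Shushil Roy", "email": "shushil.roy@example.com"},
]
customers = deduplicate(RAW)
print(customers)
```

Normalizing before comparing matters: without it, the first two records would look different and the duplicate would survive.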
27. CASE STUDY: How AG used the Power of Big Data
THE BACKGROUND
Every time you go shopping, you share intimate details about your consumption patterns with retailers. Many of those retailers study those details to figure out what you like, what you need, and which coupons are most likely to make you happy. AG, Europe’s largest golf retailer, for example, has figured out how to data-mine its way into imminent retirees’ wallets before they actually retire.
THE SOURCE OF AG’s BIG DATA
AG assigns every customer a Guest ID number, tied to their credit card, name or email address, which becomes a bucket that stores a history of everything they have bought and any demographic information AG has collected from them or bought from other sources. Using that, AG’s analyst looked at historical buying data for all the men who had signed up for its registries in the past.
THE BIG DATA CONCLUSION
The analysts ran test after test, analyzing the data, and before long some useful patterns emerged. Gloves, for example. Lots of men buy golf gloves, but one of Pole’s colleagues noticed that men on the golf registry were buying smaller golf accessories, especially golf gloves, in the six months leading up to their retirement. Another analyst noted that in this six-month window the frequency of store visits increased.
28. CASE STUDY: How AG used the Power of Big Data
THE REACTION
So AG started sending coupons for golfing kits, especially higher-ticket items such as drivers and full sets. Duhigg shares an anecdote, so good that it sounds made up, that conveys how eerily accurate the targeting is: an angry man went into a store outside Milton Keynes, demanding to talk to a manager.
THE CHALLENGE
What AG discovered fairly quickly is that it creeped people out that the company knew about their retirement in advance.
THE SOLUTION
AG got sneakier about sending the coupons. The company can create personalized booklets, so instead of sending men close to retirement coupons solely for high-ticket golf items, it spreads them about more subtly. AG found that as long as a man thinks he hasn’t been spied on, he’ll use the coupons. He just assumes that everyone else on his street got the same mailer.