Public and private organizations in all sectors are using their data to gain insight into their organizations, as well as a competitive advantage. This session explores some of the key questions that organizations need to consider in developing a Big Data management strategy: 1) Why are we collecting Big Data? 2) How can we mine our Big Data? 3) What measures are needed to govern Big Data? 4) How do we manage sensitive information and ensure compliance with relevant legislation? and 5) How do we balance the value, risks, and costs of Big Data?
3. 2016 Ontario Connections.
Big data is high-volume, high-velocity and
high-variety information assets that demand
cost-effective, innovative forms of information
processing for enhanced insight and decision
making. http://www.gartner.com/it-glossary/big-data/
Big data is a term that describes large volumes
of high velocity, complex, and variable data
that require advanced techniques and
technologies to enable the capture, storage,
distribution, management, and analysis of the
information.
http://www.techamerica.org/Docs/fileManager.cfm?f=techamerica-bigdatareport-final.pdf
Defining Big Data

“Insights from Big Data can enable you to
make better decisions. They can help you
facilitate growth and organizational
transformation, reduce costs and manage
volatility and risk. This enables you to
capitalize on new sources of revenue and
generate more value for your organization.”
Financial Accounting Advisory Services (n.d.). Big data strategy to support the CFO and
governance agenda
The value of Big Data

Big Data tends to be measured in terms of
terabytes and petabytes (1024 terabytes).
Definitions of “big” are relative, and fluctuate,
especially as storage capacities increase over
time.
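The unit arithmetic can be made concrete with a short sketch. It assumes the binary convention used on this slide (1 petabyte = 1024 terabytes); the function name `to_terabytes`, the 2 PB capacity, and the 500 TB per day ingest rate are illustrative assumptions.

```python
# Binary storage units, following the slide's convention (1 PB = 1024 TB).
UNITS_IN_TB = {"GB": 1 / 1024, "TB": 1, "PB": 1024}


def to_terabytes(amount: float, unit: str) -> float:
    """Convert an amount given in GB, TB, or PB to terabytes."""
    return amount * UNITS_IN_TB[unit]


# Hypothetical example: how long does a 2 PB store last at 500 TB per day?
capacity_tb = to_terabytes(2, "PB")
days_to_fill = capacity_tb / 500
print(capacity_tb, days_to_fill)
```

At that assumed rate, a store that sounds large in petabytes fills in a matter of days, which is why definitions of "big" keep shifting.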
Data is generated by every computerized
system in the organization, including human
resources solutions, supply-chain management
software, and social media tools for marketing.
Volume

Google indexes 20 billion pages per day.
Twitter has more than 500 million users and 400
million tweets per day.
Facebook records 2.7 million “Likes”, processes
500 TB of data, and receives 300 million photo
uploads per day.
http://bit.ly/1SVxPwp; http://bit.ly/1SVy76j; http://bloom.bg/1SVyldK
Examples of volume

Organizations generate various types of structured,
semi-structured, and unstructured data.
Structured data is the tabular type found in
spreadsheets or relational databases (about 10% of
most data).
Text, images, audio, and video are examples of
unstructured data, which sometimes lacks the
structural organization required by machines for
analysis.
Variety

Velocity refers to the rate at which data is
generated and the speed at which it should be
analyzed and acted upon.
The proliferation of digital devices such as
smartphones has led to an unprecedented rate
of data creation and is driving a growing need
for real-time analytics and evidence-based
planning.
Velocity

Some data is inherently unreliable; customer
comments in social media, for example, entail
judgment.
We need to deal with imprecise and uncertain
data. Is the data that is being stored and
mined meaningful to the problem being
analyzed?
Veracity

Big Data is often characterized by relatively
“low value density”. That is, the data received
in the original form usually has a low value
relative to its volume. However, a high value
can be obtained by analyzing large volumes of
such data.
Value

Value is any application of big data
that:
⢠Drives revenue increases (e.g. customer
loyalty analytics)
⢠Identifies new revenue opportunities,
improves quality and customer satisfaction
(e.g., Predictive Maintenance),
⢠Saves costs (e.g., fraud analytics)
⢠Drives better outcomes (e.g., patient care).
Value

Blogs, tweets, social networking sites (such as
LinkedIn and Facebook), news feeds,
discussion boards, and video sites all fall under
Big Data.
Social media

Machine-generated data comes from a wide variety
of devices, from RFIDs to sensors, such as optical,
acoustic, seismic, thermal, chemical, scientific, and
medical devices, and even weather instruments.
Machine-generated data

From the GPS systems in our cars, in planes, and ships, to
GPS apps on smartphones, we use GPS to guide our
movements.
GPS is also used to track our movements, for example
through emergency beacons. Retailers similarly use
in-store WiFi networks to access shoppers’ smartphones
and track their shopping habits.
Location Based Services (LBS) allow us to deliver services
based on the location of moving objects such as cars or
people with mobile phones.
GPS and spatial data

It is generally thought that the true value of Big Data is
seen only when it is used to drive decision making.
You need efficient processes to turn high volumes of
fast-moving and varied data into meaningful insights.
As information managers, you might not be doing the
analysis, but you have a crucial role to play in
managing this data to enable this analysis.
Big Data analytics: How do we mine our data?

Text analytics extract information from textual
data.
⢠Social network feeds, emails, blogs, online forums, survey
responses, corporate documents, news, and call centre
logs are examples of textual data held by organizations.
Text analytics enable organizations to convert
large volumes of human-generated text into
meaningful summaries, which support
evidence-based decision-making.
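As a rough illustration of turning text into a summary, the sketch below counts the most frequent terms across a few log entries. It is a deliberately simple stand-in for real text analytics; the function `top_terms`, the stop-word list, and the sample call-centre entries are all hypothetical.

```python
import re
from collections import Counter

# A tiny stop-word list for illustration only; real systems use larger ones.
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of", "my", "on"}


def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across the documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical call-centre log entries.
logs = [
    "Customer reports the billing page is slow",
    "Billing error on my invoice",
    "Slow response and a billing dispute",
]
print(top_terms(logs, n=2))
```

Even this crude count surfaces the recurring themes (billing, slowness) without anyone reading every entry, which is the basic idea behind summarizing large volumes of text.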
Text analytics

Audio analytics analyze and extract information
from unstructured audio data. Customer call
centres and healthcare are the primary
application areas of audio analytics.
⢠Call centres use audio analytics for efficient analysis
of recorded calls to improve customer experience,
evaluate agent performance, and so forth.
⢠In healthcare, audio analytics support diagnosis and
treatment of certain medical conditions that affect the
patientâs communication patterns
(e.g.,schizophrenia), or analyze an infantâs cries to
learn about the infantâs health and emotional status.
Audio analytics

Video analytics involves a variety of techniques to
monitor, analyze, and extract meaningful information
from video streams.
The increasing prevalence of closed-circuit television
(CCTV) cameras and the popularity of video-sharing
websites are the two leading contributors to the growth of
computerized video analysis. A key challenge,
however, is the sheer size of video data.
Video analytics

Social media analytics refer to the analysis of
structured and unstructured data from social
media channels.
⢠Social networks (e.g., Facebookand LinkedIn)
⢠Blogs (e.g., Blogger and WordPress)
⢠Microblogs (e.g.,Twitter and Tumblr)
⢠Social news (e.g., Digg and Reddit)
⢠Socia bookmarking (e.g., Delicious and StumbleUpon)
⢠Media sharing (e.g., Instagram and YouTube)
⢠Wikis (e.g., Wikipedia and Wikihow)
⢠Question-and-answer sites (e.g., Yahoo! Answers and
Ask.com)
⢠Review sites (e.g., Yelp, TripAdvisor)
Social media analytics

Predictive analytics comprise a variety of
techniques that predict future outcomes based
on historical and current data, e.g., predicting
customers’ travel plans based on what they
buy, when they buy, and even what they say on
social media.
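One of the simplest predictive techniques is fitting a trend line to historical data and extending it forward. The sketch below uses ordinary least squares in plain Python; it is only one of many possible techniques, and the helper `fit_line` and the monthly booking figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical history: bookings observed in months 1 to 4.
months = [1, 2, 3, 4]
bookings = [110, 130, 150, 170]

slope, intercept = fit_line(months, bookings)
forecast_month_5 = slope * 5 + intercept
print(forecast_month_5)
```

Real predictive analytics would combine many more signals (purchases, timing, social media text) and validate the model against held-out data before acting on a forecast.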
Predictive analytics

Canadian federal institutions reported 256 data breaches
in 2014-2015, up from 228 the year before. The main
culprit was identified as the use of portable storage
devices:
• More than two-thirds of the agencies had not formally
assessed the risks surrounding the use of all types of
portable storage devices;
• More than 90 per cent did not track all devices
throughout their life cycle;
• One-quarter did not enforce the use of encrypted
storage devices.
http://bit.ly/27Say7c
Security concerns

⢠More data translates = higher risk of exposure in the event of a
breach.
⢠More experimental usage = the organization's governance and
security protocol is less likely to be in place
⢠New types of data are uncovering new privacy implications, with
few privacy laws or guidelines to protect that information (e.g.,
cell phone beacons that broadcast physical location, & health
devices such as medical, fitness and lifestyle trackers).
⢠Data linkage and combined sensitive data. The act of combining
multiple data sources can create unanticipated sensitive data
exposure.
Considerations for Big Data

“The protection of information and
information systems from unauthorized
access, use, disclosure, disruption,
modification, or destruction in order to
provide confidentiality, integrity, and
availability.”
National Institute of Standards and Technology
Information security: Definition

“The claim of individuals, groups
or institutions to determine for
themselves when, how and to
what extent, information about
them is communicated to others.”
International Association of Privacy Professionals
Data privacy: Definition

Under the federal Personal Information Protection and
Electronic Documents Act (PIPEDA), “personal
information” is “information about an identifiable
individual, but does not include the name, title or
business address or telephone number of an
employee of an organization.”
Regulatory framework for big data

The protection of personal information in
Canada rests on three fundamental goals:
• Transparency: providing people with a basic understanding of
how their personal information will be used in order to gain
informed consent;
• Limiting use plus consent: the use of that information only for
the declared purpose for which it was initially collected, or
purposes consistent with that use; and
• Minimization: limiting the personal information collected to what
is directly relevant and necessary to accomplish the declared
purpose, and discarding the data once the original purpose
has been served.
PIPEDA and Big Data

Organizations that attempt to implement Big Data
initiatives without a strong governance regime in place
risk placing themselves in ethical dilemmas without set
processes or guidelines to follow.
A strong ethical code, along with process, training,
people, and metrics, is imperative to govern what
organizations can do within a Big Data program.
Big Data governance

Data used for Big Data analytics can be gathered
and combined from different sources to create new data
sets.
Organizations must make sure that all security and
privacy requirements that are applied to their original
data sets are tracked and maintained across Big Data
processes throughout the information life cycle, from
data collection to disclosure or retention/destruction.
Respecting the original intent of the information gathered

Data that has been processed, enhanced, or changed
by Big Data processes should be anonymized to protect the
privacy of the original data source, such as customers
or vendors.
Data that is not properly anonymized prior to external
release (or in some cases, internal as well) may result
in the compromise of data privacy, as the data is
combined with previously collected, complex data
sets.
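The linkage risk can be sketched in a few lines. The example below joins a release that has had names removed against a second data set that still carries names, using shared attributes (quasi-identifiers). All records, field names, and the function `reidentify` are invented for illustration.

```python
# "Anonymized" release: names removed, quasi-identifiers kept.
released = [
    {"postal": "M5V", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postal": "K1A", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"postal": "K1A", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
]

# Separately available data set that still includes names.
public = [
    {"name": "Alice", "postal": "M5V", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "postal": "K1A", "birth_year": 1975, "sex": "M"},
]

QUASI_IDENTIFIERS = ("postal", "birth_year", "sex")


def key(record):
    """Tuple of quasi-identifier values used to link the two data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(released, public):
    """Return {name: diagnosis} for people matching exactly one released row."""
    by_key = {}
    for row in released:
        by_key.setdefault(key(row), []).append(row)
    matches = {}
    for person in public:
        rows = by_key.get(key(person), [])
        if len(rows) == 1:  # a unique match pins the record to one person
            matches[person["name"]] = rows[0]["diagnosis"]
    return matches


print(reidentify(released, public))
```

In this toy data, Alice is re-identified because her combination of attributes is unique, while Bob is not because two released rows share his attributes. Removing names alone is therefore not sufficient.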
Re-identification

Matching data sets from third parties may provide
valuable insights that could not be obtained with
your data alone.
You need to consider and evaluate the adequacy of
the security and privacy data protections in place at
the third-party organizations.
Third-party use

Big data’s potential for predictive analysis raises
particular concerns for data security and privacy.
• Think of the famous case of Target, which sent
coupons to a teenage girl after her shopping
preferences suggested that she was pregnant
and indicated her due date (Target was
accurate). The girl’s family found out
about her pregnancy through these coupons.
• Did the girl know that her shopping information
would be used for this purpose?
• Was she informed of Target’s privacy policy?
The risks of predictive analytics

There are growing concerns that Big Data is
straining the privacy principles of identifying
purposes and limited use.
Consumers are called upon to agree to privacy
policies and consent forms that no one has the
time to read. The burden is increasingly placed
on the consumers, as these policies take the
form of disclaimers for the organizations.
Increasing burden on the consumer

“Just because commercial
organizations can collect
personal information and run it
through the revealing algorithms
of predictive analytics, doesn’t
mean that they should.”
Jennifer Stoddart
Can we vs. should we?

A useful tool is the Privacy Maturity Model
developed by the American Institute of Certified
Public Accountants (AICPA) and the Canadian
Institute of Chartered Accountants (CICA).
These sections are particularly relevant:
• 1.2.3: Personal information identification and classification
• 1.2.4: Risk assessment
• 1.2.6: Infrastructure and systems management
• 3.2.2: Consent for new purposes and uses
• 4.2.4: Information developed about individuals
• 8.2.1: Information security program
http://bit.ly/1R3VcQZ
Privacy assessment

Strong data governance policies
and procedures are important:
⢠Who owns the data?
⢠Who is responsible for protecting the
data?
⢠How is data collected?
⢠What data is collected?
⢠How is the data retained?
Handling & retaining data

What security & privacy regulations apply to your
data?
What are the compliance provisions of your
agreements with any third parties or service providers?
What are their privacy and security policies?
Develop a solid compliance framework with a risk-based
map for implementation and maintenance.
Compliance

Develop case scenarios where you would use Big
Data.
Identify what data will be used and how.
Identify possible risks.
In this way, you are prepared for when you actually
use the Big Data, rather than being left to react
when something goes wrong.
Data use cases

Tell your customers what personal data you
collect and how you use it.
Provide consistent consent mechanisms
across all products.
Ensure that customers have the means to
withdraw their consent at the individual device
level.
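Device-level withdrawal implies that consent is recorded per customer, per device, and per purpose, rather than as a single account-wide flag. A minimal sketch of such a record follows; the class `ConsentRegistry`, its methods, and the identifiers are hypothetical and not drawn from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    """Tracks consent per (customer, device, purpose); absence means no consent."""

    _granted: set = field(default_factory=set)

    def grant(self, customer, device, purpose):
        self._granted.add((customer, device, purpose))

    def withdraw(self, customer, device, purpose):
        # discard() is a no-op if consent was never granted
        self._granted.discard((customer, device, purpose))

    def allowed(self, customer, device, purpose):
        return (customer, device, purpose) in self._granted


registry = ConsentRegistry()
registry.grant("cust-1", "phone", "location-analytics")
registry.grant("cust-1", "watch", "location-analytics")
registry.withdraw("cust-1", "watch", "location-analytics")

print(registry.allowed("cust-1", "phone", "location-analytics"))
print(registry.allowed("cust-1", "watch", "location-analytics"))
```

Treating a missing record as "no consent" keeps the default safe: data from a device is used only when a grant is explicitly on file.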
Manage consent

Have rigorous controls over who has access to
the data.
Periodically review who has access rights,
and ensure that rights are removed
immediately, as and when required.
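A periodic review can be partly automated by flagging grants that need attention. The sketch below assumes each grant records whether the user is still active and when the grant was last reviewed; the field names, the 90-day threshold, and the function `flag_grants` are illustrative assumptions.

```python
from datetime import date, timedelta


def flag_grants(grants, today, max_age_days=90):
    """Return users whose access needs action.

    A grant is flagged when the user is no longer active (remove access)
    or when it has not been reviewed within max_age_days (review it).
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        g["user"]
        for g in grants
        if not g["active"] or g["last_reviewed"] < cutoff
    )


# Hypothetical access grants on a Big Data store.
grants = [
    {"user": "ana", "active": True, "last_reviewed": date(2016, 4, 1)},
    {"user": "ben", "active": False, "last_reviewed": date(2016, 4, 1)},
    {"user": "cal", "active": True, "last_reviewed": date(2015, 11, 1)},
]
print(flag_grants(grants, today=date(2016, 5, 1)))
```

Here "ben" is flagged because the account is inactive and "cal" because the grant has gone unreviewed for more than 90 days.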
Access management

Remove all Personally Identifiable
Information (PII) from a data set and turn it into non-
identifying data.
Monitor anonymization requirements and analyze
the risks of re-identification.
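These two steps can be sketched together: strip direct identifiers and coarsen the remaining attributes, then measure how small the smallest group of look-alike records is (the k in k-anonymity). This is one common approach among several, shown under stated assumptions; the field names, the choice of identifiers, and the sample customers are hypothetical.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "email"}  # assumed PII fields for this example


def anonymize(record):
    """Drop direct identifiers and coarsen quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["postal"] = out["postal"][:3]  # keep only the first three characters
    out["birth_year"] = out["birth_year"] // 10 * 10  # round down to the decade
    return out


def smallest_group(records, fields=("postal", "birth_year")):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[f] for f in fields) for r in records)
    return min(groups.values())


customers = [
    {"name": "Alice", "email": "a@example.com", "postal": "M5V 2T6", "birth_year": 1981, "spend": 120},
    {"name": "Bob", "email": "b@example.com", "postal": "M5V 3L9", "birth_year": 1987, "spend": 80},
    {"name": "Cara", "email": "c@example.com", "postal": "K1A 0B1", "birth_year": 1975, "spend": 45},
]
anonymized = [anonymize(c) for c in customers]
print(smallest_group(anonymized))
```

A smallest group of size 1 means at least one person is still unique on the remaining attributes, so the data set would need further generalization or suppression before release.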
Anonymization

Maintain your responsibility to your customers
when you share data with third parties.
Include specific Big Data provisions within
contractual agreements.
Monitor third parties for compliance with data-
sharing agreements.
Data sharing

Database of 191 million U.S. voters exposed
on Internet
⢠An independent computer security researcher uncovered a database of
information on 191 million voters that is exposed on the open Internet.
The database includes names, addresses, birth dates, party affiliations,
phone numbers and emails of voters in all 50 U.S. states.
⢠A representative with the U.S. Federal Elections Commission, which
regulates campaign financing, said the agency does not have
jurisdiction over protecting voter records.
⢠Regulations on protecting voter data vary from state to state, with many
states imposing no restrictions. California, for example, requires that
voter data be used for political purposes only and not be available to
persons outside of the United States.
Government breach

Anthem
⢠Health insurer Anthemâs database was hacked
into. The personal information of 78.8 million
people was potentially stolen.
⢠The data breach extended into multiple brands
Anthem, Inc. uses to market its healthcare plans,
including, Anthem Blue Cross, Anthem Blue Cross
and Blue Shield, Blue Cross and Blue Shield of
Georgia, Empire Blue Cross and Blue Shield,
Amerigroup, Caremore, and UniCare.
Corporate breach