Emerging Data Quality Trends for
Governing and Analyzing Big Data
August 1, 2019
Harald Smith
Speaker
Harald Smith
• Director of Product Marketing, Syncsort
• 20+ years in Information Management with a focus on
data quality, integration, and governance
• Co-author of Patterns of Information Management
• Author of two Redbooks on Information Governance
and Data Integration
• Blog author: “Data Democratized”
Agenda
• Ongoing Data Challenges
• Four Emerging Data Quality Trends
• Approaches to addressing Data Quality needs
• Questions
Why is Data Quality
so important?
Data: the fuel of the future
Data is to this century, what oil was to the last one: a driver of
growth and change.
The Economist: Fuel of the future - Data is giving rise to a new economy: 6th May 2017
Flows of data have created new infrastructures, new businesses,
new monopolies, new politics and crucially new economics.
Digital information is unlike any previous resource: it is extracted,
refined, valued, bought and sold in different ways.
It changes the rules for markets and it demands new approaches
from regulators.
Many a battle will be fought over who should own, and benefit
from, data.
5 Emerging Data Quality Trends
Data impacts all areas of the business
(Figure: word cloud of data uses across Sales, Marketing, Finance, Legal, IT, Operations, and Management, e.g. analysis, segmentation, single customer / 360 view, data compliance and regulation, security, forecasting & modelling, risk, campaign management, and overall business strategy)
Data Governance & Quality are top of mind
• 3V’s of Big Data – Volume, variety, and velocity of data are growing
• Ever more Analysis – New tools allow more granular data dissection and segmentation
• Dichotomy in Outcomes – Expectations of data are increasing, yet confidence in data is falling
• Governance Requirements – Broader and deeper compliance & regulation expectations
Central theme: trust & confidence
“Get to Know Me”…
• Design and deliver rich, individualized experiences that build customer loyalty
• Increasingly broad spectrum of data sources involved in, and required for,
effectively personalizing customer experiences and targeted marketing offers
What Types of Data?
• Internal sources – often many/overlapping
• 3rd Party data – geospatial, demographics, firmographics
• Suppression data – keeping customer information updated
• New sources – mobile, social media
What Data Challenges?
• Incorporating and managing the expected exponential increase in digital
demographic data
• Tapping into customer technology histories to build and evolve an understanding
of individual customers
Use Case: 360 View of Customer
Internal Data
▪ Customer Master Data
▪ Point-of-Sale Data
▪ Contact Form Data
▪ Loyalty Program Data
▪ ecommerce Data
▪ Customer Service Data
Suppression Data
▪ Change of Address
▪ Mortality
▪ Do Not Call
Third-Party Data
▪ Age
▪ Occupation
▪ Education
▪ Gender
▪ Income
▪ Geospatial/Location
Social Data
▪ Digital demographics
▪ Sentiment
▪ Opinions
▪ Interests
▪ Social handles
Protect Financial Assets and Ensure Compliance
• Flag credit card fraud in real time
• Identify and report on money laundering
What Types of Data?
• Internal sources – often many/overlapping
• Suppression data – keeping customer information updated
• Mobile data – devices, locations
• New sources – social media, 3rd party data, …
What Data Challenges?
• Fraudulent transaction detection requires:
• Huge volumes of customer profile data
• Recent transaction activity with “last known” values
• Device data with geolocation and time-based tagging
• Data used to refine Machine Learning models (e.g., anomaly detection,
implausible behavior analysis) to review new transactions in real time
Use Case: Anti-Fraud/Anti-Money Laundering
Internal Data
▪ Customer Master Data
▪ Point-of-Sale Data
▪ Contact Form Data
▪ Loyalty Program Data
▪ ecommerce Data
▪ Customer Service Data
Mobile Data
▪ Device
▪ Location
▪ Wearables
▪ Mobile wallets
Suppression Data
▪ Change of Address
▪ Mortality
▪ Do Not Call
Social Data
▪ Digital Demographics
▪ Sentiment
▪ Opinions
▪ Interests
▪ Social handles
Only 35% of senior executives have a high level of trust in the accuracy of their Big Data Analytics
KPMG 2016 Global CEO Outlook
92% of executives are concerned about the negative impact of data and analytics on corporate reputation
KPMG 2017 Global CEO Outlook
80% of AI/ML projects are stalling due to poor data quality
Dimensional Research, 2019
Big Data Needs Data Quality
“Societal trust in business is
arguably at an all-time low
and, in a world increasingly
driven by data and technology,
reputations and brands are
ever harder to protect.”
EY “Trust in Data and Why it Matters”, 2017.
The importance of data quality
in the enterprise:
• Decision making – Trust the data
that drives your business
• Customer centricity – Get a single,
complete and accurate view of your
customer for better sales, marketing
and customer service
• Compliance – Know your data, and
ensure its accuracy to meet industry
and government regulations
• Machine learning & AI – High quality
models require training on high
quality, accurate data
Four Emerging
Data Quality Trends
Four Emerging Data Quality Trends
All the traditional DQ issues remain, but now consider:
1. New DQ considerations for new types of data
2. New application considerations (e.g. Machine learning)
3. Processing at scale/meeting SLAs
4. Data Democratization and resource/knowledge constraints
1. New Data, New Measures
Common Data Quality Problems
All the traditional data quality issues
remain, but now at greater scale and
in more places
• Many data records with different layouts
• Inconsistent data formats (number
formatting, measurements, languages,
postal conventions and dates)
• Lack of standardization of the different
fields
• Names spelled differently, partially entered,
or multiple names provided
• Misspellings and keystroke errors
• Data sourced from third parties does not
contain all the necessary fields or is out-of-
date
• Invalid values: codes, reference data, out-of-
range, future dates
Lack of Standardization
Common Data Quality Measurements
What measures can we take advantage of?
• Completeness – Are the relevant fields populated?
• Integrity – Does the data maintain an internal structural
integrity or a relational integrity across sources?
• Uniqueness – Are keys or records unique?
• Validity – Does the data have the correct values?
• Code and reference values
• Valid ranges
• Valid value combinations
• Consistency – Is the data at consistent levels of aggregation
or does it have consistent valid values over time?
• Timeliness – Did the data arrive in a time period
that makes it useful or usable?
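A minimal sketch of how three of these measures (completeness, uniqueness, validity) might be computed over a handful of records. The record layout, the `VALID_STATES` reference list, the 5-digit ZIP pattern, and the function names are illustrative assumptions, not part of any particular tool.

```python
import re
from datetime import date

# Hypothetical records; the field names and values are illustrative only.
RECORDS = [
    {"id": 1, "name": "Jane Doe", "state": "IL", "zip": "60601", "signup": date(2018, 5, 2)},
    {"id": 2, "name": "", "state": "IN", "zip": "46204", "signup": date(2019, 1, 15)},
    {"id": 2, "name": "John Roe", "state": "XX", "zip": "4620", "signup": date(2030, 1, 1)},
]

VALID_STATES = {"IL", "IN", "KY"}   # assumed code/reference values
ZIP_PATTERN = re.compile(r"\d{5}")  # assumed US 5-digit postal format


def completeness(records, field):
    """Share of records where `field` is populated (not None or empty)."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, key):
    """Distinct key values divided by record count (1.0 means no duplicate keys)."""
    return len({r[key] for r in records}) / len(records)


def validity(records, as_of):
    """Share of records passing code-value, format, and range (no future date) checks."""
    def is_valid(r):
        return (
            r["state"] in VALID_STATES
            and ZIP_PATTERN.fullmatch(r["zip"]) is not None
            and r["signup"] <= as_of
        )
    return sum(1 for r in records if is_valid(r)) / len(records)


print(f"name completeness: {completeness(RECORDS, 'name'):.2f}")        # 0.67
print(f"id uniqueness:     {uniqueness(RECORDS, 'id'):.2f}")            # 0.67
print(f"validity:          {validity(RECORDS, date(2019, 8, 1)):.2f}")  # 0.67
```

Each measure returns a ratio rather than pass/fail, so it can be tracked over time against a threshold the business agrees on.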
Example: Call Center Record
Unique ✓
Integrity ✓
Complete ?
Consistent ✓
Timely ✓
Valid ?
Is Duration = 0 important?
Is 01/01/20xx a defaulted date?
And how will this be linked or
connected with my other data?
The file appears complete, but does
it cover all call centers?
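The call-center questions above can be turned into simple checks. This sketch assumes hypothetical column names (`duration_sec`, `call_date`, `center`) and an assumed master list of expected centers; it flags values that are well-formed but may be defaults, plus centers missing from the file.

```python
from datetime import date

# Hypothetical call-center rows; columns and values are illustrative only.
CALLS = [
    {"call_id": "A1", "center": "Chicago", "duration_sec": 312, "call_date": date(2019, 3, 4)},
    {"call_id": "A2", "center": "Chicago", "duration_sec": 0, "call_date": date(2019, 3, 4)},
    {"call_id": "A3", "center": "Dallas", "duration_sec": 95, "call_date": date(2019, 1, 1)},
]

EXPECTED_CENTERS = {"Chicago", "Dallas", "Denver"}  # assumed master list


def suspicious_rows(rows):
    """Flag values that pass format checks but may be placeholders or defaults."""
    findings = []
    for r in rows:
        if r["duration_sec"] == 0:
            findings.append((r["call_id"], "zero duration"))
        d = r["call_date"]
        if d.month == 1 and d.day == 1:  # 01/01/yyyy is a common default date
            findings.append((r["call_id"], "possible defaulted date"))
    return findings


def missing_centers(rows, expected):
    """Expected call centers with no rows in this file (a coverage gap)."""
    return sorted(expected - {r["center"] for r in rows})


print(suspicious_rows(CALLS))
print(missing_centers(CALLS, EXPECTED_CENTERS))
```

Flags like these are prompts for a business conversation, not automatic corrections: a zero duration may be a dropped call that matters.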
Example: Social Media Feed
Unique?
Integrity?
Complete?
Consistent?
Timely?
Valid?
New Data Quality Problems
New data, new data quality challenges
• 3rd Party and external data with unknown provenance or relevance
• Bias in the data – whether in collection, extraction, or other processing
• Data without standardized structure or formatting
• Continuously streaming data
• Disjointed data (e.g. gaps in receipt)
• Consistency and verification of data sources
• Changes and transformations applied to data (i.e., does the data still represent the
original input?)
“34 percent of bankers in our survey report that their
organization has been the target of adversarial AI at least
once, and 78 percent believe automated systems create new
risks, such as fake data, external data manipulation, and
inherent bias.”
Accenture Banking Technology Vision 2018
What else can we review or measure?
Provenance – Where did the data originate, who gathered it, and what criteria were used to create it?
• E.g. government agency, 3rd party provider, free or paid data
Coverage (Relevance) – How well does the data source meet the defined needs?
• E.g. does it cover the relevant geography? Is it biased (and if so, how)?
Continuity – Data points for all intervals or expected intervals?
• E.g. sensors, weather records, call data records
Triangulation – What Gartner describes as ‘consistency of data across proximate data points’, i.e. consistent measurements from
related points of reference.
• E.g. if temperatures in Chicago and Louisville are 30° and 32°, then the temperature in Indianapolis on the same day is unlikely to be 70°
Transformation from origin – how many layers and/or changes has the data passed through?
• E.g. has the original data source already been merged with two other record sources? And is the result accurate?
Repetition or duplication of data patterns – Data points exactly the same across multiple recording intervals or across multiple
sensors.
• E.g. is there tampering with sensors or call data?
Additional Measures of Data Quality
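Several of these measures reduce to simple checks on time-series data. The sketch below is one possible interpretation: continuity as gaps larger than the expected interval, repetition as runs of identical readings, and triangulation as deviation from the median of nearby readings. The function names, thresholds, and sample values are assumptions chosen to mirror the examples above.

```python
from datetime import datetime, timedelta
from statistics import median


def find_gaps(timestamps, interval):
    """Continuity: (earlier, later) pairs of readings spaced further apart than `interval`."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > interval]


def repeated_runs(values, min_run=3):
    """Repetition: (start_index, value, length) for runs of identical readings >= min_run."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the list or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, values[start], i - start))
            start = i
    return runs


def triangulation_outlier(value, neighbor_values, tolerance):
    """Triangulation: True if `value` is far from the median of nearby readings."""
    return abs(value - median(neighbor_values)) > tolerance


# Hypothetical sensor feed expected every 15 minutes; the 45-75 minute readings are missing.
base = datetime(2019, 8, 1)
readings = [base + timedelta(minutes=m) for m in (0, 15, 30, 75, 90)]

print(find_gaps(readings, timedelta(minutes=15)))
print(repeated_runs([30, 31, 31, 31, 31, 32]))
print(triangulation_outlier(70, [30, 32], tolerance=15))  # Indianapolis vs Chicago/Louisville
```

A median of neighbors is used so that one faulty neighbor does not dominate; the tolerance would need tuning per measure and geography.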
Example: New Data Quality Measures applied
• Provenance – Jane Doe pulled from Twitter based on #Blackberry
• Continuity – All items for #Blackberry in the relevant time interval appear to be included
• Usage – Marketing confirms this data has high value
• Triangulated – Good association with current product & sales data
• Repeated patterns – All tweets appear unique within the date and vs. prior feeds
• Coverage – This needed to include #BB and #Crackberry as well!
• Transformation – No changes or merges of the data were applied
2. Machine Learning & Data Quality
“The magic of machine learning is that you build a
statistical model based on the most valid dataset for
the domain of interest.
If the data is junk, then you’ll be building a junk
model that will not be able to do its job.”
James Kobielus
SiliconANGLE Wikibon
Lead Analyst for Data Science, Deep Learning, App Development
2018
Common Machine Learning Applications
Marketing
• Targeted marketing
• Recommendation engine
• Next best action
• Customer churn prevention
Risk Management
• Anti-money laundering
• Fraud detection
• Cybersecurity
• Know your customer
Data Challenges with Machine Learning
Five Big Challenges of Enabling Machine Learning
1. Scattered and Difficult to Access Datasets
Much of the necessary data is trapped in mainframes or streams in from POS and ATM machines in incompatible formats, making it difficult to gather
and prepare the data for model training.
2. Data Cleansing at Scale
Data quality cleansing and preparation routines have to be reproduced at scale. Most data quality tools are not designed to work on that scale of data.
3. Entity Resolution and Customer Identification
Distinguishing matches across massive datasets that indicate a single specific entity requires sophisticated multi-field matching algorithms and a lot of
compute power. Essentially everything has to be compared to everything.
4. Need for Near Real-Time Current Data
Tracking and detection needs to happen very rapidly. Current transactions need to be constantly added to combined datasets, prepared and presented
to models as close to real-time as possible.
5. Tracking Lineage from the Source
Data changes made to help train models have to be exactly duplicated in production, in order for models to accurately make predictions on new data,
and for required audit trails. Capture of complete lineage, from source to end point is needed.
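To illustrate why entity resolution is compute-heavy, here is a minimal sketch of multi-field fuzzy matching with blocking: records are compared only within a shared ZIP code rather than everything against everything. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the weights, threshold, field names, and sample records are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer records drawn from different source systems.
RECORDS = [
    {"id": "crm-1", "name": "Jonathan Smith", "street": "12 Oak St", "zip": "60601"},
    {"id": "pos-7", "name": "Jon Smith", "street": "12 Oak Street", "zip": "60601"},
    {"id": "web-3", "name": "Maria Lopez", "street": "9 Elm Ave", "zip": "60601"},
    {"id": "crm-9", "name": "Jonathan Smith", "street": "400 Lake Dr", "zip": "46204"},
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(records, threshold=0.75, weights=(0.6, 0.4)):
    """Weighted name + street similarity, compared only within the same ZIP block."""
    blocks = defaultdict(list)
    for r in records:
        blocks[r["zip"]].append(r)  # blocking key (assumed): postal code
    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = (weights[0] * similarity(a["name"], b["name"])
                     + weights[1] * similarity(a["street"], b["street"]))
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 2)))
    return matches


print(candidate_matches(RECORDS))
```

Blocking cuts the number of comparisons but can miss true matches whose blocking key is wrong or missing (e.g. a mistyped ZIP), which is one reason matchers often run several passes with different keys.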
Data Quality Challenges with Machine Learning
Incorrect, Incomplete, Mis-Formatted, and Sparse “Dirty Data” – Mistakes
and errors are almost never the patterns you’re looking for in a data set.
Sparse data generates other issues. Correcting and standardizing will tend
to boost the signal, but must account for bias.
Missing context – Many data sources lack context around location or
population segments. Unless enriched with other data sets, (e.g.
geospatial, demographics, or firmographics data), some ML algorithms
will not be usable.
Multiple copies – If your data comes from many sources, as it often does,
it may contain multiple records of information about the same person,
company, product or other entity. Removing duplicates and enhancing the
overall depth and accuracy of knowledge about a single entity can make a
huge difference.
Spurious correlations – Just as missing context may hinder some ML
algorithms, inclusion of already correlated data (e.g. city and postal code)
may result in overfitting of ML algorithms.
Correcting data problems vastly increases a data set’s usefulness for machine learning.
However, traditional data quality software is
designed to work on smaller data sets.
And data analysts may not be aware of
specific data quality issues that must be
addressed to support machine learning.
Traditional data quality processes are an
effective method to remove defects.
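One way to spot already-correlated inputs such as city and postal code is to test whether one field functionally determines another. This is a minimal sketch under assumed column names and sample rows; a score of 1.0 means every value of the first field maps to exactly one value of the second.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical training rows; column names are illustrative only.
ROWS = [
    {"zip": "60601", "city": "Chicago", "spend": 120},
    {"zip": "60601", "city": "Chicago", "spend": 80},
    {"zip": "46204", "city": "Indianapolis", "spend": 80},
    {"zip": "40202", "city": "Louisville", "spend": 60},
]


def determines(rows, a, b):
    """Fraction of distinct `a` values that map to exactly one `b` value."""
    mapping = defaultdict(set)
    for r in rows:
        mapping[r[a]].add(r[b])
    if not mapping:
        return 0.0
    return sum(1 for vals in mapping.values() if len(vals) == 1) / len(mapping)


def redundant_pairs(rows, features, threshold=1.0):
    """Ordered (a, b) feature pairs where `a` determines `b` at or above threshold."""
    return [(a, b) for a, b in permutations(features, 2)
            if determines(rows, a, b) >= threshold]


print(redundant_pairs(ROWS, ["zip", "city", "spend"]))
```

On small samples, near-unique fields (such as record IDs) trivially determine everything, so results should be read alongside each field's cardinality.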
Example: Missing segments of populations
Event: Hurricane Sandy
20 million tweets
• The majority of tweets came from Manhattan, not from hard-hit
areas such as Seaside Heights and Midland Beach, because of
power outages and dying cell phone batteries
• Despite the millions of Spanish-speakers
affected, very few Spanish-language tweets were
collected
• Assess % across and against all likely
locations
• Seek out disconfirming information
Data: Boston Potholes
Street Bump App
• Draws on accelerometer and GPS data to help
passively detect potholes
• Lower-income groups in the US are less likely to
have smartphones, particularly older residents –
penetration as low as 16%
• Result is underreporting of road problems in
more elderly communities
• Assess % across all likely locations
• Add other sources
• Utilize demographics for evaluations
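The “assess % across all likely locations” step can be sketched as comparing each location's share of observations with its share of the population. The location names, counts, and the 0.5 tolerance below are hypothetical; a real evaluation would draw on demographic reference data.

```python
def underrepresented(observed, population, tolerance=0.5):
    """Locations whose share of observations is below `tolerance` x their population share."""
    total_obs = sum(observed.get(loc, 0) for loc in population)
    if total_obs == 0:
        return sorted(population)  # no observations at all: every location is missing
    total_pop = sum(population.values())
    flagged = []
    for loc, pop in population.items():
        obs_share = observed.get(loc, 0) / total_obs
        pop_share = pop / total_pop
        if obs_share < tolerance * pop_share:
            flagged.append(loc)
    return flagged


# Hypothetical counts of reports (e.g. tweets or pothole reports) per area.
observed = {"Area A": 900, "Area B": 80, "Area C": 20}
population = {"Area A": 500_000, "Area B": 300_000, "Area C": 200_000}

print(underrepresented(observed, population))
```

A flagged area does not prove bias by itself; it marks where to seek out disconfirming information or add other sources.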
Example: Noise, or Inserted content
“Bots are just a tool for making the
numbers look how you want them
to look.”
Sam Woolley
Researcher, Oxford University’s
Project on Computational
Propaganda
Wired: Nov 8, 2016
“The Political Twitter Bots Will Rage This Election Day”
Event: Election
Bot tweets
• ~400,000 bots tweeting on the election
• ~20% of all election-related tweets came from an army of influential
bots
• 55-80% of Twitter activity—the likes, follows, and retweets —are
from bots
• It had been easier to identify earlier bots, but now it’s incredibly
difficult for a human to make a determination
• Evaluate patterns
• Is there any real sentiment here?
• How much repetitive content is there?
• How much “influence” comes from a single or a
few sources (negative or positive)?
• Will it skew the analysis?
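Two of the questions above (how much repetitive content, how much comes from a few sources) can be approximated with simple counts. This sketch assumes posts arrive as (account, text) pairs; the normalization (lower-casing and collapsing whitespace) and the sample posts are illustrative assumptions and would not catch paraphrased bot content.

```python
from collections import Counter

# Hypothetical (account, text) pairs.
POSTS = [
    ("acct_1", "Vote for X!"),
    ("acct_1", "vote for x!"),
    ("acct_1", "Vote  for X!"),
    ("acct_2", "Long lines at my polling place today"),
    ("acct_3", "Vote for X!"),
]


def duplicate_share(texts):
    """Fraction of posts whose normalized text occurs more than once."""
    norm = [" ".join(t.lower().split()) for t in texts]
    counts = Counter(norm)
    return sum(c for c in counts.values() if c > 1) / len(norm)


def top_source_share(authors, n=1):
    """Fraction of posts produced by the `n` most active accounts."""
    counts = Counter(authors)
    return sum(c for _, c in counts.most_common(n)) / len(authors)


authors = [a for a, _ in POSTS]
texts = [t for _, t in POSTS]
print(duplicate_share(texts))     # 4 of 5 posts share the same normalized text
print(top_source_share(authors))  # 3 of 5 posts come from one account
```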
Example: Simple bias
“The “black sheep problem” is that if you
were to try to guess what color most sheep
were by looking [at] language data, it would
be very difficult for you to conclude that
they weren't almost all black. In English,
“black sheep” outnumbers “white sheep”
about 25:1 (many "black sheeps” are movie
references); in French it's 3:1; in German it's
12:1. Some languages get it right; in Korean
it's 1:1.5 in favor of white sheep…”
Hal Daumé
Associate Professor, University of Maryland
Blog: June 24, 2016
“Language bias and black sheep”
http://nlpers.blogspot.com/2016/06/language-bias-and-black-sheep.html
Data: Google Word2Vec data set
Word2vec
• Converts words into a vector space for analysis
• “Numerous researchers have begun to use the data to better understand
everything from machine translation to intelligent Web searching.”
• Embeddings for 3 million words and phrases, trained on Google News text
• Researchers from Boston University and Microsoft have found it is
“blatantly sexist”
• Impacts the ability to create personalized services
• Evaluate % of words & associations
• How do I interpret a sentiment?
• Does this data set contain hidden and
unexpressed bias?
• Will I miss opportunities because of hidden
assumptions?
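Evaluating the percentage of words and associations can start with plain phrase counts, which show how far text frequency can drift from real-world frequency. The corpus and the `phrase_ratio` function below are a toy illustration, not the method used in the cited research.

```python
import re

# Hypothetical mini-corpus; real checks would run over the training text itself.
CORPUS = ("The black sheep of the family. A white sheep grazed. "
          "Another black sheep story. Black sheep again.")


def phrase_ratio(corpus, phrase_a, phrase_b):
    """Occurrences of phrase_a divided by occurrences of phrase_b (whole words, case-insensitive)."""
    text = corpus.lower()

    def count(phrase):
        return len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", text))

    b = count(phrase_b)
    return count(phrase_a) / b if b else float("inf")


print(phrase_ratio(CORPUS, "black sheep", "white sheep"))  # 3.0
```

A ratio this far from the real-world ratio is a sign that text-derived associations encode reporting habits, not facts.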
3. Data Quality at Scale
Challenges To Ensuring
Data Quality
Many sources of data (70%) and volume of data (48%)
are among the top 3 challenges companies face when
ensuring high-quality data.
Applying governance processes to manage and measure
data quality ranks second, at 50%.
* Syncsort, 2019 Enterprise Data Quality survey
What are the greatest challenges you face when ensuring high data quality?
• 70% – Many sources of data
• 50% – Applying governance processes to manage and measure data…
• 48% – Volume of data
• 47% – Inconsistent formats of data
• 46% – Inconsistent definitions of data
• 43% – Missing information
• 32% – Connecting policies and rules to data
• 27% – Misfielded data
• 27% – Lack of skills/staff
• 25% – Lack of tools (or inadequate tools)
• 15% – Not seen as an organizational priority
Processing at Scale
New Data Quality considerations
• Handling data volumes and distributed data
• Profiling data – assessing high volumes and streaming data
• Standardizing and enriching data content
• Matching entities – not just master data – e.g. transactions for fraud detection
• Meeting Service Level Agreements (SLAs)
• Running consistently on new and regularly changing platforms (Hadoop,
Spark, Cloud)
Big Data at scale distributes data across many nodes –
not necessarily with other relevant data!
• Data Quality functions must be performed in a consistent
manner, no matter where actual processing takes place, how
the data is segmented, and what the data volume is
• Cleansing, standardization, and data validation will generally scale
linearly
• Data Enrichment: Reference data, lookups must be readily
accessible by any process wherever executed
Handling distributed data volumes
Source: HP Analyst Briefing
• But particular implications for profiling, joining, sorting, and
matching data
• Profiling: Identification of outliers necessitates full volume views
and need to aggregate statistics and frequencies of data
distributed across cluster
• Joins & sorts: Efficient shuffling of data stored across cluster is
critical
• Entity Resolution: Distinguishing matches that indicate a single
specific entity across so much data requires multiple passes with
sophisticated multi-field matching algorithms – with results that
are understandable by business users in order to be meaningful
Handling distributed data volumes
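Profiling distributed data works when each node computes a partial profile that can be merged exactly. A minimal plain-Python sketch of that pattern follows; `ColumnProfile` and the sample partitions are hypothetical, and a cluster engine such as Spark would run the per-partition step in parallel rather than in a loop.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce
from typing import Optional


@dataclass
class ColumnProfile:
    """Partial profile of one column: counts, nulls, min/max, value frequencies."""
    count: int = 0
    nulls: int = 0
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    freq: Counter = field(default_factory=Counter)

    def add(self, value):
        self.count += 1
        if value is None:
            self.nulls += 1
            return
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        self.freq[value] += 1

    def merge(self, other):
        """Combine two partial profiles; equals profiling all rows at once."""
        mins = [m for m in (self.minimum, other.minimum) if m is not None]
        maxs = [m for m in (self.maximum, other.maximum) if m is not None]
        return ColumnProfile(
            count=self.count + other.count,
            nulls=self.nulls + other.nulls,
            minimum=min(mins) if mins else None,
            maximum=max(maxs) if maxs else None,
            freq=self.freq + other.freq,
        )


def profile_partition(values):
    """Runs locally on one node's slice of the data."""
    p = ColumnProfile()
    for v in values:
        p.add(v)
    return p


# Hypothetical column values spread across three nodes.
partitions = [[5, 7, None], [7, 120], [6, 7, 5]]
total = reduce(ColumnProfile.merge, (profile_partition(p) for p in partitions))
print(total.count, total.nulls, total.minimum, total.maximum, total.freq.most_common(1))
```

Exact frequency tables grow with column cardinality; at very large scale, approximate sketches (e.g. HyperLogLog for distinct counts) are a common alternative.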
Anti-Money
Laundering on
Hadoop at
Global Bank
• Must provide cluster-native
data verification,
enrichment, and demanding
multi-field fuzzy matching for
entity resolution to Golden
Record
• Massive data volumes
• Scattered data – Mainframe,
RDBMS, Cloud, …
• Must be secure – Kerberos,
LDAP
• Must have lineage – data
origin to end point
• Must archive unaltered
mainframe data
Full Anti-Money Laundering
regulatory compliance with
financial crimes data lake –
high performance
results at massive scale.
• Full end-to-end data lineage
supplied to Apache Atlas and ASG
Data Intelligence
• Cluster-native data verification,
enrichment, and demanding
multi-field entity resolution on
Spark
• Unmodified mainframe “Golden
Records” stored on Hadoop
Bank must monitor transactions to
detect Money Laundering for FCA
compliance.
Leverage Machine learning at scale
to detect patterns, but …
Requires large amounts of current,
clean data.
4. Data Literacy / Democratization
Data Democratization
Data Quality is a key component to user empowerment
• Data Literacy - critical to understand:
• Business context and language
• Data (including data structures and data types)
• Data access (how and where to find)
• Data usage (how will the data be used by the business)
• Basic Statistics
• Data Quality dimensions
• Data Quality techniques and tools
• Resource constraints – in both Data Quality and technologies
• What questions to ask?
• Where to find answers?
Approaches to Addressing
Emerging Data Quality Trends
Approaches
Data Literacy / Data Governance
• Communicating Best Practices in Data Quality for everyone
“Universal” Data Quality Best Practices
• Establish Scope: ask core questions
• Identifying data requirements
• Address bias
• Understand context
• Address and resolve data quality issues
• Apply data governance processes
Solving “Big Data” Data Quality Challenges
• Handle scale
• Ensure consistent data quality application
across platforms
Culture of Data Literacy
• “Democratization of Data” requires cultural support
• Empowered to ask questions about the data
• Trained to understand and use data
• Trained to understand approaching and evaluating data quality
• Traditional data, new data, machine learning requirements, …
• Understand the business context of the data
Program of Data Governance
• Provide the processes and practices necessary for success
• Measure, monitor, and improve
• Continuous iteration and development
Center of Excellence/Knowledge Base
• Where do you go to find answers?
• Who can help show you how?
Communicate!
Data Literacy: challenges & best practices
• Lack of Common Terminology
• Organizational Barriers & Silos
• Isolated or Unknown Work
• Lack of Engagement
Establish a Common Language
• Define terminology – a ‘stake in the ground’
• Map information
• Support with policies/standards
Gain Broader Buy In
• Bring stakeholders together
• Build the structure, culture,
ownership, steering groups,
stewardship over time
Enrich Information
• Discover what you don’t know
• Resolve differences
• Enhance/annotate to increase insight
Share Insights Regularly
• Produce and share tangible outcomes
• Highlight ‘wins’
• Demonstrate efficiencies & savings
Copyright © Syncsort 2019
“If you don’t know what you want to
get out of the data, how can you
know what data you need – and
what insight you’re looking for?”
Wolf Ruzicka
Chairman of the Board at EastBanc
Technologies
Blog post: June 1, 2017
“Grow A Data Tree Out Of The “Big Data” Swamp”
Establish Scope
• Understand the business objective and problem
• Asking the “right questions” about your data (not just “what”
and “how”)
• Empowering users (“Who”) to gain new clarity into the core
problem (“Why”)
• “High-quality data” definition will vary by business problem
Identify Requirements & Processes
• Do you have all the data required?
• Do you understand the characteristics and context of the data?
• How will data be matched, consolidated, or connected?
• What’s needed to facilitate the matching, consolidation, or
connection required?
• Have you evaluated the sources?
• What’s the Fitness for your Purpose?
Universal Data Quality best practices
Understand Context
• What are the Critical Data Elements?
• What qualities do we need to address, or leave alone?
• When, and where, do we need to transform or enrich the data
content?
• How are we connecting, relating, or combining data?
Develop, Test, and Deploy Corrective Measures
• Consistent application of standardization, transformation,
enrichment, and entity resolution
• Common templates, rules, metrics, and processes that can be
leveraged
• Deploy into batch, real-time, or embedded services
Apply Data Governance
• Deploy and implement metrics and measures for ongoing
assessment and evaluation
Universal Data Quality best practices
“Never lead with a data set;
lead with a question.”
Anthony Scriffignano
Chief Data Scientist, Dun & Bradstreet
Forbes Insights, May 31, 2017
“The Data Differentiator”
Quantify: challenges & best practices
• Hidden Activities
• Money, Time and Resource
Waste
• Lack of Transparency and Trust
• Disconnect Between Process
and Measures
Identify Baseline Measures
• Keep a focus on lean and agile
• Define value accurately for the business
Link to Business Performance
• Create and refine streams of value
• Transform culture through action
and empowerment
Monitor, Report and Remediate Issues
• Continuously review
• Ensure issues are visible and understood
• Understand root causes
• Address/resolve issues
Quantify Impact of Changes
• Demonstrate through clearly understood measures
• Establish value continuously
• Finish early, finish often
Leverage tools built for Big Data
• Focus on the data quality challenges, not the Big Data ones
• Connect to and process hundreds of millions of records of data
• Standardize, enhance, and match international data sets with postal and
country-code validation
• Integrate, enrich, and match new and legacy customer data from multiple
disparate sources
• Deploy data quality workflows as native, parallel MapReduce or Spark
processes for optimal efficiency on premises or in the Cloud
• Increase processing efficiency by expanding cluster, not rebuilding
processes
• Support failover through fault-tolerant designs; during a node failure,
processing is redirected to another node
Simplify: Design Once, Deploy Anywhere
Intelligent Execution - Insulate your organization from underlying complexities of Big Data
Get excellent performance every time
without tuning, load balancing, etc.
Avoid re-design, re-compile, re-work
• Future-proof job designs for emerging compute
frameworks
• Move from dev to test to production
• Move from on-premises to Cloud
• Move from one Cloud to another
Use existing Data Quality skills
• Focus on data quality problems, not technical ones
Design Once in visual GUI
Deploy Anywhere!
• On-Premises, Cloud
• MapReduce, Spark, Future Platforms
• Windows, Linux, Unix
• Batch, Streaming
• Single Node, Cluster
Data Quality remains Data Quality, even at scale
“Data and analytics leaders need to understand the
business priorities and challenges of their organization.
Only then will they be in the right position to create
compelling business cases that connect data quality
improvement with key business priorities.”
Ted Friedman
VP Distinguished Analyst, Gartner
Smarter with Gartner at Gartner.com: June 12, 2018
“How to Create a Business Case for Data Quality Improvement”
“Never lead with a data set;
lead with a question.”
Anthony Scriffignano
Chief Data Scientist, Dun & Bradstreet
Forbes Insights, May 31, 2017
“The Data Differentiator”
Q&A
harald.smith@syncsort.com

 
The Five Pillars of Data Governance 2.0 Success
The Five Pillars of Data Governance 2.0 SuccessThe Five Pillars of Data Governance 2.0 Success
The Five Pillars of Data Governance 2.0 Success
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata Management
 
RWDG Slides: Non-Invasive Metadata Governance
RWDG Slides: Non-Invasive Metadata GovernanceRWDG Slides: Non-Invasive Metadata Governance
RWDG Slides: Non-Invasive Metadata Governance
 
Lead Your Data Revolution - How to Build a Foundation of Trust and Data Gover...
Lead Your Data Revolution - How to Build a Foundation of Trust and Data Gover...Lead Your Data Revolution - How to Build a Foundation of Trust and Data Gover...
Lead Your Data Revolution - How to Build a Foundation of Trust and Data Gover...
 
DAS Slides: Data Governance and Data Architecture – Alignment and Synergies
DAS Slides: Data Governance and Data Architecture – Alignment and SynergiesDAS Slides: Data Governance and Data Architecture – Alignment and Synergies
DAS Slides: Data Governance and Data Architecture – Alignment and Synergies
 
Webinar: Maximizing Your Potential with Data Leadership
Webinar: Maximizing Your Potential with Data LeadershipWebinar: Maximizing Your Potential with Data Leadership
Webinar: Maximizing Your Potential with Data Leadership
 
Data Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great AccountabilityData Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great Accountability
 
Seiner dataversity-rwdg2017-05-operating modelofdatagovernanceroles-20170518f...
Seiner dataversity-rwdg2017-05-operating modelofdatagovernanceroles-20170518f...Seiner dataversity-rwdg2017-05-operating modelofdatagovernanceroles-20170518f...
Seiner dataversity-rwdg2017-05-operating modelofdatagovernanceroles-20170518f...
 
Real-World Data Governance Webinar: Big Data Governance - What Is It and Why ...
Real-World Data Governance Webinar: Big Data Governance - What Is It and Why ...Real-World Data Governance Webinar: Big Data Governance - What Is It and Why ...
Real-World Data Governance Webinar: Big Data Governance - What Is It and Why ...
 
Successful Data Governance Models and Frameworks
Successful Data Governance Models and FrameworksSuccessful Data Governance Models and Frameworks
Successful Data Governance Models and Frameworks
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 

Ähnlich wie Emerging Data Quality Trends for Governing and Analyzing Big Data

20140826 I&T Webinar_The Proliferation of Data - Finding Meaning Amidst the N...
20140826 I&T Webinar_The Proliferation of Data - Finding Meaning Amidst the N...20140826 I&T Webinar_The Proliferation of Data - Finding Meaning Amidst the N...
20140826 I&T Webinar_The Proliferation of Data - Finding Meaning Amidst the N...Steven Callahan
 
Applying Data Quality Best Practices at Big Data Scale
Applying Data Quality Best Practices at Big Data ScaleApplying Data Quality Best Practices at Big Data Scale
Applying Data Quality Best Practices at Big Data ScalePrecisely
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackPrecisely
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
Network Conference LMS Big Data Final 1.24.14
Network Conference LMS Big Data Final 1.24.14Network Conference LMS Big Data Final 1.24.14
Network Conference LMS Big Data Final 1.24.14LMSmith361
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data assetBala Iyer
 
Accelerating Personalization to Cut Through Digital Noise
Accelerating Personalization to Cut Through Digital NoiseAccelerating Personalization to Cut Through Digital Noise
Accelerating Personalization to Cut Through Digital NoisePrecisely
 
How to get started in extracting business value from big data 1 of 2 oct 2013
How to get started in extracting business value from big data 1 of 2 oct 2013How to get started in extracting business value from big data 1 of 2 oct 2013
How to get started in extracting business value from big data 1 of 2 oct 2013Jaime Nistal
 
Fate of the Chief Data Officer
Fate of the Chief Data OfficerFate of the Chief Data Officer
Fate of the Chief Data OfficerTamarah Usher
 
Big data initiative justification and prioritization framework
Big data initiative justification and prioritization frameworkBig data initiative justification and prioritization framework
Big data initiative justification and prioritization frameworkNeerajsabhnani
 
BBDO Connect Big Data
BBDO Connect Big DataBBDO Connect Big Data
BBDO Connect Big DataBBDO Belgium
 
Data Integrity Trends
Data Integrity TrendsData Integrity Trends
Data Integrity TrendsPrecisely
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesDATAVERSITY
 
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataPrecisely
 
Big data
Big dataBig data
Big dataRiya
 
Information Governance: Reducing Costs and Increasing Customer Satisfaction
Information Governance: Reducing Costs and Increasing Customer SatisfactionInformation Governance: Reducing Costs and Increasing Customer Satisfaction
Information Governance: Reducing Costs and Increasing Customer SatisfactionCapgemini
 
Data Governance in a big data era
Data Governance in a big data eraData Governance in a big data era
Data Governance in a big data eraPieter De Leenheer
 
Marketsoft and marketing cube data quality to cc-v3
Marketsoft and marketing cube   data quality to cc-v3Marketsoft and marketing cube   data quality to cc-v3
Marketsoft and marketing cube data quality to cc-v3Marketsoft
 

Ähnlich wie Emerging Data Quality Trends for Governing and Analyzing Big Data (20)

20140826 I&T Webinar_The Proliferation of Data - Finding Meaning Amidst the N...
20140826 I&T Webinar_The Proliferation of Data - Finding Meaning Amidst the N...20140826 I&T Webinar_The Proliferation of Data - Finding Meaning Amidst the N...
20140826 I&T Webinar_The Proliferation of Data - Finding Meaning Amidst the N...
 
Applying Data Quality Best Practices at Big Data Scale
Applying Data Quality Best Practices at Big Data ScaleApplying Data Quality Best Practices at Big Data Scale
Applying Data Quality Best Practices at Big Data Scale
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Network Conference LMS Big Data Final 1.24.14
Network Conference LMS Big Data Final 1.24.14Network Conference LMS Big Data Final 1.24.14
Network Conference LMS Big Data Final 1.24.14
 
uae views on big data
  uae views on  big data  uae views on  big data
uae views on big data
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
 
Accelerating Personalization to Cut Through Digital Noise
Accelerating Personalization to Cut Through Digital NoiseAccelerating Personalization to Cut Through Digital Noise
Accelerating Personalization to Cut Through Digital Noise
 
How to get started in extracting business value from big data 1 of 2 oct 2013
How to get started in extracting business value from big data 1 of 2 oct 2013How to get started in extracting business value from big data 1 of 2 oct 2013
How to get started in extracting business value from big data 1 of 2 oct 2013
 
Fate of the Chief Data Officer
Fate of the Chief Data OfficerFate of the Chief Data Officer
Fate of the Chief Data Officer
 
Big data initiative justification and prioritization framework
Big data initiative justification and prioritization frameworkBig data initiative justification and prioritization framework
Big data initiative justification and prioritization framework
 
BBDO Connect Big Data
BBDO Connect Big DataBBDO Connect Big Data
BBDO Connect Big Data
 
Data Integrity Trends
Data Integrity TrendsData Integrity Trends
Data Integrity Trends
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your DataFoundational Strategies for Trust in Big Data Part 2: Understanding Your Data
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data
 
Big data
Big dataBig data
Big data
 
Information Governance: Reducing Costs and Increasing Customer Satisfaction
Information Governance: Reducing Costs and Increasing Customer SatisfactionInformation Governance: Reducing Costs and Increasing Customer Satisfaction
Information Governance: Reducing Costs and Increasing Customer Satisfaction
 
Data Governance in a big data era
Data Governance in a big data eraData Governance in a big data era
Data Governance in a big data era
 
Marketsoft and marketing cube data quality to cc-v3
Marketsoft and marketing cube   data quality to cc-v3Marketsoft and marketing cube   data quality to cc-v3
Marketsoft and marketing cube data quality to cc-v3
 
National Conference - Big Data - 31 Jan 2015
National Conference - Big Data - 31 Jan 2015National Conference - Big Data - 31 Jan 2015
National Conference - Big Data - 31 Jan 2015
 

Mehr von DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 

Mehr von DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 

KĂźrzlich hochgeladen

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 

KĂźrzlich hochgeladen (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 

Emerging Data Quality Trends for Governing and Analyzing Big Data

  • 1. Emerging Data Quality Trends for Governing and Analyzing Big Data August 1, 2019 Harald Smith
  • 2. Speaker Harald Smith • Director of Product Marketing, Syncsort • 20+ years in Information Management with a focus on data quality, integration, and governance • Co-author of Patterns of Information Management • Author of two Redbooks on Information Governance and Data Integration • Blog author: “Data Democratized”
  • 3. Agenda • Ongoing Data Challenges • Four Emerging Data Quality Trends • Approaches to addressing Data Quality needs • Questions
  • 4. Why is Data Quality so important?
  • 5. Data: the fuel of the future Data is to this century, what oil was to the last one: a driver of growth and change. The Economist: Fuel of the future - Data is giving rise to a new economy: 6th May 2017 Flows of data have created new infrastructures, new businesses, new monopolies, new politics and crucially new economics. Digital information is unlike any previous resource: it is extracted, refined, valued, bought and sold in different ways. It changes the rules for markets and it demands new approaches from regulators. Many a battle will be fought over who should own, and benefit from, data. 5 Emerging Data Quality Trends
  • 6. Data impacts all areas of the business (Sales, Marketing, Finance, Legal, IT, Operations, Management): Analysis, Segmentation, Data compliance, Access, Scheduling, All reports!, Competitor analysis, Sales reports, Single Customer / 360 View, Data regulation, Security, Workloads, Aggregations, HR / recruitment, Dashboards, CRM, Content, Governance, Capacity Management, Performance planning, Forecasting & modelling, Overall business strategy!, Performance metrics, Campaign management, Risk, Optimization & SLA’s, Route planning, Cash flow, Territory management, ROI, Disaster Recovery, Inventory, Contingency planning, UX
  • 7. Data Governance & Quality are top of mind • 3V’s of Big Data: Volume, variety, and velocity of data are growing • Ever more Analysis: New tools allow more granular data dissection and segmentation • Dichotomy in Outcomes: Expectations of data are increasing, yet confidence in data is falling • Governance Requirements: Broader and deeper compliance & regulation expectations • Trust & confidence
  • 8. Use Case: 360 View of Customer. “Get to Know Me”… • Design and deliver rich, individualized experiences that build customer loyalty • Increasingly broad spectrum of data sources involved in, and required for, effectively personalizing customer experiences and targeted marketing offers What Types of Data? • Internal sources – often many/overlapping • 3rd Party data – geospatial, demographics, firmographics • Suppression data – keeping customer information updated • New sources – mobile, social media What Data Challenges? • Incorporating and managing the expected exponential increase in digital demographic data • Tapping into customer technology histories to build and evolve an understanding of individual customers Internal Data ▪ Customer Master Data ▪ Point-of-Sale Data ▪ Contact Form Data ▪ Loyalty Program Data ▪ ecommerce Data ▪ Customer Service Data Suppression Data ▪ Change of Address ▪ Mortality ▪ Do Not Call Third-Party Data ▪ Age ▪ Occupation ▪ Education ▪ Gender ▪ Income ▪ Geospatial/Location Social Data ▪ Digital demographics ▪ Sentiment ▪ Opinions ▪ Interests ▪ Social handles
  • 9. Use Case: Anti-Fraud/Anti-Money Laundering. Protect Financial Assets and Ensure Compliance • Flag credit card fraud in real time • Identify and report on money laundering What Types of Data? • Internal sources – often many/overlapping • Suppression data – keeping customer information updated • Mobile data – devices, locations • New sources – social media, 3rd party data, … What Data Challenges? • Fraudulent transaction detection requires: • Huge volumes of customer profile data • Recent transaction activity with “last known” values • Device data with geolocation and time-based tagging • Data used to refine Machine Learning models (e.g., anomaly detection, implausible behavior analysis) to review new transactions in real time Internal Data ▪ Customer Master Data ▪ Point-of-Sale Data ▪ Contact Form Data ▪ Loyalty Program Data ▪ ecommerce Data ▪ Customer Service Data Mobile Data ▪ Device ▪ Location ▪ Wearables ▪ Mobile wallets Suppression Data ▪ Change of Address ▪ Mortality ▪ Do Not Call Social Data ▪ Digital Demographics ▪ Sentiment ▪ Opinions ▪ Interests ▪ Social handles
  • 10. Big Data Needs Data Quality • Only 35% of senior executives have a high level of trust in the accuracy of their Big Data Analytics (KPMG 2016 Global CEO Outlook) • 92% of executives are concerned about the negative impact of data and analytics on corporate reputation (KPMG 2017 Global CEO Outlook) • 80% of AI/ML projects are stalling due to poor data quality (Dimensional Research, 2019) “Societal trust in business is arguably at an all-time low and, in a world increasingly driven by data and technology, reputations and brands are ever harder to protect.” EY “Trust in Data and Why it Matters”, 2017. The importance of data quality in the enterprise: • Decision making – Trust the data that drives your business • Customer centricity – Get a single, complete and accurate view of your customer for better sales, marketing and customer service • Compliance – Know your data, and ensure its accuracy to meet industry and government regulations • Machine learning & AI – High-quality models require training on high-quality, accurate data
  • 12. Four Emerging Data Quality Trends All the traditional DQ issues remain, but now consider: 1. New DQ considerations for new types of data 2. New application considerations (e.g. Machine learning) 3. Processing at scale/meeting SLAs 4. Data Democratization and resource/knowledge constraints
  • 13. 1. New Data, New Measures
  • 14. Common Data Quality Problems All the traditional data quality issues remain, but now at greater scale and in more places • Many data records with different layouts • Inconsistent data formats (number formatting, measurements, languages, postal conventions and dates) • Lack of standardization of the different fields • Names spelled differently, partially entered, or multiple names provided • Misspellings and keystroke errors • Data sourced from third parties does not contain all the necessary fields or is out-of-date • Invalid values: codes, reference data, out-of-range, future dates Lack of Standardization
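As an illustration of the inconsistent date formats listed above, the following Python sketch normalizes dates written in several layouts to a single date value. It is a minimal sketch, not part of the original talk: the layouts in CANDIDATE_FORMATS are a hypothetical assumption that would have to come from profiling each source, and the "%b" code relies on English month abbreviations (Python's default C locale).

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical layouts observed across source systems (an assumption; extend per source).
# Order matters when layouts overlap: if both "%m/%d/%Y" and "%d/%m/%Y" were listed,
# "03/04/2019" would be read by whichever pattern comes first.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%d %b %Y", "%Y%m%d")


def standardize_date(raw: str) -> Optional[date]:
    """Parse a date written in any known layout; return None if nothing matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this layout does not fit; try the next one
    return None  # unparseable: route to review instead of guessing
```

For example, "2019-08-01", "08/01/2019", "01.08.2019", and "20190801" all standardize to August 1, 2019. Returning None rather than a best guess keeps unparseable values visible for review instead of silently turning them into wrong dates.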
  • 15. Common Data Quality Measurements What measures can we take advantage of? • Completeness – Are the relevant fields populated? • Integrity – Does the data maintain an internal structural integrity or a relational integrity across sources? • Uniqueness – Are keys or records unique? • Validity – Does the data have the correct values? • Code and reference values • Valid ranges • Valid value combinations • Consistency – Is the data at consistent levels of aggregation or does it have consistent valid values over time? • Timeliness – Did the data arrive in a time period that makes it useful or usable?
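Three of these measures (completeness, uniqueness, validity) reduce to simple ratios over a set of records. This sketch assumes records arrive as Python dicts and that the caller supplies the key field, the required fields, and the allowed reference values; the function and parameter names are hypothetical. Integrity, consistency, and timeliness need cross-source or time-based inputs and are left out.

```python
from collections import Counter


def measure_quality(records, key, required, allowed):
    """Return completeness, uniqueness and validity as ratios between 0 and 1.

    records  -- list of dicts, one per row
    key      -- field expected to be unique
    required -- fields that must be populated
    allowed  -- {field: set of valid code/reference values}
    """
    if not records:
        raise ValueError("no records to measure")
    n = len(records)

    # Completeness: rows where every required field is non-empty.
    complete = sum(all(r.get(f) not in (None, "") for f in required) for r in records)

    # Uniqueness: rows whose key value occurs exactly once.
    key_counts = Counter(r.get(key) for r in records)
    unique = sum(1 for r in records if key_counts[r.get(key)] == 1)

    # Validity: rows whose coded fields all hold an allowed reference value.
    valid = sum(all(r.get(f) in ok for f, ok in allowed.items()) for r in records)

    return {"completeness": complete / n, "uniqueness": unique / n, "validity": valid / n}
```

For example, four rows in which one has an empty name, two share the key value 2, and one carries the state code "XX" score 0.75 completeness, 0.5 uniqueness, and 0.75 validity.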
  • 16. Example: Call Center Record Unique ✓ Integrity ✓ Complete ? Consistent ✓ Timely ✓ Valid ? Is Duration = 0 important? Is 01/01/20xx a defaulted date? And how will this be linked or connected with my other data? The file appears complete, but does it cover all call centers?
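The call center questions can be turned into explicit checks. In this sketch the field names (call_id, center, duration_sec, call_date) and the set of expected centers are hypothetical assumptions; January 1 is treated only as a possible default, so flagged rows go to review rather than being rejected.

```python
from datetime import date


def looks_defaulted(d: date) -> bool:
    """January 1 is a common placeholder when a source system lacks a real date."""
    return (d.month, d.day) == (1, 1)


def flag_suspect_calls(calls):
    """Return (call_id, reasons) for records that pass structural checks but look wrong."""
    flagged = []
    for call in calls:
        reasons = []
        if call["duration_sec"] == 0:
            reasons.append("zero duration")
        if looks_defaulted(call["call_date"]):
            reasons.append("possible defaulted date")
        if reasons:
            flagged.append((call["call_id"], reasons))
    return flagged


def missing_centers(calls, expected_centers):
    """Call centers expected in the file that never appear in it (a coverage gap)."""
    return set(expected_centers) - {call["center"] for call in calls}
```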
  • 17. Example: Social Media Feed Unique? Integrity? Complete? Consistent? Timely? Valid?
  • 18. New Data Quality Problems New data, new data quality challenges • 3rd Party and external data with unknown provenance or relevance • Bias in the data – whether in collection, extraction, or other processing • Data without standardized structure or formatting • Continuously streaming data • Disjointed data (e.g. gaps in receipt) • Consistency and verification of data sources • Changes and transformation applied to data (i.e. does it really represent the original input?) “34 percent of bankers in our survey report that their organization has been the target of adversarial AI at least once, and 78 percent believe automated systems create new risks, such as fake data, external data manipulation, and inherent bias.” Accenture Banking Technology Vision 2018
  • 19. Additional Measures of Data Quality What else can we review or measure? Provenance – Where did the data originate, who gathered it, and what criteria were used to create it? • E.g. government agency, 3rd party provider, free or paid data Coverage (Relevance) – How well does the data source meet the defined needs? • E.g. does it cover the relevant geography? Is it biased (and if so, how)? Continuity – Are there data points for all intervals or expected intervals? • E.g. sensors, weather records, call data records Triangulation – What Gartner describes as ‘consistency of data across proximate data points’, i.e. consistent measurements from related points of reference. • E.g. if temperatures in Chicago and Louisville are 30° and 32°, then the temperature in Indianapolis for the same day is unlikely to be 70° Transformation from origin – How many layers and/or changes has the data passed through? • E.g. has the original data source already been merged with two other record sources? And is the result accurate? Repetition or duplication of data patterns – Data points exactly the same across multiple recording intervals or across multiple sensors. • E.g. is there tampering with sensors or call data?
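Continuity, triangulation, and repetition can each be approximated with a few lines of code. This is a sketch under simplifying assumptions, with hypothetical function names: continuity assumes a fixed expected interval, triangulation compares a reading with the unweighted mean of nearby reference points using a fixed tolerance (a real check would likely weight by distance or use historical spread), and repetition only looks for runs of identical consecutive values from one source.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable


def continuity_gaps(timestamps: Iterable[datetime], interval: timedelta):
    """Pairs of consecutive readings that are further apart than the expected interval."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > interval]


def fails_triangulation(value, nearby_values, tolerance):
    """True if a reading is implausibly far from the mean of nearby reference points."""
    return abs(value - mean(nearby_values)) > tolerance


def repeated_runs(readings, min_run):
    """(start_index, value, length) for runs of >= min_run identical consecutive readings."""
    runs, start = [], 0
    for i in range(1, len(readings) + 1):
        # A run ends at the end of the list or when the value changes.
        if i == len(readings) or readings[i] != readings[start]:
            if i - start >= min_run:
                runs.append((start, readings[start], i - start))
            start = i
    return runs
```

With the slide's own numbers, a 70° reading against neighbors at 30° and 32° fails a 10° tolerance, while 33° passes.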
  • 20. Example: New Data Quality Measures applied • Provenance: Jane Doe pulled from Twitter based on #Blackberry • Continuity: All items for #Blackberry in relevant time interval appear to be included • Usage: Marketing confirms this data has high value • Triangulated: Good association with current product & sales data • Repeated patterns: All tweets appear unique within the date & vs. prior feeds • Coverage: This needed to include #BB and #Crackberry as well! • Transformation: No changes or merges of the data were applied
  • 21. 2. Machine Learning & Data Quality
  • 22. “The magic of machine learning is that you build a statistical model based on the most valid dataset for the domain of interest. If the data is junk, then you’ll be building a junk model that will not be able to do its job.” James Kobielus, SiliconANGLE Wikibon Lead Analyst for Data Science, Deep Learning, App Development, 2018
  • 23. Common Machine Learning Applications Marketing • Targeted marketing • Recommendation engine • Next best action • Customer churn prevention Risk Management • Anti-money laundering • Fraud detection • Cybersecurity • Know your customer
  • 24. Data Challenges with Machine Learning Five Big Challenges of Enabling Machine Learning 1. Scattered and Difficult to Access Datasets Much of the necessary data is trapped in mainframes or streams in from POS and ATM machines in incompatible formats, making it difficult to gather and prepare the data for model training. 2. Data Cleansing at Scale Data quality cleansing and preparation routines have to be reproduced at scale. Most data quality tools are not designed to work on that scale of data. 3. Entity Resolution and Customer Identification Distinguishing matches across massive datasets that indicate a single specific entity requires sophisticated multi-field matching algorithms and a lot of compute power. Essentially everything has to be compared to everything. 4. Need for Near Real-Time Current Data Tracking and detection need to happen very rapidly. Current transactions need to be constantly added to combined datasets, prepared and presented to models as close to real-time as possible. 5. Tracking Lineage from the Source Data changes made to help train models have to be exactly duplicated in production, in order for models to accurately make predictions on new data, and for required audit trails. Capture of complete lineage, from source to end point, is needed.
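Challenge 3 notes that entity resolution essentially compares everything to everything. A common way to cut that cost is blocking: only records that share a cheap key are compared in detail. The sketch blocks on postal code and scores name and street similarity with Python's standard difflib; the field names, the equal weighting, and the 0.85 threshold are illustrative assumptions, not a production matching algorithm.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(records, threshold=0.85):
    """Pairs of record ids that probably describe the same entity.

    Blocking: only records sharing a postal code are compared, which avoids
    comparing everything to everything but misses matches whose postal codes differ.
    Scoring: mean of name and street similarity (equal weights are an assumption).
    """
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec["postal_code"]].append(rec)

    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = (similarity(a["name"], b["name"]) + similarity(a["street"], b["street"])) / 2
            if score >= threshold:
                matches.append((a["id"], b["id"], round(score, 2)))
    return matches
```

The trade-off of blocking is visible in the design: two identical records with different postal codes are never compared, which is one reason matching at scale typically runs several passes with different blocking keys.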
  • 25. Data Quality Challenges with Machine Learning Incorrect, Incomplete, Mis-Formatted, and Sparse “Dirty Data” – Mistakes and errors are almost never the patterns you’re looking for in a data set. Sparse data generates other issues. Correcting and standardizing will tend to boost the signal, but must account for bias. Missing context – Many data sources lack context around location or population segments. Unless enriched with other data sets (e.g. geospatial, demographics, or firmographics data), some ML algorithms will not be usable. Multiple copies – If your data comes from many sources, as it often does, it may contain multiple records of information about the same person, company, product or other entity. Removing duplicates and enhancing the overall depth and accuracy of knowledge about a single entity can make a huge difference. Spurious correlations – Just as missing context may hinder some ML algorithms, inclusion of already correlated data (e.g. city and postal code) may result in overfitting of ML algorithms. Correcting data problems vastly increases a data set’s usefulness for machine learning. Traditional data quality processes are an effective method to remove defects. However, traditional data quality software is designed to work on smaller data sets, and data analysts may not be aware of specific data quality issues that must be addressed to support machine learning.
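For the spurious-correlation point, one quick check is whether one column is (almost) fully determined by another, as city usually is by postal code. This sketch measures the share of distinct values in one column that map to exactly one value in another; the function name is hypothetical, and it is one simple heuristic assumed here for illustration, not a substitute for proper feature analysis.

```python
from collections import defaultdict


def dependency_ratio(records, left, right):
    """Share of distinct `left` values that map to exactly one `right` value.

    1.0 means `right` is fully determined by `left` in this data, so feeding
    both columns to a model adds little independent signal.
    """
    values = defaultdict(set)
    for rec in records:
        values[rec[left]].add(rec[right])
    if not values:
        return 0.0
    return sum(len(v) == 1 for v in values.values()) / len(values)
```

On sample rows where Chicago appears with two postal codes, the ratio from postal_code to city is 1.0 while the ratio from city to postal_code is 2/3, so city adds little beyond postal code.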
  • 26. Example: Missing segments of populations Event: Hurricane Sandy, 20 million tweets • The majority of tweets came from Manhattan, not from hard-hit areas such as Seaside Heights and Midland Beach, due to power outages and diminishing cell phone batteries • Despite the millions of Spanish-speakers affected, very few Spanish-language tweets were collected • Assess % across and against all likely locations • Seek out disconfirming information Data: Boston Potholes, Street Bump App • Draws on accelerometer and GPS data to help passively detect potholes • Lower income groups in the US are less likely to have smartphones, particularly older residents – penetration as low as 16% • Result is underreporting of road problems in more elderly communities • Assess % across all likely locations • Add other sources • Utilize demographics for evaluations
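Both examples recommend assessing the share of data across all likely locations. A minimal sketch of that comparison follows; it assumes an independent reference distribution (for instance census population shares) is available, and the function name, the 0.5 tolerance, and the area names and counts in the usage are made up for illustration.

```python
def coverage_gaps(observed_counts, expected_share, tolerance=0.5):
    """Locations whose share of records falls below tolerance x their expected share.

    observed_counts -- {location: number of records collected}
    expected_share  -- {location: share expected from an independent source}
    """
    total = sum(observed_counts.values())
    gaps = {}
    for loc, expected in expected_share.items():
        observed = observed_counts.get(loc, 0) / total if total else 0.0
        if observed < tolerance * expected:
            gaps[loc] = (round(observed, 3), expected)
    return gaps
```

For example, with 900, 20, and 80 records from hypothetical areas A, B, and C, none from area D, and expected shares of 0.5, 0.2, 0.2, and 0.1, areas B, C, and D are reported as under-covered.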
  • 27. Example: Noise, or Inserted content “Bots are just a tool for making the numbers look how you want them to look.” Sam Woolley, Researcher, Oxford University’s Project on Computational Propaganda, Wired: Nov 8, 2016 “The Political Twitter Bots Will Rage This Election Day” Event: Election Bot tweets • ~400,000 bots tweeting on the election • ~20% of all election-related tweets came from an army of influential bots • 55-80% of Twitter activity (the likes, follows, and retweets) is from bots • It had been easier to identify earlier bots, but now it’s incredibly difficult for a human to make a determination • Evaluate patterns • Is there any real sentiment here? • How much repetitive content is there? • How much “influence” comes from a single or a few sources (negative or positive)? • Will it skew the analysis?
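Two of the questions on this slide, how much content is repetitive and how much comes from a few sources, can be quantified directly. The sketch assumes posts are available as text plus author and uses hypothetical function names; the normalization (lower-casing and collapsing whitespace) is a deliberately simple assumption, and real bot detection needs far more signals than these two ratios.

```python
from collections import Counter


def _normalize(text):
    # Case-fold and collapse whitespace so trivially varied copies compare equal.
    return " ".join(text.lower().split())


def repetition_share(texts):
    """Fraction of posts whose normalized text appears more than once."""
    counts = Counter(_normalize(t) for t in texts)
    return sum(c for c in counts.values() if c > 1) / len(texts) if texts else 0.0


def top_source_share(authors, k=3):
    """Fraction of all posts produced by the k most active accounts."""
    counts = Counter(authors)
    return sum(c for _, c in counts.most_common(k)) / len(authors) if authors else 0.0
```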
  • 28. Example: Simple bias “The “black sheep problem” is that if you were to try to guess what color most sheep were by looking [at] language data, it would be very difficult for you to conclude that they weren't almost all black. In English, “black sheep” outnumbers “white sheep” about 25:1 (many “black sheeps” are movie references); in French it's 3:1; in German it's 12:1. Some languages get it right; in Korean it's 1:1.5 in favor of white sheep…” Hal Daumé, Associate Professor, University of Maryland, Blog: June 24, 2016 “Language bias and black sheep” http://nlpers.blogspot.com/2016/06/language-bias-and-black-sheep.html Data: Google Word2Vec data set Word2vec • Converts words into a vector space for analysis • “Numerous researchers have begun to use the data to better understand everything from machine translation to intelligent Web searching.” • Embeddings based on a group of 300 million words taken from Google News • Researchers from Boston University and Microsoft have found it is “blatantly sexist” • Impacts the ability to create personalized services • Evaluate % of words & associations • How do I interpret a sentiment? • Does this data set contain hidden and unexpressed bias? • Will I miss opportunities because of hidden assumptions?
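Evaluating the percentage of words and associations can start with plain phrase counts. This sketch, with a hypothetical function name, counts whole-phrase, case-insensitive matches across a list of documents; it is an illustration only, and its word-boundary pattern deliberately does not count plural forms such as “black sheeps”.

```python
import re


def phrase_counts(corpus, phrases):
    """Case-insensitive whole-phrase occurrence counts across a list of documents."""
    counts = {}
    for phrase in phrases:
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        counts[phrase] = sum(len(pattern.findall(doc)) for doc in corpus)
    return counts
```

A skewed ratio says something about what people write, not about the world; it is a prompt to ask whether that reporting bias matters for the intended use.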
  • 29. 3. Data Quality at Scale
  • 30. Challenges To Ensuring Data Quality Many sources of data (70%) and volume of data (48%) are among the top 3 challenges companies face when ensuring high-quality data. Applying governance processes to manage and measure data quality is second with 50%. Chart: What are the greatest challenges you face when ensuring high data quality? • Many sources of data 70% • Applying governance processes to manage and measure data… 50% • Volume of data 48% • Inconsistent formats of data 47% • Inconsistent definitions of data 46% • Missing information 43% • Connecting policies and rules to data 32% • Misfielded data 27% • Lack of skills/staff 27% • Lack of tools (or inadequate tools) 25% • Not seen as an organizational priority 15% * Syncsort, 2019 Enterprise Data Quality survey
  • 31. Processing at Scale New Data Quality considerations • Handling data volumes and distributed data • Profiling data – assessing high volumes and streaming data • Standardizing and enriching data content • Matching entities – not just master data – e.g. transactions for fraud detection • Meeting Service Level Agreements (SLA’s) • Running consistently on new and regularly changing platforms (Hadoop, Spark, Cloud)
  • 32. Handling distributed data volumes Big Data at scale distributes data across many nodes – not necessarily with other relevant data! • Data Quality functions must be performed in a consistent manner, no matter where actual processing takes place, how the data is segmented, and what the data volume is • Cleansing, standardization, and data validation will generally scale linearly • Data Enrichment: Reference data and lookups must be readily accessible by any process wherever executed (Source: HP Analyst Briefing)
  • 33. Handling distributed data volumes • But there are particular implications for profiling, joining, sorting, and matching data • Profiling: Identification of outliers requires full-volume views and aggregating statistics and frequencies of data distributed across the cluster • Joins & sorts: Efficient shuffling of data stored across the cluster is critical • Entity Resolution: Distinguishing matches that indicate a single specific entity across so much data requires multiple passes with sophisticated multi-field matching algorithms – with results that are understandable by business users in order to be meaningful
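Profiling distributed data works when each node computes partial statistics that can be merged afterwards. The sketch shows one such mergeable profile for a numeric column; the class and function names are hypothetical. Exact frequency tables grow with the number of distinct values, so at real scale approximate structures (for example HyperLogLog for distinct counts) are commonly used instead.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce
from typing import Optional


@dataclass
class ColumnProfile:
    count: int = 0
    nulls: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    freq: Counter = field(default_factory=Counter)

    @property
    def mean(self) -> Optional[float]:
        non_null = self.count - self.nulls
        return self.total / non_null if non_null else None


def profile_partition(values) -> ColumnProfile:
    """Profile one partition locally (the work each node can do on its own)."""
    p = ColumnProfile()
    for v in values:
        p.count += 1
        if v is None:
            p.nulls += 1
            continue
        p.total += v
        p.minimum = v if p.minimum is None else min(p.minimum, v)
        p.maximum = v if p.maximum is None else max(p.maximum, v)
        p.freq[v] += 1
    return p


def _pick(fn, x, y):
    # Combine two optional extremes, ignoring a side that saw no values.
    if x is None:
        return y
    if y is None:
        return x
    return fn(x, y)


def merge(a: ColumnProfile, b: ColumnProfile) -> ColumnProfile:
    """Combine two partial profiles; associative, so partitions can merge in any grouping."""
    return ColumnProfile(
        count=a.count + b.count,
        nulls=a.nulls + b.nulls,
        total=a.total + b.total,
        minimum=_pick(min, a.minimum, b.minimum),
        maximum=_pick(max, a.maximum, b.maximum),
        freq=a.freq + b.freq,
    )
```

Usage: reduce(merge, map(profile_partition, partitions)) yields the same profile as profiling all values on one machine, which is what allows outliers and frequencies to be judged on the full volume.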
  • 34. Anti-Money Laundering on Hadoop at Global Bank • Must provide cluster-native data verification, enrichment, and demanding multi-field fuzzy matching for entity resolution to Golden Record • Massive data volumes • Scattered data – Mainframe, RDBMS, Cloud, … • Must be secure – Kerberos, LDAP • Must have lineage – data origin to end point • Must archive unaltered mainframe data Full Anti-Money Laundering regulatory compliance with financial crimes data lake – high performance results at massive scale. • Full end-to-end data lineage supplied to Apache Atlas and ASG Data Intelligence • Cluster-native data verification, enrichment, and demanding multi-field entity resolution on Spark • Unmodified mainframe “Golden Records” stored on Hadoop Bank must monitor transactions to detect Money Laundering for FCA compliance. Leverage Machine learning at scale to detect patterns, but … Requires large amounts of current, clean data.
  • 35. 4. Data Literacy / Democratization
  • 36. Data Democratization Data Quality is a key component to user empowerment • Data Literacy - critical to understand: • Business context and language • Data (including data structures and data types) • Data access (how and where to find) • Data usage (how will the data be used by the business) • Basic Statistics • Data Quality dimensions • Data Quality techniques and tools • Resource constraints – in both Data Quality and technologies • What questions to ask? • Where to find answers?
  • 37. Approaches to Addressing Emerging Data Quality Trends
  • 38. Approaches Data Literacy / Data Governance • Communicating Best Practices in Data Quality for everyone “Universal” Data Quality Best Practices • Establish Scope: ask core questions • Identify data requirements • Address bias • Understand context • Address and resolve data quality issues • Apply data governance processes Solving “Big Data” Data Quality Challenges • Handle scale • Ensure consistent data quality application across platforms
  • 39. Culture of Data Literacy • “Democratization of Data” requires cultural support • Empowered to ask questions about the data • Trained to understand and use data • Trained to approach and evaluate data quality • Traditional data, new data, machine learning requirements, … • Understand the business context of the data Program of Data Governance • Provide the processes and practices necessary for success • Measure, monitor, and improve • Continuous iteration and development Center of Excellence/Knowledge Base • Where do you go to find answers? • Who can help show you how? Communicate!
  • 40. Data Literacy: challenges & best practices • Lack of Common Terminology • Organizational Barriers & Silos • Isolated or Unknown Work • Lack of Engagement Establish a Common Language • Define terminology – a ‘stake in the ground’ • Map information • Support with policies/standards Gain Broader Buy In • Bring stakeholders together • Build the structure, culture, ownership, steering groups, stewardship over time Enrich Information • Discover what you don’t know • Resolve differences • Enhance/annotate to increase insight Share Insights Regularly • Produce and share tangible outcomes • Highlight ‘wins’ • Demonstrate efficiencies & savings Copyright © Syncsort 2019
  • 41. Universal Data Quality best practices “If you don’t know what you want to get out of the data, how can you know what data you need – and what insight you’re looking for?” Wolf Ruzicka, Chairman of the Board at EastBanc Technologies, Blog post: June 1, 2017 “Grow A Data Tree Out Of The “Big Data” Swamp” Establish Scope • Understand the business objective and problem • Ask the “right questions” about your data (not just “what” and “how”) • Empower users (“Who”) to gain new clarity into the core problem (“Why”) • “High-quality data” definition will vary by business problem Identify Requirements & Processes • Do you have all the data required? • Do you understand the characteristics and context of the data? • How will data be matched, consolidated, or connected? • What’s needed to facilitate the matching, consolidation, or connection required? • Have you evaluated the sources? • What’s the Fitness for your Purpose?
  • 42. Universal Data Quality best practices Understand Context • What are the Critical Data Elements? • What qualities do we need to address, or leave alone? • When, and where, do we need to transform or enrich the data content? • How are we connecting, relating, or combining data? Develop, Test, and Deploy Corrective Measures • Consistent application of standardization, transformation, enrichment, and entity resolution • Common templates, rules, metrics, and processes that can be leveraged • Deploy into batch, real-time, or embedded services Apply Data Governance • Deploy and implement metrics and measures for ongoing assessment and evaluation “Never lead with a data set; lead with a question.” Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet, Forbes Insights, May 31, 2017 “The Data Differentiator”
  • 43. Quantify: challenges & best practices • Hidden Activities • Money, Time and Resource Waste • Lack of Transparency and Trust • Disconnect Between Process and Measures Identify Baseline Measures • Keep a focus on lean and agile • Define value accurately for the business Link to Business Performance • Create and refine streams of value • Transform culture through action and empowerment Monitor, Report and Remediate Issues • Continuously review • Ensure issues are visible and understood • Understand root causes • Address/resolve issues Quantify Impact of Changes • Demonstrate through clearly understood measures • Establish value continuously • Finish early, finish often
  • 44. Leverage tools built for Big Data • Focus on the data quality challenges, not the Big Data ones • Connect to and process hundreds of millions of records of data • Standardize, enhance, and match international data sets with postal and country-code validation • Integrate, enrich, and match new and legacy customer data from multiple disparate sources • Deploy data quality workflows as native, parallel MapReduce or Spark processes for optimal efficiency on premises or in the Cloud • Increase processing efficiency by expanding cluster, not rebuilding processes • Support failover through fault-tolerant designs; during a node failure, processing is redirected to another node
  • 45. Simplify: Design Once, Deploy Anywhere Intelligent Execution - Insulate your organization from underlying complexities of Big Data • Get excellent performance every time without tuning, load balancing, etc. • Avoid re-design, re-compile, re-work • Future-proof job designs for emerging compute frameworks • Move from dev to test to production • Move from on-premises to Cloud • Move from one Cloud to another Use existing Data Quality skills • Focus on data quality problems, not technical ones Design Once in visual GUI, Deploy Anywhere! • On-Premises, Cloud • MapReduce, Spark, Future Platforms • Windows, Linux, Unix • Batch, Streaming • Single Node, Cluster
  • 46. Data Quality remains Data Quality, even at scale “Data and analytics leaders need to understand the business priorities and challenges of their organization. Only then will they be in the right position to create compelling business cases that connect data quality improvement with key business priorities.” Ted Friedman VP Distinguished Analyst, Gartner Smarter with Gartner at Gartner.com: June 12, 2018 “How to Create a Business Case for Data Quality Improvement” “Never lead with a data set; lead with a question.” Anthony Scriffignano Chief Data Scientist, Dun & Bradstreet Forbes Insights, May 31, 2017 “The Data Differentiator”
  • 47. Q&A