Your AI and ML Projects Are Failing
Key Steps to Get Them Back on Track
Harald Smith, Director Product Marketing
Housekeeping
Webcast Audio
• Today’s webcast audio is streamed through your computer speakers.
• If you need technical assistance with the web interface or audio,
please reach out to us using the chat window.
Questions Welcome
• Submit your questions at any time during the presentation
using the chat window.
• We will answer them during our Q&A session following the
presentation.
Recording and slides
• This webcast is being recorded. You will receive an
email following the webcast with a link to download
both the recording and the slides.
Speaker
Harald Smith
• Director of Product Marketing, Syncsort
• 20+ years in Information Management with a
focus on data quality, integration, and governance
• Co-author of Patterns of Information Management
• Author of two Redbooks on Information Governance
and Data Integration
• Blog author: “Data Democratized”
AI/ML needs Data Quality
The importance of data quality in the enterprise:
• Decision making
• Customer centricity
• Compliance
• Machine learning & AI
Only 35% of senior executives have a high level of trust in the accuracy of their Big Data Analytics
KPMG 2016 Global CEO Outlook
92% of executives are concerned about the negative impact of data and analytics on corporate reputation
KPMG 2017 Global CEO Outlook
80% of AI/ML projects are stalling due to poor data quality
Dimensional Research, 2019
“Societal trust in business is arguably at an all-time low and, in a world increasingly driven by data and technology, reputations and brands are ever harder to protect.”
EY, “Trust in Data and Why it Matters”, 2017
“The magic of machine learning is that you build a statistical model based on the most valid dataset for the domain of interest. If the data is junk, then you’ll be building a junk model that will not be able to do its job.”
James Kobielus, Lead Analyst for Data Science, Deep Learning, App Development, SiliconANGLE Wikibon, 2018
Key steps to improve Data Quality for AI/ML
Four foundational data steps to get or keep your AI and ML projects grounded and underway:
1. Frame the business problem
2. Identify the “right” data to collect and work with
3. Establish baselines of data quality through data profiling and business rules
4. Assess and communicate the fitness for purpose of the data for training and evaluating the subsequent models and algorithms
1. Frame the business problem
Common Machine Learning applications
Customer/Marketing
• Targeted marketing
• Recommendation engine
• Next best action
• Customer churn prevention
• Sentiment analysis
Risk Management
• Anti-money laundering
• Fraud detection (electricity pilferage, fraudulent transactions)
• Cybersecurity
• Know your customer
Supply Chain Management
• Reduction of freight costs/Optimal routing
• Damage identification/Mechanical repair
Universal DQ
best practices
Understand the End Goal
• How does the business intend to use
the data (i.e. what’s the use case)?
• Empower users (“Who”) to gain new
clarity into the core problem (“Why”)
• What will the data be used for?
• What defines the Fitness for
your Purpose?
Establish Scope
• Ask the “right questions” about
the use case and the data (not just
“what” and “how”)
• What data is relevant to the effort?
• Big Data or other, you need to set
boundaries for the work
Understand Context
• How does the business define
the data?
• What are the important
characteristics and context
of the data?
• What are the Critical Data
Elements?
• What qualities will you need
to address, or leave alone?
• “High-quality data” definition
will vary by business problem
“If you don’t know what you want
to get out of the data, how can you
know what data you need – and what
insight you’re looking for?”
Wolf Ruzicka, Chairman of the Board at EastBanc
Technologies, Blog post: June 1, 2017,
“Grow A Data Tree Out Of The “Big Data” Swamp”
“Never lead with a data set;
lead with a question.”
Anthony Scriffignano, Chief Data Scientist,
Dun & Bradstreet, Forbes Insights, May 31, 2017,
“The Data Differentiator”
2. Identify the “right” data
What’s the
“Right” Data?
Is relevant and specific for
the business problem
Is free from bias and
assumptions
Supports hypothesis testing
Ask questions about the data you expect
to need
Understand the Provenance of the data
• Who produced it, when did they
produce it, and why?
• Has it been transformed or
changed from original (lineage)?
Understand whether the data is
Comprehensive
• What is the scope of the data?
• What data is missing?
• Are approaches available to
identify/capture what is missing?
Understand the “universe” of Relevant
data
• Consider sources within and
outside the organization
Understand whether the data is Timely
• How can you be certain the
data is truly current?
Understand additional challenges
obtaining data, both for evaluation and
operational use
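As a rough illustration of the timeliness question above, the following sketch separates records into current and stale groups. The `updated_at` field name and the 90-day threshold are assumptions made for the example; what counts as "current" depends on the business problem.

```python
from datetime import datetime, timedelta, timezone

def split_by_freshness(records, max_age, now=None):
    """Separate records into (current, stale) by comparing `updated_at` with `max_age`."""
    now = now or datetime.now(timezone.utc)
    current, stale = [], []
    for rec in records:
        age = now - rec["updated_at"]  # hypothetical timestamp field
        (stale if age > max_age else current).append(rec)
    return current, stale

# Hypothetical sample: one recent record and one that is months old.
sample = [
    {"id": 1, "updated_at": datetime(2024, 1, 30, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
]
current, stale = split_by_freshness(
    sample,
    max_age=timedelta(days=90),  # assumed threshold
    now=datetime(2024, 1, 31, tzinfo=timezone.utc),
)
```

A check like this only shows when a record was last touched; it does not prove the values themselves are still true, which is why provenance still matters.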
Comprehensiveness depends on the business context/question
• Customer Engagement/Loyalty: known customers, both active & inactive
• New Customer Campaigns: “active” consumers, both known and unknown
• Fraud Detection: any known or unknown person impersonating a customer or prospect
Ask/understand what the “Unknown and/or Unavailable” segment represents
• Why does this segment exist?
• If relevant, can the characteristics be inferred through other data?
• Is there inherent bias in leaving this group out?
Comprehensive: a “Customer” example
Unknown & Active
• Prospect
• Data in CRM? Website
visits? Store visits? Prospect
lists?
Known & Active
• Customer
• Data in MDM/DW
• What about Call Center?
CRM? Website visits? Store
visits? Loyalty Program?
Unknown and/or unavailable
• Not a customer
• No data? Or is data
available through other
means?
Known & Inactive
• Former Customer
• Data in MDM/DW?
• What about Call Center?
CRM? Website visits? Store
visits? Loyalty Program?
Relevance for additional data depends on the business context/questions
• Customer Engagement/Loyalty: Website, Call Center, Social Media, Location, Store Data, Demographics
• New Customer Campaigns: Location, Demographics, Website, Social Media, Prospect Lists
• Optimal Shipping/Delivery: Location, Weather, Store Data
• Fraud Detection: IP Address, Device ID, Purchase Location, etc.
Additional content from both internal and external sources may be relevant if within a useful time period
• Change of Address, Suppression lists, etc.
Relevant: a “Customer” example
[Diagram: data related to the “Customer”: Location; Demographics; Social Media; Website, Call Center, Store, etc.; Order Transactions; Call Transcriptions; Product/Service Reviews; Abandoned Carts; Census Data; Credit History; Other: Weather, Prospect Lists, etc.]
Five further challenges to enable Machine Learning
1. Lack of data, or scattered and difficult to access datasets
• Little or no accessible data; or necessary data trapped in mainframes, operational systems, or streams.
• Data typically stored in incompatible formats.
• Other data must be acquired, appended, or transformed for use.
2. Data standardization, cleansing, and enrichment at scale
• Data needs to be tagged, classified, standardized, and normalized.
• Data quality standardization, cleansing, enrichment, and preparation need to be applied consistently and reproduced at scale.
3. Entity resolution and customer identification
• Distinguishing single entity matches across massive datasets requires sophisticated multi-pass, multi-field matching algorithms.
• Continuous cross-comparison and resolution need to occur as new data arrives.
4. Need for near real-time current data
• Tracking and detection need to happen very rapidly.
• Current transactions are constantly added to combined datasets and presented to models as close to real time as possible.
5. Tracking lineage from the source
• Data changes made to help train models have to be exactly duplicated in production.
• Capture of complete lineage, from source to end point, is needed.
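To make the "multi-pass, multi-field matching" idea in challenge 3 more concrete, here is a deliberately small sketch. The field names, the two match passes, and the crude normalization are all assumptions for illustration. Production entity resolution typically adds fuzzy comparison, blocking, and survivorship rules that are not shown here.

```python
from collections import defaultdict

def normalize(value):
    """Lowercase and keep only letters and digits (a crude, assumed normalization)."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

# Each pass lists the fields that must all agree after normalization (hypothetical choices).
MATCH_PASSES = [
    ("email",),
    ("last_name", "postal_code", "phone"),
]

def resolve_entities(records):
    """Group record indexes that match on any pass, using a small union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    for fields in MATCH_PASSES:
        seen = {}
        for idx, rec in enumerate(records):
            key = tuple(normalize(rec.get(f, "")) for f in fields)
            if not all(key):
                continue  # a blank field cannot support a match in this pass
            if key in seen:
                union(seen[key], idx)
            else:
                seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())

# Hypothetical sample records from different sources.
people = [
    {"email": "Ann.Lee@example.com", "last_name": "Lee", "postal_code": "02139", "phone": "617-555-0100"},
    {"email": "ann.lee@example.com", "last_name": "Lee", "postal_code": "02139", "phone": ""},
    {"email": "", "last_name": "LEE", "postal_code": "02139", "phone": "(617) 555-0100"},
    {"email": "bob@example.com", "last_name": "Ray", "postal_code": "10001", "phone": "212-555-0199"},
]
clusters = resolve_entities(people)
```

Records 0 and 1 link on the email pass, and records 0 and 2 link on the name, postal code, and phone pass, so all three end up in one cluster even though records 1 and 2 never match each other directly. That transitive linking is also where over-matching can creep in.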
3. Establish baselines
of Data Quality
Data Quality challenges with Machine Learning
Incorrect, incomplete, mis-formatted, and sparse “dirty data”
• Mistakes and errors are rarely the patterns you are looking for.
• Sparse data generates other issues or may be ignored as “noise”.
• Correcting and standardizing data boosts the signal, but can increase bias.
Missing context
• Insufficient information about customer and location data can make many
ML algorithms unusable.
• Enriching data increases context, but choice of source can skew/bias result.
Duplicates and multiple copies
• Many sources can yield multiple records about the same person, company,
product or other entity, skewing the signal and outcomes.
• Removing duplicates enhances the overall depth and accuracy about a
single entity, but you must watch for over- or under-matching of data.
Spurious correlations
• Inclusion of already correlated data (e.g. city and postal code) may result in
overfitting of ML algorithms or ‘false’ discoveries.
Correcting data problems vastly increases a data set’s usefulness for machine learning.
Caution: data analysts may not be aware of specific data quality issues that must be addressed to support machine learning.
Traditional data quality processes are an effective method to identify defects.
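One simple way to spot already-correlated fields such as city and postal code, before they reach a model, is to measure how well one field predicts the other. The sketch below is an illustrative check, not a method prescribed by the presentation, and the field names are assumed.

```python
from collections import defaultdict

def determines(records, source, target):
    """Share of records whose `target` is the most common value for their `source` value.

    A score near 1.0 suggests `target` adds little information beyond `source`.
    """
    if not records:
        return 0.0
    values_by_source = defaultdict(lambda: defaultdict(int))
    for rec in records:
        values_by_source[rec[source]][rec[target]] += 1
    consistent = sum(max(counts.values()) for counts in values_by_source.values())
    return consistent / len(records)

# Hypothetical sample: one postal code appears with two spellings of the city.
addresses = [
    {"postal_code": "02139", "city": "Cambridge"},
    {"postal_code": "02139", "city": "Cambridge"},
    {"postal_code": "10001", "city": "New York"},
    {"postal_code": "10001", "city": "NYC"},
]
score = determines(addresses, "postal_code", "city")
```

Here the score is below 1.0 only because of the inconsistent city spelling, so the same check doubles as a hint that standardization is needed before deciding which of the two fields to keep.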
Data Quality best practices
Understand Context
• What Critical Data Elements and other attributes are relevant?
• What qualities need to be addressed, or left alone?
• When, and where, do we need to transform or enrich the data content?
• How are we connecting, relating, or combining data?
Develop, Test, and Deploy Corrective Measures
• Consistent application of standardization, transformation, enrichment,
and entity resolution
• Common templates, rules, metrics, and processes that can be leveraged
• Validation and measurement after corrective measures applied
• Deploy into batch, real-time, or embedded services
Apply Data Governance
• Implement metrics and measures for ongoing assessment and evaluation
• Establish baselines for ongoing comparison/evaluation
• Continue to iterate throughout data preparation and model testing
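The governance point about baselines for ongoing comparison can be sketched as a small check that reports which quality metrics have slipped. The metric names and the tolerance value are assumptions chosen for the example.

```python
def compare_to_baseline(baseline, current, tolerance=0.02):
    """Return metrics whose current score is missing or more than `tolerance` below baseline."""
    regressions = {}
    for metric, base_score in baseline.items():
        score = current.get(metric)
        if score is None or base_score - score > tolerance:
            regressions[metric] = (base_score, score)
    return regressions

# Hypothetical scores (fractions between 0 and 1) from two profiling runs.
baseline_scores = {"email_completeness": 0.98, "customer_id_uniqueness": 1.0}
current_scores = {"email_completeness": 0.91, "customer_id_uniqueness": 1.0}
regressions = compare_to_baseline(baseline_scores, current_scores)
```

Running the same comparison after each data preparation or model testing iteration gives a consistent signal about whether corrective measures are helping or hurting.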
Tools for
DQ analysis
Data Profiling
The set of analytical techniques
that evaluate actual data content
(vs. metadata) to provide a
complete view of each data
element in a data source.
Provides summarized inferences,
and details of value and pattern
frequencies to quickly gain data
insights.
Business Rules
The data quality or validation rules
that help ensure that data is “fit for
use” in its intended operational
and decision-making contexts.
Assess the dimensions of data
quality: accuracy, completeness,
consistency, relevance, timeliness,
& validity of data.
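As a minimal illustration of what profiling produces, the sketch below summarizes a single column by counts, distinct values, and value and pattern frequencies. The pattern notation (letters as `A`, digits as `9`) is a common convention but is an assumption here, as is the sample data.

```python
from collections import Counter

def pattern_of(value):
    """Map letters to 'A', digits to '9', and keep other characters as-is."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )

def profile_column(values):
    """Summarize one column: counts, distinct values, and value/pattern frequencies."""
    populated = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "populated": len(populated),
        "distinct": len(set(populated)),
        "top_values": Counter(populated).most_common(3),
        "patterns": Counter(pattern_of(str(v)) for v in populated).most_common(),
    }

# Hypothetical postal code column with a short value, blanks, and a foreign format.
postal_codes = ["02139", "10001", "1000", "", None, "02139", "K1A 0B1"]
profile = profile_column(postal_codes)
```

The pattern frequencies surface the four-digit value and the non-US format without anyone having to write a rule for them first, which is the kind of quick insight profiling is meant to give.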
Common Data Quality measurements
What measures can we take advantage of?
1. Completeness – Are the relevant fields populated?
2. Integrity – Does the data maintain an internal structural integrity
or a relational integrity across sources?
3. Uniqueness – Are keys or records unique?
4. Validity – Does the data have the correct values?
• Code and reference values
• Valid ranges
• Valid value combinations
5. Consistency – Is the data at consistent levels of
aggregation or does it have consistent valid
values over time?
6. Timeliness – Did the data arrive in
a time period that makes it
useful or usable?
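Three of the measurements above (completeness, uniqueness, and validity against reference values) can be expressed as simple ratios. This is a sketch under assumed field names and an assumed reference set. Real rules would be defined with the business for each Critical Data Element.

```python
def completeness(records, field):
    """Share of records where `field` is populated."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def uniqueness(records, key):
    """Number of distinct `key` values divided by the number of records."""
    keys = [r.get(key) for r in records]
    return len(set(keys)) / len(keys)

def validity(records, field, allowed):
    """Share of populated `field` values that belong to the reference set `allowed`."""
    populated = [r[field] for r in records if r.get(field) not in (None, "")]
    if not populated:
        return 0.0
    return sum(1 for v in populated if v in allowed) / len(populated)

# Hypothetical customer records with a duplicate id, a blank email, and a bad country code.
customers = [
    {"id": 1, "country": "US", "email": "a@example.com"},
    {"id": 2, "country": "DE", "email": ""},
    {"id": 2, "country": "XX", "email": "c@example.com"},
    {"id": 4, "country": "", "email": "d@example.com"},
]
email_completeness = completeness(customers, "email")
id_uniqueness = uniqueness(customers, "id")
country_validity = validity(customers, "country", allowed={"US", "DE", "FR"})
```

Note that validity is measured only over populated values here, so a blank country lowers completeness rather than validity. Whether to count blanks as invalid is a design choice to agree on up front.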
New data, new data quality challenges
• 3rd Party and external data with unknown provenance, timeliness, or
relevance
• Bias in the data – whether in collection, extraction, or other processing
• Data without standardized structure or formatting
• Continuously streaming data
• Disjointed data (e.g. gaps in comprehensiveness or receipt)
• Consistency and verification of data sources (e.g. was the origination
verified?)
• Changes and transformation applied to data (i.e. does it really represent the
original input?)
New Data Quality problems
“34 percent of bankers in our survey report that their organization has been the target of adversarial AI at least once, and 78 percent believe automated systems create new risks, such as fake data, external data manipulation, and inherent bias.”
Accenture Banking Technology Vision 2018
4. Assess & communicate
fitness for purpose
Assess Fitness for Purpose
Work within the defined Business Frame!
• Reconfirm the business purpose and context
• Review the data attributes deemed critical and the criteria that required
validation
Test and validate data for identified DQ measurements
• Apply data profiling and established business rules
• Establish baselines!
• Evaluate and determine necessary actions/remediate issues
• Take action on incorrect data and defaults
• Create flags for subsequent use in marking or remediating data
Annotate what you’ve found
• Identify each attribute/criteria and annotate all issues
• Utilize flags, tags, and other indicators to help others distinguish the
type and severity of issues
Establish, document, and present Fitness for Purpose
Iterate for all data in use, as well as model validation
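The idea of creating flags for later marking or remediation can be sketched as business rules that annotate each record instead of silently fixing or dropping it. The rule names, field names, and the `1900-01-01` default date below are hypothetical.

```python
from collections import Counter

# Hypothetical business rules: each returns True when the record violates the rule.
RULES = {
    "missing_email": lambda r: not r.get("email"),
    "default_birth_date": lambda r: r.get("birth_date") == "1900-01-01",
    "negative_amount": lambda r: r.get("amount", 0) < 0,
}

def flag_records(records, rules=RULES):
    """Return copies of the records with a `dq_flags` list naming each failed rule."""
    flagged = []
    for rec in records:
        flags = [name for name, is_violated in rules.items() if is_violated(rec)]
        flagged.append({**rec, "dq_flags": flags})
    return flagged

def summarize_flags(flagged):
    """Count how often each flag occurs, as input for a fitness-for-purpose report."""
    return dict(Counter(flag for rec in flagged for flag in rec["dq_flags"]))

# Hypothetical sample records.
orders = [
    {"email": "a@example.com", "birth_date": "1985-04-12", "amount": 20.0},
    {"email": "", "birth_date": "1900-01-01", "amount": 15.0},
    {"email": "c@example.com", "birth_date": "1990-09-30", "amount": -5.0},
]
flagged_orders = flag_records(orders)
flag_summary = summarize_flags(flagged_orders)
```

Keeping the flags on the record lets model builders decide per use case whether to exclude, impute, or keep a flagged value, and the summary gives a figure that can be documented and presented.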
Communicate!
Culture of Data Literacy
“Democratization of Data” requires cultural support
• Empowered to ask questions about the data
• Trained to understand the business context and use of data
• Trained to understand how to approach and evaluate data quality
• Traditional data, new data, machine learning requirements, …
• Empowered to prove/reject hypotheses
Program of Data Governance
• Provide the processes and practices necessary for success
• Measure, monitor, and improve
• Continuous iteration and development
• Communicate what you’ve discovered (and where others can find it)!
Center of Excellence/Knowledge Base
• Where do you go to find answers?
• Who can help show you how?
Summary
Keep AI/ML projects focused
It is challenging to keep the
business frame/value in mind!
• Data comes from multiple
disparate systems & sources
• The business context may not
be obvious based on data alone
• There is a higher demand and
expectation for seeing data
quality in context.
• You need to assess and measure
the data content to establish
both baselines and common
understanding
4 Key Steps
1. Remember the end goal – ask
questions, use best practices, and
establish scope & context
2. Consider what data is needed
• Focus your attention based
on the type of data and the
use case
• Consider how you can ensure
data is comprehensive,
relevant, and useful
3. Test rules to validate data quality,
establish baselines, communicate
findings, and build trust!
4. Assess and communicate fitness
for purpose
Gaining insight and
measurement of
data quality is more
critical than ever!
Further Resources
• Data Profiling: The First Step to Big Data Quality
• Emerging Data Quality Trends for Governing and Analyzing Big Data
• Introducing Trillium DQ for Big Data: Powerful Profiling and Data
Quality for the Data Lake
harald.smith@syncsort.com
Questions
 
Effective Security Monitoring for IBM i: What You Need to Know
Effective Security Monitoring for IBM i: What You Need to KnowEffective Security Monitoring for IBM i: What You Need to Know
Effective Security Monitoring for IBM i: What You Need to KnowPrecisely
 
Automate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center ExcellenceAutomate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center ExcellencePrecisely
 
5 Keys to Improved IT Operation Management
5 Keys to Improved IT Operation Management5 Keys to Improved IT Operation Management
5 Keys to Improved IT Operation ManagementPrecisely
 
Unlock Efficiency With Your Address Data Today For a Smarter Tomorrow
Unlock Efficiency With Your Address Data Today For a Smarter TomorrowUnlock Efficiency With Your Address Data Today For a Smarter Tomorrow
Unlock Efficiency With Your Address Data Today For a Smarter TomorrowPrecisely
 
Navigating Cloud Trends in 2024 Webinar Deck
Navigating Cloud Trends in 2024 Webinar DeckNavigating Cloud Trends in 2024 Webinar Deck
Navigating Cloud Trends in 2024 Webinar DeckPrecisely
 

Mehr von Precisely (20)

How to Build Data Governance Programs That Last - A Business-First Approach.pdf
How to Build Data Governance Programs That Last - A Business-First Approach.pdfHow to Build Data Governance Programs That Last - A Business-First Approach.pdf
How to Build Data Governance Programs That Last - A Business-First Approach.pdf
 
Zukuntssichere SAP Prozesse dank automatisierter Massendaten
Zukuntssichere SAP Prozesse dank automatisierter MassendatenZukuntssichere SAP Prozesse dank automatisierter Massendaten
Zukuntssichere SAP Prozesse dank automatisierter Massendaten
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Crucial Considerations for AI-ready Data.pdf
Crucial Considerations for AI-ready Data.pdfCrucial Considerations for AI-ready Data.pdf
Crucial Considerations for AI-ready Data.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Justifying Capacity Managment Webinar 4/10
Justifying Capacity Managment Webinar 4/10Justifying Capacity Managment Webinar 4/10
Justifying Capacity Managment Webinar 4/10
 
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
 
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
 
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3fTestjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
 
Data Innovation Summit: Data Integrity Trends
Data Innovation Summit: Data Integrity TrendsData Innovation Summit: Data Integrity Trends
Data Innovation Summit: Data Integrity Trends
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Optimisez la fonction financière en automatisant vos processus SAP
Optimisez la fonction financière en automatisant vos processus SAPOptimisez la fonction financière en automatisant vos processus SAP
Optimisez la fonction financière en automatisant vos processus SAP
 
SAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
SAPS/4HANA Migration - Transformation-Management + nachhaltige InvestitionenSAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
SAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
 
Automatisierte SAP Prozesse mit Hilfe von APIs
Automatisierte SAP Prozesse mit Hilfe von APIsAutomatisierte SAP Prozesse mit Hilfe von APIs
Automatisierte SAP Prozesse mit Hilfe von APIs
 
Moving IBM i Applications to the Cloud with AWS and Precisely
Moving IBM i Applications to the Cloud with AWS and PreciselyMoving IBM i Applications to the Cloud with AWS and Precisely
Moving IBM i Applications to the Cloud with AWS and Precisely
 
Effective Security Monitoring for IBM i: What You Need to Know
Effective Security Monitoring for IBM i: What You Need to KnowEffective Security Monitoring for IBM i: What You Need to Know
Effective Security Monitoring for IBM i: What You Need to Know
 
Automate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center ExcellenceAutomate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center Excellence
 
5 Keys to Improved IT Operation Management
5 Keys to Improved IT Operation Management5 Keys to Improved IT Operation Management
5 Keys to Improved IT Operation Management
 
Unlock Efficiency With Your Address Data Today For a Smarter Tomorrow
Unlock Efficiency With Your Address Data Today For a Smarter TomorrowUnlock Efficiency With Your Address Data Today For a Smarter Tomorrow
Unlock Efficiency With Your Address Data Today For a Smarter Tomorrow
 
Navigating Cloud Trends in 2024 Webinar Deck
Navigating Cloud Trends in 2024 Webinar DeckNavigating Cloud Trends in 2024 Webinar Deck
Navigating Cloud Trends in 2024 Webinar Deck
 

Kürzlich hochgeladen

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 

Kürzlich hochgeladen (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track

  • 1. Your AI and ML Projects Are Failing Key Steps to Get Them Back on Track Harald Smith, Director Product Marketing
  • 2. Housekeeping Webcast Audio • Today’s webcast audio is streamed through your computer speakers. • If you need technical assistance with the web interface or audio, please reach out to us using the chat window. Questions Welcome • Submit your questions at any time during the presentation using the chat window. • We will answer them during our Q&A session following the presentation. Recording and slides • This webcast is being recorded. You will receive an email following the webcast with a link to download both the recording and the slides. 2
  • 3. Speaker Harald Smith • Director of Product Marketing, Syncsort • 20+ years in Information Management with a focus on data quality, integration, and governance • Co-author of Patterns of Information Management • Author of two Redbooks on Information Governance and Data Integration • Blog author: “Data Democratized” 3
  • 4. AI/ML needs Data Quality. The importance of data quality in the enterprise: Only 35% of senior executives have a high level of trust in the accuracy of their Big Data Analytics (KPMG 2016 Global CEO Outlook). 92% of executives are concerned about the negative impact of data and analytics on corporate reputation (KPMG 2017 Global CEO Outlook). 80% of AI/ML projects are stalling due to poor data quality (Dimensional Research, 2019). “Societal trust in business is arguably at an all-time low and, in a world increasingly driven by data and technology, reputations and brands are ever harder to protect.” • Decision making • Customer centricity • Compliance • Machine learning & AI. EY “Trust in Data and Why it Matters”, 2017
  • 5. “The magic of machine learning is that you build a statistical model based on the most valid dataset for the domain of interest. If the data is junk, then you’ll be building a junk model that will not be able to do its job.” James Kobielus, SiliconANGLE Wikibon Lead Analyst for Data Science, Deep Learning, App Development, 2018
  • 6. Key steps to improve Data Quality for AI/ML. Four foundational data steps to get or keep your AI and ML projects grounded and underway: 1. Frame the business problem 2. Identify the “right” data to collect and work with 3. Establish baselines of data quality through data profiling and business rules 4. Assess and communicate the fitness for purpose of the data for training and evaluating the subsequent models and algorithms
  • 7. 1. Frame the business problem
  • 8. Common Machine Learning applications Customer/Marketing • Targeted marketing • Recommendation engine • Next best action • Customer churn prevention • Sentiment analysis Risk Management • Anti-money laundering • Fraud detection (electricity pilferage, fraudulent transactions) • Cybersecurity • Know your customer Supply Chain Management • Reduction of freight costs/Optimal routing • Damage identification/Mechanical repair 8
  • 9. Universal DQ best practices Understand the End Goal • How does the business intend to use the data (i.e. what’s the use case)? • Empower users (“Who”) to gain new clarity into the core problem (“Why”) • What will the data be used for? • What defines the Fitness for your Purpose? Establish Scope • Ask the “right questions” about the use case and the data (not just “what” and “how”) • What data is relevant to the effort? • Big Data or other, you need to set boundaries for the work Understand Context • How does the business define the data? • What are the important characteristics and context of the data? • What are the Critical Data Elements? • What qualities will you need to address, or leave alone? • “High-quality data” definition will vary by business problem “If you don’t know what you want to get out of the data, how can you know what data you need – and what insight you’re looking for?” Wolf Ruzicka, Chairman of the Board at EastBanc Technologies, Blog post: June 1, 2017, “Grow A Data Tree Out Of The “Big Data” Swamp” “Never lead with a data set; lead with a question.” Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet, Forbes Insights, May 31, 2017, “The Data Differentiator” 9
  • 10. 2. Identify the “right” data
  • 11. What’s the “Right” Data? Is relevant and specific for the business problem Is free from bias and assumptions Supports hypothesis testing Ask questions about the data you expect you need Understand the Provenance of the data • Who produced it, when did they produce it, and why? • Has it been transformed or changed from original (lineage)? Understand whether the data is Comprehensive • What is the scope of the data? • What data is missing? • Are approaches available to identify/capture what is missing? Understand the “universe” of Relevant data • Consider sources within and outside the organization Understand whether the data is Timely • How can you be certain the data is truly current? Understand additional challenges obtaining data, both for evaluation and operational use 11
  • 12. Comprehensive: a “Customer” example. Comprehensiveness depends on the business context/question • Customer Engagement/Loyalty: known customers, both active & inactive • New Customer Campaigns: “active” consumers, both known and unknown • Fraud Detection: any known or unknown person impersonating a customer or prospect. Ask/understand what the “Unknown and/or Unavailable” segment represents • Why does this segment exist? • If relevant, can the characteristics be inferred through other data? • Is there inherent bias in leaving this group out? Unknown & Active • Prospect • Data in CRM? Website visits? Store visits? Prospect lists? Known & Active • Customer • Data in MDM/DW • What about Call Center? CRM? Website visits? Store visits? Loyalty Program? Unknown and/or unavailable • Not a customer • No data? Or is data available through other means? Known & Inactive • Former Customer • Data in MDM/DW? • What about Call Center? CRM? Website visits? Store visits? Loyalty Program?
  • 13. Relevance for additional data depends on the business context/questions • Customer Engagement/Loyalty • Website, Call Center, Social Media, Location, Store Data, Demographics • New Customer Campaigns • Location, Demographics, Website, Social Media, Prospect Lists • Optimal Shipping/Delivery • Location, Weather, Store Data • Fraud Detection • IP Address, Device ID, Purchase Location, etc. Additional content from both internal and external sources may be relevant if within a useful time period • Change of Address, Suppression lists, etc. Relevant: a “Customer” example “Customer” Location Demographics Social Media Website, Call Center, Store, etc. Other: Weather, Prospect Lists, etc. Order Transactions Call Transcriptions Product/ Service Reviews Abandoned Carts Census Data Credit History 13
  • 14. Five further challenges to enable Machine Learning: 1. Lack of data, or scattered and difficult-to-access datasets • Little or no accessible data; or necessary data trapped in mainframes, operational systems, or streams. • Data typically stored in incompatible formats. • Other data must be acquired, appended, or transformed for use. 2. Data standardization, cleansing, and enrichment at scale • Data needs to be tagged, classified, standardized, and normalized. • Data quality standardization, cleansing, enrichment, and preparation need to be applied consistently and reproduced at scale. 3. Entity resolution and customer identification • Distinguishing single entity matches across massive datasets requires sophisticated multi-pass, multi-field matching algorithms. • Continuous cross-comparison and resolution need to occur as new data arrives. 4. Need for near real-time current data • Tracking and detection need to happen very rapidly. • Current transactions are constantly added to combined datasets and presented to models as close to real-time as possible. 5. Tracking lineage from the source • Data changes made to help train models have to be exactly duplicated in production. • Capture of complete lineage, from source to end point, is needed.
  • 16. Data Quality challenges with Machine Learning Incorrect, incomplete, mis-formatted, and sparse “dirty data” • Mistakes and errors are rarely the patterns you are looking for. • Sparse data generates other issues or may be ignored as “noise”. • Correcting and standardizing data boosts the signal, but can increase bias. Missing context • Insufficient information about customer and location data can make many ML algorithms unusable. • Enriching data increases context, but choice of source can skew/bias result. Duplicates and multiple copies • Many sources can yield multiple records about the same person, company, product or other entity, skewing the signal and outcomes. • Removing duplicates enhances the overall depth and accuracy about a single entity, but must watch for over- or undermatching of data. Spurious correlations • Inclusion of already correlated data (e.g. city and postal code) may result in overfitting of ML algorithms or ‘false’ discoveries. Correcting data problems vastly increases a data set’s usefulness for machine learning. But data analysts may not be aware of specific data quality issues that must be addressed to support machine learning. Traditional data quality processes are an effective method to identify defects. !CAUTION 16
  • 17. Understand Context • What Critical Data Elements and other attributes are relevant? • What qualities need to be addressed, or left alone? • When, and where, do we need to transform or enrich the data content? • How are we connecting, relating, or combining data? Develop, Test, and Deploy Corrective Measures • Consistent application of standardization, transformation, enrichment, and entity resolution • Common templates, rules, metrics, and processes that can be leveraged • Validation and measurement after corrective measures applied • Deploy into batch, real-time, or embedded services Apply Data Governance • Implement metrics and measures for ongoing assessment and evaluation • Establish baselines for ongoing comparison/evaluation • Continue to iterate throughout data preparation and model testing Data Quality best practices 17
  • 18. Tools for DQ analysis Data Profiling The set of analytical techniques that evaluate actual data content (vs. metadata) to provide a complete view of each data element in a data source. Provides summarized inferences, and details of value and pattern frequencies to quickly gain data insights. Business Rules The data quality or validation rules that help ensure that data is “fit for use” in its intended operational and decision-making contexts. Assess the dimensions of data quality: accuracy, completeness, consistency, relevance, timeliness, & validity of data. 18
  • 19. Common Data Quality measurements. What measures can we take advantage of? 1. Completeness – Are the relevant fields populated? 2. Integrity – Does the data maintain an internal structural integrity or a relational integrity across sources? 3. Uniqueness – Are keys or records unique? 4. Validity – Does the data have the correct values? • Code and reference values • Valid ranges • Valid value combinations 5. Consistency – Is the data at consistent levels of aggregation or does it have consistent valid values over time? 6. Timeliness – Did the data arrive in a time period that makes it useful or usable?
  • 20. New Data Quality problems. New data, new data quality challenges: • 3rd-party and external data with unknown provenance, timeliness, or relevance • Bias in the data, whether in collection, extraction, or other processing • Data without standardized structure or formatting • Continuously streaming data • Disjointed data (e.g. gaps in comprehensiveness or receipt) • Consistency and verification of data sources (e.g. was the origination verified?) • Changes and transformations applied to data (i.e. does it really represent the original input?) “34 percent of bankers in our survey report that their organization has been the target of adversarial AI at least once, and 78 percent believe automated systems create new risks, such as fake data, external data manipulation, and inherent bias.” Accenture Banking Technology Vision 2018
  • 21. 4. Assess & communicate fitness for purpose
  • 22. Work within the defined Business Frame! • Reconfirm the business purpose and context • Review the data attributes deemed critical and the criteria that required validation Test and validate data for identified DQ measurements • Apply data profiling and established business rules • Establish baselines! • Evaluate and determine necessary actions/remediate issues • Take action on incorrect data and defaults • Create flags for subsequent use in marking or remediating data Annotate what you’ve found • Identify each attribute/criteria and annotate all issues • Utilize flags, tags, and other indicators to help others distinguish the type and severity of issues Establish, document, and present Fitness for Purpose Iterate for all data in use, as well as model validation Assess Fitness for Purpose 22
  • 23. Communicate! Culture of Data Literacy: “Democratization of Data” requires cultural support • Empowered to ask questions about the data • Trained to understand the business context and use of data • Trained to understand how to approach and evaluate data quality • Traditional data, new data, machine learning requirements, … • Empowered to prove/reject hypotheses. Program of Data Governance • Provide the processes and practices necessary for success • Measure, monitor, and improve • Continuous iteration and development • Communicate what you’ve discovered (and where others can find it)! Center of Excellence/Knowledge Base • Where do you go to find answers? • Who can help show you how?
  • 24. Summary Keep AI/ML projects focused It is challenging to keep the business frame/value in mind! • Data comes from multiple disparate systems & sources • The business context may not be obvious based on data alone • There is a higher demand and expectation for seeing data quality in context. • You need to assess and measure the data content to establish both baselines and common understanding 4 Key Steps 1. Remember the end goal – ask questions, use best practices, and establish scope & context 2. Consider what data is needed • Focus your attention based on the type of data and the use case • Consider how you can ensure data is comprehensive, relevant, and useful 3. Test rules to validate data quality, establish baselines, communicate findings, and build trust! 4. Assess and communicate fitness for purpose Gaining insight and measurement of data quality is more critical than ever! 24
  • 25. Further Resources • Data Profiling: The First Step to Big Data Quality • Emerging Data Quality Trends for Governing and Analyzing Big Data • Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for the Data Lake harald.smith@syncsort.com
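
As a rough illustration of the data profiling described on slide 18, the completeness, uniqueness, and validity measures on slide 19, and the baseline comparison mentioned on slides 17 and 22, here is a minimal Python sketch. The sample records, field names, reference values, baseline figures, and tolerance are all hypothetical, and this is not meant to represent how any particular data quality product implements these checks.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"id": "1", "postal_code": "10001", "country": "US"},
    {"id": "2", "postal_code": "1000A", "country": "U.S."},
    {"id": "3", "postal_code": "", "country": "USA"},
    {"id": "3", "postal_code": "94105", "country": "US"},
]
VALID_COUNTRIES = {"US", "CA"}  # assumed reference (code) values


def pattern_of(value):
    """Reduce a value to its character pattern: digits -> 9, letters -> A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile(records, field):
    """Data profiling: value and pattern frequencies for one field."""
    values = [r.get(field) or "" for r in records]
    return {
        "values": Counter(values),
        "patterns": Counter(pattern_of(v) for v in values),
    }


def completeness(records, field):
    """Share of records in which the field is populated."""
    return sum(1 for r in records if (r.get(field) or "").strip()) / len(records)


def uniqueness(records, key):
    """Share of distinct key values among all records."""
    keys = [r.get(key) for r in records]
    return len(set(keys)) / len(keys)


def validity(records, field, allowed):
    """Share of records whose field holds an allowed reference value."""
    return sum(1 for r in records if r.get(field) in allowed) / len(records)


def regressions(baseline, current, tolerance=0.05):
    """Names of measures that fell more than `tolerance` below the baseline."""
    return sorted(name for name, base in baseline.items()
                  if current.get(name, 0.0) < base - tolerance)


postal_profile = profile(RECORDS, "postal_code")
measures = {
    "completeness": completeness(RECORDS, "postal_code"),
    "uniqueness": uniqueness(RECORDS, "id"),
    "validity": validity(RECORDS, "country", VALID_COUNTRIES),
}
BASELINE = {"completeness": 0.9, "uniqueness": 0.75, "validity": 0.5}  # assumed earlier run

print(postal_profile["patterns"])
print(measures)
print(regressions(BASELINE, measures))
```

The pattern frequencies surface the mis-formatted postal code without anyone having to write a rule for it first, while the ratio measures give numbers that can be stored as a baseline and compared on each new data delivery.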