6. Information is a significant raw material for businesses around the world.
7. Making data-based decisions
• Wrong information leads to wrong decisions
Information as products
• Bad information makes for poor, unimpressive products
Information for logistics
• Logistics failures caused by bad data can shut a company down
8. Gathering Data from Humans
Paper forms
• Spelling mistakes
• Unclear questions
• Respondents give only the bare minimum of information
• OCR errors when forms are scanned
Web forms
• Users find ways to bypass validation filters
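Server-side checks can catch some of the values people type just to get past a web form. The sketch below is a minimal illustration, assuming a hypothetical `PLACEHOLDERS` list and a simplified UK postcode pattern (not the full Royal Mail format); `validate_field` is likewise a made-up helper.

```python
import re

# Hypothetical values people commonly type to get past required fields.
PLACEHOLDERS = {"n/a", "na", "none", "test", "xxx", "asdf", "-", "."}

# Simplified UK postcode shape (outward code, optional space, inward code).
# An assumption for illustration, not the complete official specification.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def validate_field(name: str, value: str) -> list[str]:
    """Return a list of problems found with one submitted form field."""
    problems = []
    cleaned = value.strip()
    if not cleaned:
        problems.append(f"{name}: empty")
    elif cleaned.lower() in PLACEHOLDERS:
        problems.append(f"{name}: placeholder value {cleaned!r}")
    # Shape check only applies to the postcode field, and only if something was typed.
    if name == "postcode" and cleaned and not POSTCODE_RE.match(cleaned.upper()):
        problems.append(f"{name}: not a valid postcode shape")
    return problems


print(validate_field("surname", "xxx"))
print(validate_field("postcode", "SW1A 1AA"))
```

Filters like this only raise the bar: a determined user can still type a plausible but false value, which is why data gathered from humans still needs cleansing later.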
9. Tainting Existing Data
Changes in procedures
• Older data was not updated to match the new procedures
• Different data structures
• Different ways of handling data
Importing data from other (bad) sources
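When a procedure change leaves old and new records in different structures, one option is to map both onto a single shape before comparing them. A minimal sketch, assuming two hypothetical record layouts (an old one with a single name field and DD/MM/YYYY dates, a new one with split fields and ISO dates); the field names and `normalise_record` are illustrative only.

```python
from datetime import date, datetime

MISSING_DOB = "00/00/0000"  # placeholder some records use for an unknown DOB


def normalise_record(rec: dict) -> dict:
    """Map an old-style or new-style customer record onto one structure.

    Hypothetical layouts, for illustration only:
      old: {"name": "MARK MADANES", "dob": "05/10/1963"}
      new: {"forename": "Mark", "surname": "Madanes", "birth_date": "1963-10-05"}
    """
    if "name" in rec:
        # Old layout: one name field (surname assumed to be the last word)
        # and a UK-style DD/MM/YYYY date string.
        forename, _, surname = rec["name"].strip().rpartition(" ")
        raw = rec.get("dob", MISSING_DOB)
        dob = None if raw == MISSING_DOB else datetime.strptime(raw, "%d/%m/%Y").date()
    else:
        # New layout: separate name fields and an ISO 8601 date (or nothing).
        forename, surname = rec["forename"], rec["surname"]
        raw = rec.get("birth_date")
        dob = date.fromisoformat(raw) if raw else None
    return {"forename": forename.upper(), "surname": surname.upper(), "dob": dob}


old = normalise_record({"name": "MARK MADANES", "dob": "05/10/1963"})
new = normalise_record({"forename": "Mark", "surname": "Madanes", "birth_date": "1963-10-05"})
print(old == new)
```

Treating the `00/00/0000` placeholder as "unknown" rather than parsing it avoids a crash and stops a fake date from being compared as if it were real.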
10. Some Industry Jargon
Single View of Customer
• Marketing Campaigns
Single Version of the Truth
• Strategy
Getting Correct Reports
14. How much is it worth?
• 30% ROI (figure from a big consultancy)
• 10-25% loss of revenue due to bad data quality
• Competitive advantage
• Avoid going out of business
15. MFI Group
Founded 1964
Upgraded its ERP systems in the early 2000s
Data-quality problems in 2004 led to:
• £46m in lost sales
• £16m in extra deliveries, plus technical costs
• £20m for the system itself
Went into administration in 2008
(Comeback in 2010)
16. Recap
Data Quality is a big subject
Avoid embarrassing mistakes
Keep company running efficiently
Produces reliable reports
17. What Deduplication is used for
Increasing data quality
Compressing data
Data must be cleansed first (pre-stage cleansing)
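Why cleansing has to come first can be shown with exact-match deduplication: records that differ only in spacing or case are missed unless they are normalised beforehand. A minimal sketch with made-up sample rows; `cleanse` and `deduplicate` are hypothetical helpers, and real cleansing would do far more than trimming and upper-casing.

```python
def cleanse(value: str) -> str:
    """Pre-stage cleansing: trim, collapse internal whitespace, upper-case."""
    return " ".join(value.split()).upper()


def deduplicate(records: list[dict], keys: tuple[str, ...]) -> list[dict]:
    """Keep the first record for each distinct combination of cleansed key fields."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(cleanse(rec.get(k, "")) for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up sample data: the first two rows are the same person typed differently.
rows = [
    {"forename": "Adam", "surname": "Smith", "postcode": "AB1 2CD"},
    {"forename": " ADAM ", "surname": "smith", "postcode": "AB1  2CD"},
    {"forename": "Eve", "surname": "Smith", "postcode": "AB1 2CD"},
]
print(len(deduplicate(rows, ("forename", "surname", "postcode"))))
```

Without the `cleanse` step the second row would survive as a separate customer; with it, three rows become two.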
19. Address Matching
Reference databases
• Royal Mail Postcode Address File (PAF)
• Council address data
• Build your own
Fill in missing parts
House Number, Building Number, House Name,
Flat Number, Company Name, Street, Locality,
Town, City, County, Country and Postcode
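Filling in missing parts usually means looking the address up in a reference source and copying across only the fields that are blank. A minimal sketch, assuming a hypothetical `REFERENCE` dictionary keyed by postcode and building number that stands in for PAF or council data; the address values are invented.

```python
# Hypothetical reference data standing in for PAF or council address data.
REFERENCE = {
    ("AB1 2CD", "12"): {"street": "HIGH STREET", "town": "EXAMPLETON",
                        "county": "EXAMPLESHIRE", "country": "UNITED KINGDOM"},
}


def fill_missing(address: dict) -> dict:
    """Fill blank fields from the reference entry; never overwrite supplied values."""
    # Normalise the postcode so "ab1  2cd" finds the "AB1 2CD" entry.
    postcode = " ".join(address.get("postcode", "").upper().split())
    key = (postcode, address.get("building_number", "").strip())
    filled = dict(address)
    for field, value in REFERENCE.get(key, {}).items():
        if not filled.get(field):
            filled[field] = value
    return filled


print(fill_missing({"building_number": "12", "postcode": "ab1 2cd", "street": ""}))
```

Filling rather than overwriting keeps whatever the customer supplied; a value that conflicts with the reference (say, a different town) would need its own rule, such as flagging it for review.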
20. Name Matching
Name, Full name
Forename, First name
Last name, Surname
Initial
Middle name(s)
Title, Suffix
Qualification
Lord James Jonah William Smith 3rd
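Before names can be matched they usually have to be split into these parts. A minimal sketch, assuming Western name order, a single-word surname, and small hypothetical `TITLES` and `SUFFIXES` lists; qualifications and multi-word surnames such as "Mac Donald" would need further rules.

```python
# Hypothetical, deliberately small lists; real systems need far longer ones.
TITLES = {"MR", "MRS", "MS", "MISS", "DR", "LORD", "LADY", "SIR"}
SUFFIXES = {"JR", "SR", "2ND", "3RD", "II", "III"}


def parse_name(full_name: str) -> dict:
    """Split a full name into title, forename, middle names, surname and suffix.

    Assumes forename first, surname last, and a one-word surname.
    """
    tokens = full_name.replace(".", "").split()
    parts = {"title": "", "forename": "", "middle": [], "surname": "", "suffix": ""}
    if tokens and tokens[0].upper() in TITLES:
        parts["title"] = tokens.pop(0)
    if tokens and tokens[-1].upper() in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if tokens:
        parts["surname"] = tokens.pop()
    if tokens:
        parts["forename"] = tokens.pop(0)
    parts["middle"] = tokens  # whatever is left between forename and surname
    return parts


print(parse_name("Lord James Jonah William Smith 3rd"))
```

For the example above this yields title "Lord", forename "James", middle names "Jonah" and "William", surname "Smith" and suffix "3rd".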
21. SQL example
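SELECT c1.*, c2.*
FROM customers c1
INNER JOIN customers c2
  ON c1.address_id = c2.address_id
 AND c1.customer_id < c2.customer_id  -- assumes a unique id column; drops self-matches and mirrored pairs
WHERE c1.surname = c2.surname
  AND c1.forename = c2.forename
  -- middle names agree, or at least one is blank (portable; XOR is MySQL-only)
  AND (c1.middlename = c2.middlename
       OR c1.middlename = ''
       OR c2.middlename = '');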
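To try this kind of self-join without a MySQL server, the sketch below runs a portable version against an in-memory SQLite database using Python's standard `sqlite3` module. The `customer_id` column and the sample rows are assumptions made for illustration; `OR` stands in for the MySQL-only `XOR`, and `c1.customer_id < c2.customer_id` stops each row matching itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- hypothetical unique id
    address_id  INTEGER,
    forename    TEXT,
    middlename  TEXT,
    surname     TEXT
);
INSERT INTO customers (address_id, forename, middlename, surname) VALUES
    (1, 'ADAM', '',  'SMITH'),
    (1, 'ADAM', 'E', 'SMITH'),
    (1, 'ADAM', 'J', 'SMITH'),
    (2, 'EVE',  '',  'JONES');
""")

# Candidate duplicates: same address and names; middle names equal or one blank.
pairs = conn.execute("""
SELECT c1.customer_id, c2.customer_id
FROM customers c1
JOIN customers c2
  ON c1.address_id = c2.address_id
 AND c1.customer_id < c2.customer_id
WHERE c1.surname = c2.surname
  AND c1.forename = c2.forename
  AND (c1.middlename = c2.middlename
       OR c1.middlename = ''
       OR c2.middlename = '')
ORDER BY 1, 2
""").fetchall()
print(pairs)
```

Rows 2 and 3 (middle names E and J) each pair with row 1 but not with each other, so grouping the pairs transitively would merge two different people; candidate pairs like these still need a review step.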
22.
Title | Forename | Middle | Surname | DOB
MR | MARK | | MADANES | 05/10/1963
MR | MARK | | MADANES | 04/10/1963
23.
Title | Forename | Middle | Surname | DOB
MR | CIARAN | GERARD | O’NEILL | 26/07/1971
MR | CIARAN | M | O’NEILL | 26/07/1971
24.
Title | Forename | Middle | Surname | DOB
MS | JAN | | PHILMORE | 15/10/1954
MR | JAN | | PHILMORE | 00/00/0000
25.
Title | Forename | Middle | Surname | DOB
MR | ALBERTO | | CARLOS | 00/00/0000
MR | ALBERT | O | CARLOS | 00/00/0000
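Each of these record pairs differs in a characteristic way, and a comparison function can name the difference so that a rule can be attached to it. A minimal sketch: the record pairs are the ones listed above, `00/00/0000` is treated as an unknown DOB, and the difference labels and helper names (`compare`, `rec`, `parse_dob`) are hypothetical.

```python
from datetime import datetime

MISSING_DOB = "00/00/0000"  # placeholder used when the DOB is unknown


def parse_dob(value: str):
    """Parse a DD/MM/YYYY date; return None for the placeholder."""
    if value == MISSING_DOB:
        return None
    return datetime.strptime(value, "%d/%m/%Y").date()


def compare(a: dict, b: dict) -> list[str]:
    """Describe how two records differ (hypothetical labels, not a standard)."""
    notes = []
    if a["title"] != b["title"]:
        notes.append("title differs")
    if a["surname"] != b["surname"]:
        notes.append("surname differs")
    if a["forename"] != b["forename"]:
        # ALBERTO + '' vs ALBERT + 'O': same letters, split at a different point.
        if a["forename"] + a["middle"] == b["forename"] + b["middle"]:
            notes.append("forename/middle split differently")
        else:
            notes.append("forename differs")
    elif a["middle"] != b["middle"]:
        ma, mb = a["middle"], b["middle"]
        if not ma or not mb:
            notes.append("middle name missing on one side")
        elif ma[0] == mb[0] and min(len(ma), len(mb)) == 1:
            notes.append("middle initial matches full middle name")
        else:
            notes.append("middle names conflict")
    da, db = parse_dob(a["dob"]), parse_dob(b["dob"])
    if (da is None) != (db is None):
        notes.append("DOB missing on one side")
    elif da is not None and da != db:
        if abs((da - db).days) == 1:
            notes.append("DOB differs by one day")
        else:
            notes.append("DOB conflicts")
    return notes


def rec(title, forename, middle, surname, dob):
    return {"title": title, "forename": forename, "middle": middle,
            "surname": surname, "dob": dob}


print(compare(rec("MR", "MARK", "", "MADANES", "05/10/1963"),
              rec("MR", "MARK", "", "MADANES", "04/10/1963")))
```

For the four pairs this reports a one-day DOB difference, conflicting middle names (GERARD vs M), a title change plus a missing DOB, and a forename/middle name split; deciding which of these still count as the same person is the job of the business rules.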
30. Business Rules
Examples
• Middle name: Adam Smith vs. Adam E Smith
• Title: Miss vs. Ms vs. Lady
• Initial: A Smith vs. Adam Smith (same address)
• Surnames: O`Brien vs. O'Brien vs. O’Brien
• Surname prefixes: McDonald vs. Mc Donald vs. Mac Donald
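Rules like these can be expressed as small normalisation and compatibility functions. A minimal sketch under stated assumptions: the `TITLE_GROUPS` mapping is hypothetical, treating "Mac" as equal to "Mc" is a business-rule choice rather than a universal fact, and all helper names are illustrative.

```python
import re

# Backtick, left and right curly quotes, and the ASCII apostrophe.
APOSTROPHES = re.compile("[`\u2018\u2019']")

# Hypothetical grouping: titles in the same group are treated as compatible.
TITLE_GROUPS = {"MISS": "F", "MS": "F", "MRS": "F", "LADY": "F",
                "MR": "M", "SIR": "M", "LORD": "M"}


def surname_key(surname: str, mac_equals_mc: bool = True) -> str:
    """Normalise a surname: O`Brien/O'Brien/O’Brien and McDonald/Mc Donald/Mac Donald."""
    s = APOSTROPHES.sub("", surname.upper())
    s = re.sub(r"^(MAC|MC)\s+", r"\1", s)  # close the gap after the prefix
    if mac_equals_mc:
        # Business-rule choice: fold MAC into MC. This also rewrites names such
        # as MACHIN, so it can cause false matches (MACHIN vs MCHIN).
        s = re.sub(r"^MAC", "MC", s)
    return s


def titles_compatible(t1: str, t2: str) -> bool:
    """Blank or unknown titles never block a match; known titles must share a group."""
    g1, g2 = TITLE_GROUPS.get(t1.upper()), TITLE_GROUPS.get(t2.upper())
    return g1 is None or g2 is None or g1 == g2


def names_compatible(n1: str, n2: str) -> bool:
    """Exact match, or one side is a single initial of the other (A vs ADAM)."""
    a, b = n1.upper().rstrip("."), n2.upper().rstrip(".")
    if a == b:
        return True
    if min(len(a), len(b)) == 1:
        return a[:1] == b[:1]
    return False


def middles_compatible(m1: str, m2: str) -> bool:
    """A blank middle name never conflicts (Adam Smith vs Adam E Smith)."""
    return not m1 or not m2 or names_compatible(m1, m2)


print(surname_key("Mac Donald"), surname_key("O`Brien"), titles_compatible("Miss", "Ms"))
```

The initial rule in the slide only applies at the same address, so `names_compatible` is meant to be called on records that have already been paired by address.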
31. Things to Watch Out for
Fathers and sons (or mothers and daughters) with the same name
Twins sharing a DOB
An initial used in place of a forename
Forename and middle name mixed up or split differently
Surname changes after marriage
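One way to respect these cases is a conservative decision step that can answer "review" instead of forcing a merge. A minimal sketch of a hypothetical policy: records are dicts with `forename`, `middle`, `surname`, `dob` ('' if unknown) and `suffix` keys, and are assumed to share an address already; the thresholds and outcomes are illustrative, not a recommended standard.

```python
def merge_decision(a: dict, b: dict) -> str:
    """Return 'merge', 'review' or 'keep separate' for two candidate records.

    Hypothetical, conservative policy; both records are assumed to be at the
    same address.
    """
    # Father/son, mother/daughter: a generation suffix or a DOB tells them apart.
    if a.get("suffix", "") != b.get("suffix", ""):
        return "keep separate"
    if a["dob"] and b["dob"] and a["dob"] != b["dob"]:
        return "keep separate"  # strict; a one-day gap might deserve review instead
    fa, fb = a["forename"], b["forename"]
    # Forename and middle name mixed up: ALBERTO vs ALBERT O.
    if fa != fb and fa + a.get("middle", "") == fb + b.get("middle", ""):
        return "review"
    # Different initials, or different full forenames (e.g. twins sharing a DOB).
    if fa[:1] != fb[:1] or (len(fa) > 1 and len(fb) > 1 and fa != fb):
        return "keep separate"
    # Possible surname change after marriage: everything else agrees.
    if a["surname"] != b["surname"]:
        return "review"
    # A bare initial could belong to either twin or either generation.
    if len(fa) == 1 or len(fb) == 1:
        return "review"
    return "merge"


base = {"forename": "JAMES", "middle": "", "surname": "SMITH",
        "dob": "01/01/1950", "suffix": ""}
print(merge_decision(base, {**base, "surname": "JONES"}))
```

Sending ambiguous pairs to a human reviewer costs effort, but wrongly merging two real people is usually harder to undo than leaving a duplicate in place.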