With staff working from home, many institutions are prioritizing data quality projects. Join Chad Petrovay, TMS Administrator at The Morgan Library & Museum, as he shares his deep knowledge of data scrubbing. Power users, system administrators, and SQL experts will learn how to correct and monitor data quality and will be introduced to new free and low-cost tools.
How Clean is your Database? Data Scrubbing for all Skill Sets
1. Guest Presentation
HOW CLEAN IS YOUR DATABASE?
DATA SCRUBBING FOR ALL SKILL SETS (2020 EDITION)
CHAD PETROVAY, TMS ADMINISTRATOR
THE MORGAN LIBRARY & MUSEUM
2. Data Quality
Data quality is a measure of the condition of data
based on factors such as accuracy, completeness,
consistency, reliability and whether it’s up to date.
5. What is your personal skill level?
Power User
Uses the TMS UI;
has expanded rights,
but not full rights.
SQL Expert
Wait? TMS has a UI?
Nah, I’ll just script it in the
database.
Administrator
Full rights in TMS and
access to DB Config.
10. Institution
Standards
• Establish the rules for data entry
• Conceptualize terms and
authority values
Prevents
• Data entry errors
• Formatting errors
• Inconsistency
• Creativity
11. Institution
Training
• Makes the system approachable
• Improves user efficiency
Prevents
• Data entry errors
• Unmanaged data silos (Excel)
12. Power User
Spell Check
Uses the Spelling and Grammar
engine in Microsoft Office.
Prevents:
• Typographical errors
• Misspellings
• Punctuation errors
• Grammatical errors
13. Power User
Function Keys
Reduces keystrokes when entering
repeated text.
Prevents:
• Typographical errors
• Misspellings
• Punctuation errors
• Grammatical errors
• Formatting errors
14. System Admin
Customize Field Labels
• Clarify field usage
• Makes system more intuitive
• Align field labels with your
institutional lingo
Prevents
• Confusion
15. System Admin
Customize Field Labels
In Database Configuration
1. Manage » Tables/Columns
2. Find the table
3. Find the column (i.e. field)
4. Right-click » Edit
5. Change Local Column Name
16. System Admin
Security Groups
• If your institution does not use a
field, then restrict access
• Restrict control of authority values
to select power users
• Text Types & Term Types
Prevents
• Populating obsolete fields
• Creativity
18. System Admin
Usage Report
In TMS Module
1. Maintenance » Authorities » Others
2. Usage Report
3. After report generates:
• Browse
• Print
• Edit as RTF
• Save As RTF
19. System Admin
Frequency Report
In Database Configuration
1. Manage » Tables/Columns
2. Find the table
3. Find the column (i.e. field)
4. Right-click » Frequency
5. Save TXT file
20. Power User
Crystal Reports
In TMS Module
1. Report » Reports
2. Find report by name
3. Click Run
When creating the report:
1. Add formula “reporttype”
2. "NOTLINKED"
21. SQL
Distinct Values: SQL
A SQL query will return all
records, including:
• Departments you cannot see
• Template records
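-- List each distinct value once
SELECT DISTINCT ObjectName
FROM Objects

-- Count how many records use each value
SELECT ObjectName, COUNT(*) AS RecordCount
FROM Objects
GROUP BY ObjectName
-- Optionally append one filter:
-- HAVING COUNT(*) = 1   -- values used only once
-- HAVING COUNT(*) > 1   -- values used more than once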
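The counting queries can be tried safely away from production. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The in-memory Objects table and its sample rows are invented for illustration, and SQL Server collation settings may group values differently.

```python
import sqlite3

# In-memory stand-in for the TMS database; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Objects (ObjectID INTEGER PRIMARY KEY, ObjectName TEXT)")
conn.executemany(
    "INSERT INTO Objects (ObjectName) VALUES (?)",
    [("Drawing",), ("Drawing",), ("Drawng",), ("Manuscript",), ("Print",), ("Print",)],
)

def value_counts(having=""):
    """Return (value, count) pairs, optionally filtered by a HAVING clause."""
    sql = (
        "SELECT ObjectName, COUNT(*) AS n FROM Objects "
        f"GROUP BY ObjectName {having} ORDER BY n DESC, ObjectName"
    )
    return conn.execute(sql).fetchall()

all_counts = value_counts()
singletons = value_counts("HAVING COUNT(*) = 1")  # used once: possible typos
repeated = value_counts("HAVING COUNT(*) > 1")    # used more than once

for name, n in all_counts:
    print(f"{n:>3}  {name}")
```

Values that appear only once (here the misspelled "Drawng") are often the first candidates for review.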
22. Power User
Distinct Values: Excel Pivot Table
Is the field in a List View?
Can you export your result set?
1. Export into Excel
2. Copy column into a new sheet
3. Create column “Count”
4. Fill “Count” with 1
5. Create a Pivot Table
Tutorial
• bit.ly/3d4M8Ou
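The pivot-table steps amount to counting how often each value appears in one exported column. This is a rough Python equivalent, assuming the list view was exported as CSV. The column names and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Stand-in for a file exported from a list view; real exports will have other
# columns. For a real file, use open("export.csv", newline="") instead.
exported = io.StringIO(
    "ObjectNumber,Medium\n"
    "1991.1,Oil on canvas\n"
    "1991.2,Oil on canvas\n"
    "1991.3,oil on canvas\n"
    "1991.4,Watercolor\n"
)

def count_column(handle, column):
    """Count occurrences of each value in one column of a CSV file."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)

medium_counts = count_column(exported, "Medium")
for value, n in medium_counts.most_common():
    print(f"{n:>3}  {value}")
```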
23. Power User
OpenRefine
Install OpenRefine
• Download at www.openrefine.org
• Extract archive
• Execute openrefine.exe
• Opens in your web browser
Requires Java
• www.java.com/en/download/
24. Power User
OpenRefine: Facets
• A Facet shows a value
distribution
• Filter records
• Batch change
• Facets
• Word Facet
• Text-Length Facet
• Null / Empty String / Blank
Facets
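To make the facet idea concrete, here is a small Python sketch of what each facet type counts. It is not how OpenRefine is implemented, and the sample values are made up.

```python
from collections import Counter

# Made-up sample values for a single column.
values = ["Oil on canvas", "Oil on panel", "", None, "Ink", "Oil on canvas"]

def text_facet(rows):
    """Distribution of whole values."""
    return Counter(v for v in rows if v)

def word_facet(rows):
    """Distribution of individual words across all values."""
    return Counter(word for v in rows if v for word in v.split())

def text_length_facet(rows):
    """Distribution of value lengths, useful for spotting truncated text."""
    return Counter(len(v) for v in rows if v)

def blank_facet(rows):
    """Separate nulls, empty strings, and populated values."""
    return Counter(
        "null" if v is None else "empty string" if v == "" else "has value"
        for v in rows
    )

print(text_facet(values).most_common())
print(word_facet(values).most_common(3))
print(blank_facet(values))
```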
26. Power User
DataCleaner
Install Community Edition
• Download at www.datacleaner.org
• Extract archive
• Execute DataCleaner.exe
Requires Java
• www.java.com/en/download/
27. Power User
DataCleaner: DataStore
If your server uses NT Authentication:
• Add SQL user to the database
Create a datastore in DataCleaner:
1. Select Microsoft SQL Server
2. Supply details
• Hostname = Server name
• Database = TMS
• Username & Password
29. Power User
DataCleaner: Building Job
• Drag database elements and
components onto the canvas
• Best to drag columns instead
of full tables/views
• Use filter to exclude NULLs
and empty strings
30. Power User
DataCleaner: Results
• String Analysis
• Row Count
• Null/Blank Count
• All upper/lower count
• Char/Word count
• Max/Min/Avg char count
• Max/Min/Avg space count
• Max/Min word count
• Click arrow for details
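A subset of these String Analysis measures can be approximated in a few lines of Python. This sketch only shows what the numbers mean. It is not DataCleaner's code, and the sample values are invented.

```python
def string_analysis(rows):
    """Compute a few string metrics for one column of values."""
    filled = [v for v in rows if v is not None and v.strip() != ""]
    char_counts = [len(v) for v in filled]
    word_counts = [len(v.split()) for v in filled]
    return {
        "row_count": len(rows),
        "null_or_blank_count": len(rows) - len(filled),
        "all_uppercase_count": sum(v.isupper() for v in filled),
        "all_lowercase_count": sum(v.islower() for v in filled),
        "max_chars": max(char_counts, default=0),
        "min_chars": min(char_counts, default=0),
        "avg_chars": sum(char_counts) / len(char_counts) if char_counts else 0,
        "max_words": max(word_counts, default=0),
        "min_words": min(word_counts, default=0),
    }

# Hypothetical credit line values, including a null and a blank.
sample = ["Gift of John Doe", "PURCHASE", "bequest", None, "  ", "Gift of Jane Roe"]
report = string_analysis(sample)
for metric, value in report.items():
    print(f"{metric:<22}{value}")
```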
31. Power User
DataCleaner: Results
• Value Distribution
• Total count
• Distinct count
• List of distinct values
(except uniques)
• Graphical rank-size of distinct
values
• Click arrow for details
32. Power User
DataCleaner: Results
• Pattern Finder
• A = Uppercase letter
• a = Lowercase letter
• # = Number
• ? = AlphaNumeric
• Graphical rank-size of distinct
patterns
• Click arrow for details
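The pattern notation is easy to reproduce for a quick check outside DataCleaner. This sketch maps values character by character. DataCleaner's own token grouping may differ, and the "?" symbol for mixed alphanumeric tokens is not modeled here.

```python
from collections import Counter

def to_pattern(value):
    """Replace letters and digits with class symbols; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("#")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)

# Invented phone numbers in mixed formats.
phones = ["(646) 733-2239", "(510) 652-8950", "646.733.2239", "+44 (0)207379 8188"]
pattern_counts = Counter(to_pattern(v) for v in phones)

for pattern, n in pattern_counts.most_common():
    print(f"{n:>3}  {pattern}")
```

Rare patterns at the bottom of the ranking usually point to the records that need reformatting.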
33. Power User
Data Quality Services (DQS)
• Knowledge Base
• Projects
• Cleansing
• Matching
• Bundled with SQL Server
• Enterprise Edition
• Developer Edition
• Only works with local
databases
34. PLANNING
“To achieve great things, two things are needed:
a plan, and not quite enough time.” –Leonard Bernstein
39. The Three Modes
Human Middleware
Direct human contact with UI
Usually Record-by-Record
Labor intensive
Automation
Requires additional
tools/services/platforms
Steeper learning curve
Artificial Intelligence
SQL Script
Change one or more records
through the back-end
Requires intimate knowledge of
database structure
40. Human Middleware
Finding Records by Pattern
• Query using wildcards:
• single character (?)
• multi-character (*)
• Wrap sequences with
double quotes
Format                     TMS Search
(646) 733-2239             "(???) ???-????"
646.733.2239               *???.???.????*
+44 (0)207379 8188         +*
(510) 652-8950 ext 223     "* ext*"
Chad M.                    "* ?."
Cheryl & Edward            *&*
Cheryl and Edward          "* and *"
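Wildcard patterns like these can be previewed against an exported list before searching. This sketch uses Python's fnmatch module, which uses the same ? and * wildcards. It is only an approximation: fnmatch also treats square brackets as character classes, and the actual TMS search may differ in details such as case sensitivity.

```python
from fnmatch import fnmatchcase

def tms_style_match(value, pattern):
    """Approximate ?/* wildcard matching; surrounding double quotes are dropped."""
    return fnmatchcase(value, pattern.strip('"'))

# (value, search pattern) pairs taken from the slide.
examples = [
    ("(646) 733-2239", '"(???) ???-????"'),
    ("646.733.2239", "*???.???.????*"),
    ("+44 (0)207379 8188", "+*"),
    ("(510) 652-8950 ext 223", '"* ext*"'),
    ("Chad M.", '"* ?."'),
    ("Cheryl & Edward", "*&*"),
    ("Cheryl and Edward", '"* and *"'),
]
results = [tms_style_match(value, pattern) for value, pattern in examples]

for (value, pattern), matched in zip(examples, results):
    print(f"{value:<26}{pattern:<20}{matched}")
```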
41. Human Middleware
Search and Replace (String)
In Objects Module
1. Maintenance » Database »
Search and Replace
2. Select Module/Table/Column
3. Provide search and replace terms
4. Review results
• Replace All
• Replace
• Skip
42. System Admin
Search and Replace (Thesaurus)
In Database Configuration
1. Edit » Search and Replace »
Linked Thesaurus Terms
2. Click Zoom button (…) to find source term
3. Click Zoom button (…) to find target term
4. Click OK and confirm
43. Human Middleware
Merge Constituents Utility
In Plugins folder
1. Search for duplicate constituents
2. Select candidates from the
suggestions
3. Click Next
Feature Idea: Constituent Packages!!
44. Human Middleware
Merge Constituents Utility
4. Set Target record
• Right-click » Merge to this
5. Edit data in the columns of
the grid
6. Go section by section and
select the data to keep
7. Ready to merge?
• File » Merge
8. Save an XML file
45. SQL
Updating with SQL
• Know the system
• Test the SQL script in a sandbox
environment first
• Back up your database
before running a SQL script
• Consider converting frequently used
scripts into Stored Procedures
• Gallery Systems may not be able to
provide support
46. SQL
Finding Records with Patterns
• Use a LIKE Statement
• Query using wildcards:
• single character (_)
• multi-character (%)
Format                     LIKE Pattern
(646) 733-2239             (___) ___-____
646.733.2239               %___.___.____%
+44 (0)207379 8188         +%
(510) 652-8950 ext. 223    % ext%
Chad M.                    % _.
Cheryl & Edward            %&%
Cheryl and Edward          % and %
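The LIKE patterns can be checked on sample data, with the underscores written contiguously. This sketch uses Python's sqlite3 module as a stand-in for SQL Server, and the Samples table is hypothetical. Both engines treat _ and % the same way, but SQL Server's LIKE also gives square brackets a special meaning, and case sensitivity depends on collation.

```python
import sqlite3

# Hypothetical table holding the sample values from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Samples (Value TEXT)")
conn.executemany(
    "INSERT INTO Samples VALUES (?)",
    [
        ("(646) 733-2239",),
        ("646.733.2239",),
        ("+44 (0)207379 8188",),
        ("(510) 652-8950 ext. 223",),
        ("Chad M.",),
        ("Cheryl & Edward",),
        ("Cheryl and Edward",),
    ],
)

def find_like(pattern):
    """Return every value matching a LIKE pattern."""
    rows = conn.execute(
        "SELECT Value FROM Samples WHERE Value LIKE ? ORDER BY Value", (pattern,)
    )
    return [row[0] for row in rows]

for pattern in ["(___) ___-____", "%___.___.____%", "+%", "% ext%", "% _.", "%&%", "% and %"]:
    print(f"{pattern:<18}{find_like(pattern)}")
```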
47. SQL
Excel Trick
If you have data in Excel
1. Create a SQL script using a
CONCATENATE formula
2. Copy the formula down the
column
3. Select and copy the column
4. Paste the content in SSMS
5. Execute
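The same idea works outside Excel: build one statement per row, then paste the result into SSMS. This is a minimal sketch. The ObjectID key column and the sample rows are assumptions for illustration. Doubling embedded single quotes keeps apostrophes from breaking the script, which a bare CONCATENATE formula does not handle on its own.

```python
def sql_literal(text):
    """Quote a string for T-SQL by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"

def build_update(object_id, new_name):
    """Build one UPDATE statement for one spreadsheet row."""
    return (
        f"UPDATE Objects SET ObjectName = {sql_literal(new_name)} "
        f"WHERE ObjectID = {int(object_id)};"
    )

# Hypothetical rows, as they might be copied from a spreadsheet.
rows = [(101, "Drawing"), (102, "Artist's book")]
script = "\n".join(build_update(object_id, name) for object_id, name in rows)
print(script)
```

Review the generated script and run it in a sandbox before executing it against production.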
48. SQL
My Stored Procedure
Stored Procedure
• @ColumnID = Identifies the field
• Get the ColumnID from the
Data Dictionary
• @PK = Primary key for the record
• @NewValue = the value you want
• @LoginID = your username
EXECUTE [dbo].[MLM_UpdateFieldValue]
@ColumnID = 1243, @PK = 273469,
@NewValue = 'Gift of John Doe',
@LoginID = 'cpetrovay';
EXECUTE [dbo].[MLM_UpdateFieldValue]
@ColumnID = 1228, @PK = 273469,
@NewValue = 'Loaned Object',
@LoginID = 'cpetrovay';
49. SQL
My Stored Procedure
Process:
• Truncates new value if too long
• Looks up authority key values
• Updates only when value changes
• Tracks change in Audit Trail
Available:
• github.com/cpetrovay/TMS_UpdateField_SP/
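The process steps can be illustrated with a toy version. This is not the stored procedure from the repository, only a sketch of the same pattern using Python's sqlite3 module. The CreditLine column, the AuditLog table, and the 64-character limit are hypothetical, and the authority lookup is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Objects (ObjectID INTEGER PRIMARY KEY, CreditLine TEXT);
    CREATE TABLE AuditLog (ObjectID INTEGER, OldValue TEXT, NewValue TEXT, LoginID TEXT);
    INSERT INTO Objects VALUES (273469, 'Gift of J. Doe');
""")

MAX_LENGTH = 64  # hypothetical column width

def update_field_value(pk, new_value, login_id):
    """Update one field; return True only if the stored value changed."""
    new_value = new_value[:MAX_LENGTH]  # truncate if too long
    row = conn.execute(
        "SELECT CreditLine FROM Objects WHERE ObjectID = ?", (pk,)
    ).fetchone()
    if row is None or row[0] == new_value:  # missing record or nothing to change
        return False
    with conn:  # commit the update and its audit entry together
        conn.execute(
            "UPDATE Objects SET CreditLine = ? WHERE ObjectID = ?", (new_value, pk)
        )
        conn.execute(
            "INSERT INTO AuditLog VALUES (?, ?, ?, ?)", (pk, row[0], new_value, login_id)
        )
    return True

first_change = update_field_value(273469, "Gift of John Doe", "cpetrovay")
repeat_change = update_field_value(273469, "Gift of John Doe", "cpetrovay")
audit_rows = conn.execute("SELECT OldValue, NewValue, LoginID FROM AuditLog").fetchall()
print(first_change, repeat_change, audit_rows)
```

Skipping unchanged values keeps the audit log free of entries that record no real edit.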
54. Automation
Database Mail
• Sends email to user when
predefined criteria are met
• Requires configuration in
SQL Server
• Set up by SQL Expert
55. Automation
SSRS Subscription
• Sends email to user when
predefined criteria are met
• Requires configuration in
SSRS Server
• Set up by Report Writer
57. “While few things in life are guaranteed,
it is safe to say that not addressing data quality
issues this year means you’ll be facing the
same issues next year,
likely on a larger scale.”
-BO CRADER (sgENGAGE)