1. Taxonomy Management, Automatic Metadata Tagging & Auto-Classification in SharePoint
Juan J. Celaya
COMPU-DATA International, LLC
jcelaya@cdlac.com
Twitter: JuanJCelaya
www.cdlac.com
2. Welcome to Houston SharePoint Saturday
Thank you for being a part of the first-ever SharePoint Saturday for the greater Houston area!
• Please turn off all electronic devices or set them to vibrate.
• If you must take a phone call, please do so in the hall so as not to disturb others.
• Thanks to our Platinum Sponsors:
3. Automatic - Metadata Tagging & Classification
COMPU-DATA International, LLC (“CDI”) specializes in Enterprise Content Management (ECM), Information Organization & Access (IOA) and Business Process Automation (BPA), delivering products, services and solutions in each of these areas.
What we solve
1. The inability to find relevant and precise information
2. Inefficiencies in manually tagging records and their effect on retention
3. The burden of organizing records from diverse repositories into a single interface
4. Problems with implementations that address the exposure of private or confidential business information
5. Failed implementations of distributed capture systems
6. Increasing costs due to failures in process automation
4. Automatic - Metadata Tagging & Classification
Taxonomy Management, Automatic Metadata Tagging & Auto-Classification in SharePoint®
In support of: Data Transparency, Security, Search Precision and Privacy
For solutions in: Information Management, Records Management and eDiscovery
7. Automatic - Metadata Tagging & Classification Background
DoD Directive 8320.02 Challenge
Data Sharing in a Net-Centric Department of Defense
Paragraph 4.2: “Data assets shall be made visible by creating and associating metadata (‘tagging’), including discovery metadata, for each asset.”
8. Automatic - Metadata Tagging & Classification Background
Drive resolution of capability gaps across a $6.9B HMO with 75 locations:
• Findability
• Records Management Declaration
• Data Privacy and Security
Research support for the Veterans Administration in relation to:
• Veteran Claims
• Post-Traumatic Stress Disorder
• Agent Orange
• Classified & Unclassified Information
10. Automatic - Metadata Tagging & Classification Problem
Capability Gap
Communication of Relevant Information
Access to Distributed Electronic Information
Security and Privacy
[Diagram: server content with appropriate tagging (metadata, records retention codes, and access rights management templates) distributed across Document Libraries 1–4]
11. Automatic - Metadata Tagging & Classification Problem
Capability Gap
Communication of Relevant Information
Access to Distributed Electronic Information
Security and Privacy
• Failure to Tag Content to Organizational Standards
• Lack of Information Transparency Becomes a Limiting Factor
• Increasing Volume of Sensitive Information Exposure Events
• Increased Failure to Find Information
• Non-Compliance with Records Management Policies
These Affect the Organization in That:
• Confidential & Sensitive Organizational Information Is Revealed
• Personally Identifiable Information (PII) and Protected Health Information (PHI) Data Exposure Events Occur
• The Time Gap between Information Requests & Discovery Is Directly Proportional to the Volume of Data Assets
• Information Is Not Preserved per Regulatory Guidelines
12. Automatic - Metadata Tagging & Classification Problem
Is it possible to tag every asset created?
Is it possible to properly tag every asset?
(Semantic, Records Retention and Security Metadata)
Not by a human being!
13. Automatic - Metadata Tagging & Classification Problem
Other Manual Metadata Tagging Problems
• Created from a subjective frame of reference
• May not be in line with corporate governance
• Limited use of templates to populate metadata
• Limits document transparency in an ECM environment
• Not cost-effective
• Ineffective
[Diagram: the ECM lifecycle stages Capture, Manage, Store, Preserve and Deliver, with an X between each stage]
15. Automatic - Metadata Tagging & Classification Solution
Using Metadata to Reduce Data Exposure Events,
Enforce Information & Records Management Policies
and Simultaneously Improve Search Precision
Automatically!
16. Automatic - Metadata Tagging & Classification Solution
Metadata Drives Content Types, Which Drive the Auto-Application of RMS Templates
• Automatically tag content upon upload to document libraries using organizational metadata: PII, PHI, and sensitive information
• Once tagged, Microsoft Office documents containing PII or PHI can be “locked down” using Windows Rights Management Services (RMS), and PDFs can be locked down using customized iFilters on the MOSS server
Content Types (based on):
• Records Retention Schedule Taxonomies
– COBRA
– Davis-Bacon Act
– Employee Polygraph Protection Act of 1988
– Equal Employment Opportunity Laws
– Employee Retirement Income Security Act
– Fair Credit Reporting Act
– Fair Labor Standards Act
– Family and Medical Leave Act
– Health Insurance Portability and Accountability Act
– Immigration Reform and Control Act
– Lilly Ledbetter Fair Pay Act
– National Labor Relations Act
– Omnibus Transportation Employee Testing Act
– Occupational Safety and Health Act
– Paperwork Reduction Act of 1980
– Sarbanes-Oxley Act
– Uniformed Services Employment and Reemployment Rights Act
– Walsh-Healey Act
• Data Privacy and Security Taxonomies
• Business Unit and Functional Taxonomies
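The upload-time tagging described on this slide can be sketched as a rule pass over a document's text. This is a minimal illustration only, assuming plain-text input and three hypothetical detection patterns (a U.S. Social Security number format, an e-mail address, and an invented “MRN” medical-record-number format). The content type names are also hypothetical, and the code does not call SharePoint or RMS; it only shows how a PII/PHI hit can select a content type that would, in turn, trigger a lock-down policy.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detection rules. A real deployment would use the organization's
# own PII/PHI definitions rather than these three illustrative patterns.
SENSITIVE_PATTERNS = {
    "PII:SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PII:Email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHI:MedicalRecordNumber": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


@dataclass
class UploadResult:
    tags: list[str] = field(default_factory=list)
    content_type: str = "General Document"   # hypothetical default content type
    requires_rms_lockdown: bool = False


def tag_on_upload(text: str) -> UploadResult:
    """Tag a document's text at upload time and decide whether it needs lock-down."""
    result = UploadResult()
    for tag, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(text):
            result.tags.append(tag)
    if result.tags:
        # Any PII/PHI hit routes the document to a (hypothetical) sensitive
        # content type, whose policy would apply an RMS template.
        result.content_type = "Sensitive Document"
        result.requires_rms_lockdown = True
    return result


if __name__ == "__main__":
    r = tag_on_upload("Patient MRN: 00123456, contact jdoe@example.com")
    print(r.tags, r.content_type, r.requires_rms_lockdown)
```

The point of routing through a content type, rather than locking the file directly in the detection code, is that the protection policy stays attached to the content type and is applied the same way to every document that lands there.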
17. Automatic - Metadata Tagging & Classification Solution
Metadata Drives Content Types, Which Drive Records Management
• Automatically tag documents with record policy codes based on the semantic content they contain
• Automate short-term storage and long-term data preservation
• Enforce rules without depending on end users
• Rules are applied in the same manner every time, regardless of the data source (consistency)
Content Types (based on): the Records Retention Schedule, Data Privacy and Security, and Business Unit and Functional taxonomies listed on slide 16
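One way to picture “rules applied in the same manner every time” is a lookup that depends only on a document's semantic category. The sketch below assumes the category has already been assigned by a classifier; the categories, policy codes and tier assignments are hypothetical placeholders, not retention requirements drawn from any of the laws listed on slide 16.

```python
from typing import NamedTuple


class RecordPolicy(NamedTuple):
    code: str          # hypothetical record policy code
    storage_tier: str  # "short-term" or "long-term", the two tiers named on the slide


# Hypothetical rule table: semantic category -> record policy.
# Codes and tier assignments are placeholders, not legal retention guidance.
POLICY_RULES = {
    "Payroll": RecordPolicy("HR-PAY", "long-term"),
    "Benefits Enrollment": RecordPolicy("HR-BEN", "long-term"),
    "Meeting Notes": RecordPolicy("GEN-MTG", "short-term"),
}
DEFAULT_POLICY = RecordPolicy("GEN-REVIEW", "short-term")


def assign_policy(category: str) -> RecordPolicy:
    """Return the record policy for a semantic category.

    The result depends only on the category, never on who uploaded the
    document or which repository it came from, so the rule is applied the
    same way every time.
    """
    return POLICY_RULES.get(category, DEFAULT_POLICY)


if __name__ == "__main__":
    # The same category arriving from different (hypothetical) sources
    # always receives the same policy.
    for source, category in [("file share", "Payroll"),
                             ("SharePoint library", "Payroll"),
                             ("e-mail", "Vendor Brochure")]:
        print(source, "->", assign_policy(category))
```

Keeping the source out of the function signature is the design choice that enforces consistency: there is no input through which an end user or repository could change the outcome.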
18. Automatic - Metadata Tagging & Classification Solution
Metadata Tagging Drives Search Precision
• Increase the value of unstructured and structured information
• Eliminate missing or inconsistent metadata tagging by end users
• Significantly decrease the costs associated with manual metadata tagging
• Enable workflow processes through automatic semantic metadata tagging into SharePoint content types
• Dramatically boost enterprise search precision
Content Types (based on): the Records Retention Schedule, Data Privacy and Security, and Business Unit and Functional taxonomies listed on slide 16
19. Automatic - Metadata Tagging & Classification Solution
Taxonomy Management: Enabling the Automatic Meta-tagging and Auto-Classification of Documents and Records
[Screenshot callout: manually created clues associated with the “Weather” metadata]
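Manually created clues are commonly used as weighted evidence for a concept. The sketch below shows a generic weighted-clue scoring scheme for “Weather”; the clue list, the weights and the threshold are hypothetical, and the source does not describe how Taxonomy Manager actually scores documents.

```python
import re

# Hypothetical manually created clues for the concept "Weather", with weights.
WEATHER_CLUES = {
    "windshear": 3.0,
    "thunderstorm": 3.0,
    "turbulence": 2.0,
    "icing": 2.0,
    "forecast": 1.0,
}
THRESHOLD = 4.0  # hypothetical minimum score for tagging a document "Weather"


def concept_score(text: str, clues: dict[str, float]) -> float:
    """Sum each clue's weight times the number of times it occurs (plural 's' allowed)."""
    lowered = text.lower()
    score = 0.0
    for clue, weight in clues.items():
        hits = len(re.findall(r"\b" + re.escape(clue) + r"s?\b", lowered))
        score += weight * hits
    return score


def classify(text: str) -> list[str]:
    """Return ["Weather"] when the clue evidence reaches the threshold, else []."""
    return ["Weather"] if concept_score(text, WEATHER_CLUES) >= THRESHOLD else []


if __name__ == "__main__":
    print(classify("Crews reported windshear and thunderstorms."))
    print(classify("The forecast meeting is Tuesday."))
```

A threshold keeps a single weak clue (such as “forecast” on its own) from tagging a document, while several strong clues together do.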
20. Automatic - Metadata Tagging & Classification Solution
Automatic Metadata Generation
Expand the clues using highly relevant content
[Screenshot callout: automatically generated clues associated with “Weather”]
21. Automatic - Metadata Tagging & Classification Solution
Automatic Metadata Generation
Linking relevant semantic data results in more actionable information
[Screenshot callout: highly relevant metadata generated by Taxonomy Manager, added to the original clue set for the concept “Weather”]
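Clue expansion can be illustrated with a simple co-occurrence heuristic: find terms that appear in documents matching the existing clues much more often than in other documents. This is a swapped-in technique for illustration (a smoothed lift ratio over document frequencies), not Taxonomy Manager's method; the seed clues are assumed to be lower-case, and the sample corpus is invented.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+")


def doc_terms(text: str) -> set[str]:
    """Lower-cased set of alphabetic terms in a document."""
    return set(TOKEN.findall(text.lower()))


def expand_clues(seed: set[str], corpus: list[str],
                 min_docs: int = 2, min_lift: float = 1.5, top_n: int = 5) -> list[str]:
    """Suggest candidate clues that co-occur with the (lower-case) seed clues.

    A term's lift is its document frequency among documents containing a seed
    clue, divided by its add-one-smoothed frequency in the remaining documents.
    Terms common everywhere score around 1 or lower and are dropped.
    """
    docs = [doc_terms(text) for text in corpus]
    relevant = [d for d in docs if d & seed]
    other = [d for d in docs if not d & seed]
    if not relevant:
        return []
    in_relevant = Counter(t for d in relevant for t in d)
    in_other = Counter(t for d in other for t in d)
    lift = {}
    for term, n in in_relevant.items():
        if term in seed or n < min_docs:
            continue
        rel_rate = n / len(relevant)
        other_rate = (in_other[term] + 1) / (len(other) + 1)
        score = rel_rate / other_rate
        if score >= min_lift:
            lift[term] = score
    # Highest lift first; ties broken alphabetically for a stable order.
    return sorted(lift, key=lambda t: (-lift[t], t))[:top_n]


if __name__ == "__main__":
    corpus = [
        "windshear reported on final approach with gusty crosswind",
        "thunderstorm cells caused gusty crosswind and hail",
        "windshear alert and hail near the runway",
        "quarterly budget and schedule review for the runway project",
        "budget approved for new terminal",
    ]
    print(expand_clues({"windshear", "thunderstorm"}, corpus))
```

Suggested terms would normally be reviewed by a taxonomist before being added to the clue set, since co-occurrence alone cannot tell a related concept from a coincidental one.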
27. Automatic - Metadata Tagging & Classification Solution
Taxonomy Browsing
• Search within a folder using enterprise search while filtering across metadata that has been automatically tagged to documents and records
• Taxonomies from Taxonomy Manager are displayed within SharePoint
• Newsletter 0102 is 1 of 38 documents that have been automatically classified to the Weather folder and tagged with the concept “Turbulence Encounter”
28. Automatic - Metadata Tagging & Classification Solution
Faceted Searching
“Weather” and “Windshear or Thunderstorms” are 2 of the 5 pieces of metadata that were automatically tagged to the document titled “Newsletter 01-02”.
Selecting “Windshear or Thunderstorms” generates the result set shown on the next slide.
29. Automatic - Metadata Tagging & Classification Solution
Faceted Searching
The document entitled “Newsletter 0102” resides at the intersection of two pieces of metadata, “Weather” and “Windshear or Thunderstorms”.
In addition, 36 other documents also carry those two compound-term meta-tags generated with Taxonomy Manager.
New facets are generated dynamically from the new result set of 36 documents that reside at the intersection of “Weather” and “Windshear or Thunderstorms”.
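The drill-down on slides 28 and 29 can be sketched as set intersection over a tag index: selecting facets keeps only the documents carrying every selected term, and the remaining tags on those documents become the new facets, each with a count. The index below is hypothetical; apart from “Newsletter 0102” and the term names taken from the slides, the document names and tag assignments are invented.

```python
from collections import Counter


def facet_search(index: dict[str, set[str]], selected: set[str]) -> tuple[list[str], Counter]:
    """Return documents tagged with every selected term, plus facet counts
    (how many of those documents carry each other, unselected term)."""
    results = sorted(doc for doc, tags in index.items() if selected <= tags)
    facets = Counter(tag for doc in results for tag in index[doc] - selected)
    return results, facets


if __name__ == "__main__":
    # Hypothetical index of automatically assigned tags.
    index = {
        "Newsletter 0102": {"Weather", "Windshear or Thunderstorms", "Turbulence Encounter"},
        "Newsletter 0305": {"Weather", "Windshear or Thunderstorms", "Icing"},
        "Newsletter 0410": {"Weather", "Icing"},
        "Safety Bulletin 7": {"Maintenance"},
    }
    docs, new_facets = facet_search(index, {"Weather", "Windshear or Thunderstorms"})
    print(docs)
    print(new_facets.most_common())
```

Because the facets are recomputed from the current result set, each further selection narrows both the documents and the facet list, which is the behavior the slides describe.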
33. Automatic - Metadata Tagging & Classification Approach
When Building a Taxonomy
• Address business issues that cause problems (costs):
– Search/Findability
– Data Exposure Events
– SOX compliance – Retention!
• Do NOT make it an academic exercise
• Priority:
– Start with the one that causes the most pain at the executive level
– Legal issues and guidelines that affect the business
• You don’t need a lot of people…
34. Automatic - Metadata Tagging & Classification Approach
Outcomes of Using Metadata
• All information is automatically tagged, resulting in automatic filing of information based on organizational records management policies
• All information is retrievable using concepts (high precision) instead of keywords (low precision)
• Reading, tagging and information filing costs are significantly reduced, if not eliminated
• Information retrieval costs are dramatically reduced
• Results are consistent, unlike manual tagging
35. Automatic - Metadata Tagging & Classification Approach
Compliance Outcomes Using Metadata in the Federal Sector
Compliance with DoD Directive 8320.02
Data Transparency
Compliance with DoD Directive 5015.02
Records Management – Government Equivalent of SOX
Compliance with DoD Privacy Act Program
Compliance with HIPAA & JCAHO Data Privacy &
Security Standards
HIPAA – Health Insurance Portability & Accountability Act
JCAHO – Joint Commission on Accreditation of Healthcare Organizations
37. Session Evaluation
• Please complete and turn in your Session Evaluation Form so we can improve future events.
• Presenter:
– Juan J. Celaya
• Session Name:
– Automatic: Metadata Tagging & Classification