TermSet metadata tagging presentation - taxonomy bootcamp london 2016


Slides from the London Taxonomy Bootcamp 2016. Discusses top-down and bottom-up approaches for defining taxonomies, and demonstrates using natural language processing to automate the discovery of metadata in SharePoint documents.




1. TAGGING DOCUMENTS MADE EASY, USING MACHINE LEARNING Brendan Clarke brendan@termset.com www.termSet.com

2. BRENDAN CLARKE
• A Microsoft ECM expert
• Co-founded TermSet three years ago
• Got the scars from real-world IA projects

3. Agenda (chart segments): Creating Taxonomies, 7; NLP, 3; Demo, 10; Tagging, 10; Demo, 10

4. PART ONE – APPROACHES FOR BUILDING TAXONOMIES

5. TOP DOWN - APPROACH
• Defines top-level containers and works downwards
• Usually broad (3-10 wide) and shallow (3-4 deep)
• Simple, high-level classification (functional)

6. TOP DOWN – TERMS
• Manually defined or replicated from existing structures
• Imported from other systems
• Industry standards / purchased taxonomies

7. TOP DOWN – SUMMARY
• People / committee-driven approach
• Some guesswork about what the terms should be
• Simple, high-level classification (functional)
– Way better than folders!

8. BOTTOM UP - APPROACH
• Terms driven by the words and phrases within your content
• More complex taxonomies
• Detailed, accurate terms that are subject or facet level

9. BOTTOM UP - TERMS
• Manual analysis of the documents
• Statistical analysis of terms and phrases
• Natural language processing

10. BOTTOM UP - SUMMARY
• Technology-driven approach (or a very tough people process)
• Produces detailed taxonomies that reflect the actual content
• Extra granularity of tagging

11. AND THE WINNER IS…
• Combining top down and bottom up is the best approach
• Top down classifies the type of document
• Bottom up classifies the subject of the document
• New technology allows bottom up to be realistic

12. TermSet adds accurate, consistent metadata without placing any burden on end users or your IT team.
• Builds taxonomies (bottom up) using NLP
• Applies tags
• Metadata as a service™

13. WHAT EXACTLY IS NLP ?

14. DEMO – CREATING TERMS FROM YOUR DOCUMENTS USING NLP

15. PART TWO – APPLYING YOUR TAGS

16. MANUAL TAGGING
• Adoption problem
• Asbestos problem / GIGO (garbage in, garbage out)
• Challenging to do retrospectively (migration tools can help)

17. MANUAL TAGGING
• Infer as many terms as possible from: document types, location, function
• Mandate as few tags as possible
• Stay shallow or flat with hierarchies

18. MACHINE TAGGING
• Simple machine tagging can use search to match taxonomy terms to the content of documents
• More advanced taggers allow rules or weights to be assigned to each tag (tags are not context-aware)
• New technologies (NLP) provide a new approach to creating taxonomies

19. TERMSET TAGGING
• TermSet recommends the right taxonomies for each library (context-aware tagging)
• TermSet automates building the underlying IA in SharePoint
• Extra cool NLP tags can be added (summaries, sentiment and language)
• Monitors for new documents and terms arriving into your world

20. DEMO – TAGGING DOCUMENTS

21. WRAP UP
• TermSet automates a bottom-up approach to create and use taxonomies for SharePoint
• Visit www.termset.com or e-mail brendan@termset.com for a free licence
• If you need assistance with top-down taxonomies or you use a different DMS, e-mail me to join the beta program for www.taxononica.com

Editor's Notes

A top-down approach defines containers for terms, usually starting with some global taxonomies such as locations, departments or products (used throughout the business). Lots of level 1 and 2 term sets define the function of the document, for example Departments -> HR. Level 3 may begin to define the content itself, for example Departments -> HR -> Policy Documents. This works well to classify content into the right areas. This is functional classification.
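
The Departments -> HR -> Policy Documents example can be pictured as a small tree of terms. The sketch below is only an illustration of that shape: the `Term` class and its methods are hypothetical names, and the "Finance" sibling is an assumption added to show breadth. It does not describe how SharePoint or TermSet stores term sets.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One node in a top-down taxonomy (hypothetical helper class)."""
    name: str
    children: list["Term"] = field(default_factory=list)

    def add(self, name: str) -> "Term":
        """Attach a child term and return it so deeper levels can be added."""
        child = Term(name)
        self.children.append(child)
        return child

    def paths(self, prefix: tuple[str, ...] = ()) -> list[str]:
        """List every term as a 'Parent -> Child' path, depth first."""
        here = prefix + (self.name,)
        result = [" -> ".join(here)]
        for child in self.children:
            result.extend(child.paths(here))
        return result


# Levels 1 and 2 describe the function; level 3 starts to describe content.
departments = Term("Departments")
hr = departments.add("HR")
hr.add("Policy Documents")
departments.add("Finance")  # assumed sibling, not taken from the slides

for path in departments.paths():
    print(path)
```

Keeping the tree this shallow mirrors the "broad and shallow" guidance: each extra level is another decision someone has to make when filing a document.
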
Often terms are defined by committees who involve specialist groups to define terms. Line-of-business systems or databases may contain data that can be imported (http://www.termset.com/blog/2016/8/25/loading-metadata-terms-into-sharepoint-using-powershell). SKOS is an interesting option for advanced taxonomies (https://www.w3.org/2001/sw/wiki/SKOS/Datasets); WAND is available off the shelf (http://www.wandinc.com/wand-taxonomy-library-portal.aspx).
The challenge with deciding terms without looking at your documents is that it will be guesswork to know what would be effective. That said, a simple top-down taxonomy is 10x better than a folder structure. There is no duplication, as documents can be tagged within multiple areas.
Bottom up means looking at the information you have in your content (usually documents and e-mails) and building taxonomies that are based on how you actually describe information. Bottom up results in a taxonomy that can describe the subject or facet of the document.
How long does it take for people to read and process documents? See http://www.termset.com/calc/. Getting a working team of people to actually read documents is time-consuming and expensive, but if the information is valuable it may be worth it. There are tools that can analyse the frequency of words or phrases in your documents. They can be highly effective but need a lot of consultancy to make sense of the results. NLP is the future of text analysis (more later).
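
As a rough illustration of frequency analysis, the sketch below counts two-word phrases across a few documents and skips phrases containing common filler words. The sample sentences, the stopword list and the `phrase_counts` function are all assumptions made for the example; it is not the method used by any particular tool.

```python
import re
from collections import Counter


def phrase_counts(documents, n=2, stopwords=frozenset()):
    """Count n-word phrases across documents, skipping any containing a stopword."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(word in stopwords for word in gram):
                counts[" ".join(gram)] += 1
    return counts


# Hypothetical sample text; real input would be the extracted document bodies.
docs = [
    "Patients with high blood pressure were given the new drug.",
    "High blood pressure is a common health condition.",
    "The drug lowered blood pressure in most patients.",
]
stop = frozenset({"the", "a", "is", "in", "with", "were", "of"})

counts = phrase_counts(docs, n=2, stopwords=stop)
print(counts.most_common(2))  # the two most frequent phrases
```

Raw counts like these still need a person to decide which phrases are real candidate terms, which is where the consultancy effort goes.
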
A bottom up approach can be used to describe the contents of the documents (not just the area)
TermSet has a different approach. It manages every step of adding metadata to your SharePoint content. Projects can be completed in days or weeks instead of months or years. The application uses machine learning that can build over 400 taxonomies that relate to your data. You can also easily train it to apply tags that are important to you. A full list of features is available at http://www.termset.com/platform/
Natural language processing is at the core of TermSet. We have an engine trained to recognise entities within documents. (First Click) This is a BBC news article; when our engine reads the text it identifies entities such as people, locations and organisations. (Second Click) In fact, we identify a vast array of information inside the documents, including concepts, sentiment and relationships.
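
A trained engine does not fit in a short example, so the sketch below swaps in a much simpler stand-in: a lookup list (gazetteer) matcher. It only shows the shape of the output, spans of text labelled as person, location or organisation. The names in `GAZETTEER`, the sample sentence and the `find_entities` function are hypothetical, and this is not how TermSet's engine works.

```python
import re

# Hypothetical lookup lists; a trained model learns to spot entities instead.
GAZETTEER = {
    "PERSON": ["Ada Lovelace"],
    "LOCATION": ["London"],
    "ORGANISATION": ["BBC"],
}


def find_entities(text):
    """Return (start, end, label, surface text) for every gazetteer match."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)  # ordered by position in the text


sentence = "The BBC reported that Ada Lovelace was born in London."
entities = find_entities(sentence)
for start, end, label, surface in entities:
    print(label, surface)
```

A lookup list only finds names it already knows; the point of a trained engine is to recognise entities it has never been given.
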
A document library with medical / pharma documents. There is no structure to the documents in this library.
We create a discovery job to process (read) the documents.
We select the location of the documents and can feed in existing taxonomies and define patterns to look for.
TermSet can also suggest new taxonomies that are created from the terms inside your documents. TermSet can also assess the sentiment, the language and write a summary of any document.
Click to create a brand new taxonomy built from your documents
Select the taxonomy
Verify the terms created from the content
TermSet then creates columns in your libraries
Every time you add a field that must be completed in order to save a document, you are impeding adoption of a new DMS. If you do mandate fields, many users will pick the first item on the list, or just pick anything at random, in order to save the document. And what do you do with the 1 million documents that came from a file share (or any other source without metadata)?
Manually tagging new content can work well. Always use default values to answer as many questions as possible before the user is involved (infer the metadata wherever possible). Keeping it simple is a good plan. Single lookup columns may be better than deep hierarchies.
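
One way to picture inferring metadata before the user is involved is to derive default values from where a file lives and what kind of file it is. The folder names, extension mapping, column names and `infer_defaults` function below are assumptions for illustration only; in SharePoint the same idea is normally configured as default column values rather than written as code.

```python
from pathlib import PurePosixPath

# Hypothetical rules mapping a folder or file extension to a default tag.
FOLDER_TO_DEPARTMENT = {"hr": "HR", "finance": "Finance"}
EXTENSION_TO_TYPE = {
    ".docx": "Document",
    ".xlsx": "Spreadsheet",
    ".pptx": "Presentation",
}


def infer_defaults(path):
    """Derive default metadata from a file's location and extension."""
    p = PurePosixPath(path)
    tags = {}
    # The first folder that matches a known department wins.
    for part in p.parts[:-1]:
        department = FOLDER_TO_DEPARTMENT.get(part.lower())
        if department:
            tags["Department"] = department
            break
    doc_type = EXTENSION_TO_TYPE.get(p.suffix.lower())
    if doc_type:
        tags["Document type"] = doc_type
    return tags


defaults = infer_defaults("/sites/intranet/HR/Policies/leave-policy.docx")
print(defaults)
```

Anything that cannot be inferred is simply left out, so the user is only asked about the fields that genuinely need a human answer.
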
There are a number of taggers for SharePoint that will look at your documents and apply tags from a taxonomy that you have defined. Some taggers ask for rules to be defined for each term (this can work well, but takes forever to get right).
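
A rule-based tagger of this kind can be sketched as weighted phrase matching: each tag has trigger phrases with weights, and the tag is applied when the summed score reaches a threshold. The rules, weights, thresholds and function names below are invented for the example and do not describe any specific product.

```python
import re

# Hypothetical rules: each tag has weighted trigger phrases and a threshold.
RULES = {
    "Hypertension": {
        "phrases": {"high blood pressure": 2.0, "hypertension": 3.0},
        "threshold": 3.0,
    },
    "Diabetes": {
        "phrases": {"diabetes": 3.0, "insulin": 1.5},
        "threshold": 3.0,
    },
}


def score_tags(text, rules=RULES):
    """Score each tag by summing weight * occurrences of its trigger phrases."""
    lowered = text.lower()
    scores = {}
    for tag, rule in rules.items():
        total = 0.0
        for phrase, weight in rule["phrases"].items():
            hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
            total += weight * hits
        scores[tag] = total
    return scores


def apply_tags(text, rules=RULES):
    """Return the tags whose score reaches their threshold."""
    scores = score_tags(text, rules)
    return [tag for tag, rule in rules.items() if scores[tag] >= rule["threshold"]]


doc = (
    "Patients with high blood pressure were monitored. "
    "High blood pressure fell after treatment. One patient used insulin."
)
tags = apply_tags(doc)
print(tags)
```

Every weight and threshold here has to be tuned by hand for every term, which is why this approach takes so long to get right, and the matcher still has no idea what context a phrase appears in.
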
Creates site collection columns.
Tags the documents asynchronously.
Before TermSet.
Two new columns are added (Drug and Health condition) and the documents are tagged. New documents will be tagged as they arrive (new terms will need to be approved).
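
The "new terms will need to be approved" step can be thought of as a holding queue: known terms are applied straight away, unseen terms wait for a reviewer. The sketch below is a guess at that workflow for illustration; the function name and the drug names are hypothetical and it is not TermSet's implementation.

```python
def tag_incoming(candidate_terms, approved, pending):
    """Apply approved terms now; park unseen terms until someone approves them."""
    applied = []
    for term in candidate_terms:
        if term in approved:
            applied.append(term)
        else:
            pending.add(term)
    return applied


approved_drugs = {"Aspirin", "Ibuprofen"}  # hypothetical existing term set
pending_review = set()

# A new document mentions one known term and one unseen term.
applied = tag_incoming(["Aspirin", "Paracetamol"], approved_drugs, pending_review)
waiting = set(pending_review)  # snapshot of what awaits approval

# A reviewer approves everything that is pending.
approved_drugs |= pending_review
pending_review.clear()

# The same term is now applied to later documents without review.
applied_later = tag_incoming(["Paracetamol"], approved_drugs, pending_review)
```

The approval gate keeps the taxonomy clean: nothing enters the term set, or appears as a tag, until a person has accepted it.
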
A one sentence summary of each document is created.
Search is super-charged, with metadata available for refinement.
Metadata allows us to understand the information inside a document library.
Visit www.termset.com or e-mail brendan@termset.com for a free licence If you need assistance with top down taxonomies or you use a different DMS please e-mail me to join the beta program for www.taxononica.com
