TAGGING DOCUMENTS MADE EASY USING MACHINE LEARNING
Brendan Clarke
brendan@termset.com
www.termSet.com

BRENDAN CLARKE
• A Microsoft ECM (enterprise content management) expert
• Co-founded TermSet three years ago
• Got the scars from real-world IA (information architecture) projects

AGENDA
• Creating taxonomies (7)
• NLP (3)
• Demo (10)
• Tagging (10)
• Demo (10)

PART ONE – APPROACHES FOR BUILDING TAXONOMIES

TOP DOWN – APPROACH
• Defines top-level containers and works downwards
• Usually broad (3–10 wide) and shallow (3–4 levels deep)
• Simple, high-level (functional) classification
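This minimal Python sketch makes "broad and shallow" concrete. It stores a top-down taxonomy as nested dictionaries and checks it against the rule of thumb on the slide. Only the Departments -> HR -> Policy Documents path comes from the speaker notes. The other terms are invented for illustration, and measuring width as the number of top-level term sets is an assumption.

```python
# Hypothetical top-down taxonomy: each key is a term, each value holds its child terms.
# Only Departments -> HR -> Policy Documents comes from the talk; the rest is illustrative.
TAXONOMY = {
    "Departments": {
        "HR": {"Policy Documents": {}, "Contracts": {}},
        "Finance": {"Invoices": {}},
    },
    "Locations": {"London": {}, "Dublin": {}},
    "Products": {"Product A": {}},
}


def depth(tree: dict) -> int:
    """Number of levels in the deepest branch (an empty dict has depth 0)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def width(tree: dict) -> int:
    """Number of top-level term sets (assumed meaning of 'wide')."""
    return len(tree)


def is_broad_and_shallow(tree: dict) -> bool:
    """Rule of thumb from the slide: 3-10 wide and no more than 4 levels deep."""
    return 3 <= width(tree) <= 10 and depth(tree) <= 4


print(width(TAXONOMY), depth(TAXONOMY), is_broad_and_shallow(TAXONOMY))  # prints 3 3 True
```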
TOP DOWN – TERMS
• Manually defined or replicated from existing structures
• Imported from other systems
• Industry standards / purchased taxonomies

TOP DOWN – SUMMARY
• People- or committee-driven approach
• Some guesswork about what the terms should be
• Simple, high-level (functional) classification – way better than folders!

BOTTOM UP – APPROACH
• Terms driven by the words and phrases within your content
• More complex taxonomies
• Detailed, accurate terms at the subject or facet level

BOTTOM UP – TERMS
• Manual analysis of the documents
• Statistical analysis of terms and phrases
• Natural language processing
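Statistical analysis of terms and phrases can be as simple as two steps. First, count words and two-word phrases across a set of documents. Then review the most frequent ones as candidate terms.

The sketch below is a minimal Python illustration of that idea, not TermSet's NLP engine. The stop-word list, the tokenisation rule and the sample documents are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "in", "to", "is", "with", "on"}


def tokens(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(documents: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    """Count words and two-word phrases, skipping any that contain a stop word."""
    counts = Counter()
    for doc in documents:
        words = tokens(doc)
        counts.update(w for w in words if w not in STOP_WORDS)
        for first, second in zip(words, words[1:]):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                counts[f"{first} {second}"] += 1
    return counts.most_common(top_n)


# Hypothetical sample documents.
docs = [
    "Clinical trial results for the new drug.",
    "The clinical trial reported side effects of the drug.",
    "Drug dosage guidance for clinical staff.",
]
print(candidate_terms(docs, top_n=3))
```

Raw counts tend to favour generic words. In practice, the output needs human review or further weighting (for example TF-IDF) before it becomes a taxonomy.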
BOTTOM UP – SUMMARY
• Technology-driven approach (or a very tough people process)
• Produces detailed taxonomies that reflect the actual content
• Finer-grained tagging

AND THE WINNER IS…
• Combining top down and bottom up is the best approach
• Top down classifies the type of document
• Bottom up classifies the subject of the document
• New technology makes bottom up realistic

TermSet adds accurate, consistent metadata without placing any burden on end users or your IT team.
• Builds taxonomies (bottom up) using NLP
• Applies tags
• Metadata as a service™

WHAT EXACTLY IS NLP?

DEMO – CREATING TERMS FROM YOUR DOCUMENTS USING NLP

PART TWO – APPLYING YOUR TAGS

MANUAL TAGGING
• Adoption problem
• Asbestos problem / GIGO (garbage in, garbage out)
• Challenging to do retrospectively (migration tools can help)

MANUAL TAGGING
• Infer as many terms as possible from document type, location and function
• Mandate as few tags as possible
• Keep hierarchies shallow or flat
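Inferring terms from location means the user is only asked for what cannot be worked out automatically. The following Python sketch is hypothetical: the folder names, column names and mapping rules are invented. In SharePoint, this kind of default would more typically be configured through library column defaults than written as code.

```python
from pathlib import PurePosixPath

# Hypothetical rules: if a folder name appears in the path, apply these default values.
LOCATION_DEFAULTS = {
    "HR": {"Department": "HR"},
    "Policies": {"Document Type": "Policy Document"},
    "Finance": {"Department": "Finance"},
    "Invoices": {"Document Type": "Invoice"},
}


def infer_defaults(path: str) -> dict[str, str]:
    """Collect default metadata from every folder in the document's path."""
    metadata: dict[str, str] = {}
    for folder in PurePosixPath(path).parent.parts:
        metadata.update(LOCATION_DEFAULTS.get(folder, {}))
    return metadata


def fields_to_ask(path: str, required: list[str]) -> list[str]:
    """Only prompt the user for required fields that could not be inferred."""
    inferred = infer_defaults(path)
    return [name for name in required if name not in inferred]


doc = "/sites/intranet/HR/Policies/leave-policy.docx"  # hypothetical path
print(infer_defaults(doc))  # prints {'Department': 'HR', 'Document Type': 'Policy Document'}
print(fields_to_ask(doc, ["Department", "Document Type", "Location"]))  # prints ['Location']
```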
MACHINE TAGGING
• Simple machine tagging can use search to match taxonomy terms to the content of documents
• More advanced taggers allow rules or weights to be assigned to each tag (the tags are not context-aware)
• New technologies (NLP) provide a new approach to creating taxonomies
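A short Python sketch shows the gap between simple search-based matching and weighted rules. It is hypothetical and not modelled on any specific SharePoint tagger:

- each term has hand-picked phrases and weights;
- a tag is applied when the summed score reaches a threshold;
- as the slide points out, nothing in it understands context.

```python
import re

# Hypothetical taxonomy: term -> {phrase: weight}. Weights are illustrative only.
WEIGHTED_TERMS = {
    "Diabetes": {"diabetes": 1.0, "insulin": 0.6, "blood glucose": 0.6},
    "Hypertension": {"hypertension": 1.0, "blood pressure": 0.7},
}


def count_phrase(text: str, phrase: str) -> int:
    """Count whole-word, case-insensitive occurrences of a phrase."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def simple_tags(text: str) -> list[str]:
    """Simple tagging: apply a term only if its name appears in the text."""
    return [term for term in WEIGHTED_TERMS if count_phrase(text, term)]


def weighted_tags(text: str, threshold: float = 1.0) -> list[str]:
    """Weighted tagging: sum weight x occurrences per term and keep those above a threshold."""
    tags = []
    for term, rules in WEIGHTED_TERMS.items():
        score = sum(weight * count_phrase(text, phrase) for phrase, weight in rules.items())
        if score >= threshold:
            tags.append(term)
    return tags


text = "Patients adjusted insulin doses after blood glucose monitoring."
print(simple_tags(text))    # prints [] because no term name appears verbatim
print(weighted_tags(text))  # prints ['Diabetes']: insulin (0.6) + blood glucose (0.6) >= 1.0
```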
TERMSET TAGGING
• TermSet recommends the right taxonomies for each library (context-aware tagging)
• TermSet automates building the underlying IA in SharePoint
• Extra cool NLP tags can be added (summaries, sentiment and language)
• Monitors for new documents and terms arriving in your world
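The speaker notes for the tagging demo say that new documents are tagged as they arrive, while new terms need to be approved. A minimal Python sketch of that approval loop follows. The `TermStore` class and its methods are hypothetical names, not TermSet's API, and candidate terms are assumed to come from some upstream extraction step.

```python
from dataclasses import dataclass, field


@dataclass
class TermStore:
    """Hypothetical store holding approved terms and a queue of pending candidates."""
    approved: set[str] = field(default_factory=set)
    pending: set[str] = field(default_factory=set)

    def tag_new_document(self, candidate_terms: list[str]) -> list[str]:
        """Apply approved terms straight away; queue unknown ones for review."""
        tags = []
        for term in candidate_terms:
            if term in self.approved:
                tags.append(term)
            else:
                self.pending.add(term)
        return tags

    def approve(self, term: str) -> None:
        """Move a reviewed term from the pending queue into the taxonomy."""
        self.pending.discard(term)
        self.approved.add(term)


store = TermStore(approved={"Aspirin", "Asthma"})
print(store.tag_new_document(["Aspirin", "Ibuprofen"]))  # prints ['Aspirin']
print(store.pending)                                    # prints {'Ibuprofen'}
store.approve("Ibuprofen")
print(store.tag_new_document(["Ibuprofen"]))            # prints ['Ibuprofen']
```

A real system would also need to revisit documents that mentioned a term before it was approved; this sketch leaves that out.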
DEMO – TAGGING DOCUMENTS

WRAP UP
• TermSet automates a bottom-up approach to creating and using taxonomies for SharePoint
• Visit www.termset.com or e-mail brendan@termset.com for a free licence
• If you need assistance with top-down taxonomies, or you use a different DMS, e-mail me to join the beta program for www.taxononica.com

TermSet metadata tagging presentation - Taxonomy Bootcamp London 2016

Editor's Notes

  1. A top-down approach defines containers for terms, usually starting with global taxonomies such as locations, departments or products that are used throughout the business. Levels 1 and 2 contain many term sets that define the function of the document, for example Departments -> HR. Level 3 may begin to define the content itself, for example Departments -> HR -> Policy Documents. This works well for classifying content into the right areas; it is functional classification.
  2. Terms are often defined by committees, which involve specialist groups to define them. Line-of-business systems or databases may contain data that can be imported (http://www.termset.com/blog/2016/8/25/loading-metadata-terms-into-sharepoint-using-powershell). SKOS (Simple Knowledge Organization System) is an interesting option for advanced taxonomies (https://www.w3.org/2001/sw/wiki/SKOS/Datasets); WAND provides off-the-shelf taxonomies (http://www.wandinc.com/wand-taxonomy-library-portal.aspx).
  3. The challenge with choosing terms without looking at your documents is that knowing what will be effective becomes guesswork. That said, even a simple top-down taxonomy is 10x better than a folder structure. There is no duplication, because a document can be tagged within multiple areas instead of being copied into multiple folders.
  4. Bottom up means looking at the information you have in your content (usually documents and e-mails) and building taxonomies that are based on how you actually describe information. Bottom up results in a taxonomy that can describe the subject or facet of the document.
  5. How long does it take people to read and process documents? See http://www.termset.com/calc/. Getting a team of people to actually read documents is time-consuming and expensive, but if the information is valuable it may sometimes be worth it. There are tools that analyse the frequency of words or phrases in your documents. They can be highly effective but need a lot of consultancy to make sense of the results. NLP is the future of text analysis (more later).
  6. A bottom-up approach can describe the contents of the documents, not just the area they belong to.
  7. TermSet has a different approach. It manages every step of adding metadata to your SharePoint content. Projects can be completed in days or weeks instead of months or years. The application uses machine learning that can build over 400 taxonomies that relate to your data. You can also easily train it to apply tags that are important to you. A full list of features is available at http://www.termset.com/platform/
  8. Natural language processing is at the core of TermSet. We have an engine trained to recognise entities within documents. (First click) This is a BBC news article; when our engine reads the text, it identifies entities such as people, locations and organisations. (Second click) In fact, we identify a vast array of information inside the documents, including concepts, sentiment and relationships.
  9. A document library with medical / pharma documents. There is no structure to the documents in this library.
  10. We create a discovery job to process (read) the documents.
  11. We select the location of the documents and can feed in existing taxonomies and define patterns to look for.
  12. TermSet can also suggest new taxonomies that are created from the terms inside your documents. It can also assess the sentiment and language of any document and write a summary of it.
  13. Click to create a brand new taxonomy built from your documents.
  14. Select the taxonomy
  15. Verify the terms created from the content
  16. TermSet then creates columns in your libraries
  17. Every time you add a field that must be completed before a document can be saved, you impede adoption of a new DMS. If you do mandate fields, many users will pick the first option in the list, or just pick anything at random, in order to save the document. And what do you do with the 1 million documents that came from a file share (or any other source without metadata)?
  18. Manually tagging new content can work well. Always use default values to answer as many questions as possible before the user is involved (infer the metadata wherever possible). Keeping it simple is a good plan: single lookup columns may be better than deep hierarchies.
  19. There are a number of taggers for SharePoint that look at your documents and apply tags from a taxonomy that you have defined. Some taggers ask for rules to be defined for each term (this can work well, but takes forever to get right).
  20. Creates site collection columns.
  21. Creates site collection columns.
  22. Tags the documents asynchronously.
  23. Before TermSet.
  24. Two new columns are added (Drug and Health condition) and the documents are tagged. New documents will be tagged as they arrive (new terms will need to be approved).
  25. A one sentence summary of each document is created.
  26. Search is super-charged, with metadata available for refinement.
  27. Metadata allows us to understand the information inside a document library.
  28. Visit www.termset.com or e-mail brendan@termset.com for a free licence. If you need assistance with top-down taxonomies, or you use a different DMS, please e-mail me to join the beta program for www.taxononica.com.
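Note 5 links to a calculator for the cost of having people read documents. The underlying arithmetic can be sketched in Python.

The 1 million document count is the file-share example from note 17. Every other input is an assumption for illustration, not a figure from the calculator:

- 1,000 words per document
- 250 words per minute
- one minute to tag each document
- 7.5-hour working days

```python
def reading_effort_days(
    documents: int,
    words_per_document: int,
    words_per_minute: int,
    tagging_minutes_per_document: float,
    hours_per_day: float = 7.5,  # assumed length of a working day
) -> float:
    """Estimate person-days needed to read and manually tag a document set."""
    reading_minutes = documents * words_per_document / words_per_minute
    tagging_minutes = documents * tagging_minutes_per_document
    return (reading_minutes + tagging_minutes) / 60 / hours_per_day


# Assumed inputs: 1 million documents (note 17), 1,000 words each,
# read at 250 words per minute, plus 1 minute to tag each document.
days = reading_effort_days(1_000_000, 1_000, 250, 1.0)
print(f"{days:,.0f} person-days")  # prints 11,111 person-days
```

Changing any single assumption scales the result linearly. Even generous inputs leave a very large manual effort for a collection of that size.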