2. Update
• Who we are
• What we do
• Introducing Data Harmony 3.14
• New Features
• Introducing Data Harmony 4.0
• New Features
• New Products
4. A Brief History of Access Innovations, Inc.
• Founded in October 1978 in Margie's kitchen with 6 original partners
• Jay Ven Eman hired as employee #1!
• Building bibliographic databases by aggregating information from secondary publishers
• First commercial installation of Apple computers in 1980
5. Mission and Vision
• MISSION:
• To maximize customer information assets, their creation, capture, distribution, and reuse
• VISION:
• Achieve and maintain technical and professional leadership in software and services for content creators
6. Corporate Information
• Closely held
• Financed by
• Sweat and Persistence
• Good Cash Flow and Management
• Since 1978
• Woman Owned Small Business
Marjorie M.K. Hlava
Jay Ven Eman
Joanna Ginter
9. Our Services
• Metadata Creation and Enhancement
• Semantic Enrichment
• Controlled Vocabulary Development
• Database Design and Construction
• Text, Image, and Database Markup
• Data Capture and Conversion
• Abstracting and Indexing
• Training sets
• Medicinal Plant Names Services (MPNS)
12. Database Services - 3
• Applications development
• Data Harmony Hosting Environment
• Search - Lucene and Solr
• Search Harmony interface
• Web services layer
• Link to user experience or user interface
• Web calls
• API setup and linking
13. Database Services - 4
• Analytics from semantics
• Business Intelligence (BI)
• Visualizations for decision makers
• Coverage analytics
• Term mining
• Image indexing
• Fate prediction
• SciGen - No Bad Submissions (No B.S.)
www.accessinn.com
14. Our Software
• Data Harmony
• XIS (XML Intranet System)®
• M.A.I.® (Machine Aided Indexer)
• Thesaurus Master®
• MAIstro™
• Data Harmony Suite
17. Data Harmony
• Built for our use starting in 1987
• Visual Basic, C++, Java, Web hosted
• Aid to the editorial and indexing processes
• Alleviate the clerical aspects
• Speed the tagging process
• Guarantee accuracy, consistency, and depth of indexing
• Two patents, 21 granted claims
18. Data Harmony
• Java
• Platform independent
• Runs in a proprietary "browser"; uses Java in the operating system, not browser applets
• APIs, Web services to interact with other apps
• XML
• TCP/IP over intranets and the internet
• SSL option included
• JSON option for API returns
• WebStart or installation app to simplify client installation
• GlassFish and Tomcat for web app extensions
www.dataharmony.com
19. Data Harmony Suite - Main Modules
• M.A.I.
• Thesaurus Master
• XIS
• XML Intranet System
• Administrative configuration module
• "The Data Harmony Suite"
21. Data Harmony
• Machine Aided Indexing (M.A.I.)
• Semantic, syntactic, morphological, etc. layer
• Rule Builder for users
• Concept Extractor for text
• Statistics for Machine Learning
• Use in automatic, batch, or assisted mode
• Thesaurus Master
• For creating taxonomies, thesauri, ontologies, and authority files
• MAIstro
• Thesaurus Master and M.A.I. combined
• AND
• A bunch more modules!
22. TaxoDiary
• Daily Blog - Melody Smith and the rest of Heather Kotula's team
• Weekly Feature
• 3+ items per day
• 5 days a week
• Big archive
• Launched in June 2010
23. TaxoGene
• The Human Genome Project lists 22,300 genes
• There is an average of 19 synonyms per gene name
• Bringing these together to auto-index to the preferred name
• Automatic API call to TaxoGene
• Licensed at $3,895 per year
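Mapping synonyms to a preferred gene name can be pictured as a lookup table. The following is a minimal Python sketch of that idea, not TaxoGene's implementation: the two genes and their aliases are a small hand-picked sample, and the function names are hypothetical.

```python
# Illustrative synonym table: preferred gene symbol -> some known aliases.
# This is a tiny sample for demonstration, not TaxoGene data.
GENE_SYNONYMS = {
    "TP53": ["p53", "LFS1"],
    "BRCA1": ["RNF53", "BRCC1"],
}


def build_lookup(synonyms):
    """Invert the table so each alias (and the preferred name) maps to the preferred name."""
    lookup = {}
    for preferred, aliases in synonyms.items():
        lookup[preferred.lower()] = preferred
        for alias in aliases:
            lookup[alias.lower()] = preferred
    return lookup


def index_genes(text, lookup):
    """Return the sorted preferred names for every alias found as a whole token."""
    tokens = [tok.strip(".,;:()").lower() for tok in text.split()]
    return sorted({lookup[tok] for tok in tokens if tok in lookup})


LOOKUP = build_lookup(GENE_SYNONYMS)
print(index_genes("Mutations in p53 and RNF53 were observed.", LOOKUP))
```

Whatever synonym appears in the text, the record is indexed under one preferred name, which is what keeps retrieval consistent.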
24. TaxoBank
• 2000 taxonomies listed
• Open access and deposit
• Terms of use included
• Reuse or update instead of build from scratch
25. Access Integrity (Ai2)
• Medical Claims Compliance
• Automatic ICD-10 suggestions
• Rule bases for
• CPT
• HCPCS
• ICD-10
• Accurate, deep, consistent coding
• Making medical billing efficient
• Based on the patient encounter / physician's notes
28. 3.14 v 1058+
• This means 1058 revisions and improvements since v 3.13
• Lots of little improvements
• A few big new features
• Most increases are in managed services
29. Deprecated Terms
• New status for thesaurus terms
• Additional view added for terms with deprecated status
• Behavior
• Used for legacy indexing
• Rule saves disabled (cannot create new rules for Deprecated Terms)
• Import Options (no default identity rule built on import)
• Projects created before 3.14 will not display deprecated terms unless one line in the project configuration file is changed.
• Added ability to import and export terms with deprecated status.
• Setting in Admin module for choosing to skip deprecated terms during M.A.I. ("yes" to skip is the default setting)
30. Deprecated Terms (screenshot callouts)
1. Deprecated Term Status in the Term Record Pane
2. Deprecated Terms view: produces an alphabetical listing of all terms with deprecated status. Functions similarly to the Candidate Terms view.
3. Saving or changing a rule with a Deprecated term within a USE statement will produce an error, signaling the editor to resolve the term in the rule base or refrain from editing the current rule
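The behaviors described for deprecated terms (no identity rule on import, no saving of rules whose USE statement names a deprecated term, and an alphabetical Deprecated Terms view) can be modeled in a few lines. This is a simplified Python sketch under stated assumptions, not Data Harmony code: the class names, the status strings, and the representation of a rule as a (text-to-match, USE term) pair are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    status: str = "Accepted"  # hypothetical status values: Accepted, Candidate, Deprecated


@dataclass
class Thesaurus:
    terms: dict = field(default_factory=dict)
    rules: list = field(default_factory=list)  # each rule: (text_to_match, use_term)

    def import_term(self, name, status="Accepted"):
        """Add a term; regular terms get an identity rule, deprecated ones do not."""
        self.terms[name] = Term(name, status)
        if status != "Deprecated":
            self.rules.append((name.lower(), name))

    def save_rule(self, text_to_match, use_term):
        """Refuse to save a rule whose USE target is a deprecated term."""
        if self.terms[use_term].status == "Deprecated":
            raise ValueError(f"Cannot save rule: '{use_term}' is deprecated")
        self.rules.append((text_to_match, use_term))

    def deprecated_view(self):
        """Alphabetical listing of all terms with deprecated status."""
        return sorted(t.name for t in self.terms.values() if t.status == "Deprecated")


th = Thesaurus()
th.import_term("Nephritis")
th.import_term("Bright's Disease", status="Deprecated")
th.import_term("Soviet Union", status="Deprecated")
print(th.deprecated_view())
```

In this model an editor resolves a deprecated term by pointing its text at a current term, for example `th.save_rule("bright's disease", "Nephritis")`, which succeeds because the USE target is not deprecated.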
31. Deprecated Terms
• Can choose to index with deprecated terms as though their status were Candidate
• A new "Deprecated View" is now listed in the View options (under the Candidate Terms option).
• A term is switched to "deprecated" with a simple click. If it has rules, the editor will pop up and ask the user to handle them (either delete them or edit them to remove the term).
• If a rule contains a deprecated term, it will not validate.
• When importing a new term as deprecated, it won't automatically add a new "identity rule" as we do with other "regular" terms.
• Added support for import and export.
32. New XIS Applet - MAI-rerun on re-index
• New XIS app declared within the schema to update MAI on all records when re-indexed
33. Suggested Terms API changes
Format (JSON or XML)
• XML
Changes level
• Weights of terms can be "boosted" depending on the field
• Number of terms returned
• Allows full path indexing
34. New DH APIs and Enhancements
Added multiple options to the suggestTerms API
1. Format (JSON or XML)
2. Boost Weighting of Terms
3. BatchLimit
4. Use fields (to return with MAI terms)
5. Fullpath
6. Highlight (inlineTagging)
7. Capture (save received data or not)
8. SaveToXis (xisProject, xisDocset, xisUser)
9. Specify maximum number of returns
Added Logging API for every MAI call
Example of suggestTerms options:
{
"format" : "XML",
"weight" : 3,
"batchLimit" : 1000,
"fields" : [
"BT",
"NT",
"RT"
],
"saveToXis" : true,
"fullpath" : true,
"hilite" : false,
"xisProject" : "PLOSfilter",
"xisDocset" : "records",
"xisUser" : "editor"
}
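An options object like the one above is easy to assemble programmatically. The Python sketch below builds the same keys and serializes them with the standard `json` module. The helper function and its parameter names are hypothetical, and because the slide does not show the endpoint URL or HTTP method, the sketch stops at producing the JSON text rather than sending it.

```python
import json


def build_suggest_terms_options(fmt="XML", weight=3, batch_limit=1000,
                                fields=("BT", "NT", "RT"), fullpath=True,
                                hilite=False, save_to_xis=None):
    """Assemble an options object shaped like the slide's example.

    save_to_xis is either None or a (project, docset, user) tuple.
    """
    if fmt not in ("XML", "JSON"):
        raise ValueError("format must be 'XML' or 'JSON'")
    options = {
        "format": fmt,
        "weight": weight,
        "batchLimit": batch_limit,
        "fields": list(fields),
        "saveToXis": save_to_xis is not None,
        "fullpath": fullpath,
        "hilite": hilite,
    }
    if save_to_xis is not None:
        project, docset, user = save_to_xis
        options.update(xisProject=project, xisDocset=docset, xisUser=user)
    return options


options = build_suggest_terms_options(save_to_xis=("PLOSfilter", "records", "editor"))
print(json.dumps(options, indent=2))
```

Keeping the XIS keys out of the object unless saving is requested is a design choice of this sketch, not documented API behavior.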
35. suggestTerms Weighting (Boost)
By changing the boost value for multiple fields, we see the MAI suggested returns in the output are skewed higher towards terms that appear in highly boosted sections such as article titles.
{
  "boosts": [
    {
      "type": "xpath",
      "value": "/doc/section-title/title",
      "boost": 5
    },
    {
      "type": "regex",
      "value": "<abstract>.*?</abstract>",
      "boost": 2
    },
    {
      "type": "xmlTag",
      "value": "footer",
      "boost": -10
    }
  ]
}
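To make the effect of boosts concrete, here is a small Python sketch that scores a term by counting its occurrences and weighting those that fall inside a boosted region. It illustrates the idea only and makes several assumptions: how M.A.I. actually combines boosts with its own weights is not shown in the slides, the scoring formula (each occurrence counts as 1, or as the boost value inside a boosted region) is invented for the example, and the sample document is made up.

```python
import re
import xml.etree.ElementTree as ET

BOOSTS = [
    {"type": "xpath", "value": "/doc/section-title/title", "boost": 5},
    {"type": "regex", "value": "<abstract>.*?</abstract>", "boost": 2},
    {"type": "xmlTag", "value": "footer", "boost": -10},
]

SAMPLE = (
    "<doc><section-title><title>Gene therapy</title></section-title>"
    "<abstract>Advances in gene therapy.</abstract>"
    "<body>Gene therapy and funding.</body>"
    "<footer>funding funding</footer></doc>"
)


def region_texts(raw_xml, root, rule):
    """Return the text of every region that a boost rule selects."""
    if rule["type"] == "regex":
        return re.findall(rule["value"], raw_xml, flags=re.DOTALL)
    if rule["type"] == "xmlTag":
        return ["".join(el.itertext()) for el in root.iter(rule["value"])]
    if rule["type"] == "xpath":
        # ElementTree supports only relative paths, so strip the root segment.
        parts = rule["value"].strip("/").split("/")
        if parts[0] != root.tag:
            return []
        relative = "/".join(parts[1:])
        elements = root.findall(relative) if relative else [root]
        return ["".join(el.itertext()) for el in elements]
    raise ValueError(f"unknown boost type: {rule['type']}")


def boosted_score(term, raw_xml, boosts):
    """Count occurrences of term; occurrences inside a boosted region weigh `boost` instead of 1."""
    root = ET.fromstring(raw_xml)
    needle = term.lower()
    score = "".join(root.itertext()).lower().count(needle)
    for rule in boosts:
        hits = sum(t.lower().count(needle) for t in region_texts(raw_xml, root, rule))
        score += (rule["boost"] - 1) * hits
    return score


print(boosted_score("gene therapy", SAMPLE, BOOSTS))
print(boosted_score("funding", SAMPLE, BOOSTS))
```

In this toy model a term in the title outranks the same term in the body, and a negative boost on the footer pushes boilerplate terms to the bottom of the list.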
36. Special Character Extensions
• Single quotes, ampersands, greater than and less than symbols, etc.
• Formerly not allowed in the MAIstro syntax
• MAI now allows import of most special characters
• Apostrophes representing possession are now recognized by the MAI parser.
• MAI will now correctly parse terms, mainly entity names, containing multiple special characters including parentheses, commas, and periods.
‘ ’ & < >
37. Washington, D.C.
• Wrote a best practices section in the DH User Guide
• As long as periods or commas are not followed by whitespace, MAI will correctly parse the text-to-match.
• Where they are followed by a space, please see the section recommending changing the padding characters setting in the Data Harmony Administration Module.
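The interaction between special characters and padding can be illustrated with a generic pattern matcher. The Python sketch below is not the M.A.I. parser. It assumes that "padding characters" means the set of characters allowed immediately before and after a match, which is one reading of the setting and not documented behavior. The term is escaped so that periods and commas match literally.

```python
import re


def find_term(term, text, padding=" \t\n"):
    """Find literal occurrences of term bounded by padding characters or the text edges.

    Returns the start offset of each match.
    """
    pad = re.escape(padding)
    # (?<![^pad]) : not preceded by a non-padding character (also true at start of text)
    # (?![^pad])  : not followed by a non-padding character (also true at end of text)
    pattern = rf"(?<![^{pad}]){re.escape(term)}(?![^{pad}])"
    return [m.start() for m in re.finditer(pattern, text)]


text = "He moved to Washington, D.C. last year."
print(find_term("Washington, D.C.", text))
print(find_term("D.C", text))
print(find_term("D.C", text, padding=" ."))
```

With only whitespace as padding, the term `D.C` is not found because the period that follows it is not a padding character. Adding the period to the padding set lets it match, which mirrors the kind of adjustment the best-practices section recommends.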
38. Logging API
• Track how often the MAI server is called with an API
• Dates
• Timestamps
• IP addresses
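Once calls are logged with a timestamp and an IP address, usage reports reduce to simple tallies. The Python sketch below assumes a hypothetical record layout (a list of dictionaries with `timestamp` and `ip` keys) since the actual format of the Logging API output is not shown here. The sample records are invented and use addresses from the reserved documentation ranges.

```python
from collections import Counter
from datetime import datetime

# Invented sample records; field names are assumptions, not the Logging API's format.
LOG = [
    {"timestamp": "2019-03-04T09:15:02", "ip": "192.0.2.10"},
    {"timestamp": "2019-03-04T09:15:40", "ip": "192.0.2.10"},
    {"timestamp": "2019-03-05T14:02:11", "ip": "198.51.100.7"},
]


def calls_per_day(records):
    """Tally calls by calendar date (ISO format)."""
    return Counter(
        datetime.fromisoformat(r["timestamp"]).date().isoformat() for r in records
    )


def calls_per_ip(records):
    """Tally calls by caller IP address."""
    return Counter(r["ip"] for r in records)


print(calls_per_day(LOG))
print(calls_per_ip(LOG))
```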
39. DH 4.0 - the Dashboard
• Thesaurus Master
• MAI
• XIS
• Project Information
• Admin
• Support
• DiscoverEnt
• SentiScore
• TOPiCluster
• TermSpy
• Swift Summ
98. Smart Submit
Image: Courtesy AACR and EJPress
Add a box: "Suggest New terms"
99. Smart Submit
• Five rule bases
• Identifies taxonomic concepts
• Controversial topics
• Suspect science
• Endangered species
• Bad cell lines
• Clinical trials
• XIS powers a pre-submission filtering application
• Used to help editors quickly review records
• Retains SciGen analysis and other metadata information
101. Medicinal Plant Names Services
• From the Royal Botanic Gardens, Kew
• Nearly 28,000 medicinal plants
• Full records
• 14.7 synonyms on average
• Know the right name and the actual use
• Offered on subscription as an API call for your data
102. Knowledge Organization Systems for Commerce
• NKOS, Linked data, academic apps, etc.
• But what about the things businesses use?
• Commerce apps
• Thin data
• Coded lists
• Need words and inferences
• Many applications in commerce
• Enabling search
• Enabling transactions
• Enabling purchase
103. E-Commerce transactions
• Use case
• How to index / tag everything
• On an online "store" site, like Amazon, eBay, Walmart, Home Depot, B&H Photo
• Or in-store to enable search on a kiosk
• Or for purchase of services and supplies on a corporate website
• Map to UNSPSC or eCl@ss for corporate transactions
• UNSPSC (United Nations Standard Products and Services Code)
104. KOS Platform (diagram)
KOS Platform: Code 101011, Inkjet Printers
Product Code Sets
• UNSPSC: "Computer printers" 43212104
• Eclass: "Ink jet printer" 19140103
• Other code sets
eCommerce Retailers
• eBay: "Printers, Computer" 171961
• eBay: "Printers, Inkjet" 745677
Brick and Mortar Retailers
• Large Retailers (Walmart, Target, etc.)
• Local Stores
Federal Agencies
• USAID
• NASA
Others
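The diagram amounts to a crosswalk: one master taxonomy code mapped to the codes that each external scheme uses for the same concept. A minimal Python sketch of such a crosswalk, using only the codes shown on the slide, might look like this. The data structure and function names are hypothetical and do not describe how the KOS platform stores its mappings.

```python
# Crosswalk keyed by master taxonomy code; values taken from the slide.
CROSSWALK = {
    "101011": {
        "label": "Inkjet Printers",
        "mappings": [
            {"scheme": "UNSPSC", "label": "Computer printers", "code": "43212104"},
            {"scheme": "Eclass", "label": "Ink jet printer", "code": "19140103"},
            {"scheme": "eBay", "label": "Printers, Computer", "code": "171961"},
            {"scheme": "eBay", "label": "Printers, Inkjet", "code": "745677"},
        ],
    },
}


def codes_for(master_code, scheme):
    """Return the external codes a master code maps to in one scheme."""
    entry = CROSSWALK.get(master_code, {"mappings": []})
    return [m["code"] for m in entry["mappings"] if m["scheme"] == scheme]


def master_codes_for(scheme, code):
    """Reverse lookup: which master codes map to a given external code."""
    return [
        master
        for master, entry in CROSSWALK.items()
        if any(m["scheme"] == scheme and m["code"] == code for m in entry["mappings"])
    ]


print(codes_for("101011", "eBay"))
print(master_codes_for("UNSPSC", "43212104"))
```

Tagging a product once with the master code then yields the right code for each marketplace or procurement system by lookup.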
107. What Next?
Effective implementation of the master taxonomy
• A well-maintained master taxonomy has multiple uses which can increase value, including…
108. KOS Platform (diagram repeated from slide 104)
A knowledge graph?
Or does it have to be RDF triples?
It certainly could be converted.
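Converting code mappings to triples is mechanical: each mapping becomes a subject-predicate-object statement. The Python sketch below uses plain tuples and no RDF library. The `kos:`, `unspsc:`, `eclass:` and `ebay:` prefixes are made up for illustration, and using the SKOS properties `skos:prefLabel` and `skos:closeMatch` is an assumption about how one might model the links, not something the slides specify.

```python
# Each statement is a (subject, predicate, object) tuple.
TRIPLES = [
    ("kos:101011", "skos:prefLabel", "Inkjet Printers"),
    ("kos:101011", "skos:closeMatch", "unspsc:43212104"),
    ("kos:101011", "skos:closeMatch", "eclass:19140103"),
    ("kos:101011", "skos:closeMatch", "ebay:745677"),
    ("unspsc:43212104", "skos:prefLabel", "Computer printers"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything the master concept is linked to:
for _, _, target in match(TRIPLES, s="kos:101011", p="skos:closeMatch"):
    print(target)
```

The wildcard pattern in `match` is the same basic idea a SPARQL triple pattern expresses, which is why a flat code crosswalk converts to a graph without loss.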
110. Thesaurus Master with Knowledge Graphs
URL linking enabling a deeper ontological understanding of your metadata
a.k.a.
Knowledge Graph Linking
111. Knowledge Graph
• Thesaurus Master will now link to outside knowledge stores
• Wikipedia
• DBpedia
• WebMD
• Mayo Clinic
• Also allow arbitrary knowledge stores
• In-house wikis
• Databases
• Etc.
112. The Power of Knowledge Graphs
• The taxonomic motivation for knowledge graphs
• Mainly describes real-world entities and their interrelations, organized in a graph
• Defines possible classes and relations of entities in a schema
• Allows for potentially interrelating arbitrary entities with each other
• Covers nearly all topical domains
• Use-case motivations
• Named-entity disambiguation
• SPARQL query integration
• Automated NLP algorithms that read text changes in the graph and produce structured knowledge extracted from that text
• Truth maintenance for all inferred knowledge, regardless of source, so that revisions to the graph keep it consistent with itself
113. API Support
• Knowledge graph integration will include API integration
• Allow access to graph relationships
• SPARQL queries
• Truth relationships
• NLP (MAI) access to the graphs
• Subgraph associations as well
• When this is useful for an organization
• Curation of the knowledge store
• Semantic Extract, Transform, and Load
• On Demand Load
• Custom Views
• Enhanced search in the taxonomy
• Custom term inferences
• Rule refinement
As a vendor to content producers of all kinds, shapes, and sizes, here is what I want you to believe…
Data Harmony introduces the Deprecated status with the 3.14 release. Deprecated terms are used within thesauri to denote outdated or superseded terminology. They may include historical geographic states (e.g. "Soviet Union"), outdated medical terminology (e.g. "Bright's Disease", formerly used to classify nephritis of the kidneys), or outdated expressions that the taxonomy team wishes to retain as terms rather than remove and add as synonyms. Indexers and archivists may want to preserve the outdated terminology for historical or legacy purposes, so we added an option to include these terms while limiting the way they affect the indexing of future content.
Functionally, the Deprecated status acts similarly to the Candidate status. Both have dedicated views in Thesaurus Master so that taxonomists can "resolve" the term within the vocabulary. Both have MAI options to ignore or include the terms in the MAI suggested-term postings, and both can be changed with the click of a button in the term record pane.
Finally, we added support for importing and exporting terms with Deprecated status. Terms with a Deprecated status, however, will not have identity rules created for them on import. Additionally, once a term is changed to Deprecated, users are not allowed to edit or add rules that include a USE statement for that term. These restrictions prompt taxonomy developers and managers to decide how concepts that previously relied on the outdated vocabulary should be directed to a current term in future automated indexing.
#1, on the upper left, shows the radio button for changing a term's status in the thesaurus. To change the status, simply click Deprecated and the term record is saved as Deprecated.
#2 displays the Deprecated Terms view, listing each term with the deprecated status. This allows users to work through the list of deprecated terms and resolve each term's rules, either redirecting them to another term or removing a rule outright.
#3 shows the validation step when a user attempts to save or add a rule that includes a deprecated concept. MAI informs the user that saving a rule that includes a deprecated term is invalid.
We have added multiple enhancements and rolled several smaller API calls into the suggestTerms API. This gives us and our users more streamlined options for calling on the Data Harmony suite to index content.
Data Harmony now fully supports JSON calls alongside XML output.
Weighting, or boosting, can now be performed at the section level of a document; it applies a multiplier to the value of the suggested returns. For instance, terms discovered in the title would be weighted more heavily than terms in the body if the title's boost were raised.
suggestTerms can specify the maximum number of suggested terms returned without reconfiguring the project from the Admin module.
Fullpath returns the full path of the descriptor as it appears in the taxonomy.
Highlighting options provide inline tagging of the descriptors within the text.
Here's an example of how the boosting would appear in an API call.
Special Character Extensions: Special characters such as single quotes, ampersands, greater than and less than symbols, etc. were formerly not allowed in the MAIstro syntax. They are, however, very important in chemical nomenclature, as they are in place names such as Washington, D.C.
MAI now allows import of most special characters.
Apostrophes representing possession are now recognized by the MAI parser.
MAI will now correctly parse terms, mainly entity names, containing multiple special characters including parentheses, commas, and periods.
We wrote a best practices section in the DH User Guide to deal with the variation in the use of periods and commas. As long as neither the periods nor the commas are followed by whitespace, MAI will correctly parse the text-to-match. Where they are followed by a space, please see the section recommending changing the padding characters setting in the Data Harmony Administration Module.