People will go to extraordinary lengths to invent tools to save time. All of us would like to be in a place where machines can take over information indexing for us. But when is this possible, and when should it be avoided?
This chart examines when it is appropriate to let machines or people perform indexing. It is not an exact science, so individual circumstances require an evaluation of all these factors, plus business factors such as ease of access to human indexers and to IT resources and funds. Broadly speaking, however, certain factors steer one in the direction of certain solutions.
Factors that lean toward machine indexing:
If the data set is so large that it would be impossible for humans to process, then machine indexing may be the only solution, regardless of any qualitative factors.
If the data set is fast moving and access to it is time-sensitive, then machine indexing can also be the preferred solution, although small sets of fast-moving data may be processed by humans.
If the users are generalists or in pursuit of information for recreational purposes, then they are likely to be more tolerant of noisy or incomplete results.
Factors that lean toward human indexing:
If the data is not at all machine-readable, then human indexing may be the only solution. For example, photographs and video without any metadata or embedded speech may require human review.
If the data contains subtle or abstract concepts, then these may elude even the most finely tuned machines. For example, the ideas behind the “To be or not to be” soliloquy in Hamlet are too subtle to be identified from textual analysis alone.
If the users are experts for whom data is a mission-critical resource, then they may require exceedingly high precision and recall, which would demand either human indexing or an extremely high degree of human training and QC of the machine process.
Factors that benefit either indexing method:
If data is well structured within identifiable fields or metadata attributes, then this structure provides context that will greatly assist machine indexing, but also help with human indexing.
If data is on a homogeneous topic, such as a database of articles all about nuclear physics, it will be easier to index than a database covering all disciplines and topics.
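The factors in the chart can be read as a rough decision rule. The sketch below is one possible way to encode them, not a formula from the chart itself: the class DataSetProfile, the function suggest_indexing, and the equal weighting of the softer factors are all assumptions made for illustration. Business factors such as budget and access to indexers are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class DataSetProfile:
    """Hypothetical summary of a data set, one flag per factor in the chart."""
    too_large_for_humans: bool   # volume rules out manual processing
    fast_moving: bool            # time-sensitive, rapidly changing content
    machine_readable: bool       # text, metadata or speech a machine can parse
    subtle_concepts: bool        # abstract ideas that textual analysis may miss
    expert_users: bool           # users who need very high precision and recall


def suggest_indexing(p: DataSetProfile) -> str:
    """Return 'machine', 'human' or 'evaluate case by case'."""
    # The two "may be the only solution" factors act as hard constraints.
    if not p.machine_readable and p.too_large_for_humans:
        return "evaluate case by case"  # the constraints conflict
    if not p.machine_readable:
        return "human"
    if p.too_large_for_humans:
        return "machine"

    # Assumption: the remaining factors count as one vote each.
    machine_votes = int(p.fast_moving) + int(not p.expert_users)
    human_votes = int(p.subtle_concepts) + int(p.expert_users)
    if machine_votes > human_votes:
        return "machine"
    if human_votes > machine_votes:
        return "human"
    return "evaluate case by case"
```

A tie is returned as "evaluate case by case" on purpose, since the choice is not an exact science and the business factors would normally settle it.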
The human indexing process essentially involves three simple steps:
1. Review the content one article / record at a time.
2. Search the controlled vocabularies to find the terms that best describe the content.
3. Either tag the content directly by adding the terms as metadata values within the CMS, or assign the indexing terms to the content item by using a separate index table / interface.
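The three steps can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: Record, VOCABULARY, search_vocabulary and tag_record are hypothetical names, the vocabulary is a toy dictionary of preferred terms and synonyms, and nothing here reflects the data model of any particular CMS or taxonomy tool.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical content item: one article / record under review."""
    record_id: str
    text: str
    index_terms: list = field(default_factory=list)


# Hypothetical controlled vocabulary: preferred term -> synonyms.
VOCABULARY = {
    "Nuclear physics": ["atomic physics"],
    "Nuclear reactors": ["fission reactors"],
    "Shakespeare": ["the Bard"],
}


def search_vocabulary(query, vocabulary):
    """Step 2: return preferred terms whose label or synonyms contain the query."""
    q = query.lower()
    return sorted(
        term
        for term, synonyms in vocabulary.items()
        if q in term.lower() or any(q in s.lower() for s in synonyms)
    )


def tag_record(record, term, vocabulary):
    """Step 3: attach a term as metadata, accepting only controlled terms."""
    if term not in vocabulary:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    if term not in record.index_terms:
        record.index_terms.append(term)


# Step 1: the indexer reviews one record at a time.
record = Record("a1", "An article about fission reactor safety.")
candidates = search_vocabulary("nuclear", VOCABULARY)
tag_record(record, "Nuclear reactors", VOCABULARY)
```

The judgment in step 2, deciding which candidate best describes the content, stays with the indexer; the code only narrows the list and enforces that the chosen term comes from the vocabulary.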
Most of our user base create their taxonomies in Synaptica and then integrate them with third-party automatic indexing tools. Others have determined that they need to perform human indexing, and over the years they have developed a wish list of time-saving tools (see bullets for wish list).
Ten years ago the Synaptica software team productized this wish list and bundled all these features into a Synaptica package called IMS. IMS – the Indexing Management System – acts as an integration toolset between the taxonomy management system and content management system. It provides ready-made GUI screens, and also a suite of web services components that allow indexing functionality to be custom crafted into the CMS screenflow.
This slide illustrates the workflow for IMS as a component that sits between a CMS and a taxonomy management system to assist the human indexing process.
This screenshot illustrates how indexing profiles can be created to streamline the indexing operation for particular sets of content. Many parameters can be configured, such as user-access permissions, term expansion, access to particular vocabularies and facets, and even the selection of individual sub-branches within a hierarchy.
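To make the idea of an indexing profile concrete, here is a small sketch of how such a profile might be modeled. It is an assumption for illustration only and does not describe the actual IMS configuration or API: IndexingProfile, its fields, and term_allowed are hypothetical names, and facets are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class IndexingProfile:
    """Hypothetical profile bundling the parameters named in the notes."""
    name: str
    allowed_users: set                 # user-access permissions
    vocabularies: set                  # vocabularies this profile may use
    # vocabulary -> root terms of the sub-branches open to indexers;
    # a vocabulary with no entry is open in full.
    branch_roots: dict = field(default_factory=dict)
    expand_terms: bool = False         # term expansion switch (unused in this sketch)


def term_allowed(profile, user, vocabulary, path):
    """Check whether a user may assign a term under this profile.

    `path` is the term's position in the hierarchy, from the top term
    down to the term itself, e.g. ["Science", "Physics", "Nuclear physics"].
    """
    if user not in profile.allowed_users:
        return False
    if vocabulary not in profile.vocabularies:
        return False
    roots = profile.branch_roots.get(vocabulary)
    if not roots:
        return True  # no sub-branch restriction for this vocabulary
    return any(node in roots for node in path)


profile = IndexingProfile(
    name="physics-desk",
    allowed_users={"alice"},
    vocabularies={"Subjects"},
    branch_roots={"Subjects": {"Physics"}},
)
```

With this profile, the user alice could assign any term under the Physics branch of the Subjects vocabulary, while terms from other branches or other vocabularies would be rejected.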
We are actively working with a number of clients who are performing human indexing for selected data sets. Following are three “hypothetical” but realistic examples.