This document discusses subject indexing languages. It defines an indexing language as a vocabulary of terms, together with devices for handling the relationships between them, that is used to describe the concepts in documents. There are three main types of indexing language: natural language, which uses terms as they appear in the document; controlled vocabulary, which uses standardized terms from an authority list; and free indexing language, which uses any terms, whether or not they occur in the document. The key point is that indexing languages allow concepts from documents to be represented in a structured way to facilitate information retrieval.
2. Introduction:
A subject is then any concept or combination of concepts
which is expressed in the document. The reader’s task is
to interpret the words and sentences in the document in
order to understand the concepts. Whether a reader
understands a document depends on how precisely the
author expresses the concepts he refers to and whether
the reader is aware of the concepts the author expresses.
The basic idea is that the concepts exist before the
author writes the document and the reader reads the
document.
3. • Similarly, the indexer’s task is to identify concepts in the
document and re-express these in indexing terms. This is
done first by establishing the subject content, or in other
words the content of concepts in the document.
Thereafter the principal concept presented in the
subject content is identified, and finally, the concepts
are expressed in the indexing language. The indexing is
successful when the document and the indexing term
express the same concepts.
4. What are indexing languages?
The term ‘indexing languages’ may be understood in the same way as the term
‘indexing’ in the broader, general sense.
An indexing language is a set of items (vocabulary) and devices for
handling the relationships between them in a system for providing
index descriptions. An indexing language is also referred to as a retrieval
language.
An indexing language is created by building a set of vocabularies that
help to provide access to objects of information: books,
documents, articles, etc. Like any other language, it consists of
two parts: vocabulary and syntax.
This process of creating and providing access to objects of
information can be either manual or computer-based.
5. The above definitions help us characterize an indexing language
in the following ways:
• As terms or vocabularies used to represent a
document or its content, either extracted from the
document text or assigned from an authority list according to some
process or technique
• Serving as access points for searching
• Possibly being extracted or derived from document text:
natural language
• Possibly being assigned from an authority control list:
controlled vocabulary
6. So, in a nutshell:
A system for naming subjects using subject terms or
vocabularies, together with devices for handling the
relationships between them so as to provide systematic
index descriptions, is called an indexing language.
Like any other language, it consists of two
parts: vocabulary and syntax.
7. Again, we need to understand that:
If we use terms or vocabulary as they appear in documents
without modification, we are using natural language.
However, relying on natural language alone may lead to
problems. As far as vocabulary is concerned,
different authors may use different terms (synonyms) to express the same
idea. A search on any one of those terms then misses the documents
that use the others, which lowers recall and defeats the
whole purpose of indexing and retrieval.
As far as syntax is concerned, the same idea may also be expressed in more than
one way, for example: paediatric disease or child
disease; geriatric care or health care of old people; child
psychology or psychology of children; adult education or
education of adults.
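The recall problem described above can be illustrated with a minimal sketch. The four documents, their wording, and the helper names `search` and `recall` are hypothetical; only the synonym pairs come from the text.

```python
# Hypothetical mini-collection: each document keeps its author's own wording.
documents = {
    "doc1": "paediatric disease in rural clinics",
    "doc2": "child disease and nutrition",
    "doc3": "adult education programmes",
    "doc4": "education of adults in evening schools",
}

def search(phrase, docs):
    """Return the ids of documents whose text contains the exact phrase."""
    return sorted(doc_id for doc_id, text in docs.items() if phrase in text)

def recall(retrieved, relevant):
    """Share of the relevant documents that were actually retrieved."""
    return len(set(retrieved) & set(relevant)) / len(relevant)

# Both doc1 and doc2 are about children's diseases (an assumption of this example).
relevant = ["doc1", "doc2"]

# Searching with one natural-language term finds only half of them.
one_term = search("child disease", documents)

# Searching with every known synonym recovers the rest.
all_terms = sorted(set(search("child disease", documents))
                   | set(search("paediatric disease", documents)))

print(one_term, recall(one_term, relevant))
print(all_terms, recall(all_terms, relevant))
```

With natural language alone, the burden of knowing every synonym falls on the searcher.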
8. For these reasons, assigned indexing systems introduce a
measure of control over the terms used: we use a controlled
vocabulary.
We also formalize the flexible syntax of natural language by
permitting only certain constructions. For example, instead of
heat treatment of aluminium, we use aluminium-heat
treatment; instead of libraries for children or children’s
libraries, we use libraries, children’s. This is what is called using
a structured language or controlled vocabulary.
A controlled vocabulary and formalized structure are features of
an artificial indexing language.
The extreme example of an artificial indexing language is the notation of
a classification scheme: instead of the natural language terms heat
treatment of aluminium, or the more formalized aluminium-heat
treatment, we use 669.71.04.
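As a rough sketch of this progression from natural language to formalized heading to notation, the lookup tables below use only the examples given in the text. The table names and the functions `normalize`, `to_heading`, and `to_notation` are hypothetical, and a real scheme would be far larger.

```python
# Variant natural-language phrasings mapped to one formalized heading.
# The headings are the examples from the text; the table is illustrative only.
PREFERRED_HEADING = {
    "heat treatment of aluminium": "aluminium-heat treatment",
    "aluminium-heat treatment": "aluminium-heat treatment",
    "libraries for children": "libraries, children's",
    "children's libraries": "libraries, children's",
}

# Classification notation per heading (only the one example given in the text).
NOTATION = {"aluminium-heat treatment": "669.71.04"}

def normalize(phrase):
    """Lower-case, collapse whitespace, and straighten curly apostrophes."""
    return " ".join(phrase.replace("\u2019", "'").lower().split())

def to_heading(phrase):
    """Return the formalized heading, or None if the phrase is not listed."""
    return PREFERRED_HEADING.get(normalize(phrase))

def to_notation(phrase):
    """Return the notation for the phrase's heading, or None if unknown."""
    heading = to_heading(phrase)
    return NOTATION.get(heading) if heading is not None else None

print(to_heading("Heat treatment of aluminium"))
print(to_notation("Heat treatment of aluminium"))
```

Each step trades readability for consistency: the heading fixes word order, and the notation removes natural language altogether.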
9. Once the subject analysis of the document is
completed, the final step is to represent the selected
concepts in the language of the indexing system (as index
entries). The indexer should be familiar with the
indexing tools and their working rules and procedures
in order to ensure that concepts are organized in a
usable and accessible form. The process of subject
indexing basically involves three steps:
Familiarization => Analysis => Representation
10. Let us now look at how indexing languages are actually
conceptualized and created.
All indexing languages originate as natural language, or the
language found in documents. Natural language does
not refer to writing style, but to the fact that the
language is not under authority control.
Language under authority control is called controlled
vocabulary. There is nothing special about the words in
controlled vocabulary except the fact that they are
standardized for use in certain systems.
11. The following diagram illustrates the processes involved in
translating natural language (NL) terms into controlled
vocabulary (CV) terms for entry in database records.
The diagram helps explain why . . .
1. Natural indexing languages are also called derived-term
approaches
2. Controlled indexing languages are also called assigned-
term approaches
12. Abstracting and Indexing Process
Processes Involved in Translating Natural Language Terms
into Controlled Vocabulary
[Diagram. Boxes: Full-Text Document, Abstract, NL Record Field (two), CV Record Field, Authority File. Groupings: Natural Language, Controlled Vocabulary. Arrow labels: Write into, Enter in (three), Choose from.]
13. To review, subject analysis requires you to
1. become familiar with document content;
2. extract significant concepts and terms;
3. translate extracted terms into the language—
often controlled—of the system; and
4. formalize the terms (format them, etc.) according
to input rules.
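The four steps above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the `AUTHORITY` mapping, the choice of preferred terms, and the input rule (upper-case, de-duplicated, sorted) are hypothetical, and real term extraction is much harder than substring matching.

```python
# Hypothetical authority list: natural-language term -> authorized term.
AUTHORITY = {
    "child disease": "paediatrics",
    "paediatric": "paediatrics",
    "education of adults": "adult education",
    "adult education": "adult education",
}

def familiarize(text):
    """Step 1: read the content (here: lower-case and collapse whitespace)."""
    return " ".join(text.lower().split())

def extract(content, known_terms):
    """Step 2: pull out significant terms that occur in the content."""
    return [term for term in known_terms if term in content]

def translate(terms, authority):
    """Step 3: translate extracted terms into the system's language."""
    return [authority[term] for term in terms]

def formalize(terms):
    """Step 4: apply input rules (assumed: upper-case, unique, sorted)."""
    return sorted({term.upper() for term in terms})

def index_document(text, authority=AUTHORITY):
    content = familiarize(text)
    extracted = extract(content, list(authority))
    return formalize(translate(extracted, authority))

sample = "A survey of Child Disease and paediatric care; includes adult education."
print(index_document(sample))
```

Keeping each step as its own function mirrors the list: the translation table and the input rules can change without touching the extraction step.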
14. Types of indexing languages
As the above discussion suggests, there are three
types of indexing language:
• Natural language, or natural indexing language
• Controlled vocabulary, or controlled indexing language
• Free indexing language
15. Natural indexing language:
• This is a slightly broader language in which the
document can be described using any of the terms present in the
document. Any term that is used to define or describe the content
within a document is known as a ‘subject term’. That is why an
indexing language is sometimes called ‘subject indexing’ or
‘subject indexing language’.
• In Natural indexing language, a subject term can be used to
describe/search for a specific document based primarily on its
content.
• A subject term may also be described as a compact synonym or
surrogate for a specific subject representation.
16. Controlled indexing language:
• Controlled indexing language refers to the indexing language in
which only approved terms may be used to describe the
document. These subject terms form a controlled vocabulary kept in a
subject authority file.
• For subject terms under authority control (or vocabulary control), a
subject authority file or list . . .
  - may be described as a list of terms that are permitted to be
    used in describing or representing specific subjects
  - may be said to standardize one of two synonyms that are
    used to assign or represent specific topics
  - may be used to determine the preferred term when multiple
    terms are used to define or describe a single topic
  - may be used to provide cross-references for terms that are
    equivalent, hierarchical or alternate in position or
    relationship
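A subject authority file of this kind might be modelled roughly as follows. The class names, field names, and sample records are hypothetical; in particular, which synonym counts as the preferred term is an assumption of this sketch, not something the text specifies.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorityRecord:
    preferred: str                                 # the permitted term
    used_for: list = field(default_factory=list)   # non-preferred synonyms
    broader: list = field(default_factory=list)    # hierarchical references
    related: list = field(default_factory=list)    # associative references

class AuthorityFile:
    def __init__(self, records):
        self._records = {r.preferred: r for r in records}
        # "See" references: each synonym points at its preferred term.
        self._see = {v: r.preferred for r in records for v in r.used_for}

    def is_permitted(self, term):
        """True only for terms approved for use in index descriptions."""
        return term in self._records

    def preferred_term(self, term):
        """Resolve a term or synonym to the preferred term, else None."""
        if term in self._records:
            return term
        return self._see.get(term)

    def cross_references(self, term):
        """Return broader and related terms for the resolved term."""
        preferred = self.preferred_term(term)
        if preferred is None:
            return {}
        record = self._records[preferred]
        return {"broader": record.broader, "related": record.related}

authority = AuthorityFile([
    AuthorityRecord("adult education", used_for=["education of adults"],
                    broader=["education"]),
    AuthorityRecord("child psychology", used_for=["psychology of children"],
                    broader=["psychology"], related=["child development"]),
])

print(authority.preferred_term("psychology of children"))
print(authority.cross_references("education of adults"))
```

Storing synonyms as pointers to a single record means each topic is described in one place, while searchers can still enter any listed variant.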
17. • Cataloguing and indexing professionals have created
different subject authority control structures:
Subject headings lists are used by cataloguers in cases
where subject terms have been used as subject headings.
A thesaurus is used by indexers where subject terms are
known as descriptors.
18. Free indexing language:
As the name suggests, this type of indexing language may use
any term, from within or outside the document, for its
description.
Today, searching mechanisms and habits have
changed, and free-text search is used far more widely. This
calls for natural language indexing at the greatest
possible depth, ideally indexing every word of the text. Of
course, whether free-text search or searching with expert-chosen
vocabularies is more
efficient is a matter of research.
19. Here is how the processes differ for natural language and
controlled vocabulary:

| Natural language | Controlled vocabulary |
|---|---|
| Terms are based on the existing vocabulary of documents (which may be inconsistent) | Terms are based on a standardized vocabulary intended to describe concepts consistently |
| Indexers / cataloguers extract terms from documents and enter them (or their own terms) in various subject fields | Indexers / cataloguers choose appropriate authorized terms from the controlled vocabulary list, and enter them in the designated controlled vocabulary field |
| Searchers may enter any search terms that are likely to occur in natural language | Searchers must enter search terms that are in the controlled vocabulary |
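The difference on the searching side can be sketched with a few records that carry both a natural language (NL) field and a controlled vocabulary (CV) field. The records, field names, and functions below are hypothetical and serve only to illustrate the comparison.

```python
# Hypothetical database records with an NL field and a CV field.
RECORDS = [
    {"id": 1, "nl": ["education of adults", "evening schools"],
     "cv": ["adult education"]},
    {"id": 2, "nl": ["adult education", "literacy"],
     "cv": ["adult education"]},
    {"id": 3, "nl": ["psychology of children"],
     "cv": ["child psychology"]},
]

# Assumed list of authorized terms.
CONTROLLED_VOCABULARY = {"adult education", "child psychology"}

def search_nl(term):
    """Any term may be tried; it matches only records that use that wording."""
    return [r["id"] for r in RECORDS if term in r["nl"]]

def search_cv(term):
    """Only authorized terms are accepted; matches are consistent."""
    if term not in CONTROLLED_VOCABULARY:
        raise ValueError(f"{term!r} is not an authorized term")
    return [r["id"] for r in RECORDS if term in r["cv"]]

print(search_nl("adult education"))   # only the record that uses this wording
print(search_cv("adult education"))   # every record assigned this term
```

The controlled search retrieves both records on the topic, at the cost of rejecting any term that is not in the authorized list.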
20. Basics of Subject Indexing
MEANING:
In the literature of LIS, the phrases subject cataloguing and
subject indexing are used more or less interchangeably. But
it should be understood that subject cataloguing is
intended to embrace only that cataloguing activity which
provides a verbal subject approach to library collections,
especially macro documents (i.e. books). It refers to
determining and assigning suitable entries for the
subject component of a document for use in a library’s
catalogue, i.e. the subject catalogue is a representation of
documents. The primary purpose of the subject catalogue
is to show which books on a specific subject are possessed
by the library.
21. Subject indexing refers to that indexing activity
which provides a verbal subject approach to
micro documents (e.g., journal articles, research
reports, patent literature, etc.). Subject indexing
provides a subject entry for every topic
associated with the content of a micro
document, i.e. the subject index is a representation
of the knowledge expressed by documents.
22. The representation of documents and the knowledge
expressed by them is one of the central and unique areas
of study within Library and Information Science (LIS) and
is commonly referred to as subject indexing. Subject
approach to information has been a long and extensive
concern of librarianship and is assumed to be the major
approach (access method) of users for a very long
period. Indexes facilitate retrieval of information in both
traditional manual systems and newer computerised
systems. Without proper indexing and indexes, search
and retrieval are virtually impossible.