Relevance of classification and indexing
in the organization of internet resources

 The general opinion is that the digital age wipes the
centuries-old library system away.
 There is a feeling that libraries and librarians are obsolete in
the present digital era.
 Two questions generally faced by the LIS professionals are:
 ‘What will be the future of libraries?’
 ‘Why organization of information if you can find it on the
internet?’

Will Sherman: 33 Reasons why libraries and librarians are still
important (http://www.degreetutor.com/library)
- Not everything is available on the internet
- Digital libraries are not the internet
- The internet complements libraries but does not replace them
- The internet is not free
- Digitization does not mean destruction; in fact, it means survival
- Libraries are not just books
- Like businesses, digital libraries still need human beings
- Eliminating libraries would cut short cultural evolution
- The internet is a mess, while libraries organize knowledge

Librarians employed three important tools for knowledge organization (K.O.). They are:
 Data element directory (Cataloging Manual)
 Classification Scheme for categorization of the documents; and
 Thesaurus (vocabulary control tool) for consistent indexing (assigning
index terms)
The Web has grown without any of these tools, and so remains unorganized
(Devadason, F.J. Facet analysis and semantic Web: Musings of a student of Ranganathan
http://www.reocities.com/Athens/5041/FASEMWEB.html)

However, the issue is:
An enormous quantity of information lies outside libraries.
How can the world’s knowledge be collected and organized?

TRADITIONAL
- Classification – shelf arrangement
- Catalogue – identification and location of information
- Analysis & consolidation – indexing / abstracting for micro documents
Result:
- improved precision or recall
- provide context for search terms
- enable browsing
- access to related information with meaningful relationships
- serve as a mechanism for switching between languages

WEB BASED
- Search engines
- Subject gateways
- Directories
Result: The web is a sea of all kinds of data
- difficult to find, access & retrieve pertinent information
- extremely unorganized data
- too many false and missing links
E.g. Building and architecture; Travel and hotel

Difference: Use of subject descriptors

 Directories - Could not cope with the scale of Web growth
- Were often built by amateurs in classification and vocabulary management
- Were biased by the commercial use of the Web
 Vocabularies
- Open Directory categories
- Wikipedia categories
- Metadata in html <head>
- Spammed, not in sync with the content
- Ignored by most search engines now
- Bottom line : The Web is not and will never be an organized library
(Vatant, B. Porting library vocabularies to the Semantic Web, and back: A win-win round trip. IFLA 2010,
Gothenburg)

Eg. Works on M. K. Gandhi

Library - The art of librarianship has been used for thousands of years to
organise knowledge – catalogue/ librarian – class no. – shelf
Search engines - collections are built by robots; number count
- aim for exhaustive indexing;
- offer automatically generated metadata
Subject gateways - collections are built by humans
- aim to develop catalogues of high quality resources
- offer human generated metadata

Can we apply classification principles?
Can we apply Metadata?
Can we apply indexing techniques?

 Two distinct ways of finding resources on the Internet emerged
(Dodd 1996).
- the use of robot or spider based search engines and
- producing ‘hotlists’, which would encourage users to
browse the Web.
 This production of hierarchically arranged lists brought in the
use of Library classification schemes
 Subject directories like Yahoo! and other quality controlled
subject gateways started using classification schemes to
enhance searching of the Net.
 They maximize the retrievability / visibility of information:
clustering, browsing. e.g. LIS education through distance mode

 Electronic versions of classification schemes (Web Dewey, UDC Online)
made it easier to adopt them on the Web.
 The Web, as an information environment, differs from the controlled
setting of a traditional information retrieval system
 How and to what extent a classification is actually used to support
subject access on the Web remains an open question.
 Many Web sites, like Google and Yahoo, use hierarchical classification
trees to organize text resources on the Web.
 Subject gateways offer hierarchical browse structures based on subject
classification schemes.
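
A hierarchical browse structure of this kind can be sketched in a few lines. The Python below is only an illustration, not the method of any particular gateway: it assumes DDC-style decimal notations in which each extra digit narrows the class, and the function names and the three catalogue entries are invented for the example.

```python
from collections import defaultdict


def ddc_path(notation):
    """Return the chain of broader classes implied by a DDC-style notation."""
    digits = notation.replace(".", "")
    path = []
    for i in range(1, len(digits) + 1):
        prefix = digits[:i].ljust(3, "0")      # pad to the three-digit minimum
        if len(prefix) > 3:
            prefix = prefix[:3] + "." + prefix[3:]
        if not path or path[-1] != prefix:     # skip repeats such as 020 -> 020
            path.append(prefix)
    return path


def build_browse_tree(resources):
    """List each resource under every class on its notation's path."""
    tree = defaultdict(list)
    for title, notation in resources:
        for cls in ddc_path(notation):
            tree[cls].append(title)
    return tree


# Hypothetical catalogue entries: (title, assigned class number).
RESOURCES = [
    ("Guide to subject indexing", "025.4"),
    ("Library science portal", "020"),
    ("Architecture image bank", "720"),
]

tree = build_browse_tree(RESOURCES)
print(ddc_path("025.4"))      # broader-to-narrower chain for one notation
print(sorted(tree["020"]))    # everything a user sees when browsing class 020
```

Because a resource is listed under every broader class on its path, a user can start at a main class and narrow down step by step, which is the clustering and browsing benefit described above.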

The DDC was adapted earlier and more quickly for use in digital systems via
the Internet.
It is completely and easily available as "WebDewey" for all Web browsers and
platforms.
Examples:
- Library and Archives Canada (LAC) has capitalized on the potential of the Dewey Decimal
Classification (DDC) for organizing Web resources in two
projects.
- ADAM, the Art, Design, Architecture & Media Information Gateway
- Biz/ed, a subject gateway for business education
- BUBL uses the Dewey Decimal Classification system as the primary
organisation structure for its catalogue of Internet resources.
- National Library of Canada's Canadian Information by Subject service

- Since 1993, UDC has been used in subject gateways (SGs), and it has become more prevalent
in East European SGs, portals and hubs since 2000

 UDC in SGs appeared to be linked to the following types of applications:
 manual classification of manually collected links on small to medium-size
directories (from a few hundred to a few thousand resources)
 manual classification of a large number of automatically harvested resources
using harvesting and metadata creation tools and more advanced technology
(quality controlled SGs)
 automatic harvesting and classification (quality controlled SGs)

(Aida Slavic. UDC in subject gateways: experiment or opportunity? Knowledge Organization, 33, 2006)

Examples:
- WAIS (Wide Area Information Server)
- NISS (National Information Services and Systems)
- INTUTE
- FVL (Finnish Virtual Library)
- GERHARD (German Harvest Automated Retrieval and Directory)
- PORT (Maritime Information Gateway)
- OKO (Slovenian catalogue of Web resources), etc.
But they are not displaying the UDC structure on the interface or UDC
numbers in the metadata.
The UDC is probably more "modern" and has made faster progress towards
a faceted structure.

 Descriptive metadata is to facilitate discovery of relevant information.
 In addition to resource discovery, metadata can help organize electronic
resources, facilitate interoperability and legacy resource integration,
provide digital identification, and support archiving and preservation.
 The process is automatic and cost effective
 In descriptive metadata, the medium of that resource becomes a non-
issue.
 This enables DC metadata to be used by any organizations for
cataloguing specialized types of mixed-media collections
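
As a rough illustration of descriptive metadata, the sketch below renders a Dublin Core (DC) record as HTML meta tags, following the common convention of "DC."-prefixed names plus a schema link. The record contents and the function name dc_meta_tags are assumptions made for this example.

```python
from html import escape

# Namespace of the fifteen-element Dublin Core set.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(record):
    """Render a dict of DC elements as HTML <meta> tags for a page <head>."""
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, values in record.items():
        if isinstance(values, str):            # allow single or repeated values
            values = [values]
        for value in values:
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Hypothetical record; the same elements apply whatever the medium.
record = {
    "title": "Works on M. K. Gandhi: a resource guide",
    "creator": "Example Library",
    "subject": ["Gandhi, M. K.", "Biography"],
    "type": "Collection",
    "format": "text/html",
}

head_markup = dc_meta_tags(record)
print(head_markup)
```

The same dictionary could describe a book, an image or a sound recording by changing only the type and format values, which is the sense in which the medium becomes a non-issue.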

 Pre and post coordinated; Derived and assigned; context based;
Thesaurus and classaurus (Classaurus is a faceted scheme of terms
indicating hierarchy enriched with synonyms)
 Two concepts - Semantics and syntax
 Purpose – achieve precision out of recalled information
 Humans can do it since it is natural language
 Machines – ignorant and can’t make any sense
How to achieve precision out of recalled information of the Web?
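
One way to picture the role of a thesaurus here is query expansion, measured with the usual precision and recall ratios. The sketch below is a toy under stated assumptions: the thesaurus entry, the index and the set of relevant documents are all invented, and UF, NT and RT stand for equivalent, narrower and related terms.

```python
# Hypothetical mini-thesaurus: preferred term -> related terms by type.
THESAURUS = {
    "architecture": {
        "UF": ["building design"],        # equivalent (used for)
        "NT": ["church architecture"],    # narrower term
        "RT": ["building"],               # related (associative) term
    },
}

# Hypothetical index: index term -> ids of documents carrying that term.
INDEX = {
    "architecture": {1, 2},
    "building design": {3},
    "church architecture": {4},
    "building": {5, 6},
}

RELEVANT = {1, 2, 3, 4}  # documents assumed relevant to the user's need


def expand(term, thesaurus, relations=("UF", "NT")):
    """Return the term plus the thesaurus terms linked by the given relations."""
    terms = {term}
    entry = thesaurus.get(term, {})
    for rel in relations:
        terms.update(entry.get(rel, []))
    return terms


def search(terms, index):
    """Return ids of documents indexed under any of the terms."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


plain = search({"architecture"}, INDEX)
expanded = search(expand("architecture", THESAURUS), INDEX)
broad = search(expand("architecture", THESAURUS, ("UF", "NT", "RT")), INDEX)

for label, result in [("plain", plain), ("expanded", expanded), ("broad", broad)]:
    print(label, precision_recall(result, RELEVANT))
```

In this toy data, adding equivalent and narrower terms raises recall without losing precision, while also adding the associative term pulls in documents outside the relevant set.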

 Relationships – categorized as
 Hierarchical (internal) – whole – part composition
 Non hierarchical (external) – associative and equivalent
 Application in different areas
 Design of classification (thesaurus)
 Knowledge organization and Information retrieval (search strategies)
 Lexical cohesion
 Epistemology etc
 Design and development of databases
 Web design and development
 Artificial intelligence
 Text analysis and summarization
 Hypermedia

 Creating representation of Web pages
 Providing standard identifiers (URI) associated to access protocol (http).
 The WWW is based on HTML / XML hierarchies for coding a body of text
and images (multimedia) and linking things together via the HTTP protocol,
hypertext, etc.
 Use of vocabularies as subject descriptors to organize Web content as in
libraries

 Taxonomies, subject headings, classifications
- That’s where library heritage is strong and the Web is weak
- Such vocabularies can structure the web of data as they do for
libraries
- But the scale is larger than in a library, so the process should be automated
 Semantic enhancement of scholarly journal articles, by aiding publication
of data and metadata and providing ‘lively’ interactive access is
necessary
 Such semantic enhancements are already being undertaken by leading
STM publishers
 Application of structured vocabularies, of course using artificial
intelligence, is the ‘semantic Web’

Tim Berners-Lee: computer scientist at MIT, USA; creator of the WWW; Director of
the W3C (World Wide Web Consortium); developer of the Semantic Web
 Intention: to enhance the usability and usefulness of the web and its connected
resources.

“I have a dream for the Web [in which computers] become capable of analysing all the data on the
Web – the content, links, and transactions between people and computers. A ‘Semantic Web’,
which should make this possible, has yet to emerge, but when it does, the day-to-day
mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to
machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”
(Tim Berners-Lee, 1999)

 Technologies enabling machines to make more sense of the Web making the Web
more useful for humans.
 This means radically improving ability to find, sort, and classify information: an activity
that takes up a large part

 The Semantic Web is a project that intends to create a universal medium for
information exchange by putting documents with computer-processable
meaning (semantics) on the World Wide Web.
 “The Semantic Web is an extension of the current Web that will allow you to
find, share, and combine information more easily. It relies on machine-readable
information and metadata expressed in RDF.”
www.noisebetweenstations.com/personal/essays/metadata_glossary/metadata_glossar

 Humans can easily connect the data when browsing the Web…for e.g. we
disregard advertisements, we know the links that are interesting for our purpose
(job –resume; air ticket – flights)… but machines can’t!
E.g. automatic airline reservation can be done (Ivan Herman, W3C) by combining local
knowledge with remote services: airline preferences, dietary requirements,
calendaring.
For example, a computer can find the nearest plastic surgeon and book an appointment
that fits a personal schedule.

 XML provides a surface syntax for structured documents, but imposes no
semantic constraints on the meaning of these documents.
 XML SCHEMA is a language for restricting the structure of XML
documents.
 RDF is a simple data model for referring to objects (“resources") and how
they are related. An RDF-based model can be represented in XML syntax.
 RDF Schema is a vocabulary for describing properties and classes of RDF
resources, with semantics for generalization-hierarchies of such properties
and classes.
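
The generalization hierarchies of RDF Schema can be illustrated without any RDF library. The sketch below is a simplified illustration, not a full RDFS reasoner: triples are plain Python tuples, the ex: names are hypothetical, and only the subclass rule is applied (an instance of a class is also an instance of every superclass).

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf links."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                stack.append(o)
    return found


def infer_types(triples):
    """Add the rdf:type statements implied by the subclass hierarchy."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE:
            for sup in superclasses(o, triples):
                inferred.add((s, RDF_TYPE, sup))
    return inferred


# Hypothetical class hierarchy and one typed resource.
TRIPLES = {
    ("ex:Thesaurus", SUBCLASS, "ex:ControlledVocabulary"),
    ("ex:ControlledVocabulary", SUBCLASS, "ex:KnowledgeOrganizationSystem"),
    ("ex:myThesaurus", RDF_TYPE, "ex:Thesaurus"),
}

closure = infer_types(TRIPLES)
for triple in sorted(closure):
    print(triple)
```

A query for all knowledge organization systems would now also find ex:myThesaurus, although no one stated that fact directly.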

 OWL adds more vocabulary for describing properties and classes: among
others, relations between classes (e.g. disjointness), cardinality (e.g.
"exactly one"), equality, richer typing of properties, characteristics of
properties (e.g. symmetry), and enumerated classes.
 URI – Universal Resource Identifier - used as universal naming tools,
including for properties
 NAME SPACE is a context in which a group of one or more identifiers
might exist. An identifier defined in a namespace is associated with that
namespace. E.g. Employee ID 123. Many modern computer languages
provide support for namespaces.

- All these are based on knowledge representation algorithms, say weak AI.
- The primary facilitators of this technology are URIs, which identify resources,
along with XML and namespaces.
- These, with a bit of logic, form RDF, which can be used to say anything about
anything.
- FOAF: A popular application of the Semantic Web is Friend of a Friend
(FOAF), which describes relationships among people and other agents in
terms of RDF.
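
To show how namespaces and URIs work together in a FOAF description, the sketch below expands prefixed names into full URIs and prints simple triple lines. The FOAF namespace used is the one currently published for the vocabulary; the ex: namespace, the two people and the function names are assumptions for this example.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/people/",   # hypothetical namespace
}

# Hypothetical FOAF data; quoted strings are literals, the rest are names.
FOAF_TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
]


def expand_name(qname, prefixes):
    """Turn prefix:local into a full URI using the namespace table."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown namespace prefix in {qname!r}")
    return prefixes[prefix] + local


def to_triple_lines(triples, prefixes):
    """Write each triple on one line with full URIs in angle brackets."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{expand_name(o, prefixes)}>"
        subject = expand_name(s, prefixes)
        predicate = expand_name(p, prefixes)
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


output = to_triple_lines(FOAF_TRIPLES, PREFIXES)
print(output)
```

The short prefixes are only a convenience for writing; what is exchanged and compared are the full URIs, so two datasets that use the same FOAF property refer to the same relationship.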

 The web is changing and offering new possibilities for communication
and interaction by combining the concepts on the web. This is made
possible by XML
 XML provides an interoperable syntactical foundation that facilitates to
represent relationships and build meanings

 RDF is an XML based standard for describing resources that exist on
the web.
 RDF is a model for such relationships and Interchange
 RDF is the standard interchange format on the semantic web. Once
information is in RDF form, it becomes easy to process it, since RDF is
a generic format.
 It is a model of (s p o) triplets with p naming the relationship between
s and o
 RDF is a graph: i.e., a set of RDF statements is a directed, labeled
graph
- the nodes represent the resources that are bound
- the labeled edges are the relationships with their names

 With an RDF application, it is easy to know which bits of data are the
semantics of the application, and which bits are just syntactic fluff.
 RDF statements describe a resource, the resources properties and the
values of the properties.
 RDF statements are often refer to as “triples” that consist of a subject,
predicate and object which correspond to a resource (subject), a
property (predicate) and a property value (object)

 This piece of RDF basically says that this article has the title "The Semantic Web:
An Introduction", and was written by someone whose name is "Sean B. Palmer".
Here are the triples that this RDF produces:-
<> <http://purl.org/dc/elements/1.1/creator> _:x0 .
this <http://purl.org/dc/elements/1.1/title> "The Semantic Web: An Introduction" .
_:x0 <http://xmlns.com/0.1/foaf/name> "Sean B. Palmer" .

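<!-- Namespace declarations added so the fragment parses; the abc URI is an assumed placeholder -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:abc="http://example.org/abc#">
  <rdf:Description rdf:about="http://www.ivan-herman.net">
    <foaf:name>Ivan</foaf:name>
    <abc:myCalendar rdf:resource="http://…/myCalendar"/>
    <foaf:surname>Herman</foaf:surname>
  </rdf:Description>
</rdf:RDF>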
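
To see how such a description maps onto triples, the sketch below reads a complete version of the fragment with Python's standard XML parser. It is not a general RDF/XML parser: it handles only flat rdf:Description elements, and the abc namespace and the calendar URL are placeholders assumed for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Complete document; the abc namespace and calendar URL are placeholders.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:abc="http://example.org/abc#">
  <rdf:Description rdf:about="http://www.ivan-herman.net">
    <foaf:name>Ivan</foaf:name>
    <abc:myCalendar rdf:resource="http://example.org/myCalendar"/>
    <foaf:surname>Herman</foaf:surname>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Extract (subject, predicate, object) from flat rdf:Description blocks."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree names tags "{namespace}local"; join them into a URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


triples = triples_from_rdfxml(DOC)
for triple in triples:
    print(triple)
```

Each child element becomes one statement about the resource named in rdf:about: the element name gives the property, and either its text or its rdf:resource attribute gives the value.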

 URI is simply a web identifier like the strings starting with “http:”
“ftp:”. Anyone can create a URI, and the ownership of URIs is clearly
delegated, so they form an ideal base technology on which to build a global web.
 Resources on the web are identified by URIs, which uses a global
naming convention.
 The W3C maintains list of URI schemes.
 The URI-s made the merge possible
 URI-s ground RDF into the Web
 URI-s make this the Semantic Web

 Ontological analysis clarifies the structure of knowledge
 Defined as the terms used to describe and represent an area of

knowledge.
 These are explicit specifications of a conceptualization
 The ontology is the study of the ‘categories, of things that exist or
may exist in some domain’.
 A common ontology defines the vocabulary with which queries and
assertions are exchanged among agents.
 These are the rules that help integration and operate on globally
shared theory
 Often equated with taxonomic hierarchies of classes but need not be
limited to this form, as an ontology also adds knowledge about the world

 The semantic Web is generally built on syntaxes which use URIs to
represent data, usually in triples based structures i.e. many triples of URI
data that can be held in databases, or interchanged on the WWW using a
particular syntax developed especially for the task. These syntaxes are
called “Resource Description Framework” Syntaxes.
 The application of Semantic Web is to create relations among resources
on the Web and to interchange those data, like (hyper) links on the
traditional web, except that:
- there is no notion of “current” document; ie, relationship is between any
two resources
- a relationship must have a name: a link to my CV should be
differentiated from a link to my calendar
- there is no attached user-interface action like for a hyperlink

 Map the various data onto an abstract data representation make the
data independent of its internal representation…
 Merge the resulting representations
 Start making queries on the whole!
queries that could not have been done on the individual data sets
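
The map, merge and query steps can be sketched as follows, assuming both sources have already been mapped to triples. The two datasets, the example.org URIs and the function names are invented for illustration; the point is only that the shared URI lets the merged set answer a question neither source can answer alone.

```python
# Dataset 1: a hypothetical library catalogue, already mapped to triples.
CATALOGUE = {
    ("http://example.org/book/1", "dc:title", "The Story of My Experiments with Truth"),
    ("http://example.org/book/1", "dc:creator", "http://example.org/person/gandhi"),
}

# Dataset 2: a hypothetical directory of people that shares the same URI.
PEOPLE = {
    ("http://example.org/person/gandhi", "foaf:name", "M. K. Gandhi"),
}


def merge(*datasets):
    """Merging is a set union; identical URIs become the same node."""
    merged = set()
    for dataset in datasets:
        merged |= set(dataset)
    return merged


def query(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def titles_by_author_name(triples, name):
    """Join name -> person -> book -> title across the triples."""
    authors = {s for s, _, _ in query(triples, p="foaf:name", o=name)}
    books = {s for s, _, o in query(triples, p="dc:creator") if o in authors}
    return sorted(o for s, _, o in query(triples, p="dc:title") if s in books)


merged = merge(CATALOGUE, PEOPLE)
print(titles_by_author_name(merged, "M. K. Gandhi"))
print(titles_by_author_name(CATALOGUE, "M. K. Gandhi"))   # empty: no names here
```

The catalogue alone knows the book's creator only as a URI, and the directory alone knows no books, so the question about titles by a named author only works on the merged whole.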

 Web lacks the coordination and organization of a traditional library.
 It has been practiced and proved that the use of traditional library tools
and techniques could be a great help in taming the Net.
 The IFLA Information Technology section, with support of Cataloguing
section, Classification and Indexing section, and Knowledge
Management section, proposes the creation of a Semantic Web Special
Interest Group (SWSIG) within IFLA.
 The SWSIG intends to be a platform where interested professionals
could gather, and undertake whatever tasks are needed to develop,
enhance and facilitate the adoption of semantic Web technologies in the
library community.
 Librarians should start research projects to develop better techniques of
organizing the web. Modern classification research must find order
especially in the context of complexities of the Internet
