Automatic indexing
Pondicherry University
Dhatchayani M
Department: LIS
Course: MLIS, 2nd Year
Automatic indexing is indexing performed by algorithmic procedures. The algorithm operates on a database of document representations, which may be full-text representations, bibliographic records, partial-text representations or, in principle, value-added databases. Automatic indexing can also be performed on non-text databases, e.g., images or music.
One statistical technique for automatic indexing involves:
(1) determining probability relationships between individual content-bearing words and subject categories, and
(2) using these relationships to predict the category to which a document containing those words belongs.
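The two steps above can be illustrated with a naive Bayes classifier, one common way to estimate word-category probabilities from an already indexed training set and then choose the most probable category for a new document. This is a minimal sketch under that assumption, not necessarily the procedure of the original study; the function names, training documents and categories are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Step (1): estimate category priors and per-category word counts.

    docs: list of (words, category) pairs, where words is a list of clue words.
    """
    category_counts = Counter()
    word_counts = defaultdict(Counter)  # category -> Counter of clue words
    vocabulary = set()
    for words, category in docs:
        category_counts[category] += 1
        word_counts[category].update(words)
        vocabulary.update(words)
    return category_counts, word_counts, vocabulary

def predict(words, model):
    """Step (2): return the category with the highest posterior log-probability.

    Uses add-one (Laplace) smoothing so an unseen word does not zero out a category.
    """
    category_counts, word_counts, vocabulary = model
    total_docs = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_docs in category_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)  # log P(category)
        for word in words:
            # log P(word | category), smoothed
            score += math.log((word_counts[category][word] + 1) /
                              (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training corpus: clue words already extracted from each document.
training = [
    (["electron", "transistor", "circuit"], "electronics"),
    (["transistor", "amplifier", "voltage"], "electronics"),
    (["galaxy", "telescope", "orbit"], "astronomy"),
    (["orbit", "planet", "telescope"], "astronomy"),
]
model = train(training)
print(predict(["transistor", "voltage"], model))  # electronics
```

Add-one smoothing is a design choice here: without it, a single clue word never seen in a category would rule that category out entirely.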
The simplest early form of automatic indexing, developed in the 1950s, was the KWIC (Keyword in Context) index, produced by machine permutation of the significant words in titles, abstracts or full text. The first major report on applying this indexing concept was presented at the International Conference on Scientific Information (ICSI), held in Washington, D.C., in November 1958. The paper itself was not the sensation; the actual demonstration of the method was the sensation of the conference.
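A minimal sketch of how a KWIC index can be generated: every significant word of each title becomes an entry, shown with its preceding context right-aligned so the keywords line up in one column, and the entries are sorted by keyword. The stop list, the `kwic` function and the fixed `width` of the left context are assumptions for illustration; historical KWIC programs differed in such details.

```python
# Hypothetical stop list of non-significant words; a real KWIC system would use its own.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def kwic(titles, width=30):
    """Build a Keyword-in-Context index from a list of titles.

    Each significant word yields one line: the preceding words right-aligned
    in a column of `width` characters, two spaces, then the keyword and the
    rest of the title.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue  # only significant words become index entries
            left = " ".join(words[:i])
            right = " ".join(words[i:])
            # Keep only the last `width` characters of the left context.
            entries.append((word.lower(), left[-width:].rjust(width) + "  " + right))
    # Sort alphabetically by keyword, as in a printed KWIC index (stable for ties).
    entries.sort(key=lambda entry: entry[0])
    return [line for _, line in entries]

for line in kwic(["Automatic Indexing of Documents", "Indexing for Information Retrieval"]):
    print(line)
```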
At the risk of getting ahead of ourselves, and in view of the obvious information explosion that our scientific and intelligence communities surely face, let us point out what successful automatic indexing could mean.
First, we seem to be rapidly approaching the time when, along with the printed page, there will be an associated tape of corresponding information ready for direct input to a computing machine.
This means that as each organization receives its daily incoming documents, a machine could read them and route them directly to the proper users. The users could describe their information needs in terms of "standing" requests, and on the basis of these a machine could determine how the incoming "take" should be disseminated. Since automatic dissemination is only a special aspect of a mechanized library system, it follows that automatic indexing would also allow incoming documents to be indexed and thus identified for subsequent retrieval.
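The routing scenario can be sketched as matching each indexed incoming document against the users' standing requests, an arrangement now usually called selective dissemination of information. The term-overlap rule, the `route` function and its `min_overlap` threshold are assumptions chosen for illustration.

```python
def route(documents, standing_requests, min_overlap=1):
    """Route each incoming document to the users whose standing request it matches.

    documents: dict of document id -> set of index terms assigned to it.
    standing_requests: dict of user -> set of terms describing their interests.
    A document goes to a user if they share at least `min_overlap` terms.
    """
    deliveries = {user: [] for user in standing_requests}
    for doc_id, terms in documents.items():
        for user, profile in standing_requests.items():
            if len(terms & profile) >= min_overlap:
                deliveries[user].append(doc_id)
    return deliveries

# Hypothetical daily "take", already indexed, and two users' standing requests.
incoming = {
    "doc-1": {"transistor", "semiconductor"},
    "doc-2": {"orbit", "satellite"},
    "doc-3": {"satellite", "transistor"},
}
profiles = {
    "alice": {"transistor", "circuit"},
    "bob": {"satellite"},
}
print(route(incoming, profiles))
# {'alice': ['doc-1', 'doc-3'], 'bob': ['doc-2', 'doc-3']}
```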
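Basic Notions: This approach to the problem of automatic indexing is a statistical one. It is based on the rather straightforward notion that the individual words in a document function as clues, on the basis of which a prediction can be made about the subject category to which the document belongs. The fundamental thesis says, in effect, that statistics on the kind, frequency, location, order, etc., of selected words can be used to predict the subject matter of documents containing those words.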
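The kind of word statistics this thesis relies on can be sketched as follows. The example records two of the statistics named above, frequency and location (the token offset of a word's first occurrence), and uses "kind" only to skip non-content words; word order could be captured similarly by keeping every position. The `FUNCTION_WORDS` list and the `word_statistics` function are hypothetical.

```python
from collections import Counter

# Hypothetical list of "logical type" words, treated as non-content words.
FUNCTION_WORDS = {"a", "and", "if", "in", "is", "of", "the", "then", "to"}

def word_statistics(text):
    """Return {word: (frequency, first_position)} for each content word.

    first_position is the 0-based token offset of the word's first occurrence.
    """
    tokens = [t.strip('.,;:()"').lower() for t in text.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
    counts = Counter()
    first_position = {}
    for position, token in enumerate(tokens):
        if token in FUNCTION_WORDS:  # "kind": skip non-content words
            continue
        counts[token] += 1
        first_position.setdefault(token, position)
    return {word: (counts[word], first_position[word]) for word in counts}

stats = word_statistics("The transistor amplifies the signal. If the transistor fails, the signal is lost.")
print(stats["transistor"])  # (2, 1)
```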
Words and Predictions: In selecting clue words, how should we decide which words convey the most information, and how many different words should be used? Clearly, content-bearing words such as "electron" and "transistor" are better clues than logical-type words such as "if" and "then".
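One simple way to approach these questions, sketched here under stated assumptions rather than as the method of the original experiment, is to drop logical-type words and then keep words that occur neither too rarely (unreliable statistics) nor in almost every document (no discriminating power). The stop list and the thresholds `min_docs`, `max_doc_fraction` and `top_k` are hypothetical tuning parameters.

```python
from collections import Counter

# Hypothetical stop list of logical-type words (not taken from the original study).
LOGICAL_WORDS = {"a", "and", "if", "in", "is", "of", "the", "then", "to"}

def select_clue_words(corpus, min_docs=2, max_doc_fraction=0.8, top_k=10):
    """Pick candidate clue words from a corpus of tokenized documents.

    A word is kept if it is not a logical-type word, appears in at least
    `min_docs` documents, and in at most `max_doc_fraction` of all documents.
    """
    doc_freq = Counter()    # number of documents containing each word
    total_freq = Counter()  # total occurrences of each word in the corpus
    for tokens in corpus:
        words = [t.lower() for t in tokens if t.lower() not in LOGICAL_WORDS]
        total_freq.update(words)
        doc_freq.update(set(words))
    max_docs = max_doc_fraction * len(corpus)
    candidates = [w for w in total_freq if min_docs <= doc_freq[w] <= max_docs]
    # Rank by total frequency, breaking ties alphabetically.
    candidates.sort(key=lambda w: (-total_freq[w], w))
    return candidates[:top_k]

corpus = [
    ["the", "electron", "beam", "and", "the", "transistor", "data"],
    ["if", "the", "transistor", "fails", "then", "the", "circuit", "fails", "data"],
    ["electron", "orbit", "data"],
    ["transistor", "circuit", "data"],
    ["data", "report"],
]
print(select_clue_words(corpus))  # ['transistor', 'circuit', 'electron']
```

Here "data" is rejected because it occurs in every document, and "fails" because it occurs in only one, even though it appears twice there.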
The Empirical Test: First, a corpus of documents was selected and indexed using a set of subject categories created for the purposes of the experiment. The design, execution, results and evaluation of this test are examined in the following sections.
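An empirical test of this kind ultimately compares the categories the automatic procedure predicts with the categories assigned when the corpus was indexed. A minimal sketch of that comparison, assuming one category per document and simple accuracy as the measure (the original evaluation may have used other measures); the `evaluate` function and the sample data are hypothetical.

```python
def evaluate(predicted, reference):
    """Compare automatically predicted categories with the reference indexing.

    predicted, reference: dicts of document id -> category.
    Returns the fraction of shared documents whose predicted category matches.
    """
    common = predicted.keys() & reference.keys()
    if not common:
        raise ValueError("no documents to compare")
    correct = sum(1 for doc_id in common if predicted[doc_id] == reference[doc_id])
    return correct / len(common)

reference = {"d1": "electronics", "d2": "astronomy", "d3": "electronics", "d4": "physics"}
predicted = {"d1": "electronics", "d2": "astronomy", "d3": "astronomy", "d4": "physics"}
print(evaluate(predicted, reference))  # 0.75
```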
Automatic indexing is the process of analyzing an item to extract the information to be permanently kept in an index. Indexing techniques can be grouped into four classes: statistical, natural language, concept, and hypertext linkages.
Statistical strategies: Statistical strategies cover the broadest range of indexing techniques and are the most prevalent in commercial systems. In these strategies, the words and phrases of an item form the domain of searchable values.
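TF-IDF weighting is one widely used statistical strategy, shown here as an example rather than as the only technique this class covers: a term's weight in a document grows with its frequency there and shrinks the more documents contain it. A minimal sketch; the function names and documents are hypothetical.

```python
import math
from collections import Counter

def build_weighted_index(documents):
    """Build an inverted index mapping each term to {doc_id: tf-idf weight}.

    documents: dict of doc_id -> list of processing tokens (words/phrases).
    Weight = term frequency in the document * log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))
    index = {}
    for doc_id, tokens in documents.items():
        for term, tf in Counter(tokens).items():
            idf = math.log(n_docs / doc_freq[term])
            index.setdefault(term, {})[doc_id] = tf * idf
    return index

def search(index, query_terms):
    """Rank documents by the summed weights of the query terms they contain."""
    scores = Counter()
    for term in query_terms:
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return [doc_id for doc_id, _ in scores.most_common()]

docs = {
    "d1": ["automatic", "indexing", "indexing", "statistics"],
    "d2": ["manual", "indexing"],
    "d3": ["statistics", "retrieval"],
}
index = build_weighted_index(docs)
print(search(index, ["indexing", "statistics"]))
```

Document d1 ranks first because it contains both query terms and uses "indexing" twice.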
Natural Language: Natural language approaches perform the same processing-token identification as statistical techniques, but then additionally perform varying levels of natural language parsing of the item (e.g., to distinguish present, past and future actions).
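To illustrate recording present, past and future actions, here is a deliberately tiny sketch that uses a hypothetical hand-written verb lexicon and a single rule for "will"; it stands in for the much richer parsing a real natural language system performs. Unlike a purely statistical index, the resulting entries let a search distinguish a past acquisition from a planned one.

```python
# Hypothetical, hand-written lexicon mapping inflected verb forms to
# (base form, tense). A real system would use a morphological analyzer and parser.
VERB_LEXICON = {
    "acquired": ("acquire", "past"),
    "acquires": ("acquire", "present"),
    "acquire": ("acquire", "present"),
}

def index_actions(sentence):
    """Return (verb, tense) index entries for the known verbs in a sentence.

    A verb immediately preceded by "will" is recorded as a future action.
    """
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    entries = []
    for i, token in enumerate(tokens):
        if token in VERB_LEXICON:
            base, tense = VERB_LEXICON[token]
            if i > 0 and tokens[i - 1] == "will":
                tense = "future"
            entries.append((base, tense))
    return entries

print(index_actions("The company acquired a startup."))       # [('acquire', 'past')]
print(index_actions("The company will acquire a startup."))   # [('acquire', 'future')]
```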
Concept index: Concept indexing uses the words within an item to identify the concepts discussed in the item. It generalizes from the specific words to the concept values used to index the item.
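The generalization from words to concepts can be sketched with a word-to-concept mapping. The `CONCEPTS` table and the `concept_index` function are hypothetical; real concept indexing systems typically derive such relationships automatically (e.g., from word co-occurrence statistics) rather than from a fixed table.

```python
from collections import Counter

# Hypothetical word-to-concept mapping used only for illustration.
CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "engine": "machinery", "turbine": "machinery",
}

def concept_index(tokens):
    """Index an item by the concepts its words map to, not by the words themselves."""
    return Counter(CONCEPTS[t] for t in tokens if t in CONCEPTS)

print(concept_index(["the", "car", "and", "truck", "share", "an", "engine"]))
# Counter({'vehicle': 2, 'machinery': 1})
```

An item about a bus can then be retrieved by a query on the concept "vehicle" even if it shares no words with an item about cars.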
Hypertext linkages: Finally, a special class of indexing can be defined by the creation of hypertext linkages. Rather than defining a concept directly within an item, these linkages provide virtual threads of concepts between items.
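One way to picture these virtual threads is as a graph: items are nodes and hypertext links are edges, so a search that finds one item can also surface the items linked to it. A minimal breadth-first sketch, assuming links are stored as an adjacency dict; the `linked_items` function, its `max_hops` limit and the link data are hypothetical.

```python
from collections import deque

def linked_items(links, start, max_hops=2):
    """Follow hypertext links outward from `start` with breadth-first search.

    links: dict of item -> list of items it links to.
    Returns {item: hops} for every other item reachable within `max_hops` links.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        item = queue.popleft()
        if distance[item] == max_hops:
            continue  # do not follow threads beyond the hop limit
        for target in links.get(item, []):
            if target not in distance:
                distance[target] = distance[item] + 1
                queue.append(target)
    del distance[start]
    return distance

# Hypothetical link structure between four items.
links = {"A": ["B"], "B": ["C", "A"], "C": ["D"], "D": []}
print(linked_items(links, "A"))  # {'B': 1, 'C': 2}
```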
Conclusion:
Automatic indexing is the preprocessing stage that makes items searchable in an Information Retrieval System. Its role is critical to the success of searches in finding relevant items: if the concepts within an item are not located and represented in the index during this stage, the item will not be found during search. Some techniques allow data to be combined at search time to represent particular concepts (i.e., post-coordination).
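Post-coordination can be sketched with an inverted index that stores only single terms: the compound concept "information retrieval" is never indexed as such but is formed at search time by intersecting the postings of its terms with Boolean AND. The function names and document texts are hypothetical.

```python
from collections import defaultdict

def build_index(documents):
    """Inverted index: each single term maps to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def post_coordinate(index, *terms):
    """Combine single-term postings at search time with Boolean AND."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

docs = {
    "d1": "information retrieval systems",
    "d2": "information theory",
    "d3": "document retrieval",
}
index = build_index(docs)
print(post_coordinate(index, "information", "retrieval"))  # {'d1'}
```

This simple AND ignores word order and adjacency, so it can also match items in which the two words occur in unrelated places; that trade-off is inherent in combining terms only at search time.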
Thank you

Similar to Automatic indexing

Post 1What is text analytics How does it differ from text mini.docx (stilliegeorgiana)
Post 1What is text analytics How does it differ from text mini (anhcrowley)
Text databases and information retrieval (unyil96)
Great model a model for the automatic generation of semantic relations betwee... (ijcsity)
The Process of Information extraction through Natural Language Processing (Waqas Tariq)
Hendrik flash talk metadata creation 2010 05-19 (Trinity College Dublin)
Hci
Content analysis
Technical Whitepaper: A Knowledge Correlation Search Engine
A Review Of Text Mining Techniques And Applications
Text mining
G04124041046
Empowering Search Through 3RDi Semantic Enrichment
Information extraction using discourse
Classification of News and Research Articles Using Text Pattern Mining
Hypertext
Social Media and Text Analytics
IJRET-V1I1P5 - A User Friendly Mobile Search Engine for fast Accessing the Da...
Web_Mining_Overview_Nfaoui_El_Habib

