University of Gondar IE Course Extracts Key Information

UNIVERSITY
OF GONDAR
Faculty of Natural and Computational
Science
DEPARTMENT OF INFORMATION SCIENCE
COURSE TITLE: INFORMATION STORAGE AND
RETRIEVAL SYSTEMS
COURSE CODE: INFO(461)
ASSIGNMENT TITLE: INFORMATION EXTRACTION

Acronyms
Introduction

Definition of information extraction
Types of information extraction
Application of information extraction
Function of Information Extraction
The difference between IR and IE
Conclusion



IE - Information Extraction
IR - Information Retrieval
NE - Named Entity recognition
CO - Coreference resolution
TE - Template Element construction
TR - Template Relation construction
ST - Scenario Template production
PR - Public Relations

Information Extraction (IE) is a technology based on
analysing natural language in order to extract snippets of
information.
It is a process that takes texts as input and produces
fixed-format, unambiguous data as output. This data may be used
directly for display to users.
Without IE, the user would have to read the documents and extract
the requisite information themselves. They might then enter the
information in a spreadsheet and produce a chart for a report
or presentation.
IE systems are more difficult and knowledge-intensive to
build than IR systems, and are to varying degrees tied to
particular domains and scenarios.

Information Extraction (IE) is the automatic extraction of
structured information from unstructured and/or semi-structured
documents.

It is a system that analyses unrestricted text in order to extract
information about pre-specified types of events, entities or
relationships.

It is the automatic extraction of structured information from
unstructured documents.

It is a system that extracts clear, factual information from
unstructured documents. Roughly: who did what to whom, and when?

It is the task of automatically extracting structured
information from unstructured data and semi-structured documents.

- Unstructured data is data that does not have a data model. It
includes web pages, text documents, office documents,
presentations, emails, …
- It is also referred to as "dark matter".
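The idea of turning text into a fixed-format "who did what to whom, and when" record can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the `Event` record, the pattern and the example sentence are all hypothetical, and a real IE system would use linguistic analysis rather than a single regular expression.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Fixed-format output record (hypothetical field names)."""
    who: str
    did_what: str
    to_whom: str
    when: str


# Assumed pattern for one sentence shape only: "<Who> <verb> <Whom> in <year>".
PATTERN = re.compile(
    r"(?P<who>[A-Z]\w+(?: [A-Z]\w+)*) "
    r"(?P<did_what>acquired|hired|sued) "
    r"(?P<to_whom>[A-Z]\w+(?: [A-Z]\w+)*) "
    r"in (?P<when>\d{4})"
)


def extract_event(text: str) -> Optional[Event]:
    """Return a structured record if the text matches the pattern, else None."""
    match = PATTERN.search(text)
    return Event(**match.groupdict()) if match else None


record = extract_event("Acme Corp acquired Widget Works in 2015.")
print(record)
```

The input is free text and the output is a record with named slots, which is the contrast the definitions above draw between unstructured and structured data.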

Information Extraction is split into five types:
1. Named Entity recognition (NE) - The simplest and most
reliable IE technology. It is about identifying textual
information relating to people, organizations, places, brands,
products and so on. These are typically nouns and proper nouns.
2. Coreference resolution (CO) - It involves identifying
identity relations between entities in texts. These entities are
both those identified by NE recognition and anaphoric references
to those entities.
3. Template Element construction (TE) - The TE task builds on
NE recognition and coreference resolution, associating
descriptive information with the entities.
4. Template Relation construction (TR) - Finds relations
between TE entities. This helps IR systems to answer particular
information-seeking queries.
5. Scenario Template production (ST) - It fits TE and
TR results into specified event scenarios. Scenario
templates (STs) are the prototypical outputs of IE systems.




NE is about finding entities;
CO is about which entities and references (such as
pronouns) refer to the same thing;
TE is about what attributes entities have;
TR is about what relationships there are between entities;
ST is about the events that the entities participate in.
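To make the five tasks concrete, the sketch below shows what each one might contribute for a single invented two-sentence text. The results are filled in by hand, not produced by a real system, and the class names, slot names and example text are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    text: str                                        # NE: the entity found
    type: str                                        # NE: its category
    mentions: list = field(default_factory=list)     # CO: other references to it
    attributes: dict = field(default_factory=dict)   # TE: descriptive information


@dataclass
class Relation:                                      # TR: a link between entities
    kind: str
    source: str
    target: str


@dataclass
class Scenario:                                      # ST: an event template
    event: str
    slots: dict


text = "Abebe Kebede joined Acme Corp in 2015. He is its chief engineer."

# NE: entities in the text (hand-annotated here).
person = Entity("e1", "Abebe Kebede", "PERSON")
company = Entity("e2", "Acme Corp", "ORGANIZATION")

# CO: the pronouns refer back to the entities.
person.mentions.append("He")
company.mentions.append("its")

# TE: an attribute of the person.
person.attributes["title"] = "chief engineer"

# TR: a relation between the two entities.
employee_of = Relation("employee_of", source="e1", target="e2")

# ST: the entities and relation fitted into an event scenario.
hiring = Scenario("hiring", {"employee": "e1", "employer": "e2", "year": "2015"})


def render(scenario, entities):
    """Replace entity ids in the scenario slots with the entity text."""
    by_id = {e.id: e for e in entities}
    return {slot: by_id[v].text if v in by_id else v
            for slot, v in scenario.slots.items()}


filled = render(hiring, [person, company])
print(filled)
```

Each later task reuses the output of the earlier ones, which is why TE is described as building on NE and CO, and ST as fitting TE and TR results together.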

APPLICATION OF
INFORMATION EXTRACTION

1. Financial Analysts: IE can enable analysts
to answer questions such as "How many
instances predicting strong performance for a
particular company are out there?"
2. Marketing Strategists: IE can be used to create a range
of media metrics, for example the media distance, or extent
of collocation, between concepts and products/companies.
3. Public Relations Workers (PR): Public relations staff
need to identify negative reporting events as quickly
as possible in order to respond.
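As a rough illustration of a collocation-style metric, the sketch below counts the sentences in which a company name and a concept appear together. This is an assumed, simplified stand-in: the slides do not define how "media distance" is computed, and the function name, the naive sentence splitter and the sample documents are hypothetical.

```python
import re


def cooccurrence_count(documents, term_a, term_b):
    """Count sentences that mention both terms (case-insensitive)."""
    count = 0
    for doc in documents:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            lowered = sentence.lower()
            if term_a.lower() in lowered and term_b.lower() in lowered:
                count += 1
    return count


docs = [
    "Acme Corp reported strong performance this quarter. Rivals lagged behind.",
    "Analysts expect strong performance from Acme Corp. Acme Corp opened a new office.",
]

count = cooccurrence_count(docs, "Acme Corp", "strong performance")
print(count)
```

A count like this addresses the analyst's question of how many instances link a particular company to strong performance, although plain string matching cannot tell a prediction from a denial.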





Some of the functions of IE are:
- To retrieve and store structured data.
- To transform unstructured data into something that can be
reasoned with.
- To automatically extract structured information from
unstructured and/or semi-structured machine-readable
documents.

Information Extraction is not Information Retrieval.
Information Retrieval refers to the human-computer interaction
(HCI) that happens when we use a machine to search a body of
information for information objects (content) that match our
search query.
- It is used to reduce what has been called "information
overload".

Information Extraction is the automatic extraction of
structured information from unstructured documents.
- It refers to the machine's ability to automatically extract
structured information.

Generally, IR is there to find relevant documents, whereas
IE is there to extract relevant information from those
documents.
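The contrast can be sketched with two small functions run over the same toy collection: one returns whole documents that match a query (IR), the other returns structured records pulled out of the documents (IE). The documents, the keyword matching and the "appointed" pattern are all assumptions chosen for illustration, not a real retrieval or extraction engine.

```python
import re

DOCS = [
    "Acme Corp appointed Sara Tesfaye as CEO.",
    "The weather in Gondar was mild.",
    "Widget Works appointed Daniel Bekele as CFO.",
]


def retrieve(docs, query):
    """IR: return the documents that contain every query term."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.lower() for t in terms)]


# Assumed pattern for one sentence shape: "<Company> appointed <Person> as <ROLE>".
APPOINTMENT = re.compile(
    r"(?P<company>[A-Z]\w+ [A-Z]\w+) appointed "
    r"(?P<person>[A-Z]\w+ [A-Z]\w+) as (?P<role>[A-Z]+)"
)


def extract(docs):
    """IE: return one structured record per appointment found."""
    return [m.groupdict() for d in docs for m in APPOINTMENT.finditer(d)]


relevant_docs = retrieve(DOCS, "appointed")
records = extract(DOCS)
print(relevant_docs)
print(records)
```

With `retrieve` the user still has to read the returned documents, while `extract` yields rows that could go straight into a spreadsheet or database.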










Information extraction systems search large bodies of
unrestricted text for specific types of entities and relations, and
use them to populate well-organized databases.
These databases can then be used to find answers for specific
questions.
The typical architecture for an information extraction system
begins by segmenting, tokenizing, and part-of-speech tagging
the text.
The resulting data is then searched for specific types of entity.
Finally, the information extraction system looks at entities that
are mentioned near one another in the text, and tries to determine
whether specific relationships hold between those entities.
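The architecture described above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not a real IE system: part-of-speech tagging is replaced by a crude capitalization heuristic for spotting entities, a "relation" is simply the few words found between two neighbouring entities, and all function names and the sample text are hypothetical.

```python
import re


def segment(text):
    """Split raw text into sentences (naive split on ., ! or ?)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def find_entities(tokens):
    """Treat each run of capitalized tokens as one candidate entity.

    Returns (start, end, text) tuples. This stands in for POS tagging
    plus named entity recognition and is only a heuristic.
    """
    entities, i = [], 0
    while i < len(tokens):
        if tokens[i][0].isupper():
            j = i
            while j < len(tokens) and tokens[j][0].isupper():
                j += 1
            entities.append((i, j, " ".join(tokens[i:j])))
            i = j
        else:
            i += 1
    return entities


def find_relations(tokens, entities, max_gap=3):
    """Pair neighbouring entities separated by at most max_gap tokens."""
    relations = []
    for (_, end_a, a), (start_b, _, b) in zip(entities, entities[1:]):
        between = tokens[end_a:start_b]
        if 0 < len(between) <= max_gap:
            relations.append((a, " ".join(between), b))
    return relations


def run_pipeline(text):
    """Segment, tokenize, find entities, then find relations between them."""
    rows = []
    for sentence in segment(text):
        tokens = tokenize(sentence)
        rows.extend(find_relations(tokens, find_entities(tokens)))
    return rows


rows = run_pipeline("Acme Corp is based in Gondar. Sara Tesfaye works for Acme Corp.")
for row in rows:
    print(row)
```

The resulting (entity, relation, entity) rows are the kind of records that could populate a database and then be queried to answer specific questions.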
