2. Named Entity Recognition (NER) is an information
extraction task concerned with the recognition and
classification of named entities in free text.
Named entity classes include, for instance,
locations, person names, organization names,
dates and money amounts.
3. This application is better in several respects:
=> Provides an interactive UI
=> User friendliness
=> Easy to use, and therefore time saving
=> Has all deployable functionalities
4. The following diagram explains the interconnectivity of the
modules and their working.
Selection of Data Set -> Applying Algorithm -> Identify and Classify NEs -> Display Result
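The four stages in the diagram can be sketched as a small pipeline. This is only an illustration, not the application's actual code: the function names, the tiny gazetteer and the sample sentence are all assumptions made for the example, and exact string matching stands in for whichever algorithm is selected.

```python
# Hypothetical sketch of the four stages; names and data are illustrative only.

# Assumed toy gazetteer: entity class -> known surface strings.
GAZETTEER = {
    "LOCATION": ["Paris", "New Delhi"],
    "PERSON": ["Alan Turing"],
    "ORGANIZATION": ["United Nations"],
}


def select_data_set():
    """Stage 1: return the free text to analyse (hard-coded here)."""
    return "Alan Turing visited Paris before the United Nations existed."


def apply_algorithm(text, pattern):
    """Stage 2: exact string matching; return every start index of pattern."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def identify_and_classify(text):
    """Stage 3: label each gazetteer hit with its entity class."""
    entities = []
    for label, names in GAZETTEER.items():
        for name in names:
            for pos in apply_algorithm(text, name):
                entities.append((pos, name, label))
    return sorted(entities)


def display_result(entities):
    """Stage 4: print one line per recognised entity."""
    for pos, name, label in entities:
        print(f"{pos:>3}  {label:<12}  {name}")


if __name__ == "__main__":
    display_result(identify_and_classify(select_data_set()))
```

Each stage is a separate function, so the matching algorithm in stage 2 could be swapped without touching the other stages.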
5. The main functions the product must perform or must let the user
perform:
1: User Self Service
User self-service is a subset of the knowledge management
software category. It covers a range of software that specializes
in the way information, process rules and logic are collected
and accessed through support interviews. This software allows people
to obtain answers to their inquiries and/or needs through an
automated interview rather than traditional search approaches.
2: Work Flow
A workflow consists of an orchestrated and repeatable pattern of
business activity enabled by the systematic organization of resources
into processes that transform materials, provide services or process
information. It can be depicted as a sequence of operations, declared as the
work of a person or group, an organization of staff, or one or more
simple or complex mechanisms.
6. 3: Reporting and Diagrammatic Representation
With this approach to the articles in Communications, we better understand the
culture, identity and evolution of computing. To portray its value for
institutional-identity data mining, we present several findings that
emerged from our N-gram analysis.
4: Extensibility
Extensibility is a software design principle defined as a system's ability to have
new functionality added while its internal structure and data flow are minimally
affected or not affected at all. In particular, recompiling or changing the
original source code should be unnecessary when the system's behavior is changed,
whether by the creator or by other programmers.
5: Application Interface
An application interface specifies a component in terms of
its operations, their inputs and outputs, and underlying types. Its main purpose is
to define a set of functionalities that are independent of their respective
implementation, allowing both definition and implementation to vary without
compromising each other.
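A minimal sketch of the last two points, extensibility and the application interface, written in Python for illustration. The names Matcher, BuiltinMatcher, NaiveMatcher and count_occurrences are hypothetical and do not come from the application itself.

```python
from abc import ABC, abstractmethod
from typing import List


class Matcher(ABC):
    """Hypothetical interface: one operation with fixed input and output types."""

    @abstractmethod
    def find_all(self, text: str, pattern: str) -> List[int]:
        """Return every start index at which pattern occurs in text."""


class BuiltinMatcher(Matcher):
    """One implementation, delegating to Python's str.find."""

    def find_all(self, text: str, pattern: str) -> List[int]:
        hits, start = [], text.find(pattern)
        while start != -1:
            hits.append(start)
            start = text.find(pattern, start + 1)
        return hits


class NaiveMatcher(Matcher):
    """A second implementation added later; existing callers need no changes."""

    def find_all(self, text: str, pattern: str) -> List[int]:
        m = len(pattern)
        return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def count_occurrences(matcher: Matcher, text: str, pattern: str) -> int:
    """Caller code depends only on the Matcher interface."""
    return len(matcher.find_all(text, pattern))
```

count_occurrences works with either class. A new matcher can be added without editing it, which is the extensibility property described in item 4.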
8. A new named entity class extraction method based
on association rules has been presented and
compared with the maximum entropy method.
In the English corpus, under an appropriate
combination of rule types, it is possible to
improve recall so that the association rule
method is strictly more effective than
maximum entropy. This result makes the
method particularly suitable for tasks whose
requirements emphasize the quality rather than
the quantity of results.
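For reference, recall and its companion measure precision can be computed as below. This is a generic sketch of the standard definitions; the entity pairs are made-up data for illustration only and are not results from the paper.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (entity, class) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # pairs that are both predicted and correct
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example data (assumption): 4 gold entities, 3 predictions, 2 correct.
gold = {
    ("Paris", "LOCATION"),
    ("Alan Turing", "PERSON"),
    ("$5", "MONEY"),
    ("UN", "ORGANIZATION"),
}
predicted = {
    ("Paris", "LOCATION"),
    ("Alan Turing", "PERSON"),
    ("UN", "LOCATION"),  # wrong class, so it counts as an error
}
print(precision_recall(predicted, gold))
```

Here precision is 2/3 (two of the three predictions are correct) and recall is 2/4 (two of the four gold entities are found).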
9. String matching means finding one or, more
generally, all occurrences of a search string
in a given text. This paper introduced a fast
string match algorithm to detect the exact
and similar occurrences of a given pattern within
an input string. In this paper, the sum of the character
values of the string to be scanned for is
compared with the sum of the corresponding values
in the sliding window. From the experimental results,
it is concluded that the novel algorithm is more
efficient than BM (Boyer-Moore) in many cases, and that the
longer the pattern, the greater the performance improvement.
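The idea of comparing character-value sums can be sketched as follows. This is not the paper's algorithm, whose details are not given here: it assumes ord() as the character value, uses the sum only as a filter, and confirms each candidate with a direct comparison. The function name is hypothetical.

```python
def sum_filter_search(text, pattern):
    """Return all start indices of pattern in text, using a character-sum filter."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    target = sum(ord(c) for c in pattern)   # sum of the pattern's character values
    window = sum(ord(c) for c in text[:m])  # sum for the first window
    hits = []
    for i in range(n - m + 1):
        # Equal sums are necessary but not sufficient, so verify on a sum match.
        if window == target and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # Slide the window: drop the leftmost value, add the next one.
            window += ord(text[i + m]) - ord(text[i])
    return hits
```

Updating the sum costs constant time per shift, so most windows are rejected without comparing any characters. Windows such as "abc" and "bca" share a sum with "cab", which is why the verification step is needed.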
10. Exact String Match Algorithm
An exact string match algorithm, also called a string
search algorithm, finds the places where one or several
patterns (strings) occur within a larger string or text;
that is, string matching looks for one or more
occurrences of a pattern in a text. The
strings considered are sequences of symbols, and the symbols are
drawn from an alphabet. The size and other features of the
alphabet are important factors in the design of an algorithm.
11. Working of Algorithm
The text is scanned with the help of a window whose size is
equal to m, the length of the pattern.
First, the left end of the window is aligned with the left end of
the text, and the characters of the window are compared with the
characters of the pattern; this comparison is generally called an attempt.
After a whole match or a mismatch of the pattern, the
window is shifted to the right.
The whole procedure is repeated until the right end of the
window goes beyond the right end of the text.
This is the sliding window mechanism, where each attempt is
associated with the position j in the text at which the
window is positioned on y[j…j+m-1].
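The sliding window mechanism can be traced with a short Python sketch. The text is called y as on the slide; the pattern name x and the shift of exactly one position per attempt are assumptions, since the slide does not say how far the window moves.

```python
def sliding_window_attempts(y, x):
    """Yield (j, window, matched) for each attempt of pattern x on text y."""
    n, m = len(y), len(x)
    j = 0                             # left end of the window, aligned with the text start
    while j + m <= n:                 # stop once the window would pass the end of the text
        window = y[j:j + m]           # the slice y[j .. j+m-1]
        yield j, window, window == x  # one attempt
        j += 1                        # shift the window to the right (by one, as assumed)


for j, window, matched in sliding_window_attempts("abcab", "ab"):
    print(j, window, matched)
```

For the text "abcab" and the pattern "ab" there are four attempts, at j = 0, 1, 2 and 3, and the pattern matches at j = 0 and j = 3.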
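12. Pseudo Code
for i := 0 to n-m {                  // each alignment of P against T
    for j := 0 to m-1 {
        if P[j] <> T[i+j] then break // mismatch: stop this attempt
    }
    if j = m then return i           // loop ran to completion: match at i
}
This pseudo code shifts the pattern along the text one position at a time and
compares the corresponding characters at each position.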
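A runnable Python rendering of this brute-force search, keeping the names T, P, n and m from the pseudo code. The outer loop is bounded at n - m so that T[i+j] always stays inside the text. Returning -1 when there is no match is an assumption, as the pseudo code does not say what happens in that case.

```python
def naive_search(T, P):
    """Return the first index i at which P occurs in T, or -1 if there is none."""
    n, m = len(T), len(P)
    for i in range(n - m + 1):  # the last valid alignment starts at n - m
        j = 0
        while j < m and P[j] == T[i + j]:
            j += 1              # characters agree, move to the next one
        if j == m:              # all m characters matched
            return i
    return -1                   # assumed "not found" value


print(naive_search("hello world", "world"))
```

In the worst case this performs on the order of n*m character comparisons, which is the cost the faster algorithms aim to reduce.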
14. Using Visual Studio, SQL Server and .NET, organizations can give users the
ability to find useful and interesting results from the last day's articles.
.NET will be used to create the front end and the application
interface through which the user accesses the various
functionalities. This gives a good graphical layout and a
more user-friendly web page. We will create separate .NET pages
for the modular functions. SQL Server
will be used as the core back end, and the database is stored as a
file on the system. Visual Studio will be used as the tool to
compile Java programs. The algorithms and the modifications to the
pre-written VS toolkit code will be done in .NET.
The application will ask the user to select a feature,
and the methods and algorithms will then generate
results for the user.
20. After successful execution of the project, I found that
it can be used to classify entities in free text,
making the user's work easier.
It was also observed that the tool does
not work properly in the case of redundant data, i.e.
when we tried to classify the money entity
and wished to match the string ‘money’, the
tool was unable to display the correct output.
21. This report has looked in detail at the major
techniques used for string matching in any given text.
Section I gave an overview of named entity
recognition and a basic introduction
to the document. Section II described in detail the
various string matching algorithms that are
essential to this project.
Section III gave an overview of the functional
requirements and diagrams, making it easy for the
reader to understand the working of this project.
Section IV focused on test planning and
implementation tools. Thus an NER tool using
N-grams is ready.