Comparison of the performance of Dependency Parsing Algorithms. Department of Computer Science, Indian Institute of Information Technology, Design & Manufacturing
Dependency Parsing Algorithms Analysis - Major Project
1. Under the guidance of
Dr. Manish Shrivastava
By:
Bhuvnesh Pratap Singh (2006017)
Surya Prakash Rai (2006064)
2. Contents
What is Dependency Parsing?
Problem Definition
Motivation
Objective
Conceptual Tour
Exploring Algorithms
Methodology
Results
Scope for further work
Q & A
3. What is Parsing?
Parsing is the process of deducing the syntactic structure of a
string. It is a prerequisite for many natural language processing
tasks and is used in applications such as Information Extraction
and Machine Translation.
4. Dependency Parsing?
Dependency parsing is a way of parsing in which a sentence is
parsed by relating each word to the other words in the sentence
that depend on it.
5. In this terminology, a dependency relation holds between a Head and a
Dependent.
Alternative terms in the literature are governor and regent for head,
and modifier for dependent.
6. Problem Definition
To date, four major dependency parsing algorithms have been
proposed: Covington projective, Covington non-projective, Nivre
arc-eager and Nivre arc-standard. The problem taken up here is
the comparison of these dependency parsing algorithms in
terms of accuracy and time complexity.
7. Motivation
In some machine translation and natural language processing systems,
human languages are parsed by computer programs. Human
sentences are not easily parsed by programs, as there is substantial
ambiguity in the structure of human language.
It is difficult to prepare formal rules to describe informal behaviour, even
though it is clear that some rules are being followed.
8. Moreover, the task of dependency parsing becomes highly
important in the light of the free word order languages
around us, for example our own native language
Hindi.
9. Objective
First, understand the present major dependency parsing algorithms, then
implement them on a common platform so that an exhaustive
comparison can be drawn between these algorithms on the basis of
the time taken during the learning and testing phases, and the
accuracy shown on the testing/validation data. The data used
for this purpose is English data in the CoNLL format.
11. Dependency Grammar
The tradition of dependency grammar is based on the assumption
that syntactic structure consists of lexical elements linked by binary
asymmetrical relations called Dependencies.
13. The Notion of Dependency
The fundamental notion of dependency is based on the idea that the
syntactic structure of a sentence consists of binary asymmetrical
relations between the words of the sentence.
14. Tesnière (1959) said:
The sentence is an organized whole, the constituent elements of
which are words. Every word that belongs to a sentence ceases by
itself to be isolated as in a dictionary. Between a word and its
neighbours, the mind perceives connections, the totality of which
forms the structure of the sentence. The structural connections
establish dependency relations between the words. Each
connection in principle unites a superior term and an inferior term.
15. Criteria for identifying a syntactic relation
between a head H and a dependent D in a
construction C
H determines the syntactic category of C and can often replace C
H determines the semantic category of C; D gives semantic
representation
H is obligatory; D may be optional
H selects D and determines whether D is obligatory or optional
The form of D depends on H
17. Endocentric constructions
In an endocentric construction the Head can replace the whole
without disrupting the syntactic structure.
“Economic news had little effect on [financial] markets”
18. Exocentric Constructions
In exocentric constructions it is not possible for the Head to replace
the whole without disrupting the syntactic structure.
“Economic news had little [effect] on financial markets”
19. Types of Dependency Parsing
Grammar – driven dependency parsing
Data – driven dependency parsing
20. Data driven dependency parsing
The methodology is based on three essential components:
1. Deterministic parsing algorithms for building dependency graphs
2. History-based feature models for predicting the next parser action
3. Discriminative machine learning to map histories to parser actions
21. Architecture for Data Driven Dependency
Parsing
The architecture consists of three main components:
Parser
Guide
Learner
25. Assumptions for Parsing
Unity
-Single tree with unique root
Uniqueness
-each word has only one head
One word at a time
Single left to right pass
Projectivity
-No crossing branches
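The uniqueness, unity and projectivity assumptions above can be checked mechanically. The sketch below is not from the slides; it assumes a parse is encoded as a head array, where heads[i-1] is the head position of word i and 0 denotes the artificial root.

```python
def is_projective(heads):
    """Projectivity: no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if l1 < l2 < r1 < r2:
                return False
    return True

def has_single_root(heads):
    """Unity: exactly one word hangs off the artificial root (0)."""
    return sum(1 for h in heads if h == 0) == 1

# "Economic news had little effect", with 'had' as the root:
heads = [2, 3, 0, 5, 3]
print(is_projective(heads), has_single_root(heads))  # True True
```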
27. Description of Algorithms Used
Parsing Algorithm
-Covington (projective, non-projective)
-Nivre (arc-eager, arc-standard)
Learning Algorithm
-SVM (LIBSVM)
28. Covington Algorithm
There are basically two parsing strategies:
1. Brute-force search:
Examine each pair of words in the entire sentence, linking them as
head-to-dependent or dependent-to-head if the grammar permits.
With n words there are n(n−1) pairs, and if backtracking is allowed
the complexity increases further.
2. Exhaustive left-to-right search:
Accept words one by one starting at the beginning of the sentence, and
try linking each word as head or dependent of every previous word.
29. Non-Projective Covington Algorithm
ESH Algorithm:
Given an n-word sentence:
for i := 1 to n do
begin
    for j := i − 1 down to 1 do
    begin
        if the grammar permits,
            link word j as head of word i;
        if the grammar permits,
            link word j as dependent of word i
    end
end
30. ESD Algorithm:
Given an n-word sentence:
for i := 1 to n do
begin
    for j := i − 1 down to 1 do
    begin
        if the grammar permits,
            link word j as dependent of word i;
        if the grammar permits,
            link word j as head of word i
    end
end
31. Inefficient Algorithms
Violation of unity, uniqueness and projectivity
Use a specific principle to enforce uniqueness
This gives three variations
32. Algorithm ESHU
Given an n-word sentence:
for i := 1 to n do
begin
    for j := i − 1 down to 1 do
    begin
        if no word has been linked as head of word i, then
            if the grammar permits,
                link word j as head of word i;
        if word j is not a dependent of some other word, then
            if the grammar permits,
                link word j as dependent of word i
    end
end
33. Algorithm ESDU
Given an n-word sentence:
for i := 1 to n do
begin
    for j := i − 1 down to 1 do
    begin
        if word j is not a dependent of some other word, then
            if the grammar permits,
                link word j as dependent of word i;
        if no word has been linked as head of word i, then
            if the grammar permits,
                link word j as head of word i
    end
end
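The ESHU pseudocode above can be rendered in Python. This is a minimal sketch under the assumption that the grammar is abstracted as a callable can_link(head, dep) over word positions; the slides leave the grammar check unspecified.

```python
def eshu(n, can_link):
    """Exhaustive left-to-right search with the uniqueness check:
    a word may receive at most one head."""
    head = {}                  # dependent position -> head position
    arcs = []
    for i in range(n):
        for j in range(i - 1, -1, -1):
            # Link j as head of i only if i has no head yet.
            if i not in head and can_link(j, i):
                head[i] = j
                arcs.append((j, i))
            # Link j as dependent of i only if j is not yet a dependent.
            if j not in head and can_link(i, j):
                head[j] = i
                arcs.append((i, j))
    return arcs

# Toy grammar for "the dog barks": 'dog' heads 'the', 'barks' heads 'dog'.
grammar = {(1, 0), (2, 1)}
print(eshu(3, lambda h, d: (h, d) in grammar))  # [(1, 0), (2, 1)]
```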
34. Algorithm LSU
Headlist := []    /* words that have no head yet */
Wordlist := []    /* all words encountered so far */
while (!end-of-sentence)
    W := next input word;
    for each D in Headlist
        if HEAD?(W, D)
            LINK(W, D);
            delete D from Headlist;
    end
    for each H in Wordlist
        if HEAD?(H, W)
            LINK(H, W);
35. Contd…
            terminate this for-each loop;
    end
    if no head for W was found then
        Headlist := W + Headlist;
    end
    Wordlist := W + Wordlist;
end
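For comparison, a hedged Python sketch of the LSU pseudocode above, with the same assumed can_link(head, dep) grammar callable and word positions standing in for words:

```python
def lsu(n, can_link):
    headlist = []          # words with no head yet, most recent first
    wordlist = []          # all words seen so far, most recent first
    arcs = []
    for w in range(n):
        # Attach headless earlier words as dependents of w.
        for d in list(headlist):
            if can_link(w, d):
                arcs.append((w, d))
                headlist.remove(d)
        # Look for a head of w among all earlier words.
        for h in wordlist:
            if can_link(h, w):
                arcs.append((h, w))
                break
        else:
            headlist.insert(0, w)      # no head found for w
        wordlist.insert(0, w)
    return arcs

grammar = {(1, 0), (2, 1)}             # toy grammar for "the dog barks"
print(lsu(3, lambda h, d: (h, d) in grammar))  # [(1, 0), (2, 1)]
```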
36. Projective Covington Algorithm
/*we have two list head and word*/
Headlist := []; (Words that do not yet have heads)
Wordlist := []; (All words encountered so far)
repeat
(Accept a word and add it to Wordlist)
W := the next word to be parsed;
Wordlist := W + Wordlist;
(Look for dependents of W; they can only be
consecutive elements of Headlist
starting with the most recently added)
37. Contd…
for D := each element of Headlist,
starting with the first
begin
if D can depend on W then
begin
link D as dependent of W;
delete D from Headlist
end
else
terminate this for loop
end;
(Look for the head of W; the search starts
with the word immediately preceding W)
38. Contd…
H := the word immediately preceding W
in the input string;
loop
if W can depend on H then
begin
link W as dependent of H;
terminate the loop
end;
if H is independent then terminate the loop;
H := the head of H
end loop;
if no head for W was found then
Headlist := W + Headlist;
until all words have been parsed.
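Under the same assumptions (grammar abstracted as a can_link(head, dep) callable, words as positions), the projective algorithm above can be sketched as:

```python
def covington_projective(n, can_link):
    headlist = []              # headless words, most recently added first
    head = {}                  # dependent -> head
    for w in range(n):
        # Dependents of w can only be a consecutive prefix of headlist,
        # starting with the most recently added word.
        while headlist and can_link(w, headlist[0]):
            head[headlist.pop(0)] = w
        # Search for the head of w by climbing the chain of heads,
        # starting from the word immediately preceding w.
        h = w - 1
        while h >= 0:
            if can_link(h, w):
                head[w] = h
                break
            if h not in head:          # h is independent: stop climbing
                break
            h = head[h]
        if w not in head:
            headlist.insert(0, w)
    return head

grammar = {(1, 0), (2, 1)}             # toy grammar for "the dog barks"
print(covington_projective(3, lambda h, d: (h, d) in grammar))
```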
39. Nivre’s Algorithm
Configuration: C = 〈S, I, A〉
S = Stack
I = Input (remaining)
A = Arc relation (current)
Initialization:
〈nil, W, ∅〉
Termination:
〈S, nil, A〉 for any S, A
Acceptance:
〈S, nil, A〉 if (W, A) is connected
40. Transitions
• Left-Arc (LA):
〈wi|S, wj|I, A〉 → 〈S, wj|I, A ∪ {(wj, wi)}〉
if ¬∃a : a ∈ A ∧ dep(a) = wi
• Right-Arc (RA):
〈wi|S, wj|I, A〉 → 〈wj|wi|S, I, A ∪ {(wi, wj)}〉
if ¬∃a : a ∈ A ∧ dep(a) = wj
• Reduce (RE):
〈wi|S, I, A〉 → 〈S, I, A〉
if ∃a : a ∈ A ∧ dep(a) = wi
• Shift (SH):
〈S, wi|I, A〉 → 〈wi|S, I, A〉
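The four transitions above can be sketched directly in Python on an explicit configuration (stack, input buffer, arc set). The oracle that chooses the next transition is abstracted away; in this project that role is played by the SVM-based guide. Word positions stand in for words.

```python
def left_arc(stack, buf, arcs):
    wi = stack.pop()
    assert all(d != wi for _, d in arcs)   # precondition: wi has no head
    arcs.add((buf[0], wi))                 # add arc (wj, wi)

def right_arc(stack, buf, arcs):
    wj = buf.pop(0)
    arcs.add((stack[-1], wj))              # add arc (wi, wj)
    stack.append(wj)                       # wj is pushed onto the stack

def reduce_(stack, buf, arcs):
    assert any(d == stack[-1] for _, d in arcs)  # precondition: has a head
    stack.pop()

def shift(stack, buf, arcs):
    stack.append(buf.pop(0))

# Parsing "the dog barks" (positions 0 1 2) with SH, LA, SH, LA, SH:
stack, buf, arcs = [], [0, 1, 2], set()
for t in (shift, left_arc, shift, left_arc, shift):
    t(stack, buf, arcs)
print(sorted(arcs))  # [(1, 0), (2, 1)]
```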
59. Methodology
We as a group first studied the art of dependency parsing, and the
next task was to explore the various algorithms on which we have
made our time and accuracy analysis. After going through these
steps we created LIBSVM learning models (LIBSVM is a
machine learning package for support vector machines with different
kernels) for each of these algorithms, with learning data from a
Treebank given as input for the learning phase on the MaltParser
platform.
61. The Input data Treebank
Learning data: WSJ_Train.conll
Testing data: WSJ_Test.conll
No. of tokens in WSJ_Train.conll: 81651
No. of tokens in WSJ_Test.conll: 16320
66. Accuracy trade off
We have described the accuracy trade-off in terms of Precision,
where
Precision = (No. of correct parses) / (No. of gold-standard parses)
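As a small illustration, one possible per-token reading of the formula above (attachment accuracy against gold-standard heads; the head-array encoding is an assumption, not from the slides):

```python
def precision(predicted_heads, gold_heads):
    """Fraction of tokens whose predicted head matches the gold head."""
    correct = sum(p == g for p, g in zip(predicted_heads, gold_heads))
    return correct / len(gold_heads)

# Four of five predicted heads match the gold standard:
print(precision([2, 0, 2, 5, 3], [2, 0, 2, 3, 3]))  # 0.8
```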
71. Scope for further work
A comparison between these algorithms on Hindi-language data is quite
captivating, but it depends on the availability of quality
manually annotated Treebank data. If a comprehensive Hindi
Treebank becomes available, the same comparison can be carried out
on Hindi, which is something to look into.