8. Definition of Parsing
A parser is the component of a compiler or interpreter that breaks its input
into smaller elements so that it can be translated into another language.
A parser takes its input as a sequence of tokens (or program instructions)
and usually builds a data structure from it: a parse tree or an abstract
syntax tree.
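As a small illustration of "text in, syntax tree out", the sketch below uses Python's standard `ast` module, which exposes Python's own parser. It is only one example of a parser that builds an abstract syntax tree; the expression string is an arbitrary choice made for this example.

```python
import ast

# Parse a single expression; mode="eval" returns an ast.Expression wrapper.
source = "1 + 2 * 3"
tree = ast.parse(source, mode="eval")

# The root of the expression is the addition; the multiplication sits
# below it as the right operand, because '*' binds tighter than '+'.
root = tree.body
print(type(root).__name__, type(root.op).__name__)              # BinOp Add
print(type(root.right).__name__, type(root.right.op).__name__)  # BinOp Mult
```

The shape of the tree, not the order of the characters, is what later compiler phases work with.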
9. Role of Parsers
• Performs context-free syntax analysis
• Guides context-sensitive analysis
• Constructs an intermediate representation
• Produces meaningful error messages
• Attempts error correction
10. Parsing
• POS tags give information about the individual words and their
internal form (e.g. singular vs. plural, tense of the verb)
• An additional level of information concerns the way the words relate to
each other
  • the overall structure of each sentence
  • the relationships between the words
• This can be achieved by parsing the corpus
11. Parsing Techniques
• Parsing adds information about sentence structure and constituents
• Allows us to see what constructions words enter into
  • e.g. transitivity, passivization, argument structure for verbs
• Allows us to see how words function relative to each other
  • e.g. what words can modify / be modified by other words
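To make the point about constructions concrete, here is a minimal sketch of reading verb frames off a parsed sentence. The tree encoding (nested tuples of the form `(label, child, ...)`, with words as plain strings), the category labels, and the function name `verb_frames` are all assumptions made for this example, not a standard treebank format.

```python
def verb_frames(tree):
    """Collect (verb, sister-categories) pairs from every VP in the tree."""
    frames = []

    def walk(node):
        if isinstance(node, str):          # a word: nothing below it
            return
        label, *children = node
        if label == "VP":
            phrases = [c for c in children if not isinstance(c, str)]
            verbs = [c[1] for c in phrases if c[0] == "V"]
            sisters = tuple(c[0] for c in phrases if c[0] != "V")
            for verb in verbs:
                frames.append((verb, sisters))
        for child in children:
            walk(child)

    walk(tree)
    return frames


# Hypothetical parsed sentence: "she sees the dog"
parsed = ("S",
          ("NP", "she"),
          ("VP", ("V", "sees"),
                 ("NP", ("Det", "the"), ("N", "dog"))))

print(verb_frames(parsed))  # [('sees', ('NP',))]
```

A verb that appears with an NP sister is being used transitively here; collecting such frames over a whole parsed corpus is one way to study argument structure.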
12. Parsing Issues
• Besides lexical ambiguities (usually resolved by the tagger), language can
be structurally ambiguous
• Global ambiguities, due to ambiguous words and/or alternative possible
combinations
• Local ambiguities, especially attachment ambiguities and other
combinatorial possibilities
• The sheer number of alternatives available when the parser has little
(or no) knowledge to choose between them
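The attachment problem can be made concrete by counting analyses. The sketch below is a CKY-style chart that counts how many parse trees a toy grammar assigns to a sentence. The grammar, the lexicon, and the function name `count_parses` are invented for this illustration; the grammar is written in Chomsky normal form so that every rule combines exactly two neighbouring spans.

```python
from collections import defaultdict

# Toy grammar (an assumption for this example): (left, right) -> parents
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, root="S"):
    """Return the number of distinct trees rooted in `root` over `words`."""
    n = len(words)
    # chart[(i, j)][label] = number of trees with that label over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1)][label] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):               # split point
                for left, lcount in list(chart[(i, k)].items()):
                    for right, rcount in list(chart[(k, j)].items()):
                        for parent in BINARY.get((left, right), []):
                            chart[(i, j)][parent] += lcount * rcount
    return chart[(0, n)][root]


print(count_parses("I saw the man with the telescope".split()))  # 2
```

The two analyses correspond to the telescope modifying the seeing (VP attachment) or the man (NP attachment). Each additional prepositional phrase multiplies the number of readings, which is why the number of alternatives grows so quickly.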
13. Parsing Strategies
• Start with a basic grammar, possibly written by hand, with all rules equally
probable
• Parse a small amount of text, then correct the output manually
  • this may involve correcting the trees and/or changing the grammar
• Learn new rule probabilities from this small treebank
• Parse another (similar) amount of text, then correct it manually
• Adjust the probabilities based on the old and new trees combined
• Repeat until the grammar stabilizes
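The "learn new probabilities" step in the loop above can be sketched as relative-frequency estimation: the probability of a rule is its count divided by the count of all rules with the same left-hand side. This is one common way to estimate rule probabilities and is assumed here rather than taken from the slides. The tree encoding (nested tuples, words as strings), the two tiny treebanks, and the names `rules` and `estimate` are hypothetical.

```python
from collections import Counter


def rules(tree):
    """Yield one (lhs, rhs) rule for every internal node of a tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def estimate(treebank):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# First hand-corrected batch, then a second one; re-estimate on both combined.
old_trees = [("S", ("NP", "she"), ("VP", ("V", "sleeps")))]
new_trees = [("S", ("NP", "she"), ("VP", ("V", "sees"), ("NP", "him")))]

probs = estimate(old_trees + new_trees)
print(probs[("VP", ("V",))])        # 0.5
print(probs[("VP", ("V", "NP"))])   # 0.5
```

Re-running `estimate` on the growing treebank after each correction round is what "adjust the probabilities based on the old and new trees combined" amounts to in this sketch.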
16. Types of Parsing
Top-down parsers (LL(1), recursive descent)
• Start at the root of the parse tree and grow toward the leaves
• Pick a production and try to match the input
• Bad "pick" ⇒ may need to backtrack
• Some grammars are backtrack-free
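A minimal sketch of a recursive-descent parser for a backtrack-free grammar follows. The arithmetic grammar, the tuple-based tree format, and all function and class names are assumptions chosen for this example. Each nonterminal becomes one method, and one token of lookahead is enough to pick the production, so the parser never has to undo a choice.

```python
import re

# Grammar (assumed for this example):
#   Expr   -> Term   (('+' | '-') Term)*
#   Term   -> Factor (('*' | '/') Factor)*
#   Factor -> NUMBER | '(' Expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def expr(self):                       # start at the root ...
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.take(), node, self.factor())
        return node

    def factor(self):                     # ... and grow toward the leaves
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        token = self.take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

Left recursion is avoided by writing the repetition as a loop, which is the usual adjustment needed before a grammar can be parsed this way.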
Bottom-up parsers (LR(1), operator precedence)
• Start at the leaves and grow toward the root
• As input is consumed, encode possibilities in an internal state
• Start in a state valid for legal first tokens
• Bottom-up parsers handle a large class of grammars
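For contrast, here is a small bottom-up sketch in the operator-precedence style. It is not a table-driven LR(1) parser; it only illustrates the shift/reduce idea: leaves are pushed first, and subtrees are assembled as soon as the precedence of the next operator allows. The precedence table, the tuple tree format, and the function name `parse_bottom_up` are assumptions for this example, and error checking is kept to a minimum.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse_bottom_up(tokens):
    """Build a tree from a token list using two stacks (shift and reduce)."""
    operands, operators = [], []   # the stacks are the parser's internal state

    def reduce_top():
        # Combine the two most recent subtrees under the top operator.
        if len(operands) < 2:
            raise SyntaxError("missing operand")
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for token in tokens:
        if token.isdigit():
            operands.append(int(token))            # shift a leaf
        elif token in PRECEDENCE:
            # Reduce while the operator on the stack binds at least as
            # tightly as the incoming one (gives left associativity).
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                reduce_top()
            operators.append(token)                # shift the operator
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators and operators[-1] != "(":
                reduce_top()
            if not operators:
                raise SyntaxError("unbalanced ')'")
            operators.pop()                        # discard the '('
        else:
            raise SyntaxError(f"unexpected token {token!r}")

    while operators:
        if operators[-1] == "(":
            raise SyntaxError("unbalanced '('")
        reduce_top()
    if len(operands) != 1:
        raise SyntaxError("malformed expression")
    return operands[0]


print(parse_bottom_up("1 + 2 * 3".split()))   # ('+', 1, ('*', 2, 3))
print(parse_bottom_up("8 - 3 - 2".split()))   # ('-', ('-', 8, 3), 2)
```

The root of the tree is the last node to be built, the opposite order from a top-down parser, which commits to the root production first.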