SANSKRIT LANGUAGE PARSER
Akash Bhargava - 10UCS002
Ashok Kumar - 10UCS010
Laxmi Kant Yadav - 10UCS027
Vijay Kumar Gupta - 10UCS057
COMPUTER SCIENCE & ENGINEERING DEPARTMENT
NATIONAL INSTITUTE OF TECHNOLOGY, AGARTALA
INDIA-799055
MAY, 2014
SANSKRIT LANGUAGE PARSER
Dissertation submitted to
National Institute of Technology, Agartala
for the award of the degree
of
Bachelor of Technology
by
Akash Bhargava - 10UCS002
Ashok Kumar - 10UCS010
Laxmi Kant Yadav - 10UCS027
Vijay Kumar Gupta - 10UCS057
Under the Guidance of
Mr. Nikhil Debbarma
Assistant Professor, CSE Department, NIT Agartala, India
COMPUTER SCIENCE & ENGINEERING DEPARTMENT
NATIONAL INSTITUTE OF TECHNOLOGY AGARTALA
MAY, 2014
DISSERTATION APPROVAL SHEET
This dissertation entitled “Language Parser”, by Akash Bhargava, Enrolment Number 10UCS002;
Ashok Kumar, Enrollment Number 10UCS010; Laxmi Kant Yadav, Enrollment Number 10UCS027;
Vijay Kumar Gupta, Enrollment Number 10UCS057 is approved for the award of Bachelor of
Technology in Computer Science & Engineering.
Nikhil Debbarma
Dissertation Supervisor
Assistant Professor
Computer Science & Engineering Department
NIT, Agartala
Paritosh Bhattacharya
Head Of Department
Professor
Computer Science & Engineering Department
NIT, Agartala
Date:19.05.2014
Place:NIT, Agartala
DECLARATION
We declare that the work presented in this dissertation titled “Language Parser”,
submitted to the Computer Science & Engineering Department, National Institute
of Technology, Agartala, for the award of the Bachelor of Technology degree
in Computer Science & Engineering, represents our ideas in our own words and
where others’ ideas or words have been included, we have adequately cited and
referenced the original sources. We also declare that we have adhered to all
principles of academic honesty and integrity and have not misrepresented, fabricated
or falsified any idea/data/fact/source in our submission. We understand that any
violation of the above will be cause for disciplinary action by the Institute and can
also evoke penal action from the sources which have thus not been properly cited
or from whom proper permission has not been taken when needed.
MAY, 2014
Agartala
Akash Bhargava
10UCS002
Ashok Kumar
10UCS010
Laxmi Kant Yadav
10UCS027
Vijay Kumar Gupta
10UCS057
CERTIFICATE
This dissertation entitled “Language Parser”, by Akash Bhargava, Enrolment Number 10UCS002;
Ashok Kumar, Enrollment Number 10UCS010; Laxmi Kant Yadav, Enrollment Number 10UCS027;
Vijay Kumar Gupta, Enrollment Number 10UCS057 is approved for the award of Bachelor of
Technology in Computer Science & Engineering.
Nikhil Debbarma
Dissertation Supervisor
Assistant Professor
Computer Science & Engineering Department
NIT, Agartala
Suman Deb
Coordinator
Assistant Professor
Computer Science & Engineering Department
NIT, Agartala
Acknowledgement
We would like to take this opportunity to express our deep sense of gratitude to all who helped
us directly or indirectly during this project work. Firstly, we would like to thank our
supervisor Asst. Prof. Nikhil Debbarma and Co-ordinator Asst. Prof. Suman Deb for being
great mentors and the best advisors we could ever have. Their advice, encouragement and criticism
were a source of innovative ideas and inspiration, and the reason behind the successful completion
of this project. The confidence they showed in us was our biggest source of inspiration. It has
been a privilege working with them for the last two semesters on two different projects.
We are highly obliged to all the faculty members of the Computer Science and Engineering
Department for their support and encouragement. We also thank our Director Dr. Gopal Mugeraya
and HOD CSE Dept. Asst. Prof. Paritosh Bhattacharya for providing excellent computing
and other facilities, without which this work could not have achieved its quality goal.
We would like to express our sincere appreciation and gratitude towards Asst. Prof. Anupam
Jamatia for his support in preparing this project report in LaTeX. Finally, we are grateful to our
parents for their support. It would have been impossible for us to complete this project without
their love, blessing and encouragement.
-Akash Bhargava, Ashok Kumar, Laxmi Kant Yadav, Vijay Kumar Gupta
Dedicated to
Our loving families, for their kind love and support.
Our Project Supervisor Asst. Prof. Nikhil Debbarma and our Project Coordinator
Asst. Prof. Suman Deb, for sharing valuable knowledge, for their encouragement and for showing
confidence in us all the time.
Abstract
Parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural
language or in computer languages, according to the rules of a formal grammar. The term
parsing comes from the Latin pars (orationis), meaning part (of speech). Traditional sentence parsing
is often performed as a method of understanding the exact meaning of a sentence, sometimes
with the aid of devices such as sentence diagrams. It usually emphasizes the importance of
grammatical divisions such as subject and predicate.
According to many researchers, Sanskrit is a very scientific language, and its grammar behaves
much like that of a programming language. A translator that translates Sanskrit into other
languages would therefore be a significant development in the field of Natural Language
Processing (NLP).
In this project we parse a Sanskrit sentence so that it can later be translated more easily
into another language. We take a Sanskrit sentence or paragraph as input. We tokenize the
whole sentence (lexical analysis), recognize the part of speech of each individual token, and
then parse the sentence to make sense of it (parsing).
Contents
Acknowledgement
Dedicated to
Abstract
1 Introduction
1.1 Purpose
1.2 Scope
1.3 Basis
1.4 Overview
1.5 Objective
1.6 About The Project
1.7 Drawbacks
1.8 Study of the Project
2 System Requirement Specification
2.1 Compiler Phases
2.1.1 Lexical Analysis Phase :
2.1.2 Semantic Analysis Phase :
2.1.3 Intermediate Code Generation:
2.1.4 Code Optimization :
2.1.5 Code Generation :
2.2 Parsing Methods :
2.3 Grammar :
2.4 Makefile
3 System Design
3.1 Spiral Model
3.2 Input Stages
3.3 Input Types
3.4 Input Media
3.5 Data Flow Diagram
3.6 Output Design
4 Implementation & Screen shots
4.1 Parser :-
4.1.1 Parsing Methods :
4.1.2 Ambiguity :
4.2 Implementation Steps :-
4.2.1 The Lexer :
4.2.2 The Parser :
4.2.3 Grammar Used :
4.2.4 Uses Of A Grammar :
4.3 Input & Output :
5 Testing
5.1 Syntax Error Handling:
5.2 Error-Recovery Strategies :
5.2.1 Panic mode:
5.2.2 Phrase-level recovery:
5.2.3 Error productions :
5.2.4 Global correction :
6 Conclusion
7 Appendix
8 Reference
List of Figures
2.1 Phase of Compiler
2.2 Lexical Analyzer
2.3 Parsing Step
2.4 Vibhakti
2.5 Conjugational
2.6 Noun and Adjective
2.7 Noun Word
2.8 Noun
2.9 Noun
2.10 Noun
3.1 Spiral Model
3.2 Data Flow Diagram
CSED, NIT Agartala
3.3 Data Flow Diagram
4.1 Lexical Analysis Steps
4.2 Parsing
4.3 Output Snapshot
4.4 Output Snapshot
Chapter 1
Introduction
1.1 Purpose
This project parses a Sanskrit sentence so that it can later be translated more easily
into another language.
1.2 Scope
The ability to parse a Sanskrit sentence as a step towards producing an English sentence.
1.3 Basis
We first introduce the following concepts and then apply them:
• Lexical Analysis
• Parsing
• Advantages of using Sanskrit
• Approach
1.4 Overview
This design document is divided into five major sections.
Section 1 is an introduction that provides information about the document itself.
Section 2 is an overview of the application and its primary functionality.
Section 3 identifies the assumptions and constraints followed during the design of the software.
Section 4 documents the overall system architecture.
Section 5 provides the detailed design information for every subsystem and component in the
current delivery.
1.5 Objective
The objective is to parse a Sanskrit sentence so that it can later be translated more easily
into another language. We describe a machine translation technique for translating a Sanskrit
sentence into an English sentence.
1.6 About The Project
• Machine translation is the process of using computer software to translate text from one
natural language to another. It is one of the most important applications of Natural
Language Processing.
• It helps people from different places to understand an unknown language without the aid
of a human translator.
• The language to be translated is the Source Language (SL). The language into which the
source language is translated is the Target Language (TL).
• The major machine translation techniques are Rule-Based Machine Translation (RBMT),
Statistical Machine Translation (SMT) and Example-Based Machine Translation (EBMT).
• Rule-Based Machine Translation is one of the effective techniques for machine translation.
• Several machine translation systems have been implemented in India, among them: AnglaUrdu
(AnglaHindi based), a machine translation system from English to Urdu; HindiAngla, a machine
translation system from Hindi to English; an English-Assamese machine translation system;
MaTra, a human-aided machine translation system; AnglaHindi, an English to Hindi
machine-aided translation system; and AnglaBharti, a technology for machine-aided
translation from English to Indian languages.
• Machine translation from Sanskrit is not an easy task because of the structural vastness of
its grammar, but that grammar is well organized and less ambiguous than that of other
natural languages.
• The Sanskrit sentence is the input to our first module, the lexical parser, which generates
a parse tree using semantic relationships.
• This parse tree acts as the input to the second module, the semantic mapper, where each
Sanskrit semantic word is mapped to the corresponding English semantic word.
1.7 Drawbacks
The main drawbacks of the project are:
• The project parses one language as a step towards another; it is not a complete translator.
• The project is platform dependent (the platform here is Linux).
• It is a database-oriented project rather than one that relies only on an online approach.
1.8 Study of the Project
The aim is to let users give input in the Sanskrit language and to convert (parse) it
into the English language. The predefined steps for parsing are:
• We first tokenize the input using strtok(str, " ");
• Each token can be of three types: noun, verb or preposition. The task is to identify the type
of each token, which is done by matching it against an indexed database.
• Each token is stored in a structure along with its meaning and its morphology.
• The parser then comes into play and forms a tree-like structure from these tokens.
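The tokenizing and lookup steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the Token structure, the lexicon() table standing in for the indexed database, and the romanized sample words are all assumptions made for the example.

```cpp
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token record: surface form, category and English meaning.
struct Token {
    std::string text;
    std::string category;  // "noun", "verb", "preposition" or "unknown"
    std::string meaning;
};

// Stand-in for the indexed database; the entries are illustrative only.
const std::map<std::string, std::pair<std::string, std::string>>& lexicon() {
    static const std::map<std::string, std::pair<std::string, std::string>> table = {
        {"ramah", {"noun", "Rama"}},
        {"gacchati", {"verb", "goes"}},
    };
    return table;
}

// Split the sentence on blanks with strtok and look each word up.
std::vector<Token> tokenize(const std::string& sentence) {
    // strtok modifies its argument, so work on a writable copy.
    std::vector<char> buffer(sentence.begin(), sentence.end());
    buffer.push_back('\0');

    std::vector<Token> tokens;
    for (char* word = std::strtok(buffer.data(), " "); word != nullptr;
         word = std::strtok(nullptr, " ")) {
        Token token{word, "unknown", ""};
        auto entry = lexicon().find(token.text);
        if (entry != lexicon().end()) {
            token.category = entry->second.first;
            token.meaning = entry->second.second;
        }
        tokens.push_back(token);
    }
    return tokens;
}
```

Calling tokenize("ramah vanam gacchati") yields three tokens; the middle word is absent from the toy table and is therefore left as "unknown", which is the case where the real database would have to be extended.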
One of the major approaches to machine translation is rule-based machine translation (RBMT, also
known as the rational approach). Rule-based translation consists of:
1. The process of analyzing the input sentence of the source language syntactically and/or
semantically.
2. The process of generating the output sentence of the target language based on the internal
structure.
Each process is controlled by the dictionary and the rules.
• The strength of the rule-based method is that the information can be obtained through
introspection and analysis.
• The weakness of the rule-based method is that the accuracy of the entire process is the product
of the accuracies of its sub-stages; for example, three stages that are each 90% accurate give
an overall accuracy of about 73%.
Chapter 2
System Requirement Specification
2.1 Compiler Phases
A compiler operates in phases, and each phase transforms the source program from one
representation to another. A compiler has six phases:
• Lexical Analyzer
• Syntax Analyzer
• Semantic Analyzer
• Intermediate code generation
• Code optimization
• Code Generation
Symbol table and error handling interact with the six phases. Some of the phases may be
grouped together.
Figure 2.1: Phase of Compiler
2.1.1 Lexical Analysis Phase :
The lexical phase reads the characters in the source program and groups them into a stream
of tokens in which each token represents a logically cohesive sequence of characters, such as
an identifier, a keyword or a punctuation character. The character sequence forming a token is
called the lexeme for the token. The semantic standard representation was designed to provide a
simple description of the grammatical relationships in a sentence that can easily be understood
and effectively used by people without linguistic expertise who want to extract textual relations.
The sentence relationships are represented uniformly as semantic standard relations between
pairs of words.
Figure 2.2: Lexical Analyzer
2.1.2 Semantic Analysis Phase :
This phase checks the source program for semantic errors and gathers type information for the
subsequent code-generation phase. It uses the hierarchical structure determined by the
syntax-analysis phase to identify the operators and operands of expressions and statements. An
important component of semantic analysis is type checking.
2.1.3 Intermediate Code Generation:
Syntax and semantic analysis generate an explicit intermediate representation of the source
program. The intermediate representation should have two important properties:
• It should be easy to produce.
• It should be easy to translate into the target program.
The intermediate representation can have a variety of forms. One of them is three-address
code, which is like the assembly language for a machine in which every location can act like a
register. Three-address code consists of a sequence of instructions, each of which has at most
three operands.
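To make the idea of three-address code concrete, the sketch below walks a small expression tree and emits one instruction per operator, each with at most three operands. The Expr type and the leaf, binary and emit helpers are hypothetical names introduced for this illustration; they are not part of the project.

```cpp
#include <memory>
#include <string>
#include <vector>

// Expression tree node: either a leaf (variable name) or a binary operator.
struct Expr {
    std::string name;   // used when the node is a leaf
    char op = 0;        // '+', '*', ... when the node is an operator
    std::shared_ptr<Expr> left, right;
};

std::shared_ptr<Expr> leaf(const std::string& name) {
    auto node = std::make_shared<Expr>();
    node->name = name;
    return node;
}

std::shared_ptr<Expr> binary(char op, std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
    auto node = std::make_shared<Expr>();
    node->op = op;
    node->left = l;
    node->right = r;
    return node;
}

// Emit three-address instructions for the tree and return the name that
// holds its value. Every operator produces exactly one new temporary.
std::string emit(const std::shared_ptr<Expr>& node, std::vector<std::string>& code) {
    if (node->op == 0) return node->name;
    std::string lhs = emit(node->left, code);
    std::string rhs = emit(node->right, code);
    std::string temp = "t" + std::to_string(code.size() + 1);
    code.push_back(temp + " = " + lhs + " " + node->op + " " + rhs);
    return temp;
}
```

For the expression a + b * c, built as binary('+', leaf("a"), binary('*', leaf("b"), leaf("c"))), emit produces the two instructions "t1 = b * c" and "t2 = a + t1".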
2.1.4 Code Optimization :
The code optimization phase attempts to improve the intermediate code, so that faster-running
machine code will result.
2.1.5 Code Generation :
The final phase of the compiler is the generation of target code, consisting normally of
relocatable machine code or assembly code. Memory locations are selected for each of the variables
used by the program. Then each intermediate instruction is translated into a sequence of
machine instructions that perform the same task.
2.2 Parsing Methods :
In the compiler model, the parser obtains a string of tokens from the lexical analyser and verifies
that the string can be generated by the grammar for the source language. The parser reports any
syntax errors in the source. There are two types of parsing methods: top-down and
bottom-up. In top-down parsing we work from left to right, drilling down through each
non-terminal until we reach a terminal, and we build the tree from the root node
down to the leaves. It is important to note that we always replace the leftmost non-terminal
first: top-down parsing is an attempt to find a leftmost derivation. Bottom-up parsing instead
traces out, in reverse, a rightmost derivation, in which the rightmost non-terminal is replaced
first.
There are three general types of parsers for grammars. Universal parsing methods such as
the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any context-free grammar,
but these methods are too inefficient to use in production compilers. The methods commonly used
in compilers are classified as either top-down or bottom-up. Top-down parsers build parse
trees from the top (root) to the bottom (leaves). Bottom-up parsers build parse trees from the
leaves and work up to the root. In both cases the input to the parser is scanned from left to right,
one symbol at a time. The output of the parser is some representation of the parse tree for the
stream of tokens.
Figure 2.3: Parsing Step
A number of tasks may be carried out during parsing, such as:
• Collecting information about various tokens into the symbol table.
• Performing type checking and other kinds of semantic analysis.
• Generating intermediate code.
Algorithm for parsing a sentence
1. Tokenize the sentence into tokens, i.e. a token list.
2. To find the relationships between tokens we use dependency grammar and binary
relations for the Sanskrit language. The token list acts as the input to the semantic class,
which represents the semantic standard.
3. The semantic class generates a tree; the class Tree Transform creates this tree.
2.3 Grammar :
A grammar provides a precise way to specify the syntax (the structure or arrangement of composing
units) of a language. In grade school we take grammar lessons that teach us to speak and write
proper English. They teach us the correct way to form sentences with subjects, predicates,
noun phrases, verb phrases, etc. Subjects, predicates and phrases are some of the composing
units of a sentence in English; similarly, if/else statements, assignment statements and function
definitions are some of the composing units of source code, which itself is a single sentence of
a particular programming language. There are a very large number of valid English sentences
one could compose; likewise, there are a large (probably infinite) number of valid source code
programs one could create. If someone says “on the computer she is,” we immediately recognize
that the sentence is ill-formed. Its structure is invalid, because the noun phrase should precede
the verb phrase. It should be: “She is on the computer.” A sentence diagram of such a sentence
looks very much like an abstract syntax tree (AST), so parsing, or
more formally syntactic analysis, has its roots in linguistics. Moreover, just as in English,
programming languages need to be specified in a way that allows us to verify whether a sentence
of the language is valid. That is where context-free grammars (CFGs) come into play; they
allow us to specify the syntax of a programming language's source code.
Vibhakti as Pointer
Figure 2.4: Vibhakti
Basic conjugational endings:
Figure 2.5: Conjugational
Basic noun and adjective declension
Figure 2.6: Noun and Adjective
A-stems (noun words ending with a)
Figure 2.7: Noun Word
i- and u-stems
Figure 2.8: Noun
Figure 2.9: Noun
Sanskrit verbs: There are 10 classes of verb conjugation. One example, for the root word bhava,
is given here (present, past and future only).
Figure 2.10: Noun
2.4 Makefile
GNU make is a utility for maintaining groups of programs. The purpose of the make utility is to
determine automatically which pieces of a large program need to be recompiled, and to issue the
commands to recompile them. To prepare to use make, you must write a file called the
makefile that describes the relationships among the files in your program and states the commands
for updating each file. In a program, typically the executable file is updated from object files,
which are in turn made by compiling source files. Once a suitable makefile exists, running the
make command each time you change some source files performs the necessary recompilation. By
default make processes a file named makefile or Makefile; use the -f option if you want the make
command to process a makefile with a different name.
make clean: “make clean” runs the makefile's clean target, which by convention deletes any files
generated by previous builds, leaving you with clean source code.
Chapter 3
System Design
3.1 Spiral Model
The spiral model of software development is shown in Figure 3.1. The diagrammatic representation
of this model appears like a spiral with many loops. The exact number of loops in the spiral is not
fixed. Each loop of the spiral represents a phase of the software process. This model is much more
flexible than other models, since the exact number of phases through which the product is
developed is not fixed. Each phase in this model is split into four sectors (quadrants), as shown
in the figure. The first quadrant identifies the objectives of the phase and the alternative
solutions possible for the phase under consideration. During the second quadrant, the alternative
solutions are evaluated to select the best possible solution. The spiral model provides direct
support for coping with project risks. Activities during the fourth quadrant concern reviewing the
results of the stages traversed so far with the customer and planning the next iteration around
the spiral. The spiral model is viewed as a meta model, since it subsumes all the other models
discussed. It uses a prototyping approach, building a prototype before embarking on the actual
product development effort. The spiral model can also be considered as supporting the evolutionary
model: the iterations along the spiral can be considered as evolutionary levels through which the
complete system is built. This enables the developer to understand and resolve the risks at each
evolutionary level. The spiral model uses prototyping as a risk reduction mechanism and also
retains the systematic step-wise approach of the waterfall model.
Figure 3.1: Spiral Model
3.2 Input Stages
The main input stages can be listed as below:
• Data supply
• Data transaction
• Data synchronization
• Data verification
• Data validation
• Data correction
3.3 Input Types
It is necessary to determine the various types of inputs. Inputs can be categorized as follows:
• External inputs, which are prime inputs for the system.
• Internal inputs, which are user communications with the system.
• Interactive inputs, which are inputs entered during a dialogue.
3.4 Input Media
At this stage a choice has to be made about the input media. In deciding on the input media,
consideration has to be given to:
• Type of input
• Flexibility of format
• Speed
• Accuracy
• Ease of correction
• Ease of use
• Portability
3.5 Data Flow Diagram
Figure 3.2: Data Flow Diagram
Figure 3.3: Data Flow Diagram
3.6 Output Design
Outputs from computer systems are required primarily to communicate the results of processing
to users. They are also used to provide a permanent copy of the results for later consultation.
The various types of outputs are:
• External outputs, whose destination is the file named Temp.
• Internal outputs, whose destination is within the organization and which are the user's main
interface with the Linux system.
• Operational outputs, whose use is purely within the android mobile department.
• Interface outputs, which involve the user in communicating directly with the system.
Chapter 4
Implementation & Screen shots
Programming languages show a clear trend: they are moving steadily from machine-level
to high-level to human-level languages. Consider the progression assembly > C > C++ > Java > Ruby.
This will not stop until something entirely human-like is created. The scope for Sanskrit to
become a computer language lies in the library system. When you compile code in C, it is linked
with predefined libraries. For example, calling strcmp(string1, string2) is the best way
to compare two strings, because it links optimized library code into your executable. Library
routines are highly optimized, and some are written in assembly language. So if you have all the
libraries, why do you need C? Why not just say GO AND OPEN THE DOOR and expect the computer to
understand it and do it in a highly optimized way? The onus lies with an intelligent interpreter.
Sanskrit is a language in which letters have meanings; they do not need to form words to transmit
emotion or information. Composing letters into words changes their meaning again, somewhat like
object-oriented programming (OOP). For example, ANU is a particle and PARMANU is a nanoparticle.
A programming language needs consistency, which Sanskrit has. How Sanskrit can be adapted
to be a human-computer language is left for future exploration. Sanskrit is not a descriptive
language: you do not need to write paragraphs to explain something. When you translate something
into Sanskrit, its size reduces. It is precise, crisp and clear.
4.1 Parser :-
Parsing is the de-linearization of linguistic input; that is, the use of grammatical rules and
other knowledge sources to determine the functions of words in the input sentence. Getting an
efficient and unambiguous parse of natural languages has been a subject of wide interest in the
field of artificial intelligence over the past 50 years. A parser breaks data into smaller elements,
according to a set of rules that describe its structure. Parsing is the process of analysing a text,
made of a sequence of tokens (for example, words), to determine its grammatical structure with
respect to a given grammar.
The steps to generate a parse tree are:
1. The input is a Sanskrit sentence.
2. The lexical analyzer creates tokens.
3. The generated tokens act as the input to the semantic analyzer.
4. The output is a parse tree.
4.1.1 Parsing Methods :
There are two types of parsing methods: top-down and bottom-up. In top-down parsing we work
from left to right, drilling down through each non-terminal until we reach a
terminal, and we build the tree from the root node down to the leaves.
It is important to note that we always replace the leftmost non-terminal
first: top-down parsing is an attempt to find a leftmost derivation.
Bottom-up parsing instead traces out, in reverse, a rightmost derivation, in which the rightmost
non-terminal is replaced first.
• Bottom-Up Parsing
In bottom-up parsing the derivation starts from the string of terminals (our sentence), and
we try to derive the start symbol of our CFG. It is essentially a top-down derivation run
backwards. Initially, instead of replacing a non-terminal with another non-terminal or terminal
(drilling down), we replace a terminal with a non-terminal (drilling up). At certain points
we may even replace several non-terminals with one non-terminal. The sequence of replacements
is the exact reverse of a rightmost derivation. When we make a replacement we create a node that
becomes the parent of some other node instead of its child.
• Top-Down Parsing
There are several problems with top-down parsing.
(1) Left recursion can lead to infinite parsing loops, so it must be eliminated. Left
recursion in a CFG production occurs when the non-terminal on the left side appears first
on the right side of the arrow. There are simple algorithms to remove it, but the CFG
becomes twice as long in many cases.
(2) Top-down parsing may involve backtracking. Backtracking is the act of climbing back
up the derivation (the parse), reversing everything to try another derivation path. We end
up re-scanning the input as well. If information was inserted into a symbol table as the parse
proceeded, it all has to be removed. The need for backtracking can be eliminated
by parsing with lookahead. Backtracking is not restricted to top-down parsers; there are
backtracking LR parsers as well.
(3) Finally, the order in which we choose non-terminal expansions can cause valid inputs
to be rejected without information as to why.
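As an illustration of points (1) and (2), the sketch below is a small recursive-descent (top-down) parser for a toy grammar whose left recursion has been removed, and which uses one symbol of lookahead so that it never backtracks. The grammar, the Parser class and its method names are assumptions made for the example and are not taken from the project.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Grammar after removing left recursion from  E -> E + T | T :
//   E  -> T E'
//   E' -> + T E' | (empty)
//   T  -> a single letter
class Parser {
public:
    explicit Parser(const std::string& input) : input_(input), pos_(0) {}

    // True when the whole input can be derived from E.
    bool parse() { return parseE() && pos_ == input_.size(); }

private:
    // One symbol of lookahead; '\0' marks the end of the input.
    char peek() const { return pos_ < input_.size() ? input_[pos_] : '\0'; }

    bool parseE() { return parseT() && parseEPrime(); }

    bool parseEPrime() {
        if (peek() == '+') {   // the lookahead selects the production: no backtracking
            ++pos_;
            return parseT() && parseEPrime();
        }
        return true;           // empty production
    }

    bool parseT() {
        if (std::isalpha(static_cast<unsigned char>(peek()))) {
            ++pos_;
            return true;
        }
        return false;
    }

    std::string input_;
    std::size_t pos_;
};
```

With the original left-recursive rule, a function for E would call itself before consuming any input and never terminate; in the rewritten form every recursive call is preceded by consuming a symbol. Parser("a+b+c").parse() accepts, while inputs such as "a+" or "+a" are rejected.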
4.1.2 Ambiguity :
An ambiguous grammar is one in which some string of the language has more than one parse tree.
This is problematic because it may be hard to interpret the intended meaning of the string.
Consider the C statement x*y; It can be interpreted as the multiplication of two variables, x and y, or as the
declaration of a variable y whose type is a pointer to x. To resolve the conflict the compiler
must look up x in the symbol table. If x names a type, the statement is a declaration; otherwise it
is interpreted as an expression. Generally speaking, ambiguity is an unwanted feature of any
grammar and may pose a threat to the correctness of both top-down and bottom-up parsers.
Different parsers handle it with varying efficacy. In spite of all this, ambiguity isn’t always
a problem. An ambiguous grammar can still generate a language that also has an unambiguous grammar.
Even if there are two parse trees that generate a string, as long as it has one intended meaning
there’s no problem. Some parser generators allow specifying precedence and associativity rules
to remove any ambiguity.
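The x*y; case can be sketched as a symbol-table lookup. This is only an illustration: typeNames is a hypothetical stand-in for a compiler's symbol table, and classifyStarStatement is a name introduced here, not part of any real compiler.

```cpp
#include <set>
#include <string>

enum class StatementKind { Declaration, Expression };

// Decides how to read a statement of the form `left * right;`.
// typeNames is a hypothetical stand-in for the symbol table: it holds the
// identifiers currently known to name a type.
StatementKind classifyStarStatement(const std::set<std::string>& typeNames,
                                    const std::string& left) {
    // If the left identifier names a type, `left * right;` declares a pointer;
    // otherwise it is a multiplication expression.
    return typeNames.count(left) != 0 ? StatementKind::Declaration
                                      : StatementKind::Expression;
}
```

The grammar alone cannot make this decision; the information has to come from outside the parse.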
4.2 Implementation Steps :-
The following steps were used to develop this application:
4.2.1 The Lexer :
The first step towards creating a successful Sanskrit English Parser (SEP) is to create a
lexer that analyses every word of the input Sanskrit sentence.
Tokenizer:
The tokenizer divides the complete sentence into a stream of individual words separated by blank
spaces.
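A minimal sketch of such a tokenizer is given here. The function name tokenize is an assumption, and the sketch splits on any run of whitespace; the project's actual tokenizer may differ.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a sentence into words separated by blank spaces.
// Runs of several blanks are treated as a single separator.
std::vector<std::string> tokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word) tokens.push_back(word);  // operator>> skips whitespace
    return tokens;
}
```

For the transliterated input "ramah vanam gacchati" this gives the three tokens ramah, vanam and gacchati.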
Avyaya Analyser :
Every token produced by the tokenizer is first looked up in the smallest database, that of avyaya words (indeclinables);
only if there is a complete match is the word accepted as an avyaya.
Verb Analyser :
The second, relatively bigger database of verb roots (dhaturoops) is placed after the avyaya
database. Tokens not recognized as avyaya are then processed by the verb analyser. The program
verb.cpp analyses the suffix of every input token and generates information regarding the
tense, person and number of the corresponding token. The suffix is then removed and the verb is
mapped to its respective root using the verb database. If a match is found the token is accepted
as a verb; otherwise it is passed on for noun analysis.
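The suffix analysis described above can be sketched as follows. This is not the code of verb.cpp: the three-entry suffix table, the ASCII transliteration and the stemToRoot map are all hypothetical stand-ins for the project's verb database, chosen only to show the steps (match suffix, record tense/person/number, strip suffix, look up root).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Result of analysing one token (hypothetical layout).
struct VerbInfo {
    bool found;          // false: pass the token on to noun analysis
    std::string root;
    std::string tense;
    int person;          // 1, 2 or 3
    std::string number;  // "singular" or "plural"
};

struct SuffixRule {
    std::string suffix;
    std::string tense;
    int person;
    std::string number;
};

VerbInfo analyseVerb(const std::string& token,
                     const std::map<std::string, std::string>& stemToRoot) {
    // Illustrative table only; longer suffixes come first so that "nti"
    // is tried before "ti".
    static const std::vector<SuffixRule> rules = {
        {"nti", "present", 3, "plural"},
        {"ti",  "present", 3, "singular"},
        {"si",  "present", 2, "singular"},
    };
    for (const SuffixRule& rule : rules) {
        const std::size_t n = rule.suffix.size();
        if (token.size() <= n) continue;
        if (token.compare(token.size() - n, n, rule.suffix) != 0) continue;
        // Remove the suffix and map the remaining stem to its root.
        const std::string stem = token.substr(0, token.size() - n);
        const std::map<std::string, std::string>::const_iterator it =
            stemToRoot.find(stem);
        if (it != stemToRoot.end())
            return VerbInfo{true, it->second, rule.tense, rule.person, rule.number};
    }
    return VerbInfo{false, "", "", 0, ""};
}
```

With a database entry mapping the stem gaccha to the root gam, the token gacchati is reported as present tense, third person, singular, while a token such as vanam is not matched and falls through to the noun analyser.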
Noun Analyser :
Tokens not yet recognized are fed to the noun analyser (noun.cpp). Noun declensions belonging
to different genders follow different patterns, not all of which can be matched by the program. Hence, of the
21 possible declensions of a single noun, 10 are stored as exceptions while the
remaining 11 are processed by the program to obtain the root word. Lastly, if the word
is still not recognized, then it is not present in the database and must be entered manually for
analysis.
Figure 4.1: Lexical Analysis Steps
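The order of the lexical analysis steps can be sketched as a cascade. The type names here are assumptions, and the two predicates stand in for the verb and noun analysers, whose real interfaces in verb.cpp and noun.cpp are not shown in this report.

```cpp
#include <functional>
#include <set>
#include <string>

enum class WordClass { Avyaya, Verb, Noun, Unknown };

// Hypothetical lexer front end: each stage sees a token only if the
// previous stage did not recognize it.
struct Lexer {
    std::set<std::string> avyayas;                   // indeclinables database
    std::function<bool(const std::string&)> isVerb;  // stand-in for the verb analyser
    std::function<bool(const std::string&)> isNoun;  // stand-in for the noun analyser

    WordClass classify(const std::string& token) const {
        if (avyayas.count(token) != 0) return WordClass::Avyaya;  // complete match only
        if (isVerb && isVerb(token)) return WordClass::Verb;
        if (isNoun && isNoun(token)) return WordClass::Noun;
        return WordClass::Unknown;  // not in the database
    }
};
```

Checking the smallest database first keeps the cheap exact-match test ahead of the more expensive suffix analyses.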
4.2.2 The Parser :
Equipped with the knowledge of what the individual words represent, we can now move towards
re-arranging them in such a way that their mere translation results in a meaningful English
sentence. When parsing from Sanskrit to English we move from a language with free word order to
a language in which only a particular order of words conveys the same meaning.
How to represent CONTEXT ?
By CONTEXT we mean the parts of a statement that precede or follow a specific word or passage,
usually influencing its meaning or effect. Sanskrit uses the concept of vibhakti to express
context. Because English has no vibhakti, the user has to work out the context of every
word with help from the LEXER. Using the lexer output the user can add words like for, from, to, etc.,
which are not used in Sanskrit. Thus the PARSER gives us the spatial arrangement of the input
words in converted form (in English), and the LEXER is referred to for context. This results in
an English translation of a Sanskrit sentence.
Structure of an English sentence :
Every English sentence is a combination of nouns and verbs related to each other through context.
In a SIMPLE sentence (a sentence without connectors, having only one verb), the verb is the
central entity. Nouns then relate to this central entity via context, as defined below:
Nominative (S)    the SUBJECT/doer of the verb
Accusative (O)    the OBJECT of the verb
Instrumental (I)  the cause/means of the verb
Dative (D)        the indirect object of the verb
Ablative (A)      represents comparison/separation
Locative (L)      represents position in space/time
The LEXER already generates this contextual information for every noun, and the PARSER can
therefore arrange a simple input sentence spatially, following the rules of English.
Thus, we have the following order:
S V O L/A/D/I
The PARSER interprets the LEXER’s outputs and places the various nouns at their respective
positions in this order. The user can now apply the context of every noun used to obtain the corresponding
English translation.
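The S V O L/A/D/I arrangement can be sketched as a stable sort on a slot number. The types and the function arrange are hypothetical, and the relative order chosen among L, A, D and I (the order in which they are written above) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class Role { Subject, Verb, Object, Locative, Ablative, Dative, Instrumental };

struct Word {
    std::string text;
    Role role;  // supplied by the lexer
};

// Position of each role in the English order S V O L/A/D/I.
int slot(Role r) {
    switch (r) {
        case Role::Subject:      return 0;
        case Role::Verb:         return 1;
        case Role::Object:       return 2;
        case Role::Locative:     return 3;
        case Role::Ablative:     return 4;
        case Role::Dative:       return 5;
        case Role::Instrumental: return 6;
    }
    return 7;
}

// Rearranges a simple sentence; words with the same role keep their input order.
std::vector<Word> arrange(std::vector<Word> words) {
    std::stable_sort(words.begin(), words.end(),
                     [](const Word& a, const Word& b) {
                         return slot(a.role) < slot(b.role);
                     });
    return words;
}
```

For the transliterated input ramah (S) vanam (O) gacchati (V), the arranged order is ramah gacchati vanam, which matches the English order "Rama goes (to the) forest".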
Parsing rules for a simple sentence :
The PARSER can handle all forms of noun declensions, verb declensions and avyayas (including
connectors). The following points summarise the working of the parser:
Figure 4.2: Parsing
• The parser stores nouns, verbs and avyayas in 3 separate structures along with the
information the parser requires about them, such as case context, number and person.
• The parser can handle words representing adjectives.
• The parser can handle words representing adverbs.
• The parser can resolve ambiguity generated by Sanskrit noun declensions. For example, if an input
Sanskrit sentence contains no nominative noun but there is a noun which can be both
nominative and accusative, then it is treated as nominative.
• The parser requires that the subject and verb agree in number. Thus, … is correct but … is
incorrect.
• The parser also handles the GENITIVE case, which represents a noun-noun relationship
rather than a noun-verb relationship as the other declensions do.
• The parser handles avyayas which correspond to a given noun declension type.
• The parser handles avyayas representing questions.
• The parser handles avyayas that act as conjunctions of different types.
• The parser can thus handle multiple sentences joined together using avyayas.
• The parser writes the interpreted spatial arrangement of the input sentence to a text file
named temp.
• The parser can process an input even if some part of it is not defined in the lexer database.
Such unrecognized input tokens are output as they are, at the start of the resultant sentence, in
the temp file.
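The nominative/accusative rule from the list above can be sketched as follows. The types are hypothetical. The rule that an ambiguous noun becomes accusative once a subject has already been found is an assumption of this sketch; the report states only the case in which no nominative noun is present.

```cpp
#include <string>
#include <vector>

enum class Case { Unresolved, Nominative, Accusative };

struct NounReading {
    std::string text;
    bool nominative;  // the lexer says this form may be nominative
    bool accusative;  // the lexer says this form may be accusative
    Case resolved;    // filled in by resolveCases
};

void resolveCases(std::vector<NounReading>& nouns) {
    bool haveSubject = false;
    // First pass: forms with only one possible reading.
    for (NounReading& n : nouns) {
        if (n.nominative && !n.accusative) {
            n.resolved = Case::Nominative;
            haveSubject = true;
        } else if (n.accusative && !n.nominative) {
            n.resolved = Case::Accusative;
        } else {
            n.resolved = Case::Unresolved;
        }
    }
    // Second pass: ambiguous forms.
    for (NounReading& n : nouns) {
        if (!(n.nominative && n.accusative)) continue;
        if (!haveSubject) {
            n.resolved = Case::Nominative;  // rule stated in the report
            haveSubject = true;
        } else {
            n.resolved = Case::Accusative;  // assumption of this sketch
        }
    }
}
```

A neuter form such as phalam, which is the same in the nominative and accusative singular, is read as the subject when it stands alone with a verb, and as the object when a clearly nominative noun such as ramah is also present.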
4.2.3 Grammar Used :
Sanskrit is described using a context-free grammar, and a BNF grammar for Sanskrit also exists. The
form of BNF rules, itself written in BNF, is given as:
<BNF rule>    ::= <nonterminal> "::=" <definitions>
<nonterminal> ::= "<" <words> ">"
<terminal>    ::= <word> | <punctuation mark> | ' " ' <any chars> ' " '
<words>       ::= <word> | <words> <word>
<word>        ::= <letter> | <word> <letter> | <word> <digit>
<definitions> ::= <definition> | <definitions> "|" <definition>
<definition>  ::= <empty> | <term> | <definition> <term>
<empty>       ::=
<term>        ::= <terminal> | <nonterminal>
4.2.4 Uses Of A Grammar :
A BNF grammar can be used in two ways :-
• To generate strings belonging to the grammar
• To do this, start with a string containing a non-terminal; while there are still non-terminals
in the string replace a non-terminal with one of its definitions.
• To recognize strings belonging to the grammar
• This is the way programs are compiled - a program is a string belonging to the grammar
that defines the language
• Recognition is much harder than generation
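The generation procedure described above (replace a non-terminal with one of its definitions until none remain) can be sketched as follows. The names Grammar, Symbols and derive are hypothetical, and the toy grammar in the usage note is an illustration, not the project's Sanskrit grammar. To keep the sketch deterministic, the caller supplies the sequence of choices and the leftmost non-terminal is always the one replaced.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Symbols = std::vector<std::string>;
// Maps each non-terminal to its list of definitions; any symbol that is
// not a key of the map is treated as a terminal.
using Grammar = std::map<std::string, std::vector<Symbols>>;

// Starts from `start` and, for each entry of `choices`, replaces the
// leftmost non-terminal with the definition that entry selects.
// Stops early when only terminals are left.
Symbols derive(const Grammar& g, const std::string& start,
               const std::vector<std::size_t>& choices) {
    Symbols form{start};
    std::size_t next = 0;
    while (next < choices.size()) {
        std::size_t pos = 0;
        while (pos < form.size() && g.find(form[pos]) == g.end()) ++pos;
        if (pos == form.size()) break;  // no non-terminals remain
        const std::vector<Symbols>& alts = g.at(form[pos]);
        if (alts.empty()) break;        // nothing to replace it with
        const Symbols& alt = alts[choices[next++] % alts.size()];
        form.erase(form.begin() + pos);
        form.insert(form.begin() + pos, alt.begin(), alt.end());
    }
    return form;
}
```

With the toy rules S ::= NP VP, NP ::= ramah and VP ::= gacchati | pathati, the choices 0, 0, 1 derive the string ramah pathati. Recognition has to run this process in reverse, without being told which choices were made, which is one way to see why it is the harder of the two tasks.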
4.3 Input & Output :
Figure 4.3: Output Snapshot
Figure 4.4: Output SnapShot
Chapter 5
Testing
While developing this project we found some discrepancies between the grammar definition and
the implementation of the query classes. We had to correct them in order to have a coherent
implementation.
There are different strategies for testing:
5.1 Syntax Error Handling:
Planning the error handling right from the start can both simplify the structure of a compiler and
improve its response to errors. A program can contain errors at many different levels, e.g.
• Lexical, such as misspelling an identifier, keyword, or operator.
• Syntactic, such as an arithmetic expression with unbalanced parentheses.
• Semantic, such as an operator applied to an incompatible operand.
• Logical, such as an infinitely recursive call.
Much of the error detection and recovery in a compiler is centred on the syntax analysis
phase. One reason for this is that many errors are syntactic in nature or are exposed when the
stream of tokens coming from the lexical analyser disobeys the grammatical rules defining the
programming language. Another is the precision of modern parsing methods; they can detect
the presence of syntactic errors in programs very efficiently.
The error handler in a parser has simple goals:-
• It should report the presence of errors clearly and accurately.
• It should recover from each error quickly enough to be able to detect subsequent errors.
• It should not significantly slow down the processing of correct programs.
5.2 Error-Recovery Strategies :
There are many different general strategies that a parser can employ to recover from a syntactic
error.
• Panic mode
• Phrase level
• Error production
• Global correction
5.2.1 Panic mode:
• This is used by most parsing methods.
• On discovering an error, the parser discards input symbols one at a time until one of a
designated set of synchronizing tokens (delimiters such as a semicolon or end) is found.
• Panic mode correction often skips a considerable amount of input without checking it for
additional errors.
• It is simple.
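Panic-mode recovery can be sketched in a few lines. The function name and the token representation are assumptions, and resuming just after the synchronizing token (rather than at it) is a design choice of this sketch.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Discards tokens starting at errorPos until a synchronizing token is found.
// Returns the index at which parsing should resume: the position just after
// the synchronizing token, or tokens.size() if none is found.
std::size_t panicModeRecover(const std::vector<std::string>& tokens,
                             std::size_t errorPos,
                             const std::set<std::string>& syncTokens) {
    std::size_t i = errorPos;
    while (i < tokens.size() && syncTokens.count(tokens[i]) == 0)
        ++i;  // skipped tokens are not checked for further errors
    return i < tokens.size() ? i + 1 : tokens.size();
}
```

For the token stream x = + * 3 ; y = 1 ; with an error detected at the + token and ; as a synchronizing token, parsing resumes at y. Everything between the error and the semicolon is skipped unchecked, which is the drawback noted in the list above.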
5.2.2 Phrase-level recovery:
• On discovering an error, the parser may perform local correction on the remaining input;
that is, it may replace a prefix of the remaining input by some string that allows the parser to
continue.
• For example, a local correction would be to replace a comma by a semicolon, delete an
extraneous semicolon, or insert a missing semicolon.
• Its major drawback is the difficulty it has in coping with situations in which the actual
error has occurred before the point of detection.
5.2.3 Error productions :
• If an error production is used by the parser, the parser can generate appropriate error diagnostics to
indicate the erroneous construct that has been recognized in the input.
5.2.4 Global correction :
• Given an incorrect input string x and grammar G, the algorithm will find a parse tree for
a related string y, such that the number of insertions, deletions and changes of tokens
required to transform x into y is as small as possible.
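The quantity being minimized here is the edit distance between the token strings x and y. The sketch below computes only that distance for one given pair, using the standard dynamic-programming recurrence; it does not search over parse trees as a full global-correction algorithm would. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Minimum number of token insertions, deletions and changes needed to
// transform x into y.
std::size_t tokenEditDistance(const std::vector<std::string>& x,
                              const std::vector<std::string>& y) {
    // prev[j]: distance between the first i-1 tokens of x and the first j of y.
    std::vector<std::size_t> prev(y.size() + 1), cur(y.size() + 1);
    for (std::size_t j = 0; j <= y.size(); ++j) prev[j] = j;  // j insertions
    for (std::size_t i = 1; i <= x.size(); ++i) {
        cur[0] = i;  // i deletions
        for (std::size_t j = 1; j <= y.size(); ++j) {
            const std::size_t change = prev[j - 1] + (x[i - 1] == y[j - 1] ? 0 : 1);
            const std::size_t del = prev[j] + 1;
            const std::size_t ins = cur[j - 1] + 1;
            cur[j] = std::min(change, std::min(del, ins));
        }
        prev.swap(cur);
    }
    return prev[y.size()];
}
```

For x = a = b , c and y = a = b ; c the distance is 1 (one changed token). The table has (|x|+1)(|y|+1) entries, and a global corrector must consider many candidate strings y, so the full search is considerably more expensive than this single comparison.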
Chapter 6
Conclusion
The project is implemented mainly in two languages, C and C++. It takes Sanskrit
as the input language and English as the output language. The Sanskrit input is first read from the
keyboard and the sentence is tokenized by the Tokenizer. The tokens are then identified by the Token Analyser
and matched against the database, the corresponding output words are fetched, and finally all
the resulting words are combined to produce the output. The main goal of the current study was to parse a
Sanskrit sentence so that it can later be translated more easily into some other language.
The findings from this study make several contributions to the current literature. The first is the
suggestion that Sanskrit should be used as the primary language for programming purposes.
Finally, a number of important limitations need to be considered. First, this project is
about parsing one language into another; it is not a pure translator. Second, this project is platform
dependent (the platform here is Linux). Third, it is a database-oriented project and does not rely on an
on-line approach alone. It is recommended that further research be undertaken in the following areas:
• We can make this project more user friendly by adding a graphical user interface.
• We can apply this scheme to many different languages.
The findings of this study have a number of important implications for future practice. This
translator is mainly based on fetching data from a database.
Chapter 7
Appendix
A
Avyaya Analyser 37
Ambiguous 15
C
Compiler 6
Code Optimization 9
Code Generation 9
D
Drawbacks 4
Data Flow Diagram 20
E
Error-Recovery Strategies 35
Error productions 36
G
Grammar 11
Grammar Used 30
Global correction 36
I
Intermediate Code Generation 8
Input Stages 19
Input Types 20
L
Lexical Analysis Phase 7
M
Makefile 17
O
Objective 3
Output Design 22
P
Parsing Methods 9
S
Scope 2
T
Testing 34
U
Uses Of A Grammar 30
Chapter 8
Reference
We thank our Project Supervisor, Assistant Professor Nikhil Debbarma, and our Project Coordinator,
Assistant Professor Suman Deb, for sharing valuable knowledge and encouragement and for showing
confidence in us all the time. We also used some links on the Internet.
• Sanskrit & Artificial Intelligence, NASA:
Knowledge Representation in Sanskrit and Artificial Intelligence, by Rick Briggs, RIACS,
NASA Ames Research Center, Moffett Field, California
• http://www.vedicsciences.net/articles/sanskrit-nasa.html
• AI Magazine publishes the importance of Sanskrit
• http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/466
• http://sanskrit.jnu.ac.in/morph/analyze.jsp
• http://uttishthabharata.wordpress.com/2011/05/30/sanskrit-programming/

 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Sanskrit Parser Report

  • 1. SANSKRIT LANGUAGE PARSER Akash Bhargava - 10UCS002 Ashok Kumar - 10UCS010 Laxmi Kant Yadav - 10UCS027 Vijay Kumar Gupta - 10UCS057 COMPUTER SCIENCE & ENGINEERING DEPARTMENT NATIONAL INSTITUTE OF TECHNOLOGY, AGARTALA INDIA-799055 MAY, 2014
  • 2. SANSKRIT LANGUAGE PARSER Dissertation submitted to National Institute of Technology, Agartala for the award of the degree of Bachelor of Technology by Akash Bhargava - 10UCS002 Ashok Kumar - 10UCS010 Laxmi Kant Yadav - 10UCS027 Vijay Kumar Gupta - 10UCS057 Under the Guidance of Mr. Nikhil Debbarma Assistant Professor, CSE Department, NIT Agartala, India COMPUTER SCIENCE & ENGINEERING DEPARTMENT NATIONAL INSTITUTE OF TECHNOLOGY AGARTALA MAY, 2014
  • 3. DISSERTATION APPROVAL SHEET This dissertation entitled “Language Parser”, by Akash Bhargava, Enrolment Number 10UCS002; Ashok Kumar, Enrollment Number 10UCS010; Laxmi Kant Yadav, Enrollment Number 10UCS027; Vijay Kumar Gupta, Enrollment Number 10UCS057 is approved for the award of Bachelor of Technology in Computer Science & Engineering. Nikhil Debbarma Dissertation Supervisor Assistant Professor Computer Science & Engineering Department NIT, Agartala Paritosh Bhattacharya Head Of Department Professor Computer Science & Engineering Department NIT, Agartala Date:19.05.2014 Place:NIT, Agartala iii
  • 4. DECLARATION We declare that the work presented in this dissertation titled “Language Parser”, submitted to the Computer Science & Engineering Department, National Institute of Technology, Agartala, for the award of the Bachelor of Technology degree in Computer Science & Engineering, represents our ideas in our own words, and where others’ ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented, fabricated or falsified any idea/data/fact/source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed. MAY, 2014 Agartala Akash Bhargava 10UCS002 Ashok Kumar 10UCS010 Laxmi Kant Yadav 10UCS027 Vijay Kumar Gupta 10UCS057
  • 5. CERTIFICATE This dissertation entitled “Language Parser”, by Akash Bhargava, Enrolment Number 10UCS002; Ashok Kumar, Enrollment Number 10UCS010; Laxmi Kant Yadav, Enrollment Number 10UCS027; Vijay Kumar Gupta, Enrollment Number 10UCS057 is approved for the award of Bachelor of Technology in Computer Science & Engineering. Nikhil Debbarma Dissertation Supervisor Assistant Professor Computer Science & Engineering Department NIT, Agartala Suman Deb Coordinator Assistant Professor Computer Science & Engineering Department NIT, Agartala v
  • 6. Acknowledgement We would like to take this opportunity to express our deep sense of gratitude to all who helped us directly or indirectly during this project work. Firstly, we would like to thank our supervisor Asst. Prof. Nikhil Debbarma and Co-ordinator Asst. Prof. Suman Deb for being great mentors and the best advisors we could ever have. Their advice, encouragement and criticism were a source of innovative ideas and inspiration, and a cause of the successful completion of this project. The confidence they showed in us was our biggest source of inspiration. It has been a privilege working with them for the last two semesters on two different projects. We are highly obliged to all the faculty members of the Computer Science and Engineering Department for their support and encouragement. We also thank our Director Dr. Gopal Mugeraya and HOD CSE Dept. Asst. Prof. Paritosh Bhattacharya for providing excellent computing and other facilities, without which this work could not have achieved its quality goal. We would like to express our sincere appreciation and gratitude towards Asst. Prof. Anupam Jamatia for his support in preparing this project report in LaTeX. Finally, we are grateful to our parents for their support. It would have been impossible for us to complete this project without their love, blessing and encouragement. -Akash Bhargava, Ashok Kumar, Laxmi Kant Yadav, Vijay Kumar Gupta
  • 7. Dedicated to our loving families, for their kind love and support, and to our Project Supervisor Asst. Prof. Nikhil Debbarma and our Project Coordinator Asst. Prof. Suman Deb, for sharing valuable knowledge and encouragement and for showing confidence in us all the time.
  • 8. Abstract Parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from the Latin pars (orationis), meaning part (of speech). Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate. According to many researchers, Sanskrit is a very scientific language whose behaviour closely resembles that of a programming language. A translator from Sanskrit into other languages would therefore be a significant development in the field of Natural Language Processing (NLP). In this project we parse a Sanskrit sentence so that it can later be translated more easily into another language. We take a Sanskrit sentence or paragraph as input. We tokenize the whole sentence (lexical analysis), recognize the part of speech of each individual token, and then parse the sentence to make sense of it (parsing).
  • 9. Contents: Acknowledgement; Dedicated to; Abstract; 1 Introduction; 1.1 Purpose; 1.2 Scope; 1.3 Basis; 1.4 Overview; 1.5 Objective; 1.6 About The Project
  • 10. Contents (continued): 1.7 Drawbacks; 1.8 Study of the Project; 2 System Requirement Specification; 2.1 Compiler Phases; 2.1.1 Lexical Analysis Phase; 2.1.2 Semantic Analysis Phase; 2.1.3 Intermediate Code Generation; 2.1.4 Code Optimization; 2.1.5 Code Generation; 2.2 Parsing Methods; 2.3 Grammar; 2.4 Makefile; 3 System Design; 3.1 Spiral Model; 3.2 Input Stages; 3.3 Input Types; 3.4 Input Media; 3.5 Data Flow Diagram; 3.6 Output Design; 4 Implementation & Screen shots
  • 11. Contents (continued): 4.1 Parser; 4.1.1 Parsing Methods; 4.1.2 Ambiguity; 4.2 Implementation Steps; 4.2.1 The Lexer; 4.2.2 The Parser; 4.2.3 Grammar Used; 4.2.4 Uses Of A Grammar; 4.3 Input & Output; 5 Testing; 5.1 Syntax Error Handling; 5.2 Error-Recovery Strategies; 5.2.1 Panic mode; 5.2.2 Phrase-level recovery; 5.2.3 Error productions; 5.2.4 Global correction; 6 Conclusion; 7 Appendix; 8 Reference
  • 12. List of Figures: 2.1 Phase of Compiler; 2.2 Lexical Analyzer; 2.3 Parsing Step; 2.4 Vibhakti; 2.5 Conjugational; 2.6 Noun and Adjective; 2.7 Noun Word; 2.8 Noun; 2.9 Noun; 2.10 Noun; 3.1 Spiral Model; 3.2 Data Flow Diagram
  • 13. CSED, NIT Agartala List of Figures (continued): 3.3 Data Flow Diagram; 4.1 Lexical Analysis Steps; 4.2 Parsing; 4.3 Output Snapshot; 4.4 Output Snapshot
  • 15. Chapter 1 Introduction 1.1 Purpose In this project we parse a Sanskrit sentence so that it can later be translated more easily into another language. 1.2 Scope The ability to parse from a Sanskrit sentence to an English sentence. 1.3 Basis We first present some concepts and then employ them:
  • 16. • Lexical Analysis • Parsing • Advantages of using Sanskrit • Approach 1.4 Overview This Design Document is divided into five major sections. Section 1 is an introduction that provides information about the document itself. Section 2 is an overview of the application and its primary functionality. Section 3 identifies the assumptions and constraints followed during the design of the software. Section 4 documents the overall system architecture. Section 5 provides the detailed design information for every subsystem and component in the current delivery. 1.5 Objective In this project we parse a Sanskrit sentence so that it can later be translated more easily into another language. We describe a machine translation technique for translating a Sanskrit sentence into an English sentence. 1.6 About The Project • Machine translation has been defined as the process that uses computer software to translate text from one natural language to another. It is one of the most important applications of Natural Language Processing. • It helps people from different places to understand an unknown language without the aid of a human translator.
  • 17. • The language to be translated is the Source Language (SL). The language into which the source language is translated is the Target Language (TL). • The major machine translation techniques are Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT) and Example-Based Machine Translation (EBMT). • One of the effective techniques for machine translation is Rule-Based Machine Translation. • In India, several machine translation systems have been implemented, including: AnglaUrdu (AnglaHindi based), a machine translation system from English to Urdu; HindiAngla, a machine translation system from Hindi to English; an English-Assamese machine translation system; MaTra, a human-aided machine translation system; AnglaHindi, an English to Hindi machine-aided translation system; and AnglaBharti, a technology for machine-aided translation from English to Indian languages. • Machine translation from Sanskrit is never an easy task because of the structural vastness of its grammar, but the grammar is well organized and less ambiguous than that of other natural languages. • The Sanskrit sentence is the input to our first module, the lexical parser, which generates a parse tree using semantic relationships. • This parse tree acts as the input to the second module, the semantic mapper, where each Sanskrit semantic word is mapped to the corresponding English semantic word. 1.7 Drawbacks Some of the main drawbacks of the project: • This project is about parsing one language into another; it is not a pure translator. • This project is platform dependent (here the platform is Linux). • It is a database-oriented project and does not rely only on an online approach.
  • 18. 1.8 Study of the Project The aim is to let users give input in the Sanskrit language and to convert (parse) it into the English language. We use the following predefined steps for parsing: • We first tokenize the input using strtok(str, " "). • Each token can be of 3 types: noun, verb or preposition. The task is to identify the type of each token, which is done by matching it against an indexed database. • Each token is stored in a structure along with its meaning and its morphology. • The parser then comes into play and forms a tree-like structure using these tokens. One of the major approaches to machine translation is rule-based machine translation (RBMT, also known as the rational approach). Rule-based translation consists of: 1. The process of analyzing the input sentence of the source language syntactically and/or semantically. 2. The process of generating the output sentence of the target language based on the internal structure. Each process is controlled by the dictionary and the rules. • The strength of the rule-based method is that the information can be obtained through introspection and analysis. • The weakness of the rule-based method is that the accuracy of the entire process is the product of the accuracies of each sub-stage.
  • 19. Chapter 2 System Requirement Specification 2.1 Compiler Phases A compiler operates in phases, and each phase transforms the source program from one representation to another. A compiler has six phases: • Lexical Analyzer • Syntax Analyzer • Semantic Analyzer • Intermediate Code Generation • Code Optimization • Code Generation
  • 20. The symbol table and error handling interact with all six phases. Some of the phases may be grouped together. [Figure 2.1: Phase of Compiler] 2.1.1 Lexical Analysis Phase: The lexical phase reads the characters in the source program and groups them into a stream of tokens, in which each token represents a logically cohesive sequence of characters, such as an identifier, a keyword or a punctuation character. The character sequence forming a token is called the lexeme for the token. The semantic standard representation was designed to provide a simple description of the grammatical relationships in a sentence that can easily be understood and effectively used by people without linguistic expertise who want to extract textual relations. The sentence relationships are represented uniformly as semantic standard relations between pairs of words.
  • 21. [Figure 2.2: Lexical Analyzer] 2.1.2 Semantic Analysis Phase: This phase checks the source program for semantic errors and gathers type information for the subsequent code-generation phase. It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements. An important component of semantic analysis is type checking. 2.1.3 Intermediate Code Generation: After syntax and semantic analysis, the compiler generates an explicit intermediate representation of the source program. The intermediate representation should have two important properties: • It should be easy to produce. • It should be easy to translate into the target program. The intermediate representation can have a variety of forms. One of the forms is three-address code, which is like the assembly language for a machine in which every location can act like a
  • 22. register. Three-address code consists of a sequence of instructions, each of which has at most three operands. 2.1.4 Code Optimization: The code optimization phase attempts to improve the intermediate code so that faster-running machine code will result. 2.1.5 Code Generation: The final phase of the compiler is the generation of target code, normally consisting of relocatable machine code or assembly code. Memory locations are selected for each of the variables used by the program. Then each intermediate instruction is translated into a sequence of machine instructions that perform the same task. 2.2 Parsing Methods: In the compiler model, the parser obtains a string of tokens from the lexical analyser and verifies that the string can be generated by the grammar for the source language. The parser reports any syntax errors in the source program. There are two types of parsing methods: top-down and bottom-up. “Top-down” is largely self-explanatory. Working from left to right, we drill down through each non-terminal until we get to a terminal, building the tree from the root node down to the leaves. It is important to note that we replace the leftmost non-terminal first: top-down parsing is an attempt to find a leftmost derivation. Bottom-up parsing instead traces out a rightmost derivation in reverse. There are three general types of parsers for grammars. Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley’s algorithm can parse any grammar, but these methods are too inefficient to use in production compilers. The methods commonly used in compilers are classified as either top-down or bottom-up. Top-down parsers build parse trees from the top (root) to the bottom (leaves). Bottom-up parsers build parse trees from the
  • 23. leaves and work up to the root. [Figure 2.3: Parsing Step] In both cases the input to the parser is scanned from left to right, one symbol at a time. The output of the parser is some representation of the parse tree for the stream of tokens. A number of tasks might be conducted during parsing, such as: • Collecting information about various tokens into the symbol table. • Performing type checking and other kinds of semantic analysis. • Generating intermediate code.
  • 24. Algorithm for Parsing an English sentence 1. Tokenize the sentence into tokens, i.e. a token list. 2. To find the relationships between tokens we use dependency grammar and binary relations for our Sanskrit language. The token list acts as the input to the Semantic class, which represents the semantic standard. 3. The Semantic class generates a tree; a class named Tree Transform creates the tree. 2.3 Grammar: A grammar provides a precise way to specify the syntax (the structure or arrangement of composing units) of a language. In grade school we take grammar lessons that teach us to speak and write proper English. They teach us the correct way to form sentences with subjects, predicates, noun phrases, verb phrases, etc. Subjects, predicates and phrases are some of the composing units of a sentence in English; similarly, if/else statements, assignment statements and function definitions are some of the composing units of source code, which itself is a single sentence of a particular programming language. There are a very large number of valid English sentences one could compose; likewise, there are a large (probably infinite) number of valid source code programs one could create. If someone says “on the computer she is,” we immediately recognize that the sentence is ill-formed. Its structure is invalid, because the noun phrase should precede the verb phrase. It should be: “She is on the computer.” If we take a look at that diagramming article, we’ll see that the model is exactly like an AST. So it goes without saying that parsing, or more formally “syntactical analysis,” has its roots in linguistics. Moreover, just as in English, programming languages need to be specified in a way that allows us to verify whether a sentence of the language is valid. That’s where context-free grammars (CFG) come into play; they allow us to specify the syntax of a programming language’s source code.
  • 25. Vibhakti as Pointer [Figure 2.4: Vibhakti]
  • 26. Basic conjugational endings [Figure 2.5: Conjugational]
  • 27. Basic noun and adjective declension [Figure 2.6: Noun and Adjective]
  • 28. A-stems (noun words ending with a) [Figure 2.7: Noun Word]
  • 29. i- and u-stems [Figure 2.8: Noun] [Figure 2.9: Noun]
  • 30. Sanskrit verbs There are 10 types of verb conjugation forms. One example, for the root word bhava, is given here (present, past and future only). [Figure 2.10: Noun] 2.4 Makefile GNU make is a utility to maintain groups of programs. The purpose of the make utility is to determine automatically which pieces of a large program need to be recompiled, and to issue the commands to recompile them. To prepare to use make, you must write a file called the makefile that describes the relationships among files in your program and states the commands for updating each file. In a program, the executable file is typically updated from object files, which are in turn made by compiling source files. Once a suitable makefile exists, running the make command each time you change some source files performs the necessary recompilation. By default make looks for a file named GNUmakefile, makefile or Makefile; the -f option is needed only when the makefile has a different name. make clean: “make clean” deletes any files generated by previous build attempts, leaving you with clean source code.
  • 31. Chapter 3 System Design 3.1 Spiral Model The diagrammatic representation of the spiral model of software development appears like a spiral with many loops. The exact number of loops in the spiral is not fixed; each loop of the spiral represents a phase of the software process. This model is much more flexible than other models, since the exact number of phases through which the product is developed is not fixed. Each phase in this model is split into four sectors (quadrants), as shown in the figure. The first quadrant identifies the objectives of the phase and the alternative solutions possible for the phase under consideration. During the second quadrant, the alternative solutions are evaluated to select the best possible solution. The spiral model provides direct support for coping with project risks. Activities during the fourth quadrant concern reviewing the results of the stages traversed so far with the customer and planning the next iteration around the spiral. It is viewed as a meta model, since it subsumes all the models discussed. The spiral model uses a prototyping approach, first building a prototype before embarking on the actual product development effort. Also, the spiral model can be considered as supporting the evolutionary model: the iterations
  • 32. along the spiral can be considered as evolutionary levels through which the complete system is built. [Figure 3.1: Spiral Model] This enables the developer to understand and resolve the risks at each evolutionary level. The spiral model uses prototyping as a risk reduction mechanism and also retains the systematic step-wise approach of the waterfall model. 3.2 Input Stages The main input stages can be listed as below: • Data supply • Data transaction • Data synchronization • Data verification • Data validation • Data correction
  • 33. 3.3 Input Types It is necessary to determine the various types of inputs. Inputs can be categorized as follows: • External inputs, which are prime inputs for the system. • Internal inputs, which are user communications with the system. • Inputs entered during a dialogue. 3.4 Input Media At this stage a choice has to be made about the input media. To decide on the input media, consideration has to be given to: • Type of input • Flexibility of format • Speed • Accuracy • Ease of correction • Ease of use • Portability 3.5 Data Flow Diagram
  • 34. [Figure 3.2: Data Flow Diagram] [Figure 3.3: Data Flow Diagram]
  • 35. 3.6 Output Design Outputs from computer systems are required primarily to communicate the results of processing to users. They are also used to provide a permanent copy of the results for later consultation. The various types of outputs are: • External outputs, whose destination is the file named Temp. • Internal outputs, whose destination is within the organization and which are the user’s main interface with the Linux system. • Operational outputs, whose use is purely within the android mobile department. • Interface outputs, which involve the user in communicating directly with the system.
  • 36. Chapter 4 Implementation & Screen shots There is a trend in programming languages, which are moving from machine-level to high-level to human-level languages: assembly > C > C++ > Java > Ruby. This will not stop until something entirely human-like is created. The scope for Sanskrit to become a computer language lies in the library system. When you compile code in C, the compiler links your code with predefined libraries. For example, calling strcmp(string1, string2) is the best way to compare two strings, because it links library code into your executable. Such library routines are typically highly optimized, sometimes with hand-written assembly. So if you have all the libraries with you, why do you need C? Why can’t we just say GO AND OPEN THE DOOR and expect the computer to understand it and do it in a highly optimized way? The onus lies with an intelligent interpreter. Sanskrit is a language where letters have meanings. They do not need to form words to transmit emotions or information. Composing letters into words again changes their meaning, somewhat like OOP. For example, ANU is particle and PARMANU is nanoparticle. A programming language needs consistency, which Sanskrit has. We will explore further in future how Sanskrit can be adapted to be a human-computer language. Sanskrit is not a descriptive language: you don’t need to write paragraphs to explain something. When you translate something into Sanskrit, its size reduces. It is precise, crisp and clear.
  • 37. 4.1 Parser Parsing is the de-linearization of linguistic input; that is, the use of grammatical rules and other knowledge sources to determine the functions of words in the input sentence. Getting an efficient and unambiguous parse of natural languages has been a subject of wide interest in the field of artificial intelligence over the past 50 years. A parser breaks data into smaller elements according to a set of rules that describe its structure. Parsing is the process of analysing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given grammar. The steps to generate a parse tree are: 1. The input is an English sentence. 2. The lexical analyzer creates tokens. 3. The generated tokens act as input to the semantic analyzer. 4. The output is a parse tree. 4.1.1 Parsing Methods: There are two types of parsing methods: top-down and bottom-up. “Top-down” is largely self-explanatory. Working from left to right, we drill down through each non-terminal until we get to a terminal, building the tree from the root node down to the leaves. It is important to note that we replace the leftmost non-terminal first: top-down parsing is an attempt to find a leftmost derivation. Bottom-up parsing instead traces out a rightmost derivation in reverse. • Bottom-Up Parsing In bottom-up parsing the derivation starts from the string of terminals (our sentence), and we try to derive the start symbol of our CFG. It is essentially a top-down derivation backwards. Initially, instead of replacing a non-terminal with another non-terminal or terminal
  • 38. (drilling down), we replace a terminal with a non-terminal (drilling up). At certain points we may even replace several non-terminals with one non-terminal. Read backwards, this sequence of replacements is a derivation in which non-terminals are expanded from right to left, so bottom-up parsing traces a rightmost derivation in reverse. Each replacement creates a node that becomes the parent of existing nodes instead of their child.
    • Top-Down Parsing: There are several problems with top-down parsing.
      (1) Left recursion can lead to infinite parsing loops, so it must be eliminated. Left recursion in a CFG production occurs when the non-terminal on the left side of the arrow appears first on the right side. There are simple algorithms to remove it, but the CFG becomes twice as long in many cases.
      (2) Top-down parsing may involve backtracking. Backtracking is the act of climbing back up the derivation (the parse) and undoing everything in order to try another derivation path. The input ends up being re-scanned as well, and any information inserted into a symbol table as the parse proceeded has to be removed. The need for backtracking can be eliminated by parsing with lookahead. Backtracking is not restricted to top-down parsers; there are backtracking LR parsers as well.
      (3) The order in which non-terminal expansions are chosen can cause valid inputs to be rejected without any information as to why.
4.1.2 Ambiguity
Ambiguous grammars are those in which a string of the language has more than one parse tree. This is problematic because it may be hard to interpret the intended meaning of the string. Consider the C statement
    x*y;
It can be interpreted as the multiplication of two variables, x and y, or as the declaration of a variable y whose type is a pointer to x. To resolve the conflict the compiler must look up x in the symbol table. If x names a type, the statement is a declaration; if x is a variable of numerical type, the statement is interpreted as an expression.
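The x*y; case can be made concrete with a small sketch. It assumes a much simplified symbol table, represented only as the set of identifiers currently declared as type names; the names TypeNames, StatementKind and classifyStarStatement are hypothetical and are not taken from any real compiler.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a symbol table: the set of identifiers
// currently declared as type names (for example via typedef).
using TypeNames = std::unordered_set<std::string>;

enum class StatementKind { Declaration, Expression };

// Classifies a statement of the form "left * right;".
// If `left` names a type, the statement declares `right` as a pointer to it;
// otherwise it is treated as a multiplication expression.
StatementKind classifyStarStatement(const std::string& left,
                                    const TypeNames& typeNames) {
    return typeNames.count(left) != 0 ? StatementKind::Declaration
                                      : StatementKind::Expression;
}
```

With x registered as a type name the call classifyStarStatement("x", types) yields a declaration; with an empty set it yields an expression. A real compiler would also consult scopes and the kind of each symbol, which this sketch leaves out.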
Generally speaking, ambiguity is an unwanted feature of any grammar and may threaten the correctness of both top-down and bottom-up parsers. Different parsers handle it with varying efficacy. Even so, ambiguity is not always a problem. An ambiguous grammar can still generate a language that is itself unambiguous: even if two parse trees generate the same string, there is no problem as long as the string has one intended meaning. Some parser generators allow precedence and associativity rules to be specified in order to remove any ambiguity.
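As an illustration of how precedence and associativity rules remove ambiguity, the following sketch evaluates arithmetic expressions with precedence climbing, a top-down technique that uses a loop instead of left-recursive productions. The operator table and all function names are assumptions made for this example and are not part of the project; only non-negative integers and the operators +, - and * are supported, and malformed input is not reported.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace {

// Binding strength of the supported binary operators; 0 means "not an operator".
int precedenceOf(char op) {
    if (op == '+' || op == '-') return 1;
    if (op == '*') return 2;
    return 0;
}

void skipSpaces(const std::string& text, std::size_t& pos) {
    while (pos < text.size() &&
           std::isspace(static_cast<unsigned char>(text[pos]))) {
        ++pos;
    }
}

// Reads a run of decimal digits starting at pos.
long parseNumber(const std::string& text, std::size_t& pos) {
    skipSpaces(text, pos);
    long value = 0;
    while (pos < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[pos]))) {
        value = value * 10 + (text[pos] - '0');
        ++pos;
    }
    return value;
}

// Parses operators whose precedence is at least minPrecedence.
long parseExpression(const std::string& text, std::size_t& pos, int minPrecedence) {
    long left = parseNumber(text, pos);
    for (;;) {
        skipSpaces(text, pos);
        if (pos >= text.size()) break;
        const char op = text[pos];
        const int precedence = precedenceOf(op);
        if (precedence == 0 || precedence < minPrecedence) break;
        ++pos;
        // Requiring precedence + 1 on the right makes each operator left-associative.
        const long right = parseExpression(text, pos, precedence + 1);
        if (op == '+') left += right;
        else if (op == '-') left -= right;
        else left *= right;
    }
    return left;
}

}  // namespace

long evaluate(const std::string& text) {
    std::size_t pos = 0;
    return parseExpression(text, pos, 1);
}
```

With this table, evaluate("2+3*4") groups the multiplication first and evaluate("10-4-3") groups from the left, so each input has exactly one interpretation even though the underlying grammar "E ::= E op E | number" is ambiguous.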
  • 39. 4.2 Implementation Steps
The following steps were used to develop this application.
4.2.1 The Lexer
The first step towards creating a successful Sanskrit English Parser (SEP) is to create a lexer that analyses every word of the input Sanskrit sentence.
Tokenizer: The tokenizer divides the complete sentence into a stream of individual words separated by blank spaces.
Avyaya Analyser: Every output of the tokenizer is first looked up in the database of avyaya words (indeclinables), which is the smallest database. Only if the lookup produces a complete match is the word accepted as an avyaya.
Verb Analyser: The second, relatively bigger database of verb roots (dhaturoops) is placed after the avyaya database. Tokens not recognized as avyaya are processed by the verb analyser. The program verb.cpp analyses the suffix of every input token and generates information regarding the tense, person and number of the token. The suffix is then removed and the verb is mapped to its root using the verb database. If a match is found the token is accepted as a verb; otherwise it is passed on for noun analysis.
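The tokenizer, avyaya lookup and verb suffix analysis described above could be sketched as follows. This is not the project's verb.cpp: the word lists, the three present-tense endings and every name in the code are illustrative assumptions, and the Sanskrit words are written in an ASCII transliteration in which A stands for long a, T for the retroflex t and H for the visarga.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct TokenInfo {
    std::string token;
    std::string category;  // "avyaya", "verb", or "unknown" (left for noun analysis)
    std::string root;      // filled only for verbs
    std::string person;    // filled only for verbs
    std::string number;    // filled only for verbs
};

// Splits a sentence into words separated by blank spaces.
std::vector<std::string> tokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> tokens;
    for (std::string word; in >> word;) tokens.push_back(word);
    return tokens;
}

bool endsWith(const std::string& word, const std::string& suffix) {
    return word.size() >= suffix.size() &&
           word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0;
}

TokenInfo analyseToken(const std::string& token) {
    // Tiny illustrative databases; a real system would load them from files.
    static const std::set<std::string> avyayas = {"ca", "na", "api", "eva"};
    static const std::map<std::string, std::string> verbStems = {
        {"gaccha", "gam"}, {"paTha", "paTh"}};
    struct Ending { std::string suffix, person, number; };
    // Simplified present-tense endings; longer suffixes are tried first.
    static const Ending endings[] = {
        {"nti", "third", "plural"},
        {"ti", "third", "singular"},
        {"si", "second", "singular"},
    };

    // Stage 1: an avyaya must match completely.
    if (avyayas.count(token) != 0) return {token, "avyaya", "", "", ""};

    // Stage 2: strip a verb ending and look the remaining stem up.
    for (const Ending& ending : endings) {
        if (!endsWith(token, ending.suffix)) continue;
        const std::string stem = token.substr(0, token.size() - ending.suffix.size());
        const auto found = verbStems.find(stem);
        if (found != verbStems.end()) {
            return {token, "verb", found->second, ending.person, ending.number};
        }
    }
    // Not recognized here: passed on to noun analysis.
    return {token, "unknown", "", "", ""};
}
```

For example, analyseToken("gacchati") reports a verb with root gam in the third person singular, while analyseToken("devaH") comes back as "unknown" and would be handed to the noun analyser. Tense is omitted because all three sample endings are present tense.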
  • 40. Noun Analyser: Tokens not yet recognized are fed to the noun analyser (noun.cpp). Noun declensions belonging to different genders follow different patterns, not all of which can be matched by the program. Hence, of the 21 possible declensions of a single noun, 10 are stored as exceptions while the remaining 11 are processed by the program to obtain the root word. Lastly, if the word is still not recognized, then it is not present in the database and must be entered manually for analysis.
Figure 4.1: Lexical Analysis Steps
4.2.2 The Parser
Equipped with the knowledge of what the individual words represent, we can now move towards rearranging them in such a way that their mere translation results in a meaningful English sentence. When parsing from Sanskrit to English we move from a language with free word order to a language in which only a particular order of words conveys the same meaning.
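Returning to the noun analyser of section 4.2.1, its two-stage idea (stored exceptions first, suffix rules second) might look roughly like the sketch below. It is not the project's noun.cpp. The rules cover only a few singular forms of masculine nouns ending in a, the two plural forms stand in for the stored exceptions, and all names and word lists are assumptions made for illustration. Words use an ASCII transliteration in which A is long a and H is the visarga.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

struct NounInfo {
    bool recognized;
    std::string stem;
    std::string caseName;
    std::string number;
};

NounInfo analyseNoun(const std::string& token) {
    // Stage 1: forms stored whole, standing in for the stored exceptions.
    static const std::map<std::string, NounInfo> exceptions = {
        {"devAn", {true, "deva", "accusative", "plural"}},
        {"devAH", {true, "deva", "nominative", "plural"}},
    };
    // Stage 2: suffix rules for singular forms; the stem is rebuilt by
    // stripping the suffix and appending "a".
    struct Rule { std::string suffix, caseName; };
    static const Rule rules[] = {
        {"asya", "genitive"}, {"ena", "instrumental"}, {"Aya", "dative"},
        {"At", "ablative"},   {"aH", "nominative"},    {"am", "accusative"},
        {"e", "locative"},
    };
    // Hypothetical noun database of known stems.
    static const std::set<std::string> knownStems = {"deva", "bAla"};

    const auto hit = exceptions.find(token);
    if (hit != exceptions.end()) return hit->second;

    for (const Rule& rule : rules) {
        const std::size_t n = rule.suffix.size();
        if (token.size() <= n ||
            token.compare(token.size() - n, n, rule.suffix) != 0) {
            continue;
        }
        const std::string stem = token.substr(0, token.size() - n) + "a";
        if (knownStems.count(stem) != 0) {
            return {true, stem, rule.caseName, "singular"};
        }
    }
    // Not in the database: the word would have to be entered manually.
    return {false, "", "", ""};
}
```

Here analyseNoun("devasya") is resolved by a suffix rule as the genitive singular of deva, analyseNoun("devAn") is answered from the exception table, and an unknown word is reported as unrecognized.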
  • 41. How to represent CONTEXT?
By CONTEXT we mean the parts of a statement that precede or follow a specific word or passage, usually influencing its meaning or effect. Sanskrit uses the concept of vibhakti to generate context. Because English lacks vibhakti, the user has to understand the context of every word with help from the LEXER. Using the lexer, the user can add words such as for, from and to, which are not used in Sanskrit. Thus the PARSER gives the spatial arrangement of the input words in converted form (in English) and the LEXER is consulted for context. This results in an English translation of a Sanskrit sentence.
Structure of an English sentence:
Every English sentence is a combination of nouns and verbs related to each other through context. In a SIMPLE sentence (a sentence without connectors, having only one verb), the verb is the central entity. Nouns relate to this central entity via context, as defined below:
    Nominative (S): the SUBJECT/doer of the verb
    Accusative (O): the OBJECT of the verb
    Instrumental (I): the cause/means of the verb
    Dative (D): the indirect object of the verb
    Ablative (A): represents comparison/separation
    Locative (L): represents position in space/time
The LEXER already generates this contextual information for every noun, so the PARSER can arrange a simple input sentence spatially, following the rules of English. Thus, we have the following order:
    S V O L/A/D/I
The PARSER interprets the LEXER's outputs and rearranges the various nouns into their respective positions as shown. The user can then apply the context of every noun used to obtain a corresponding English translation.
Parsing rules for a simple sentence:
The PARSER can handle all forms of noun declensions, verb declensions and avyayas (including connectors). The following points summarise the working of the parser:
  • 42. Figure 4.2: Parsing
    • The parser stores nouns, verbs and avyayas in 3 separate structures along with the information the parser requires about them, such as case context, number and person.
    • The parser can handle words representing adjectives.
    • The parser can handle words representing adverbs.
    • The parser can resolve ambiguity generated by Sanskrit noun declensions. For example, if an input Sanskrit sentence contains no nominative noun but there is a noun which can be both nominative and accusative, then it is treated as nominative.
    • The parser requires that the subject and verb agree in number; a sentence in which they differ in number is treated as incorrect.
    • The parser also handles the GENITIVE case, which represents a noun-noun relationship rather than a noun-verb relationship as the other declensions do.
    • The parser handles avyayas which correspond to a given noun declension type.
    • The parser handles avyayas representing questions.
    • The parser handles avyayas that act as conjunctions of different types.
    • The parser can thus handle multiple sentences joined together using avyayas.
  • 43. • The parser displays the interpreted spatial arrangement of the input sentence in a text file named temp.
    • The parser can process an input even if some part of it is not defined in the lexer database. Such unrecognized input tokens are output as they are, at the start of the resultant sentence, in the temp file.
4.2.3 Grammar Used
Sanskrit is described here by a context-free grammar, and a BNF grammar for Sanskrit also exists. The general form of a BNF grammar, written in BNF itself, is:
    <BNF rule> ::= <nonterminal> "::=" <definitions>
    <nonterminal> ::= "<" <words> ">"
    <terminal> ::= <word> | <punctuation mark> | ' " ' <any chars> ' " '
    <words> ::= <word> | <words> <word>
    <word> ::= <letter> | <word> <letter> | <word> <digit>
    <definitions> ::= <definition> | <definitions> "|" <definition>
    <definition> ::= <empty> | <term> | <definition> <term>
    <empty> ::=
    <term> ::= <terminal> | <nonterminal>
4.2.4 Uses Of A Grammar
A BNF grammar can be used in two ways:
    • To generate strings belonging to the grammar. To do this, start with a string containing a non-terminal; while there are still non-terminals in the string, replace a non-terminal with one of its definitions.
    • To recognize strings belonging to the grammar. This is the way programs are compiled: a program is a string belonging to the grammar that defines the language.
  • 44. • Recognition is much harder than generation.
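The generation procedure of section 4.2.4 (keep replacing a non-terminal with one of its definitions until none remain) can be sketched as below. The grammar representation, the toy two-word sentence grammar and the function names are assumptions made for this example; the choice of definition is driven by an explicit list so that the result is reproducible, and a step limit guards against grammars that never terminate.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Symbols = std::vector<std::string>;
// Maps each non-terminal to its alternative definitions.
using Grammar = std::map<std::string, std::vector<Symbols>>;

// Hypothetical toy grammar: <sentence> ::= <noun> <verb>
Grammar toyGrammar() {
    Grammar g;
    g["<sentence>"] = {Symbols{"<noun>", "<verb>"}};
    g["<noun>"] = {Symbols{"devaH"}, Symbols{"bAlaH"}};
    g["<verb>"] = {Symbols{"gacchati"}, Symbols{"paThati"}};
    return g;
}

// Repeatedly replaces the leftmost non-terminal with one of its definitions.
// choices[i] selects the alternative used at step i (alternative 0 once the
// list is exhausted). Stops after maxSteps replacements at the latest.
Symbols generate(const Grammar& grammar, const std::string& start,
                 const std::vector<std::size_t>& choices,
                 std::size_t maxSteps = 1000) {
    Symbols form{start};
    for (std::size_t step = 0; step < maxSteps; ++step) {
        std::size_t i = 0;
        while (i < form.size() && grammar.count(form[i]) == 0) ++i;
        if (i == form.size()) break;  // only terminals remain
        const std::vector<Symbols>& alternatives = grammar.at(form[i]);
        if (alternatives.empty()) break;  // nothing to substitute
        const std::size_t pick =
            step < choices.size() ? choices[step] % alternatives.size() : 0;
        Symbols next(form.begin(), form.begin() + i);
        next.insert(next.end(), alternatives[pick].begin(), alternatives[pick].end());
        next.insert(next.end(), form.begin() + i + 1, form.end());
        form = next;
    }
    return form;
}
```

Calling generate(toyGrammar(), "<sentence>", {0, 1, 0}) first expands <sentence>, then picks the second noun and the first verb. Recognition would need the opposite direction: deciding whether some sequence of such choices produces a given sentence.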
  • 45. 4.3 Input & Output
Figure 4.3: Output Snapshot
  • 46. Figure 4.4: Output Snapshot
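As a textual illustration of the kind of input and output involved, the sketch below rearranges already-analysed words into the S V O L/A/D/I order of section 4.2.2 and places unrecognized tokens at the start, as described there. The Role enumeration, the Word structure and the sample sentence are assumptions for this example; the roles would normally come from the lexer rather than being written by hand.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

enum class Role { Unknown, Subject, Verb, Object, Locative, Ablative, Dative, Instrumental };

struct Word {
    std::string text;
    Role role;
};

// Position of each role in the target order: unrecognized tokens first,
// then S, V, O, and finally the L/A/D/I group.
int slotOf(Role role) {
    switch (role) {
        case Role::Unknown: return 0;
        case Role::Subject: return 1;
        case Role::Verb:    return 2;
        case Role::Object:  return 3;
        default:            return 4;
    }
}

// Stable sort keeps the input order among words that share a slot.
std::vector<Word> arrange(std::vector<Word> words) {
    std::stable_sort(words.begin(), words.end(),
                     [](const Word& a, const Word& b) {
                         return slotOf(a.role) < slotOf(b.role);
                     });
    return words;
}

std::string join(const std::vector<Word>& words) {
    std::string line;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i != 0) line += ' ';
        line += words[i].text;
    }
    return line;
}
```

For the transliterated input "bAlaH vidyAlayam gacchati" (subject, object, verb), arrange moves the verb between subject and object, giving the order in which an English rendering such as "the boy goes to school" can be read off.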
  • 47. Chapter 5 Testing
While developing this project we found some discrepancies between the grammar definition and the implementation of the query classes, and we had to correct them in order to have a coherent implementation. The following error-handling strategies are relevant to testing.
5.1 Syntax Error Handling
Planning the error handling right from the start can both simplify the structure of a compiler and improve its response to errors. A program can contain errors at many different levels, for example:
    • Lexical, such as misspelling an identifier, keyword, or operator.
    • Syntax, such as an arithmetic expression with unbalanced parentheses.
    • Semantic, such as an operator applied to an incompatible operand.
  • 48. • Logical, such as an infinitely recursive call.
Much of the error detection and recovery in a compiler is centred on the syntax analysis phase. One reason is that many errors are syntactic in nature, or are exposed when the stream of tokens coming from the lexical analyser disobeys the grammatical rules defining the programming language. Another is the precision of modern parsing methods, which can detect the presence of syntactic errors in programs very efficiently. The error handler in a parser has simple goals:
    • It should report the presence of errors clearly and accurately.
    • It should recover from each error quickly enough to be able to detect subsequent errors.
    • It should not significantly slow down the processing of correct programs.
5.2 Error-Recovery Strategies
There are many different general strategies that a parser can employ to recover from a syntactic error:
    • Panic mode
    • Phrase level
    • Error production
    • Global correction
5.2.1 Panic mode
    • This is used by most parsing methods.
    • On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens (delimiters such as a semicolon or end) is found.
  • 49. • Panic mode correction often skips a considerable amount of input without checking it for additional errors.
    • It is simple.
5.2.2 Phrase-level recovery
    • On discovering an error, the parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue.
    • For example, a local correction would be to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.
    • Its major drawback is the difficulty it has in coping with situations in which the actual error occurred before the point of detection.
5.2.3 Error productions
    • If the grammar is augmented with productions that describe common erroneous constructs, then when the parser uses such an error production it can generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.
5.2.4 Global correction
    • Given an incorrect input string x and a grammar G, the algorithm finds a parse tree for a related string y, such that the number of insertions, deletions and changes of tokens required to transform x into y is as small as possible.
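The cost measure used by global correction, the number of token insertions, deletions and changes needed to turn x into y, is the edit distance between the two token sequences. The sketch below computes only that distance with the standard dynamic-programming recurrence; it does not search the grammar for the best string y, and the function name is an assumption for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Minimum number of token insertions, deletions and changes that
// transform the token sequence x into the token sequence y.
std::size_t tokenEditDistance(const std::vector<std::string>& x,
                              const std::vector<std::string>& y) {
    // previous[j] holds the distance between the first i-1 tokens of x
    // and the first j tokens of y; current[j] is the same for i tokens.
    std::vector<std::size_t> previous(y.size() + 1), current(y.size() + 1);
    for (std::size_t j = 0; j <= y.size(); ++j) previous[j] = j;

    for (std::size_t i = 1; i <= x.size(); ++i) {
        current[0] = i;
        for (std::size_t j = 1; j <= y.size(); ++j) {
            const std::size_t change = previous[j - 1] + (x[i - 1] == y[j - 1] ? 0 : 1);
            const std::size_t deletion = previous[j] + 1;
            const std::size_t insertion = current[j - 1] + 1;
            current[j] = std::min({change, deletion, insertion});
        }
        std::swap(previous, current);
    }
    return previous[y.size()];
}
```

Replacing a comma by a semicolon costs one change, and supplying a missing semicolon costs one insertion, so both of the phrase-level repairs mentioned in 5.2.2 have distance 1 under this measure.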
  • 50. Chapter 6 Conclusion
The project is implemented mainly in two languages, C and C++. It uses Sanskrit as the input language and English as the output language. The program first takes a Sanskrit sentence as input from the keyboard, tokenizes the sentence using the tokenizer, identifies the tokens using the token analyser, matches the tokens against the database to fetch the output words, and finally combines the resulting words to produce the output.
The main goal of the current study was to parse a Sanskrit sentence so that it can later be translated more easily into some other language. The findings from this study make several contributions to the current literature, the first being the argument that Sanskrit should be used as the primary language for programming purposes.
Finally, a number of important limitations need to be considered. First, this project is about parsing one language into another; it is not a pure translator. Second, this project is platform dependent (here the platform is Linux). Third, it is a database-oriented project and does not rely solely on an online approach.
It is recommended that further research be undertaken in the following areas:
    • The project can be made more user friendly by adding a graphical user interface.
    • The scheme can be applied to many different languages.
  • 51. The findings of this study have a number of important implications for future practice. This translator is mainly based on fetching data from a database.
  • 52. Chapter 7 Appendix
    A: Avyaya Analyser, Ambiguous
    C: Compiler, Code Optimization, Code Generation
    D: Drawbacks, Data Flow Diagram
    E: Error-Recovery Strategies, Error productions
    G: Grammar, Grammar Used
  • 53. Global correction
    I: Intermediate Code Generation, Input Stages, Input Types
    L: Lexical Analysis Phase
    M: Makefile
    O: Objective, Output Design
    P: Parsing Methods
    S: Scope
    T: Testing
    U: Uses Of A Grammar
  • 54. Chapter 8 Reference
We thank our Project Supervisor, Assistant Professor Nikhil Debbarma, and our Project Coordinator, Assistant Professor Suman Deb, for sharing valuable knowledge, for their encouragement, and for showing confidence in us at all times. We also drew on the following sources on the internet:
    • Rick Briggs, "Knowledge Representation in Sanskrit and Artificial Intelligence", AI Magazine. RIACS, NASA Ames Research Center, Moffett Field, California. http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/466
    • Sanskrit & Artificial Intelligence: NASA. http://www.vedicsciences.net/articles/sanskrit-nasa.html
    • http://sanskrit.jnu.ac.in/morph/analyze.jsp
    • http://uttishthabharata.wordpress.com/2011/05/30/sanskrit-programming/