SlideShare ist ein Scribd-Unternehmen logo
1 von 6
Downloaden Sie, um offline zu lesen
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013
DOI : 10.5121/ijnlc.2013.2306 55
NAMED ENTITY RECOGNITION IN NATURAL
LANGUAGES USING TRANSLITERATION
Sudha Morwal, Deepti Chopra and Dr. G.N. Purohit
Department Of Computer Engineering, Banasthali Vidyapith, Jaipur (Raj.), India
sudha_morwal@yahoo.co.in
deeptichopra11@yahoo.co.in
ABSTRACT
Transliteration may be defined as the process of mapping sounds in a text written in one language to
another language. Current paper discusses about transliteration and its use in Named Entity Recognition.
We have designed a code that executes Transliteration and assist in the process of Named Entity
Recognition. We have presented some of the results of Named Entity Recognition (NER) using
Transliteration.
KEYWORDS
NER, Transliteration, Unknown words, Performance Metrics
1. INTRODUCTION
Transliteration is the task of converting word, letter or group of letters written in one language to
another language. In this, we map phonetic sounds of one language into another language. Some
of the important applications of Transliteration include the following: Machine Translation,
Information Retrieval, Named Entity Recognition and Information Extraction. Named Entity
Recognition (NER) is the subtask of Information Extraction in which named entities or proper
nouns are recognized from a given text and thereafter classified into various categories.
Transliteration plays a key role in solving the problem of unknown words in Named Entity
Recognition. If a training of English sentence in NER is performed using example based learning
and in testing if a Hindi sentence is given, then only the Named Entities (words which are not
tagged with OTHER tag) present in the Training sentences are transliterated to Hindi language.
Also, the words which are neither present in training file nor in the transliteration file, ‘OTHER’
tag is assigned to them. In this way unknown words in Named Entity Recognition can be handled.
e. g. Ram/PER plays/OTHER cricket/SPORT
In the above annotated training text in English, ‘PER’ denotes that ‘Ram’ is a Person, ‘plays’
tagged as ‘OTHER’ means that ‘plays’ is not a Named Entity, ‘cricket’ is a name of a Sport, and
so it tagged with ‘SPORT’ tag. According to our approach,
If testing sentence in NER is given as:
राम खेलता है
If we do not use the transliteration approach, then by testing the above mentioned Hindi sentence,
राम and will not be identified as Named Entities. So, we must use the transliteration
approach in order to identify them properly. In training sentence, since Ram and cricket are
Named Entities, so they are transliterated to Hindi language and stored along with their tags in
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013
56
transliteration file. So, now ‘राम’ and ‘ ’ can easily be identified as Named Entities and the
problem of unknown word is solved.
2. RELATED WORK
Named Entity Recognition (NER) is one the most important task in NLP. It is the process in
which proper nouns or Named Entities are detected and then classified into different classes of
Named Entity classes e.g. Name of Person, Sport, River, Country, State, city, Organization etc.
The unknown words in Named Entity Recognition can be handled using many ways.
The words that are detected as unknown or for which no training has been done but they exist in
testing sentences, a specific tag e.g. unk tag can be allotted to them in order to identify them and
solve the problem of unknown words in NER [1][4].
Capitalization information does not alone serve to identify the POS tag, As Capitalization can
exists in unknown words also. Also, Capitalization information can identify Named Entities in
English only. [2][3].
3. OUR APPROACH
In our approach, we have used transliteration approach to identify the unknown words in Named
Entity Recognition. We have taken ‘OTHER’ tag to be a ‘Not a Named Entity’ tag. Figure 1
depicts our approach involved during training in NER. We process each and every token of
training data and detect its tag. If the token is not tagged with ‘OTHER’ tag, then we simply
transliterate the token into different other languages in which testing sentence appear and save the
transliterated Named Entities along with their tags information in a transliterated file. If the token
is attached with ‘OTHER’ tag, then we simply process the next token in a sentence and continue
this procedure until end of the training sentences is not reached.
Figure1 Our approach involved during training in NER
ALGORITHM 1: TRANSLITERATION ON TRAINING DATA
Input: Annotated Text A
Output: Transliteration file F having list of Named Entities in other language
Variables: n = No. of Tokens in A, T = t1 t2 t3…..tn tokens in A
1: if (A ≠ ϕ) then
2: for i=1 to n do
3: if (ti = Named Entity) then
4: Transliterate ti into language in which testing sentences
appear and store transliterated token along with its tag in F
5: end if
6: end for
7: end if
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013
57
If we have mapping of phonetic sounds from one language into another, then we can easily
perform transliteration task in any language. e. g. If ‘Geeta/PER plays/OTHER
Badminton/SPORT’ is a training sentence, then the transliteration task will transliterate Geeta and
Badminton into other languages, since Geeta is a Named Entity attached to ‘PER’ tag and
Badminton is a Named Entity attached to ‘SPORT’ tag. ‘plays’ is not a Named Entity, since it is
tagged by ‘OTHER’ tag, so transliteration is not performed on it. So, our approach is a language
independent based methodology, that is capable of detecting Named Entities from a text written
in any of the Natural languages, provided that it has been trained on a Named Entity Recognition
based system in any of the languages and there lie some sound mappings so that transliterated
Named Entities in other languages can be achieved.
Figure 2 displays the approach taken while performing testing in a NER based system. In this
approach, all the tokens in a testing sentence are considered one at a time. If the present token is
not an unknown entity, then a tag is alloted to it using Viterbi algorithm. If present token is an
unknown entity, then we search the Transliteration file. If the unknown entity exists in
transliteration file, then the Named Entity tag is attached to it, otherwise ‘OTHER’ tag is assigned
to it. This procedure takes place until all the testing sentences are not processed.
Figure 2 Steps taken while performing testing in Named Entity Recognition
ALGORITHM 1: TESTING IN NER
Input: Testing Data D
Output: Transliteration file F having list of Named Entities in other language, Output of
Testing file O
Variables: n = No. of Tokens in D, T = t1 t2 t3…..tn tokens in D
1: if (D ≠ ϕ) then
2: for i=1 to n do
3: if (ti ≠ Unknown Entity) then
4: Attach Tag to ti using Viterbi Algo & store ti and tag in O
5: else
6: for all tokens (Tok) in F do
7: if (ti Tok) then
8: Attach tag to ti & store ti and tag in O
9: else
10 Attach ‘OTHER’ tag & store ti and tag in O
11: end if
12: end for
13: end if
14: end for
15: end if
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013
58
3. RESULT
In NER, handling of unknown words is a very important issue. We have developed a code for
Transliteration. These Named Entities are then transliterated into different languages and stored in
the respective transliterated files of different languages. We have performed Transliteration of
Named Entities in English to Hindi and it generated transliterated Named Entities with accuracy
of 70.6%.
TABLE 1 Named Entities transliterated into different languages
SNO TAGS
1 PERSON(Name of Person)
2 ORG (Name of Organization)
3 CON (Name of Country)
4 MAG (Name of Magazine)
5 WEEK (Name of Week)
6 LOC (Name of Location)
7 PC (Name of Personal Computer)
8 MONTH (Name of Month)
For a given testing sentence, we can initially search if all the tokens exist in training sentences. If
yes, then using Viterbi Algorithm, we can get optimal state sequence. If any word is unknown
that does not exists in training file, then we can search the transliterated file, if the word exists in
transliterated file, then the correct tag can be allotted to it and hence the problem of unknown
word can be solved. If we have performed training in NER in English initially and during testing
if we give a word in Hindi, then this would be treated as an unknown word for which we have not
performed training yet. So, we can generate the transliterated list of Named Entities from English
to other languages and handle the problem of unknown words in NER. We have performed NER
in English. For Training, we took English and Hindi text from NLTK and accuracy obtained with
our approach is 70.6%. TABLE 1 displays the Named Entities on which transliteration is
performed in different languages and then these transliterated Named Entities are stored in a file
along with their tags that can be used further to resolve the problem of unknown entities in a
Named Entity Recognition.
4. PERFORMANCE METRICS
Performance Metrics is measure to estimate the performance of a NER based system.
Performance Metrics can be calculated in terms of 3 parameters: Precision, Accuracy and F-
Measure. The output of a NER based system is termed as “response” and the interpretation of
human as the “answer key” [9]. Consider the following terms:
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013
59
1. Correct-If the response is same as the answer key.
2. Incorrect-If the response is not same as the answer key.
3. Missing-If answer key is found to be tagged but response is not tagged.
4. Spurious-If response is found to be tagged but answer key is not tagged. [6]
Hence, we define Precision, Recall and F-Measure as follows: [5]7][8]
Precision (P): Correct / (Correct + Incorrect + Missing)
Recall (R): Correct / (Correct + Incorrect + Spurious)
F-Measure: (2 * P * R) / (P + R)
5. CONCLUSION
In this paper, we have discussed about Transliteration. We have obtained 70.58% accuracy by
performing Named Entity Recognition in English and handling unknown words using
transliteration approach. We have also discussed about Performance Metrics which is a very
important measure to judge the performance of a Named Entity Recognition based system.
ACKNOWLEDGEMENT
I would like to thank all those who helped me in accomplishing this task.
REFERENCES
[1] http://www.cse.iitb.ac.in/~cs626-460-2012/seminar_ppts/ner.pdf
[2] http://www.sigkdd.org/explorations/issues/7-1-2005-06/4-Fu.pdf
[3] http://acl.ldc.upenn.edu/acl2002/MAIN/pdfs/Main036.pdf
[4] http://staff.um.edu.mt/cabe2/publications/nfHmms.pdf
[5] G.V.S.RAJU, B.SRINIVASU, Dr.S.VISWANADHA RAJU, 4K.S.M.V.KUMAR “Named Entity
Recognition for Telugu Using Maximum Entropy Model”
[6] B. Sasidhar, P. M. Yohan, Dr. A. Vinaya Babu3, Dr. A. Govardhan,.“A Survey on Named Entity
Recognition in Indian Languages with particular reference to Telugu” IJCSI International Journal of
Computer Science Issues, Vol. 8, Issue 2, March 2011.
[7] Asif Ekbal, Rejwanul Haque, Amitava Das, Venkateswarlu Poka and Sivaji Bandyopadhyay
“Language Independent Named Entity Recognition in Indian Languages” .In Proceedings of the
IJCNLP-08 Workshop on NER for South and South East Asian Languages, pages 33–40,Hyderabad,
India, January 2008.Available at: http://www.mt-archive.info/IJCNLP-2008-Ekbal.pdf
[8] Darvinder kaur, Vishal Gupta. “A survey of Named Entity Recognition in English and other Indian
Languages” .IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November
2010.
[9] Shilpi Srivastava, Mukund Sanglikar & D.C Kothari. ”Named Entity Recognition System for Hindi
Language: A Hybrid Approach” International Journal of Computational Linguistics (IJCL), Volume
(2) : Issue (1) : 2011.Available at
http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013
60
About Authors
Sudha Morwal is an active researcher in the field of Natural Language Processing.
Currently working as Associate Professor in the Department of Computer Science at
Banasthali University (Rajast han), India. She has done M.Tech (Computer Science) ,
NET, M.Sc (Computer Science) and her PhD is in progress from Banasthali University
(Rajasthan), India. She has published many papers in International Conferences and
Journals.
Deepti Chopra received B.Tech degree in Computer Science and Engineering from
Rajasthan College of Engineering for Women, Jaipur, Rajasthan in 2011.Currently she has
done M.Tech in Computer Science and Engineering from Banasthali University, Rajasthan.
Her research interests include Artificial Intelligence, Natural Language Processing, and
Information Retrieval. She has published many papers in International journals and
conferences.
Dr. G. N. Purohit is a Professor in Department of Mathematics & Statistics at Banasthali
University (Rajasthan). Before joining Banasthali University, he was Professor and Head of
the Department of Mathematics, University of Rajasthan, Jaipur. He had been Chief-editor
of a research journal and regular reviewer of many journals. His present interest is in O.R.,
Discrete Mathematics and Communication networks. He has published around 40 research
papers in various journals

Weitere ähnliche Inhalte

Was ist angesagt?

NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERTshaurya uppal
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHijnlc
 
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABIRULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABIijnlc
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageEditor IJCATR
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMijnlc
 
IRJET - Storytelling App for Children with Hearing Impairment using Natur...
IRJET -  	  Storytelling App for Children with Hearing Impairment using Natur...IRJET -  	  Storytelling App for Children with Hearing Impairment using Natur...
IRJET - Storytelling App for Children with Hearing Impairment using Natur...IRJET Journal
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...kevig
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
 
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...kevig
 
Tamil Morphological Analysis
Tamil Morphological AnalysisTamil Morphological Analysis
Tamil Morphological AnalysisKarthik Sankar
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiIIIT Hyderabad
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...baskaran_md
 
Brill's Rule-based Part of Speech Tagger for Kadazan
Brill's Rule-based Part of Speech Tagger for KadazanBrill's Rule-based Part of Speech Tagger for Kadazan
Brill's Rule-based Part of Speech Tagger for Kadazanidescitation
 
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...ijnlc
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
 

Was ist angesagt? (20)

NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
 
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABIRULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
 
Ay34306312
Ay34306312Ay34306312
Ay34306312
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi Language
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
 
IRJET - Storytelling App for Children with Hearing Impairment using Natur...
IRJET -  	  Storytelling App for Children with Hearing Impairment using Natur...IRJET -  	  Storytelling App for Children with Hearing Impairment using Natur...
IRJET - Storytelling App for Children with Hearing Impairment using Natur...
 
D3 dhanalakshmi
D3 dhanalakshmiD3 dhanalakshmi
D3 dhanalakshmi
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
 
Tamil Morphological Analysis
Tamil Morphological AnalysisTamil Morphological Analysis
Tamil Morphological Analysis
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging hai
 
Pxc3898474
Pxc3898474Pxc3898474
Pxc3898474
 
Ceis 8
Ceis 8Ceis 8
Ceis 8
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...
 
Brill's Rule-based Part of Speech Tagger for Kadazan
Brill's Rule-based Part of Speech Tagger for KadazanBrill's Rule-based Part of Speech Tagger for Kadazan
Brill's Rule-based Part of Speech Tagger for Kadazan
 
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
 
Ijetcas14 575
Ijetcas14 575Ijetcas14 575
Ijetcas14 575
 

Andere mochten auch

AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYijnlc
 
Robust extended tokenization framework for romanian by semantic parallel text...
Robust extended tokenization framework for romanian by semantic parallel text...Robust extended tokenization framework for romanian by semantic parallel text...
Robust extended tokenization framework for romanian by semantic parallel text...ijnlc
 
Transliteration by orthography or phonology for hindi and marathi to english ...
Transliteration by orthography or phonology for hindi and marathi to english ...Transliteration by orthography or phonology for hindi and marathi to english ...
Transliteration by orthography or phonology for hindi and marathi to english ...ijnlc
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithmijnlc
 
Survey of machine translation systems in india
Survey of machine translation systems in indiaSurvey of machine translation systems in india
Survey of machine translation systems in indiaijnlc
 
Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...ijnlc
 
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSEFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSijnlc
 
1 -corinne_berinstein
1  -corinne_berinstein1  -corinne_berinstein
1 -corinne_berinsteinAahil Malik
 
Assessing of children article
Assessing of children articleAssessing of children article
Assessing of children articleAahil Malik
 
قانون النشر الالكتروني والجرائم المعلوماتية
قانون النشر الالكتروني والجرائم المعلوماتيةقانون النشر الالكتروني والجرائم المعلوماتية
قانون النشر الالكتروني والجرائم المعلوماتيةAsiri682
 
Ephesians 1 The Blessings of the Redeemed
Ephesians 1   The Blessings of the RedeemedEphesians 1   The Blessings of the Redeemed
Ephesians 1 The Blessings of the RedeemedDonnie Manuel
 
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES ijnlc
 
50 soal latihan grammar untuk mahasiswa semester awal.
50 soal latihan grammar untuk mahasiswa semester awal.50 soal latihan grammar untuk mahasiswa semester awal.
50 soal latihan grammar untuk mahasiswa semester awal.Yanthika January
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysisijnlc
 
Building a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl systemBuilding a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl systemijnlc
 
doc
docdoc
docP S
 

Andere mochten auch (19)

AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
 
Robust extended tokenization framework for romanian by semantic parallel text...
Robust extended tokenization framework for romanian by semantic parallel text...Robust extended tokenization framework for romanian by semantic parallel text...
Robust extended tokenization framework for romanian by semantic parallel text...
 
Transliteration by orthography or phonology for hindi and marathi to english ...
Transliteration by orthography or phonology for hindi and marathi to english ...Transliteration by orthography or phonology for hindi and marathi to english ...
Transliteration by orthography or phonology for hindi and marathi to english ...
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
 
Survey of machine translation systems in india
Survey of machine translation systems in indiaSurvey of machine translation systems in india
Survey of machine translation systems in india
 
Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...Improving performance of english hindi cross language information retrieval u...
Improving performance of english hindi cross language information retrieval u...
 
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSEFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
 
1 -corinne_berinstein
1  -corinne_berinstein1  -corinne_berinstein
1 -corinne_berinstein
 
Assessing of children article
Assessing of children articleAssessing of children article
Assessing of children article
 
قانون النشر الالكتروني والجرائم المعلوماتية
قانون النشر الالكتروني والجرائم المعلوماتيةقانون النشر الالكتروني والجرائم المعلوماتية
قانون النشر الالكتروني والجرائم المعلوماتية
 
Ephesians 1 The Blessings of the Redeemed
Ephesians 1   The Blessings of the RedeemedEphesians 1   The Blessings of the Redeemed
Ephesians 1 The Blessings of the Redeemed
 
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
 
50 soal latihan grammar untuk mahasiswa semester awal.
50 soal latihan grammar untuk mahasiswa semester awal.50 soal latihan grammar untuk mahasiswa semester awal.
50 soal latihan grammar untuk mahasiswa semester awal.
 
Emmelfinale72
Emmelfinale72Emmelfinale72
Emmelfinale72
 
Emmel Cremonesi
Emmel CremonesiEmmel Cremonesi
Emmel Cremonesi
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysis
 
Building a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl systemBuilding a vietnamese dialog mechanism for v dlg~tabl system
Building a vietnamese dialog mechanism for v dlg~tabl system
 
doc
docdoc
doc
 
ppt ic
ppt icppt ic
ppt ic
 

Ähnlich wie Ijnlc020306NAMED ENTITY RECOGNITION IN NATURAL LANGUAGES USING TRANSLITERATION

Identification and Classification of Named Entities in Indian Languages
Identification and Classification of Named Entities in Indian LanguagesIdentification and Classification of Named Entities in Indian Languages
Identification and Classification of Named Entities in Indian Languageskevig
 
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov ModelNERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Modelkevig
 
Natural Language Processing using Text Mining
Natural Language Processing using Text MiningNatural Language Processing using Text Mining
Natural Language Processing using Text MiningSushanti Acharya
 
Named Entity Recognition for Telugu Using Conditional Random Field
Named Entity Recognition for Telugu Using Conditional Random FieldNamed Entity Recognition for Telugu Using Conditional Random Field
Named Entity Recognition for Telugu Using Conditional Random FieldWaqas Tariq
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...kevig
 
Pycon India 2018 Natural Language Processing Workshop
Pycon India 2018   Natural Language Processing WorkshopPycon India 2018   Natural Language Processing Workshop
Pycon India 2018 Natural Language Processing WorkshopLakshya Sivaramakrishnan
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...IJECEIAES
 
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMIMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
 
Rule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to PunjabiRule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to Punjabikevig
 
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOLHIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOLijfcstjournal
 
SETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGINGSETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGINGkevig
 
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...Waqas Tariq
 
A Review on a web based Punjabi to English Machine Transliteration System
A Review on a web based Punjabi to English Machine Transliteration SystemA Review on a web based Punjabi to English Machine Transliteration System
A Review on a web based Punjabi to English Machine Transliteration SystemEditor IJCATR
 
Spell checker for Kannada OCR
Spell checker for Kannada OCRSpell checker for Kannada OCR
Spell checker for Kannada OCRdbpublications
 
Untitled presentation.pdf
Untitled presentation.pdfUntitled presentation.pdf
Untitled presentation.pdfUpinder Kaur
 
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptAI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptpavankalyanadroittec
 
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET Journal
 
Natural Language Processing: A comprehensive overview
Natural Language Processing: A comprehensive overviewNatural Language Processing: A comprehensive overview
Natural Language Processing: A comprehensive overviewBenjaminlapid1
 

Ähnlich wie Ijnlc020306NAMED ENTITY RECOGNITION IN NATURAL LANGUAGES USING TRANSLITERATION (20)

Identification and Classification of Named Entities in Indian Languages
Identification and Classification of Named Entities in Indian LanguagesIdentification and Classification of Named Entities in Indian Languages
Identification and Classification of Named Entities in Indian Languages
 
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov ModelNERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
 
Natural Language Processing using Text Mining
Natural Language Processing using Text MiningNatural Language Processing using Text Mining
Natural Language Processing using Text Mining
 
Named Entity Recognition for Telugu Using Conditional Random Field
Named Entity Recognition for Telugu Using Conditional Random FieldNamed Entity Recognition for Telugu Using Conditional Random Field
Named Entity Recognition for Telugu Using Conditional Random Field
 
AI_08_NLP.pptx
AI_08_NLP.pptxAI_08_NLP.pptx
AI_08_NLP.pptx
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
 
Pycon India 2018 Natural Language Processing Workshop
Pycon India 2018   Natural Language Processing WorkshopPycon India 2018   Natural Language Processing Workshop
Pycon India 2018 Natural Language Processing Workshop
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...
 
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMIMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
 
Rule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to PunjabiRule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to Punjabi
 
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOLHIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
 
SETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGINGSETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGING
 
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...
Implementation of Enhanced Parts-of-Speech Based Rules for English to Telugu ...
 
A Review on a web based Punjabi to English Machine Transliteration System
A Review on a web based Punjabi to English Machine Transliteration SystemA Review on a web based Punjabi to English Machine Transliteration System
A Review on a web based Punjabi to English Machine Transliteration System
 
Spell checker for Kannada OCR
Spell checker for Kannada OCRSpell checker for Kannada OCR
Spell checker for Kannada OCR
 
Untitled presentation.pdf
Untitled presentation.pdfUntitled presentation.pdf
Untitled presentation.pdf
 
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptAI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
 
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
 
Top 10 Must-Know NLP Techniques for Data Scientists
Top 10 Must-Know NLP Techniques for Data ScientistsTop 10 Must-Know NLP Techniques for Data Scientists
Top 10 Must-Know NLP Techniques for Data Scientists
 
Natural Language Processing: A comprehensive overview
Natural Language Processing: A comprehensive overviewNatural Language Processing: A comprehensive overview
Natural Language Processing: A comprehensive overview
 

Kürzlich hochgeladen

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Kürzlich hochgeladen (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Ijnlc020306NAMED ENTITY RECOGNITION IN NATURAL LANGUAGES USING TRANSLITERATION

  • 1. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013 DOI : 10.5121/ijnlc.2013.2306 55 NAMED ENTITY RECOGNITION IN NATURAL LANGUAGES USING TRANSLITERATION Sudha Morwal, Deepti Chopra and Dr. G.N. Purohit Department Of Computer Engineering, Banasthali Vidyapith, Jaipur (Raj.), India sudha_morwal@yahoo.co.in deeptichopra11@yahoo.co.in ABSTRACT Transliteration may be defined as the process of mapping sounds in a text written in one language to another language. Current paper discusses about transliteration and its use in Named Entity Recognition. We have designed a code that executes Transliteration and assist in the process of Named Entity Recognition. We have presented some of the results of Named Entity Recognition (NER) using Transliteration. KEYWORDS NER, Transliteration, Unknown words, Performance Metrics 1. INTRODUCTION Transliteration is the task of converting word, letter or group of letters written in one language to another language. In this, we map phonetic sounds of one language into another language. Some of the important applications of Transliteration include the following: Machine Translation, Information Retrieval, Named Entity Recognition and Information Extraction. Named Entity Recognition (NER) is the subtask of Information Extraction in which named entities or proper nouns are recognized from a given text and thereafter classified into various categories. Transliteration plays a key role in solving the problem of unknown words in Named Entity Recognition. If a training of English sentence in NER is performed using example based learning and in testing if a Hindi sentence is given, then only the Named Entities (words which are not tagged with OTHER tag) present in the Training sentences are transliterated to Hindi language. Also, the words which are neither present in training file nor in the transliteration file, ‘OTHER’ tag is assigned to them. In this way unknown words in Named Entity Recognition can be handled. e. g. Ram/PER plays/OTHER cricket/SPORT In the above annotated training text in English, ‘PER’ denotes that ‘Ram’ is a Person, ‘plays’ tagged as ‘OTHER’ means that ‘plays’ is not a Named Entity, ‘cricket’ is a name of a Sport, and so it tagged with ‘SPORT’ tag. According to our approach, If testing sentence in NER is given as: राम खेलता है If we do not use the transliteration approach, then by testing the above mentioned Hindi sentence, राम and will not be identified as Named Entities. So, we must use the transliteration approach in order to identify them properly. In training sentence, since Ram and cricket are Named Entities, so they are transliterated to Hindi language and stored along with their tags in
  • 2. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013 56 transliteration file. So, now ‘राम’ and ‘ ’ can easily be identified as Named Entities and the problem of unknown word is solved. 2. RELATED WORK Named Entity Recognition (NER) is one the most important task in NLP. It is the process in which proper nouns or Named Entities are detected and then classified into different classes of Named Entity classes e.g. Name of Person, Sport, River, Country, State, city, Organization etc. The unknown words in Named Entity Recognition can be handled using many ways. The words that are detected as unknown or for which no training has been done but they exist in testing sentences, a specific tag e.g. unk tag can be allotted to them in order to identify them and solve the problem of unknown words in NER [1][4]. Capitalization information does not alone serve to identify the POS tag, As Capitalization can exists in unknown words also. Also, Capitalization information can identify Named Entities in English only. [2][3]. 3. OUR APPROACH In our approach, we have used transliteration approach to identify the unknown words in Named Entity Recognition. We have taken ‘OTHER’ tag to be a ‘Not a Named Entity’ tag. Figure 1 depicts our approach involved during training in NER. We process each and every token of training data and detect its tag. If the token is not tagged with ‘OTHER’ tag, then we simply transliterate the token into different other languages in which testing sentence appear and save the transliterated Named Entities along with their tags information in a transliterated file. If the token is attached with ‘OTHER’ tag, then we simply process the next token in a sentence and continue this procedure until end of the training sentences is not reached. Figure1 Our approach involved during training in NER ALGORITHM 1: TRANSLITERATION ON TRAINING DATA Input: Annotated Text A Output: Transliteration file F having list of Named Entities in other language Variables: n = No. of Tokens in A, T = t1 t2 t3…..tn tokens in A 1: if (A ≠ ϕ) then 2: for i=1 to n do 3: if (ti = Named Entity) then 4: Transliterate ti into language in which testing sentences appear and store transliterated token along with its tag in F 5: end if 6: end for 7: end if
  • 3. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013 57 If we have mapping of phonetic sounds from one language into another, then we can easily perform transliteration task in any language. e. g. If ‘Geeta/PER plays/OTHER Badminton/SPORT’ is a training sentence, then the transliteration task will transliterate Geeta and Badminton into other languages, since Geeta is a Named Entity attached to ‘PER’ tag and Badminton is a Named Entity attached to ‘SPORT’ tag. ‘plays’ is not a Named Entity, since it is tagged by ‘OTHER’ tag, so transliteration is not performed on it. So, our approach is a language independent based methodology, that is capable of detecting Named Entities from a text written in any of the Natural languages, provided that it has been trained on a Named Entity Recognition based system in any of the languages and there lie some sound mappings so that transliterated Named Entities in other languages can be achieved. Figure 2 displays the approach taken while performing testing in a NER based system. In this approach, all the tokens in a testing sentence are considered one at a time. If the present token is not an unknown entity, then a tag is alloted to it using Viterbi algorithm. If present token is an unknown entity, then we search the Transliteration file. If the unknown entity exists in transliteration file, then the Named Entity tag is attached to it, otherwise ‘OTHER’ tag is assigned to it. This procedure takes place until all the testing sentences are not processed. Figure 2 Steps taken while performing testing in Named Entity Recognition ALGORITHM 1: TESTING IN NER Input: Testing Data D Output: Transliteration file F having list of Named Entities in other language, Output of Testing file O Variables: n = No. of Tokens in D, T = t1 t2 t3…..tn tokens in D 1: if (D ≠ ϕ) then 2: for i=1 to n do 3: if (ti ≠ Unknown Entity) then 4: Attach Tag to ti using Viterbi Algo & store ti and tag in O 5: else 6: for all tokens (Tok) in F do 7: if (ti Tok) then 8: Attach tag to ti & store ti and tag in O 9: else 10 Attach ‘OTHER’ tag & store ti and tag in O 11: end if 12: end for 13: end if 14: end for 15: end if
  • 4. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013 58 3. RESULT In NER, handling of unknown words is a very important issue. We have developed a code for Transliteration. These Named Entities are then transliterated into different languages and stored in the respective transliterated files of different languages. We have performed Transliteration of Named Entities in English to Hindi and it generated transliterated Named Entities with accuracy of 70.6%. TABLE 1 Named Entities transliterated into different languages SNO TAGS 1 PERSON(Name of Person) 2 ORG (Name of Organization) 3 CON (Name of Country) 4 MAG (Name of Magazine) 5 WEEK (Name of Week) 6 LOC (Name of Location) 7 PC (Name of Personal Computer) 8 MONTH (Name of Month) For a given testing sentence, we can initially search if all the tokens exist in training sentences. If yes, then using Viterbi Algorithm, we can get optimal state sequence. If any word is unknown that does not exists in training file, then we can search the transliterated file, if the word exists in transliterated file, then the correct tag can be allotted to it and hence the problem of unknown word can be solved. If we have performed training in NER in English initially and during testing if we give a word in Hindi, then this would be treated as an unknown word for which we have not performed training yet. So, we can generate the transliterated list of Named Entities from English to other languages and handle the problem of unknown words in NER. We have performed NER in English. For Training, we took English and Hindi text from NLTK and accuracy obtained with our approach is 70.6%. TABLE 1 displays the Named Entities on which transliteration is performed in different languages and then these transliterated Named Entities are stored in a file along with their tags that can be used further to resolve the problem of unknown entities in a Named Entity Recognition. 4. PERFORMANCE METRICS Performance Metrics is measure to estimate the performance of a NER based system. Performance Metrics can be calculated in terms of 3 parameters: Precision, Accuracy and F- Measure. The output of a NER based system is termed as “response” and the interpretation of human as the “answer key” [9]. Consider the following terms:
  • 5. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013 59 1. Correct-If the response is same as the answer key. 2. Incorrect-If the response is not same as the answer key. 3. Missing-If answer key is found to be tagged but response is not tagged. 4. Spurious-If response is found to be tagged but answer key is not tagged. [6] Hence, we define Precision, Recall and F-Measure as follows: [5]7][8] Precision (P): Correct / (Correct + Incorrect + Missing) Recall (R): Correct / (Correct + Incorrect + Spurious) F-Measure: (2 * P * R) / (P + R) 5. CONCLUSION In this paper, we have discussed about Transliteration. We have obtained 70.58% accuracy by performing Named Entity Recognition in English and handling unknown words using transliteration approach. We have also discussed about Performance Metrics which is a very important measure to judge the performance of a Named Entity Recognition based system. ACKNOWLEDGEMENT I would like to thank all those who helped me in accomplishing this task. REFERENCES [1] http://www.cse.iitb.ac.in/~cs626-460-2012/seminar_ppts/ner.pdf [2] http://www.sigkdd.org/explorations/issues/7-1-2005-06/4-Fu.pdf [3] http://acl.ldc.upenn.edu/acl2002/MAIN/pdfs/Main036.pdf [4] http://staff.um.edu.mt/cabe2/publications/nfHmms.pdf [5] G.V.S.RAJU, B.SRINIVASU, Dr.S.VISWANADHA RAJU, 4K.S.M.V.KUMAR “Named Entity Recognition for Telugu Using Maximum Entropy Model” [6] B. Sasidhar, P. M. Yohan, Dr. A. Vinaya Babu3, Dr. A. Govardhan,.“A Survey on Named Entity Recognition in Indian Languages with particular reference to Telugu” IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011. [7] Asif Ekbal, Rejwanul Haque, Amitava Das, Venkateswarlu Poka and Sivaji Bandyopadhyay “Language Independent Named Entity Recognition in Indian Languages” .In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pages 33–40,Hyderabad, India, January 2008.Available at: http://www.mt-archive.info/IJCNLP-2008-Ekbal.pdf [8] Darvinder kaur, Vishal Gupta. “A survey of Named Entity Recognition in English and other Indian Languages” .IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November 2010. [9] Shilpi Srivastava, Mukund Sanglikar & D.C Kothari. ”Named Entity Recognition System for Hindi Language: A Hybrid Approach” International Journal of Computational Linguistics (IJCL), Volume (2) : Issue (1) : 2011.Available at http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf
  • 6. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.3, June 2013 60 About Authors Sudha Morwal is an active researcher in the field of Natural Language Processing. Currently working as Associate Professor in the Department of Computer Science at Banasthali University (Rajast han), India. She has done M.Tech (Computer Science) , NET, M.Sc (Computer Science) and her PhD is in progress from Banasthali University (Rajasthan), India. She has published many papers in International Conferences and Journals. Deepti Chopra received B.Tech degree in Computer Science and Engineering from Rajasthan College of Engineering for Women, Jaipur, Rajasthan in 2011.Currently she has done M.Tech in Computer Science and Engineering from Banasthali University, Rajasthan. Her research interests include Artificial Intelligence, Natural Language Processing, and Information Retrieval. She has published many papers in International journals and conferences. Dr. G. N. Purohit is a Professor in Department of Mathematics & Statistics at Banasthali University (Rajasthan). Before joining Banasthali University, he was Professor and Head of the Department of Mathematics, University of Rajasthan, Jaipur. He had been Chief-editor of a research journal and regular reviewer of many journals. His present interest is in O.R., Discrete Mathematics and Communication networks. He has published around 40 research papers in various journals