Script to Sentiment : on future of Language
                     Technology
                                 Jaganadh G
                            jaganadhg@gmail.com

            Different Dimensions of Language Technology
                      Central Institute of Hindi
                               Mysore
                          Feb. 25-26 2010


                                    Abstract
     Human Language Technology (HLT) is no longer confined to the
     classroom. Revolutionary developments are occurring in the field,
     and they are capable of changing human life. Information and
     Communication Technology (ICT) has become an inevitable component
     of our day-to-day life; directly or indirectly, we are all
     consumers of ICT-based products. Over the last few years the ICT
     revolution has reached our native languages too. As a result, HLT
     has become a direct or indirect component of ICT products and
     services. HLT is expected to permeate all areas of our life in the
     future. Whether you are a doctor, an engineer, a farmer or a lover,
     irrespective of your profile, we are all going to become addicted
     to HLT-based ICT products. The present paper discusses developments
     in the field of HLT and its future.


1   Introduction
Human Language Technology (HLT) is no longer confined to the classroom.
Revolutionary developments are occurring in the field, and they are
capable of changing human life. Information and Communication Technology
(ICT) has become an inevitable component of our day-to-day life; directly
or indirectly, we are all consumers of ICT-based products. Over the last
few years the ICT revolution has reached our native languages too. As a
result, HLT has become a direct or indirect component of ICT products and
services. HLT is expected to permeate all areas of our life in the future.
Whether you are a doctor, an engineer, a farmer or a lover, irrespective
of your profile, we are all going to become addicted to HLT-based ICT
products.


The history of HLT begins with the birth of electronic computers. Since
the early 1950s, researchers and scientists have been trying to develop
computer programs that can handle human languages the way a human does.
The earliest Research and Development (R&D) in this field was related to
the development of Machine Translation (MT) systems. Significant
developments have since occurred in the field, and working systems are
available. Some are readily accepted; others are imperfect but have no
alternatives. So we are still not in a position to say, 'Yes! We have
cracked the language challenge and are now able to provide smart
engineering solutions.' Path-breaking R&D activities are happening in this
field. In this scenario it is quite interesting to investigate where we
stand in the field of Human Language Technology.
    The present paper is a compilation of the developments in the field of
HLT. It also discusses some future technologies in HLT. Recent
developments in Indian Language Technology are also discussed, with
special focus on the issues involved.


2       Where are we now?
R&D activities in HLT can be broadly classified into two major categories:
1) text processing and 2) speech processing. Activities under text
processing range from the development of spell checker systems to
discourse analysis systems. Speech processing ranges from text-to-speech
(TTS) conversion to speech-to-speech translation. For almost all tasks in
both fields, Free and Open Source (FOSS)1 and proprietary solutions are
available. Internet-based solutions such as Google Translate and other
services2 are also available. FOSS-based and public domain solutions have
played a vital role in the rapid development of HLT, including for Indian
Languages. This section is a brief survey of the present status of R&D
activities in the field.

2.1     Language and Scripts in Computers
In the early days of HLT, representing vernacular languages in computers
was a challenge. ASCII3 was the character encoding scheme4 available in
those days. It was designed to represent the English alphabet and was not
sufficient to represent other languages. Some workarounds were devised to
overcome this limitation. Most of these
    1 http://en.wikipedia.org/wiki/Free_and_open_source_software
    2 http://translate.google.com/#, www.google.com/transliterate/,
      www.google.com/dictionary etc.
    3 http://en.wikipedia.org/wiki/ASCII - Accessed on 01-01-2010
    4 http://en.wikipedia.org/wiki/Character_encoding - Accessed on 01-01-2010




workarounds were purely font5 based solutions. In India, an encoding
scheme called ISCII6 was developed for representing Indian Languages. The
introduction of Unicode7 was a remarkable development in this field.
Unicode made the task of representing vernacular languages in computers
much easier, and it became the de facto standard. Suitable font8
technology was developed alongside it.
    The arrival of the Unicode standard boosted the penetration of local
language content on the Internet. All living languages that received
encoding space in Unicode got the opportunity to establish themselves in
the Information Technology (IT) world. This led to an information
overflow. As a result, there is an increasing demand for information
processing tools, ranging from keyboard drivers to search engines to
decision support systems9.
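
The limitation of ASCII and the role of Unicode described above can be
illustrated with a minimal Python sketch. The sample word and the helper
name can_encode are assumptions chosen for illustration; they do not come
from the paper.

```python
# The word "Hindi" in Devanagari, written as explicit code points so the
# example does not depend on the editor's own encoding (illustrative sample).
text = "\u0939\u093f\u0928\u094d\u0926\u0940"

def can_encode(s: str, encoding: str) -> bool:
    """Return True if every character of s can be represented in encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

ascii_ok = can_encode(text, "ascii")      # ASCII defines only 128 characters
utf8_bytes = text.encode("utf-8")         # UTF-8 is one encoding form of Unicode
round_trip = utf8_bytes.decode("utf-8")   # decoding restores the original text

print("ASCII can represent the word:", ascii_ok)
print("Code points:", [f"U+{ord(ch):04X}" for ch in text])
print("Characters:", len(text), "UTF-8 bytes:", len(utf8_bytes))
```

Each Devanagari code point in this sample needs three bytes in UTF-8,
which is why the byte count is larger than the character count.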

2.2    Developments in Text Processing
This section is a brief survey of the developments in text processing
technologies. A wide variety of text processing systems are available now,
such as spell checkers, grammar correction systems, MT systems and search
engines. People who use computers for preparing documents are familiar
with tools like spell checkers. They know that life is not easy without
such tools, because human beings tend to commit errors, and to be lazy
too! When computers were placed on the desks of hard-core language
professionals like translators, they became interested in electronic
dictionaries and machine translation. When computers came into the lives
of business people, they had different needs. But whoever the users are
and whatever their profile, their demands were directly or indirectly
related to HLT, because everybody uses language and nobody can live
without it. These demands gave rise to new methodologies and technologies
in HLT itself. Those developments are discussed here.

Spellcheckers
In computing, a spell checker (spell check) is an application program that
flags words in a document that may not be spelled correctly10. The
technology is now very advanced. Spell checkers are available for almost
all languages in the world. Most popular word processing software has the
feature. Spell checker systems are available for Indian
    5 http://en.wikipedia.org/wiki/Font - Accessed on 01-01-2010
    6 http://en.wikipedia.org/wiki/Indian_Script_Code_for_Information_Interchange -
      Accessed on 01-01-2010
    7 http://unicode.org/, http://en.wikipedia.org/wiki/Unicode - Accessed on 01-01-2010
    8 http://en.wikipedia.org/wiki/Font - Accessed on 01-02-10
    9 http://en.wikipedia.org/wiki/Decision_support_system - Accessed on 01-02-10
   10 http://en.wikipedia.org/wiki/Spell_checker



Languages too. The language software collection CDs distributed by the
TDIL11 program contain spell checker applications for almost all Indian
Languages.
   The FOSS movement in India is very active in spell checker dictionary
development for Indian Languages12. The FOSS frameworks13 available for
spell checker systems are widely used by these FOSS contributors. The
development of Indian Language spell checker dictionaries needs more
volunteers.
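
The basic idea of a dictionary-based spell checker, flagging words that
are missing from a word list and proposing close matches, can be sketched
as follows. This is only an illustration using Python's standard difflib
module; it is not how Hunspell or MySpell work internally, and the tiny
word list and the function name check are assumptions.

```python
import difflib
import re

# Hypothetical toy word list; a real checker would load a full dictionary.
WORDS = {"language", "technology", "spell", "checker",
         "human", "speech", "translation"}

def check(text, words=WORDS, max_suggestions=3):
    """Return {unknown_word: [suggestions]} for words not in the word list."""
    flagged = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token not in words:
            # Rank dictionary entries by string similarity to the token.
            flagged[token] = difflib.get_close_matches(
                token, sorted(words), n=max_suggestions, cutoff=0.7)
    return flagged

result = check("Human langauge technolgy")
for word, suggestions in result.items():
    print(word, "->", suggestions)
```

The quality of such a checker depends almost entirely on the coverage of
the word list, which is why dictionary development needs volunteers.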

Machine Translation
MT is one of the oldest and still active tasks in HLT. R&D in this field
has been in progress for more than 50 years. Some systems are available
for use, but the majority cannot yet be considered perfect solutions.
Various methodologies are available for MT, such as statistical,
rule-based and hybrid approaches14. But fully automated high-quality MT
remains a target to be achieved. Among the available MT systems and
services, Google Translate and Babel Fish15 are the most famous. Google
Translate supports translation from English to Hindi and vice versa.
    MT research in Indian Languages (IL) has been very active since the
early 1970s. AnglaBharati16 and Anusaaraka17 are two major approaches
developed in the early days and still in active development. Other
systems, such as Sampark18 and UNL-based machine translation systems19,
are also available. The TDIL program of the Govt. of India provides
extensive support to MT research in India. Apart from the above-mentioned
systems, there are some other IL MT initiatives.
    Some FOSS-based solutions are also available for MT system
development. Two well-known frameworks are Moses20 and Apertium21. Moses
follows the statistical paradigm of MT, while Apertium is a rule-based
platform. MT researchers in India have also come forward to work with
these two frameworks. Hopefully this will boost MT research in India too.
   11 www.tdil.mit.gov.in
   12 http://indlinux.org/, http://smc.org.in/,
      http://wiki.services.openoffice.org/wiki/Dictionaries
   13 http://hunspell.sourceforge.net, http://en.wikipedia.org/wiki/MySpell
   14 http://www.hutchinsweb.me.uk/IntroMT-TOC.htm
   15 http://babelfish.yahoo.com/
   16 http://www.cse.iitk.ac.in/users/langtech/anglabharti.htm
   17 http://ltrc.iiit.ac.in/~anusaaraka/
   18 http://sampark.iiit.ac.in/
   19 http://www.springerlink.com/content/t1005w166746727l/fulltext.pdf
   20 www.statmt.org/moses
   21 www.apertium.org




Search and IR
Search Engines (SE) and Information Retrieval (IR) systems are the HLT
tools most widely used by the general public. Google22, Yahoo23 and Bing24
are the three most famous search engine giants in the world. Revolutionary
developments are occurring in this field. Domain-based search like 'patent
search', content-based search like 'video search', localized search like
'movie timing' and cross-lingual search are the recent trends in this
field. The latest development is Semantic Search, which is discussed in a
later section of the paper. All these search engines are now capable of
handling local language search requests too. Cross-Lingual Information
Retrieval (CLIR) systems for Indian Languages are in development.
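
At the core of most IR systems is an inverted index that maps each term to
the documents containing it. The following is a minimal sketch with
Boolean AND retrieval; the sample documents and function names are
assumptions, and production search engines add ranking, stemming and much
more. The simple tokenizer here is suited to English text only; Indian
scripts would need script-aware tokenization.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens (English only)."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map every term to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain all query terms (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical mini collection.
docs = {
    1: "Patent search for speech recognition",
    2: "Video search engines",
    3: "Speech synthesis and recognition",
}
index = build_index(docs)
print(search(index, "speech recognition"))
print(search(index, "search"))
```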


3        Speech Processing
This section gives a brief survey of the developments in Speech
Processing. The main technologies discussed are Text to Speech (TTS)
systems and Automatic Speech Recognition (ASR).

TTS
A Text to Speech (TTS) system is software that converts electronic text
into the corresponding speech. The field involves both text processing and
signal processing techniques. R&D in this direction has produced promising
and acceptable solutions. FOSS-based as well as proprietary solutions are
available now. The major FOSS frameworks available for TTS system
development are Festival25 and Festvox26. The introduction of both
frameworks boosted the development of TTS in various languages, including
Indian Languages. The most remarkable development in Indian Language TTS
under FOSS is the Dhvani project27.
    Even though we can say that significant progress has been achieved in
TTS development, challenges remain. These include making the synthesized
voice more natural and building TTS with intonation and emotion.

ASR
Automatic Speech Recognition (ASR) is a technology that allows a computer
program to identify and transcribe the words that a person speaks into
    22 http://www.google.co.in/
    23 www.yahoo.co.in
    24 http://www.bing.com/
    25 http://www.cstr.ed.ac.uk/projects/festival/
    26 http://festvox.org/festival/
    27 http://dhvani.sourceforge.net/


a microphone. Like TTS, ASR involves both text processing and signal
processing techniques. It is one of the most challenging and interesting
tasks in HLT. Significant developments have occurred in this field too.
ASR systems are available for some Indian Languages like Hindi28 and
Telugu. The most widely used FOSS framework for ASR development is CMU
Sphinx29. The introduction of CMU Sphinx opened a new direction in ASR
R&D. Apart from CMU Sphinx, other FOSS-based as well as proprietary
frameworks are available for ASR development.


4         Future of HLT
Over the past few decades colossal progress has been made in the field of
HLT. Systems ranging from simple ones that can understand numbers to text
understanding and summarization systems were developed within this period.
Many challenges remain to be addressed in the future. Hopefully we can
build complex systems from the existing HLT systems. These developments
are the results of a long journey from lab experiments to deployment in
real-world work environments. The wide range of tools and technologies
developed as part of R&D in HLT is capable of making a deep impact on
human life. These tools have great relevance and impact in a
market-oriented society.
    What will the future be? Can we imagine it? Yes! Imagine that you ask
your car to show the route to the Central Institute of Hindi from the
Mysore bus stand, and it tells you the directions or gives a detailed
printout describing the route. In fact it is not a dream technology. It is
possible by combining Speech Processing with other technologies like GPS
(Global Positioning System). Suppose that a judge analyzes the arguments
related to a case with software and reaches a judgment. Or consider a
legislative assembly that publishes some draft bills on its website and
receives comments on them. After receiving the comments, and before
proceeding to further action, it analyzes them to find how many are
positive and how many are negative! This is already possible. The
technology that analyzes opinion is called 'Sentiment Analysis'. There is
no end to imagination, but these imaginings will become reality very soon.
This section highlights some of the future technologies and R&D areas in
HLT.

Semantic Web/Search
Semantics is the branch of modern linguistics that studies the structure
of meaning. The Semantic Web (SeW) is an evolving development of the World
Wide Web in which the meaning (semantics) of information and services on
the web is defined, making it possible for the web to "understand" and
satisfy the requests of people and machines to use the web content30. Tim
Berners-Lee, the father of the WWW31, is the inventor of this technology.
The W3C, or World Wide Web Consortium, is the authority for publishing and
maintaining standards and recommendations on SeW. Semantic web based HLT
implementations are going to bring a big revolution in the coming years.
Semantic Search is one such technology that HLT people are discussing
nowadays. SeW search engines already exist32, but are not yet widely
accepted. Semantic Search will bring revolutionary changes in online
publishing, e-governance, e-commerce and other fields.

    28 http://sourceforge.net/projects/hindiasr/
    29 http://www.speech.cs.cmu.edu/sphinx/

Sentiment Analysis
Sentiment analysis or opinion mining refers to a broad (definitionally
challenged) area of natural language processing, computational linguistics
and text mining33. The basic task in sentiment analysis is classifying the
polarity of a given text at the document, sentence, or feature/aspect
level: whether the opinion expressed in a document, a sentence or an
entity feature/aspect is positive, negative or neutral34. The rise of
social media like blogs, Twitter, Facebook and LinkedIn has fueled great
interest in the field of Sentiment Analysis. Publishers, movie companies
and fast-moving consumer goods (FMCG) companies are the main consumers of
this technology. The technology is already present in the market. Very
soon it will find its own place in politics, governance and other areas.
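
The polarity classification task described above can be illustrated with a
very simple lexicon-based sketch, applied to the earlier example of
comments on a draft bill. The word lists, the sample comments and the
function name polarity are assumptions made for illustration; practical
systems rely on much larger lexicons or on trained classifiers, and they
handle negation and context, which this sketch ignores.

```python
import re
from collections import Counter

# Hypothetical miniature opinion lexicons.
POSITIVE = {"good", "great", "useful", "support", "welcome"}
NEGATIVE = {"bad", "poor", "harmful", "oppose", "unfair"}

def polarity(text):
    """Classify text as positive, negative or neutral by counting cue words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = (sum(t in POSITIVE for t in tokens)
             - sum(t in NEGATIVE for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented comments on a draft bill, for illustration only.
comments = [
    "I support this bill, it is a great step",
    "The draft is unfair and harmful to farmers",
    "The bill was published on the website",
]
counts = Counter(polarity(c) for c in comments)
print(counts)
```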

Future of MT
In the previous section we discussed the developments in MT research.
Remarkable achievements have been made in this direction. But we still
have to address many issues to achieve the goal of Fully Automated High
Quality Machine Aided Translation (FAHQMAT). Another expectation is to
build efficient speech-to-speech translation systems. I think that within
a few years our researchers will provide revolutionary solutions in this
field.

HLT in Education
Computer Assisted Teaching (CAT) is already in practice throughout the
globe. It is considered one of the best ways to achieve effective and
interactive
    30 Berners-Lee, Tim; James Hendler and Ora Lassila (May 17, 2001). "The
       Semantic Web". Scientific American Magazine.
       http://www.sciam.com/article.cfm?id=the-semanticweb&print=true.
       Accessed March 26, 2008.
    31 World Wide Web
    32 www.hakia.com
    33 http://en.wikipedia.org/wiki/Sentiment_analysis
    34 http://en.wikipedia.org/wiki/Sentiment_analysis



teaching. HLT techniques like ASR, TTS, morphological synthesis, parsing
and MT can be used for interactive language teaching, especially second
language teaching. With the help of HLT we can build online systems that
teach a second language and evaluate the progress made by the student
without the intervention of a human instructor.

HLT in Bio-Medical Research
HLT techniques like Named Entity Recognition35 (NER), SeW and Text
Mining36 are widely used in the field of Bio-Medical research. This field
of research is now called Bio-medical Natural Language Processing
(BioNLP).

HLT in Forensic Science
Another vital area in which HLT is going to be applied is Forensic
Science. HLT techniques are very useful for resolving authorship disputes
and disputes of meaning and use, identifying the author of anonymous
texts, identifying cases of plagiarism37 and reconstructing mobile phone
text conversations.

HLT for Business
It is well known that without search engines web pages would hardly be
found. Likewise, without advertisements there is no business. The
emergence of new media paved the way for online advertisement techniques.
The marriage of IR and other HLT techniques with online advertisement gave
birth to a new field called 'Computational Advertising'. It helps
advertisers to put their advertisements in appropriate places according to
the taste of consumers. Another vital business-oriented area of R&D is
'Collective Intelligence'38, where a wide range of HLT techniques are
used. It helps service providers like online stores to give product
recommendations to consumers based on their purchasing behavior and taste.
This is achieved by comparing and analyzing the purchasing behavior of
customers who share similar tastes. So remember: whenever you receive
context-relevant advertising or a product recommendation, the power of HLT
is there!
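
The recommendation idea described above, comparing a customer with others
who share similar tastes, can be sketched as simple user-based
collaborative filtering. The purchase data, the customer names and the
choice of Jaccard similarity are assumptions made for illustration, not a
description of any particular store's system.

```python
def jaccard(a, b):
    """Similarity of two item sets: size of overlap divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, purchases, top_n=3):
    """Suggest items bought by similar customers that the user does not own."""
    own = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        sim = jaccard(own, items)
        if sim == 0:
            continue  # customers with no overlap contribute nothing
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical purchase histories.
purchases = {
    "asha": {"novel", "dictionary"},
    "ravi": {"novel", "dictionary", "grammar book"},
    "meena": {"novel", "cookbook"},
    "john": {"camera"},
}
print(recommend("asha", purchases))
```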
    35 http://en.wikipedia.org/wiki/Named_entity_recognition
    36 http://en.wikipedia.org/wiki/Text_mining
    37 http://en.wikipedia.org/wiki/Plagiarism
    38 http://en.wikipedia.org/wiki/Collective_intelligence




5         Issues in HLT
The developments in HLT during the past few years are quite promising, and
the future technologies that are slowly coming out of the lab and into
practice are quite exciting. Still, there are many open research issues.
This section discusses some selected issues in HLT, with special focus on
Indian Language Technology.
    A large number of Language Technology (LT) based products are coming
to market. How can these technology products be evaluated? Many
frameworks, such as EAGLES39, have evolved for evaluating LT projects and
products. But most evaluation methodologies are not well suited to
handling the linguistic phenomena of Indian Languages. A typical example
is MT evaluation. BLEU and METEOR are the two major methodologies for
evaluating MT, but neither is very effective for MT between English and
Indian Languages40. Another vital issue in evaluating HLT projects and
products is the availability of data for testing the tools. For example,
to evaluate an MT system, reference translation sets are required. In a
way, a reference translation set is simply a parallel corpus, but it has
to possess certain qualities beyond that: it should cover the different
syntactico-semantic phenomena of the source language as well as the target
language. Such data sets and standards, especially publicly available
ones, are lacking in the case of Indian Language Technology. In India we
do not have any defined policy or standards body to evaluate HLT projects
and products. In short, the issues in HLT can be classified into three
broad areas: 1) development challenges, which involve algorithm
development and baffling issues in language; 2) availability of resources
and standards in the public domain; and 3) the evaluation problem. A
detailed discussion of the topic is beyond the scope of this paper.
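
To make the MT evaluation discussion concrete, the following is a
simplified sketch of the idea behind BLEU: clipped n-gram precision
against a reference translation, combined with a brevity penalty. It is
not the full metric (which is normally computed at corpus level with up to
4-grams and possibly several references); the function names, the choice
of max_n=2 and the sample sentences are assumptions for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU with a single reference."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_avg)

reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(round(bleu(candidate, reference), 4))
```

Because the score depends on exact word order and surface forms, a
correct translation into a free word order, morphologically rich language
can share few n-grams with the reference, which is one way to understand
the limitation noted above for Indian Languages.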


6         Conclusion
Many resources and tools have been developed in HLT in the past few years.
The developments in the field are quite promising, and so is the future.
As discussed at the beginning of this paper, we can hope that all ICT
tools will be powered by HLT in the future. On the other hand, we cannot
forget the challenges and issues involved in the field. To solve the major
issues in HLT, especially in the Indian Language scenario, much enhanced
policies and standards should be introduced in the near future to boost
R&D activities in the field.

    39 http://www.issco.unige.ch/en/research/projects/ewg95//ewg95.html
    40 http://www.cse.iitb.ac.in/~pb/papers/icon07bleu.pdf



Weitere ähnliche Inhalte

Was ist angesagt?

Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...kevig
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHijnlc
 
Machine Translation Approaches and Design Aspects
Machine Translation Approaches and Design AspectsMachine Translation Approaches and Design Aspects
Machine Translation Approaches and Design AspectsIOSR Journals
 
Language translation english to hindi
Language translation english to hindiLanguage translation english to hindi
Language translation english to hindiRAJENDRA VERMA
 
Error Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation OutputsError Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation OutputsParisa Niksefat
 
Role of Machine Translation and Word Sense Disambiguation in Natural Language...
Role of Machine Translation and Word Sense Disambiguation in Natural Language...Role of Machine Translation and Word Sense Disambiguation in Natural Language...
Role of Machine Translation and Word Sense Disambiguation in Natural Language...IOSR Journals
 
Natural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and DifficultiesNatural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and Difficultiesijtsrd
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGEScsandit
 
Types of machine translation
Types of machine translationTypes of machine translation
Types of machine translationRushdi Shams
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachIJERA Editor
 
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...Syeful Islam
 
Summer Research Project (Anusaaraka) Report
Summer Research Project (Anusaaraka) ReportSummer Research Project (Anusaaraka) Report
Summer Research Project (Anusaaraka) ReportAnwar Jameel
 
Division_3_Fianna_O'Brien
Division_3_Fianna_O'BrienDivision_3_Fianna_O'Brien
Division_3_Fianna_O'BrienFianna O'Brien
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translationguest873a50
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiIIIT Hyderabad
 

Was ist angesagt? (19)

Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
 
Machine Translation Approaches and Design Aspects
Machine Translation Approaches and Design AspectsMachine Translation Approaches and Design Aspects
Machine Translation Approaches and Design Aspects
 
Language translation english to hindi
Language translation english to hindiLanguage translation english to hindi
Language translation english to hindi
 
Error Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation OutputsError Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation Outputs
 
Role of Machine Translation and Word Sense Disambiguation in Natural Language...
Role of Machine Translation and Word Sense Disambiguation in Natural Language...Role of Machine Translation and Word Sense Disambiguation in Natural Language...
Role of Machine Translation and Word Sense Disambiguation in Natural Language...
 
Natural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and DifficultiesNatural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and Difficulties
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
Types of machine translation
Types of machine translationTypes of machine translation
Types of machine translation
 
Cf32516518
Cf32516518Cf32516518
Cf32516518
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
 
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...
 
I026050054
I026050054I026050054
I026050054
 
Machine Translation
Machine TranslationMachine Translation
Machine Translation
 
Summer Research Project (Anusaaraka) Report
Summer Research Project (Anusaaraka) ReportSummer Research Project (Anusaaraka) Report
Summer Research Project (Anusaaraka) Report
 
Division_3_Fianna_O'Brien
Division_3_Fianna_O'BrienDivision_3_Fianna_O'Brien
Division_3_Fianna_O'Brien
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translation
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging hai
 


Script to Sentiment: On the Future of Language Technology
Jaganadh G
jaganadhg@gmail.com

Different Dimensions of Language Technology
Central Institute of Hindi, Mysore
Feb. 25-26, 2010

Abstract

Human Language Technology (HLT) is no longer confined to classroom teaching. Revolutionary developments are occurring in the field, and they are capable of changing human life. Information and Communication Technology (ICT) has become an inevitable component of our day-to-day life; directly or indirectly, we are all consumers of ICT-based products. Over the last few years the ICT revolution has reached our native languages too, and as a result HLT has become a direct or indirect component of ICT products and services. HLT is expected to permeate all areas of our life in the future. Whether you are a doctor, an engineer, a farmer or a lover, irrespective of profile, we are all going to become dependent on HLT-based ICT products. The present paper discusses developments in the field of HLT and its future.

1 Introduction

Human Language Technology (HLT) is no longer confined to classroom teaching. Revolutionary developments are occurring in the field, and they are capable of changing human life. Information and Communication Technology (ICT) has become an inevitable component of our day-to-day life; directly or indirectly, we are all consumers of ICT-based products. Over the last few years the ICT revolution has reached our native languages too, and as a result HLT has become a direct or indirect component of ICT products and services. HLT is expected to permeate all areas of our life in the future. Whether you are a doctor, an engineer, a farmer or a lover, irrespective of profile, we are all going to become dependent on HLT-based ICT products.
The history of HLT is nearly as old as that of the computer itself. Since the early 1950s, researchers and scientists have been trying to develop computer programs that can handle human languages the way a human does. The earliest Research and Development (R&D) in this field was related to Machine Translation (MT) systems. Significant developments have since occurred and working systems are available: some are readily accepted, while others are imperfect but have no alternatives. So we are still not in a position to say, 'Yes! We cracked the language challenge and can now provide smart engineering solutions.' Path-breaking R&D activities are happening in this field, and in this scenario it is quite interesting to investigate where we stand in HLT. The present paper is a compilation of developments in the field of HLT. It also discusses some future technologies in HLT, as well as recent developments in Indian Language Technology, with special focus on the issues involved.

2 Where we are now?

R&D activities in HLT can be broadly classified into two major categories: 1) text processing and 2) speech processing. Text processing activities range from the development of spell checkers to discourse analysis systems. Speech processing ranges from text-to-speech (TTS) conversion to speech-to-speech translation. For almost all tasks in both fields, Free and Open Source Software (FOSS) [1] and proprietary solutions are available. There are also Internet-based solutions such as Google Translate and related services [2]. FOSS and public-domain solutions have played a vital role in the rapid development of HLT, including for Indian languages. This section is a brief survey of the present status of R&D activities in the field.

2.1 Language and Scripts in Computers

In the early days of HLT, representing the vernaculars in computers was a challenge.
ASCII [3] was the character encoding scheme [4] available in those early days. It was designed to represent the English alphabet and was not sufficient to represent other languages. Some workarounds were devised, most of which were purely font-based [5] solutions. In India, a workaround called ISCII [6] was developed for representing Indian languages. The introduction of Unicode [7] was a remarkable development in this field. Unicode made the task of representing vernaculars in computers much easier, and it became the de facto standard. Suitable font [8] technology developed alongside it. The arrival of the Unicode standard boosted the penetration of local language content on the Internet. Every living language that received encoding space in Unicode got the opportunity to gain a strong presence in the Information Technology (IT) world. This led to an information overflow. As a result, there is an increasing demand for information processing tools, ranging from keyboard drivers to search engines to decision support systems [9].

[1] http://en.wikipedia.org/wiki/Free_and_open_source_software
[2] http://translate.google.com/# www.google.com/transliterate/ www.google.com/dictionary etc.
[3] http://en.wikipedia.org/wiki/ASCII - Accessed on 01-01-2010
[4] http://en.wikipedia.org/wiki/Character_encoding - Accessed on 01-01-2010

2.2 Developments in Text Processing

This section is a brief survey of developments in text processing technologies. A wide variety of text processing systems are available now, such as spell checkers, grammar correction systems, MT systems and search engines. People who use computers to prepare documents are familiar with tools like spell checkers, and they know that life is not easy without them, because human beings tend to commit errors and are lazy too! When computers were placed on the desks of language professionals such as translators, they became interested in electronic dictionaries and machine translation. When computers came into the lives of business people, they had different intentions. But whoever the users are and whatever their profile, their demands are directly or indirectly related to HLT, because everybody uses language and nobody can live without it. Such demands gave rise to new methodologies and technologies in HLT itself. Those developments are discussed here.
Spellcheckers

In computing, a spell checker (spell check) is an application program that flags words in a document that may not be spelled correctly [10]. The technology is now well advanced. Spell checkers are available for almost all languages in the world, and most popular word processing software includes the feature. Spell checkers are available for Indian languages too. The language software collection CDs distributed by the TDIL [11] program contain spell checker applications for almost all Indian languages. The FOSS movement in India is very active in spell checker dictionary development for Indian languages [12], and the FOSS frameworks [13] available for spell checkers are widely used by these FOSS contributors. The development of Indian language spell checker dictionaries needs more volunteers.

[5] http://en.wikipedia.org/wiki/Font - Accessed on 01-01-2010
[6] http://en.wikipedia.org/wiki/Indian_Script_Code_for_Information_Interchange - Accessed on 01-01-2010
[7] http://unicode.org/ http://en.wikipedia.org/wiki/Unicode - Accessed on 01-01-2010
[8] http://en.wikipedia.org/wiki/Font - Accessed on 01-02-10
[9] http://en.wikipedia.org/wiki/Decision_support_system - Accessed on 01-02-10
[10] http://en.wikipedia.org/wiki/Spell_checker

Machine Translation

MT is one of the oldest tasks in HLT and is still a live one. R&D in the field has been in progress for more than 50 years. Some systems are available for use, but the majority cannot yet be considered perfect solutions. Divergent methodologies are available for MT, such as statistical, rule-based and hybrid approaches [14], but fully automated high-quality MT remains a target to be achieved. Among the available MT systems and services, Google Translate and Babel Fish [15] are the most famous. Google Translate supports English-to-Hindi translation and vice versa. MT research in Indian languages (IL) has been very active since the early 1970s. AnglaBharati [16] and Anusaaraka [17] are two major approaches developed in the early days and still in active development. Other systems such as Sampark [18] and UNL-based machine translation systems [19] are also available. The TDIL program of the Govt. of India provides extensive support to MT research in India. Apart from the systems mentioned above, there are some other IL MT initiatives.

Some FOSS-based solutions are also available for MT system development. Two well-known frameworks are Moses [20] and Apertium [21]. Moses follows the statistical paradigm of MT, while Apertium is rule-based. MT researchers in India have also come forward to work with these two frameworks. Hopefully this will boost MT research in India too.
[11] www.tdil.mit.gov.in
[12] http://indlinux.org/ http://smc.org.in/ http://wiki.services.openoffice.org/wiki/Dictionaries
[13] http://hunspell.sourceforge.net http://en.wikipedia.org/wiki/MySpell
[14] http://www.hutchinsweb.me.uk/IntroMT-TOC.htm
[15] http://babelfish.yahoo.com/
[16] http://www.cse.iitk.ac.in/users/langtech/anglabharti.htm
[17] http://ltrc.iiit.ac.in/~anusaaraka/
[18] http://sampark.iiit.ac.in/
[19] http://www.springerlink.com/content/t1005w166746727l/fulltext.pdf
[20] www.statmt.org/moses
[21] www.apertium.org
Search and IR

Search Engines (SE) and Information Retrieval (IR) systems are the HLT tools most widely used by the general public. Google [22], Yahoo [23] and Bing [24] are the three most famous search engine giants in the world. Revolutionary developments are occurring in this field. Domain-based search such as 'patent search', content-based search such as 'video search', localized search such as 'movie timings', and cross-lingual search are the recent trends. The latest development is Semantic Search, which is discussed in a later section of the paper. All the major search engines are now capable of handling local language search requests too. Cross-Lingual Information Retrieval (CLIR) systems for Indian languages are in development.

3 Speech Processing

This section gives a brief survey of developments in speech processing. The main technologies discussed are Text-to-Speech (TTS) systems and Automatic Speech Recognition (ASR).

TTS

A Text-to-Speech (TTS) system is software that converts electronic text into the corresponding speech. The field involves both text processing and signal processing techniques. R&D in this direction has produced promising and acceptable solutions, and both FOSS-based and proprietary solutions are available now. The major FOSS-based frameworks available for TTS system development are Festival [25] and Festvox [26]. The introduction of both frameworks boosted the development of TTS in various languages, including Indian languages. The most remarkable development in Indian language TTS under FOSS is the Dhvani project [27]. Even though we can say that significant growth has been achieved in TTS development, more challenges remain. These include giving the synthesized voice more naturalness, intonation, and emotion-based TTS.
ASR

Automatic Speech Recognition (ASR) is a technology that allows a computer program to identify and transcribe the words that a person speaks into a microphone. Like TTS, ASR involves both text processing and signal processing techniques. It is one of the most challenging and interesting tasks in HLT, and there have been significant developments in this field too. ASR systems are available for some Indian languages such as Hindi [28] and Telugu. The most widely used FOSS-based framework for ASR development is CMU Sphinx [29]. Its introduction opened a new direction in ASR R&D. Apart from CMU Sphinx, some other FOSS-based as well as proprietary frameworks are available for ASR development.

[22] http://www.google.co.in/
[23] www.yahoo.co.in
[24] http://www.bing.com/
[25] http://www.cstr.ed.ac.uk/projects/festival/
[26] http://festvox.org/festival/
[27] http://dhvani.sourceforge.net/

4 Future of HLT

Over the past few decades colossal progress has been made in the field of HLT. Systems ranging from simple ones that can understand numbers to text understanding and summarization systems have been developed in that time. Many challenges remain to be addressed in the future. Hopefully we can build complex systems from the existing HLT systems. These developments are the result of a long journey from lab experiments to deployment in real work environments. The wide range of tools and technologies developed as part of R&D in HLT is capable of making a deep impact on human life, and these tools have great relevance in a market-oriented society. What will the future be? Can we imagine it? Yes! Imagine that you ask your car to show the route from Mysore bus stand to the Central Institute of Hindi, and it speaks the directions or gives a detailed printout describing the route. In fact this is not a dream technology; it is possible by combining technologies such as GPS (Global Positioning System) and speech processing. Suppose that a judge analyzes the arguments related to a case with software and reaches a judgment.
Or consider a legislative assembly that publishes draft bills on its website and receives comments on them. After receiving the comments, and before proceeding to further action, it analyzes them to find how many are positive and how many are negative! This is already possible. The technology that analyzes opinion is called 'Sentiment Analysis'. There is no end to such imaginings, but they will become reality very soon. This section highlights some of the future technologies and R&D areas in HLT.

[28] http://sourceforge.net/projects/hindiasr/
[29] http://www.speech.cs.cmu.edu/sphinx/

Semantic Web/Search

Semantics is the branch of modern linguistics that studies the structure of meaning. The Semantic Web (SeW) is an evolving development of the World Wide Web in which the meaning (semantics) of information and services on the web is defined, making it possible for the web to "understand" and satisfy the requests of people and machines to use the web content [30]. Tim Berners-Lee, the father of the WWW [31], is the inventor of this technology. The World Wide Web Consortium (W3C) is the authority that publishes and maintains standards and recommendations on SeW. Semantic-web-based HLT implementations are going to bring a big revolution in the coming years. Semantic Search is one such technology that HLT people are discussing nowadays. SeW search engines already exist [32], but they are not widely accepted as of now. Semantic Search will bring revolutionary changes in fields such as online publishing, e-governance and e-commerce.

Sentiment Analysis

Sentiment analysis or opinion mining refers to a broad (definitionally challenged) area of natural language processing, computational linguistics and text mining [33]. The basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive, negative or neutral [34]. The rise of social media such as blogs, Twitter, Facebook and LinkedIn has fueled great interest in sentiment analysis. Publishers, movie companies and fast-moving consumer goods (FMCG) companies are the main consumers of this technology. It is already present in the market, and very soon it will find its own place in politics, governance and other areas.

Future of MT

In a previous section we discussed the developments in MT research. Remarkable achievements have been made in this direction, but we still have to address many issues to achieve the goal of Fully Automated High Quality Machine Aided Translation (FAHQMAT). Another expectation is to build efficient speech-to-speech translation systems. I think that within a few years our researchers will provide revolutionary solutions in this field.
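Returning briefly to the polarity classification task described under Sentiment Analysis, and to the earlier scenario of counting positive and negative comments on a draft bill, the idea can be sketched in a few lines of Python. This is only a toy, lexicon-based illustration: the word lists, the function names `polarity` and `tally`, and the sample comments are assumptions made here and do not come from any system discussed in this paper. Practical systems usually rely on much larger lexicons or on trained classifiers.

```python
import re
from collections import Counter

# Hypothetical, hand-made word lists (assumptions for illustration only).
POSITIVE = {"good", "great", "excellent", "useful", "support", "welcome"}
NEGATIVE = {"bad", "poor", "terrible", "useless", "oppose", "harmful"}


def polarity(text):
    """Classify a text as 'positive', 'negative' or 'neutral'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # +1 for every positive word, -1 for every negative word.
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally(comments):
    """Count how many comments fall into each polarity class."""
    return Counter(polarity(c) for c in comments)


if __name__ == "__main__":
    comments = [
        "This bill is excellent and useful",
        "A harmful, poor draft",
        "The bill has twelve sections",
    ]
    print(tally(comments))
```

A word-counting approach like this ignores negation, sarcasm and context, which is one reason the area is described above as definitionally challenged.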
HLT in Education

Computer Assisted Teaching (CAT) is already in practice throughout the globe. It is considered one of the best ways to achieve effective and interactive teaching. HLT techniques such as ASR, TTS, morphological synthesis, parsing and MT can be used for interactive language teaching, especially second language teaching. With the help of HLT we can build online systems that teach a second language and evaluate the student's progress without the intervention of a human instructor.

[30] Berners-Lee, Tim; James Hendler and Ora Lassila (May 17, 2001). "The Semantic Web". Scientific American Magazine. http://www.sciam.com/article.cfm?id=the-semanticweb&print=true. Accessed March 26, 2008.
[31] World Wide Web
[32] www.hakia.com
[33] http://en.wikipedia.org/wiki/Sentiment_analysis
[34] http://en.wikipedia.org/wiki/Sentiment_analysis

HLT in Bio-Medical Research

HLT techniques such as Named Entity Recognition [35] (NER), SeW and text mining [36] are widely used in bio-medical research. This field of research is now called Bio-medical Natural Language Processing (BioNLP).

HLT in Forensic Science

Another vital area to which HLT is going to be applied is forensic science. HLT techniques are very useful for resolving authorship disputes and disputes over meaning and use, identifying the authors of anonymous texts, identifying cases of plagiarism [37], reconstructing mobile phone text conversations, and so on.

HLT for Business

It is well known that without search engines web pages have no existence, and without advertisements business has none either. The emergence of new media paved the way for online advertising techniques. The marriage of IR and other HLT techniques with online advertising gave birth to a new field called 'Computational Advertising'. It helps advertisers place their advertisements appropriately, according to the tastes of consumers. Another vital business-oriented area of R&D is 'Collective Intelligence' [38], in which a wide range of HLT techniques are used. It helps service providers such as online stores give product recommendations to a consumer based on his or her purchasing behavior and taste. This is achieved by comparing and analyzing the purchasing behavior and tastes of customers who share similar tastes. So remember: whenever you receive context-relevant advertising or a product recommendation, the power of HLT is there!
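The recommendation idea described above, comparing a consumer with other customers who share similar tastes, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how any particular online store works: the customer names, the purchase data and the function names are invented, and Jaccard overlap between purchase sets is just one simple choice of similarity measure.

```python
def jaccard(a, b):
    """Similarity of two purchase sets: shared items / all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases, top_n=3):
    """Suggest items bought by customers whose purchases resemble the target's."""
    own = purchases[target]
    scores = {}
    for customer, items in purchases.items():
        if customer == target:
            continue
        sim = jaccard(own, items)
        if sim == 0:
            continue  # nothing in common, so this customer is ignored
        for item in items - own:
            # Items owned by more similar customers accumulate higher scores.
            scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical purchase histories (assumption for illustration).
    purchases = {
        "asha": {"dictionary", "grammar book"},
        "bala": {"dictionary", "grammar book", "thesaurus"},
        "chitra": {"dictionary", "novel"},
        "dev": {"cookbook"},
    }
    print(recommend("asha", purchases))
```

In this toy data the customer who shares two purchases with the target contributes more weight than the one who shares only one, so that customer's extra item is ranked first.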
[35] http://en.wikipedia.org/wiki/Named_entity_recognition
[36] http://en.wikipedia.org/wiki/Text_mining
[37] http://en.wikipedia.org/wiki/Plagiarism
[38] http://en.wikipedia.org/wiki/Collective_intelligence
  • 9. 5 Issues in HLT

The developments in HLT during the past few years are quite promising, and the technologies that are slowly leaving the lab and coming into practice are quite exciting. Still, many research issues remain. This section discusses some selected issues in HLT, with special focus on Indian Language Technology.

A large number of Language Technology based products are coming into the market. How can these technology products be evaluated? Many techniques, such as EAGLES [39], have evolved for evaluating LT projects and products. But most of these evaluation methodologies are not well suited to handling the linguistic phenomena of Indian languages. A typical example is MT evaluation. BLEU and METEOR are the two major methodologies for evaluating MT, but neither is very effective for evaluating MT between English and Indian languages [40].

Another vital issue in evaluating an HLT project or product is the availability of data for testing the tools. For example, evaluating an MT system requires reference translation sets. In a way, a reference translation set is simply a parallel corpus, but it must also meet certain quality requirements: it should cover the different syntactico-semantic phenomena of both the source language and the target language. Such data sets and standards, especially publicly available ones, are lacking for Indian Language Technology. India also has no defined standards body or policy for evaluating HLT projects and products.

In short, the issues in HLT can be classified into three broad areas: 1) the development challenges, which involve algorithm development, baffling issues in language, etc.; 2) the availability of resources and standards in the public domain; 3) the evaluation problem.
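To make the MT evaluation problem more concrete, the sketch below computes the clipped ("modified") n-gram precision that n-gram overlap metrics such as BLEU are built on. It is a simplified illustration only: it omits the brevity penalty and the combination of several n-gram orders, and the function names are hypothetical. The transliterated Hindi sentences are an assumed example of two word orders with essentially the same meaning; they are not taken from the cited paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


# Assumed example: same words, different (but acceptable) word order.
reference = "ram ne seb khaya".split()
candidate = "seb ram ne khaya".split()

unigram = modified_precision(candidate, reference, 1)
bigram = modified_precision(candidate, reference, 2)
print(unigram, bigram)
```

Here every word of the candidate appears in the reference, so the unigram precision is perfect, yet only one of the three candidate bigrams matches. A metric that relies on higher-order n-gram matches can therefore penalize a reordering that a human reader might accept, which is one plausible reason such metrics fit relatively free word order languages poorly.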
A detailed discussion of these issues is beyond the scope of this paper.

6 Conclusion

Many resources and tools have been developed in HLT in the past few years. The developments in the field are quite promising, and so is its future. As discussed at the beginning of this paper, we can hope that all ICT tools will be powered by HLT in the future. At the same time, we cannot forget the challenges and issues involved in the field. To solve the major issues in HLT, especially in the Indian language scenario, improved policies and standards might be introduced in the near future to boost R&D activities in the field.

39 http://www.issco.unige.ch/en/research/projects/ewg95//ewg95.html
40 http://www.cse.iitb.ac.in/˜b/papers/icon07bleu.pdf