This document summarizes developments in the Open-Tamil library, an open-source library for Tamil text processing. It describes how the library enables word games and spell-checking applications. It also gives code and test statistics for the library, and discusses ongoing work on testing and documentation that aims to improve quality and encourage more developers to use and contribute to the library.
Developments in Open-Tamil Library for Text Processing and Word Games
1. Developments in Open-Tamil Library
T. Arulalan, T. Shrinivasan+, and A. Muthiah*
INFITT 2016, Dindigul, Tamil Nadu
+tshrinivasan@gmail.com
*ezhillang@gmail.com
2. Introduction
● Helps create high-level applications in Tamil
● Fully open-source
● Published and maintained since 2014
● Available via the Python Package Index (pip)
● Developed via Git
● Many contributions from 10 developers
● Library for Tamil text processing
● Word games
● Encoding conversions
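Tamil text processing usually starts by splitting a word into Tamil letters rather than Unicode code points, because a letter such as வா is stored as a consonant followed by a dependent vowel sign. The Python 3 sketch below illustrates the idea without using Open-Tamil itself: it attaches every combining mark (vowel signs and the pulli) to the preceding character. The name `tamil_letters` is hypothetical, and Open-Tamil ships its own, more complete routines.

```python
import unicodedata


def tamil_letters(text):
    """Split text into Tamil letters.

    Hypothetical helper: each combining mark (Unicode category Mn/Mc, which
    covers Tamil vowel signs and the pulli) is attached to the preceding
    character. Special cases are not handled; e.g. the ligature க்ஷ comes
    back as two letters, க் and ஷ.
    """
    letters = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and letters:
            letters[-1] += ch  # vowel sign or pulli joins the consonant
        else:
            letters.append(ch)
    return letters


if __name__ == "__main__":
    word = "சவால"
    print(len(word), tamil_letters(word))
```

For example, the four code points of சவால become three letters: ச, வா, ல.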
3. Spell checker based on OpenTamil
Integration of the solthiruthi spell checker from Open-Tamil with the TinyMCE web editor
Work in progress
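The slides do not describe how solthiruthi finds corrections. As a minimal sketch of the general technique, a dictionary-based checker can accept a word found in its word list and otherwise suggest words within a small edit distance, counting edits in Tamil letters rather than code points. Everything below (`check_word`, `edit_distance`, `tamil_letters`, the tiny word list) is hypothetical and is not the solthiruthi API.

```python
import unicodedata


def tamil_letters(text):
    # Attach combining marks (vowel signs, pulli) to the preceding character.
    letters = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of letters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


def check_word(word, dictionary, max_distance=1):
    """Return (is_correct, suggestions).

    Hypothetical helper: distances are counted in Tamil letters,
    not in Unicode code points.
    """
    if word in dictionary:
        return True, []
    letters = tamil_letters(word)
    suggestions = sorted(
        w for w in dictionary
        if edit_distance(letters, tamil_letters(w)) <= max_distance
    )
    return False, suggestions


if __name__ == "__main__":
    words = {"சவால", "வாசல்", "வால்"}  # tiny illustrative word list
    print(check_word("சவாள", words))
```

A web-editor integration such as the TinyMCE one would call a checker like this for each word and underline those that come back as incorrect.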
4. Word Search Games
Game generated using the Open-Tamil library
Find the world leaders from the 1950s
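The slides show only the finished puzzle, not how the generator works. A minimal sketch of one common approach, assuming words are placed only across or down: each grid cell holds one Tamil letter (not one code point), words may overlap only where their letters agree, and the remaining cells are filled with random letters drawn from the words. `make_word_search` and `tamil_letters` are hypothetical names, not Open-Tamil functions.

```python
import random
import unicodedata


def tamil_letters(text):
    # Attach combining marks (vowel signs, pulli) to the preceding character.
    letters = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def make_word_search(words, size, seed=None, max_tries=200):
    """Place each word across or down in a size x size grid of Tamil letters.

    Hypothetical sketch: overlaps are allowed only where letters agree, and
    empty cells are filled with random letters drawn from the words.
    """
    rng = random.Random(seed)
    grid = [[None] * size for _ in range(size)]
    for word in words:
        letters = tamil_letters(word)
        if len(letters) > size:
            raise ValueError("word longer than grid: %r" % word)
        for _ in range(max_tries):
            dr, dc = rng.choice([(0, 1), (1, 0)])  # across or down
            r = rng.randrange(size - dr * (len(letters) - 1))
            c = rng.randrange(size - dc * (len(letters) - 1))
            cells = [(r + dr * i, c + dc * i) for i in range(len(letters))]
            if all(grid[y][x] in (None, letter)
                   for (y, x), letter in zip(cells, letters)):
                for (y, x), letter in zip(cells, letters):
                    grid[y][x] = letter
                break
        else:
            raise ValueError("could not place %r" % word)
    filler = [letter for w in words for letter in tamil_letters(w)]
    return [[cell or rng.choice(filler) for cell in row] for row in grid]


if __name__ == "__main__":
    for row in make_word_search(["சவால", "வாசல்", "வால்"], 6, seed=7):
        print(" ".join(row))
```

Passing a fixed `seed` makes a puzzle reproducible, which helps when the same grid has to be printed with an answer key.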
5. Document Statistics
$ python solpattiyal.py document.txt
Outputs:
1. Number of unique words in the document
2. Word frequency
3. Words sorted in Tamil dictionary order
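The three outputs listed above can be approximated in a few lines. This is a minimal sketch, not the code of solpattiyal.py: it assumes a word is any run of characters that are not whitespace or common punctuation, and it sorts by Unicode code point, which only approximates Tamil dictionary order (for example, Unicode places ன directly after ந, whereas the traditional order puts ன last). The function name `document_stats` is hypothetical.

```python
import re
import sys
from collections import Counter


def document_stats(text):
    """Return (unique word count, word frequencies, sorted unique words).

    Hypothetical sketch: words are runs of non-space, non-punctuation
    characters; sorting is by code point, not true Tamil collation.
    """
    words = re.findall(r"[^\s.,;:!?\"'()\[\]]+", text)
    freq = Counter(words)
    return len(freq), freq, sorted(freq)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python document_stats.py document.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        count, freq, ordered = document_stats(f.read())
    print("unique words:", count)
    for word, n in freq.most_common():
        print(word, n)
    print("\n".join(ordered))
```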
6. Open-Tamil
● Generate anagrams
● Generate combinations of words
● Partial words
● Check whether a word is a palindrome
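# List the TamilVU dictionary words that can be formed from the letters of a word
from tamil import wordutils
from solthiruthi.dictionary import DictionaryBuilder, TamilVU

TVU_dict = DictionaryBuilder.create(TamilVU)  # load the Tamil VU word list
word = u'சவால'
q = list(wordutils.combinagrams(word, TVU_dict))
print(u'|'.join(q))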
This prints: ச|வா|வால|வாசல|சவால
Ref: see the ezhillang.wordpress.com blog
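A palindrome check for Tamil has to compare letters, not code points: reversing the code points of விகடகவி would move the vowel sign ி away from its consonant, while reversing its letters (வி, க, ட, க, வி) gives the same word back. The sketch below is a hypothetical illustration of that idea, not Open-Tamil's own palindrome function.

```python
import unicodedata


def tamil_letters(text):
    # Attach combining marks (vowel signs, pulli) to the preceding character.
    letters = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def is_palindrome(word):
    """True if the word reads the same letter-by-letter in both directions.

    Hypothetical helper: compares Tamil letters, not Unicode code points.
    """
    letters = tamil_letters(word)
    return letters == letters[::-1]


if __name__ == "__main__":
    for w in ["விகடகவி", "மாமா", "சவால"]:
        print(w, is_palindrome(w))
```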
7. Word Play - Tamil Anagrams
We can compute anagrams in Tamil,
e.g. using the TVU word list (see the ezhillang blog):
https://ezhillang.wordpress.com/2015/07/27/open-tamil-anagrams-in-tamil-vu-word-list/
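Two words are anagrams when they use the same multiset of Tamil letters, so a whole word list can be grouped in one pass by using the sorted tuple of letters as a key. The sketch below assumes a tiny in-memory word list instead of the TVU list used on the blog, and `anagram_groups` is a hypothetical name. With the words from the previous slide, சவால and வாசல end up in the same group.

```python
import unicodedata
from collections import defaultdict


def tamil_letters(text):
    # Attach combining marks (vowel signs, pulli) to the preceding character.
    letters = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def anagram_groups(words):
    """Group words made of the same Tamil letters.

    The key is the sorted tuple of letters, so anagrams share a key.
    Only groups with at least two words are returned.
    """
    groups = defaultdict(list)
    for w in words:
        groups[tuple(sorted(tamil_letters(w)))].append(w)
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    print(anagram_groups(["சவால", "வாசல", "வால"]))
```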
8. Open-Tamil – Java library
● Available for use in Java
● Build Tamil apps easily
9. Quality
● The Open-Tamil project is developed on GitHub (www.github.com)
● Approximately 16k LOC in the latest development repository: tamil (13,579 LOC), solthiruthi (1,594 LOC), and ngram (187 LOC)
● Over 208 unit tests (2,705 LOC) that test our source-code modules, including tamil
● Every source-code check-in on GitHub triggers the continuous-integration tests via Travis CI
● Supported Python flavors: v2.6, v2.7, v3.3, v3.5, and PyPy
● Java and Ruby tests are run manually
● GitHub workflow
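For contributors new to the test suite, a unit test written with Python's standard `unittest` module might look like the sketch below. It tests a hypothetical letter-splitting helper defined inline and is not copied from Open-Tamil's actual test files; saved as a module, it can be run with `python -m unittest`.

```python
import unicodedata
import unittest


def tamil_letters(text):
    # Hypothetical helper under test: attach combining marks
    # (vowel signs, pulli) to the preceding character.
    letters = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


class TamilLettersTest(unittest.TestCase):
    def test_vowel_sign_joins_consonant(self):
        self.assertEqual(tamil_letters("சவால"), ["ச", "வா", "ல"])

    def test_pulli_joins_consonant(self):
        self.assertEqual(tamil_letters("வால்"), ["வா", "ல்"])

    def test_empty_string(self):
        self.assertEqual(tamil_letters(""), [])
```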
10. Conclusions
● Further contributions are needed to document the library and write tutorials
● Improve quality: test and report bugs
● More students and developers can use this library to build high-level applications
● We are seeking project support and sponsors