Applied Research Council (ARC), formerly YRIF, was founded in 2004 to provide a medium for the exchange of knowledge. We encourage students to demonstrate their indigenous technical skills and thereby become scientists and technologists. We provide a platform for everyone to showcase their technical and research skills.


Tamil TTS

An open-source Text-to-Speech (TTS) engine is to be developed for the Tamil language that reads out Unicode-encoded Tamil text documents. The TTS is based on a library of phonemes and their n-grams, chained optimally to give better Tamil speech synthesis.
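As a first step towards such a unit library, Unicode Tamil text has to be cut into units and their n-grams counted. The sketch below is only an illustration under stated assumptions: it treats a "unit" as a written character (a base letter plus any attached vowel sign or pulli), not a true phoneme, and the function names `tamil_units` and `unit_ngrams` are hypothetical. A real TTS would still need letter-to-sound rules and the optimal chaining step.

```python
import unicodedata
from collections import Counter


def tamil_units(word):
    """Split one word into written units: a base character followed by any
    combining marks (vowel signs, pulli). E.g. "தமிழ்" -> ["த", "மி", "ழ்"]."""
    units = []
    for ch in word:
        # Unicode categories Mn/Mc are combining marks; attach them to the
        # preceding base character instead of starting a new unit.
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def unit_ngrams(text, n=2):
    """Count n-grams of written units within each whitespace-separated word."""
    counts = Counter()
    for word in text.split():
        units = tamil_units(word)
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    print(tamil_units("தமிழ்"))  # ['த', 'மி', 'ழ்']
    print(unit_ngrams("தமிழ் தமிழ்"))
```

Counting n-grams only inside words keeps word boundaries (pauses) out of the unit inventory; cross-word n-grams could be added later if the synthesiser needs them.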

Audio Language Detection

Given an audio snippet, be it speech, song or anything else, the objective is to detect the language being spoken or sung in it. We want to explore the sound properties of Indian languages and use them to detect the language reliably. For example, languages like Oriya and Bengali have a lot of "O" sounds, while languages like Kannada have a lot of "ha" sounds, and their words often end with the short "a" (kuril) sound.
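The idea of telling languages apart by how often characteristic sounds occur can be sketched as a phonotactic classifier. This sketch assumes that some phone recogniser (not shown, and not specified in this project description) has already turned the audio into a sequence of phone labels. Each language is modelled by add-one-smoothed phone frequencies, and the language giving the highest log-likelihood wins. The names `train_profiles` and `detect_language`, the labels `lang_A`/`lang_B` and the training sequences are hypothetical toy inputs, not real language statistics.

```python
import math
from collections import Counter


def train_profiles(labelled):
    """labelled: {language: [phone sequence, ...]} -> {language: phone counts}.
    The phone sequences are assumed to come from some phone recogniser."""
    return {lang: Counter(p for seq in seqs for p in seq)
            for lang, seqs in labelled.items()}


def detect_language(phones, profiles):
    """Return the language whose add-one-smoothed phone unigram model gives
    the observed phone sequence the highest log-likelihood."""
    # Shared phone inventory, including phones unseen in training.
    vocab = set(phones).union(*profiles.values())
    best_lang, best_score = None, -math.inf
    for lang, counts in profiles.items():
        total = sum(counts.values())
        score = sum(math.log((counts[p] + 1) / (total + len(vocab)))
                    for p in phones)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


if __name__ == "__main__":
    # Toy data: lang_A is "o"-heavy, lang_B is "ha"-heavy.
    profiles = train_profiles({
        "lang_A": [["o", "k", "o", "l", "o"]],
        "lang_B": [["h", "a", "l", "a", "h", "a"]],
    })
    print(detect_language(["o", "l", "o"], profiles))  # lang_A
```

Unigram counts capture only "how much of each sound"; phone bigrams or trigrams would also capture patterns such as word-final short "a".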

Latent Dirichlet Allocation (LDA)

Latent Dirichlet allocation (LDA) is a topic model for text or other discrete data. LDA lets you analyze a corpus and extract the topics that combine to form its documents. It is a generative probabilistic model for collections of discrete data such as text corpora: a three-level hierarchical Bayesian model in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. This project is currently hosted at SourceForge.net under YRIF projects.
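To make the model concrete, here is a minimal sketch of fitting LDA with collapsed Gibbs sampling. This is one common inference technique (the original LDA paper used variational inference) and is not necessarily what the SourceForge project implements. The function names, the hyperparameter defaults `alpha=0.1`, `beta=0.01` and the toy corpus are assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.
    docs: list of documents, each a list of word tokens.
    Returns (doc-topic counts, topic-word counts, vocabulary)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    n_k = [0] * K                        # total tokens assigned to topic k
    z = []                               # z[d][i]: topic of token i in doc d

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # p(z = t | all other assignments), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][wid] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1
    return n_dk, n_kw, vocab


def top_words(n_kw, vocab, n=3):
    """The n most frequent words of each topic."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in n_kw]


if __name__ == "__main__":
    docs = [["rice", "paddy", "rain"], ["cricket", "bat", "ball"],
            ["rain", "paddy", "harvest"], ["ball", "bat", "match"]]
    n_dk, n_kw, vocab = lda_gibbs(docs, num_topics=2)
    print(top_words(n_kw, vocab))
```

The normalised rows of `n_dk` (plus `alpha`) estimate each document's topic mixture, and the rows of `n_kw` (plus `beta`) estimate each topic's word distribution.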

பல்லாங்குழி (Pallanghuzhi)

பல்லாங்குழி (Pallanghuzhi) is a traditional South Indian game played by women and children using pebbles and a wooden board. Pallanghuzhi is a two-player game; the software version can be played between networked computers, over the web, or as a standalone game against the computer. The game is to be developed in C++ on Linux as an open-source project, which we plan to license under the GPL. Students with a good background in C++, Linux and Qt/wxWidgets are invited.

Inter-Indian Language Translator

Inter-Indian Language Translator is a TUI/GUI tool that converts text from any Indian language to any other Indian language. As the first stage of development we want to start with a Tamil-to-Telugu translator, which we expect to extend first to more target languages and then to more source languages. Students with a command of Linux, Perl/Python/C++, wxWidgets/Qt, and Tamil and/or Telugu are invited.

Indian Language Stemmer

Stemming is the process of reducing an inflected word form to its root (stem), typically by stripping affixes. Stemming algorithms have been implemented for many languages, but not for Indian languages. We want to start the chain by implementing a stemming algorithm for Tamil and gradually extend it to other languages. Stemming plays a vital role in the development of information retrieval systems and text mining. The Porter algorithm is a popular English stemmer. Students with a linguistic background and programming experience in C/C++ are invited.
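In the spirit of Porter-style suffix stripping, a Tamil stemmer could start as longest-match suffix removal with a minimum stem length. The sketch below is deliberately tiny and hypothetical: the suffix list contains only the plural -கள் and accusative plural -களை, and `MIN_STEM` is an assumed crude stand-in for Porter's conditions on the stem. It is not a proposed Tamil stemming algorithm.

```python
# Hypothetical, deliberately tiny suffix list: plural -கள் and
# accusative plural -களை. A real Tamil stemmer needs far more suffixes.
SUFFIXES = ["களை", "கள்"]
MIN_STEM = 2  # hypothetical minimum stem length, in Unicode code points


def stem(word):
    """Strip the longest matching suffix, keeping at least MIN_STEM code points."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[:-len(suffix)]
    return word


if __name__ == "__main__":
    print(stem("மாடுகள்"))   # மாடு
    print(stem("மாடுகளை"))  # மாடு
    print(stem("மரங்கள்"))   # மரங் (the root is மரம்)
```

The last case shows why plain string stripping is not enough for Tamil: sandhi changes மரம் into மரங்- before -கள், so a real stemmer would also need rules that restore the root's final letter.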

வட்டெழுத்து (Grandham) OCR

South India has a wealth of temples bearing inscriptions written in an ancient script for the Tamil language called Grandham. Inscriptions written on palm leaves and stones take a rounded shape, without dots for most of the symbols. We intend to develop a tool, with an OCR plugin, that converts Grandham (வட்டெழுத்து) to modern Tamil and vice versa. Photographs of Grandham inscriptions could then be converted to modern Tamil writing using the proposed tool. Students inclined towards Tamil literature and with C++/Linux skills are invited to participate.

Plagiarism Detector

Plagiarism can be defined as the deliberate use of another person's work in your own work, as if it were your own, without adequate acknowledgement of the original source. If this is done in work that you submit for assessment, then you are attempting to deceive the examiners. In other words, plagiarism is cheating: trying to claim the credit for something that is not your work (as defined at http://helios.bto.ed.ac.uk). A plagiarism detector compares a test document against a reference document and gives a similarity score. When the score crosses a preset threshold, the detector raises a plagiarism alarm. The detector is to be implemented as a standalone GUI tool that should be able to learn similarity features from training documents. It should also implement similarity measurement based on rewritable rules. Once the standalone implementation works, the concept is to be ported to a web-based detector tool.
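The score-and-threshold part of the detector can be sketched with word shingles and Jaccard similarity. This is one common choice of similarity measure, swapped in here for illustration. It does not cover the learned similarity features or the rewritable rules described above. The shingle size `k=3`, the threshold `0.5` and the function names are hypothetical.

```python
import string


def shingles(text, k=3):
    """Set of overlapping k-word sequences (shingles), case-folded, with
    surrounding punctuation stripped from each word."""
    words = [w.strip(string.punctuation) for w in text.casefold().split()]
    words = [w for w in words if w]
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(test_doc, reference_doc, k=3):
    """Jaccard similarity of the two documents' shingle sets, in [0, 1]."""
    a, b = shingles(test_doc, k), shingles(reference_doc, k)
    if not (a or b):
        return 0.0
    return len(a & b) / len(a | b)


def plagiarism_alarm(test_doc, reference_doc, threshold=0.5, k=3):
    """True when the similarity score reaches the (hypothetical) threshold."""
    return similarity(test_doc, reference_doc, k) >= threshold


if __name__ == "__main__":
    print(similarity("a b c d", "a b c e"))  # 0.3333333333333333
```

Shingles make the score sensitive to copied word order rather than just shared vocabulary. In a trained detector, `k` and `threshold` would be tuned on the training documents instead of fixed by hand.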