Conversation with sarayu_anusha at 9/3/2007 1:15:52 PM on radusdirect (yahoo)

(1:15:52 PM) sarayu_anusha: good afternoon. sarayu here sir
(1:16:00 PM) Sudarsun: hi sarayu
(1:17:04 PM) Sudarsun: alright.
(1:18:34 PM) sarayu_anusha: so do we do the project wit text DB..?
(1:18:56 PM) Sudarsun: is the other lady beside you ?
(1:19:04 PM) sarayu_anusha: yes..
(1:19:06 PM) Sudarsun: good.
(1:19:12 PM) Sudarsun: here we go;
(1:19:52 PM) sarayu_anusha: ya
(1:19:57 PM) Sudarsun: given a database with multiple tables (decide on what type of application you want to work upon..), you are to develop an NL interface which would allow users to give their query in plain English.
(1:20:22 PM) Sudarsun: the NL interface would translate the NL query to SQL, run it on the DB and return the results in NL.
(1:20:49 PM) sarayu_anusha: ok.. how abt banking sir??
(1:20:51 PM) Sudarsun: let's say you have DB of songs {name, singer, type, album, year, etc }
(1:20:59 PM) Sudarsun: or videos..
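A minimal sketch of the pipeline described above (NL query to SQL, run on the DB, results returned as an NL sentence), assuming a hypothetical two-table version of the songs DB and Python's built-in sqlite3 module standing in for the eventual database server. The table layout, sample rows and the two regex query patterns are illustrative inventions, not part of the project spec; a real translator would need proper parsing rather than fixed patterns.

```python
import re
import sqlite3

# Hypothetical two-table layout for the songs DB {name, singer, type, album, year}:
# singers are split into their own table so queries need a JOIN.
SCHEMA = """
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE song (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    singer_id INTEGER REFERENCES singer(id),
    type TEXT, album TEXT, year INTEGER
);
"""


def build_db():
    """Create an in-memory DB with a few made-up rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO singer VALUES (?, ?)", [(1, "Singer A"), (2, "Singer B")])
    db.executemany(
        "INSERT INTO song VALUES (?, ?, ?, ?, ?, ?)",
        [(1, "Song One", 1, "melody", "Album X", 1999),
         (2, "Song Two", 1, "folk", "Album Y", 2003),
         (3, "Song Three", 2, "melody", "Album Z", 2003)],
    )
    return db


def nl_to_sql(query):
    """Translate one of two hypothetical NL patterns to (sql, params), else None."""
    m = re.search(r"songs (?:sung )?by (.+?)\??$", query, re.IGNORECASE)
    if m:
        return ("SELECT song.name FROM song JOIN singer ON song.singer_id = singer.id "
                "WHERE singer.name = ? ORDER BY song.name", (m.group(1),))
    m = re.search(r"songs (?:released )?in (\d{4})", query, re.IGNORECASE)
    if m:
        return ("SELECT name FROM song WHERE year = ? ORDER BY name", (int(m.group(1)),))
    return None


def answer(db, query):
    """NL query -> SQL -> rows -> NL sentence."""
    translated = nl_to_sql(query)
    if translated is None:
        return "Sorry, I could not understand the query."
    rows = [row[0] for row in db.execute(*translated)]
    if not rows:
        return "No matching songs were found."
    return "Found {} song(s): {}.".format(len(rows), ", ".join(rows))
```

Binding values through `?` placeholders, rather than pasting user text into the SQL string, keeps the translated query safe from SQL injection.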
(1:21:00 PM) sarayu_anusha: ok
(1:21:22 PM) Sudarsun: banking db would be interesting but it would be relatively difficult to build the database.
(1:21:41 PM) Sudarsun: if you work on Songs DB/Video DB, it becomes easier to appreciate the effect of NL.
(1:21:56 PM) sarayu_anusha: oh ok
(1:22:22 PM) Sudarsun: you will have to decide on what is the target application.
(1:22:32 PM) sarayu_anusha: video as in wit no annotations?
(1:22:35 PM) Sudarsun: songs db is just a suggestion, you are free to decide the target platform.
(1:24:37 PM) sarayu_anusha: ok we wil decide upon tat. college specifies an ieee based paper. we wont be having one now.
(1:25:54 PM) Sudarsun: there are a lot of papers which discuss NL interfaces to SQL DBs.
(1:26:04 PM) Sudarsun: you can find 10s of them..
(1:26:53 PM) sarayu_anusha: ok...
(1:27:38 PM) sarayu_anusha: what would the next step be.. how do we proceed..
(1:28:42 PM) Sudarsun: Here are the deliverables from your end:
(1:28:56 PM) Sudarsun: 1. Finalization of the target application
(1:29:18 PM) Sudarsun: 2. Plan to prepare the Database with multiple tables and schemas
(1:29:25 PM) sarayu_anusha: ok
(1:29:30 PM) Sudarsun: 3. Timeline, milestones for the same.
(1:30:08 PM) Sudarsun: 4. Learning exercise on Augmented Transition Networks (ATN)
(1:30:36 PM) sarayu_anusha: what is ATN
(1:30:50 PM) Sudarsun: 5. Preparation of an exhaustive collection of NL queries to query your application
(1:31:09 PM) sarayu_anusha: ok
(1:31:40 PM) Sudarsun: That would be all for stage 1.
(1:32:07 PM) sarayu_anusha: alright
(1:32:15 PM) Sudarsun: ATN is an extension of Finite State Automata (a recursive transition network with registers and tests on its arcs), which is generally used to parse sentences.
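To make the ATN idea concrete, here is a toy Augmented Transition Network in Python: two networks (S and NP) with CAT arcs that consume a word of a given category, PUSH arcs that run a sub-network, and a POP on reaching a final state, while registers record the subject, verb and object. The lexicon and grammar are hypothetical and tiny; full ATNs also attach arbitrary tests and actions to arcs, which this sketch leaves out.

```python
# Hypothetical toy lexicon: word -> syntactic category
LEXICON = {
    "the": "det", "a": "det",
    "old": "adj", "new": "adj",
    "singer": "noun", "song": "noun", "album": "noun",
    "sang": "verb", "sings": "verb",
    "john": "name", "mary": "name",
}

# Each network: start state, final (POP) states, and arcs per state.
# Arc = (kind, label, register, next_state); kind is "CAT" or "PUSH".
NETWORKS = {
    "S": {
        "start": "S0", "final": {"S2", "S3"},
        "arcs": {
            "S0": [("PUSH", "NP", "subject", "S1")],
            "S1": [("CAT", "verb", "verb", "S2")],
            "S2": [("PUSH", "NP", "object", "S3")],
        },
    },
    "NP": {
        "start": "NP0", "final": {"NP2"},
        "arcs": {
            "NP0": [("CAT", "det", "det", "NP1"), ("CAT", "name", "head", "NP2")],
            "NP1": [("CAT", "adj", "mods", "NP1"), ("CAT", "noun", "head", "NP2")],
        },
    },
}


def set_register(regs, reg, value):
    """Return a copy of regs with reg set; the 'mods' register accumulates adjectives."""
    new = dict(regs)
    if reg == "mods":
        new[reg] = regs.get(reg, []) + [value]
    else:
        new[reg] = value
    return new


def traverse(net_name, words, pos):
    """Yield (registers, next_pos) for every way net_name can match starting at words[pos]."""
    net = NETWORKS[net_name]

    def walk(state, pos, regs):
        if state in net["final"]:
            yield regs, pos  # POP: hand the filled registers back to the caller
        for kind, label, reg, nxt in net["arcs"].get(state, []):
            if kind == "CAT":
                # CAT arc: consume one word whose lexical category is `label`
                if pos < len(words) and LEXICON.get(words[pos]) == label:
                    yield from walk(nxt, pos + 1, set_register(regs, reg, words[pos]))
            elif kind == "PUSH":
                # PUSH arc: recursively run the sub-network and store its registers
                for sub_regs, sub_pos in traverse(label, words, pos):
                    yield from walk(nxt, sub_pos, set_register(regs, reg, sub_regs))

    yield from walk(net["start"], pos, {})


def parse(sentence):
    """Return the S registers of the first parse that covers every word, else None."""
    words = sentence.lower().split()
    for regs, end in traverse("S", words, 0):
        if end == len(words):
            return regs
    return None
```

Because `traverse` is a generator, a failed path simply falls through to the next alternative, which gives the backtracking behaviour ATN parsers rely on.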
(1:32:29 PM) sarayu_anusha: oh ok
(1:35:17 PM) sarayu_anusha: then we will search for the ieee papers and get back to you sir..
(1:35:26 PM) Sudarsun: sure.
(1:36:24 PM) Sudarsun: but i need a writeup as well about the entire project, briefing on what you are intending to do { the input, the business logic, the output, assumptions, limitations, etc }
(1:37:01 PM) Sudarsun: you should probably get hold of this book "Natural Language Understanding, James Allen"
(1:37:30 PM) Sudarsun: where are you guys put up ?
(1:37:41 PM) sarayu_anusha: ok.. we will do that.. but we will need a base paper for that.
(1:37:52 PM) sarayu_anusha: sarayu-nandanam
(1:38:01 PM) sarayu_anusha: sahana-vadapalani
(1:38:15 PM) Sudarsun: alright
(1:39:05 PM) Sudarsun: http://www.freepatentsonline.com/5197005.html
(1:40:16 PM) Sudarsun: http://portal.acm.org/citation.cfm?id=973927.973928
(1:41:43 PM) Sudarsun: http://www.cs.umu.se/education/examina/Rapporter/RafalPiotrowski.pdf
(1:41:54 PM) Sudarsun: http://www.elfsoft.com/Resources/hons_9904.pdf
(1:43:07 PM) Sudarsun: you can find a lot of references..
(1:43:20 PM) Sudarsun: you need to do a lot of homework girls.. :)
(1:43:33 PM) sarayu_anusha: sure sir
(1:43:38 PM) Sudarsun: good
(1:43:52 PM) Sudarsun: who is the team leader ?
(1:44:14 PM) sarayu_anusha: you mean guide??
(1:44:23 PM) Sudarsun: no, team leader amongst you both.
(1:44:30 PM) Sudarsun: by the by, who is your guide ?
(1:44:39 PM) Sudarsun: Saba or Sridhar ?
(1:44:53 PM) sarayu_anusha: both equally
(1:45:04 PM) Sudarsun: never works that way.. ;)
(1:45:08 PM) sarayu_anusha: saba sir is our coordinator
(1:45:16 PM) Sudarsun: who is the guide then ?
(1:45:23 PM) sarayu_anusha: girija mam is our guide
(1:45:33 PM) Sudarsun: hmm...
(1:46:00 PM) sarayu_anusha: then u can be the leader sir:)
(1:46:29 PM) Sudarsun: i cannot be your leader, i am the mentor.
(1:47:16 PM) sarayu_anusha: we r thankful 4 tat
(1:47:51 PM) Sudarsun: meanwhile, I want you guys to learn c++, c, lex and yacc.
(1:48:14 PM) Sudarsun: or perl / python
(1:48:50 PM) sarayu_anusha: we r only familiar wit c and c++
(1:49:25 PM) Sudarsun: then, get familiar with lex, yacc, perl, python..
(1:49:42 PM) Sudarsun: if you do the entire project in perl / python, there is no real need for lex/yacc.
(1:49:52 PM) Sudarsun: if the project is done in c++, you need lex/yacc.
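If the project goes the Python route, the scanning job that lex would do can be approximated with the standard `re` module. This sketch tokenizes an NL query into (kind, text) pairs using one combined regex of named groups; the token classes are hypothetical choices for a songs DB. Unlike lex, which picks the longest match, Python's alternation takes the first listed pattern that matches, so rule order matters here (YEAR is listed before NUMBER).

```python
import re

# Hypothetical token classes for NL queries; order matters, as with lex rules
TOKEN_SPEC = [
    ("YEAR", r"\b(?:19|20)\d{2}\b"),
    ("NUMBER", r"\b\d+\b"),
    ("QUOTED", r'"[^"]*"'),
    ("WORD", r"[A-Za-z]+(?:'[A-Za-z]+)?"),
    ("PUNCT", r"[?.,!]"),
    ("SPACE", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, text) pairs, skipping whitespace, like a lex scanner loop."""
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup  # name of the named group that matched
        if kind == "SPACE":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {m.group()!r} at {m.start()}")
        value = m.group()
        if kind == "QUOTED":
            value = value[1:-1]  # strip the surrounding quotes
        tokens.append((kind, value))
    return tokens
```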
(1:50:01 PM) sarayu_anusha: oh ok
(1:50:08 PM) Sudarsun: do you have computers at home ?
(1:50:16 PM) sarayu_anusha: yes sir we do
(1:50:35 PM) Sudarsun: what is the configuration ?
(1:51:21 PM) sarayu_anusha: 256mb RAM
(1:51:35 PM) sarayu_anusha: 80gb hard disk
(1:51:54 PM) sarayu_anusha: p4
(1:52:04 PM) Sudarsun: that's pretty old.
(1:52:38 PM) Sudarsun: do you people know php ?
(1:53:15 PM) sarayu_anusha: what configuration would be required
(1:53:22 PM) sarayu_anusha: no sir
(1:53:33 PM) Sudarsun: hmm.. if you learn php, it should be very useful.
(1:53:47 PM) Sudarsun: here is how i am visualizing the project.
(1:54:07 PM) sarayu_anusha: wat is php exactly?
(1:54:37 PM) Sudarsun: browser <-> php engine <-> php-c++ module <-> mysql server
(1:54:41 PM) Sudarsun: what!!??
(1:55:05 PM) sarayu_anusha: ok
(1:56:10 PM) Sudarsun: www.php.net
(1:58:42 PM) Sudarsun: ok, if php is too much for you, you may limit it to the c++ module alone.
(1:59:10 PM) Sudarsun: where you may give the nl query in the console and get the results shown on the same screen.
(1:59:51 PM) Sudarsun: i would expect you to use visual studio 6.0 for your c++ development, not your turbo c++
(1:59:57 PM) sarayu_anusha: we have not learnt abt php..
(2:00:11 PM) sarayu_anusha: ok
(2:00:21 PM) Sudarsun: it's not your mistake... it is the education system which is screwed up.
(2:01:33 PM) sarayu_anusha: what do you suggest.. is it better with php..we are ready to learn but have to start from scratch..
(2:02:05 PM) Sudarsun: no, lets get the c++ module working.
(2:02:17 PM) sarayu_anusha: ok..
(2:16:50 PM) Sudarsun: i am breaking for lunch. talk to you soon.
(2:17:07 PM) sarayu_anusha: ok sir
(2:40:45 PM) Sudarsun: i am back
(2:41:43 PM) sarayu_anusha: we are searching for base papers.. not found an appropriate one yet
(2:41:56 PM) Sudarsun: what about the links that i gave ?
(2:42:21 PM) sarayu_anusha: looking into them too
(2:42:35 PM) Sudarsun: ok
(2:46:37 PM) sarayu_anusha: one of the links about video annotations..
(2:46:46 PM) Sudarsun: yes i know.
(2:50:02 PM) Sudarsun: when you write any emails to me, please cc to projects@arc.sudarsun.in as well.
(2:51:51 PM) sarayu_anusha: ok
(2:52:07 PM) Sudarsun: what are your email ids ?
(2:52:37 PM) sarayu_anusha: sarayu_anusha@yahoo.co.in
(2:52:47 PM) Sudarsun: i am adding your emails to my mailing list projects@arc.sudarsun.in
(2:52:51 PM) sarayu_anusha: sahana_badal@yahoo.com
(2:53:09 PM) sarayu_anusha: ok..
(2:54:02 PM) Sudarsun: you would have received a mail now.
(2:54:19 PM) sarayu_anusha: ya got it
(2:58:42 PM) sarayu_anusha: the first link seems fine.. is it again ieee?
(2:59:06 PM) Sudarsun: do you mean the freepatents ?
(2:59:33 PM) sarayu_anusha: and wont legal databases be difficult to collect
(2:59:38 PM) sarayu_anusha: yes
(2:59:51 PM) Sudarsun: not easy to do.
(3:00:13 PM) Sudarsun: hey.. don't think that you have to do what is given in the reference paper.
(3:00:46 PM) Sudarsun: if you could bend it to your need, it should be perfect.
(3:02:08 PM) sarayu_anusha: so do all these papers have different techniques behind them
(3:02:16 PM) Sudarsun: maybe.
(3:06:01 PM) sarayu_anusha: so can we have the same base paper which you saw the other day.. and just change the application.. we kind of understood the technique behind it..
(3:06:46 PM) Sudarsun: should be fine.
(3:07:33 PM) sarayu_anusha: its based on POS tagging algo
(3:07:58 PM) Sudarsun: very fine.
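The base paper is said to rely on POS tagging. As a rough illustration only (this is not the Monty tagger or any published tagging algorithm), the sketch below tags words by lexicon lookup with fallbacks for numbers and capitalized words, then applies one Brill-style contextual rule that retags a word following a determiner. The lexicon, the rule and the use of Penn Treebank tags are assumptions.

```python
# Hypothetical hand-written lexicon: word -> most likely Penn Treebank tag
LEXICON = {
    "show": "VB", "list": "VB", "me": "PRP", "all": "DT", "the": "DT",
    "songs": "NNS", "song": "NN", "album": "NN", "of": "IN",
    "by": "IN", "in": "IN", "from": "IN", "sung": "VBN", "released": "VBN",
}

# Brill-style transformations: (from_tag, to_tag, required_previous_tag)
RULES = [("VB", "NN", "DT")]  # e.g. in "the list", "list" is a noun


def initial_tag(word):
    """Lexicon lookup with simple fallbacks for unknown words."""
    if word.lower() in LEXICON:
        return LEXICON[word.lower()]
    if word.isdigit():
        return "CD"   # cardinal number
    if word[:1].isupper():
        return "NNP"  # guess: capitalized unknown word is a proper noun
    return "NN"


def tag(sentence):
    """Return (word, tag) pairs after the initial tagging and the contextual rules."""
    words = sentence.rstrip("?.!").split()
    tags = [initial_tag(w) for w in words]
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(words, tags))
```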
(3:13:25 PM) sarayu_anusha: instead of annotatin the videos how bout havin the DBs itself as stills as in images db
(3:13:43 PM) Sudarsun: what is your point ?
(3:13:55 PM) sarayu_anusha: application db
(3:14:29 PM) Sudarsun: fine. you may have image db as well.
(3:15:13 PM) sarayu_anusha: so tat we can use spatial relations bet. objects as queries
(3:16:09 PM) Sudarsun: how spatial relationships ?
(3:17:32 PM) sarayu_anusha: as in say person A next to person B.. as query which wud generate all images accordingly
(3:18:07 PM) Sudarsun: if you want to query for context, you should have annotations of the image.
(3:18:12 PM) Sudarsun: can you annotate images ?
(3:18:16 PM) Sudarsun: if so, you can do that.
(3:19:07 PM) Sudarsun: i think you guys are desperate to get an annotated system working!! :x
(3:19:15 PM) Sudarsun: is it ?
(3:21:32 PM) Sudarsun: if so, here is a way..
1. Start collecting a lot of photographs belonging to a particular domain.
2. Prepare an XML DTD to represent the annotation
3. Start annotating the images and represent them in XML format
4. Databases allow XML to be stored in them
5. Develop an NL translator
6. The NL translator converts the input query into an XPath query against the database
7. Get the results and populate the output.
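Steps 3, 5, 6 and 7 can be sketched in Python with the standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. The annotation format, the relation table and the file names are hypothetical; an in-memory XML document stands in for an XML-capable database, and ElementTree does not validate against a DTD (step 2).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical annotation format (step 3)
ANNOTATIONS = """
<images>
  <image file="img001.jpg">
    <object label="person A"/>
    <object label="person B"/>
    <relation type="next_to" subject="person A" object="person B"/>
  </image>
  <image file="img002.jpg">
    <object label="person A"/>
    <object label="car"/>
    <relation type="next_to" subject="person A" object="car"/>
  </image>
</images>
"""

# Hypothetical phrase -> relation-type table used by the translator (step 5)
RELATIONS = {"next to": "next_to", "behind": "behind"}


def nl_to_xpath(query):
    """Step 6: turn '<label> <relation phrase> <label>' into an ElementTree XPath.

    Labels are pasted into quoted predicates, so labels containing quotes are not handled.
    """
    for phrase, rel_type in RELATIONS.items():
        m = re.fullmatch(rf"(.+?)\s+{phrase}\s+(.+?)\??", query.strip())
        if m:
            subj, obj = m.groups()
            return f"relation[@type='{rel_type}'][@subject='{subj}'][@object='{obj}']"
    return None


def find_images(root, query):
    """Step 7: return the files of images containing at least one matching relation."""
    xpath = nl_to_xpath(query)
    if xpath is None:
        return []
    return [img.get("file") for img in root.iter("image") if img.find(xpath) is not None]
```

Storing the object labels directly on each `relation` keeps the query to a single XPath; with ID references instead, the join between relations and objects would have to be done in code, since ElementTree's XPath subset has no joins.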

(3:24:34 PM) sarayu_anusha: sorry sir.. got disconnected
(3:25:23 PM) Sudarsun: [re-sent the messages from 3:18:07 PM to 3:21:32 PM]

(3:27:02 PM) sarayu_anusha: in tat case we feel song db is interestin..
(3:27:38 PM) Sudarsun: alright.
(3:28:15 PM) sarayu_anusha: so how do we go bout the song db?
(3:29:15 PM) sarayu_anusha: does it require just collectin songs and groupin them?
(3:29:42 PM) Sudarsun: you need to have the details on them.
(3:29:58 PM) Sudarsun: you can query CDDB to get the details of songs.
(3:30:19 PM) Sudarsun: CDDB is an Internet resource.
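A CDDB (later freedb/gnudb) lookup is keyed by a 32-bit disc ID computed from the CD's table of contents. The sketch below computes that ID from track start offsets in frames (75 frames per second, including the usual 150-frame lead-in); reading the offsets from a drive and sending the actual query to a server are not shown, and whether a given server is still reachable is an assumption to check.

```python
def digit_sum(n):
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_disc_id(track_offsets, leadout_offset):
    """Compute the CDDB disc ID.

    track_offsets: start of each track in frames (75 per second), lead-in included.
    leadout_offset: start of the lead-out area in frames, same convention.
    """
    seconds = [offset // 75 for offset in track_offsets]
    checksum = sum(digit_sum(s) for s in seconds)       # digit sum of each track's start second
    total_time = leadout_offset // 75 - seconds[0]      # playing time in seconds
    return ((checksum % 0xFF) << 24) | (total_time << 8) | len(track_offsets)
```

The ID is usually written as 8 hex digits, e.g. `f"{cddb_disc_id(offsets, leadout):08x}"`; collisions are possible, which is why a lookup also sends the track offsets.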
(3:30:37 PM) sarayu_anusha: ok:)
(3:31:14 PM) sarayu_anusha: details as in?
(3:32:15 PM) Sudarsun: CDs
(3:33:18 PM) sarayu_anusha: ok..
(3:36:44 PM) Sudarsun: breaking for tea. ~o)
(3:40:17 PM) sarayu_anusha: actually lunch sir..
(3:48:10 PM) Sudarsun: oh :(
(3:48:13 PM) Sudarsun: i am back
(3:53:59 PM) sarayu_anusha: we r back too..
(3:54:15 PM) Sudarsun: good.
(4:12:54 PM) sarayu_anusha: so do we have to get details from cddb for our database?
(4:13:15 PM) Sudarsun: that is the easiest.
(4:13:36 PM) Sudarsun: if you go through the entire collection you have at home, it should also work. :)
(4:15:04 PM) sarayu_anusha: around how large should the DB be
(4:15:41 PM) Sudarsun: if you could have about 200-250 songs...
(4:17:14 PM) sarayu_anusha: tat shud not b a prob.. wher do we start wit the project now?
(4:17:37 PM) Sudarsun: deliver what i had asked for.
(4:25:14 PM) sarayu_anusha: wat wud b the existing system?
(4:25:22 PM) Sudarsun: what do you mean ?
(4:28:16 PM) sarayu_anusha: sorry we forgot.. existing system wud be data retrieval using sql queries
(4:28:27 PM) Sudarsun: hmm
(4:30:20 PM) sarayu_anusha: input - NLP query
(4:30:31 PM) Sudarsun: NL query
(4:32:26 PM) sarayu_anusha: business logic wud be POS taggin algo
(4:32:30 PM) sarayu_anusha: ok..
(4:32:40 PM) Sudarsun: not limited to POS tagging
(4:34:10 PM) sarayu_anusha: ya further there is a query construction algo and monty tagger algo is used..
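A query construction stage can be sketched as a pass over (word, POS tag) pairs such as a tagger would produce. This is not the algorithm from the base paper; it is a hypothetical rule set assuming flat tables `song(name, singer, type, album, year)` and `album(...)`: the first common noun that names a table becomes the FROM target, and a preposition followed by a proper noun (NNP) or number (CD) becomes a WHERE condition.

```python
# Hypothetical mappings onto an assumed flat schema
TABLE_WORDS = {"song": "song", "songs": "song", "album": "album", "albums": "album"}
# (preposition, tag of the following word) -> column used in the WHERE clause
CONDITION_COLUMNS = {
    ("by", "NNP"): "singer",
    ("in", "CD"): "year",
    ("from", "CD"): "year",
    ("from", "NNP"): "album",
}


def build_query(tagged):
    """Construct parameterized SQL from (word, tag) pairs; only single-word values are handled."""
    table = None
    conditions, params = [], []
    for i, (word, tag) in enumerate(tagged):
        if table is None and tag in ("NN", "NNS") and word.lower() in TABLE_WORDS:
            table = TABLE_WORDS[word.lower()]
        elif tag == "IN" and i + 1 < len(tagged):
            value, value_tag = tagged[i + 1]
            column = CONDITION_COLUMNS.get((word.lower(), value_tag))
            if column:
                conditions.append(f"{column} = ?")
                params.append(int(value) if value_tag == "CD" else value)
    if table is None:
        raise ValueError("no table-naming noun found in the query")
    sql = f"SELECT * FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params
```

Queries that fall outside these rules raise an error or lose conditions, which is exactly the kind of boundary that belongs in the limitations section of the writeup.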
(4:34:32 PM) Sudarsun: ok
(4:35:12 PM) sarayu_anusha: wat wud be the assumptions and limitations here?
(4:35:50 PM) Sudarsun: write down your assumptions in the project. list down the implicit and explicit assumptions you make in this work.
(4:36:38 PM) Sudarsun: limitations are the specifications beyond which the tool would not work as expected. {basically, these are the boundary conditions}
(4:38:21 PM) sarayu_anusha: wen do u want us to submit all of these.. cos our 1st review wil b only next sem..
(4:39:13 PM) Sudarsun: your university reviews do not matter here. i assume that this is a one year project.
(4:39:26 PM) Sudarsun: so fix the deadline for stage1 submission.
(4:39:33 PM) Sudarsun: i mean you give me a date.
(4:44:11 PM) sarayu_anusha: learnin exercises for ATN means does it have to be on paper or u want us to just learn bout it.
(4:45:01 PM) Sudarsun: you should learn about it
(4:47:28 PM) sarayu_anusha: i think we wud need bout 2 months time..
(4:48:17 PM) Sudarsun: that's ok. but these 2 months should be divided into fortnight-long chunks. i need a review every fortnight.
(4:48:37 PM) Sudarsun: so you will have to tell me, what you would be able to finish by next 2 weeks and beyond.
(4:49:59 PM) sarayu_anusha: ok.. wat do you think we will be able to finish.. we dont have an exact idea as to how much time all of these would require..
(4:51:09 PM) sarayu_anusha: and how about learning about lex and yacc.. would that be stage 2..
(4:51:45 PM) Sudarsun: lex/yacc can be learnt on the fly.
(4:52:04 PM) Sudarsun: i dont know how good or bad you both are, so i cannot give you an estimate.
(4:52:19 PM) sarayu_anusha: ok :)
(4:53:55 PM) Sudarsun: i know, i am loading you with lots.. but this is how you learn things.
take it from me, if you successfully finish this project with me, you would become far far far more experienced than any of your mates..
but it is going to be very demanding.

(4:55:32 PM) sarayu_anusha: yes we could make that out.. it does sound hectic.. we will put in our best.
(4:56:34 PM) Sudarsun: ok good.
(4:57:29 PM) sarayu_anusha: can you suggest some book to learn ATN from..
(4:58:42 PM) Sudarsun: Natural Language Understanding by James Allen
(4:58:48 PM) sarayu_anusha: we will be ready with the database by 23 of this month..
(4:58:57 PM) Sudarsun: again, you can learn a lot of ATN from the Internet as well.
(4:59:19 PM) Sudarsun: send me a plan in writing, no verbal planning..
(4:59:52 PM) sarayu_anusha: we dont get it..
(5:00:19 PM) Sudarsun: i mean, send the task:time sheet in writing (by email).
(5:00:55 PM) Sudarsun: the sheet should have a list of modules, with the modules broken down into tasks and the time required to finish each task.
(5:01:47 PM) Sudarsun: do you have orkut profiles, if so, join Applied Research Council at
http://www.orkut.com/Community.aspx?cmm=21816883

(5:02:19 PM) sarayu_anusha: sarayu does.. will join..
(5:06:37 PM) sarayu_anusha: we will send the task time sheet by tomorrow..
(5:06:58 PM) Sudarsun: fine
(5:09:51 PM) sarayu_anusha: we will take leave for today sir... thanks a lot..
(5:19:26 PM) Sudarsun: sure. bye