Grammar extraction from treebanks for Hindi and Telugu

Kolachina, Prasanth; Kolachina, Sudheer; Singh, Anil Kumar; Husain, Samar; Naidu, Viswanatha; Sangal, Rajeev; Bharati, Akshar

dc.contributor.author	Kolachina, Prasanth
dc.contributor.author	Kolachina, Sudheer
dc.contributor.author	Singh, Anil Kumar
dc.contributor.author	Husain, Samar
dc.contributor.author	Naidu, Viswanatha
dc.contributor.author	Sangal, Rajeev
dc.contributor.author	Bharati, Akshar
dc.date.accessioned	2022-03-26T13:38:05Z
dc.date.available	2022-03-26T13:38:05Z
dc.date.issued	2010-01-01
dc.description.abstract	Grammars play an important role in many Natural Language Processing (NLP) applications. The traditional approach of creating grammars manually, besides being labor-intensive, has several limitations. With the availability of large-scale syntactically annotated treebanks, it is now possible to automatically extract an approximate grammar of a language, in any of the existing formalisms, from a corresponding treebank. In this paper, we present a basic approach to extracting grammars from dependency treebanks of two Indian languages, Hindi and Telugu. The process of grammar extraction requires a generalization mechanism. Towards this end, we explore an approach that generalizes argument structure over verbs based on their syntactic similarity. Such a generalization counters the effect of data sparseness in the treebanks. A grammar extracted using this system can not only expand existing knowledge bases for NLP tasks such as parsing, but also aid in the creation of grammars for languages where none exist. Further, we show that the grammar extraction process can help in identifying annotation errors and thus aid in the task of treebank validation.
dc.identifier.citation	Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
dc.identifier.uri	https://dspace.uohyd.ac.in/handle/1/2034
dc.title	Grammar extraction from treebanks for Hindi and Telugu
dc.type	Conference Proceeding. Conference Paper

Collections

Applied Linguistics and Translation Studies - Publications