WordNet-SHP: Towards the building of a lexical database for a Peruvian minority language

Publisher

European Language Resources Association (ELRA)

Full-text access is restricted to the PUCP community

Abstract

WordNet-like resources are lexical databases containing highly relevant information that can be exploited in more complex computational linguistics research and applications. Building one requires both manual and automatic tasks, which can be more arduous when the language is a minority one with few digital resources. This study focuses on the construction of an initial WordNet database for a low-resource indigenous language of Peru: Shipibo-Konibo (shp). First, the stages of development are described, starting from a resource-scarce scenario (a bilingual shp-es dictionary). Then, a synset alignment method is proposed that compares the definition glosses in the dictionary (written in Spanish) with the content of a Spanish WordNet, using word2vec similarity as the proximity measure. Finally, the resulting synsets are evaluated against a manually annotated Gold Standard in Shipibo-Konibo. The results are promising, and the resource is expected to serve well in further applications, such as word sense disambiguation and even machine translation for the shp-es language pair.
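
The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how glosses are turned into vectors, so averaging word vectors and taking the cosine is an assumption here, and the toy embeddings, the synset identifiers and the function names are all hypothetical. In practice the vectors would come from a word2vec model trained on Spanish text.

```python
import math

# Toy 3-dimensional "embeddings" with made-up values, for illustration only.
# A real setup would load vectors from a trained Spanish word2vec model.
TOY_VECTORS = {
    "animal": [0.9, 0.1, 0.0],
    "perro": [0.8, 0.2, 0.1],
    "doméstico": [0.7, 0.3, 0.0],
    "agua": [0.1, 0.8, 0.3],
    "corriente": [0.1, 0.7, 0.4],
}


def gloss_vector(gloss, vectors):
    """Average the vectors of the known words in a gloss (assumed representation)."""
    known = [vectors[w] for w in gloss.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(dictionary_gloss, candidate_synsets, vectors):
    """Return (synset_id, score) for the candidate whose gloss is most similar."""
    query = gloss_vector(dictionary_gloss, vectors)
    best_id, best_score = None, 0.0
    if query is None:
        return best_id, best_score
    for synset_id, synset_gloss in candidate_synsets.items():
        candidate = gloss_vector(synset_gloss, vectors)
        if candidate is None:
            continue
        score = cosine(query, candidate)
        if score > best_score:
            best_id, best_score = synset_id, score
    return best_id, best_score


# Hypothetical Spanish WordNet candidates, keyed by placeholder identifiers.
candidates = {
    "synset-A": "animal doméstico",
    "synset-B": "corriente de agua",
}

# Hypothetical Spanish gloss taken from a bilingual dictionary entry.
best_id, score = align("perro animal", candidates, TOY_VECTORS)
print(best_id, round(score, 3))
```

With these toy values the dictionary gloss is matched to the candidate whose gloss shares its semantic neighbourhood. A real pipeline would also need a similarity threshold and manual review, as the evaluation against a Gold Standard suggests.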

Keywords

WordNet, Computational linguistics, Database systems, Natural language processing systems, Ships, Bilingual dictionary, Digital resources, Lexical database, Machine translations, Minority languages, Research and application, Word Sense Disambiguation, Ontology
