WordNet-SHP: Towards the building of a lexical database for a Peruvian minority language

dc.contributor.affiliation: Pontificia Universidad Católica del Perú. Departamento de Ingeniería
dc.contributor.author: Maguiño-Valencia, D.
dc.contributor.author: Oncevay, A.
dc.contributor.author: Sobrevilla Cabezudo, M.A.
dc.date.accessioned: 2026-03-13T17:01:05Z
dc.date.issued: 2018
dc.description.abstract: WordNet-like resources are lexical databases containing highly relevant information that can be exploited in more complex computational linguistics research and applications. Building one requires both manual and automatic tasks, which can be more arduous if the language is a minority one with few digital resources. This study focuses on the construction of an initial WordNet database for a low-resource indigenous language of Peru: Shipibo-Konibo (shp). First, the stages of development from a scarce-resource scenario (a bilingual shp-es dictionary) are described. Then, a synset alignment method is proposed that compares the definition glosses in the dictionary (written in Spanish) with the content of a Spanish WordNet, using word2vec similarity as the proximity measure. Finally, the synsets are evaluated against a manually annotated gold standard in Shipibo-Konibo. The obtained results are promising, and the resource is expected to serve well in further applications, such as word sense disambiguation and even machine translation for the shp-es language pair.
dc.description.sponsorship: Funding: We highly appreciate the effort of the linguistic team that made the creation of this resource possible: Dr. Roberto Zariquiey, Alonso Vásquez, Gabriela Tello, Renzo Ego-Aguirre, Lea Reinhardt and Marcela Castro. We are also thankful to our native-speaker (Shipibo-Konibo) collaborators: Juan Agustín, Carlos Guimaraes, Ronald Suárez and Miguel Gomez. Finally, we gratefully acknowledge the support of the “Consejo Nacional de Ciencia, Tecnología e Innovación Tecnológica” (CONCYTEC, Peru) under contract 225-2015-FONDECYT.
dc.identifier.uri: http://hdl.handle.net/20.500.14657/206840
dc.language.iso: eng
dc.publisher: European Language Resources Association (ELRA)
dc.relation.conferencename: LREC 2018 - 11th International Conference on Language Resources and Evaluation
dc.relation.uri: https://aclanthology.org/L18-1697/
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Wordnet
dc.subject: Computational linguistics
dc.subject: Database systems
dc.subject: Natural language processing systems
dc.subject: Ships
dc.subject: Bilingual dictionary
dc.subject: Digital resources
dc.subject: Lexical database
dc.subject: Machine translations
dc.subject: Minority languages
dc.subject: Research and application
dc.subject: Word Sense Disambiguation
dc.subject: Ontology
dc.subject.ocde: https://purl.org/pe-repo/ocde/ford#1.02.02
dc.title: WordNet-SHP: Towards the building of a lexical database for a Peruvian minority language
dc.type: http://purl.org/coar/resource_type/c_5794
dc.type.other: Comunicación de congreso
dc.type.version: https://vocabularies.coar-repositories.org/version_types/c_970fb48d4fbd8a85/
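
The abstract describes aligning dictionary entries to Spanish WordNet synsets by comparing Spanish definition glosses with a word2vec-based similarity. The record does not say how the gloss vectors are built, so the following is only a minimal sketch under stated assumptions: glosses are represented by the average of their word vectors, similarity is cosine, and the vectors, synset identifiers (`syn-animal`, `syn-rio`) and function names are all hypothetical toy values, not the authors' data or code. A real setup would load pretrained Spanish word2vec vectors instead of `TOY_VECTORS`.

```python
import math

# Hypothetical 3-dimensional toy vectors standing in for pretrained
# Spanish word2vec embeddings (assumption: not the authors' data).
TOY_VECTORS = {
    "animal": [0.9, 0.1, 0.0],
    "perro": [0.8, 0.2, 0.1],
    "doméstico": [0.7, 0.3, 0.0],
    "río": [0.0, 0.9, 0.2],
    "agua": [0.1, 0.8, 0.3],
    "corriente": [0.0, 0.7, 0.4],
}


def gloss_vector(gloss):
    """Average the vectors of the known words in a gloss (None if none are known)."""
    vecs = [TOY_VECTORS[w] for w in gloss.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_synset(dictionary_gloss, candidate_synsets):
    """Return (synset_id, score) for the most similar candidate gloss, or (None, 0.0)."""
    query = gloss_vector(dictionary_gloss)
    best = (None, 0.0)
    if query is None:
        return best
    for synset_id, synset_gloss in candidate_synsets.items():
        candidate = gloss_vector(synset_gloss)
        if candidate is None:
            continue
        score = cosine(query, candidate)
        if score > best[1]:
            best = (synset_id, score)
    return best


# Hypothetical candidate synsets from a Spanish WordNet, keyed by made-up ids.
candidates = {
    "syn-animal": "animal doméstico",
    "syn-rio": "corriente de agua",
}

# Spanish gloss taken from a (hypothetical) shp-es dictionary entry.
match_id, match_score = best_synset("perro animal", candidates)
print(match_id, round(match_score, 3))
```

In this sketch the dictionary gloss is matched to whichever candidate gloss has the highest cosine similarity; in practice a similarity threshold and manual validation against a gold standard, as mentioned in the abstract, would decide whether an alignment is accepted.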
