Ship-lemmatagger: Building an NLP toolkit for a Peruvian native language

dc.contributor.affiliationPontificia Universidad Católica del Perú. Facultad de Ciencias e Ingeniería
dc.contributor.affiliationPontificia Universidad Católica del Perú. Departamento de Ingeniería
dc.contributor.authorPereira-Noriega, J.
dc.contributor.authorMercado-Gonzales, R.
dc.contributor.authorSasieta, A.
dc.contributor.authorSobrevilla Cabezudo, M.A.S.
dc.contributor.authorOncevay, A.
dc.date.accessioned2026-03-13T16:58:40Z
dc.date.issued2017
dc.description.abstractNatural Language Processing deals with the understanding and generation of texts through computer programs. Many different functionalities are used in this area, but some of them support all the others. These methods concern the core processing of a language's morphology (such as lemmatization) and the automatic identification of part-of-speech tags. This paper therefore describes the implementation of a basic NLP toolkit for a new language, focusing on the features mentioned above and testing them on a corpus built for this purpose. The results obtained exceeded expectations and could be used for more complex tasks such as machine translation.
dc.description.sponsorshipFunding: Acknowledgments. For this study, the authors appreciate the linguistic team's effort that made the corpus annotation possible, and also acknowledge the support of the “Consejo Nacional de Ciencia, Tecnología e Innovación Tecnológica” (CONCYTEC Perú) under the contract 225-2015-FONDECYT.
dc.identifier.doihttps://doi.org/10.1007/978-3-319-64206-2_53
dc.identifier.urihttp://hdl.handle.net/20.500.14657/206012
dc.language.isoeng
dc.publisherSpringer Verlag
dc.relation.conferencenameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10415 LNAI (2017)
dc.relation.ispartofurn:isbn:978-3-319-64206-2
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.subjectLemmatisation
dc.subjectComputer science
dc.subjectMachine translation
dc.subjectNatural language processing
dc.subjectArtificial intelligence
dc.subjectIdentification (biology)
dc.subject.ocdehttps://purl.org/pe-repo/ocde/ford#1.02.01
dc.titleShip-lemmatagger: Building an NLP toolkit for a Peruvian native language
dc.typehttp://purl.org/coar/resource_type/c_5794
dc.type.otherComunicación de congreso
dc.type.versionhttps://vocabularies.coar-repositories.org/version_types/c_970fb48d4fbd8a85/
