Corpus creation and initial SMT experiments between Spanish and Shipibo-Konibo

dc.contributor.affiliationPontificia Universidad Católica del Perú. Departamento de Ciencias
dc.contributor.affiliationPontificia Universidad Católica del Perú. Departamento de Ingeniería
dc.contributor.authorGalarreta, A.-P.
dc.contributor.authorSasieta, A.
dc.contributor.authorOncevay, A.
dc.date.accessioned2026-03-13T16:58:26Z
dc.date.issued2017
dc.description.abstractIn this paper, we present the first attempts to develop a machine translation (MT) system between Spanish and Shipibo-Konibo (es-shp). There are very few digital texts written in Shipibo-Konibo and even fewer bilingual texts that can be aligned, so we had to create a parallel corpus from both bilingual and monolingual texts. We describe how this corpus was built, as well as the process we followed to improve the quality of the sentences used to train a statistical MT (SMT) model. The results surpassed the proposed dictionary-based baseline and are promising for further development given the size of the corpus used. Finally, we expect that this MT system can be reinforced with additional linguistic rules and automatic language processing functions that are currently being implemented.
dc.description.sponsorshipFunding: For this study, the authors acknowledge the support of the “Consejo Nacional de Ciencia, Tecnología e Innovación Tecnológica” (CONCYTEC Perú) under contract 225-2015-FONDECYT, and of the PAIP research program of the Vicerrectorado de Investigación, PUCP.
dc.identifier.doihttps://doi.org/10.26615/978-954-452-049-6_033
dc.identifier.urihttp://hdl.handle.net/20.500.14657/205920
dc.language.isoeng
dc.publisherIncoma
dc.relation.conferencenameInternational Conference Recent Advances in Natural Language Processing, RANLP; Vol. 2017-September (2017)
dc.relation.ispartofurn:isbn:978-954-452-049-6
dc.rightsinfo:eu-repo/semantics/closedAccess
dc.subjectComputer science
dc.subjectMachine translation
dc.subjectNatural language processing
dc.subjectArtificial intelligence
dc.subjectBaseline (sea)
dc.subjectMachine translation system
dc.subjectProcess (computing)
dc.subjectQuality (philosophy)
dc.subjectText corpus
dc.subjectProgramming language
dc.subject.ocdehttps://purl.org/pe-repo/ocde/ford#1.02.01
dc.titleCorpus creation and initial SMT experiments between Spanish and Shipibo-Konibo
dc.typehttp://purl.org/coar/resource_type/c_5794
dc.type.otherConference paper
dc.type.versionhttps://vocabularies.coar-repositories.org/version_types/c_970fb48d4fbd8a85/
