CLD2: Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages

dc.contributor.affiliation: Pontificia Universidad Católica del Perú. Departamento de Humanidades
dc.contributor.author: Zariquiey, R.
dc.contributor.author: Oncevay, A.
dc.contributor.author: Vera, J.
dc.date.accessioned: 2026-03-13T16:58:35Z
dc.date.issued: 2022
dc.description.abstract: Language revitalisation should not be understood as a direct outcome of language documentation, which is mainly focused on the creation of language repositories. Natural language processing (NLP) offers the potential to complement and exploit these repositories through the development of language technologies that may contribute to improving the vitality status of endangered languages. In this paper, we discuss the current state of the interaction between language documentation and computational linguistics, present a diagnosis of how the outputs of recent documentation projects for endangered languages are underutilised for the NLP community, and discuss how the situation could change from both the documentary linguistics and NLP perspectives. All this is introduced as a bridging paradigm dubbed as Computational Language Documentation and Development (CLD). CLD calls for (1) the inclusion of NLP-friendly annotated data as a deliverable of future language documentation projects; and (2) the exploitation of language documentation databases by the NLP community to promote the computerization of endangered languages, as one way to contribute to their revitalization.
dc.description.sponsorship: Funding: The first author acknowledges the support of CONCYTEC-ProCiencia, Peru, under the contract 183-2018-FONDECYT-BM-IADT-MU from the funding call E041-2018-01-BM.
dc.identifier.doi: https://doi.org/10.18653/v1/2022.computel-1.4
dc.identifier.uri: http://hdl.handle.net/20.500.14657/205967
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics (ACL)
dc.relation.conferencename: COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Documentation
dc.subject: Computer science
dc.subject: Natural language processing
dc.subject: Technical documentation
dc.subject: Natural language
dc.subject: Language technology
dc.subject: Exploit
dc.subject: Artificial intelligence
dc.subject: Linguistics
dc.subject: Programming language
dc.subject: Comprehension approach
dc.subject.ocde: https://purl.org/pe-repo/ocde/ford#1.02.01
dc.title: CLD2: Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages
dc.type: http://purl.org/coar/resource_type/c_5794
dc.type.other: Conference paper
dc.type.version: https://vocabularies.coar-repositories.org/version_types/c_970fb48d4fbd8a85/
