(Pontificia Universidad Católica del Perú, 2024-04-15) Salas, Fabio; Caldas, Josué
Although access to higher education has improved in low- and middle-income countries (LMICs), student dropout remains a persistent challenge, especially among socio-economically disadvantaged students. Machine learning models that predict academic performance have improved our understanding of this problem, but many studies overlook LMIC-specific institutional factors or focus on individual courses, which limits their generalizability and policy relevance. To address these issues, the authors compiled a comprehensive database from administrative and census data to predict undergraduate academic performance at the Pontifical Catholic University of Peru (PUCP). The study found that tree-based ensembles, particularly Random Forest, were the most effective models, and that key predictors included prior secondary school performance and university admission test scores. The authors present a high-performing model that uses only ten features to predict future academic performance and could potentially help reduce student dropout at PUCP.
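The workflow summarized in the abstract (fit a Random Forest, then keep a ten-feature model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the target is treated as a continuous grade (the abstract does not say whether the task is regression or classification), and ranking features by impurity-based importance is only one plausible way to arrive at ten features, not necessarily the authors' procedure. It uses scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a student database: 500 students, 25 candidate features.
# These are NOT the study's data; they only make the sketch runnable.
n_students, n_features = 500, 25
X = rng.normal(size=(n_students, n_features))

# Hypothetical target: a grade driven mostly by columns 0 and 1
# (imagine prior school performance and an admission test score), plus noise.
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_students)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: fit a Random Forest on all candidate features.
full_model = RandomForestRegressor(n_estimators=200, random_state=0)
full_model.fit(X_train, y_train)

# Step 2: rank features by impurity-based importance and keep the top ten
# (an assumed selection rule, chosen here for simplicity).
top10 = np.argsort(full_model.feature_importances_)[::-1][:10]

# Step 3: refit using only those ten features.
small_model = RandomForestRegressor(n_estimators=200, random_state=0)
small_model.fit(X_train[:, top10], y_train)

# Compare held-out R^2 of the full and the reduced model.
r2_full = full_model.score(X_test, y_test)
r2_small = small_model.score(X_test[:, top10], y_test)
print(f"R^2 with all features: {r2_full:.3f}")
print(f"R^2 with ten features: {r2_small:.3f}")
```

On real administrative data, impurity-based importances can favor high-cardinality features, so a permutation-based ranking on held-out data may be a safer basis for choosing the reduced feature set.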