Classifier based on straight line segments: an overview and theoretical improvements
Medina Rodríguez, Rosario Alejandra
The literature offers several supervised machine learning algorithms for binary classification that are applied to everyday problems. Compared with well-known conventional classifiers, the Straight-line Segment Classifier (SLS Classifier) stands out for its low complexity and competitive performance. It takes advantage of some desirable characteristics of Learning Vector Quantization and Nearest Feature Line, and it has lower computational complexity than Support Vector Machines. The SLS binary classifier is based on the distances between a set of points and two sets of straight line segments. Training it therefore involves finding the placement of the segment extremities that minimizes the mean square error. In previous works, we explored three different evolutionary algorithms as optimization methods, generating different solutions as the initial population to increase the chances of finding a global optimum. We also proposed a new way of estimating the number of straight line segments by applying an unsupervised clustering method. However, some questions remained open, such as a detailed analysis of the parameters and base definitions of the optimization algorithm. Furthermore, it became evident that the straight line segment lengths can grow significantly during the training phase, which negatively impacts the classification rate. Therefore, the main goal of this thesis is to outline the SLS Classifier baseline and propose some theoretical improvements: (i) formulating an optimization approach that provides optimal final positions for the straight line segments; (ii) proposing a model selection approach for the SLS Classifier; and (iii) determining the performance of the SLS Classifier on 18 classification problems (10 artificial and 8 public UCI datasets). The proposed methodology showed promising results compared with the original SLS Classifier and other classifiers.
Moreover, owing to its straightforward interpretation and competitive classification rates, this classifier can be used in research and industry for decision-making problems.
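To make the idea of classifying by distances to two sets of straight line segments more concrete, the following is a minimal 2-D sketch. It assumes a simplified rule that is not necessarily the thesis's discriminant: each point is scored by a sigmoid of the difference between its Euclidean distance to the nearest class-0 segment and to the nearest class-1 segment, and the mean square error of that score is the quantity an optimizer (for example, one of the evolutionary algorithms mentioned above) would minimize by moving segment extremities. The names `soft_score`, `sls_predict`, `mse`, and the slope parameter `gamma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # (extremity a, extremity b)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment [a, b]."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    apx, apy = p[0] - a[0], p[1] - a[1]
    denom = abx * abx + aby * aby
    # Project p onto the segment's supporting line and clamp to [0, 1];
    # a degenerate segment (a == b) collapses to the point a.
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, (apx * abx + apy * aby) / denom))
    cx, cy = a[0] + t * abx, a[1] + t * aby
    return math.hypot(p[0] - cx, p[1] - cy)


def nearest_distance(p: Point, segments: Sequence[Segment]) -> float:
    """Distance from p to the closest segment in a set."""
    return min(point_segment_distance(p, a, b) for a, b in segments)


def soft_score(p: Point, seg0: Sequence[Segment], seg1: Sequence[Segment],
               gamma: float = 1.0) -> float:
    """Hypothetical score in (0, 1); close to 1 when p is nearer the class-1 segments."""
    z = gamma * (nearest_distance(p, seg0) - nearest_distance(p, seg1))
    return 1.0 / (1.0 + math.exp(-z))


def sls_predict(p: Point, seg0: Sequence[Segment], seg1: Sequence[Segment]) -> int:
    """Assign the class whose segment set is closer (ties go to class 0)."""
    return 1 if soft_score(p, seg0, seg1) > 0.5 else 0


def mse(X: Sequence[Point], y: Sequence[int], seg0: Sequence[Segment],
        seg1: Sequence[Segment], gamma: float = 1.0) -> float:
    """Mean square error between the soft scores and the 0/1 labels."""
    return sum((soft_score(p, seg0, seg1, gamma) - t) ** 2 for p, t in zip(X, y)) / len(X)


# Toy usage: one horizontal segment per class.
S0: List[Segment] = [((0.0, 0.0), (1.0, 0.0))]
S1: List[Segment] = [((0.0, 2.0), (1.0, 2.0))]
X: List[Point] = [(0.5, 0.1), (0.5, 1.8)]
y: List[int] = [0, 1]

print([sls_predict(p, S0, S1) for p in X])  # [0, 1]
print(round(mse(X, y, S0, S1), 4))
```

Training in this sketch would mean searching over the extremity coordinates in `S0` and `S1` to reduce `mse`; the search itself (evolutionary or otherwise) is left out to keep the example short.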