Cimetta, Elisa and Zanella, Luca and Bezzo, Fabrizio and Facco, Pierantonio (2022) Feature selection and molecular classification of cancer phenotypes: a comparative study. [Data Collection]
This is the latest version of this item.
Available Versions of this Item
- Feature selection and molecular classification of cancer phenotypes: a comparative study. (deposited 09 Aug 2022 15:40)
- Feature selection and molecular classification of cancer phenotypes: a comparative study. (deposited 02 May 2023 06:29) [Currently Displayed]
Collection description
Classification of high-dimensional gene expression data is key to the development of effective diagnostic and prognostic tools. Feature selection involves finding the subset of features with the highest power to predict class labels. Here we conducted a comparative study of different combinations of feature selectors (Chi-Squared, mRMR, Relief-F, Genetic Algorithms) and classification learning algorithms (Random Forests, PLS-DA, SVM, Regularized Logistic/Multinomial Regression, kNN) to identify those with the best predictive capacity. The performance of each combination is evaluated through an empirical study on three benchmark cancer-related microarray datasets. Our results first suggest that the quality of the data relevant to the target classes is key for the successful classification of cancer phenotypes. We also showed that, for a given classification learning algorithm and dataset, all filters have a similar performance. Interestingly, filters achieve comparable or even better results than the GA-based wrappers, while also being easier to implement and faster. Taken together, our findings suggest that simple, well-established feature selectors in combination with optimized classifiers guarantee good performance, with no need for complicated and computationally demanding methodologies.
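The filter-plus-classifier combination described above can be illustrated with a minimal sketch. It is not the code deposited in this collection: the synthetic data, the median binarization used before the chi-squared score, and all function names (chi2_score, select_top_k, knn_predict, make_toy_data, run_demo) are assumptions made only for illustration. It ranks features with a chi-squared filter on the training samples only, keeps the top-ranked ones, and classifies held-out samples with kNN.

```python
import random
from collections import Counter


def chi2_score(feature, labels):
    """Chi-squared statistic between a median-binarized feature and the labels.

    Binarizing at the (upper) median is an illustrative assumption, not the
    discretization used in the study.
    """
    ordered = sorted(feature)
    median = ordered[len(ordered) // 2]
    bins = [int(v >= median) for v in feature]
    n = len(labels)
    bin_counts = Counter(bins)
    class_counts = Counter(labels)
    observed = Counter(zip(bins, labels))
    score = 0.0
    for b, nb in bin_counts.items():
        for c, nc in class_counts.items():
            expected = nb * nc / n
            score += (observed[(b, c)] - expected) ** 2 / expected
    return score


def select_top_k(X, y, k):
    """Return the indices of the k features with the highest chi-squared score."""
    n_features = len(X[0])
    scores = [chi2_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def make_toy_data(n_samples=60, n_features=50, n_informative=3, seed=0):
    """Synthetic two-class data: only the first n_informative features differ."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 6.0 * label  # strong class shift on informative features
        X.append(row)
        y.append(label)
    return X, y


def run_demo():
    X, y = make_toy_data()
    X_train, y_train = X[:40], y[:40]
    X_test, y_test = X[40:], y[40:]

    # Rank features on the training split only, so the test split stays unseen.
    selected = select_top_k(X_train, y_train, k=3)

    def project(rows):
        return [[row[j] for j in selected] for row in rows]

    train_sel, test_sel = project(X_train), project(X_test)
    predictions = [knn_predict(train_sel, y_train, x) for x in test_sel]
    correct = sum(int(p == t) for p, t in zip(predictions, y_test))
    return selected, correct / len(y_test)


if __name__ == "__main__":
    selected, accuracy = run_demo()
    print("selected features:", selected)
    print("test accuracy:", accuracy)
```

Swapping chi2_score for another scoring function, or knn_predict for another classifier, would give a different selector/classifier combination of the kind compared in the study; a wrapper method would instead search over feature subsets using the classifier's own performance, which is typically more expensive.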