- October 10, 2015
- Category: Scientific Publications
P. Desbordes1,2, R. Modzelewski1,3, S. Vauclin2, P. Vera1,3, I. Gardin1,3
1 QuantIF LITIS – EA4108, University of Rouen, Rouen, FRANCE
2 Dosisoft, Cachan, FRANCE
3 Henri Becquerel Center, Rouen, FRANCE
Presented at EANM 2015
Aim: Many features can be extracted from 18F-FDG PET images to describe cancer. We propose a machine learning technique based on a Random Forest (RF) classifier to select, from a large number of candidate characteristics, the features that have a prognostic or predictive value.
Materials and Methods: 65 features are extracted from medical records (age, stage, etc.) and PET images: classical features (SUV, Metabolic Tumor Volume (MTV), etc.), 1st order features (skewness, entropy, etc.) and texture parameters from texture matrices: Gray Level Cooccurrence Matrix (GLCM), Gray Level Zone Length Matrix (GLZLM) and Gray Level Difference Matrix (GLDM). Patient classification is performed using the RF algorithm with 2000 decision trees, first without any Feature Selection (FS), and then with FS. The selection is performed in 2 steps. First, a correlation analysis using the Spearman method is done to keep uncorrelated features: features are compared pairwise and considered correlated if the Spearman coefficient (sp) satisfies |sp| ≥ 0.8 with p < 0.05. Next, the RF algorithm is applied to the remaining features to find the most relevant ones using the importance index. The RF classifier is applied to a database of 66 patients with oesophageal cancer treated by radio-chemotherapy (CRT). Classification accuracy is evaluated using the Out-Of-Bag (OOB) error.
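The two-step selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name any software, so SciPy and scikit-learn are assumed here; the helper names `drop_correlated` and `rank_by_importance` are hypothetical; the rule for choosing which feature represents a correlated group (here, the first one encountered) is not given in the abstract; scikit-learn's impurity-based `feature_importances_` stands in for the unspecified "importance index"; and the data are synthetic.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier


def drop_correlated(X, rho_min=0.8, alpha=0.05):
    """Step 1 (assumed greedy variant): keep a feature only if it is not
    Spearman-correlated (|rho| >= rho_min and p < alpha) with a kept one."""
    rho, p = spearmanr(X)  # pairwise matrices; X must have more than 2 columns
    kept = []
    for j in range(X.shape[1]):
        if all(not (abs(rho[i, j]) >= rho_min and p[i, j] < alpha) for i in kept):
            kept.append(j)
    return kept


def rank_by_importance(X, y, n_trees=2000, seed=0):
    """Step 2: fit a random forest, then return the feature ranking
    (most important first) and the out-of-bag (OOB) error."""
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                random_state=seed)
    rf.fit(X, y)
    oob_error = 1.0 - rf.oob_score_  # oob_score_ is the OOB accuracy
    order = np.argsort(rf.feature_importances_)[::-1]
    return order, oob_error


# Synthetic stand-in for a 66-patient table (NOT the study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(66, 6))
X[:, 1] = np.exp(X[:, 0])  # monotone copy of column 0, so Spearman rho = 1
y = (X[:, 0] + 0.5 * rng.normal(size=66) > 0).astype(int)

kept = drop_correlated(X)                      # column 1 is filtered out
order, oob_error = rank_by_importance(X[:, kept], y)
print("kept columns:", kept)
print("ranking within kept columns:", order.tolist())
print("OOB error:", round(oob_error, 3))
```

Because each tree is trained on a bootstrap sample, the patients left out of that sample provide an internal test set, which is why the OOB error can be used without a separate validation cohort.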
Results: When the RF classifier is applied to the 65 initial features, the OOB error reaches 33.3% and 25.8% for the prognostic and predictive studies respectively. The FS strategy improves the classification accuracy, lowering the OOB error to 22.7% in both studies. The Spearman analysis revealed that none of the clinical data are correlated with PET characteristics, nor are correlation (GLCM), Cluster Shade (CS, GLCM), Busyness (GLDM) and Zone Percentage (ZP, GLZLM) correlated with other features. Twelve groups of correlated features can be formed, leaving 31 of the 65 features selected. The best 3 prognostic features are MTV, correlation (GLCM) and the Nutritional Risk Index (NRI), whereas the best 3 predictive features are MTV, ZP and correlation (GLCM).
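To put the reported percentages in terms of patients, the short check below converts each OOB error into a misclassified count. It assumes, which the abstract does not state, that every one of the 66 patients received an OOB prediction and that the error is the plain fraction of misclassified patients; the variable names are illustrative.

```python
N_PATIENTS = 66

# OOB errors (in percent) as reported in the Results paragraph.
reported_oob_pct = {
    "prognostic, all 65 features": 33.3,
    "predictive, all 65 features": 25.8,
    "prognostic, after FS": 22.7,
    "predictive, after FS": 22.7,
}

counts = {}
for label, pct in reported_oob_pct.items():
    n_wrong = round(pct / 100 * N_PATIENTS)          # nearest whole patient
    counts[label] = n_wrong
    back = round(100 * n_wrong / N_PATIENTS, 1)      # recompute the percentage
    print(f"{label}: {n_wrong}/{N_PATIENTS} misclassified ({back}%)")
```

Under that assumption the figures correspond to 22 and 17 misclassified patients before FS and 15 in both studies after FS, and each count rounds back to the reported percentage.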
Conclusion: Machine learning techniques such as the random forest classifier are a useful tool for finding the most relevant of a large number of features for patient classification. Feature selection is necessary to improve classification accuracy. In oesophageal cancer, MTV and texture parameters appear to be relevant features and improve predictions.