Breast Cancer Screening

Machine learning analysis of the Wisconsin Diagnostic Breast Cancer dataset. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass, and they describe characteristics of the cell nuclei present in the image. The goal is to determine whether the characteristics extracted from a breast cancer exam correspond to a malignant or a benign mass. We also measure the cross-validated performance of a decision tree with a maximum height of 5, in which each variable may appear at most twice along any decision path.
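As a rough illustration of the cross-validation setup, the sketch below trains a depth-limited decision tree with scikit-learn. It is an assumption that scikit-learn is an acceptable stand-in for the tooling used here; the function name `cross_validate_tree`, the choice of 5 folds, and the fixed random seed are hypothetical. The limit on how many times a variable may appear along a decision path is not enforced in this sketch, since `DecisionTreeClassifier` does not appear to offer such an option.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def cross_validate_tree(max_depth=5, folds=5, seed=0):
    """Return per-fold accuracy of a depth-limited decision tree.

    Assumptions: 5 folds and a fixed seed are illustrative choices.
    The "at most 2 uses per variable on a path" rule is NOT enforced.
    """
    # scikit-learn ships a copy of the Wisconsin Diagnostic dataset
    # (569 records, 30 real-valued features, binary class label).
    X, y = load_breast_cancer(return_X_y=True)

    # max_depth caps the height of the tree.
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)

    # For a classifier and an integer cv, the folds are stratified by class.
    return cross_val_score(tree, X, y, cv=folds)


if __name__ == "__main__":
    scores = cross_validate_tree()
    print("fold accuracies:", scores)
    print("mean accuracy:", scores.mean())
```

The mean of the per-fold accuracies gives a single summary figure, while the spread across folds indicates how sensitive the tree is to the particular split.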


Features of the Breast Cancer dataset

Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The mean, the standard error, and the "worst" or largest value (mean of the three largest values) of each of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, and field 23 is Worst Radius.
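The field numbering and the compactness formula described above can be sketched as follows. This is a minimal illustration, assuming that fields 1 and 2 hold the record identifier and the class label, so that the 30 features occupy fields 3 through 32. The names `field_name` and `compactness` are hypothetical helpers, not part of the dataset or of any library.

```python
import math

# The ten base measurements, in the order a) through j).
BASE_FEATURES = [
    "Radius", "Texture", "Perimeter", "Area", "Smoothness",
    "Compactness", "Concavity", "Concave Points", "Symmetry",
    "Fractal Dimension",
]

# One naming template per statistic, in the order the fields are stored.
STAT_TEMPLATES = ["Mean {}", "{} SE", "Worst {}"]

# Assumption: fields 1 and 2 are the record ID and the class label.
FIRST_FEATURE_FIELD = 3


def field_name(field):
    """Map a 1-based field number (3..32) to a feature name."""
    offset = field - FIRST_FEATURE_FIELD
    if not 0 <= offset < len(BASE_FEATURES) * len(STAT_TEMPLATES):
        raise ValueError(f"field {field} is not a feature field")
    # Each statistic spans a block of ten consecutive fields.
    stat_index, feature_index = divmod(offset, len(BASE_FEATURES))
    return STAT_TEMPLATES[stat_index].format(BASE_FEATURES[feature_index])


def compactness(perimeter, area):
    """Compactness as defined above: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


if __name__ == "__main__":
    for field in (3, 13, 23):
        print(field, field_name(field))
    # A perfect circle of radius 1 has perimeter 2*pi and area pi,
    # so its compactness is 4*pi - 1.
    print(compactness(2 * math.pi, math.pi))
```

A circle gives the smallest possible value of perimeter^2 / area, so under this definition more irregular nuclei score higher.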

Record | Mean Radius | Texture | Perimeter | Area | Smoothness | Compactness | Concavity | Concave Points | Symmetry | Fractal Dimension | Class Label