Skin Segmentation (UCI, Classification, n=245057, d=3, 2 classes)
Loading The Data
In [1]:
from kxy_datasets.uci_classifications import SkinSegmentation # pip install kxy_datasets
In [2]:
dataset = SkinSegmentation()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data
---------
Column: B
---------
Type: Continuous
Max: 255
p75: 176
Mean: 125
Median: 139
p25: 68
Min: 0.0
---------
Column: G
---------
Type: Continuous
Max: 255
p75: 177
Mean: 132
Median: 153
p25: 87
Min: 0.0
---------
Column: R
---------
Type: Continuous
Max: 255
p75: 164
Mean: 123
Median: 128
p25: 70
Min: 0.0
---------
Column: y
---------
Type: Continuous
Max: 2.0
p75: 2.0
Mean: 1.8
Median: 2.0
p25: 2.0
Min: 1.0
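The summary for `y` reports a mean of 1.8 with a minimum of 1 and a maximum of 2. Assuming the target takes only the two label values 1 and 2 (an assumption based on the "2 classes" in the title, not something the summary itself confirms), the mean pins down the class proportions. The helper name `class_share_from_mean` below is hypothetical and is not part of the `kxy` package; this is only a rough arithmetic sketch using the rounded figure printed above.

```python
def class_share_from_mean(mean, low=1.0, high=2.0):
    """Share of samples labeled `high`, assuming labels are only `low` or `high`."""
    # mean = low * (1 - p) + high * p  =>  p = (mean - low) / (high - low)
    return (mean - low) / (high - low)


# 1.8 is the rounded mean of y printed in the summary above.
share_high = class_share_from_mean(1.8)
share_low = 1.0 - share_high

print(f"approximate share of y == 2: {share_high:.2f}")
print(f"approximate share of y == 1: {share_low:.2f}")
```

Under that assumption, roughly four samples in five carry the label 2, so the classes are noticeably imbalanced. Because the mean is rounded to one decimal place, the shares are approximate.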
Data Valuation
In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
[====================================================================================================] 100% ETA: 0s Duration: 0s
Out[4]:
| | Achievable R-Squared | Achievable Log-Likelihood Per Sample | Achievable Accuracy |
|---|---|---|---|
| 0 | 0.64 | -1.08e-04 | 1.00 |
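One way to read the achievable log-likelihood per sample is to exponentiate it, which gives the geometric-mean probability that a model reaching this bound would assign to the true label. The sketch below assumes the figure is expressed in natural-log units (nats); the output above does not state the log base, so treat the result as illustrative.

```python
import math

# Value transcribed from the data valuation output above.
achievable_ll_per_sample = -1.08e-04

# Assumption: natural logarithm. exp() of a per-sample average
# log-likelihood is the geometric mean of the per-sample likelihoods.
geo_mean_likelihood = math.exp(achievable_ll_per_sample)

print(f"geometric-mean likelihood of the true label: {geo_mean_likelihood:.6f}")
```

A value this close to 1 is consistent with the achievable accuracy of 1.00 reported in the same row.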
Automatic (Model-Free) Variable Selection
In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
[====================================================================================================] 100% ETA: 0s Duration: 0s
Out[5]:
| Selection Order | Variable | Running Achievable R-Squared | Running Achievable Accuracy |
|---|---|---|---|
| 0 | No Variable | 0.00 | 0.79 |
| 1 | R | 0.40 | 0.93 |
| 2 | G | 0.63 | 1.00 |
| 3 | B | 0.64 | 1.00 |
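Because the table reports running (cumulative) figures, the contribution of each variable is easier to see as a difference between consecutive rows. The sketch below transcribes the rows by hand into plain Python and computes those differences; `incremental_gains` is a hypothetical helper, not part of the `kxy` package.

```python
# Rows transcribed from the variable selection output above:
# (variable, running achievable R-squared, running achievable accuracy)
rows = [
    ("No Variable", 0.00, 0.79),
    ("R", 0.40, 0.93),
    ("G", 0.63, 1.00),
    ("B", 0.64, 1.00),
]


def incremental_gains(rows):
    """Gain of each row over the previous one, rounded to the table's precision."""
    gains = []
    for (_, prev_r2, prev_acc), (name, r2, acc) in zip(rows, rows[1:]):
        gains.append((name, round(r2 - prev_r2, 2), round(acc - prev_acc, 2)))
    return gains


gains = incremental_gains(rows)
for name, d_r2, d_acc in gains:
    print(f"{name}: +{d_r2:.2f} R-squared, +{d_acc:.2f} accuracy")
```

Read this way, R and G account for nearly all of the achievable performance, while B adds only 0.01 to the achievable R-squared and, at the two-decimal precision shown, nothing to the achievable accuracy. The 0.79 in the "No Variable" row is close to what always predicting the larger class would give, given the mean of `y` reported earlier, although the output does not state how that figure is computed.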