Shuttle (UCI, Classification, n=58000, d=9, 7 classes)
Loading The Data
In [1]:
import kxy # pip install kxy; importing it registers the df.kxy accessor used below
from kxy_datasets.uci_classifications import Shuttle # pip install kxy_datasets
In [2]:
dataset = Shuttle()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data
-----------
Column: x_0
-----------
Type: Continuous
Max: 126
p75: 55
Mean: 48
Median: 45
p25: 38
Min: 27
-----------
Column: x_1
-----------
Type: Continuous
Max: 5,075
p75: 0.0
Mean: -0.0
Median: 0.0
p25: 0.0
Min: -4821.0
-----------
Column: x_2
-----------
Type: Continuous
Max: 149
p75: 89
Mean: 85
Median: 83
p25: 79
Min: 21
-----------
Column: x_3
-----------
Type: Continuous
Max: 3,830
p75: 0.0
Mean: 0.3
Median: 0.0
p25: 0.0
Min: -3939.0
-----------
Column: x_4
-----------
Type: Continuous
Max: 436
p75: 46
Mean: 34
Median: 42
p25: 26
Min: -188.0
-----------
Column: x_5
-----------
Type: Continuous
Max: 15,164
p75: 5.0
Mean: 1.6
Median: 0.0
p25: -5.0
Min: -26739.0
-----------
Column: x_6
-----------
Type: Continuous
Max: 105
p75: 42
Mean: 37
Median: 39
p25: 32
Min: -48.0
-----------
Column: x_7
-----------
Type: Continuous
Max: 270
p75: 60
Mean: 50
Median: 44
p25: 37
Min: -353.0
-----------
Column: x_8
-----------
Type: Continuous
Max: 266
p75: 14
Mean: 13
Median: 2.0
p25: 0.0
Min: -356.0
---------
Column: y
---------
Type: Continuous
Max: 7.0
p75: 1.0
Mean: 1.7
Median: 1.0
p25: 1.0
Min: 1.0
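The summary above reports y as Continuous, although the title describes a 7-class problem and the y values run from 1.0 to 7.0. Its p25, median and p75 are all 1.0, so at least three quarters of the rows belong to class 1. A per-class frequency table makes that imbalance explicit. The sketch below uses plain pandas on a small made-up frame (toy_df and toy_y_column are hypothetical stand-ins, not the Shuttle data); with the real data one would pass df and y_column from the cells above.

```python
import pandas as pd

# Hypothetical stand-in for the loaded frame. These rows are invented for
# illustration only; with the real data, use `df` and `y_column` instead.
toy_df = pd.DataFrame({
    "x_0": [45, 38, 55, 50, 41, 60, 47, 39],
    "y": [1.0, 1.0, 1.0, 4.0, 1.0, 5.0, 1.0, 1.0],
})
toy_y_column = "y"


def class_frequencies(frame: pd.DataFrame, target: str) -> pd.Series:
    """Return the share of rows in each class, most frequent class first."""
    # The summary above shows the labels as floats (1.0 to 7.0), so cast
    # them to integers before counting.
    labels = frame[target].astype(int)
    return labels.value_counts(normalize=True)


freqs = class_frequencies(toy_df, toy_y_column)
print(freqs)
```

The first entry of the result is the share of the most frequent class, which is also the accuracy of a classifier that always predicts that class.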
Data Valuation
In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
| Achievable R-Squared | Achievable Log-Likelihood Per Sample | Achievable Accuracy |
---|---|---|---|
0 | 0.74 | 0.00 | 1.00 |
Automatic (Model-Free) Variable Selection
In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
Selection Order | Variable | Running Achievable R-Squared | Running Achievable Accuracy |
---|---|---|---|
0 | No Variable | 0.00 | 0.79 |
1 | x_0 | 0.65 | 1.00 |
2 | x_8 | 0.73 | 1.00 |
3 | x_1 | 0.74 | 1.00 |
4 | x_3 | 0.74 | 1.00 |
5 | x_2 | 0.74 | 1.00 |
6 | x_5 | 0.74 | 1.00 |
7 | x_4 | 0.74 | 1.00 |
8 | x_6 | 0.74 | 1.00 |
9 | x_7 | 0.74 | 1.00 |
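One way to read the table above is to stop at the first variable after which the running metric no longer improves at the displayed precision. The sketch below is a possible reading aid, not part of the kxy API: it copies the printed values into a DataFrame and uses a hypothetical helper, shortlist, with an assumed tolerance of 0.005. Because the printed values are rounded to two decimals, any gain smaller than that is invisible here.

```python
import pandas as pd

# Values copied from the variable-selection output above.
selection = pd.DataFrame({
    "Variable": ["No Variable", "x_0", "x_8", "x_1", "x_3",
                 "x_2", "x_5", "x_4", "x_6", "x_7"],
    "Running Achievable R-Squared": [0.00, 0.65, 0.73, 0.74, 0.74,
                                     0.74, 0.74, 0.74, 0.74, 0.74],
    "Running Achievable Accuracy": [0.79, 1.00, 1.00, 1.00, 1.00,
                                    1.00, 1.00, 1.00, 1.00, 1.00],
})
selection.index.name = "Selection Order"


def shortlist(table: pd.DataFrame, metric: str, tolerance: float = 0.005) -> list:
    """Hypothetical helper: list the variables, in selection order, up to the
    first row whose running metric is within `tolerance` of the final value."""
    final_value = table[metric].iloc[-1]
    reached = (table[metric] >= final_value - tolerance).to_numpy()
    first_ok = int(reached.argmax())  # position of the first True
    # Row 0 is the "No Variable" baseline, so the variables start at row 1.
    return table["Variable"].iloc[1:first_ok + 1].tolist()


r2_vars = shortlist(selection, "Running Achievable R-Squared")
acc_vars = shortlist(selection, "Running Achievable Accuracy")
print(r2_vars)
print(acc_vars)
```

On the printed values this gives x_0, x_8, x_1 for R-squared and x_0 alone for accuracy. The tolerance is a judgment call; a looser value shortens the list, a tighter one cannot see past the rounding.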