Abalone (UCI, Regression, n=4177, d=8)
Loading The Data
In [1]:
from kxy_datasets.uci_regressions import Abalone # pip install kxy_datasets
In [2]:
dataset = Abalone()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data
-----------
Column: Age
-----------
Type: Continuous
Max: 30
p75: 12
Mean: 11
Median: 10
p25: 9.5
Min: 2.5
----------------
Column: Diameter
----------------
Type: Continuous
Max: 0.7
p75: 0.5
Mean: 0.4
Median: 0.4
p25: 0.3
Min: 0.1
--------------
Column: Height
--------------
Type: Continuous
Max: 1.1
p75: 0.2
Mean: 0.1
Median: 0.1
p25: 0.1
Min: 0.0
--------------
Column: Length
--------------
Type: Continuous
Max: 0.8
p75: 0.6
Mean: 0.5
Median: 0.5
p25: 0.5
Min: 0.1
-----------
Column: Sex
-----------
Type: Categorical
Frequency: 36.58%, Label: M
Frequency: 32.13%, Label: I
Frequency: 31.29%, Label: F
--------------------
Column: Shell weight
--------------------
Type: Continuous
Max: 1.0
p75: 0.3
Mean: 0.2
Median: 0.2
p25: 0.1
Min: 0.0
----------------------
Column: Shucked weight
----------------------
Type: Continuous
Max: 1.5
p75: 0.5
Mean: 0.4
Median: 0.3
p25: 0.2
Min: 0.0
----------------------
Column: Viscera weight
----------------------
Type: Continuous
Max: 0.8
p75: 0.3
Mean: 0.2
Median: 0.2
p25: 0.1
Min: 0.0
--------------------
Column: Whole weight
--------------------
Type: Continuous
Max: 2.8
p75: 1.2
Mean: 0.8
Median: 0.8
p25: 0.4
Min: 0.0
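The summary above reports extremes, quartiles, mean and median for each continuous column, and label frequencies for the categorical column. As a rough sketch of what those fields mean, the same quantities can be computed with the Python standard library. The helper names and the toy values below are hypothetical (they are not the Abalone data), and this is not a description of how `df.kxy.describe()` is implemented; in particular, its quantile convention may differ.

```python
from collections import Counter
from statistics import mean, median, quantiles


def summarize_continuous(values):
    """Return the fields shown for a 'Continuous' column."""
    # method="inclusive" treats the data as the full population;
    # this quantile convention is an assumption, not kxy's documented choice.
    p25, _, p75 = quantiles(values, n=4, method="inclusive")
    return {
        "Max": max(values),
        "p75": p75,
        "Mean": mean(values),
        "Median": median(values),
        "p25": p25,
        "Min": min(values),
    }


def summarize_categorical(labels):
    """Return label frequencies in percent, most frequent first."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * count / total for label, count in counts.most_common()}


# Hypothetical toy columns, for illustration only.
toy_ages = [2.5, 9.5, 10.5, 12.5, 30.5]
toy_sexes = ["M", "M", "I", "F"]

for name, value in summarize_continuous(toy_ages).items():
    print(f"{name}: {value}")
for label, freq in summarize_categorical(toy_sexes).items():
    print(f"Frequency: {freq:.2f}%, Label: {label}")
```

With a pandas dataframe, `df.describe()` and `df["Sex"].value_counts(normalize=True)` give comparable information.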
Data Valuation
In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
[====================================================================================================] 100% ETA: 0s Duration: 0s
Out[4]:
| Achievable R-Squared | Achievable Log-Likelihood Per Sample | Achievable RMSE |
---|---|---|---|
0 | 1.00 | 2.46 | 2.50e-02 |
Automatic (Model-Free) Variable Selection
In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
[====================================================================================================] 100% ETA: 0s Duration: 0s
Out[5]:
Selection Order | Variable | Running Achievable R-Squared | Running Achievable RMSE |
---|---|---|---|
0 | No Variable | 0.00 | 3.22 |
1 | Shell weight | 0.58 | 2.09 |
2 | Shucked weight | 0.64 | 1.93 |
3 | Whole weight | 0.64 | 1.93 |
4 | Height | 0.92 | 0.92 |
5 | Sex | 0.92 | 0.92 |
6 | Viscera weight | 0.92 | 0.92 |
7 | Diameter | 0.92 | 0.92 |
8 | Length | 1.00 | 0.0250 |
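The two running columns in this table appear to be tied together by the usual identity RMSE = σ·sqrt(1 − R²), where σ is the RMSE of the "No Variable" row (3.22, presumably the standard deviation of the target). The sketch below checks that reading against the reported rows. It is an observation about the printed numbers, not a statement of how kxy computes them, and the helper name `rmse_from_r2` is made up for this example.

```python
import math


def rmse_from_r2(r_squared, baseline_rmse):
    """RMSE implied by an R-squared, given the RMSE with no explanatory variable."""
    return baseline_rmse * math.sqrt(1.0 - r_squared)


# Values copied from the table above: (variable added, running R-squared, running RMSE).
baseline_rmse = 3.22  # "No Variable" row
reported = [
    ("Shell weight", 0.58, 2.09),
    ("Shucked weight", 0.64, 1.93),
    ("Height", 0.92, 0.92),
]

for variable, r2, rmse in reported:
    implied = rmse_from_r2(r2, baseline_rmse)
    # R-squared is printed with two decimals, so small gaps are expected.
    print(f"{variable}: reported RMSE {rmse:.2f}, implied RMSE {implied:.2f}")
```

Because the R-squared column is rounded to two decimals, the implied and reported RMSE values agree only to within about 0.01 to 0.02, which is why the last row (R-squared shown as 1.00, RMSE 0.0250) cannot be checked this way.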