Energy Efficiency (UCI, Regression, n=768, d=8)
Loading The Data
In [1]:
import kxy # pip install kxy; importing it registers the df.kxy accessor used below
from kxy_datasets.uci_regressions import EnergyEfficiency # pip install kxy_datasets
In [2]:
dataset = EnergyEfficiency()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data
----------
Column: X1
----------
Type: Continuous
Max: 1.0
p75: 0.8
Mean: 0.8
Median: 0.8
p25: 0.7
Min: 0.6
----------
Column: X2
----------
Type: Continuous
Max: 808
p75: 741
Mean: 671
Median: 673
p25: 606
Min: 514
----------
Column: X3
----------
Type: Continuous
Max: 416
p75: 343
Mean: 318
Median: 318
p25: 294
Min: 245
----------
Column: X4
----------
Type: Continuous
Max: 220
p75: 220
Mean: 176
Median: 183
p25: 140
Min: 110
----------
Column: X5
----------
Type: Continuous
Max: 7.0
p75: 7.0
Mean: 5.2
Median: 5.2
p25: 3.5
Min: 3.5
----------
Column: X6
----------
Type: Continuous
Max: 5.0
p75: 4.2
Mean: 3.5
Median: 3.5
p25: 2.8
Min: 2.0
----------
Column: X7
----------
Type: Continuous
Max: 0.4
p75: 0.4
Mean: 0.2
Median: 0.2
p25: 0.1
Min: 0.0
----------
Column: X8
----------
Type: Continuous
Max: 5.0
p75: 4.0
Mean: 2.8
Median: 3.0
p25: 1.8
Min: 0.0
----------
Column: Y1
----------
Type: Continuous
Max: 43
p75: 31
Mean: 22
Median: 18
p25: 12
Min: 6.0
----------
Column: Y2
----------
Type: Continuous
Max: 48
p75: 33
Mean: 24
Median: 22
p25: 15
Min: 10
Data Valuation
In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
| | Achievable R-Squared | Achievable Log-Likelihood Per Sample | Achievable RMSE |
|---|---|---|---|
| 0 | 0.98 | -1.61 | 1.35 |
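The achievable R-squared and achievable RMSE are two views of the same quantity. The sketch below assumes the usual relation RMSE = std(y) * sqrt(1 - R²), and takes std(y) to be 10.1, the running RMSE reported for "No Variable" in the variable selection output further down. The helper `rmse_from_r2` is a hypothetical name introduced for illustration; it is not part of the `kxy` API.

```python
import math

def rmse_from_r2(target_std: float, r_squared: float) -> float:
    """RMSE implied by an R-squared, given the target's standard deviation.

    Assumes RMSE = std(y) * sqrt(1 - R^2); this is an illustrative helper,
    not a function provided by kxy.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie in [0, 1]")
    return target_std * math.sqrt(1.0 - r_squared)

# Assumption: the RMSE with no explanatory variable (10.1 in the variable
# selection output) equals the standard deviation of the target.
target_std = 10.1
implied_rmse = rmse_from_r2(target_std, 0.98)
print(f"{implied_rmse:.2f}")
```

With the rounded R-squared of 0.98 this gives about 1.43 rather than the reported 1.35. The gap is consistent with rounding: an R-squared near 0.982 would give an RMSE of roughly 1.35.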
Automatic (Model-Free) Variable Selection
In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
| Selection Order | Variable | Running Achievable R-Squared | Running Achievable RMSE |
|---|---|---|---|
| 0 | No Variable | 0.00 | 1.01e+01 |
| 1 | Y2 | 0.92 | 2.83 |
| 2 | X3 | 0.98 | 1.38 |
| 3 | X4 | 0.98 | 1.38 |
| 4 | X7 | 0.98 | 1.35 |
| 5 | X2 | 0.98 | 1.35 |
| 6 | X8 | 0.98 | 1.35 |
| 7 | X6 | 0.98 | 1.35 |
| 8 | X5 | 0.98 | 1.35 |
| 9 | X1 | 0.98 | 1.35 |
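One way to read this output is to keep only the shortest prefix of the selection order whose running R-squared is within a small tolerance of the final value. The sketch below does that on the figures copied from the table above. The function name `smallest_sufficient_subset` and the default tolerance of 0.005 are assumptions made for illustration, not part of `kxy`.

```python
# (variable, running achievable R-squared), copied from the output above.
selection = [
    ("No Variable", 0.00),
    ("Y2", 0.92),
    ("X3", 0.98),
    ("X4", 0.98),
    ("X7", 0.98),
    ("X2", 0.98),
    ("X8", 0.98),
    ("X6", 0.98),
    ("X5", 0.98),
    ("X1", 0.98),
]

def smallest_sufficient_subset(rows, tolerance=0.005):
    """Return the shortest prefix of selected variables whose running
    R-squared is within `tolerance` of the final running R-squared.

    Hypothetical helper for illustration; rows[0] must be the baseline row.
    """
    final_r2 = rows[-1][1]
    for k, (_, r2) in enumerate(rows):
        if final_r2 - r2 <= tolerance:
            # rows[0] is the "No Variable" baseline, so it is skipped.
            return [name for name, _ in rows[1:k + 1]]
    return [name for name, _ in rows[1:]]

top_variables = smallest_sufficient_subset(selection)
print(top_variables)
```

On these rounded figures the cut falls after Y2 and X3. The running RMSE still moves from 1.38 to 1.35 when X7 is added, so a cutoff based on RMSE could keep more variables. Note also that Y2 is not one of the eight inputs: in the UCI data it is the second target column (cooling load), and the target here is presumably Y1 (heating load). Whether Y2 belongs among the candidate variables depends on whether it would be known at prediction time.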