Bank Note (UCI, Classification, n=1372, d=4, 2 classes)
Loading The Data
In [1]:
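import kxy # Registers the df.kxy pandas accessor (pip install kxy); redundant if kxy_datasets already imports it
from kxy_datasets.uci_classifications import BankNote # pip install kxy_datasets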
In [2]:
dataset = BankNote()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data
---------------
Column: Entropy
---------------
Type: Continuous
Max: 2.4
p75: 0.4
Mean: -1.2
Median: -0.6
p25: -2.4
Min: -8.5
---------------
Column: Is Fake
---------------
Type: Continuous
Max: 1.0
p75: 1.0
Mean: 0.4
Median: 0.0
p25: 0.0
Min: 0.0
----------------
Column: Kurtosis
----------------
Type: Continuous
Max: 17
p75: 3.2
Mean: 1.4
Median: 0.6
p25: -1.6
Min: -5.3
----------------
Column: Skewness
----------------
Type: Continuous
Max: 12
p75: 6.8
Mean: 1.9
Median: 2.3
p25: -1.7
Min: -13.8
----------------
Column: Variance
----------------
Type: Continuous
Max: 6.8
p75: 2.8
Mean: 0.4
Median: 0.5
p25: -1.8
Min: -7.0
Data Valuation
In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
| | Achievable R-Squared | Achievable Log-Likelihood Per Sample | Achievable Accuracy |
|---|---|---|---|
| 0 | 0.75 | 0.00 | 1.00 |
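The three figures in this output are consistent with each other. The sketch below assumes that, for a categorical target, the achievable R-squared is 1 - exp(-2 I(y; x)) and the achievable log-likelihood per sample is I(y; x) - H(y), both in nats. These formulas and the 610/1372 class split are assumptions for illustration; neither appears in the notebook output. If the inputs fully determine the label, then I(y; x) = H(y), which would give a log-likelihood of 0.00, an accuracy of 1.00, and an R-squared of about 0.75.

```python
import math

# Assumed class counts (610 forged notes out of 1372); not shown in the notebook.
n_fake, n_total = 610, 1372
p_fake = n_fake / n_total

# Shannon entropy of the binary label, in nats.
label_entropy = -(p_fake * math.log(p_fake) + (1 - p_fake) * math.log(1 - p_fake))

# Assumption: the four inputs fully determine the label, so I(y; x) = H(y).
mutual_information = label_entropy

# Assumed definitions of the achievable metrics (see the lead-in).
achievable_r2 = 1.0 - math.exp(-2.0 * mutual_information)
achievable_log_likelihood = mutual_information - label_entropy

print(round(achievable_r2, 2))    # 0.75
print(achievable_log_likelihood)  # 0.0
```

Under these assumptions, an achievable R-squared below 1 does not contradict a perfect achievable accuracy: the R-squared is capped by the entropy of the label itself.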
Automatic (Model-Free) Variable Selection
In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
| Selection Order | Variable | Running Achievable R-Squared | Running Achievable Accuracy |
|---|---|---|---|
| 0 | No Variable | 0.00 | 0.56 |
| 1 | Variance | 0.51 | 0.90 |
| 2 | Skewness | 0.58 | 0.93 |
| 3 | Kurtosis | 0.75 | 1.00 |
| 4 | Entropy | 0.75 | 1.00 |
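One way to read the running columns is as cumulative totals: the gain attributable to each variable is the difference from the row before it. The sketch below copies the rows from the table above and computes those differences in plain Python. The helper names `marginal_gains` and `smallest_sufficient_subset` are hypothetical and not part of the kxy package.

```python
# Rows copied from the variable-selection output:
# (variable, running achievable R-squared, running achievable accuracy)
selection = [
    ("No Variable", 0.00, 0.56),
    ("Variance", 0.51, 0.90),
    ("Skewness", 0.58, 0.93),
    ("Kurtosis", 0.75, 1.00),
    ("Entropy", 0.75, 1.00),
]


def marginal_gains(rows):
    """Accuracy gained by adding each variable, relative to the previous row."""
    gains = []
    for (_, _, prev_acc), (name, _, acc) in zip(rows, rows[1:]):
        gains.append((name, round(acc - prev_acc, 2)))
    return gains


def smallest_sufficient_subset(rows, tolerance=0.0):
    """Shortest prefix of selected variables within `tolerance` of the final accuracy."""
    best = rows[-1][2]
    kept = []
    for name, _, acc in rows[1:]:
        kept.append(name)
        if best - acc <= tolerance:
            break
    return kept


gains = marginal_gains(selection)
kept = smallest_sufficient_subset(selection)
print(gains)  # [('Variance', 0.34), ('Skewness', 0.03), ('Kurtosis', 0.07), ('Entropy', 0.0)]
print(kept)   # ['Variance', 'Skewness', 'Kurtosis']
```

On these numbers, Entropy adds nothing once the other three variables are included, so the first three are enough to reach the achievable accuracy of 1.00.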