Your Data

How We Use Your Data

To run our analyses, the KXY backend needs your data. The methods below are the only ones involved in sharing your data with us, and the kxy package only uploads your data if and when it is needed.

kxy.api.data_transfer.generate_upload_url(file_name)

Requests a pre-signed URL to upload a dataset.

Parameters: file_name (str) – A string that uniquely identifies the content of the file.
Returns: d – The dictionary containing the pre-signed URL.
Return type: dict or None
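The documentation only requires file_name to uniquely identify the content of the file. One way to derive such a string is to hash the dataframe itself. The sketch below is an assumption, not the kxy implementation: the helper content_file_name is hypothetical, and it builds a SHA-256 digest from pandas row hashes plus the column names. The kxy package may name uploaded files differently.

```python
import hashlib

import pandas as pd


def content_file_name(df: pd.DataFrame) -> str:
    """Hypothetical helper: derive a content-based file name for a dataframe."""
    # One uint64 hash per row, covering the row values and the index label.
    row_hashes = pd.util.hash_pandas_object(df, index=True).to_numpy()
    digest = hashlib.sha256(row_hashes.tobytes())
    # hash_pandas_object ignores column names, so fold them in explicitly.
    digest.update("|".join(map(str, df.columns)).encode("utf-8"))
    return digest.hexdigest()


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(content_file_name(df))
```

Identical content yields the same name, so a backend keyed on file_name could recognize data it has already received; any change to a value, the index, or a column name yields a different name.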

kxy.api.data_transfer.upload_data(df, file_name=None)

Uploads a dataframe to KXY servers.

Parameters: df (pd.DataFrame) – The dataframe to upload.
Returns: d – Whether the upload was successful.
Return type: bool

Anonymizing Your Data

Fortunately, our analyses are invariant under various transformations that can completely anonymize your data.

You may simply run df_anonymized = df.kxy.anonymize() on any dataframe df to anonymize it, and work with df_anonymized instead of df.

Check out the function below for more information on how we anonymize your data.

BaseAccessor.anonymize(columns_to_exclude=[])

Anonymize the dataframe in a manner that leaves all pre-learning and post-learning analyses (including data valuation, variable selection, model-driven improvability, data-driven improvability and model explanation) invariant.

Any rank-preserving transformation of a continuous variable leaves our pre-learning and post-learning analyses unchanged. The same holds for any one-to-one transformation of a categorical variable.
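To illustrate why such transformations are harmless, the sketch below checks two stand-in statistics: Spearman correlation for continuous variables and plug-in mutual information for categorical ones. These are only illustrations of the invariance argument, not the estimators the KXY backend actually uses; the helpers spearman and plugin_mutual_information are hypothetical, and the data are synthetic.

```python
import numpy as np
import pandas as pd


def spearman(a: pd.Series, b: pd.Series) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return a.rank().corr(b.rank())


def plugin_mutual_information(a: pd.Series, b: pd.Series) -> float:
    # Plug-in mutual information (in nats) from the empirical joint distribution.
    joint = pd.crosstab(a, b, normalize=True).to_numpy()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a, shape (k, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b, shape (1, m)
    nz = joint > 0  # skip empty cells to avoid log(0)
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())


rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=500))
y = x + pd.Series(rng.normal(scale=0.5, size=500))

# Strictly increasing transforms keep the ranks, hence the Spearman correlation.
rho_raw = spearman(x, y)
rho_transformed = spearman(np.exp(x), y ** 3)

# A one-to-one relabeling of categories only renames cells of the joint distribution.
colors = pd.Series(rng.choice(["red", "green", "blue"], size=500))
sizes = pd.Series(np.where(colors == "red", "S", rng.choice(["M", "L"], size=500)))
relabel = {"red": "q", "green": "z", "blue": "a"}
mi_raw = plugin_mutual_information(colors, sizes)
mi_relabeled = plugin_mutual_information(colors.map(relabel), sizes)

print(rho_raw, rho_transformed)
print(mi_raw, mi_relabeled)
```

Both pairs agree up to floating-point rounding, because neither statistic depends on the actual values of a continuous variable beyond their order, or on the names of a categorical variable's levels.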

This implementation replaces ordinal values (i.e. values in any column that can be cast as a float) with their within-column Gaussian score. For each non-ordinal column, we form the set of all distinct values, assign a unique integer index to each of them, and replace every occurrence of a value in the dataframe with the hexadecimal code of its integer index.
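A minimal pandas sketch of the scheme described above. anonymize_sketch is a hypothetical helper, not the kxy implementation, and it assumes that "Gaussian score" means the standard score (x - mean) / std; a rank-based normal score would be an equally rank-preserving choice.

```python
import pandas as pd


def anonymize_sketch(df: pd.DataFrame, columns_to_exclude=None) -> pd.DataFrame:
    """Hypothetical re-implementation of the anonymization scheme described above."""
    columns_to_exclude = set(columns_to_exclude or [])
    out = df.copy()
    for col in out.columns:
        if col in columns_to_exclude:
            continue
        try:
            values = out[col].astype(float)
        except (ValueError, TypeError):
            # Non-ordinal column: index distinct values in order of first appearance,
            # then replace each value with the hex code of its index ('0x0', '0x1', ...).
            codes = {v: hex(i) for i, v in enumerate(pd.unique(out[col]))}
            out[col] = out[col].map(codes)
        else:
            # Ordinal column: within-column standard score (assumed meaning of
            # "Gaussian score"). A constant column has std 0 and would become NaN.
            out[col] = (values - values.mean()) / values.std()
    return out


df = pd.DataFrame({"x": [3.0, 1.0, 2.0], "c": ["a", "b", "a"], "y": [10, 20, 30]})
print(anonymize_sketch(df, columns_to_exclude=["y"]))
```

The standard score is an increasing affine map, so it preserves ranks, and the value-to-hex mapping is one-to-one, so both steps fall within the transformations that leave the analyses invariant.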

For regression problems, accurate estimation of RMSE-related metrics requires the target column (and, for post-learning analyses, the prediction column) not to be anonymized.
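The reason is that RMSE depends on the scale of the target, while anonymization rescales each column on its own. The sketch below assumes the standard-score reading of "Gaussian score" and uses a hypothetical zscore helper with made-up numbers: after each column is standardized separately, the RMSE between target and prediction no longer reflects the error in the original units.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    # Root mean squared error in the units of the inputs.
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def zscore(v) -> np.ndarray:
    # Hypothetical stand-in for a within-column "Gaussian score".
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)


y = np.array([100.0, 200.0, 300.0, 400.0])      # target, original units
pred = np.array([110.0, 190.0, 320.0, 380.0])   # predictions, original units

raw = rmse(y, pred)                          # errors of 10 to 20 units
anonymized = rmse(zscore(y), zscore(pred))   # each column rescaled separately
print(raw, anonymized)
```

Because the target and the prediction are standardized with different means and standard deviations, the anonymized RMSE cannot be converted back to the raw one, which is why both columns belong in columns_to_exclude for regression problems.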

Parameters: columns_to_exclude (list (optional)) – List of columns not to anonymize (e.g. target and prediction columns for regression problems).
Returns: result – The anonymized dataframe.
Return type: pandas.DataFrame