Machine Unlearning in Learned Databases: An Experimental Analysis

November 21, 2023

License: Apache

Introduction

This repository contains the code for an experimental analysis of data deletion in learned database systems. We study four learned database systems:

DBEst++: Approximate Query Processing using Mixture Density Networks

Naru: Cardinality Estimation using Deep Autoregressive Networks

TVAE: Data Generation using Tabular Variational AutoEncoders

Classification: Tabular data classification using deep neural networks such as ResNet.

Setup

To install the required packages, run the environment setup script for each application:

bash ./environments/dbest/setup.sh
bash ./environments/naru/setup.sh
bash ./environments/tvae/setup.sh
bash ./environments/tcls/setup.sh
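If you prefer to set up all four environments in one go, the commands above can be wrapped in a small loop. This is only a sketch: the `run_setups` function and the `ENV_ROOT` override are illustrative names, not part of this repository, and the script assumes it is run from the repository root. It stops at the first script that is missing or fails.

```shell
#!/usr/bin/env bash
# Sketch: run each per-application setup script, stopping at the first failure.
# ENV_ROOT is a hypothetical override; the default matches the paths listed above.

run_setups() {
  local root="${ENV_ROOT:-./environments}"
  local app script
  for app in "$@"; do
    script="${root}/${app}/setup.sh"
    # Refuse to continue if the expected script is not there.
    if [[ ! -f "$script" ]]; then
      echo "setup script not found: $script" >&2
      return 1
    fi
    echo "running $script"
    # Report which script failed and stop.
    bash "$script" || { echo "setup failed: $script" >&2; return 1; }
  done
}

# Same four applications as the individual commands above.
run_setups dbest naru tvae tcls || echo "environment setup did not complete" >&2
```

Running the scripts one at a time, as listed above, remains the safer choice if you only need a single application.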

Datasets

We use three real-world datasets for our evaluations: Census, Forest, and DMV. You can download the versions we use from here.

Experiments

To improve reproducibility, we provide experimental pipelines for each application. For Census and Forest, use the exp_census.py and exp_forest.py scripts. For DMV, there are bash scripts that run the training and evaluation commands.

References

We use code from the repositories below, which are the official implementations of the applications we study.

DBEst++

Naru

TVAE