Tianhang Zheng (University of Missouri-Kansas City), Baochun Li (University of Toronto)
Recent work in ICML’22 established a connection between dataset condensation (DC) and differential privacy (DP), which is unfortunately problematic. To correctly connect DC and DP, we propose two differentially private dataset condensation (DPDC) algorithms—LDPDC and NDPDC. LDPDC is a linear DC algorithm that can be executed on a low-end Central Processing Unit (CPU), while NDPDC is a nonlinear DC algorithm that leverages neural networks to extract and match the latent representations between real and synthetic data. Through extensive evaluations, we demonstrate that LDPDC has comparable performance to recent DP generative methods despite its simplicity. NDPDC provides acceptable DP guarantees with a mild utility loss, compared to distribution matching (DM). Additionally, NDPDC allows a flexible trade-off between the synthetic data utility and DP budget.