A Study on Privacy-Preserving Synthetic Text Generation via Transformer-Based GANs
우타리예바 아쎔 (Pusan National University); 박혜경 (Pusan National University); 최윤호 (Pusan National University)
Vol. 35, No. 6, pp. 1541-1554
Abstract
Controlled generation of high-quality synthetic data is essential in the modern data-driven world, particularly when addressing human-centered privacy concerns. In this study, we propose a privacy-preserving synthetic text generation framework based on transformer-based generative adversarial networks (GANs). The generator produces fluent, domain-aligned text guided by structured prompts that incorporate human-defined privacy preferences. A multi-task discriminator scores each generated sample on three criteria: realism, domain appropriateness, and the presence of sensitive information. To further improve generation, we introduce a non-parametric feedback loop that iteratively refines the input prompt based on discriminator feedback. Experimental results demonstrate that our method achieves high text quality and strong privacy preservation, enabling on-demand generation of synthetic datasets suitable for fine-tuning large language models in privacy-sensitive domains.
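The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the generator and discriminator here are toy stand-ins, and every name (`toy_generator`, `toy_discriminator`, `refine_prompt`, `feedback_loop`), the thresholds, and the constraint wording are assumptions made for illustration. The sketch only shows the control flow: the discriminator returns three scores, and the prompt, not any model weight, is revised until the sample is acceptable.

```python
import re
from dataclasses import dataclass


@dataclass
class Scores:
    realism: float    # higher = more realistic
    domain: float     # higher = better domain fit
    sensitive: float  # higher = more sensitive information detected


# Hypothetical detector: an e-mail address stands in for "sensitive information".
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_generator(prompt: str) -> str:
    """Stand-in for the transformer generator (assumption, not a real model)."""
    text = "Patient reports mild headache after medication."
    if "no contact details" not in prompt:
        text += " Contact: jane.doe@example.com"
    return text


def toy_discriminator(text: str, domain_terms=("patient", "medication")) -> Scores:
    """Stand-in for the multi-task discriminator: returns three scores."""
    lowered = text.lower()
    realism = 1.0 if len(text.split()) >= 5 else 0.0
    domain = sum(term in lowered for term in domain_terms) / len(domain_terms)
    sensitive = 1.0 if EMAIL.search(text) else 0.0
    return Scores(realism, domain, sensitive)


def acceptable(s: Scores) -> bool:
    # Thresholds are illustrative assumptions.
    return s.realism >= 0.5 and s.domain >= 0.5 and s.sensitive < 0.5


def refine_prompt(prompt: str, s: Scores) -> str:
    """Non-parametric step: edit the prompt text only, no weights are updated."""
    if s.sensitive >= 0.5:
        prompt += " Constraint: no contact details."
    if s.domain < 0.5:
        prompt += " Constraint: stay in the medical domain."
    return prompt


def feedback_loop(prompt: str, max_rounds: int = 3):
    """Generate, score, and refine the prompt until the sample is acceptable."""
    for _ in range(max_rounds):
        text = toy_generator(prompt)
        scores = toy_discriminator(text)
        if acceptable(scores):
            break
        prompt = refine_prompt(prompt, scores)
    return prompt, text, scores


final_prompt, sample, final_scores = feedback_loop("Write a short clinical note.")
```

In this toy run the first sample contains an e-mail address, so the loop appends a privacy constraint to the prompt and the second sample passes all three checks. A real system would replace the two stand-ins with the trained generator and discriminator.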
- Publisher: Korea Institute of Information Security and Cryptology (한국정보보호학회)
- Category: Computer Science