Academic paper | 한국경영과학회지 | Published 2026.02

TVAE 기반 합성데이터 활용이 복지 사각지대 예측모형의 변수 중요도에 미치는 영향: 원본 데이터와 결합 데이터의 비교 분석

The Impact of TVAE-Based Synthetic Data Utilization on Feature Importance in Welfare Blind-Spot Prediction Models: A Comparative Analysis of Original and Combined Datasets

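박영식 (Department of Business Administration, Hansung University); 이동원 (Department of Business Administration, Hansung University); 이형용 (Division of Business Administration, Hansung University)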

Vol. 51, No. 1, pp. 51-69

Abstract

Welfare blind spots arise when households in need are not identified in a timely manner, partly because policy indicators are introduced gradually and are therefore structurally missing in earlier administrative records. This study proposes a recall-oriented decision-support framework that integrates a Tabular Variational Autoencoder (TVAE) to reconstruct historically unavailable policy variables and to examine how progressive feature availability affects model performance and interpretability. Using 2,630,195 welfare-crisis records collected between 2019 and 2023, we designed a four-stage empirical procedure. In Stage 1, a complete-data benchmark was established using the most recent timeline to evaluate whether incremental feature expansion improves classification performance without compromising recall. Stage 2 applied a time-split training strategy to prevent temporal information leakage when generating synthetic values for late-introduced policy variables, and the fidelity of the generated data was assessed through distributional similarity measures. Stage 3 examined recall-oriented classification under progressive feature augmentation using tree-based ensemble models with threshold tuning aligned with policy objectives. The results show that expanding feature availability improved overall discrimination while maintaining high recall suitable for omission-minimizing screening. In Stage 4, feature importance was compared across cumulatively combined datasets to trace how key risk factors evolved as historical coverage expanded. The analysis identified risk factors whose influence remained stable across data expansion, as well as factors whose importance emerged only after additional policy variables became available. 
Overall, the findings demonstrate that leakage-controlled generative reconstruction combined with recall-focused decision boundaries can enhance the robustness and interpretability of welfare screening systems operating under structurally missing policy information.
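
The abstract does not say how the recall-oriented threshold was tuned. As an illustration only, the sketch below shows one common approach: choose the highest score threshold that still reaches a target recall, so that omissions are minimized first and precision is kept as high as that constraint allows. The function names, the 0.95 target, and the toy scores are assumptions made for this example and are not taken from the paper.

```python
def pick_threshold(scores, labels, target_recall=0.95):
    """Return the highest threshold whose recall is at least target_recall.

    Hypothetical helper: scores are model probabilities, labels are 0/1.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    # Recall can only grow as the threshold drops, so scan from the top.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        if tp / positives >= target_recall:
            return t
    return min(scores)  # fallback; the lowest score always gives recall 1.0


def recall_precision(scores, labels, threshold):
    """Recall and precision when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Toy data (assumed): six households, three of them truly at risk.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0]
threshold = pick_threshold(scores, labels, target_recall=0.95)
print(threshold, recall_precision(scores, labels, threshold))
```

On the toy data, the selected threshold is 0.4, which captures all three at-risk cases at the cost of one false positive. In practice the threshold would be chosen on a validation split rather than on the data used for final evaluation.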

Publisher: 한국경영과학회
Category: Business Administration
