Academic paper | JIPS (Journal of Information Processing Systems) | Published June 2014

Default Prediction for Real Estate Companies with Imbalanced Dataset

Yuan-Xiang Dong(Chongqing University); Zhi Xiao(Chongqing University); Xue Xiao(National University of Singapore)

Vol. 10, No. 2, pp. 314-333

Abstract

When analyzing default predictions in real estate companies, the number of non-defaulted cases always greatly exceeds the defaulted ones, which creates the two-class imbalance problem. This lowers the ability of prediction models to distinguish the default sample. In order to avoid this sample selection bias and to improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. The logistic regression, support vector machine (SVM) classification, and neural network (NN) classification use an imbalanced dataset. They were used as benchmarks with a single prediction model that used a balanced dataset corrected by the minority samples generation approach. Instead of using prediction-oriented tests and the overall accuracy, the true positive rate (TPR), the true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models for imbalanced dataset. In this paper, we describe an empirical experiment that used a sampling of 14 default and 315 non-default listed real estate companies in China and report that most results using single prediction models with a balanced dataset generated better results than an imbalanced dataset.
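
To illustrate why the abstract prefers TPR, TNR, G-mean, and F-score over overall accuracy, the following is a minimal sketch using the standard textbook definitions, not the authors' code. The function name `imbalance_metrics`, the label encoding (1 = default, 0 = non-default), and the choice of beta = 1 for the F-score are assumptions made for this example; the paper may define the F-score differently. Only the class sizes (14 and 315) come from the abstract, and the classifier shown is hypothetical.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return TPR, TNR, G-mean, F-score (beta = 1), and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    def ratio(num, den):
        # Guard against empty classes or no positive predictions.
        return num / den if den else 0.0

    tpr = ratio(tp, tp + fn)        # share of defaults correctly flagged
    tnr = ratio(tn, tn + fp)        # share of non-defaults correctly cleared
    precision = ratio(tp, tp + fp)
    return {
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": math.sqrt(tpr * tnr),
        "f_score": ratio(2 * precision * tpr, precision + tpr),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Class sizes from the abstract: 14 default (1) and 315 non-default (0) companies.
y_true = [1] * 14 + [0] * 315
# Hypothetical classifier that labels every company as non-default.
always_negative = [0] * 329
print(imbalance_metrics(y_true, always_negative))
```

With these class sizes, the always-negative classifier reaches an accuracy of about 0.957 while its TPR, G-mean, and F-score are all zero, which is the failure mode that accuracy alone would hide.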

Publisher:
Korea Information Processing Society
DOI:
http://dx.doi.org/10.3745/JIPS.04.0002
Classification:
Other computer science
