SHAP Value를 활용한 국내 브랜드 중고차의 가격 예측 모델에 관한 연구: 차급 특성을 중심으로
A Study on Price Prediction Model of Used Car of Domestic brands Using SHAP Value: Focus on Feature of the Vehicle Size
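임승준 (Department of Business Administration, Hongik University); 이정호 (Department of Business Administration, Konkuk University); 류춘호 (Department of Business Administration, Hongik University)
Vol. 49, No. 2, pp. 21-41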
Abstract
This study crawled data from online used-car platforms, obtaining records for 28,751 vehicles with 72 features covering vehicle specifications and vehicle options. The data were then classified by vehicle size. To select the best used-car price prediction model for each vehicle size, we first ran decision-tree-based ML models using all features. We then performed feature selection with a Lasso regression model on the same samples (removing features with zero influence) and re-ran the decision-tree-based ML models on the selected features. The cost function showed no difference between the two runs. Next, for each vehicle size, a stacking ensemble model was built on the results of the bagging ensemble model and the boosting ensemble model; it did not prove superior. Lastly, Tree SHAP values were visualized for the best model of each vehicle size to confirm the magnitude and direction of each feature's attribution. The study aims to demonstrate the usefulness of Lasso regression as a feature selection technique and to help resolve problems caused by information asymmetry among the parties involved in used-car sales.
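The Lasso-based feature selection step described in the abstract (dropping features whose coefficient is exactly zero before refitting the tree-based models) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the feature names are hypothetical, the penalty value is arbitrary, and the Lasso is solved here by a plain coordinate descent written for illustration only.

```python
import random

# Hypothetical feature names; the study's actual 72 features are not listed in the abstract.
FEATURES = ["mileage", "age", "engine_size", "option_a", "option_b"]


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Assumes roughly standardised columns and a centred target (no intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic data (an assumption): only the first three features drive the price.
rng = random.Random(0)
n = 200
X = [[rng.gauss(0.0, 1.0) for _ in FEATURES] for _ in range(n)]
raw_y = [-2.0 * r[0] - 3.0 * r[1] + 1.5 * r[2] + rng.gauss(0.0, 0.1) for r in X]
mean_y = sum(raw_y) / n
y = [v - mean_y for v in raw_y]  # centre the target

weights = lasso_coordinate_descent(X, y, lam=0.3)

# Keep only features with a non-zero Lasso coefficient ("non-zero influence").
selected = [name for name, w in zip(FEATURES, weights) if w != 0.0]
print(selected)
```

In a pipeline like the one the abstract outlines, the tree-based model would then be refit on the columns in `selected` and its cost compared against the all-features run.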
- Publisher:
- The Korean Operations Research and Management Science Society
- Category:
- Business Administration