F_MixBERT: Sentiment Analysis Model using Focal Loss for Imbalanced E-commerce Reviews
F_MixBERT: Sentiment Analysis Model using Focal Loss for Imbalanced E-commerce Reviews
Fengqian Pang(North China University of Technology); Xi Chen(North China University of Technology); Letong Li(North China University of Technology); Xin Xu(North China University of Technology); Zhiqiang Xing(North China University of Technology)
18권 2호, 263~283쪽
초록
Users' comments after online shopping are critical to product reputation and business improvement. These comments, sometimes known as e-commerce reviews, influence other customers' purchasing decisions. To confront large amounts of e-commerce reviews, automatic analysis based on machine learning and deep learning draws more and more attention. A core task therein is sentiment analysis. However, the e-commerce reviews exhibit the following characteristics: (1) inconsistency between comment content and the star rating; (2) a large number of unlabeled data, i.e., comments without a star rating, and (3) the data imbalance caused by the sparse negative comments. This paper employs Bidirectional Encoder Representation from Transformers (BERT), one of the best natural language processing models, as the base model. According to the above data characteristics, we propose the F_MixBERT framework, to more effectively use inconsistently low-quality and unlabeled data and resolve the problem of data imbalance. In the framework, the proposed MixBERT incorporates the MixMatch approach into BERT’s high-dimensional vectors to train the unlabeled and low-quality data with generated pseudo labels. Meanwhile, data imbalance is resolved by Focal loss, which penalizes the contribution of large-scale data and easily-identifiable data to total loss. Comparative experiments demonstrate that the proposed framework outperforms BERT and MixBERT for sentiment analysis of e-commerce comments.
Abstract
Users' comments after online shopping are critical to product reputation and business improvement. These comments, sometimes known as e-commerce reviews, influence other customers' purchasing decisions. To confront large amounts of e-commerce reviews, automatic analysis based on machine learning and deep learning draws more and more attention. A core task therein is sentiment analysis. However, the e-commerce reviews exhibit the following characteristics: (1) inconsistency between comment content and the star rating; (2) a large number of unlabeled data, i.e., comments without a star rating, and (3) the data imbalance caused by the sparse negative comments. This paper employs Bidirectional Encoder Representation from Transformers (BERT), one of the best natural language processing models, as the base model. According to the above data characteristics, we propose the F_MixBERT framework, to more effectively use inconsistently low-quality and unlabeled data and resolve the problem of data imbalance. In the framework, the proposed MixBERT incorporates the MixMatch approach into BERT’s high-dimensional vectors to train the unlabeled and low-quality data with generated pseudo labels. Meanwhile, data imbalance is resolved by Focal loss, which penalizes the contribution of large-scale data and easily-identifiable data to total loss. Comparative experiments demonstrate that the proposed framework outperforms BERT and MixBERT for sentiment analysis of e-commerce comments.
- 발행기관:
- 한국인터넷정보학회
- 분류:
- 컴퓨터학