A Study on Word Embedding-Based Sentiment Analysis for Efficient Contextual Representation (효율적 문맥 반영을 위한 워드 임베딩 기반 감성 분석 연구)
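나인 (School of Industrial Management, Korea University of Technology and Education); 배장원 (School of Industrial Management, Korea University of Technology and Education)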
Vol. 51, No. 1, pp. 17-30
Abstract
With the rapid growth of text generated online, sentiment analysis technology, which automatically classifies users' opinions, has become increasingly important. This study proposes a sentiment analysis method that reflects the contextual information of sentences without requiring large-scale training data. The proposed model extracts context-aware vector representations through word embeddings and determines sentiment by computing the semantic similarity between review sentences and the words in a sentiment lexicon. To identify the embedding model that best captures Korean contextual semantics, two BERT-based models and two SBERT-based models were compared. Using product review data provided by AI Hub, the study employed both sentiment-expressive sentence data and a Korean sentiment lexicon. Cosine similarity between each sentence and the sentiment lexicon words was computed, and four selection criteria were established to identify the most semantically relevant sentiment words. The KoSBERT model achieved the best performance, with the highest accuracy (0.8919) and F1-score (0.9014), when the sentiment word vectors whose cosine similarity with the sentence was 0.5 or higher were averaged.
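The selection-and-averaging step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the averaged sentiment word vector is turned into a label, so the decision rule here is an assumption: lexicon words are grouped by polarity, words whose cosine similarity with the sentence is at least 0.5 are averaged per polarity, and the sentence takes the polarity whose averaged vector is closer. The function names (`cosine`, `classify`) and the 3-dimensional toy vectors are hypothetical; in practice the vectors would come from an embedding model such as the BERT or SBERT variants the study compares.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(sentence_vec, lexicon, threshold=0.5):
    """Return 'positive', 'negative', or None if no lexicon word passes the threshold.

    lexicon: list of (word_vector, polarity) pairs, polarity in {'positive', 'negative'}.
    Assumed decision rule (not confirmed by the source): average the selected
    word vectors per polarity and pick the polarity whose mean is closer.
    """
    scores = {}
    for polarity in ("positive", "negative"):
        # Keep only lexicon words similar enough to the sentence.
        selected = [vec for vec, pol in lexicon
                    if pol == polarity and cosine(sentence_vec, vec) >= threshold]
        if selected:
            mean_vec = np.mean(selected, axis=0)
            scores[polarity] = cosine(sentence_vec, mean_vec)
    if not scores:
        return None
    return max(scores, key=scores.get)

# Toy stand-ins for embedding vectors (hypothetical values).
toy_lexicon = [
    (np.array([1.0, 0.1, 0.0]), "positive"),
    (np.array([0.9, 0.3, 0.1]), "positive"),
    (np.array([-1.0, 0.2, 0.0]), "negative"),
    (np.array([0.2, 1.0, 0.0]), "negative"),
]
sentence = np.array([0.8, 0.4, 0.0])
print(classify(sentence, toy_lexicon))
```

The threshold of 0.5 mirrors the best-performing criterion reported in the abstract; the other three selection criteria are not described there and are not sketched here.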
- Publisher:
- Korean Operations Research and Management Science Society (한국경영과학회)
- Classification:
- Business Administration