Two-Step Artificial Bee Colony Data Clustering Based on Silhouette
강범수 (Kangwon National University); 김성수 (Kangwon National University)
Vol. 43, No. 2, pp. 1-9
Abstract
K-means, a popular data clustering method, uses only the intra-cluster distance as its validity index and requires the number of clusters to be fixed in advance. It therefore cannot be applied to unsupervised data when the number of clusters is unknown. K-means is also sensitive to initialization and, being a hill-climbing method, can become stuck in a local optimum. The Silhouette validity index can be used to determine the number of clusters because it considers both intra-cluster and inter-cluster distances, but evaluating solutions with it requires considerable computation time. A more efficient data clustering method is therefore needed. This paper proposes a two-step Artificial Bee Colony (ABC) algorithm: the first step generates initial solutions using K-means, and the second step applies ABC based on Silhouette to search for the globally optimal clustering solution, with an appropriate number of clusters, for unsupervised data within limited computation time. The performance of ABC using Silhouette is validated through experiments and analysis on several real data sets.
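To make the role of the Silhouette validity index more concrete, the following is a minimal sketch of how it can score candidate clusterings that use different numbers of clusters. It is not the authors' implementation: the function name `silhouette_score`, the Euclidean distance, the toy data, and the hand-written candidate labelings are all assumptions made for illustration. In the paper's setting, the candidates would instead come from K-means and the ABC search.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette value over all points, using Euclidean distance.

    For each point i: a = mean distance to the other points in its own
    cluster, b = smallest mean distance to the points of any other cluster,
    and s(i) = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes s(i) = 0
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (assumed): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate solutions with different numbers of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Each evaluation compares every point with every other point, so the cost grows quadratically with the number of data points. This is the computational burden the abstract refers to, and it motivates limiting how many candidate solutions are scored.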
- Publisher: The Korean Operations Research and Management Science Society
- Category: Business Administration