애스크로AIPublic Preview
← 학술논문 검색
학술논문한국경영과학회지2014.11 발행KCI 피인용 1

Set Covering 기반의 대용량 오믹스데이터 특징변수 추출기법

Set Covering-based Feature Selection of Large-scale Omics Data

마정우(고려대학교 산업경영공학과); 안기동(고려대학교 정보경영공학과); 김광수(서울대학교병원); 류홍서(고려대학교)

39권 4호, 75~84쪽

초록

In this paper, we dealt with feature selection problem of large-scale and high-dimensional biological data such as omics data. For this problem, most of the previous approaches used simple score function to reduce the number of original variables and selected features from the small number of remained variables. In the case of methods that do not rely on filtering techniques, they do not consider the interactions between the variables, or generate approximate solutions to the simplified problem. Unlike them, by combining set covering and clustering techniques, we developed a new method that could deal with total number of variables and consider the combinatorial effects of variables for selecting good features. To demonstrate the efficacy and effectiveness of the method, we downloaded gene expression datasets from TCGA (The Cancer Genome Atlas) and compared our method with other algorithms including WEKA embeded feature selection algorithms. In the experimental results, we showed that our method could select high quality features for constructing more accurate classifiers than other feature selection algorithms.

Abstract

In this paper, we dealt with feature selection problem of large-scale and high-dimensional biological data such as omics data. For this problem, most of the previous approaches used simple score function to reduce the number of original variables and selected features from the small number of remained variables. In the case of methods that do not rely on filtering techniques, they do not consider the interactions between the variables, or generate approximate solutions to the simplified problem. Unlike them, by combining set covering and clustering techniques, we developed a new method that could deal with total number of variables and consider the combinatorial effects of variables for selecting good features. To demonstrate the efficacy and effectiveness of the method, we downloaded gene expression datasets from TCGA (The Cancer Genome Atlas) and compared our method with other algorithms including WEKA embeded feature selection algorithms. In the experimental results, we showed that our method could select high quality features for constructing more accurate classifiers than other feature selection algorithms.

발행기관:
한국경영과학회
DOI:
http://dx.doi.org/10.7737/JKORMS.2014.39.4.075
분류:
경영학

AI 법률 상담

이 논문의 주제에 대해 더 알고 싶으신가요?

460만+ 법률 자료에서 관련 판례·법령·해석례를 찾아 답변합니다

AI 상담 시작
Set Covering 기반의 대용량 오믹스데이터 특징변수 추출기법 | 한국경영과학회지 2014 | AskLaw | 애스크로 AI