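Academic paper · Journal of the Korean Operations Research and Management Science Society (한국경영과학회지) · Published 2022.02 · KCI citations: 2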

예산제약을 고려한 다기간 Newsvendor 문제에서의 Q-learning 기법 적용

The Use of Q-learning Method for a Multi-Period Newsvendor Problem with Budget Constraints

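박나희 (Interdisciplinary Program in Big Data Analytics, Ewha Womans University); Xia Jiun Lau (Interdisciplinary Program in Big Data Analytics, Ewha Womans University); 민대기 (Ewha Womans University)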

Vol. 47, No. 1, pp. 1-14

Abstract

This paper considers a multi-period, multi-item Newsvendor problem under budget constraints, in which a decision-maker orders items to minimize the total inventory cost, consisting of inventory holding cost and backlog cost. The order quantities are subject to two types of budget constraint: a periodic budget and a flexible budget. The problem is formulated as an action-constrained Markov Decision Process (MDP). To cope with the dimensionality and ambiguity of the problem, we employ a Q-learning method to solve the MDP model. In particular, we modify the conventional Q-learning procedure to handle the constrained action space by imposing penalties for constraint violations, or incentives for constraint satisfaction, on the Q-values. The penalties and incentives are obtained by solving a quadratic optimization problem embedded in the learning procedure. A numerical analysis compares the proposed Q-learning method with alternatives such as EOQ (Economic Order Quantity), Q-learning without the budget constraint, and a heuristic method. The experimental results show that the proposed Q-learning method lowers the total inventory cost while increasing the chance of satisfying the budget constraint.
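The idea of steering Q-values away from budget-violating orders can be illustrated with a small tabular sketch. This is not the authors' algorithm: it assumes a single item, a periodic budget only, uniformly distributed demand, and a fixed hypothetical penalty constant, whereas the paper obtains its penalties and incentives from a quadratic optimization problem. All names and numbers below are illustrative assumptions.

```python
import random

# All constants are hypothetical and chosen only for illustration.
HOLDING_COST = 1.0     # cost per unit of leftover inventory per period
BACKLOG_COST = 4.0     # cost per unit of backlogged demand per period
UNIT_PRICE = 2.0       # purchase price per unit, counted against the budget
PERIOD_BUDGET = 8.0    # periodic budget on the spend of a single period
PENALTY = 50.0         # fixed penalty for a violation (an assumption; the
                       # paper derives penalties by quadratic optimization)
MAX_ORDER = 6          # largest order quantity the agent may choose
INV_MIN, INV_MAX = -10, 10   # inventory range; negative values mean backlog
HORIZON = 12           # number of periods simulated per episode


def violates_budget(order):
    """True if the order's purchase cost exceeds the periodic budget."""
    return order * UNIT_PRICE > PERIOD_BUDGET


def step(inventory, order, demand):
    """Receive the order, serve demand, return (next_inventory, cost)."""
    nxt = max(INV_MIN, min(INV_MAX, inventory + order - demand))
    cost = HOLDING_COST * max(nxt, 0) + BACKLOG_COST * max(-nxt, 0)
    return nxt, cost


def train(episodes=3000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on costs; violations add PENALTY to the target."""
    rng = random.Random(seed)
    actions = range(MAX_ORDER + 1)
    # q[(inventory, order)] estimates discounted future cost (lower is better).
    q = {(s, a): 0.0 for s in range(INV_MIN, INV_MAX + 1) for a in actions}
    for _ in range(episodes):
        inv = 0
        for _ in range(HORIZON):
            if rng.random() < epsilon:
                order = rng.randint(0, MAX_ORDER)   # explore
            else:
                order = min(actions, key=lambda a: q[(inv, a)])  # exploit
            demand = rng.randint(0, 4)              # assumed demand model
            nxt, cost = step(inv, order, demand)
            if violates_budget(order):
                cost += PENALTY                     # penalize the violation
            best_next = min(q[(nxt, a)] for a in actions)
            q[(inv, order)] += alpha * (cost + gamma * best_next - q[(inv, order)])
            inv = nxt
    return q


def greedy_order(q, inventory):
    """Order quantity with the lowest learned cost-to-go for this inventory."""
    return min(range(MAX_ORDER + 1), key=lambda a: q[(inventory, a)])
```

Calling `q = train()` and then `greedy_order(q, 0)` gives the learned order for an empty inventory. Because the penalty is added to the cost inside the update target, orders that exceed the budget end up with high Q-values and are avoided by the greedy policy in frequently visited states, without removing them from the action set.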

Publisher: Korean Operations Research and Management Science Society (한국경영과학회)
Category: Business Administration
