예산제약을 고려한 다기간 Newsvendor 문제에서의 Q-learning 기법 적용
The Use of Q-learning Method for a Multi-Period Newsvendor Problem with Budget Constraints
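박나희 (Interdisciplinary Program in Big Data Analytics, Ewha Womans University); Xia Jiun Lau (Interdisciplinary Program in Big Data Analytics, Ewha Womans University); 민대기 (Ewha Womans University)
Vol. 47, No. 1, pp. 1-14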
Abstract
This paper considers a multi-period, multi-item Newsvendor problem under budget constraints, in which a decision-maker orders items with the aim of minimizing the total inventory cost, consisting of inventory holding cost and backlog cost. The order quantities are subject to two types of budget constraint: a periodic budget and a flexible budget. The problem is formulated as an action-constrained Markov Decision Process (MDP). To cope with the dimensionality and ambiguity of the problem, we employ a Q-learning method to solve the MDP model. In particular, we modify the conventional Q-learning procedure to handle a constrained action space by imposing penalties for constraint violations, or incentives for constraint satisfaction, on the Q-values. The penalties and incentives are obtained by solving a quadratic optimization problem embedded in the learning procedure. Numerical analysis compares the performance of the proposed Q-learning method with alternatives such as EOQ (Economic Order Quantity), Q-learning without the budget constraint, and a heuristic method. The experimental results show that the proposed Q-learning method lowers the total inventory cost while increasing the chance of satisfying the budget constraint.
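The idea of penalizing budget violations inside the Q-learning update can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a single item, a periodic budget only, uniformly distributed integer demand, and a fixed penalty constant, whereas the paper obtains its penalties and incentives from a quadratic optimization problem that is not reproduced here. All function names, parameter names, and numeric values below are hypothetical.

```python
import random

def train_q(budget=6, unit_cost=2, hold=1.0, backlog=4.0, penalty=50.0,
            periods=5, max_inv=5, max_order=5, episodes=2000,
            alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning on a toy single-item, multi-period newsvendor.

    Q[(t, s, a)] estimates the cost-to-go of ordering `a` units in period `t`
    at inventory level `s` (negative `s` means backlog). Lower is better.
    """
    rng = random.Random(seed)
    levels = range(-max_inv, max_inv + 1)
    actions = list(range(max_order + 1))
    Q = {(t, s, a): 0.0 for t in range(periods) for s in levels for a in actions}

    for _ in range(episodes):
        s = 0
        for t in range(periods):
            # Epsilon-greedy choice over the full (unconstrained) action set.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = min(actions, key=lambda x: Q[(t, s, x)])

            demand = rng.randint(0, 4)  # assumed demand model
            nxt = max(-max_inv, min(max_inv, s + a - demand))

            # Holding cost on leftover stock, backlog cost on unmet demand.
            cost = hold * max(nxt, 0) + backlog * max(-nxt, 0)

            # Assumed fixed penalty when the order exceeds the periodic budget.
            if unit_cost * a > budget:
                cost += penalty

            # No future cost after the last period.
            if t == periods - 1:
                future = 0.0
            else:
                future = min(Q[(t + 1, nxt, b)] for b in actions)

            Q[(t, s, a)] += alpha * (cost + gamma * future - Q[(t, s, a)])
            s = nxt
    return Q

def greedy_order(Q, t, s, max_order=5):
    """Order quantity with the lowest learned cost-to-go at (t, s)."""
    return min(range(max_order + 1), key=lambda a: Q[(t, s, a)])
```

Calling `greedy_order(train_q(), 0, 0)` returns the learned first-period order from an empty inventory. Because over-budget orders accumulate the penalty in their Q-values, the greedy policy is steered toward budget-feasible quantities without removing any action from the action set, which is the general mechanism the abstract describes.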
- Publisher: The Korean Operations Research and Management Science Society (한국경영과학회)
- Classification: Business Administration