National Sun Yat-sen University, thesis/dissertation, Optimizing Inventory Replenishment Strategy in Resilient Supply Chains using Reinforcement Learning

Title	Optimizing Inventory Replenishment Strategy in Resilient Supply Chains using Reinforcement Learning (基於強化學習優化韌性供應鏈中的庫存補貨策略)
Department	Online Master of Information Management in Electronic Commerce and Business Analytics
Year, semester	Spring semester of Academic Year 111	Language	Chinese
Degree	Master	Number of pages	53
Author	Yen-Ya Chen (陳嬿雅)
Advisor	KANG, YI-HUANG (康藝晃)
Convenor	LEE, PEI-JU (李珮如)
Advisory Committee	Yang, Huei-Fang (楊惠芳)
Date of Exam	2023-07-07	Date of Submission	2023-07-31
Keywords	RL, DDMRP, Inventory Management, Replenishment, MDP
Statistics	This thesis/dissertation has been browsed 660 times and downloaded 25 times.

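Chinese Abstract (English translation)
The global spread of COVID-19 has changed business models, and shifts in consumer behavior and market demand force brand owners and manufacturers of electronic materials to keep adjusting their production and material procurement strategies. In this environment, electronic component distributors in the supply chain face serious inventory risks, such as demand fluctuations, supply disruptions, and price fluctuations. These risks significantly affect distributors' service quality, cost control, and customer loyalty, and in turn their competitiveness. This study takes as its case an electronic component distributor in Taiwan's electronics industry supply chain. The company uses a public cloud platform to provide customers with component procurement and management services, and acts as a buffer against supply and demand fluctuations in the supply chain. The study develops an inventory replenishment strategy tool that combines reinforcement learning (RL), Demand Driven Material Requirements Planning (DDMRP) (Ptak & Smith, 2016), and domain expert experience, in order to strengthen distributors' competitiveness and profitability. It uses the RL mechanism and DDMRP buffer control to cope with unstable demand and supply, achieving dynamic decision-making and continuous learning. Simulation on real data confirms that, relative to the rule-based algorithm the case company currently uses, the RL method is more effective in terms of average inventory holding cost and stocking strategy, showing the potential of an RL-based replenishment strategy to improve inventory management performance. In addition, the behavior patterns of the RL agent resemble the actual behavior of domain experts, demonstrating that RL can build a practical and operational inventory replenishment strategy tool that helps product managers at electronics distributors make replenishment decisions intelligently and improves supply chain resilience under uncertainty.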
Abstract
The global spread of the COVID-19 pandemic has changed business models, and shifts in consumer behavior and market demand force brand owners and manufacturers of electronic components to continuously adjust their production and procurement strategies. In this environment, electronic component distributors in the supply chain face significant inventory risks, such as demand fluctuations, supply disruptions, and price fluctuations. These risks affect distributors' service quality, cost control, and customer loyalty, and therefore their competitiveness. This study takes an electronic component distributor in the Taiwan electronics industry supply chain as its case. The company uses a public cloud platform to provide customers with procurement and management services for electronic components, and it buffers supply and demand fluctuations in the supply chain. This study develops a replenishment decision-making tool that combines reinforcement learning (RL), demand-driven material requirements planning (DDMRP) (Ptak & Smith, 2016), and domain expert experience to enhance the competitiveness and profitability of distributors. The study uses RL mechanisms and DDMRP buffer control to address unstable demand and supply, enabling dynamic decision-making and continuous learning. Simulation with actual data confirms that the RL approach outperforms the rule-based algorithm currently used by the case company in terms of average inventory holding cost and stocking strategy, demonstrating the potential of RL-based replenishment decisions to improve inventory management efficiency. Furthermore, the behavioral patterns exhibited by the RL agent closely resemble those of domain experts, which indicates that RL can feasibly help material managers automate replenishment decisions.
This study constructs a practical and operational inventory replenishment decision-making tool for electronic component distributors using RL, DDMRP buffer mechanisms, and domain expert experience, thereby enhancing supply chain resilience in uncertain environments.
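The abstract names three ingredients (DDMRP buffer control, a rule-based baseline, and an RL agent trained on a reward tied to inventory cost), but this record does not give the thesis's actual state, action, or reward definitions. The following is a minimal Python sketch, under stated assumptions, of how those pieces can fit together: the buffer zones follow the standard DDMRP sizing scheme, while the function names, the factor values, the zero-lead-time transition, and the cost weights are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """DDMRP buffer zone boundaries, in units of stock."""
    top_of_red: float
    top_of_yellow: float
    top_of_green: float


def size_buffer(adu, lead_time, lead_time_factor, variability_factor, moq=0.0):
    """Size the three zones from average daily usage (ADU) and lead time.

    The factor values passed in are illustrative assumptions, not settings
    taken from the thesis.
    """
    yellow = adu * lead_time
    red_base = yellow * lead_time_factor
    red = red_base * (1.0 + variability_factor)  # red base + red safety
    green = max(red_base, moq)                   # order-cycle term omitted
    return Buffer(red, red + yellow, red + yellow + green)


def net_flow_position(on_hand, on_order, qualified_demand):
    """On-hand plus open supply minus qualified sales order demand."""
    return on_hand + on_order - qualified_demand


def rule_based_order(buffer, nfp):
    """Baseline rule: at or below top of yellow, order up to top of green."""
    if nfp <= buffer.top_of_yellow:
        return buffer.top_of_green - nfp
    return 0.0


def step(on_hand, order_qty, demand, holding_cost=1.0, stockout_cost=10.0):
    """One MDP transition, assuming orders arrive immediately (a simplification).

    Returns (next_on_hand, reward). The reward is the negative of holding
    cost plus shortage penalty; both weights are hypothetical.
    """
    available = on_hand + order_qty
    shortage = max(demand - available, 0.0)
    next_on_hand = max(available - demand, 0.0)
    reward = -(holding_cost * next_on_hand + stockout_cost * shortage)
    return next_on_hand, reward


if __name__ == "__main__":
    buf = size_buffer(adu=10, lead_time=5, lead_time_factor=0.5,
                      variability_factor=0.5, moq=20)
    nfp = net_flow_position(on_hand=40, on_order=30, qualified_demand=10)
    qty = rule_based_order(buf, nfp)
    print(buf, nfp, qty, step(on_hand=40, order_qty=qty, demand=100))
```

In a setup like this, an RL agent would replace `rule_based_order` by choosing the order quantity itself from an observed state (for example the net flow position relative to the zone boundaries) and would be trained to maximize the cumulative reward returned by `step`, so the rule-based policy serves as the comparison baseline.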

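Table of Contents
Thesis Approval Form
Acknowledgements
Chinese Abstract
Abstract
Chapter 1. Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Research Scope
Chapter 2. Literature Review
  2.1 Supply Chain Resilience
  2.2 Demand Driven Material Requirements Planning
  2.3 Analysis of Replenishment Models
    2.3.1 Traditional Replenishment Models
    2.3.2 Demand-Driven Replenishment Models
    2.3.3 Rule-Based Expert Models
    2.3.4 Intelligent Replenishment Models
  2.4 Markov Decision Process
  2.5 Reinforcement Learning
  2.6 Applications of RL to Inventory Replenishment
Chapter 3. Research Method and Case Analysis
  3.1 Research Method
  3.2 The DDMRP Buffer Mechanism
    3.2.1 Buffer Definition
    3.2.2 Buffer Configuration
  3.3 Data Background and Current Operations
  3.4 Research Data and Processing
  3.5 Material Grouping
  3.6 Replenishment Mechanism under the MDP Framework
    3.6.1 Defining States
    3.6.2 Defining Actions
    3.6.3 Defining the Reward Function
  3.7 Applying the DQN Architecture to Inventory Replenishment
  3.8 The Domain Experts' Rule-Based Method
Chapter 4. Experimental Setup
  4.1 Experimental Procedure
  4.2 Experimental Data and Parameters
  4.3 Experimental Design and Evaluation Metrics
  4.4 Replenishment Plan Performance: RL Model versus Baseline Model
  4.5 Effect of AU Parameter Adjustment on Inventory Management
  4.6 Effect of the Reward Mechanism on the RL Replenishment Model
Chapter 5. Conclusions and Recommendations
  5.1 Conclusions
  5.2 Recommendations and Future Research Directions
  5.3 Research Limitations
References
Appendix 1: Glossary of DDMRP Terms
Appendix 2: Notes on Material Classification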

References
[1] H. Scarf, “The Optimality of (S, s) Policies in the Dynamic Inventory Problem,” Math Methods Soc Sci, Jan. 1960.
[2] A. Veinott, “Optimal Policy in a Dynamic, Single Product, Nonstationary Inventory Model with Several Demand Classes,” Operations Research, vol. 13, Oct. 1965, doi: 10.1287/opre.13.5.761.
[3] E. Silver, D. Pyke, and R. Peterson, Inventory Management and Production Scheduling, vol. 19. 1998.
[4] R. Ganeshan, T. Boone, and A. Stenger, “The impact of inventory and flow planning parameters on supply chain performance: An exploratory study,” International Journal of Production Economics, vol. 71, pp. 111–118, May 2001, doi: 10.1016/S0925-5273(00)00109-2.
[5] J. Van Mieghem, “Commissioned Paper: Capacity Management, Investment, and Hedging: Review and Recent Developments,” Manufacturing & Service Operations Management, vol. 5, pp. 269–302, Oct. 2003, doi: 10.1287/msom.5.4.269.24882.
[6] S. Chopra and M. Sodhi, “Managing Risk to Avoid Supply-Chain Breakdown,” MIT Sloan Management Review, Sep. 2004.
[7] Y. Sheffi and J. Rice James, “A Supply Chain View of the Resilient Enterprise,” MIT Sloan Management Review, vol. 47, Sep. 2005.
[8] C. Tang, “Robust Strategies for Mitigating Supply Chain Disruptions,” International Journal of Logistics: Research and Applications, vol. 9, pp. 33–45, Apr. 2006, doi: 10.1080/13675560500405584.
[9] S. Ponomarov and M. Holcomb, “Understanding the Concept of Supply Chain Resilience,” The International Journal of Logistics Management, vol. 20, pp. 124–143, May 2009, doi: 10.1108/09574090910954873.
[10] A. Syntetos, M. Z. Babai, Y. Dallery, and R. Teunter, “Periodic control of intermittent demand items: Theory and empirical analysis,” Journal of the Operational Research Society, vol. 60, pp. 611–618, May 2009, doi: 10.1057/palgrave.jors.2602593.
[11] C. Ptak and C. Smith, Orlicky’s Material Requirements Planning, 3rd Edition. New York: McGraw-Hill Education, 2011. [Online]. Available: https://www.accessengineeringlibrary.com/content/book/9780071755634
[12] T. Lillicrap et al., “Continuous control with deep reinforcement learning,” CoRR, Sep. 2015.
[13] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015, doi: 10.1038/nature14236.
[14] C.-F. Tsai, Y.-F. Hsu, and D. C. Yen, “Reinforcement learning in solving uncertain demand inventory decision problems,” Computers & Operations Research, vol. 73, pp. 1–12, Sep. 2016, doi: 10.1016/j.cor.2016.03.016.
[15] D. Ivanov, Structural Dynamics and Resilience in Supply Chain Risk Management. 2017. doi: 10.1007/978-3-319-69305-7.
[16] Y. Kong, Z. Liu, Y. Liu, and L. Liang, “Inventory management and demand prediction under uncertainty: a reinforcement learning approach,” Journal of the Operational Research Society, vol. 69, no. 6, pp. 911–924, 2017.
[17] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” Jul. 2017.
[18] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
[19] J. C. Alves and G. R. Mateus, “Multi-echelon Supply Chains with Uncertain Seasonal Demands and Lead Times Using Deep Reinforcement Learning.” arXiv, Jan. 12, 2022. doi: 10.48550/arXiv.2201.04651.
[20] K. Namir, E. Benlahmar, and L. el houssine, “Decision Support Tool for Dynamic Inventory Management using Machine Learning, Time Series and Combinatorial Optimization,” Procedia Computer Science, vol. 198C, pp. 423–428, Jan. 2022, doi: 10.1016/j.procs.2021.12.264.
[21] F. Stranieri and F. Stella, “A Deep Reinforcement Learning Approach to Supply Chain Inventory Management.” arXiv, Aug. 13, 2022. doi: 10.48550/arXiv.2204.09603.
[22] L. Duhem, M. Benali, and G. Martin, “Parametrization of a demand-driven operating model using reinforcement learning,” Computers in Industry, vol. 147, p. 103874, May 2023, doi: 10.1016/j.compind.2023.103874.

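Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available: Campus: available; Off-campus: available
etd-0631123-220058.pdf
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available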

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2453 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS