博碩士論文 etd-0729123-205631 詳細資訊
Title page for etd-0729123-205631
論文名稱
Title
基於傾向分數的概念飄移發現方法
Discovering concept drift based on propensity score
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
44
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2023-07-27
繳交日期
Date of Submission
2023-08-29
關鍵字
Keywords
治療效果、時間變化資料、傾向分數、干擾因子、概念飄移
Treatment effect, Temporal data, Confounder, Propensity score, Concept drift
統計
Statistics
中文摘要
現在許多領域都會使用機器學習模型來協助決策,但隨著模型上線,伴隨而來的是概念飄移的問題:模型會因時間推移或政策改變而逐漸不堪使用。因此,我們需要定期偵測模型的實用性;一旦發現模型開始出現問題,就需要對模型進行修正或抽換。現有的概念飄移方法多著重於偵測與改進技術,雖可部分解決概念飄移問題,卻常因忽略干擾因子對自變數與應變數的影響,而在偵測概念飄移時產生錯誤的結果。為了解決這個問題,本論文提出基於傾向分數的概念飄移偵測方法,透過引入傾向分數來降低干擾因子的影響,使治療效果的估計能更準確地量化資料。
Abstract
In many fields, machine learning models are widely used to assist in decision-making. However, with the deployment of these models, the issue of concept drift arises. Over time or due to policy changes, models gradually become less effective and reliable. Therefore, it is necessary to regularly monitor the usefulness of the models. Once problems are detected, appropriate adjustments or replacements need to be made. Existing concept drift methods primarily focus on detection and improvement techniques, which partially address the concept drift problem. However, they often overlook the influence of confounding factors on the relationships between independent and dependent variables, leading to erroneous results in concept drift detection. To address this issue, this paper proposes Concept Drift using Propensity Score (CDPS). By incorporating propensity scores, the impact of confounding factors can be mitigated, thereby enhancing the ability to accurately quantify treatment effects from the data.
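The core idea the abstract relies on — that matching on propensity scores removes confounder-induced bias from a treatment-effect estimate — can be illustrated on synthetic data. The following is a minimal sketch, not the thesis's actual pipeline: the single-confounder data-generating process, the gradient-descent logistic fit, and 1-nearest-neighbour matching are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observational data: one confounder x drives both treatment
# assignment t and outcome y, so the naive group difference overstates
# the true treatment effect of 2.0.
n = 2000
x = rng.normal(size=n)                              # confounder
t = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))     # treatment depends on x
y = 3.0 * x + 2.0 * t + rng.normal(size=n)          # outcome

# Step 1: estimate the propensity score e(x) = P(t = 1 | x) with a
# logistic regression fitted by plain gradient descent.
X = np.column_stack([np.ones(n), x])
w = np.zeros(2)
for _ in range(1000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - t) / n    # gradient of the negative log-likelihood
scores = 1 / (1 + np.exp(-X @ w))

# Step 2: match each treated unit to the control unit with the closest
# propensity score (1-NN matching with replacement).
treated = np.where(t == 1)[0]
control = np.where(t == 0)[0]
gaps = np.abs(scores[treated][:, None] - scores[control][None, :])
matched = control[gaps.argmin(axis=1)]

# Step 3: average treatment effect on the treated from matched pairs.
att = float(np.mean(y[treated] - y[matched]))
naive = float(y[t == 1].mean() - y[t == 0].mean())
print(f"naive: {naive:.2f}, matched ATT: {att:.2f}")
```

Because treatment probability rises with x, the naive difference in means absorbs the confounder's effect on y and lands well above 2.0, while the matched estimate recovers a value close to the simulated effect.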
目次 Table of Contents
論文審訂書 i
摘要 ii
Abstract iii
List of Figures vi
List of Tables vii
1. Introduction 1
2. Background 2
2.1 Concept drift detection 2
2.1.1 Error-based drift detection 3
2.1.2 Distribution-based drift detection 4
2.1.3 Explain-based drift detection 6
2.1.4 Unsupervised-based drift detection 6
2.1.5 Ensemble-based drift detection 7
2.1.6 Neural network-based drift detection 8
2.2 Propensity score 9
2.2.1 Identifying potentially confounding factors 9
2.2.2 Computing the propensity score 10
2.2.3 Applying the matching method 11
2.2.4 Evaluating the performance of the matching process 11
3. Methodology 12
3.1 Data segmentation 12
3.2 Propensity score process 13
3.3 Estimate the treatment effect 15
3.4 Detect concept drift 16
4. Experiment 18
4.1 Experiment description 18
4.2 Dataset 19
4.2.1 KMUH dataset 19
4.2.2 GEO dataset 20
4.2.3 Electricity dataset 21
4.2.4 Artificial dataset 22
4.3 Experiment result 23
4.3.1 KMUH dataset 23
4.3.2 GEO dataset 26
4.3.3 Electricity dataset 26
4.3.4 Artificial dataset 27
4.3.5 Discussion 28
5. Conclusion 29
References 30
Appendix 36
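The methodology outline above (data segmentation → propensity score process → treatment effect estimation → drift detection) suggests a windowed monitoring loop. The sketch below is a stand-in under stated assumptions, not the thesis's method: it substitutes a normalized inverse-probability-weighting (IPW) estimator for the matching step, and the window size, number of windows, and drift threshold of 1.0 are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def ipw_effect(x, t, y, lr=0.5, iters=1000):
    """Treatment-effect estimate for one time window: fit a logistic
    propensity model, then apply a normalized (Hajek) IPW estimator."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - t) / len(t)   # negative log-likelihood gradient
    e = np.clip(1 / (1 + np.exp(-X @ w)), 0.05, 0.95)  # trim extreme scores
    y1 = np.sum(t * y / e) / np.sum(t / e)
    y0 = np.sum((1 - t) * y / (1 - e)) / np.sum((1 - t) / (1 - e))
    return y1 - y0

def make_window(effect, n=1500):
    """One window of the stream; the confounder x drives both t and y."""
    x = rng.normal(size=n)
    t = rng.binomial(1, 1 / (1 + np.exp(-x)))
    y = 2.0 * x + effect * t + rng.normal(size=n)
    return x, t, y

# Segment the stream into 10 windows. The true effect jumps from 1.0
# to 3.0 at window 5 — a concept drift the monitor should flag.
effects = [ipw_effect(*make_window(1.0 if i < 5 else 3.0)) for i in range(10)]

# Illustrative drift rule: flag any window whose estimated effect
# departs from the first window's baseline by more than 1.0.
baseline = effects[0]
drift_at = [i for i, e in enumerate(effects) if abs(e - baseline) > 1.0]
print(drift_at)
```

Tracking the estimated treatment effect per window, rather than raw accuracy or feature distributions, is what lets a shift in the treatment–outcome relationship surface even when the confounder's marginal distribution is unchanged.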
電子全文 Fulltext
This electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
Availability information for printed theses is relatively complete from academic year 102 onward. To inquire about the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
