Thesis/Dissertation etd-0719121-103014: Detailed Record



Name: 沈育嬋 (Yu-Chan Shen)   E-mail: not disclosed
Department: Department of Information Management
Degree: Master   Graduation term: academic year 109 (2020–2021), 2nd semester
Title (Chinese): 基於概率流程模型的時間序列異常偵測
Title (English): Temporal Anomaly Detection using Probabilistic Process Models
Files
  • etd-0719121-103014.pdf
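  • This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
    Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the text without authorization, to avoid breaking the law.
    Access permissions

    Print thesis: available to the public immediately

    Electronic thesis: fully open on and off campus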

    Language/Pages: English/47
    Statistics: this thesis has been viewed 136 times and downloaded 19 times
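    Abstract (translated from the Chinese)      Many phenomena in everyday life are hierarchical. For example, one physician treats many patients, and each patient has many physiological measurements, so the levels from highest to lowest are physician, patient, and measurement. Repeated measures data of this kind are not only hierarchical but also involve time. For such data, this study attempts to answer three questions. First, does the data of each group change over time according to a certain pattern, and how can these patterns be found? Second, how can anomalies in the course of these changes be detected? Third, how can the underlying mechanisms be explained, including what a changing pattern means and why an anomaly occurs?
         Considering the dependence among data points in repeated measures data, this study uses generalized linear mixed model trees combined with a hidden semi-Markov model to uncover the latent changing patterns of a system, that is, process discovery. For anomaly detection, depending on how much information the dataset contains, it uses particle swarm optimization, maximum likelihood estimation, or the generalized Jensen–Shannon divergence to decide whether a data point is anomalous. Finally, the model can be interpreted through the rules of the mixed model trees. The study therefore hopes that the proposed model can be used to detect anomalies in time series and help people facing related problems make decisions.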
    Abstract (English)    Many real-world phenomena are hierarchical. For example, one doctor treats multiple patients, and each patient has multiple physiological measurements, so the hierarchy runs from doctors to patients to measurements. Such repeated measures data are not only hierarchical but also time-dependent. For this kind of data, our research addresses three questions. First, does the data of each group change over time according to a specific pattern, and how can such patterns be found? Second, how can anomalies in a changing process be detected? Third, how can the underlying mechanisms be explained, including the meaning of a changing pattern and why anomalies occur?
       To account for the dependence among data points in repeated measures data, we combine generalized linear mixed model trees with the hidden semi-Markov model to discover the underlying changing patterns of a system, i.e., process discovery. For anomaly detection, depending on the amount of information in the dataset, we use particle swarm optimization, maximum likelihood estimation, or the generalized Jensen–Shannon divergence to judge whether a data point is anomalous. Finally, the model can be interpreted through the rules of the mixed-effects trees. We hope the proposed model can be used to detect anomalies in temporal data and help those who face related problems make decisions.
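    The abstract names the generalized Jensen–Shannon divergence as one of the anomaly criteria but, in this record, does not show how it is applied. As a minimal sketch, assuming the standard weighted definition from Lin (1991), JS_π(P_1, ..., P_n) = H(Σ π_i P_i) − Σ π_i H(P_i), the quantity can be computed for discrete distributions as follows. The function names and the equal-weight default are illustrative assumptions, not taken from the thesis.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def generalized_jsd(distributions, weights=None):
    """Generalized Jensen-Shannon divergence of several discrete distributions.

    Assumes the textbook definition (Lin, 1991): entropy of the weighted
    mixture minus the weighted mean of the individual entropies.
    Equal weights are an assumption used when none are supplied.
    """
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match the distributions and sum to 1")
    k = len(distributions[0])
    if any(len(d) != k for d in distributions):
        raise ValueError("all distributions must share the same support size")
    # Weighted mixture of the input distributions, component by component.
    mixture = [
        sum(w * d[i] for w, d in zip(weights, distributions)) for i in range(k)
    ]
    mean_entropy = sum(w * shannon_entropy(d) for w, d in zip(weights, distributions))
    return shannon_entropy(mixture) - mean_entropy


# Two distributions with disjoint support are maximally different: 1 bit.
print(generalized_jsd([[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```

    Under this definition the value is 0 when all distributions coincide and grows as they diverge, so one hypothetical use is to flag an observation whose estimated distribution raises the divergence of its group above a chosen threshold.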
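    Keywords (Chinese)
  • 重複測量資料 (repeated measures data)
  • 混合模型 (mixed model)
  • 隱半馬可夫模型 (hidden semi-Markov model)
  • 流程發現 (process discovery)
  • 異常偵測 (anomaly detection)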
    Keywords (English)
  • Repeated Measures Data
  • Mixed Model
  • Hidden Semi-Markov Model
  • Process Discovery
  • Anomaly Detection
    Table of Contents
    Thesis Approval Form, p. i
    Abstract (Chinese), p. ii
    Abstract, p. iii
    List of Figures, p. v
    List of Tables, p. vi
    1. Introduction, p. 1
    2. Background and Related Work, p. 3
        2.1. Correlated Data, p. 3
        2.2. Generalized Linear Mixed Model (GLMM), p. 6
        2.3. Hidden Semi-Markov Model (HSMM), p. 8
        2.4. Classification Tree Hidden Semi-Markov Model (CTHSMM), p. 9
    3. Methodology, p. 10
        3.1. Process Discovery Using MMT-HSMM, p. 11
        3.2. Outlier Detection Using PSO and MLE, p. 14
        3.3. Anomaly Detection Using Generalized Jensen–Shannon Divergence, p. 17
        3.4. Model Interpretability Using Tree Rules, p. 20
    4. Experiment and Discussion, p. 22
        4.1. Introduction to Dataset, p. 22
        4.2. Experiment Setup, p. 26
        4.3. Leaf Encoding with GLMM Trees, p. 31
        4.4. Comparison of Outlier Definition, p. 34
    5. Conclusion, p. 35
    6. References, p. 36
    References
    Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
    Bendtsen, C. (2012). pso: Particle Swarm Optimization. https://CRAN.R-project.org/package=pso
    Bryk, A. S., & Raudenbush, S. W. (1989). Toward a More Appropriate Conceptualization of Research on School Effects: A Three-Level Hierarchical Linear Model. In R. D. Bock (Ed.), Multilevel Analysis of Educational Data (pp. 159–204). Academic Press. https://doi.org/10.1016/B978-0-12-108840-8.50014-7
    Delignette-Muller, M. L., & Dutang, C. (2015). fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software, 64(4), 1–34.
    Diggle, P. J., Heagerty, P., Liang, K.-Y., & Zeger, S. L. (2002). Analysis of Longitudinal Data. Oxford University Press.
    Field, A. P. (2014). Intraclass Correlation. In N. Balakrishnan, T. Colton, B. Everitt, W. Piegorsch, F. Ruggeri, & J. L. Teugels (Eds.), Wiley StatsRef: Statistics Reference Online (p. stat06612). John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118445112.stat06612
    Field, A. P., Miles, J., & Field, Z. (2012). Discovering statistics using R. Sage.
    Fokkema, M., Smits, N., Zeileis, A., Hothorn, T., & Kelderman, H. (2018). Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees. Behavior Research Methods, 50(5), 2016–2034. https://doi.org/10.3758/s13428-017-0971-x
    Galbraith, S., Daniel, J. A., & Vissel, B. (2010). A Study of Clustered Data and Approaches to Its Analysis. Journal of Neuroscience, 30(32), 10601–10608. https://doi.org/10.1523/JNEUROSCI.0362-10.2010
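    Langville, A. N., & Meyer, C. D. (2012). Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press. https://press.princeton.edu/books/paperback/9780691152660/googles-pagerank-and-beyond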
    Grosse, I., Bernaola-Galván, P., Carpena, P., Román-Roldán, R., Oliver, J., & Stanley, H. E. (2002). Analysis of symbolic sequences using the Jensen-Shannon divergence. Physical Review E, 65(4), 041905. https://doi.org/10.1103/PhysRevE.65.041905
    Hedeker, D. (2005). Generalized Linear Mixed Models. In Encyclopedia of Statistics in Behavioral Science. John Wiley & Sons, Ltd. https://doi.org/10.1002/0470013192.bsa251
    Hunter, J. S. (1986). The Exponentially Weighted Moving Average. Journal of Quality Technology, 18(4), 203–210. https://doi.org/10.1080/00224065.1986.11979014
    Inker, L. A., Astor, B. C., Fox, C. H., Isakova, T., Lash, J. P., Peralta, C. A., Kurella Tamura, M., & Feldman, H. I. (2014). KDOQI US Commentary on the 2012 KDIGO Clinical Practice Guideline for the Evaluation and Management of CKD. American Journal of Kidney Diseases, 63(5), 713–735. https://doi.org/10.1053/j.ajkd.2014.01.416
    Kang, Y., & Zadorozhny, V. (2016a). Process monitoring using maximum sequence divergence. Knowledge and Information Systems, 48(1), 81–109. https://doi.org/10.1007/s10115-015-0858-z
    Kang, Y., & Zadorozhny, V. (2016b). Process Discovery Using Classification Tree Hidden Semi-Markov Model. 2016 IEEE 17th International Conference on Information Reuse and Integration (IRI), 361–368. https://doi.org/10.1109/IRI.2016.55
    Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN’95 - International Conference on Neural Networks, 4, 1942–1948. https://doi.org/10.1109/ICNN.1995.488968
    Levey, A. S., Coresh, J., Greene, T., Stevens, L. A., Zhang, Y., Hendriksen, S., Kusek, J. W., & Van Lente, F. (2006). Using standardized serum creatinine values in the modification of diet in renal disease study equation for estimating glomerular filtration rate. Annals of Internal Medicine, 145(4), 247–254. https://doi.org/10.7326/0003-4819-145-4-200608150-00004
    Levey, A. S., Stevens, L. A., Schmid, C. H., Zhang, Y. L., Castro, A. F., Feldman, H. I., Kusek, J. W., Eggers, P., Van Lente, F., Greene, T., Coresh, J., & CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration). (2009). A new equation to estimate glomerular filtration rate. Annals of Internal Medicine, 150(9), 604–612. https://doi.org/10.7326/0003-4819-150-9-200905050-00006
    Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151. https://doi.org/10.1109/18.61115
    Liu, O. L., Lee, H.-S., & Linn, M. C. (2010). An investigation of teacher impact on student inquiry science performance using a hierarchical linear model. Journal of Research in Science Teaching, 47(7), 807–819. https://doi.org/10.1002/tea.20372
    Lorch, R. F., & Myers, J. L. (1990). Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(1), 149–157. https://doi.org/10.1037/0278-7393.16.1.149
    Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47(1), 90–100. https://doi.org/10.1016/S0022-2496(02)00028-7
    Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized Linear Models. Journal of the Royal Statistical Society: Series A (General), 135(3), 370–384. https://doi.org/10.2307/2344614
    R Core Team. (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/
    Reeves, S. J., & Zhe, Z. (1999). Sequential algorithms for observation selection. IEEE Transactions on Signal Processing, 47(1), 123–132. https://doi.org/10.1109/78.738245
    Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. CRC Press.
    Snijders, T. A. B., & Bosker, R. J. (2011). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. SAGE.
    Song, P. X.-K. (2007). Correlated data analysis: Modeling, analytics, and applications. Springer Verlag.
    Stevens, L. A., Coresh, J., Greene, T., & Levey, A. S. (2006). Assessing Kidney Function: Measured and Estimated Glomerular Filtration Rate. New England Journal of Medicine, 354(23), 2473–2483. https://doi.org/10.1056/NEJMra054415
    Subramanian, S. V., Kim, D. J., & Kawachi, I. (2002). Social trust and self-rated health in US communities: A multilevel analysis. Journal of Urban Health, 79(1), S21–S34. https://doi.org/10.1093/jurban/79.suppl_1.S21
    Therneau, T., & Atkinson, B. (2019). rpart: Recursive Partitioning and Regression Trees. https://CRAN.R-project.org/package=rpart
    Verbeke, G. (1997). Linear Mixed Models for Longitudinal Data. In G. Verbeke & G. Molenberghs (Eds.), Linear Mixed Models in Practice: A SAS-Oriented Approach (pp. 63–153). Springer. https://doi.org/10.1007/978-1-4612-2294-1_3
    Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01
    Yu, S.-Z. (2010). Hidden semi-Markov models. Artificial Intelligence, 174(2), 215–243. https://doi.org/10.1016/j.artint.2009.11.011
    Zeger, S. L., Liang, K.-Y., & Albert, P. S. (1988). Models for Longitudinal Data: A Generalized Estimating Equation Approach. Biometrics, 44(4), 1049–1060. https://doi.org/10.2307/2531734
    Zhang, H., Yu, Q., Feng, C., Gunzler, D., Wu, P., & Tu, X. M. (2012). A new look at the difference between the GEE and the GLMM when modeling longitudinal count responses. Journal of Applied Statistics, 39(9), 2067–2079. https://doi.org/10.1080/02664763.2012.700452
    Oral defense committee
  • 胡雅涵 - Convener
  • 林耕霈 - Committee member
  • 康藝晃 - Advisor
    Defense date: 2021-07-23   Submission date: 2021-08-19


