Temporal Anomaly Detection using Probabilistic Process Models
Repeated Measures Data, Mixed Model, Hidden Semi-Markov Model, Process Discovery, Anomaly Detection
Many real-world phenomena are hierarchical. For example, one doctor treats multiple patients, and each patient has multiple physiological measurements, giving a hierarchy that runs from doctors to patients to measurements. Repeated measures data captures not only this hierarchy but also the time dimension. For such data, this research addresses three questions. First, does each group of data follow a specific pattern of change over time, and how can such patterns be found? Second, how can anomalies be detected within a changing process? Third, how can the mechanisms be explained, including what a changing pattern means and why anomalies occur?
To account for the dependence among data points in repeated measures data, we combine generalized linear mixed model trees with a hidden semi-Markov model to discover the underlying changing patterns of a system, that is, process discovery. For anomaly detection, we use particle swarm optimization, maximum likelihood estimation, or the generalized Jensen–Shannon divergence to judge whether a data point is anomalous, depending on the amount of information available in the dataset. Finally, the model can be interpreted through the rules of the mixed-effects trees. We hope that the proposed model can be used to detect anomalies in temporal data and help those who face related problems make decisions.
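As a rough illustration of the last detection criterion named above, the following sketch computes the standard generalized Jensen–Shannon divergence of several discrete distributions: the entropy of their weighted mixture minus the weighted sum of their individual entropies. It shows only the textbook definition. How the thesis builds the distributions, chooses the weights, or thresholds the score is not stated here, so the function names and the example inputs are assumptions made for illustration.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def generalized_jsd(distributions, weights=None):
    """Generalized Jensen-Shannon divergence of discrete distributions.

    JS_w(P_1, ..., P_n) = H(sum_i w_i P_i) - sum_i w_i H(P_i)

    `distributions` is a list of equal-length non-negative sequences;
    each one is normalised to sum to 1. `weights` defaults to uniform.
    (Both names are illustrative, not taken from the thesis.)
    """
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n
    total_w = sum(weights)
    weights = [w / total_w for w in weights]

    # Normalise each input so it is a proper probability distribution.
    dists = []
    for d in distributions:
        s = sum(d)
        dists.append([x / s for x in d])

    # Weighted mixture of all distributions.
    k = len(dists[0])
    mixture = [sum(w * d[j] for w, d in zip(weights, dists)) for j in range(k)]

    weighted_entropies = sum(w * shannon_entropy(d) for w, d in zip(weights, dists))
    return shannon_entropy(mixture) - weighted_entropies


# Hypothetical inputs: identical distributions score 0, disjoint ones score 1 bit.
same = generalized_jsd([[0.5, 0.5], [0.5, 0.5]])
disjoint = generalized_jsd([[1, 0], [0, 1]])
```

With uniform weights the score lies between 0 and log2(n) bits, so a larger value means the compared distributions disagree more. Any cut-off for calling a data point anomalous would have to be chosen for the dataset at hand.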
Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
List of Figures
List of Tables
1. Introduction
2. Background and Related Work
2.1. Correlated Data
2.2. Generalized Linear Mixed Model (GLMM)
2.3. Hidden Semi-Markov Model (HSMM)
2.4. Classification Tree Hidden Semi-Markov Model (CTHSMM)
3. Methodology
3.1. Process Discovery Using MMT-HSMM
3.2. Outlier Detection Using PSO and MLE
3.3. Anomaly Detection Using Generalized Jensen–Shannon Divergence
3.4. Model Interpretability Using Tree Rules
4. Experiment and Discussion
4.1. Introduction to Dataset
4.2. Experiment Setup
4.3. Leaf Encoding with GLMM Trees
4.4. Comparison of Outlier Definition
5. Conclusion
6. References
