Title page for etd-0024125-190807
Title
機器學習方法於台灣實價登錄價格預測
Predicting Taiwan Housing Price with Machine Learning Methods
Department
Year, semester
Language
Degree
Number of pages
79
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2024-06-13
Date of Submission
2025-01-24
Keywords
Housing Price, Machine Learning, XGBoost, Random Forest, SHAP Value
Statistics
This thesis/dissertation has been browsed 36 times and downloaded 0 times.
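Chinese Abstract
This study aims to help address the problem of information asymmetry between buyers and sellers in real estate transactions. It applies a traditional linear regression model and two machine learning models, XGBoost and Random Forest, to real estate sales records for the Taipei and New Taipei area, the Taichung area, and the Kaohsiung area drawn from Taiwan's Actual Price Registration data, adds variables on facilities and the macroeconomy, and interprets the models through variable importance and SHAP (SHapley Additive exPlanations) values.
The study finds that once the training period exceeds five quarters, model performance no longer improves with additional samples. In addition, across the regional models, XGBoost performs slightly better overall than Random Forest and consistently outperforms linear regression, indicating that the relationships between these variables and housing prices cannot be fully described as linear. As for model interpretability, only house age, distance to the nearest department store, and distance to the nearest full-capacity elementary school show a negative relationship with the unit price per ping in every region; the effects of the remaining variables differ by region.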
Abstract
This research aims to support informed decision-making in housing transactions and thereby reduce the information asymmetry between buyers and sellers. It examines the effects of not only structural characteristics but also nearby facilities and macroeconomic variables on housing prices in Taipei, Taichung, and Kaohsiung, Taiwan, with each district modeled separately. Linear regression and two machine learning models (XGBoost and Random Forest) are applied, and the models are interpreted using variable importance and SHAP (SHapley Additive exPlanations) values.
This research shows that model performance plateaus once the training period exceeds five quarters; adding more samples beyond that point does not improve it. Moreover, across the different districts, XGBoost and Random Forest outperform linear regression, indicating that the relationship between the variables and housing prices is not purely linear. In terms of interpretability, house age, distance to the nearest department store, and distance to the nearest full-capacity elementary school are negatively related to the unit price per square meter in all districts, whereas the influence of the other variables varies by district.
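To illustrate what a SHAP-style attribution reports, the following is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. Everything in it is an assumption made for illustration: the function `toy_price`, its coefficients, the feature names, and the instance and baseline values are hypothetical and are not the thesis's model, variables, or data. Brute-force enumeration is used only because the toy model has three features; it is not presented as the procedure used in the thesis.

```python
from itertools import combinations
from math import factorial


def toy_price(age, dist_store, area):
    """Hypothetical nonlinear unit-price model (illustrative numbers only)."""
    # Price falls with age and distance; the age penalty grows with floor area.
    return 100.0 - 0.8 * age - 5.0 * dist_store + 0.02 * area - 0.004 * age * area


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the instance value, others the baseline.
        args = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(**args)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical instance and reference point (not thesis data).
x = {"age": 30.0, "dist_store": 2.0, "area": 100.0}
baseline = {"age": 10.0, "dist_store": 1.0, "area": 80.0}

phi = shapley_values(toy_price, x, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")

# Efficiency property: contributions sum to prediction minus baseline prediction.
gap = toy_price(**x) - toy_price(**baseline)
print(f"sum = {sum(phi.values()):+.2f}, gap = {gap:+.2f}")
```

In this toy setting the contributions of age and distance come out negative, and the age contribution absorbs part of the age-by-area interaction that a purely linear model could not represent. Enumeration cost grows exponentially with the number of features, so this approach only suits very small examples.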
Table of Contents
Thesis Validation Letter ... i
Chinese Abstract ... ii
Abstract ... iii
Chapter 1 Introduction ... 1
Chapter 2 Literature Review ... 6
2.1 Structural Attributes ... 6
2.2 Macroeconomics and Demographics ... 7
2.3 Accessibility ... 9
2.4 Machine Learning and Housing Price ... 12
Chapter 3 Data and Methods ... 15
3.1 Methods ... 15
3.2 Data and Variables ... 22
Chapter 4 Data Analysis and Research Results ... 31
4.1 Data Selection ... 31
4.2 Feature Selection ... 35
4.3 Performance Evaluation ... 42
4.4 Interval Selection ... 44
4.5 Descriptive Statistics ... 47
4.6 Empirical Results ... 52
Chapter 5 Conclusion and Research Limitations ... 63
5.1 Conclusion ... 63
5.2 Limitation and Future Work ... 64
Reference ... 66

List of Figures
Figure 3-1 Boosting Schematic Diagram ... 16
Figure 3-2 Bagging Schematic Diagram ... 20
Figure 4-1 Pearson coefficient matrix (all sample) ... 36
Figure 4-2 Spearman coefficient matrix (all sample) ... 38
Figure 4-3 SHAP Value (Taipei) ... 61
Figure 4-4 SHAP Value (Taichung) ... 62
Figure 4-5 SHAP Value (Kaohsiung) ... 62

List of Tables
Table 3-1 Variable Definition ... 29
Table 4-1 Cramer’s V (all sample) ... 39
Table 4-2 Point biserial correlation coefficient (all sample) ... 40
Table 4-3 VIF (all sample) ... 42
Table 4-4 Model performance of all experiments ... 45
Table 4-5: Descriptive statistics (all sample) ... 48
Table 4-6: Descriptive statistics (Taipei) ... 49
Table 4-7: Descriptive statistics (Taichung) ... 50
Table 4-8 Descriptive statistics (Kaohsiung) ... 51
Table 4-9: Model Performance (Taipei) ... 53
Table 4-10: Model Performance (Taichung) ... 54
Table 4-11: Model Performance (Kaohsiung) ... 55
Table 4-12: Feature Importance for the Best and Worst Performance Quarter of Each District ... 58
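Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: user-defined release date
Available:
Campus: available for download from 2028-01-24
Off-campus: available for download from 2028-01-24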

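Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: 2028-01-24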
