Integrating Statistical Diagnostics and Machine Learning for Predicting House Prices in Nigeria

Authors

  • Olatunji Taofik Arowolo, Lagos State University of Science and Technology, Nigeria; University of Lagos, Nigeria
  • Mathew Ekum, Lagos State University of Science and Technology, Nigeria; Ludwig Maximilians University of Munich, Germany
  • Okechukwu Charles Aronu, Chukwuemeka Odumegwu Ojukwu University, Uli, Anambra State, Nigeria

Keywords:

House price prediction; Statistical diagnostics; Ridge regression; Lasso regression; Machine learning; Real estate analytics; Nigeria

Abstract

Accurate prediction of house prices is critical for real estate valuation, mortgage risk assessment, and urban planning. Traditional statistical models, such as multiple linear regression, provide interpretable insights but are often constrained by multicollinearity, heteroscedasticity, and limited explanatory power. In contrast, machine learning algorithms are capable of capturing complex relationships but frequently lack transparency. This study integrates statistical diagnostics with machine learning techniques to develop a more robust predictive framework for the Nigerian housing market, using a dataset of 12,592 residential properties. Regression diagnostics, including multicollinearity checks and residual analysis, were combined with multiple linear regression, Ridge regression, Lasso regression, Random Forest, and Gradient Boosting. Results show that Ridge and Lasso regression provided the most reliable performance, with Ridge achieving the best balance between predictive accuracy (R² = 0.204) and interpretability. Ensemble methods, unexpectedly, underperformed due to the categorical-heavy structure of the data. The findings highlight the value of hybrid approaches that embed diagnostics into machine learning pipelines, offering models that are both transparent and practically useful for policy and decision-making.
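
The idea of pairing a multicollinearity diagnostic with a regularized fit can be illustrated with a minimal sketch. It is not the authors' code and does not use their dataset: the data below are synthetic, the predictor roles (x1, x2, x3), the penalty alpha = 10, and all function names are assumptions made for illustration, and only the Python standard library is used so the example stays self-contained. It computes variance inflation factors (VIF) for each predictor, then compares ordinary least squares with a Ridge fit on the same centered data.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def center(col):
    """Subtract the mean so no intercept term is needed."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def ridge_fit(cols, y, alpha):
    """Solve (X'X + alpha * I) beta = X'y; alpha = 0 gives ordinary least squares."""
    p = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(a * t for a, t in zip(cols[i], y)) for i in range(p)]
    return solve(A, b)

def r_squared(cols, y, beta):
    """R^2 for centered y: 1 - SS_res / SS_tot."""
    pred = [sum(beta[j] * cols[j][i] for j in range(len(beta))) for i in range(len(y))]
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum(t ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def vif(cols, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    others = [c for k, c in enumerate(cols) if k != j]
    beta = ridge_fit(others, cols[j], alpha=0.0)
    return 1.0 / (1.0 - r_squared(others, cols[j], beta))

# Synthetic data (assumption): x2 is nearly a copy of x1, x3 is independent.
random.seed(0)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + 0.1 * random.gauss(0, 1) for v in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + 1.0 * c + random.gauss(0, 1) for a, c in zip(x1, x3)]

cols = [center(x1), center(x2), center(x3)]
yc = center(y)

vifs = [vif(cols, j) for j in range(3)]
beta_ols = ridge_fit(cols, yc, alpha=0.0)
beta_ridge = ridge_fit(cols, yc, alpha=10.0)  # alpha chosen arbitrarily for the sketch

print("VIF:", [round(v, 1) for v in vifs])
print("OLS coefficients:", [round(b, 3) for b in beta_ols])
print("Ridge coefficients:", [round(b, 3) for b in beta_ridge])
```

In this toy setting the two near-duplicate predictors receive large VIF values, which is the kind of signal that would motivate a penalized model, and the Ridge penalty shrinks the overall size of the coefficient vector relative to ordinary least squares. In practice the penalty would be tuned by cross-validation and a statistics library would normally replace the hand-written solver.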

Published

2025-11-21