A Study on Energy Consumption Modeling Based on the Energy Efficiency Dataset
Abstract: As the construction industry shifts toward green, low-carbon development, accurate prediction of building energy consumption has become key to improving energy efficiency. This paper uses the Energy Efficiency dataset from the UCI Machine Learning Repository to examine how different sampling strategies affect the accuracy of statistical inference and the performance of machine learning regression models. The dataset comprises eight input variables describing building geometry and two energy consumption targets: heating load (Y1) and cooling load (Y2). The study first analyzes the associations between the eight features and the two targets through exploratory data analysis (EDA). It then designs and implements three sampling schemes: simple random sampling, stratified sampling with strata formed by K-Means clustering, and cluster sampling. Sample representativeness under each scheme is evaluated quantitatively using the relative error of the mean, statistical stability, and the design effect (Deff). Multi-output regression models are then trained on the samples drawn under each scheme. The results indicate that stratified sampling, by capturing the structured distribution of core variables such as building height and glazing area, yields significantly more representative samples than the other two methods and effectively reduces sampling variance, improving the information efficiency of the data. The model trained on stratified samples also performs best on both R² and RMSE. These findings confirm that a well-designed sampling scheme not only improves data quality when the sample is limited but also significantly strengthens the generalization ability and reliability of the subsequent regression analysis, providing methodological support for statistical research on building energy efficiency.
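The comparison of the three sampling schemes can be illustrated with a minimal sketch. It is not the paper's code and does not use the real UCI data. It assumes a synthetic population of 768 buildings arranged in 48 blocks of 16, a hypothetical load model in which heating load depends mainly on building height, and a tiny one-dimensional K-Means written in pure Python to form the strata. All function names and parameters (`make_population`, `kmeans_1d`, `run_experiment`, sample size 96, 400 replicates) are illustrative assumptions. The design effect is estimated empirically as the variance of the mean estimator under a given design divided by its variance under simple random sampling of the same size.

```python
import random
import statistics

def make_population(n_blocks=48, block_size=16, seed=0):
    """Synthetic stand-in for the dataset: each block shares one height level."""
    rng = random.Random(seed)
    height, load = [], []
    for b in range(n_blocks):
        h = 3.5 if b % 2 == 0 else 7.0              # two height levels (assumed)
        for _ in range(block_size):
            height.append(h)
            load.append(4.0 * h + rng.gauss(0.0, 2.0))  # hypothetical load model
    return height, load

def kmeans_1d(values, k, iters=25):
    """Small Lloyd-style K-Means on scalars; returns one label per value."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = statistics.fmean(members)
    return labels

def srs_mean(load, n, rng):
    """Simple random sampling without replacement."""
    return statistics.fmean(rng.sample(load, n))

def stratified_mean(load, strata, n, rng):
    """Proportional allocation; the estimate weights each stratum by its size."""
    total = len(load)
    est = 0.0
    for idx in strata.values():
        n_h = min(len(idx), max(1, round(n * len(idx) / total)))
        picked = rng.sample(idx, n_h)
        est += len(idx) / total * statistics.fmean(load[i] for i in picked)
    return est

def cluster_mean(load, block_size, m, rng):
    """Draw m whole blocks of consecutive units and average all their members."""
    starts = rng.sample(range(0, len(load), block_size), m)
    return statistics.fmean(load[i] for s in starts for i in range(s, s + block_size))

def run_experiment(n=96, reps=400, seed=1):
    height, load = make_population()
    true_mean = statistics.fmean(load)
    strata = {}
    for i, lab in enumerate(kmeans_1d(height, k=2)):
        strata.setdefault(lab, []).append(i)
    rng = random.Random(seed)
    designs = {
        "srs": lambda: srs_mean(load, n, rng),
        "stratified": lambda: stratified_mean(load, strata, n, rng),
        "cluster": lambda: cluster_mean(load, 16, n // 16, rng),
    }
    out = {}
    for name, draw in designs.items():
        ests = [draw() for _ in range(reps)]
        out[name] = {
            "rel_err": statistics.fmean(abs(e - true_mean) / true_mean for e in ests),
            "var": statistics.variance(ests),
        }
    for name in out:
        # Empirical design effect relative to SRS with the same sample size.
        out[name]["deff"] = out[name]["var"] / out["srs"]["var"]
    return out

results = run_experiment()
for name, r in results.items():
    print(f"{name:>10}: rel_err={r['rel_err']:.4f}  deff={r['deff']:.2f}")
```

In this synthetic setting the strata coincide with the height levels that drive the load, so the stratified estimator should show a design effect below 1, while sampling whole homogeneous blocks should push it above 1. How closely this mirrors the real dataset depends on how the strata and clusters are actually formed there.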
Cite this article: Li, Y. (2026) A Study on Energy Consumption Modeling Based on the Energy Efficiency Dataset. Statistics and Application, 15(5), 88-97. https://doi.org/10.12677/sa.2026.155109
