Data processing methods for residential water and gas use data in the northern cold zone based on data mining technology
Abstract:
Residential water and gas data systems at the city scale hold more data than can practically be processed and analyzed by hand, so data mining techniques are usually required. Using two years of water and gas use data from households in three residential communities in an urban district of Tianjin, this paper applies data mining methods, including data normalization, proximity-based outlier detection, and boxplots, to detect abnormal energy use and abnormal changes in energy use between adjacent months, and compares the overall energy use levels of the three communities. Combined with questionnaire survey data, it proposes a method for establishing the relationship between occupants' energy use levels and their characteristics and behaviors, using data mining algorithms such as information gain theory and the C4.5 decision tree. The work demonstrates the process of extracting useful information from building energy use data and may offer new ideas for the construction and application of building energy consumption data management platforms.
Keywords: data mining, outlier detection, boxplot, information gain ratio (IGR), C4.5 decision tree
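
Two of the techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whisker factor of 1.5, and the sample readings are assumptions made for illustration only. The first function flags values outside the boxplot whiskers, and the second computes the information gain ratio that C4.5 uses to rank candidate split attributes.

```python
import math
from collections import Counter
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values lying outside the boxplot whiskers (Tukey's rule).

    k = 1.5 is the conventional whisker factor, assumed here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of categorical labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """Information gain ratio of a categorical feature with respect to labels."""
    total = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical monthly gas readings (m^3) for one household.
readings = [8, 9, 10, 10, 11, 12, 60]
print(boxplot_outliers(readings))  # -> [60]

# Hypothetical questionnaire attribute versus energy use level.
household_size = ["small", "small", "large", "large"]
use_level = ["low", "low", "high", "high"]
print(gain_ratio(household_size, use_level))  # -> 1.0
```

An attribute that perfectly separates the use levels, as in this toy data, yields a gain ratio of 1, while an attribute unrelated to the use level yields 0.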