Abnormal data identification in central heating systems based on the isolation forest algorithm
Abstract:
To reduce the heavy manual workload of identifying abnormal data in central heating systems, this paper proposes using the isolation forest (IF) algorithm to identify abnormal data automatically. Using one heating season of data from a heat exchange station in Tianjin as the sample, the paper analyses in detail the physical regularities of central heating system data and the influence of the IF algorithm's parameter settings on model performance. Because operational regulation of the heating system causes some normal data to be misdiagnosed as abnormal at a high rate, a method of relativizing the data set parameters is proposed; it reduces the misdiagnosis rate by 6.7% and the missed diagnosis rate by 44.6%. By comparing model performance under different IF parameter settings, recommended parameter ranges for abnormal data identification in heating systems are given.
Keywords: central heating system; abnormal data; automatic identification; isolation forest algorithm; parameter relativization
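
As a rough illustration of the IF algorithm named in the abstract, the following is a minimal pure-Python sketch of the standard isolation forest scoring scheme (random axis-parallel splits, average path length, score 2^(-E[h]/c(psi))). It is not the paper's implementation. The two features (supply and return temperature), the synthetic values, and the settings `n_trees=100` and `subsample_size=256` are assumptions chosen for illustration only; they are not the station data or the recommended parameter ranges, and the parameter relativization step is not reproduced here.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split recursively on a random feature at a random value within its range."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node):
    """Edges traversed to reach a leaf, plus the adjustment for unsplit points."""
    depth = 0
    while "feature" in node:
        node = node["left"] if x[node["feature"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def fit_forest(rows, n_trees, subsample_size, rng):
    """Grow n_trees trees, each on a random subsample, with the usual height limit."""
    psi = min(subsample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    forest = [build_tree(rng.sample(rows, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return forest, psi


def anomaly_score(x, forest, psi):
    """Score in (0, 1); values closer to 1 mean the point is isolated sooner."""
    mean_path = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(psi))


# Synthetic (supply temperature, return temperature) readings in deg C; assumed values.
rng = random.Random(0)
normal = [(rng.gauss(70.0, 2.0), rng.gauss(45.0, 2.0)) for _ in range(500)]
anomalies = [(0.0, 45.0), (120.0, 10.0)]  # e.g. a sensor dropout and a spurious spike

forest, psi = fit_forest(normal + anomalies, n_trees=100, subsample_size=256, rng=rng)
normal_scores = [anomaly_score(x, forest, psi) for x in normal]
anomaly_scores = [anomaly_score(x, forest, psi) for x in anomalies]
```

Readings with the highest scores would be flagged as abnormal. In practice the flagging threshold (or an assumed contamination fraction), the number of trees, and the subsample size are the kind of settings whose effect on model performance the paper examines.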