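High-Frequency Maintenance Detection Method Based on Semantic Analysis and Density Clustering

Xiang Yanzhou (向彦州); Yu Fangqiang (余芳强); Xu Jinglin (许璟琳); Peng Yang (彭阳)

doi: 10.13204/j.gyjzG22073011

Shanghai Construction No. 4 (Group) Co., Ltd., Shanghai 201103, China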

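Funding:

Project of the National Key Research and Development Program of China (2020YFD1100604).

About the author:
Xiang Yanzhou (向彦州), male, born in 1998, xyz1299309760@163.com.

Corresponding author:
Peng Yang (彭阳), male, born in 1993, 854525261@qq.com.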

Publication history
- Received: 2022-07-30
- Published online: 2023-03-22

High-Frequency Maintenance Detection Method Based on Semantic Analysis and Density Clustering

Shanghai Construction No. 4 (Group) Co., Ltd., Shanghai 201103, China

Abstract: Traditional management systems for building maintenance work orders tend to neglect the free-text description in each order. As a result, valuable information is buried in large volumes of unstructured data, and repeated, high-frequency work orders are hard to extract quickly and accurately. To address this problem, a Chinese word segmentation algorithm based on a keyword library is first used to segment the long textual descriptions of repair requests. A K-means-based density detection algorithm is then applied: a weight is assigned to each work order attribute, the weighted Euclidean distance between every pair of work orders is computed, the density of each work order is derived, and a candidate set of repeated work orders is extracted. Finally, the DBSCAN (density-based spatial clustering of applications with noise) algorithm determines the final set of repeated work orders, and the method is validated on an actual project. The results indicate that the method can extract repeated work orders from large volumes of data with reasonable accuracy, which helps improve the efficiency of work order analysis and supports refined logistics management.

Keywords: maintenance work order; building operation and maintenance; Chinese word segmentation; density detection; clustering analysis
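
The pipeline summarized in the abstract (weighted distance, per-order density, candidate extraction, DBSCAN) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the K-means-based density formula, so a plain neighbour count within a radius is swapped in, and the feature vectors, weights, thresholds, and the function name `repeated_order_groups` are all hypothetical.

```python
from math import sqrt

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def densities(points, weights, radius):
    """Assumed density: number of OTHER points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points)
            if j != i and weighted_distance(p, q, weights) <= radius)
        for i, p in enumerate(points)
    ]

def dbscan(points, weights, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # The neighbourhood includes the point itself, as in standard DBSCAN.
        return [j for j in range(n)
                if weighted_distance(points[i], points[j], weights) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue             # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

def repeated_order_groups(points, weights, radius, min_density, min_pts):
    """Keep dense candidates, then cluster them; returns lists of indices."""
    dens = densities(points, weights, radius)
    candidates = [i for i, d in enumerate(dens) if d >= min_density]
    labels = dbscan([points[i] for i in candidates], weights, radius, min_pts)
    groups = {}
    for idx, label in zip(candidates, labels):
        if label != -1:
            groups.setdefault(label, []).append(idx)
    return list(groups.values())

# Hypothetical numeric encodings of seven work orders (two attributes each).
orders = [(1.0, 1.0), (1.0, 1.1), (1.1, 1.0), (5.0, 5.0),
          (9.0, 9.0), (9.0, 9.1), (9.1, 9.0)]
print(repeated_order_groups(orders, (0.5, 0.5), 0.5, 1, 2))
# prints [[0, 1, 2], [4, 5, 6]]
```

In this toy run the isolated order at index 3 has density 0 and is dropped before clustering, while the two tight groups come out as repeated-order sets. How textual descriptions are turned into numeric attributes after word segmentation is outside the scope of this sketch.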
