K-Means Text Clustering Based on Improved Gray Wolf Optimization Algorithm
Authors: PAN Chengsheng (潘成胜); ZHANG Bin (张斌); LÜ Yana (吕亚娜); DU Xiuli (杜秀丽); QIU Shaoming (邱少明)
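Abstract:
The K-Means algorithm easily falls into local optima during text clustering, which makes the clustering results inaccurate. To address this problem, a K-Means text clustering method based on an improved gray wolf optimization algorithm is proposed. The text data are first segmented into words, stripped of stop words, reduced by feature extraction, and vectorized. Immune clonal selection then picks elite individuals, which are explored in depth to increase the diversity of the gray wolf population and avoid premature convergence. Next, the position-update idea of particle swarm optimization is combined with the gray wolf position update to lower the risk of the gray wolf optimization algorithm being trapped in local extrema. Finally, the improved optimizer is combined with the K-Means algorithm to cluster the text. Compared with the K-Means, GWO-KMeans, and IPSK-Means algorithms, the proposed algorithm shows a marked average improvement in accuracy, recall, and F-measure, and its text clustering results are more reliable.
Keywords: K-Means algorithm; text clustering; gray wolf optimization algorithm; immune clone; particle swarm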
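The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It uses the standard gray wolf position update (guidance from the three best solutions, with a coefficient `a` decaying from 2 to 0) and adds a PSO-style pull toward each wolf's own best position; the blend coefficient `c`, the function names, and the toy 2-D data are all assumptions made for illustration. The immune clonal selection step is not implemented here. The best wolf found is used as the starting centroids for ordinary K-Means.

```python
import random

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def sse(points, centroids):
    """K-Means objective: squared distance of each point to its nearest centroid."""
    return sum(min(sq_dist(x, c) for c in centroids) for x in points)

def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centroid.
        centroids = [[sum(col) / len(cl) for col in zip(*cl)] if cl else cen
                     for cl, cen in zip(clusters, centroids)]
    return centroids

def gwo_pso_centroids(points, k, n_wolves=12, n_iter=60, c=0.5, seed=0):
    """Search for k centroids; each wolf is a flat vector of k * dim coordinates."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(x[d] for x in points) for d in range(dim)] * k
    hi = [max(x[d] for x in points) for d in range(dim)] * k
    size = k * dim

    def fitness(v):
        return sse(points, [v[i * dim:(i + 1) * dim] for i in range(k)])

    wolves = [[rng.uniform(lo[j], hi[j]) for j in range(size)] for _ in range(n_wolves)]
    pbest = [w[:] for w in wolves]              # PSO-style personal bests (assumption)
    pbest_fit = [fitness(w) for w in wolves]

    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)            # standard GWO coefficient, 2 -> 0
        order = sorted(range(n_wolves), key=lambda i: pbest_fit[i])
        leaders = [pbest[i][:] for i in order[:3]]   # alpha, beta, delta
        for i, x in enumerate(wolves):
            new = []
            for j in range(size):
                guided = 0.0
                for lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    guided += lead[j] - A * abs(C * lead[j] - x[j])
                # GWO average plus a pull toward this wolf's own best position.
                pos = guided / 3.0 + c * rng.random() * (pbest[i][j] - x[j])
                new.append(min(max(pos, lo[j]), hi[j]))
            wolves[i] = new
            f = fitness(new)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = new[:], f

    best = min(range(n_wolves), key=lambda i: pbest_fit[i])
    return [pbest[best][i * dim:(i + 1) * dim] for i in range(k)]

# Toy usage: two small groups of 2-D points stand in for text vectors.
pts = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
init = gwo_pso_centroids(pts, k=2)
final = kmeans(pts, init)
print(sse(pts, init), sse(pts, final))
```

In a real text-clustering setting the points would be high-dimensional document vectors produced by the preprocessing steps named in the abstract; the sketch only shows how a swarm search can supply K-Means with its starting centroids.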
Foundation: Field Fund of the Equipment Development Department of the Central Military Commission