
Journal of Kunming Metallurgy College, 2025, Vol. 41, Issue (3): 101-. DOI: 10.3969/j.issn.1009-0479.2025.03.016

• Electronic Information Technology •

Semantic-Aware Bloom Filter Based on Large Language Models

ZHANG Hao, TAI Mengsiyun, ZHAO Wentao, HE Wei

  1. (Faculty of Computer Information, Kunming Metallurgy College, Kunming 650033, China)
  • Received: 2024-12-11 Online: 2025-06-07 Published: 2025-09-24
  • About the author: ZHANG Hao (1992-), male, from Kunming, Yunnan; lecturer, Master of Engineering; main research interests are deep learning and bioinformatics.

Abstract: With the rapid growth of data volume, traditional Bloom Filters face challenges such as high false positive rates and limited flexibility when processing large-scale data streams. To improve the accuracy and efficiency of data stream processing, this paper introduces a semantic-aware Bloom Filter (SABF) based on large language models (LLMs). By leveraging the advanced semantic understanding capabilities of LLMs, SABF generates semantic embedding vectors for text data and uses this information to optimize the selection of hash functions and the design of bitmap structures. This enables more precise identification of the semantic features within the text data. Experimental results demonstrate that SABF significantly reduces the false positive rate, especially as data volume increases, where it lowers the false positive rate by over 20% compared to traditional methods. Additionally, SABF excels in identifying semantically similar documents, achieving an accuracy rate of 83%, thereby significantly improving the efficiency of processing complex semantic information. This study presents an innovative solution for large-scale text data processing and real-time data stream applications, making a valuable contribution to the advancement of related fields.

Key words: semantic awareness, Bloom Filter, large language models, BERT, data structure optimization
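
The abstract does not specify how SABF derives hash choices or bitmap layout from the embeddings, so the following is only an illustrative sketch and not the authors' implementation. It shows (a) a classical Bloom filter, whose false positive rate follows the standard approximation (1 - e^(-kn/m))^k and therefore rises as the number of inserted items n grows for a fixed bit count m, and (b) one possible way to make membership depend on semantics: keying the filter by a random-hyperplane signature (SimHash) of an embedding vector. The names `BloomFilter`, `make_planes`, and `simhash_key`, the toy 4-dimensional vectors, and the use of SimHash itself are all assumptions made for illustration; a real system would obtain embeddings from a model such as BERT.

```python
import hashlib
import math
import random


class BloomFilter:
    """Classical Bloom filter with m bits and k hash positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)
        self.count = 0  # number of add() calls (duplicates are counted too)

    def _positions(self, item: str):
        # Double hashing: position_i = (h1 + i * h2) mod m
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def expected_fpr(self) -> float:
        # Standard approximation of the false positive rate.
        return (1.0 - math.exp(-self.k * self.count / self.m)) ** self.k


def make_planes(dim: int, n_planes: int, seed: int = 0):
    """Fixed random hyperplanes (hypothetical helper for the SimHash stand-in)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def simhash_key(embedding, planes) -> str:
    """Sign pattern of the embedding against each hyperplane, as a bit string."""
    return "".join(
        "1" if sum(e * p for e, p in zip(embedding, plane)) >= 0 else "0"
        for plane in planes
    )


if __name__ == "__main__":
    planes = make_planes(dim=4, n_planes=16)
    doc = [0.9, 0.1, -0.4, 0.3]            # toy stand-in for an LLM embedding
    same_direction = [2 * x for x in doc]  # cosine similarity 1 with doc

    bf = BloomFilter(m=1024, k=3)
    bf.add(simhash_key(doc, planes))
    print(simhash_key(same_direction, planes) in bf)  # True: identical signature
    print(f"expected FPR after {bf.count} insert: {bf.expected_fpr():.2e}")
```

In this sketch, vectors that fall on the same side of every hyperplane share a signature and are reported as present together, which is one simple way a filter could respond to semantic similarity rather than exact string equality. More hyperplanes make the match stricter; fewer make it looser.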
