
Journal of Kunming Metallurgy College ›› 2025, Vol. 41 ›› Issue (3): 101-. DOI: 10.3969/j.issn.1009-0479.2025.03.016


Semantic-Aware Bloom Filter Based on Large Language Models

ZHANG Hao, TAI Mengsiyun, ZHAO Wentao, HE Wei

  1. (Faculty of Computer Information, Kunming Metallurgy College, Kunming 650033, China)
  • Received: 2024-12-11 Online: 2025-06-07 Published: 2025-09-24

Abstract: With the rapid growth of data volume, traditional Bloom Filters face challenges such as high false positive rates and limited flexibility when processing large-scale data streams. To improve the accuracy and efficiency of data stream processing, this paper introduces a semantic-aware Bloom Filter (SABF) based on large language models (LLMs). By leveraging the advanced semantic understanding capabilities of LLMs, SABF generates semantic embedding vectors for text data and uses this information to optimize the selection of hash functions and the design of bitmap structures. This enables more precise identification of the semantic features within the text data. Experimental results demonstrate that SABF significantly reduces the false positive rate, especially as data volume increases, where it lowers the false positive rate by over 20% compared to traditional methods. Additionally, SABF excels in identifying semantically similar documents, achieving an accuracy rate of 83%, thereby significantly improving the efficiency of processing complex semantic information. This study presents an innovative solution for large-scale text data processing and real-time data stream applications, making a valuable contribution to the advancement of related fields.

Key words: semantic awareness, Bloom Filter, large language models, BERT, data structure optimization
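
The abstract does not say how the embeddings drive hash selection or bitmap layout, so the following is only a rough sketch of the general idea and not the authors' SABF design. It pairs a classic Bloom filter with a sign-random-projection signature (a standard locality-sensitive hashing technique, swapped in here as an assumption), so that vectors pointing in similar directions map to the same filter key. The names `BloomFilter`, `semantic_key`, and `hyperplanes`, the SHA-256 double hashing, and the two-dimensional toy vectors standing in for LLM or BERT embeddings are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Classic Bloom filter with num_bits bits and num_hashes probe positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive all probe positions from one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def semantic_key(embedding, hyperplanes) -> bytes:
    """Assumed helper: one signature bit per hyperplane (sign of dot product)."""
    signature = 0
    for plane in hyperplanes:
        dot = sum(e * p for e, p in zip(embedding, plane))
        signature = (signature << 1) | (1 if dot >= 0 else 0)
    return signature.to_bytes((len(hyperplanes) + 7) // 8, "big")


# Toy usage: hand-picked hyperplanes and 2-D vectors in place of real embeddings.
hyperplanes = [[1.0, 0.0], [0.0, 1.0]]
seen = BloomFilter(num_bits=1024, num_hashes=4)
seen.add(semantic_key([0.9, 0.1], hyperplanes))

# A nearby vector shares the signature, so the filter reports it as seen.
near_duplicate_seen = semantic_key([0.8, 0.2], hyperplanes) in seen
```

In this sketch the filter answers "has something semantically close been seen?" rather than "has this exact string been seen?", which is one plausible reading of semantic awareness. The number of hyperplanes controls how coarse that notion of closeness is, and a real system would need to tune it against the embedding model in use.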
