<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

          fastbloom高性能布隆過濾器

          聯(lián)合創(chuàng)作 · 2023-09-28 09:44

          fastbloom 是使用 Rust 實(shí)現(xiàn)的 bloom filter(布隆過濾器) | counting bloom filter(計(jì)數(shù)布隆過濾器) Python 庫及 Rust 庫。比目前廣泛使用的 pybloom-live 插入性能大約快10倍以上。

          如果對(duì)您有幫助,麻煩給項(xiàng)目點(diǎn)個(gè)贊吧^v^

          setup

          Python

          requirements

          Python >= 3.7
          

          install

          使用如下命令安裝 fastbloom 最新版本:

          pip install fastbloom-rs

          Rust

          fastbloom-rs = "{latest}"

          Examples

          BloomFilter

          布隆過濾器(Bloom Filter)是1970年由布隆提出的。它實(shí)際上是一個(gè)很長(zhǎng)的二進(jìn)制向量和一系列隨機(jī)映射函數(shù)。布隆過濾器 可以用于檢索一個(gè)元素是否在一個(gè)集合中。它的優(yōu)點(diǎn)是空間效率和查詢時(shí)間都比一般的算法要好的多,缺點(diǎn)是有一定的誤識(shí)別率和刪除困難。

          參考: Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422-426. 全文

          Python

          基礎(chǔ)用法

          from fastbloom_rs import BloomFilter
          
          bloom = BloomFilter(100_000_000, 0.01)
          
          bloom.add_str('hello')
          bloom.add_bytes(b'world')
          bloom.add_int(9527)
          
          assert bloom.contains('hello')
          assert bloom.contains(b'world')
          assert bloom.contains(9527)
          
          assert not bloom.contains('hello world')

          基于 bytes 或者 list 構(gòu)造布隆過濾器

          from fastbloom_rs import BloomFilter
          
          bloom = BloomFilter(100_000_000, 0.01)
          bloom.add_str('hello')
          assert bloom.contains('hello')
          
          bloom2 = BloomFilter.from_bytes(bloom.get_bytes(), bloom.hashes())
          assert bloom2.contains('hello')
          
          bloom3 = BloomFilter.from_int_array(bloom.get_int_array(), bloom.hashes())
          assert bloom3.contains('hello')

          更多例子參考 py_tests.

          Rust

          use fastbloom_rs::{BloomFilter, FilterBuilder};
          
          let mut bloom = FilterBuilder::new(100_000_000, 0.01).build_bloom_filter();
          
          bloom.add(b"helloworld");
          assert_eq!(bloom.contains(b"helloworld"), true);
          assert_eq!(bloom.contains(b"helloworld!"), false);

          更多例子參考 docs.rs

          CountingBloomFilter

          計(jì)數(shù)布隆過濾器的工作方式與常規(guī)布隆過濾器類似;但是,它能夠跟蹤插入和刪除。在計(jì)數(shù)布隆過濾器中,布隆過濾器的每個(gè) 條目都是一個(gè)與基本布隆過濾器位相關(guān)聯(lián)的小計(jì)數(shù)器。

          參考: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved Construction for Counting Bloom Filters,” in 14th Annual European Symposium on Algorithms, LNCS 4168, 2006

          Python

          from fastbloom_rs import CountingBloomFilter
          
          cbf = CountingBloomFilter(1000_000, 0.01)
          cbf.add('hello')
          cbf.add('hello')
          assert 'hello' in cbf
          cbf.remove('hello')
          assert 'hello' in cbf  # because 'hello' added twice. 
          # If add same element larger than 15 times, then remove 15 times the filter will not contain the element.
          cbf.remove('hello')
          assert 'hello' not in cbf

          本計(jì)數(shù)布隆過濾器使用4bit計(jì)數(shù)器存儲(chǔ)hash索引,所以當(dāng)重復(fù)插入同一個(gè)元素到過濾器中,計(jì)數(shù)器很快就會(huì)位溢出, 所以可以設(shè)置 enable_repeat_insertFalse 用于避免重復(fù)插入,如果元素已經(jīng)加入過濾器中,設(shè)置 enable_repeat_insertFalse 將使元素不會(huì)重復(fù)插入。 enable_repeat_insert 默認(rèn)為 True

          from fastbloom_rs import CountingBloomFilter
          
          cbf = CountingBloomFilter(1000_000, 0.01, False)
          cbf.add('hello')
          cbf.add('hello')  # because enable_repeat_insert=False, this addition will not take effect. 
          assert 'hello' in cbf
          cbf.remove('hello')
          assert 'hello' not in cbf 

          更多例子參考 py_tests.

          Rust

          use fastbloom_rs::{CountingBloomFilter, FilterBuilder};
          
          let mut builder = FilterBuilder::new(100_000, 0.01);
          let mut cbf = builder.build_counting_bloom_filter();
          cbf.add(b"helloworld");
          assert_eq!(bloom.contains(b"helloworld"), true);

          benchmark

          computer info

          CPU Memory OS
          AMD Ryzen 7 5800U with Radeon Graphics 16G Windows 10

          add one str to bloom filter

          測(cè)試添加一個(gè)字符串到布隆過濾器:

          bloom_add_test          time:   [41.168 ns 41.199 ns 41.233 ns]
                                  change: [-0.4891% -0.0259% +0.3417%] (p = 0.91 > 0.05)
                                  No change in performance detected.
          Found 13 outliers among 100 measurements (13.00%)
            1 (1.00%) high mild
            12 (12.00%) high severe
          

          add one million to bloom filter

          添加一百萬字符串((1..1_000_000).map(|n| { n.to_string() }))到布隆過濾器:

          bloom_add_all_test      time:   [236.24 ms 236.86 ms 237.55 ms]
                                  change: [-3.4346% -2.9050% -2.3524%] (p = 0.00 < 0.05)
                                  Performance has improved.
          Found 5 outliers among 100 measurements (5.00%)
            4 (4.00%) high mild
            1 (1.00%) high severe
          

          check one contains in bloom filter

          測(cè)試布隆過濾器包含的元素:

          bloom_contains_test     time:   [42.065 ns 42.102 ns 42.156 ns]
                                  change: [-0.7830% -0.5901% -0.4029%] (p = 0.00 < 0.05)
                                  Change within noise threshold.
          Found 15 outliers among 100 measurements (15.00%)
            1 (1.00%) low mild
            5 (5.00%) high mild
            9 (9.00%) high severe
          

          check one not contains in bloom filter

          測(cè)試布隆過濾器不包含的元素:

          bloom_not_contains_test time:   [22.695 ns 22.727 ns 22.773 ns]
                                  change: [-3.1948% -2.9695% -2.7268%] (p = 0.00 < 0.05)
                                  Performance has improved.
          Found 12 outliers among 100 measurements (12.00%)
            4 (4.00%) high mild
            8 (8.00%) high severe
          

          add one str to counting bloom filter

          測(cè)試添加一個(gè)字符串到計(jì)數(shù)布隆過濾器:

          counting_bloom_add_test time:   [60.822 ns 60.861 ns 60.912 ns]
                                  change: [+0.2427% +0.3772% +0.5579%] (p = 0.00 < 0.05)
                                  Change within noise threshold.
          Found 10 outliers among 100 measurements (10.00%)
            1 (1.00%) low severe
            4 (4.00%) low mild
            1 (1.00%) high mild
            4 (4.00%) high severe
          

          add one million to counting bloom filter

          添加一百萬字符串((1..1_000_000).map(|n| { n.to_string() }))到計(jì)數(shù)布隆過濾器:

          counting_bloom_add_million_test
                                  time:   [272.48 ms 272.58 ms 272.68 ms]
          Found 2 outliers among 100 measurements (2.00%)
            1 (1.00%) low mild
            1 (1.00%) high mild
          
          瀏覽 21
          點(diǎn)贊
          評(píng)論
          收藏
          分享

          手機(jī)掃一掃分享

          編輯 分享
          舉報(bào)
          評(píng)論
          圖片
          表情
          推薦
          點(diǎn)贊
          評(píng)論
          收藏
          分享

          手機(jī)掃一掃分享

          編輯 分享
          舉報(bào)
          <kbd id="afajh"><form id="afajh"></form></kbd>
          <strong id="afajh"><dl id="afajh"></dl></strong>
            <del id="afajh"><form id="afajh"></form></del>
                1. <th id="afajh"><progress id="afajh"></progress></th>
                  <b id="afajh"><abbr id="afajh"></abbr></b>
                  <th id="afajh"><progress id="afajh"></progress></th>
                  国产做a爱一级毛片久久 | 俺也去官网,国产97碰公开 | 亚洲国产精品波多野结衣 | 九九AV | 色婷婷网络|