Advantages:
- A Bloom filter does not store the strings themselves, so it needs very little memory; memory use does not grow when strings get longer.
- A Bloom filter is fast: a lookup is O(k), where k is the number of hash function probes, typically between 1 and 20.
- Once a Bloom filter's size is fixed, its memory footprint does not grow as the business data grows.
Disadvantages:
- No deletion
- No resizing
- A (very low) false positive rate
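A minimal sketch of the behavior described above, for illustration only (sizes and the second hash are made up, not the product's implementation): fixed-size bit array, k probes per add/lookup, no deletion, false positives possible but never false negatives.

```java
import java.util.BitSet;

// Minimal Bloom filter: constant-size bit array, k hash probes per
// add/lookup. A negative answer is definite; a positive answer is
// only "probably present".
public class TinyBloom {
    private final BitSet bits;
    private final int m;   // number of bits
    private final int k;   // number of hash probes

    public TinyBloom(int m, int k) {
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // Derive the i-th probe position from two hashes of the key.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9; // cheap second hash
        long combined = (h1 & 0xFFFFFFFFL) + (long) i * (h2 & 0xFFFFFFFFL);
        return (int) (combined % m);
    }

    public void add(String key) {
        for (int i = 0; i < k; i++) bits.set(index(key, i));
    }

    public boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(key, i))) return false; // definitely absent
        }
        return true; // probably present
    }
}
```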
Requirements analysis
- Improve logger search
Bloom filter advantages
Determines membership in a set: Definitely not in the set, or Probably in the set
Can add and lookup in constant time and constant space
Constant time results for queries with 0 matches
Dramatically speed up needle in haystack search (Searches for rare values)
Worst case, we can’t rule out any data chunks
Measured impact
Original search rate: 2 million events/second.
Now: billions/second.
Data file: 1 GB. Data chunk: MB-sized. Events per data chunk: hundreds of thousands to millions.
Chunks per FilterChunk: 4054 by default.
Prototype example
Using Bleep + ESM to send CEF events to L7500 appliance (32 CPU, 64 GB RAM)
55 columns are bloom filtered, including 37 for full text indexing
All bloom filters are “scalable” to maintain our targeted false positive rate.
Two “tiers”
Master Bloom filter covers all events
Initial capacity: 1M elements
False positive rate: 1 in 1,000
Total size: 97MB
Chunk bloom filters cover 10,000 data chunks
Initial capacity: 85K elements
False positive rate: 1 in 100
Total size: 5.6MB per 10,000 data chunks
At this data volume, everything can be kept fully in memory.
Core data structures
- BasicBloomFilter: a heavily modified Orestes Bloom filter.
(1) Given n = expected number of elements and p = target false positive rate, compute m = the BitMap size, then k = the number of hash functions.
(2) Given a value, compute its hash values via linear combinations of two hashes; the results are the indices set to 1 in the BitSet, stored in an int array inside a PrecomputedHash.
Paper: http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/rsa.pdf
The concrete implementation uses MurmurHash to get hash1 and hash2, then hash = (hash1 + i * hash2) % m, for i from 0 to k-1.
- BasicPrecomputedHash: stores the key and related information such as the hash method, m, and k. Used at search time so that terms are hashed once but checked against many Bloom filters.
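The sizing math in step (1) and the two-hash linear combination in step (2) (the Kirsch-Mitzenmacher construction from the paper above) can be sketched as follows; the real code uses MurmurHash to produce h1 and h2, here they are plain parameters.

```java
// Sketch of Bloom filter parameter derivation and double hashing.
public class BloomParams {
    // m = ceil(-n * ln p / (ln 2)^2): bits needed for n elements at rate p.
    public static int optimalM(long n, double p) {
        return (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = round(m/n * ln 2): optimal number of hash functions.
    public static int optimalK(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // g_i(x) = (h1 + i*h2) mod m for i in [0, k): the bit indices a
    // PrecomputedHash would store in its int[].
    public static int[] indices(long h1, long h2, int k, int m) {
        int[] out = new int[k];
        for (int i = 0; i < k; i++) {
            long h = (h1 + i * h2) % m;
            if (h < 0) h += m;          // keep indices non-negative
            out[i] = (int) h;
        }
        return out;
    }
}
```

For the prototype's master filter (n = 1M, p = 1/1,000) this yields roughly 14.4M bits and k = 10.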
- ScalableBloomFilter: addresses continuously growing data volume.
Start with one Bloom filter. When it reaches your specified number of elements:
Don’t add any more elements to the existing bloom filter
Add another bloom filter “under the hood” that can handle twice as many elements
When querying, look for a match against any underlying filter
The underlying implementation is a linked list of BasicBloomFilters.
The hash used is accordingly a ScalablePrecomputedHash, which is basically a linked list of BasicPrecomputedHash objects.
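The scheme above can be sketched as a linked list of fixed-size stages; when the newest stage fills, a stage with double the capacity is appended, and queries check every stage. Per-stage sizing and hashing here are simplified stand-ins, not the product's code.

```java
import java.util.BitSet;
import java.util.LinkedList;

// Scalable Bloom filter sketch: a linked list of fixed-capacity filters.
public class ScalableBloomSketch {
    private static class Stage {
        final BitSet bits;
        final int m;
        final int capacity;
        int count;
        Stage(int capacity) {
            this.capacity = capacity;
            this.m = capacity * 10;      // ~10 bits/element, illustrative
            this.bits = new BitSet(m);
        }
        void add(String key) {
            for (int i = 0; i < 3; i++) bits.set(index(key, i, m));
            count++;
        }
        boolean mightContain(String key) {
            for (int i = 0; i < 3; i++)
                if (!bits.get(index(key, i, m))) return false;
            return true;
        }
        static int index(String key, int i, int m) {
            long h1 = key.hashCode() & 0xFFFFFFFFL;
            long h2 = (Integer.rotateLeft(key.hashCode(), 13) ^ 0x5BD1E995) & 0xFFFFFFFFL;
            return (int) ((h1 + i * h2) % m);
        }
    }

    private final LinkedList<Stage> stages = new LinkedList<>();

    public ScalableBloomSketch(int initialCapacity) {
        stages.add(new Stage(initialCapacity));
    }

    public void add(String key) {
        Stage last = stages.getLast();
        if (last.count >= last.capacity) {        // last filter is full:
            last = new Stage(last.capacity * 2);  // append one twice as big
            stages.add(last);
        }
        last.add(key);
    }

    public boolean mightContain(String key) {
        for (Stage s : stages)                    // match against ANY stage
            if (s.mightContain(key)) return true;
        return false;
    }

    public int stageCount() { return stages.size(); }
}
```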
Tradeoffs
No need to choose an expected number of elements, only “initial capacity”
Doing a union makes scalable Bloom filters pretty useless?
- FilterChunk: logically, each FilterChunk contains (by default) 4054 ScalableBloomFilters: one ScalableBloomFilter per column, per event range, per storage group. One FilterChunk covers 4054 data chunks.
- BloomFilterManager: top-level class that handles event ingestion and search. MasterBloomFilterManager and ColumnChunkBloomFilterManager both extend this class.
- FilterChunkStore: handles storage (write to disk, flush(), ...) and retrieval of FilterChunks. There are different subclasses for the master Bloom filter and the data range Bloom filters.
- FilterChunkFactory: makes new FilterChunk objects for the BloomFilterManager.
Implementation differences between the master Bloom filter and the ColumnChunkBloomFilter
The MasterBloomFilterManager has a single large capacity limit and flush()es to disk after every fixed number of updates.
The ColumnChunkBloomFilter has CHUNKS_PER_FILTERCHUNK, 4054 by default.
BasicBloomFilter implementation
MySQL query -> Bloom filter condition -> the int array member of PrecomputedHash, which holds the hashed indices into the BitSet. This is passed to BasicBloomFilter.contains(PrecomputedHash h), which calls contains(int[] hash); finally the BitSet member of BasicBloomFilter is probed via get() to check for a hit.
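The final check in that chain can be sketched as below: the query term is hashed once into an int[] of bit indices, and that same array is then checked against many filters' BitSets. This is an illustrative stand-in for contains(int[] hash), not the actual class.

```java
import java.util.BitSet;

// Final membership check on the search path: every precomputed index
// must be set in the BitSet, otherwise the value is definitely absent.
public class ContainsCheck {
    public static boolean contains(BitSet bits, int[] indices) {
        for (int i : indices) {
            if (!bits.get(i)) return false; // one clear bit rules it out
        }
        return true;
    }
}
```

Because the int[] is computed once per term, the same array can be replayed against the BitSet of every candidate filter, which is exactly why PrecomputedHash exists.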
3.BloomFilter creation flow
Event ingestion flow
overview
1. events arrive at ESM or Logger
2. Batches of events are constructed
3. Events are indexed and ready to persist
4. event indexes are “unpacked” and field values are added to
a. global summary ( for auto-complete)
b. bloom filter tracked columns
call hierarchy
1.ColumnChunkBloomFilterManager.addChunkDataFromSummaryProcessor
2.BloomFilterManager.addChunkDataFromSummaryProcessor:
FilterChunkStore.get() finds the FilterChunk by event range, chunk ID, and storage group ID.
If no FilterChunk covering the current chunk ID is found, call FilterChunkFactory.create().
3.FilterChunk.addChunkDataFromSummaryProcessor: finds the corresponding ScalableBloomFilter by column.
4.ScalableBloomFilter.add(): calls add() on the last Bloom filter in the linked list; if that last filter is full, create a new one, append it to the end of the list, then call add() on it.
5. event data is persisted to Logger storage
1.Master call hierarchy
BloomFilterManager.updateFilterChunk
FilterChunkStore.update
SingleFilterChunkDiskStore.update checks the chunk update counter; when it reaches the threshold, flush() is called:
Call hierarchy of flush(fileId):
startWrite(fileid)
doWrite(fileId)
finishWrite(fileId)
Event Ingestion “hook”
1.In Logger product: ROSChunkPostProcessor
In ESM product: Logger code – SummaryProcessor
2.Chunk metadata and an array of columns/values are passed to BloomFilterManagers to add to bloom filter data structures.
3.Ultimately this data goes to FilterChunk.addChunkData
Both BloomFilterManagers have a FilterChunk under construction kept in memory.
4.Bloom filter data is persisted at intervals determined by the chunks ingested, NOT directly by time.
This is why we say a single data range bloom filter chunk covers about 1-2 hours.
5.When chunk data is added, range metadata in FilterChunk is updated
EndTime span
MRT span
Event ID span
Chunk ID span
When Chunk ID span > limit, time to persist this FilterChunk
6.persisting…
- master bloom filter
Managed by SingleFilterChunkDiskStore
Separate data files in /opt/arcsight/logger/data/indexes
Files on disk:
Arcsight_Master_Log: Transaction log for file write operations: two-byte records: status + fileId
Arcsight_Master_Data_n: Data file
Need to guard against incomplete write of bloom filter data
The flow is as follows:
1.Check which data file is next to write (e.g. file 1 is next)
2.Add record to transaction log: (TRANSACTION_START, File 1)
3.Serialize, compress, and write master bloom filter data to file.
4.Add record to transaction log: (TRANSACTION_SUCCESS, File 1)
5.Update counter so we know file 0 is next data file to be written.
The write operation is implemented with java.nio: buffer.put() writes data into the buffer, buffer.flip() switches the buffer from write mode to read mode, and finally fileChannel.write(buffer) drains the buffer's contents into the FileChannel.
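A minimal sketch of that put/flip/write sequence, applied to one two-byte transaction-log record (the file name and record layout are illustrative, not the product's on-disk format):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// java.nio write sequence: fill the buffer with put(), flip() it from
// write mode to read mode, then drain it with FileChannel.write().
public class NioWriteSketch {
    public static void writeRecord(Path file, byte status, byte fileId) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(2);
        buf.put(status);           // write mode: fill the buffer
        buf.put(fileId);
        buf.flip();                // switch buffer to read mode
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            while (buf.hasRemaining()) {
                ch.write(buf);     // drain buffer contents into the channel
            }
        }
    }
}
```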
Below is a good analogy.
The original article explains the three most important concepts: http://www.iteye.com/magazines/132-Java-NIO
Channel
Buffer
Selector
The Channel corresponds to the old stream; the Buffer is nothing new; the Selector was added because NIO supports an asynchronous, non-blocking mode.
Traditional streams always block: once one thread operates on a stream, other operations are blocked. It is like a water pipe with no valve: when you reach out to catch the water, whether or not the water has arrived, you are stuck waiting at the pipe (the stream).
NIO's Channel is like adding a tap (with a valve). You can still only take water from one pipe at a time, but with a round-robin strategy, when the flow is modest the water from every pipe gets handled properly. The key addition is a water collector, the Selector, who coordinates: he watches which pipe has water, and once enough has been collected from the current pipe, he switches, temporarily closing the current tap and trying another one (to see whether it has water).
When other people need water, they do not fetch it themselves; instead they hand a bucket to the collector in advance. That bucket is the Buffer. People may still have to wait, but they wait at home instead of at the tap, free to do other things; when the bucket is full, the collector notifies them.
This mirrors the fine-grained division of labor in modern society: an economical way to achieve concurrency by coordinating existing resources, rather than reaching for parallel processing, which is the simplest approach but also the most wasteful of resources.
- Data range bloom filter
1.FilterChunkStoreImpl serializes and compresses the FilterChunk
2.Metadata and compressed bytes are stuffed into BloomFilterChunk
3.BloomFilterChunk is handed to StoreManager for persistence
4.BloomFilterChunk is appended to Logger data file by StoreFile
5.PostgreSQL is updated with metadata by StoreManager calling BloomFilterChunk.persist().
6.A new under-construction FilterChunk is created and kept in memory for the next incoming events.
- ESM event archive support. FilterChunkStoreImpl is registered as an EventArchiveObserver. For each storage group it keeps a HashMap<Long, FilterChunk>, where the Long key is the date. At archive time, the corresponding FilterChunk is persisted and then removed from the HashMap. Note that a single day may have many FilterChunks; the HashMap only keeps the latest one, and each earlier one was already persisted when the next was added.
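A sketch of that bookkeeping, with FilterChunk and its persist() reduced to stand-ins: the map holds only the latest under-construction chunk per day, persisting the one it replaces, and archiving a day persists and drops its latest chunk.

```java
import java.util.HashMap;
import java.util.Map;

// One latest under-construction FilterChunk per day (keyed by a Long date).
public class ArchiveMapSketch {
    static class FilterChunk {
        final String name;
        boolean persisted;
        FilterChunk(String name) { this.name = name; }
        void persist() { persisted = true; }
    }

    private final Map<Long, FilterChunk> latestByDay = new HashMap<>();

    // Track a new chunk for a day; persist the chunk it replaces.
    public void track(long day, FilterChunk chunk) {
        FilterChunk previous = latestByDay.put(day, chunk);
        if (previous != null) previous.persist();
    }

    // On archive: persist the day's latest chunk and drop it from the map.
    public void archive(long day) {
        FilterChunk latest = latestByDay.remove(day);
        if (latest != null) latest.persist();
    }

    public boolean hasDay(long day) { return latestByDay.containsKey(day); }
}
```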
- Crash recovery design choices
Data under construction in memory never gets persisted if the process crashes or shuts down.
BloomFilterRebuilder uses the last chunk ID in the MasterBloomFilter and the last chunk ID in PostgreSQL to find the gap, then reloads the data between the two chunk IDs and persists it again.
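The gap detection can be sketched as below, assuming the persisted bloom filter state lags behind the chunk metadata in the database; the method and names are hypothetical, not the BloomFilterRebuilder API.

```java
// Compute the chunk-ID range that must be reloaded after a crash.
public class RebuildGap {
    // Returns [first, last] chunk IDs to reload, or null if nothing is missing.
    public static long[] gap(long lastPersistedChunkId, long lastChunkIdInDb) {
        if (lastPersistedChunkId >= lastChunkIdInDb) return null; // no gap
        return new long[]{lastPersistedChunkId + 1, lastChunkIdInDb};
    }
}
```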
4.Search flow
get the chunk metadata for chunks between time x and time y, but dont get chunk IDs between a and b, between c and d, etc.
overview
1.Some component issues a SQL query against event table
Logger search, active channel, report
2.MySQL storage engine submits query to Logger servers
Includes time range and query conditions (WHERE clause)
CDistributedSearchGetMetadataInitCommand
SQL query -> JDBC call -> storage engine sends a request (RPC) to the Logger server
3.Master bloom filter is checked
MasterBloomFilterManager.shouldContinueSearch()
4.Rejected chunk list is generated from data range bloom filters
BloomFilterManager.getChunkRangesRejectedByBloomFilter()
5.Relevant data chunks are retrieved and returned to storage engine
data range bloom filter search
1.Conditions are parsed into tree (BloomFilterQuery). AND, OR, IN, TRUE
2.Check if underlying data must always be searched
3.Get FilterChunks in query time range. FilterChunkStoreImpl calls get() by chunk ID.
4.Check each FilterChunk against BloomFilterQuery
If FilterChunk can be ruled out, add event count to events scanned.
5.Update events scanned in HypotheticalQuerySizeManager
6.Merge adjacent rejected chunk ranges to simplify query
7.Rejected chunk ranges are passed by ReadStoreMgr to ColumnChunkIterator.
8.ColumnChunkIterator will construct long-winded PostgreSQL query
Avoids rejected chunk ranges
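Step 6 above (merging adjacent rejected chunk ranges) can be sketched as follows; ranges are [start, end] inclusive chunk IDs and are assumed already sorted by start, which is an assumption of this sketch rather than a documented property.

```java
import java.util.ArrayList;
import java.util.List;

// Merge adjacent or overlapping rejected chunk-ID ranges so the final
// query has to exclude fewer, wider ranges.
public class RangeMerge {
    public static List<long[]> merge(List<long[]> sorted) {
        List<long[]> out = new ArrayList<>();
        for (long[] r : sorted) {
            if (!out.isEmpty() && r[0] <= out.get(out.size() - 1)[1] + 1) {
                // touches or overlaps the previous range: extend it
                long[] last = out.get(out.size() - 1);
                last[1] = Math.max(last[1], r[1]);
            } else {
                out.add(new long[]{r[0], r[1]});
            }
        }
        return out;
    }
}
```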
Open items (TODO)
- Multi-thread
One thread for updating. Multiple threads for reading.
Both use a ThreadPoolExecutor, but there is no synchronization, so each thread's operations are not atomic.
blocking queue
mysql plug-in storage engine