
This post introduces an experimental, parallelized version of grep called grab, designed to speed up searches over large directory trees. grab uses mmap and the PCRE library, offers multi-core support, and can deliver large speedups, especially on SSDs; in some cases it is up to 8 times faster than standard grep. The post also covers how to build and tune grab, including using Intel's hyperscan library for further performance gains.

grab - simple, but very fast grep


This is my own, experimental, parallel version of grep, written so I can test various strategies to speed up access to large directory trees. On Flash storage or SSDs, you can easily outsmart common greps by up to a factor of 8.

Options:

-O -- print file offset of match
-l -- do not print the matching line (useful if you want to see _all_ offsets;
      if you also print the line, only the first match in the line counts)
-s -- single match; don't search file further after first match
      (similar to grep on a binary)
-H -- use hyperscan lib for scanning (see build instructions)
-S -- only for hyperscan: interpret pattern as string literal instead of regex
-L -- machine has low mem; halve chunk-size (default 1GB); may be used multiple times
-I -- enable highlighting of matches (useful)
-n -- use n cores in parallel (recommended for flash/SSD); n <= 1 uses single core
-r -- recurse on directory
-R -- same as -r

grab uses the pcre library, so it is basically equivalent to a grep -P -a. The -P is important, since Perl-Compatible Regular Expressions have different characteristics than basic regexes.

Build

There are two branches: master and greppin. master is the 'traditional' grab that should compile and run on most POSIX systems. greppin comes with its own optimized and parallelized version of nftw() and readdir(), which again doubles the speed on top of the speedup that the master branch already provides. The greppin branch runs on Linux, BSD and OSX. greppin also comes with support for Intel's hyperscan libraries, which try to exploit the CPU's SIMD instructions where possible (AVX2, AVX512 etc.) when compiling the regex pattern into JIT code.

You will most likely want to build the greppin branch:

$ git checkout greppin

[...]

$ cd src; make

[...]

On BSD systems you need gmake instead of make.

If you want to use cutting-edge tech with greppin's multiple regex engine and hyperscan support, you first need to get and build hyperscan:

$ git clone https://github.com/intel/hyperscan

[...]

$ cd hyperscan

$ mkdir build; cd build

$ cmake -DFAT_RUNTIME=1 -DBUILD_STATIC_AND_SHARED=1 ..

[...]

$ make

[...]

This will build the so-called fat runtime of the hyperscan libs, which contains support for all CPU families so that the right code path can be selected at runtime for best performance. Once the build finishes, you build greppin against it:

(inside grab cloned repo)

$ cd src

$ HYPERSCAN_BUILD=/path/to/hyperscan/build make -f Makefile.hs

[...]

This will produce a greppin binary that enables the -H option, which loads a different engine at runtime and tries to exploit all possible performance bits. You could link it against already-installed libs, but the API only recently added some functions in the 5.x version, and most distros still ship with 4.x.
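For illustration, here is a minimal sketch (in C, not the actual greppin source) of what a hyperscan block-mode scan of a buffer looks like; the pattern, the sample data and the match handler are made up for the example:

/* Hedged sketch: compile a pattern with hyperscan and scan one buffer.
 * Build (assuming hyperscan is installed): cc hs_sketch.c -lhs -o hs_sketch */
#include <hs/hs.h>
#include <stdio.h>

/* Called for every match; 'to' is the end offset of the match. */
static int on_match(unsigned int id, unsigned long long from,
                    unsigned long long to, unsigned int flags, void *ctx)
{
    (void)id; (void)from; (void)flags; (void)ctx;
    printf("match ending at offset %llu\n", to);
    return 0;               /* 0 = continue scanning */
}

int main(void)
{
    const char *pattern = "foo.*bar";            /* example pattern */
    const char data[]   = "xxx fooyyybar zzz";   /* example data */

    hs_database_t *db = NULL;
    hs_compile_error_t *err = NULL;
    if (hs_compile(pattern, HS_FLAG_DOTALL, HS_MODE_BLOCK,
                   NULL, &db, &err) != HS_SUCCESS) {
        fprintf(stderr, "hs_compile: %s\n", err->message);
        hs_free_compile_error(err);
        return 1;
    }

    hs_scratch_t *scratch = NULL;
    if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS)
        return 1;

    hs_scan(db, data, sizeof(data) - 1, 0, scratch, on_match, NULL);

    hs_free_scratch(scratch);
    hs_free_database(db);
    return 0;
}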

Why is it faster?

grab uses mmap(2) and matches the whole file blob without counting newlines (which grep does even if there is no match [as of a grep code review of mine in 2012; things may be different today]); this is a lot faster than read(2)-ing the file in small chunks and counting the newlines. If available, grab also uses the PCRE JIT feature.
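As a rough illustration of that approach (not the actual grab code), a minimal PCRE1 matcher over an mmap'ed file could look like this; error handling is trimmed, and the pattern and file name are just examples:

/* Hedged sketch: mmap a whole file and run a JIT-studied PCRE over the blob.
 * Sketch only handles files that fit into an int length (pcre_exec limit). */
#include <pcre.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *pattern = "foo[0-9]+";          /* example pattern */
    const char *path = "/tmp/testfile";         /* example file */

    const char *errs; int erroff;
    pcre *re = pcre_compile(pattern, 0, &errs, &erroff, NULL);
    /* JIT-compile the pattern if the library supports it */
    pcre_extra *extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &errs);

    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    char *blob = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* Match the whole blob; no per-line read()/newline counting. */
    int ov[3], off = 0;
    while (off < (int)st.st_size &&
           pcre_exec(re, extra, blob, (int)st.st_size, off, 0, ov, 3) >= 0) {
        printf("%s: match at offset %d\n", path, ov[0]);
        off = (ov[1] > ov[0]) ? ov[1] : ov[0] + 1;  /* continue after match */
    }

    munmap(blob, st.st_size);
    close(fd);
    if (extra)
        pcre_free_study(extra);
    pcre_free(re);
    return 0;
}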

However, speedups are only measurable on large file trees or on fast HDDs or SSDs. In the latter case, the speedup can be really drastic (up to 3 times faster) if matching recursively and in parallel. Since storage is the bottleneck, parallelizing the search on HDDs makes no sense, as the seeking takes more time than just doing the work linearly.

Additionally, grab skips files that are too small to contain the regular expression. For larger regexes in a recursive search, this can skip quite a good amount of files without even opening them. A fairly new pcre lib is required; on some older systems the build can fail due to a missing PCRE_INFO_MINLENGTH and pcre_study().
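The skip test can be sketched roughly like this (assumes PCRE1 with pcre_study()/PCRE_INFO_MINLENGTH; pattern and file name are made up, and this is not the actual grab code):

/* Hedged sketch: ask PCRE for the minimum possible match length and
 * skip files that are smaller than that, without ever opening them. */
#include <pcre.h>
#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    const char *pattern = "foobar[0-9]{4,}";    /* example pattern */
    const char *path = "/tmp/small.txt";        /* example file */

    const char *errs; int erroff;
    pcre *re = pcre_compile(pattern, 0, &errs, &erroff, NULL);
    pcre_extra *extra = pcre_study(re, 0, &errs);

    int minlen = 0;
    pcre_fullinfo(re, extra, PCRE_INFO_MINLENGTH, &minlen);

    struct stat st;
    if (stat(path, &st) == 0 && st.st_size < minlen) {
        /* file cannot possibly contain a match: skip without open()/mmap() */
        printf("skipping %s (%lld bytes < minlen %d)\n",
               path, (long long)st.st_size, minlen);
        return 0;
    }
    /* ... otherwise open, mmap and match as usual ... */
    return 0;
}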

Files are mmaped and matched in chunks of 1 GiB. For files that are larger, the last 4096 bytes (1 page) of a chunk are overlapped with the next chunk, so that matches on a 1 GiB boundary can be found. In this case, you will see the match doubled (but with the same offset).
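A rough sketch of that chunking scheme (sizes taken from the text above; the matching call is left as a stub, and the file name is just an example):

/* Hedged sketch: map and match a large file in 1 GiB chunks, overlapping
 * the last page of each chunk so matches spanning a boundary are not lost. */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

static void match_chunk(const char *data, size_t len, off_t file_off)
{
    /* ... run the regex over data[0..len) and report offsets + file_off ... */
    (void)data; (void)len; (void)file_off;
}

int main(void)
{
    const size_t CHUNK = 1ULL << 30;   /* 1 GiB chunk size */
    const size_t OVERLAP = 4096;       /* one page of overlap */
    const char *path = "/tmp/bigfile"; /* example file */

    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    for (off_t off = 0; off < st.st_size; off += (off_t)(CHUNK - OVERLAP)) {
        size_t len = (size_t)(st.st_size - off);
        if (len > CHUNK)
            len = CHUNK;

        char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        match_chunk(map, len, off);
        munmap(map, len);

        if (off + (off_t)len >= st.st_size)    /* last chunk mapped */
            break;
    }
    close(fd);
    return 0;
}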

If you measure grep vs. grab, keep in mind to drop the dentry and page caches between each run: echo 3 > /proc/sys/vm/drop_caches

Note that grep will only print 'Binary file matches' if it detects binary files, while grab will print all matches unless -s is given. So, for a speed test, you have to search for an expression that does not exist in the data, in order to force a search of the entire files.

grab was made to quickly grep through large directory trees without indexing. The original grep has by far the more complete option-set. The speedup for a single-file match is very small, if measurable at all.

For SSDs, the multicore option makes sense. For HDDs it does not, since the head has to be positioned back and forth between the threads, potentially destroying the locality principle and killing performance.

The greppin branch features its own parallel version of nftw(), so the time during which N - 1 cores would otherwise idle while the 1st core builds the directory tree can also be used for work. In addition, since locking is required in the threads anyway, it also comes with its own faster, lock-free readdir() implementation to save quite a few futex() calls.

What's left to note: grab will traverse directories physically, i.e. it will not follow symlinks.
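To illustrate the idea of overlapping directory traversal with matching, here is a simplified sketch using standard nftw() with FTW_PHYS (physical traversal, no symlinks) feeding a locked work queue; greppin's real code uses its own parallel nftw() and lock-free readdir(), so treat this purely as a conceptual example:

/* Hedged sketch: one thread walks the tree with nftw(FTW_PHYS) and hands
 * file paths to worker threads, so matching starts before the walk ends.
 * Build: cc -pthread walk_sketch.c -o walk_sketch */
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NWORKERS 4
#define QSIZE    1024

static char *queue[QSIZE];
static int qhead, qtail, done;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;

static void push(char *path)
{
    pthread_mutex_lock(&qlock);
    while ((qtail + 1) % QSIZE == qhead)        /* queue full: wait */
        pthread_cond_wait(&qcond, &qlock);
    queue[qtail] = path;
    qtail = (qtail + 1) % QSIZE;
    pthread_cond_broadcast(&qcond);
    pthread_mutex_unlock(&qlock);
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (qhead == qtail && !done)
            pthread_cond_wait(&qcond, &qlock);
        if (qhead == qtail && done) {
            pthread_mutex_unlock(&qlock);
            return NULL;
        }
        char *path = queue[qhead];
        qhead = (qhead + 1) % QSIZE;
        pthread_cond_broadcast(&qcond);
        pthread_mutex_unlock(&qlock);

        printf("would mmap+match %s\n", path);  /* matching would go here */
        free(path);
    }
}

static int visit(const char *p, const struct stat *st, int type, struct FTW *f)
{
    (void)st; (void)f;
    if (type == FTW_F)
        push(strdup(p));        /* hand regular files to the workers */
    return 0;
}

int main(int argc, char **argv)
{
    pthread_t tids[NWORKERS];
    for (int i = 0; i < NWORKERS; ++i)
        pthread_create(&tids[i], NULL, worker, NULL);

    /* FTW_PHYS: traverse physically, do not follow symlinks */
    nftw(argc > 1 ? argv[1] : ".", visit, 64, FTW_PHYS);

    pthread_mutex_lock(&qlock);
    done = 1;
    pthread_cond_broadcast(&qcond);
    pthread_mutex_unlock(&qlock);

    for (int i = 0; i < NWORKERS; ++i)
        pthread_join(tids[i], NULL);
    return 0;
}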

Examples

This shows the speedup on a 4-core machine with a search on an SSD:

root@linux:~# echo 3 > /proc/sys/vm/drop_caches

root@linux:~# time grep -r foobardoesnotexist /source/linux

real    0m34.811s
user    0m3.710s
sys     0m10.936s

root@linux:~# echo 3 > /proc/sys/vm/drop_caches

root@linux:~# time grab -r foobardoesnotexist /source/linux

real    0m31.629s
user    0m4.984s
sys     0m8.690s

root@linux:~# echo 3 > /proc/sys/vm/drop_caches

root@linux:~# time grab -n 2 -r foobardoesnotexist /source/linux

real    0m15.203s
user    0m3.689s
sys     0m4.665s

root@linux:~# echo 3 > /proc/sys/vm/drop_caches

root@linux:~# time grab -n 4 -r foobardoesnotexist /source/linux

real    0m13.135s
user    0m4.023s
sys     0m5.581s

With greppin branch:

root@linux:~# echo 3 > /proc/sys/vm/drop_caches

root@linux:~# time grep -a -P -r linus /source/linux/|wc -l

16918

real 1m12.470s

user 0m49.548s

sys 0m6.162s

root@linux:~# echo 3 > /proc/sys/vm/drop_caches

root@linux:~# time greppin -n 4 -r linus /source/linux/|wc -l

16918

real 0m8.773s

user 0m4.670s

sys 0m5.837s

root@linux:~#

Yes! ~9s vs. ~72s! That's 8x as fast as the traditional grep on a 4-core SSD machine.

Just to prove that it resulted in the same output:

root@linux:~# echo 3 > /proc/sys/vm/drop_caches

root@linux:~# greppin -n 4 -r linus /source/linux/|sort|md5sum

a1f9fe635bd22575a4cce851e79d26a0 -

root@linux:~# echo 3 > /proc/sys/vm/drop_caches

root@linux:~# grep -P -a -r linus /source/linux/|sort|md5sum

a1f9fe635bd22575a4cce851e79d26a0 -

root@linux:~#

In the single-core comparison, the speedup also depends on which CPU the kernel actually schedules the grep on, so a grab may or may not be faster (mostly it is). If the load is equal among the single-core tests, grab will see a speedup when searching large file trees. On multi-core setups, grab can of course benefit.

ripgrep comparison

I recently learned about this project via twitter when I was asked for performance comparisons. The project can be found here. I tested their official build ripgrep-12.1.1-x86_64-unknown-linux-musl.tar.gz and ran it on the same 4-core SSD machine as in the above tests. The net result: both runs have about the same speed, with a tendency of greppin being a few percent faster when ripgrep is invoked with -P. ripgrep with -e is a few percent faster than greppin with -H.

In the wild, it actually also depends on the directory tree layout, since my nftw() implementation takes care to distribute the load across cores even on unbalanced directory trees, at the price of locking. ripgrep claims to use a lock-free parallel directory iterator written in Rust. I see a small speedup for ripgrep when scanning /usr, but I see the same amount of speedup (just a few %) for greppin when scanning my Linux source trees.

The main speedup shown in their benchmark tables stems from the fact that ripgrep ignores a lot of files when invoked without special options, as well as from treating binary files as a single-match target (similar to grep). In order to have comparable results, keep in mind to (4 is the number of cores):

- echo 3 > /proc/sys/vm/drop_caches between each run
- add -j 4 -a --no-unicode --no-pcre2-unicode -uuu --mmap to ripgrep, since it will match Unicode by default, which is 3 times slower, and tries to compensate for the speed loss by skipping 'ignore'-based files; -e is faster than -P, so better choose -e, but that's not as powerful as a PCRE
- pipe the output to wc -l to check whether the number of match reports is equal and no files were missed in the scan
- add -H -n 4 to greppin if you want best performance; -H is PCRE compatible with only very few exceptions (according to the hyperscan docs)
- setfattr -n user.pax.flags -v "m" /path/to/binary if you run on grsec systems and require rwx JIT mappings

Then just go ahead and check the timings! :)
