README.md

June 9, 2023

Highlights

  • Search recursively for a regex pattern using Intel Hyperscan.
  • When a git repository is detected, the repository index is searched using libgit2.
  • Similar to grep, ripgrep, ugrep, The Silver Searcher, etc.
  • C++17, Multi-threading, SIMD.
  • USAGE GUIDE
  • Implementation notes here.
  • Not cross-platform. Tested on Linux.

Performance

The following tests compare the performance of hypergrep against ag (The Silver Searcher), ugrep, and ripgrep.

System Details

Type                        Value
Processor                   11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz 3.50 GHz
Instruction Set Extensions  Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2, Intel® AVX-512
Installed RAM               32.0 GB (31.9 GB usable)
SSD                         ADATA SX8200PNP
OS                          Ubuntu 20.04 LTS
C++ Compiler                g++ (Ubuntu 11.1.0-1ubuntu1-20.04) 11.1.0

Vcpkg Installed Libraries

vcpkg commit: 662dbb5

Library          Version
argparse         2.9
concurrentqueue  1.0.3
fmt              10.0.0
hyperscan        5.4.2
libgit2          1.6.4

Single Large File Search: OpenSubtitles.raw.en.txt

The following searches are performed on a single large file cached in memory (~13GB, OpenSubtitles.raw.en.gz).

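Each entry lists the regex (description and command), followed by the line count and the result for ag, ugrep, ripgrep, and hypergrep.

Count number of times Holmes did something
hgrep -c 'Holmes did \w'
Line Count: 27, ag: n/a, ugrep: 1.820, ripgrep: 1.022, hypergrep: 0.696

Literal with Regex Suffix
hgrep -nw 'Sherlock [A-Z]\w+' en.txt
Line Count: 7882, ag: n/a, ugrep: 1.812, ripgrep: 1.509, hypergrep: 0.803

Simple Literal
hgrep -nw 'Sherlock Holmes' en.txt
Line Count: 7653, ag: 15.764, ugrep: 1.888, ripgrep: 1.524, hypergrep: 0.658

Simple Literal (case insensitive)
hgrep -inw 'Sherlock Holmes' en.txt
Line Count: 7871, ag: 15.599, ugrep: 6.945, ripgrep: 2.162, hypergrep: 0.650

Alternation of Literals
hgrep -n 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' en.txt
Line Count: 10078, ag: n/a, ugrep: 6.886, ripgrep: 1.836, hypergrep: 0.689

Alternation of Literals (case insensitive)
hgrep -in 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' en.txt
Line Count: 10333, ag: n/a, ugrep: 7.029, ripgrep: 3.940, hypergrep: 0.770

Words surrounding a literal string
hgrep -n '\w+[\x20]+Holmes[\x20]+\w+' en.txt
Line Count: 5020, ag: n/a, ugrep: 6m 11s, ripgrep: 1.523, hypergrep: 0.638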
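
Timings like the ones above can be approximated with a small helper. The sketch below is an assumption about methodology, not the script used to produce these results: best_of is a hypothetical function that runs a command several times and prints the fastest wall-clock time in seconds. Repeated runs also keep the searched file in the page cache. It relies on GNU date (for %N) and awk.

```shell
# Hypothetical helper (not part of hypergrep): run a command N times and
# print the fastest wall-clock time in seconds, to three decimal places.
# The body runs in a subshell so its variables do not leak into the caller.
best_of() (
  runs=$1
  shift
  best=""
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s.%N)
    # Discard output; grep-like tools may exit non-zero when nothing matches.
    "$@" > /dev/null 2>&1 || true
    end=$(date +%s.%N)
    elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
    # Keep the smallest elapsed time seen so far.
    if [ -z "$best" ] || awk -v a="$elapsed" -v b="$best" 'BEGIN { exit !(a < b) }'; then
      best=$elapsed
    fi
    i=$((i + 1))
  done
  echo "$best"
)

# Usage (assumes hgrep is on PATH and en.txt is the extracted corpus):
#   best_of 5 hgrep -c 'Holmes did \w' en.txt
```

Reporting the fastest of several runs reduces noise from other processes; a mean or median would be an equally reasonable choice.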

Git Repository Search: torvalds/linux

The following searches are performed on the entire Linux kernel source tree (after running make defconfig && make -j8). The commit used is f1fcb.

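Each entry lists the regex (description and command), followed by the line count and the result for ag, ugrep, ripgrep, and hypergrep.

Simple Literal
hgrep -nw 'PM_RESUME'
Line Count: 9, ag: 2.807, ugrep: 0.316, ripgrep: 0.147, hypergrep: 0.140

Simple Literal (case insensitive)
hgrep -niw 'PM_RESUME'
Line Count: 39, ag: 2.904, ugrep: 0.435, ripgrep: 0.149, hypergrep: 0.141

Regex with Literal Suffix
hgrep -nw '[A-Z]+_SUSPEND'
Line Count: 536, ag: 3.080, ugrep: 1.452, ripgrep: 0.148, hypergrep: 0.143

Alternation of four literals
hgrep -nw '(ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)'
Line Count: 16, ag: 3.085, ugrep: 0.410, ripgrep: 0.153, hypergrep: 0.146

Unicode Greek
hgrep -n '\p{Greek}'
Line Count: 111, ag: 3.762, ugrep: 0.484, ripgrep: 0.345, hypergrep: 0.146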

Git Repository Search: apple/swift

The following searches are performed on the entire Apple Swift source tree. The commit used is 3865b.

Each entry lists the regex (description and command), followed by the line count and the result for ag, ugrep, ripgrep, and hypergrep.

Function/Struct/Enum declaration followed by a valid identifier and opening parenthesis
hgrep -n '(func|struct|enum)\s+[A-Za-z_][A-Za-z0-9_]*\s*\('
Line Count: 59026, ag: 1.148, ugrep: 0.954, ripgrep: 0.154, hypergrep: 0.090

Words starting with alphabetic characters followed by at least 2 digits
hgrep -nw '[A-Za-z]+\d{2,}'
Line Count: 127858, ag: 1.169, ugrep: 1.238, ripgrep: 0.156, hypergrep: 0.095

Words starting with an uppercase letter, followed by alphanumeric characters and/or underscores
hgrep -nw '[A-Z][a-zA-Z0-9_]*'
Line Count: 2012372, ag: 3.131, ugrep: 2.598, ripgrep: 0.550, hypergrep: 0.482

Guard let statement followed by valid identifier
hgrep -n 'guard\s+let\s+[a-zA-Z_][a-zA-Z0-9_]*\s*=\s*\w+'
Line Count: 839, ag: 0.828, ugrep: 0.174, ripgrep: 0.054, hypergrep: 0.047

Directory Search: /usr

The following searches are performed on the /usr directory.

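Each entry lists the regex (description and command), followed by the line count and the result for ag, ugrep, ripgrep, and hypergrep.

Any HTTPS or FTP URL
hgrep "(https?|ftp)://[^\s/$.?#].[^\s]*"
Line Count: 13682, ag: 4.597, ugrep: 2.894, ripgrep: 0.305, hypergrep: 0.171

Any IPv4 IP address
hgrep -w "(?:\d{1,3}\.){3}\d{1,3}"
Line Count: 12643, ag: 4.727, ugrep: 2.340, ripgrep: 0.324, hypergrep: 0.166

Any E-mail address
hgrep -w "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
Line Count: 47509, ag: 5.477, ugrep: 37.209, ripgrep: 0.494, hypergrep: 0.220

Any valid date MM/DD/YYYY
hgrep "(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/(19|20)\d{2}"
Line Count: 116, ag: 4.239, ugrep: 1.827, ripgrep: 0.251, hypergrep: 0.163

Count the number of HEX values
hgrep -cw "(?:0x)?[0-9A-Fa-f]+"
Line Count: 68042, ag: 5.765, ugrep: 28.691, ripgrep: 1.439, hypergrep: 0.611

Search any C/C++ for a literal
hgrep --filter "\.(c|cpp|h|hpp)$" test
Line Count: 7355, ag: n/a, ugrep: 0.505, ripgrep: 0.118, hypergrep: 0.079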

Build

Install Dependencies with vcpkg

git clone https://github.com/microsoft/vcpkg
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg install concurrentqueue fmt argparse libgit2 hyperscan

Build hypergrep using cmake and vcpkg

Clone the repository

git clone https://github.com/p-ranav/hypergrep
cd hypergrep

If cmake is older than 3.19

mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=<path_to_vcpkg>/scripts/buildsystems/vcpkg.cmake ..
make

If cmake is 3.19 or newer

Use the release preset:

export VCPKG_ROOT=<path_to_vcpkg>
cmake -B build -S . --preset release
cmake --build build
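
Which of the two build paths above applies depends on the installed cmake version. CMake presets were introduced in cmake 3.19, so a script can pick the path automatically. The following is a minimal sketch, not part of the repository: version_at_least and build_path_for are hypothetical helper names, and the comparison relies on GNU sort -V.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# sort -V orders version strings; if $2 sorts first (or ties), $1 >= $2.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print which build path applies for a given cmake version string:
# "preset" for 3.19 and later, "toolchain-file" for anything older.
build_path_for() {
  if version_at_least "$1" 3.19; then
    echo "preset"
  else
    echo "toolchain-file"
  fi
}

# Usage (assumes cmake is on PATH):
#   cmake_version=$(cmake --version | head -n 1 | awk '{ print $3 }')
#   build_path_for "$cmake_version"
```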

Binary Portability

To build a binary that is portable across x86_64 machines, invoke cmake with the -DBUILD_PORTABLE=on option. This compiles with -march=x86-64 -mtune=generic and links with -static-libgcc -static-libstdc++, so the C++ standard library and GCC runtime are linked statically into the binary, reducing dependencies on the target system.
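
A portable build and a quick check of the result might look like the sketch below. It is an illustration under assumptions: build_portable and runtime_is_static are hypothetical helper names, the vcpkg toolchain path follows the build steps earlier in this README, and the binary location build/hgrep is a guess about the build output.

```shell
# Hypothetical helper: configure and build with the portable option.
# $1 is the path to your vcpkg checkout.
build_portable() {
  cmake -B build -S . \
    -DCMAKE_TOOLCHAIN_FILE="$1/scripts/buildsystems/vcpkg.cmake" \
    -DBUILD_PORTABLE=on &&
  cmake --build build
}

# Hypothetical check: reads ldd output on stdin and succeeds when neither
# libstdc++ nor libgcc_s appears as a dynamic dependency.
runtime_is_static() {
  ! grep -Eq 'libstdc\+\+|libgcc_s'
}

# Usage (binary path is an assumption):
#   build_portable "$VCPKG_ROOT" && ldd build/hgrep | runtime_is_static && echo "portable"
```

Other shared libraries, such as libc, may still appear in the ldd output; the option described above only covers the C++ standard library and the GCC runtime.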