Note on rgrep/powershell/find topic from last night's meeting.
Bob Proulx
bob at proulx.com
Wed Jan 12 21:02:37 MST 2022
Brian Sturgill wrote:
> Here are some benchmarks.
Yay! Data!
> They are run on my main Linux and Windows desktop machines.
> The Windows box is slightly more powerful than the Linux box.
Might as well give it a head start. It will need it. :-)
> Commands benchmarked:
> "rgrep": grep -r notLklyToBeFnd .
> "find": find . -type f -exec grep notLklyToBeFnd {} \;
> "pwsh": Get-ChildItem -Path "." -Recurse | Select-String -Pattern "notLklyToBeFnd" -CaseSensitive
Are you still set up for the benchmark? Can you run it again with +
instead of \; so that we can get better find times?
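A minimal way to rerun the comparison with both forms side by side might look like this. It is only a sketch: it assumes bash (for its `time` keyword), `bench` is a hypothetical helper name, and the pwsh variant is left out because it needs PowerShell installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: time each search variant from the top of a tree.
# Labels go to stdout; the timings from `time` go to stderr.
bench() (
  # Subshell body, so the cd below does not leak into the caller's shell.
  cd "${1:-.}" || exit 1
  pattern=notLklyToBeFnd   # the string from Brian's benchmark

  echo '== grep -r'
  time grep -r "$pattern" .

  echo '== find -exec \;  (one grep process per file)'
  time find . -type f -exec grep "$pattern" {} \;

  echo '== find -exec +   (many files per grep process)'
  time find . -type f -exec grep "$pattern" {} +

  # grep exits 1 when nothing matches, which is the expected result here,
  # so report success once all the variants have run.
  exit 0
)
```

Usage would be `bench ~/some/tree`. Running it twice is worthwhile, since the second pass measures searching with the files already in the page cache rather than cold disk reads.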
> All times are in seconds.
>
> Linux machine, using bash and pwsh (cross-platform version of Microsoft
> PowerShell)
> "rgrep" 3.1
> "find" 5.5
> "pwsh" 90
That's a pretty remarkable difference!
> Windows machine, using cross-platform PowerShell and bash in WSL 2.0:
> "rgrep" 4.6
> "find" 50
> "pwsh" 84
Here on Windows, -exec \; is punishing the slow process creation
(spawn()) of the Windows kernel. It is very slow compared to
fork() and exec() on Unix, BSD, and Linux.
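To see how many processes each form actually creates, a small sketch like the following can count them. It assumes bash, and `count_spawns` is a hypothetical helper; it substitutes a trivial `sh -c 'echo x'` child for grep so that every process start prints exactly one line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count how many child processes find starts
# with "-exec ... \;" versus "-exec ... +".
count_spawns() (
  cd "${1:-.}" || exit 1
  # The child prints one line per invocation, no matter how many file
  # names it receives, so the line count equals the process count.
  # $(( ... )) strips the padding some wc implementations add.
  per_file=$(( $(find . -type f -exec sh -c 'echo x' sh {} \; | wc -l) ))
  batched=$(( $(find . -type f -exec sh -c 'echo x' sh {} + | wc -l) ))
  echo "per_file=$per_file"
  echo "batched=$batched"
)
```

On a tree with a few thousand files, per_file is a few thousand process creations while batched is typically one or a handful, which is where the cost of a slow spawn shows up.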
> I also tried the older Windows-only variant of PowerShell, where the "pwsh"
> benchmark took about 120 seconds.
File I/O and process spawning are both much slower on Windows.
> The problem with "pwsh" seems to be with the "Select-String" command.
> On a large PDF file it took 600 milliseconds. Grep took only 27
> milliseconds.
String searching has been a long-running research area, and grep has
had decades of tuning.
> Note that the "rgrep" times were similar between my native Linux box and
> WSL 2.0, but the "find" time was 10x greater.
Using \; spawns a new grep process for every file. Using + passes as
many file names as possible to each grep process, limited by the
system's ARG_MAX. That may well be enough to run a single grep for
all of the files.
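The limit itself can be queried, and xargs gives the same batching as find's + form. A sketch assuming bash; `show_arg_max` and `batched_grep` are hypothetical names, and -print0 / -0 are supported by both GNU and BSD find and xargs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the per-exec limit on the combined size of
# arguments and environment that find's "+" form (and xargs) must fit under.
show_arg_max() {
  getconf ARG_MAX
}

# Hypothetical helper: search a tree using xargs batching.
batched_grep() (
  cd "${1:-.}" || exit 1
  # -print0 / -0 pass NUL-separated names, so spaces and newlines in
  # file names survive. xargs packs as many names as fit under the
  # limit into each grep invocation, like find's "+" form.
  # The extra /dev/null argument means grep never sees just one file,
  # so it always prefixes matches with the file name.
  find . -type f -print0 | xargs -0 grep -e "$2" /dev/null
)
```

For example, `batched_grep . notLklyToBeFnd` should behave like the "+" form of the find benchmark.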
Bob
More information about the NCLUG mailing list