Note on rgrep/powershell/find topic from last night's meeting.

Wed Jan 12 21:02:37 MST 2022

Brian Sturgill wrote:
> Here's some benchmarks.

Yay!  Data!

> They are run on my main Linux and Windows desktop machines.
> Windows box slightly more powerful than the Linux box.

Might as well give it a head start.  It will need it. :-)

> Commands benchmarked:
> "rgrep": grep -r notLklyToBeFnd .
> "find": find . -type f -exec grep notLklyToBeFnd {} \;
> "pwsh": Get-ChildItem -Path "." -Recurse | Select-String -Pattern "notLklyToBeFnd" -CaseSensitive

Are you still set up for the benchmark?  Can you run it again with +
instead of \; so that we can get better find times?
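For reference, a minimal sketch of the batched form. The function name find_plus is only an illustrative wrapper, and the -H option is an addition to the benchmarked command so that grep prints file names even when a batch happens to hold a single file.

```shell
# Batched form of the "find" benchmark: + packs many file names into
# each grep invocation instead of starting one grep per file.
# find_plus is a hypothetical helper name, not part of the benchmark.
find_plus() {
    # Search the given directory, defaulting to the current one.
    find "${1:-.}" -type f -exec grep -H notLklyToBeFnd {} +
}
```

In bash, "time find_plus ." would give a number comparable to the other three.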

> All times are in seconds.
> 
> Linux machine, using bash and pwsh (cross-platform version of Microsoft
> PowerShell)
> "rgrep"   3.1
> "find"     5.5
> "pwsh"   90

That's a pretty remarkable difference!

> Windows machine (using cross-platform PowerShell and bash in WSL 2.0):
> "rgrep"  4.6
> "find"     50
> "pwsh"   84

Here on Windows the -exec \; is being punished by the slow spawn()
routine of the Windows kernel.  It's really slow compared to the
Unix/BSD/Linux fork() & exec().

> I also tried it with the older Windows-only variant of PowerShell, where
> "pwsh" took about 120 seconds.

File I/O and process spawning are both much slower in Windows.

> The problem with "pwsh" seems to be with the "Select-String" command.
> On a large PDF file it took 600 milliseconds. Grep took only 27
> milliseconds.

String searching has been a long-running research area.

> Note that the "rgrep" times were similar between my native linux box and
> WSL 2.0.
> But that the "find" one was 10x greater.

Using \; requires spawning a new grep process for every file.  Using +
shares one grep process across as many of the files as possible,
limited by the ARG_MAX of the system.  That may well be enough for a
single grep to handle all of the files.
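
The difference in process count is easy to see on a scratch tree. This is only a sketch: the 200-file count and the file names are arbitrary, and sh -c 'echo run' stands in for grep so that each spawned process prints exactly one line.

```shell
# Build a scratch tree with 200 small files (the count is arbitrary).
dir=$(mktemp -d)
i=1
while [ "$i" -le 200 ]; do
    echo "line $i" > "$dir/file$i.txt"
    i=$((i + 1))
done

# Each time find runs the command, sh prints one line,
# so counting lines counts spawned processes.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

printf 'per-file form spawned %s processes\n' "$per_file"
printf 'batched form spawned %s processes\n' "$batched"
printf 'ARG_MAX on this system: %s bytes\n' "$(getconf ARG_MAX)"

rm -rf "$dir"
```

With only 200 short path names the batched form should fit in a single invocation; a much larger tree would be split into several batches, still far fewer than one per file.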

Bob