Note on rgrep/powershell/find topic from last night's meeting.

Thu Jan 13 10:40:37 MST 2022

I ran the "find +" version on the same tree, and the results are almost
exactly the same as for the xargs version (and it is easier to type).

Here's a summary of what's gone on so far:
I ran these commands on the same set of 6+GB of ebooks.

Commands benchmarked:
"rgrep":     grep -r notLklyToBeFnd .
"find":      find . -type f -exec grep notLklyToBeFnd {} \;
"pwsh":      Get-ChildItem -Path "." -Recurse | Select-String -Pattern "notLklyToBeFnd" -CaseSensitive
"findxargs": find . -type f | xargs -d \\n grep notLklyToBeFnd
"findplus":  find . -type f -exec grep notLklyToBeFnd {} +

All times are in seconds.

Linux machine, using bash and pwsh (cross-platform version of Microsoft
PowerShell):
"rgrep"       3.1
"find"        5.5
"pwsh"       90
"findxargs"   3.2
"findplus"    3.2

Windows machine, using cross-platform PowerShell and bash in WSL 2.0:
"rgrep"       4.6
"find"       50
"pwsh"       84
"findxargs"   4.8
"findplus"    4.7

I also tried "pwsh" in the older Windows-only variant of PowerShell, where
it took about 120 seconds.

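For anyone who wants to repeat the comparison, here is a minimal sketch of a timing harness. The bench function is a hypothetical helper, not the script I actually used. It runs each command once, so it says nothing about cold versus warm file cache, and "xargs -d" assumes GNU xargs. The pwsh case is left out because it has to be timed from PowerShell.

```shell
#!/usr/bin/env bash
# Sketch of a timing harness for the commands above.
# "bench" is a made-up helper name; the pattern is the one from the tests.

# Print a quoted label and the wall-clock seconds one command took.
bench() {
    local label=$1; shift
    local TIMEFORMAT='%R'   # have bash's "time" print only elapsed seconds
    local secs
    # "time" reports on the shell's stderr, so capture that and throw
    # away whatever the command itself prints.
    secs=$( { time "$@" >/dev/null 2>&1; } 2>&1 )
    printf '%-13s %s\n' "\"$label\"" "$secs"
}

# Tree to search: first argument, or the current directory by default.
cd "${1:-.}" || exit 1

bench rgrep     grep -r notLklyToBeFnd .
bench find      find . -type f -exec grep notLklyToBeFnd {} \;
bench findplus  find . -type f -exec grep notLklyToBeFnd {} +
# A pipeline has to be wrapped in its own shell to be timed as one unit.
bench findxargs sh -c 'find . -type f | xargs -d "\n" grep notLklyToBeFnd'
```

Running each command more than once and discarding the first run would be a reasonable refinement if caching is a concern.
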
On Wed, Jan 12, 2022 at 9:34 PM alan schmitz <alan.schmitz88 at gmail.com>
wrote:

> That is excellent, thank you for that!
>
> On Wed, Jan 12, 2022, 9:22 PM Bob Proulx <bob at proulx.com> wrote:
>
>> alan schmitz wrote:
>> > That is great info, thank you!  One test I ran a long time ago wrt find is
>> > the difference between:
>> > find . -type f -exec grep notLklyToBeFnd {} \;
>> > vs.
>> > find . -type f | xargs grep notLklyToBeFnd
>> >
>> > I'm not sure if it holds today, but when I ran it back then the xargs
>> > version was much faster than the -exec.  Of course that was a very long
>> > time ago.
>>
>> In the above find since \; is used it means that it will repeatedly
>> execute grep once per file.  Conceptually it is similar to this with
>> lots of grep invocations.
>>
>>     for file in $(find . -type f -print); do
>>         grep PATTERN "$file"
>>     done
>>
>> But the xargs will spool up as many files as possible for one grep.
>> The file I/O is the same but the process fork() & exec() is reduced to
>> the minimum for the xargs case.  But there is that pipe where file
>> names are passed as character I/O from process to process.
>> Conceptually somewhat similar to this.
>>
>>     grep PATTERN $(find . -type f -print | while IFS= read -r file; do echo $file; done)
>>
>> That's why it is better to use + instead of \; as + invokes grep as
>> few times as possible, each time with as many args as will fit.  It's
>> doing exactly what xargs did but doing it completely internal to find.
>> Which saves not only the
>> process creation time but also the pipe I/O writing from find into the
>> pipe and reading from the pipe by xargs.  Since it is all in find now
>> that character I/O between processes is no longer needed.
>>
>>     find . -type f -exec grep PATTERN {} +
>>
>> Conceptually it is similar to this following, but avoiding all
>> problems of file whitespace and special characters.
>>
>>     grep PATTERN $(find . -type f -print)
>>
>> Why isn't + as well known as \; is?  Because + is *new* in the grand
>> scheme of things.  GNU find gained it in 2005.  Or as I say, just the
>> other day!  So if you learned how to use find before 2005 you learned
>> the \; syntax.  But it's been 17 years now and -exec ... {} + is a
>> POSIX standard that all finds must implement.  Most people entering
>> the work force today never learned the old syntax.
>>
>> I will end with repeating the current best practice.
>>
>>     find . -type f -exec grep PATTERN {} +
>>
>> Bob
>>
>
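
Bob's point about the number of grep invocations is easy to check on a toy tree. The sketch below is only an illustration: the temporary directory and file names are made up, and "sh -c 'echo run'" stands in for grep so that each process launch prints exactly one line. It also shows the whitespace problem with the unquoted $(find ...) form.

```shell
#!/usr/bin/env bash
# Sketch: count process launches for "-exec ... \;" versus "-exec ... +".
# The directory and file names are invented for this illustration.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/tree"
touch "$tmpdir/tree/a.txt" "$tmpdir/tree/b.txt"
printf 'needle\n' > "$tmpdir/tree/name with spaces.txt"

# Each launch of "sh -c" prints one line, so counting lines counts launches.
runs_semicolon=$(find "$tmpdir/tree" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
runs_plus=$(find "$tmpdir/tree" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

# "-exec ... +" hands each file name to grep as a single argument.
hits_plus=$(find "$tmpdir/tree" -type f -exec grep -l needle {} + | wc -l | tr -d ' ')
# Unquoted $(find ...) is split on whitespace, so the name arrives in
# pieces and grep cannot open the file.
hits_split=$(grep -l needle $(find "$tmpdir/tree" -type f) 2>/dev/null | wc -l | tr -d ' ')

printf 'launches with \\; : %s\n' "$runs_semicolon"
printf 'launches with +  : %s\n' "$runs_plus"
printf 'matches with +   : %s\n' "$hits_plus"
printf 'matches, split   : %s\n' "$hits_split"
rm -rf "$tmpdir"
```

With three files the \; form launches the command three times and the + form launches it once. On a very large tree + may need more than one launch, since the argument list has a size limit.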

-- 
Brian Sturgill
President and CTO
Ataman Software, Inc.