Note on rgrep/powershell/find topic from last night's meeting.
Brian Sturgill
brian.sturgill at ataman.com
Thu Jan 13 10:40:37 MST 2022
I ran the "find +" version on the same tree, and the results are almost exactly
the same as those of the xargs version (but it is easier to type).
Here's a summary of what's gone on so far:
I ran these commands on the same set of 6+GB of ebooks.
Commands benchmarked:
"rgrep": grep -r notLklyToBeFnd .
"find": find . -type f -exec grep notLklyToBeFnd {} \;
"pwsh": Get-ChildItem -Path "." -Recurse | Select-String -Pattern "notLklyToBeFnd" -CaseSensitive
"findxargs": find . -type f | xargs -d \\n grep notLklyToBeFnd
"findplus": find . -type f -exec grep notLklyToBeFnd {} +
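The "findxargs" command passes -d \\n to xargs because plain xargs splits its input on blanks as well as newlines, while "findplus" needs no such option. A minimal sketch of the difference, assuming a scratch directory made with mktemp and a made-up search word "needle" (neither is part of the benchmark):

```shell
# Scratch tree with one ordinary name and one name containing a space.
# The directory, file names, and the word "needle" are illustrative only.
tmpdir=$(mktemp -d)
printf 'needle\n' > "$tmpdir/plain.txt"
printf 'needle\n' > "$tmpdir/with space.txt"

# Plain xargs splits "with space.txt" into two arguments, so grep misses it.
xargs_hits=$(find "$tmpdir" -type f | xargs grep -l needle 2>/dev/null | wc -l | tr -d ' ')

# -exec ... {} + passes each name to grep as a single argument.
plus_hits=$(find "$tmpdir" -type f -exec grep -l needle {} + | wc -l | tr -d ' ')

echo "plain xargs matched $xargs_hits file(s); -exec + matched $plus_hits file(s)"
rm -rf "$tmpdir"
```

The error messages from the split names are discarded here only to keep the output short; in real use they are the first sign that names are being broken apart.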
All times are in seconds.
Linux machine, using bash and pwsh (cross-platform version of Microsoft
PowerShell):
"rgrep" 3.1
"find" 5.5
"pwsh" 90
"findxargs" 3.2
"findplus" 3.2
Windows machine (using cross-platform PowerShell and bash in WSL 2.0):
"rgrep" 4.6
"find" 50
"pwsh" 84
"findxargs" 4.8
"findplus" 4.7
I also tried "pwsh" in the older Windows-only variant of PowerShell, where
it took about 120 seconds.
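The gap between "find" and "findplus" above is consistent with the explanation quoted below: \; starts one grep per file, while + batches file names into as few grep runs as possible. A minimal sketch that counts the invocations, assuming a scratch directory of five made-up files and using sh -c 'echo run' as a stand-in for grep:

```shell
# Build a small scratch tree; the directory and file names are illustrative only.
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf 'sample text\n' > "$tmpdir/file$i.txt"
done

# Every time find starts the command, the stand-in prints one line,
# so counting lines counts process invocations.
per_file=$(find "$tmpdir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
batched=$(find "$tmpdir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

printf 'semicolon form: %s invocations\n' "$per_file"
printf 'plus form:      %s invocation(s)\n' "$batched"
rm -rf "$tmpdir"
```

With only five short names, + fits everything into one command line. On a large tree find may need several batches, but still far fewer than one per file.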
On Wed, Jan 12, 2022 at 9:34 PM alan schmitz <alan.schmitz88 at gmail.com>
wrote:
> That is excellent, thank you for that!
>
> On Wed, Jan 12, 2022, 9:22 PM Bob Proulx <bob at proulx.com> wrote:
>
>> alan schmitz wrote:
>> > That is great info, thank you! One test I ran a long time ago wrt find is
>> > the difference between:
>> > find . -type f -exec grep notLklyToBeFnd {} \;
>> > vs.
>> > find . -type f | xargs grep notLklyToBeFnd
>> >
>> > I'm not sure if it holds today, but when I ran it back then the xargs
>> > version was much faster than the -exec. Of course that was a very long
>> > time ago.
>>
>> In the above find since \; is used it means that it will repeatedly
>> execute grep once per file. Conceptually it is similar to this with
>> lots of grep invocations.
>>
>> for file in $(find . -type f -print); do
>>     grep PATTERN "$file"
>> done
>>
>> But the xargs will spool up as many files as possible for one grep.
>> The file I/O is the same but the process fork() & exec() is reduced to
>> the minimum for the xargs case. But there is that pipe where file
>> names are passed as character I/O from process to process.
>> Conceptually somewhat similar to this.
>>
>> grep PATTERN $(find . -type f -print | while IFS= read -r file; do
>> echo $file; done)
>>
>> That's why it is better to use + instead of \; as + invokes grep just
>> once with as many args as possible. It's doing exactly what xargs did
>> but doing it completely internal to find. Which saves not only the
>> process creation time but also the pipe I/O writing from find into the
>> pipe and reading from the pipe by xargs. Since it is all in find now
>> that character I/O between processes is no longer needed.
>>
>> find . -type f -exec grep PATTERN {} +
>>
>> Conceptually it is similar to this following, but avoiding all
>> problems of file whitespace and special characters.
>>
>> grep PATTERN $(find . -type f -print)
>>
>> Why isn't + as well known as \; is? Because + is *new* in the grand
>> scheme of things. It was introduced in 2005. Or as I say, just the
>> other day! So if you learned how to use find before 2005 you learned
>> the \; syntax. But it's been 17 years now and -exec ... {} + is a
>> POSIX standard that all finds must implement. Most people entering
>> the work force today never learned the old syntax.
>>
>> I will end with repeating the current best practice.
>>
>> find . -type f -exec grep PATTERN {} +
>>
>> Bob
>>
>
--
Brian Sturgill
President and CTO
Ataman Software, Inc.