[NCLUG] Judy is for real

Marcio Luis Teixeira marciot at holly.colostate.edu
Thu Jul 25 00:56:04 MDT 2002


On Thursday 25 July 2002 03:06 am, Sean Reifschneider wrote:

> The people in the know about the Python language seem to think that for
> most of their internal dictionary (hash, associative array) lookups, Judy
> is too heavy weight because the cost of the lookup is much smaller than the
> cost of a function call.  
> Apparently this means that the existing code
> inlines those lookups fairly well.

Reading through Tim's message, it does not appear to me as if the
code is inlined. He points out that the cost of the function call is
greater than the cost of the lookup. In other words, Python spends
so much time in the function call that optimizing the lookup with
Judy would result in negligible savings.
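
To make that concrete, here is a rough sketch in Python (not the
CPython C internals) of the effect Tim is describing: the same dict
lookup, bare versus wrapped in one extra function call. The dict and
the lookup() wrapper are hypothetical names for illustration.

```python
# Sketch: function-call overhead vs. the dict lookup it wraps.
from timeit import timeit

d = {"spam": 1, "eggs": 2, "ham": 3}

def lookup(key):
    # Hypothetical wrapper: adds one function call per lookup.
    return d[key]

direct = timeit('d["spam"]', globals={"d": d}, number=1_000_000)
wrapped = timeit('lookup("spam")', globals={"lookup": lookup},
                 number=1_000_000)

print(f"direct:  {direct:.3f}s")
print(f"wrapped: {wrapped:.3f}s")
```

On a typical CPython build the wrapped version takes noticeably
longer, even though both do the identical lookup -- which is why
shaving the lookup itself buys so little.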

Anyhow, I doubt Python has large enough lookup tables to
really make a difference. Remember that Judy gets all of its gain
from avoiding cache loads. The Python dictionary may be small
enough to fit in cache, in which case Judy wouldn't help.
In fact, looking at the graphs on HP's site, it looks like Judy really
begins to outperform hash tables when the element count exceeds
10e6 elements. I assume Python uses lookups for variables and
function names, and on average a program wouldn't have more
than 1000 functions or variables, so really I wouldn't expect a
performance gain.
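
A quick back-of-the-envelope check of that size estimate: a dict
with around 1000 names (the figure assumed above) occupies only a
few tens of kilobytes, far below the 10e6-element range where Judy
pulls ahead. sys.getsizeof here reports just the dict's own table,
which is the part a lookup actually walks; the names are made up
for illustration.

```python
# Sketch: how big is a namespace-sized dict, roughly?
import sys

names = {f"var_{i}": i for i in range(1000)}
table_bytes = sys.getsizeof(names)

print(f"{len(names)} entries, ~{table_bytes // 1024} KiB of table")
```

A table that small sits comfortably in a modern CPU's cache, so the
cache misses Judy is designed to avoid mostly never happen.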

However....

Looking at Tim's message from another angle: since the
cost of the function call is so much greater than the cost of
the lookup itself, replacing the current lookup function with a
Judy lookup should make no difference at all in performance.
So, why not do it anyway? That way we will see a performance
gain when someone finally writes a program with a peta-element
dictionary.

Besides, performance or not, I think there is a benefit to
standardizing on a hash table implementation across all
C programs, just like C++ tries to do with the STL.

Just my .02 worth.

Marcio Luis Teixeira





More information about the NCLUG mailing list