[NCLUG] Re: [Python-Dev] Judy for replacing internal dictionaries?

Douglas Baskins dougbaskins at frii.com
Mon Aug 5 14:39:33 MDT 2002


> [Tim Peters]
>> [Douglas Baskins, on www.sourcejudy.com/benchmarks/JudySLcompare.tar.gz]
...del

> But the read and write times for hash were significantly faster than for
> JudySL in the two tables at the top of that file, where anything over 10% is
> "significant" in my business, anything over 20% is "highly significant", and
> 30% is in "no-brainer" territory:
> 
> store/line  lookup/line  ADT
>   2.847 uS     2.118 uS HASH
>   3.845 uS     2.407 uS JUDY
>    35.1%        13.6%   speedup if switching from Judy to hash
> 
> store/line  lookup/line  ADT
>   2.236 uS     1.863 uS HASH
>   3.439 uS     2.350 uS JUDY
>    53.8%        26.1%   speedup if switching from Judy to hash

I'm so used to seeing performance ratios in X rather than %, I am a little 
biased.  I publish data from hash algorithms under ideal conditions so that 
the numbers will stand up to critical review.  I have been burned several 
times in my past for using poor hash methods.  For grins, I changed the 
hash method in the above benchmark to just a mask and re-measured on the 
first data set above.  The result was a 300X increase in the store and 
read times.  I was too impatient to even try the second one.
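
To make that concrete, here is a rough sketch (my own names, not the
benchmark source) of one reading of "just a mask" next to an ordinary
multiplicative string hash; the mask-only version collapses most word
keys into a handful of chains, which is what blows up the store and
read times:

    /* Illustrative sketch only -- not the benchmark source. */
    #define TABLE_SIZE (1u << 20)              /* ~1 million buckets */

    /* Degenerate hash: a single byte of the key, masked.  Most word
     * keys collapse into a few chains. */
    static unsigned hash_mask_only(const char *key)
    {
        return (unsigned char)key[0] & (TABLE_SIZE - 1);
    }

    /* Conventional multiplicative (djb2-style) hash that mixes every
     * byte before the final mask. */
    static unsigned hash_string(const char *key)
    {
        unsigned h = 5381;
        while (*key)
            h = h * 33 + (unsigned char)*key++;
        return h & (TABLE_SIZE - 1);
    }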

Frankly, I have never carefully characterized a hash method that rebuilds 
(resizes) its table.  I figured that it would double the average insert 
(store) time and waste memory.

The above hash method used a static table of ~1 million entries with 
insignificant collisions.  The times did not include (re)allocating or 
clearing the hash table.  So, I guess I still consider the 26% not too 
significant (for hash).
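
For reference, the shape of the table I am describing is roughly the
sketch below (my own identifiers, not the benchmark source): a fixed
~1-million-bucket array with chained collision lists, where an insert
never triggers a rehash and the bucket array is set up once, outside
any timed region.

    /* Sketch of a static chained table of the kind described above.
     * Identifiers are illustrative, not the benchmark source. */
    #include <stdlib.h>
    #include <string.h>

    typedef struct hashrec {
        char           *word;
        struct hashrec *next;
    } HASHREC;

    #define TABLE_SIZE (1u << 20)

    static HASHREC *table[TABLE_SIZE];         /* allocated/cleared once */

    static void store(const char *word, unsigned h)    /* h = precomputed hash */
    {
        unsigned  s  = h & (TABLE_SIZE - 1);
        HASHREC  *hr = malloc(sizeof(*hr));

        hr->word = strdup(word);
        hr->next = table[s];                   /* push onto the collision chain */
        table[s] = hr;
    }

    static HASHREC *lookup(const char *word, unsigned h)
    {
        for (HASHREC *hr = table[h & (TABLE_SIZE - 1)]; hr != NULL; hr = hr->next)
            if (strcmp(hr->word, word) == 0)
                return hr;
        return NULL;
    }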

>> ...
>> I believe any of the methods in this benchmark (except hashing) would be
>> an excellent match for use in an interpreter such as Python.

> I don't follow this.  Hashing has served us extremely well, and as I said
> before, namespace hashtables are generally tiny in Python.  People have
> optimized the hashtable implementation over a decade, and it screams.  BTW,
> I note that the hash tables you used had chained collision lists; Python
> uses open addressing instead, partly for better locality and partly to save
> memory (hidden malloc overheads can kill you when allocating gazillions of
> tiny records, and

For maximum performance, hash methods need to be quite different for small 
versus large data sets and for short versus long string keys.  Open 
addressing should be limited to about 2 cachelines; I have been burned by 
the clustering problem.
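
One way to read "limited to about 2 cachelines" is a probe window of
consecutive slots small enough to touch at most two 64-byte lines before
giving up; an illustrative sketch (my own names, not anyone's production
code):

    /* Illustrative sketch of cacheline-bounded linear probing: probe
     * only the next PROBE_LIMIT consecutive slots (two 64-byte lines of
     * 8-byte pointers = 16 slots), so a miss never walks a long cluster.
     * The caller would grow, rehash, or take an overflow path on -1. */
    #include <string.h>

    #define TABLE_SIZE  (1u << 20)
    #define PROBE_LIMIT 16

    static const char *slots[TABLE_SIZE];      /* open-addressed key pointers */

    static long probe(const char *key, unsigned h)
    {
        for (unsigned i = 0; i < PROBE_LIMIT; i++) {
            unsigned s = (h + i) & (TABLE_SIZE - 1);
            if (slots[s] == NULL || strcmp(slots[s], key) == 0)
                return (long)s;                /* empty or matching slot */
        }
        return -1;                             /* probe window exhausted */
    }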

BTW, my recent tests of malloc with "tiny records" indicate that this is a
problem of the past.

> typedef struct hashrec
> {
>     char	*word;
>     struct hashrec *next;
> } HASHREC;

> likely burns at least 4 bytes per hashrec (50% overhead on a 32-bit box)
> just for hidden malloc bookkeeping; Python avoids both the per-record malloc
> overhead and the per-record 4-byte pointer to "the next" record).

I suspect the author expected string keys closer to a sentence in length 
than a single word (which would make that overhead insignificant).
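
On a given platform this is easy to check directly; for example, a
glibc-only snippet like the one below (malloc_usable_size is a glibc
extension) shows how much space a tiny two-pointer record actually gets
back from malloc, with the per-chunk header being an additional cost on
top of what it reports:

    /* Glibc-specific sketch: malloc_usable_size() shows the rounding the
     * allocator applies to a tiny request; the per-chunk header is a
     * further fixed cost beyond what it reports. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>                        /* malloc_usable_size (glibc) */

    int main(void)
    {
        size_t request = 2 * sizeof(void *);   /* a two-pointer hashrec */
        void  *p       = malloc(request);

        printf("requested %zu bytes, usable %zu bytes\n",
               request, malloc_usable_size(p));
        free(p);
        return 0;
    }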

>> (Of course, I would prefer you use Judy).  I think memory efficiency and
>> scalability are the most important requirements of an interpreter.

> We worry about all seven of those <wink>, but the tradeoffs are complex.

>>
>> ...del
>> lookdict_string(dictobject *mp, PyObject *key, register long hash)
>> ...del
>>  	if (ep->me_key == NULL || ep->me_key == key) return ep;

> I suspect the performance 'shortcut' by comparing the string pointer
> is 'buried' under the expensive subroutine call to lookdict_string().
> Does removing this code affect performance?  (If it does, I would be
> curious by how much, why and what was the test case?)

> Oren Tirosh did some experiments with inlining this on key paths and posted
> results to the Python-Dev mailing list.  The function-call overhead is the
...del

Perhaps I am too ignorant of the inner workings of Python.  Let me think out
loud: short strings, small in-cache data sets, small hash tables, inlining...
Your problem is primarily an instruction foot race; RAM access times are not 
the problem.  It sounds like you have done the right things for small data 
sets.  Judy1/L attempts to do the same things for small data sets, but 
scales to very large ones.  However, I think JudySL could use some more work
in that area.
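
For what it is worth, my reading of the quoted line is the usual
interned-key shortcut, roughly like the simplified sketch below (not
CPython's actual structures): if the slot is empty or holds the identical
key object, the lookup finishes without ever comparing bytes.

    /* Simplified sketch of the shortcut in the quoted lookdict_string()
     * line -- not CPython's actual code.  With interned keys, an equal
     * key is usually the very same object, so a pointer compare settles
     * most hits before any byte-by-byte comparison. */
    #include <string.h>

    typedef struct {
        const char *me_key;                    /* interned key, or NULL if empty */
        void       *me_value;
    } entry;

    static entry *fast_lookup(entry *ep, const char *key)
    {
        if (ep->me_key == NULL || ep->me_key == key)
            return ep;                         /* empty slot or identical object */
        if (strcmp(ep->me_key, key) == 0)
            return ep;                         /* fall back to a full compare */
        return NULL;                           /* caller keeps probing */
    }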

...del

> If you need anyone in court, I'll be happy to testify that many platform
> mallocs suck <wink>.

I will discuss malloc in a later reply.

Doug Baskins <doug at sourcejudy.com>


