[NCLUG] Re: [Python-Dev] Judy for replacing internal dictionaries?

Douglas Baskins dougbaskins at frii.com
Fri Aug 9 18:27:38 MDT 2002


> 
> > [Douglas Baskins]
> > I would encourage you to try dlmalloc (VERSION 2.7.0 by Doug Lea) first.
> > BTW, with a poor malloc, it may take 10X longer to free a Judy array
> > than it took to build it.
> 
> [Tim Peters]
> Python can't *replace* the system malloc.  It runs on dozens of diverse
> platforms (counting all Unix variations as one), and has to play nice when
> embedded in other programs using it as an extension language, etc:  we can't
> dictate how memory is to be gotten.  What we can do is put a wrapper around
> the system facilities for Python's own use of dynamic memory, and that's
> what we did.  Doing so has given us significant speedups for allocation on
> every platform we have timings for, and real-life factor-of-1000s speedups
> when freeing millions of Python objects on several platforms.  So this is no
> longer an issue for us.
> 
> > ...
> > However, the free() is another story.  I found all mallocs (that
> > coalesce memory) have very serious problems in this area.  I found the
> > dlmalloc was the best of the bunch, but had a flaw.  I investigated why.
> > The problem occurs when a huge flurry of small blocks (3..65 words) are
> > freed to produce large (coalesced) internal blocks of several Kbytes.
> > Dlmalloc may put thousands of these "several Kbytes" blocks of slightly
> > different sizes on the SAME free list.  I have seen profiles where a
> > 3-instruction loop dominated the whole profile (100X).  I believe the
> > list is in size order and the 3-instruction loop is just looking for
> > the right place to insert the newly coalesced block.
> 
> Right, and that's the same thing we saw, in at least one malloc someone
> traced (don't recall whether it was this specific malloc or not, but it
> doesn't really matter to us anymore).
> 
> > Solutions may include making the list a skiplist, or making many more
> > lists in the Kbyte sizes.  I haven't tried anything yet, but I would
> > think it is a very solvable problem.
> 
> In Python's case we very predictably have a sharply multi-modal distribution
> of small block sizes, and dedicated per-size pools slash both memory
> overheads and cycles.  They also keep this stuff out of the system malloc's
> hair, and all system mallocs appear to be measurably grateful for the relief
> <wink>.
> 
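
To make the free-list pathology above concrete, here is a minimal C
sketch of the failure mode and of the "many more lists" fix.  This is
my own illustration for this note -- the chunk layout, the bin count,
and the bin width are invented, not taken from dlmalloc:

#include <stddef.h>

struct chunk {
    size_t size;          /* size of this free chunk in bytes */
    struct chunk *next;   /* next chunk on the size-ordered free list */
};

/* One big free list kept in ascending size order: every insert scans
 * linearly for its place -- the hot "3 instruction loop" above. */
static void insert_sorted(struct chunk **head, struct chunk *c)
{
    struct chunk **p = head;
    while (*p != NULL && (*p)->size < c->size)
        p = &(*p)->next;
    c->next = *p;
    *p = c;
}

/* The fix sketched above: bin the several-Kbyte range into many lists,
 * so each scan only walks chunks of nearly equal size. */
#define NBINS     64
#define BIN_WIDTH 256

static struct chunk *bins[NBINS];

static void insert_binned(struct chunk *c)
{
    size_t i = c->size / BIN_WIDTH;
    if (i >= NBINS)
        i = NBINS - 1;
    insert_sorted(&bins[i], c);   /* same loop, over a short list */
}

A skiplist attacks the same scan from the other side, cutting each
insertion search to O(log n) on a single list instead of multiplying
the number of lists.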

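For contrast, the per-size-pool scheme described above can be sketched
like this.  It is only a toy of my own devising (the class width, the
slab size, and passing the size to the free routine are simplifications,
nothing like Python's actual wrapper): small requests are rounded up to
a size class and served from that class's private free list, so a free
is a constant-time push and the system malloc only ever sees big,
uniform slabs.

#include <stdlib.h>

#define ALIGN      8
#define NCLASSES   32                     /* classes 8, 16, ..., 256 bytes */
#define SLAB_BYTES 4096

struct block { struct block *next; };

static struct block *free_list[NCLASSES]; /* one free list per size class */

static void *pool_alloc(size_t n)
{
    size_t cls = (n + ALIGN - 1) / ALIGN - 1;

    if (cls >= NCLASSES)                  /* big (or zero-size) requests */
        return malloc(n);                 /* pass straight through */

    if (free_list[cls] == NULL) {         /* refill from a fresh slab */
        size_t sz = (cls + 1) * ALIGN;
        char *slab = malloc(SLAB_BYTES);
        if (slab == NULL)
            return NULL;
        for (size_t off = 0; off + sz <= SLAB_BYTES; off += sz) {
            struct block *b = (struct block *)(slab + off);
            b->next = free_list[cls];
            free_list[cls] = b;
        }
    }

    struct block *b = free_list[cls];
    free_list[cls] = b->next;
    return b;
}

/* Freeing is a constant-time push -- no coalescing, no sorted scan.
 * The caller passes the size because this toy keeps no per-block header. */
static void pool_free(void *p, size_t n)
{
    size_t cls = (n + ALIGN - 1) / ALIGN - 1;
    if (cls >= NCLASSES) {
        free(p);
        return;
    }
    struct block *b = p;
    b->next = free_list[cls];
    free_list[cls] = b;
}

A sharply multi-modal small-block distribution like Python's is exactly
what makes the rounding waste small and the hit rate high here.
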
I love this open source community.  Doug Lea responded almost immediately
to the above letter about the problem with dlmalloc.  It turns out that he
knew about the problem and had a prototype solution in hand.  I have been
testing it and am very encouraged by the results.  I am about to write a
universal timing/test program for malloc -- one that every malloc I have
tried would fail (except this new prototype dlmalloc).
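
The heart of such a test might look like this sketch (the block count,
the size mix, and the free order here are placeholders I picked for
illustration, not the actual program): build a huge flurry of small
mixed-size blocks, then time the teardown against the build.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NBLOCKS 1000000

int main(void)
{
    static void *blk[NBLOCKS];   /* static: too big for the stack */
    clock_t t0, t1, t2;
    size_t i;

    t0 = clock();
    for (i = 0; i < NBLOCKS; i++)
        blk[i] = malloc(24 + 8 * (i % 63));   /* roughly 3..65 words */
    t1 = clock();
    for (i = 0; i < NBLOCKS; i++)             /* free the whole flurry */
        free(blk[i]);
    t2 = clock();

    printf("build: %.2fs  free: %.2fs\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}

A malloc with the coalescing pathology shows the free phase taking many
times longer than the build phase.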

In the bigger picture I am struck by the possibilities of having very
scalable, universal ADTs such as malloc and Judy.  I thought some more
about the dictionary solutions you have mentioned.  It is possible to
"change" methods of searching depending on the current needs of the
problem.  For example, Judy1/JudyL (JudySL has not been tuned) are very
fast with just a handful of (word) keys and change methods as the
population grows, which makes them nearly unbeatable at any population.
(Each method at every population was tuned.)  I literally have thousands
of test results with different populations and different "characters" of
data (keys).  Your experiences in Python could be very valuable to build
from (hash dictionaries and ?? malloc).  Since I am getting into the
innards of malloc, I would like to hear about some of your successes
with Python.  Perhaps we can come up with a world-class malloc --
available to everyone.
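
As a toy illustration of that method switching (this is nothing like
Judy's actual internals -- the threshold, layout, and names are all made
up for this note), consider a set of word-sized keys that is scanned
linearly while small and reorganized for binary search once it grows:

#include <stdlib.h>

#define SWITCH_AT 32   /* arbitrary crossover for this illustration */

struct wordset {
    size_t *keys;
    size_t  n, cap;
    int     sorted;    /* 0 = unsorted linear scan, 1 = binary search */
};

static int cmp(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

static int ws_has(const struct wordset *s, size_t k)
{
    if (!s->sorted) {                 /* small: linear scan wins */
        for (size_t i = 0; i < s->n; i++)
            if (s->keys[i] == k)
                return 1;
        return 0;
    }
    size_t lo = 0, hi = s->n;         /* large: binary search */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s->keys[mid] < k) lo = mid + 1;
        else if (s->keys[mid] > k) hi = mid;
        else return 1;
    }
    return 0;
}

static void ws_add(struct wordset *s, size_t k)
{
    if (ws_has(s, k))
        return;
    if (s->n == s->cap) {             /* grow; error handling omitted */
        s->cap = s->cap ? 2 * s->cap : 8;
        s->keys = realloc(s->keys, s->cap * sizeof *s->keys);
    }
    if (!s->sorted) {
        s->keys[s->n++] = k;
        if (s->n > SWITCH_AT) {       /* population grew: switch methods */
            qsort(s->keys, s->n, sizeof *s->keys, cmp);
            s->sorted = 1;
        }
    } else {                          /* shift-insert to keep sorted order */
        size_t i = s->n++;
        while (i > 0 && s->keys[i - 1] > k) {
            s->keys[i] = s->keys[i - 1];
            i--;
        }
        s->keys[i] = k;
    }
}

The real trick, of course, is tuning where the crossovers go, which is
what the thousands of test runs above were for.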

Doug <doug at sourcejudy.com>


