[NCLUG] RE: [Python-Dev] Judy for replacing internal dictionaries?

Tim Peters tim.one at comcast.net
Wed Aug 7 17:58:40 MDT 2002


[Douglas Baskins]
> I would encourage you to try dlmalloc (VERSION 2.7.0 by Doug Lea) first.
> BTW, with a poor malloc, it may take 10X longer to free a Judy array
> then it took to build it.

Python can't *replace* the system malloc.  It runs on dozens of diverse
platforms (counting all Unix variations as one), and has to play nice when
embedded in other programs using it as an extension language, etc:  we can't
dictate how memory is to be gotten.  What we can do is put a wrapper around
the system facilities for Python's own use of dynamic memory, and that's
what we did.  Doing so has given us significant speedups for allocation on
every platform we have timings for, and real-life factor-of-1000s speedups
when freeing millions of Python objects on several platforms.  So this is no
longer an issue for us.

> ...
> However, the free() is another story.  I found all mallocs (that
> coalesce memory) have very serious problems in this area.  I found the
> dlmalloc was the best of the bunch, but had a flaw.  I investigated why.
> The problem occurs when a huge flurry of small blocks (3..65 words) are
> freed to produce large (coalesced) internal blocks of several Kbytes.
> Dlmalloc may put thousands of these "several Kbytes" blocks of slightly
> different sizes on the SAME free list.  I have seen profiles of a 3
> instruction loop dominate the whole profile (100X).  I believe the list
> in in size order and the 3 instruction loop is just looking for the
> right place to insert the newly coalesced block.

Right, and that's the same thing we saw, in at least one malloc someone
traced (don't recall whether it was this specific malloc or not, but it
doesn't really matter to us anymore).

> Solutions may include making the list a skiplist, or make many more
> lists in the Kbyte sizes.  Haven't tried anything yet, but I would
> think it is very solvable problem.

In Python's case we very predictably have a sharply multi-modal distribution
of small block sizes, and dedicated per-size pools slash both memory
overheads and cycles.  They also keep this stuff out of the system malloc's
hair, and all system mallocs appear to be measurably grateful for the relief
<wink>.




More information about the NCLUG mailing list