[NCLUG] Judy is for real

Alan Silverstein ajs at fc.hp.com
Thu Jul 25 13:30:08 MDT 2002


Well, emailing to nclug at nclug.org seems to work for posting to
http://nclug.org/pipermail/nclug/2002-July/thread.html, although I can't
tell if my mail is auto-posted or a real human being is hand-editing (if
so, thanks).

Sean Reifschneider wrote:

> Matt, stop forging your posting address.  ;-P

Harummph, I beg your pardon, search Google for my name and you'll see
plenty of evidence of my separate existence from Matt...  :-)

> ...Judy is too heavy weight because the cost of the lookup is much
> smaller than the cost of a function call.  Apparently this means that
> the existing code inlines those lookups fairly well.

Could be.  In our experience there's no doubt that small (such as
cache-fittable) problems and/or tuned algorithms can outperform just
about anything else.  However, if you care about scalability,
maintainability, or even just ease of initial programming, Judy is worth
a look.

Note that if you use the macro rather than function forms, such as JLG()
rather than JudyLGet(), the Judy retrieval macros are inlined for small
populations (<32 indexes) for speed.

It's true that Judy generally is not up to speed with inlined function
code in header files.  We came out of an HPUX compiler environment which
lacked inlining (and still did until very recently), and we also wanted
Judy to be as portable as possible.  In a few years inlining will be
pervasive, I suppose, and this could and should be improved.

> I mentioned Judy to them because Python makes heavy use of
> dictionaries...

Please bear in mind that while the Judy package includes JudySL calls
for string indexes, these functions are built on top of single-word
indexes (JudyL calls), and that's the core of what Judy's about.  So,
think "Judy" for more than just dictionaries.

Furthermore, you can build N-dimensional (multi-key and/or long-key)
arrays using JudyL, JudySL, and even Judy1 arrays as your building
blocks and get very efficient results.  I have, for example, a map3.c
program that maps a 3-word index to a 1-word value using three levels of
JudyL arrays, which I can mail on request.
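The three-level idea can be sketched in Python, with plain dicts standing
in for the JudyL arrays (this is an analogy to what map3.c does, not its
actual code; the function names here are hypothetical):

```python
# Sketch: map a 3-word index to a 1-word value using three levels of
# mappings.  Each dict level stands in for one JudyL array, which maps
# a word-sized index to a pointer to the next-level array.

def map3_insert(root, w1, w2, w3, value):
    # Level 1: find (or create) the second-level map under w1.
    level2 = root.setdefault(w1, {})
    # Level 2: find (or create) the third-level map under w2.
    level3 = level2.setdefault(w2, {})
    # Level 3: store the 1-word value under w3.
    level3[w3] = value

def map3_get(root, w1, w2, w3):
    # Walk the three levels; a miss at any level means "not present".
    level2 = root.get(w1)
    if level2 is None:
        return None
    level3 = level2.get(w2)
    if level3 is None:
        return None
    return level3.get(w3)

root = {}
map3_insert(root, 7, 42, 9, 1234)
print(map3_get(root, 7, 42, 9))   # -> 1234
print(map3_get(root, 7, 42, 8))   # -> None
```

The point is that each level stays a flat, word-indexed array, so the
multi-word structure inherits whatever efficiency the per-level arrays
have.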

I call this type of use a "meta-trie" because each branch node of the
higher-level (multi-word-key) trie (digital tree) is itself a trie
(digital tree) that efficiently grows from 0, 1, 2...  up to billions of
possible indexes (fanout, or degree).  It's a whole new paradigm.  A few
academic papers I found scratched the surface in this direction but
didn't "get it".  It's not obvious that, say, breaking a string into
words (an array of arrays...  of JudyL arrays) would be so efficient,
but it is.
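Here's a rough sketch of that string-breaking idea in Python: the key is
chopped into machine-word-sized chunks and each chunk indexes one level
of nested word-indexed maps (dicts stand in for JudyL arrays; the chunk
size and helper names are illustrative assumptions, not Judy's real
internals, and the real JudySL also accounts for the terminating NUL):

```python
# Sketch: a string-keyed map built from word-indexed maps, by breaking
# the string into machine-word-sized chunks and walking one map level
# per chunk.  A sentinel key (None) marks "value stored here" so that
# one key can be a chunk-prefix of another.

WORD_BYTES = 8  # one 64-bit machine word per level (an assumption)

def chunks(key):
    # Split the byte-string key into word-sized integer indexes.
    for i in range(0, len(key), WORD_BYTES):
        yield int.from_bytes(key[i:i + WORD_BYTES], "big")

def sl_insert(root, key, value):
    node = root
    for word in chunks(key):
        node = node.setdefault(word, {})
    node[None] = value

def sl_get(root, key):
    node = root
    for word in chunks(key):
        node = node.get(word)
        if node is None:
            return None
    return node.get(None)

root = {}
sl_insert(root, b"the quick brown fox", 1234)
print(sl_get(root, b"the quick brown fox"))  # -> 1234
print(sl_get(root, b"the quick"))            # -> None
```

Because each level is just a word-indexed array, long shared prefixes
cost one lookup per word, not per character.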

> Tim Peters, generally regarded as A Very Smart Guy (tm), responded to
> my mention of Judy here...

Yup, his comments make sense.  We've seen this many times before:  a
legacy application is already tightly coupled to a customized data
structure and algorithm which is "good enough", and it's not worth
changing.  Shrug.  Use Judy where it makes sense.

However, we believe that in the future it will "make sense" to use Judy
just about anywhere you would use a resizable array or when you'd start
scratching around for a hash table or some other "well known" data
structure for sparse keys.

Oh, and yes, Judy III was in fact much simpler than the present release,
internally called Judy IV.  The complexity Tim alluded to grew,
approximately March 2000 - March 2001 (a whole year), directly out of
the need to be adaptable and flexible as we explored different types and
sizes of datasets.  We managed to keep the API remarkably simple, I
think.  Also, while the details are hairy, the concept behind Judy is
not so complicated.

Marcio Luis Teixeira wrote:

> Remember that Judy gets all its gains from avoiding cache loads.

Yes and no.  Initially that was the great revelation ("it's all about
the cache fills, stupid"), and it's an ironic one.  CPU caches are added
to the hardware to help programs run faster, but need smarter compilers
or even programmers to use them well (to minimize cache misses),
otherwise you get only a portion of the potential savings.  People are
starting to write papers about cache efficiency in garbage collection,
data structures, etc.  Judy tries to be simultaneously extremely cache
efficient and portable (even to boxes with different cache and cache
line sizes).  The good news is that you can treat Judy as a black box
that hides this from you and largely even from the compiler.

However, my experience is that as we developed it, Judy grew beyond mere
cache efficiency.  It also offers value derived from compression
(although of unusual and perhaps simple kinds), the API, scalability
(the underlying digital tree), etc.

> ...looking at the graphs on HP's site, it looks like Judy really
> begins to outperform hash tables when the element count exceeds 10e6
> elements.

The exact crossover population depends on many variables.  However, in
general we learned that making hash tables bigger to avoid long synonym
chains is counter-productive in at least three ways:  poor use of the
CPU cache, scattering of the data (leaving it unsorted), and a hash
algorithm that is itself often surprisingly CPU-expensive.

We learned that it's trivial to use Judy for efficient hash synonym
chain management.  For example, you can build a 256-slot hash table and
"hash" on the first (hence sorted) or last (hence unsorted) byte of your
index, where each hash slot points to a JudyL array.  This is useful for
avoiding thread lock contention.  (Our thread.c program illustrates
this; it's not yet in the open source package, but I could mail it out.)
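A minimal Python sketch of the scheme (dicts stand in for the per-slot
JudyL arrays, and the "hash" really is just the first byte; the helper
names are made up for illustration):

```python
# Sketch: a 256-slot hash table whose "hash" is the first byte of the
# key, with each slot holding a mapping that stands in for a JudyL
# array.  Hashing on the *first* byte keeps keys globally sorted
# across slots; hashing on the last byte would scatter them instead.

NSLOTS = 256

def ht_insert(table, key, value):
    slot = key[0] % NSLOTS        # "hash": first byte of the key
    table.setdefault(slot, {})[key] = value

def ht_get(table, key):
    slot = key[0] % NSLOTS
    return table.get(slot, {}).get(key)

def ht_sorted_keys(table):
    # Walk slots in order; within a slot, sort the (short) chain.
    # Because the slot number is the leading byte, this yields all
    # keys in global sorted order.
    out = []
    for slot in sorted(table):
        out.extend(sorted(table[slot]))
    return out

table = {}
for k in (b"zebra", b"apple", b"ant", b"mango"):
    ht_insert(table, k, len(k))
print(ht_sorted_keys(table))  # -> [b'ant', b'apple', b'mango', b'zebra']
```

With one independent structure per slot, threads touching different
leading bytes never contend for the same lock, which is the contention
point mentioned above.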

With JudyL the synonym chains need not be balanced, and some can grow
monstrous.  If you use JLG() for access, lookups are inlined up to the
first 32 indexes, and the resulting code is virtually identical to
simple 256-way hashing using a "table of contents" for the synonym chain
(for maximum cache line packing and fewest fills).  Even longer chains,
however, scale as O(log base 256 of the population), which is very
efficient compared with any other synonym chain manager I'm aware of.
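The log-base-256 claim is just the general depth bound for a 256-way
digital tree, which a little arithmetic makes concrete (this is the
textbook bound, not a statement about Judy's exact internals):

```python
# Depth bound for a 256-way digital tree: each level distinguishes one
# byte of the index, so depth d covers up to 256**d distinct indexes.
# Integer arithmetic avoids floating-point rounding in the logarithm.

def levels_256way(n):
    depth, capacity = 1, 256
    while capacity < n:
        depth += 1
        capacity *= 256
    return depth

for n in (32, 10**6, 2**32):
    print(n, levels_256way(n))
# 32 indexes fit in 1 level, a million need 3, and 2**32 need only 4,
# so even enormous synonym chains stay a handful of levels deep.
```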

There's a whole paper (app note) written about this, under
www.hp.com/go/judy, information library, application notes (while the
website lasts).  Or, on request, I can email a more detailed text on the
subject that is not otherwise available.

> ...performance or not, I think there is a benefit to standardizing on a
> hash table implementation across all C programs...

Frankly, although it sounds arrogant, as we discovered and implemented
Judy we hoped to eventually have it replace nearly all other similar
data structures, and we still see no reason it could not!  But it's NOT
a hash table.  :-)

Of course, first we need wide acceptance of Judy as a reliable,
understood, freely available black box that you can safely depend upon.
With the LGPL release we hope to have started this ball rolling.

Alan Silverstein


