[NCLUG] normalizing data

Sean Reifschneider jafo-nclug at tummy.com
Sun Jan 6 10:40:26 MST 2002


On Sun, Jan 06, 2002 at 09:48:17AM -0700, mike cullerton wrote:
> time for teaching mike more theory. how do folks here deal with data? how
>extreme are you at normalizing data, and why?

As little as possible to get the job done.  ;-)

> so, some table structure...
>
> image - id, name, location, comment
> group - id, name
> image_group - image_id, group_id

That looks about right.  I have started a photo db which included:

   CREATE TABLE catagory (
      catagoryId text NOT NULL,
      login text,
      name text,
      description text
      );

   CREATE TABLE images (
      imageId text NOT NULL,
      catagoryId text,
      description text,
      takenDate datetime,
      location text,
      recipe text
      );

Of course, this only allows one catagory to be associated with each image
-- adding a mapping table is probably a good idea, but not strictly
necessary.

My idea for it is that there would likely be multiple image table entries
for each physical file.  So, you could include image table entries per
taken picture, allowing multiple group IDs.  That's not very elegant,
though.

The reason for this is so that you can have manipulations of the images.
This would be represented by the recipe.  For example, it's common to have
a thumbnail image:

   source('lena.jpg')
   crop(100, 100, 500, 700)
   scale(200, 300)

This would load a source image, crop it down, then scale it.  This is what
you'd include in the recipe.  I actually have the Python code implemented
to do the recipe transforms as above.  I had given Bryan a demo of it at
the last Hacking Society meeting because he was asking about using the
Python Imaging Library.

Anyway, back to the normalization...  More tables can allow you to do more
powerful things (for example, adding a grouping table to my schema would
allow images to be in multiple groups).  However, this can also be added at
a later time.  I find it's often useful to get it to the point where you
can use it as quickly as possible, then work on updating it down the line.

For example, I have a e-mail based calendaring program I wrote, which
started out as fairly simple set of tables which would send out a message
to me at a few given times reminding me of things.  This was to give me
something slightly more powerful than my existing mechanism of setting up
an "at" job to remind me of upcoming appointments.

I've since added recurring alerts ("first tuesday of the month", "every
other sunday"), which involved some schema changes, which I did on the fly.

Sean
-- 
 It is far from clear whether "good intentions plus stupidity" or "evil
 intentions plus intelligence" have wrought more harm in the world.
                 --- Domer, The Logic of Failure
Sean Reifschneider, Inimitably Superfluous <jafo at tummy.com>
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python



More information about the NCLUG mailing list