[NCLUG] Annoying post-meeting SQL curiosities

Aaron D. Johnson adj at fnord.greeley.co.us
Mon Dec 15 23:19:08 MST 2008


Michael Milligan writes:

> >     UPDATE test SET a = a + 1;
> > 
> > This should leave us with two rows, with values 2 and 3, right?
> 
> Yes, unless you have table constraints, triggers, or rules that
> dictate otherwise...  this is a somewhat ambiguous and contrived
> situation.

Yes, I left out the big disclaimer.  The SQL statements I shared were
a "generate a clear test case demonstrating the problem" sort of
thing, as opposed to "show me the actual project you're working on."
Incidentally, I don't really care about the index itself as much as
the unique constraint on the column.  Every reference I can find says
that every implementation on the planet just creates a unique index
when a unique column constraint is asked for.  (Which is a perfectly
reasonable thing for an implementation to do.)
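To make the test case concrete outside of any one server, here's a
small sketch of it in Python's sqlite3 module (my addition, not from
the thread); SQLite also applies the unique check row by row during
the UPDATE, so the same statement trips it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a INTEGER UNIQUE)")
con.executemany("INSERT INTO test (a) VALUES (?)", [(1,), (2,)])
try:
    # Rows are scanned in insert order: 1 becomes 2 while the
    # old 2 still exists, violating the unique constraint mid-statement.
    con.execute("UPDATE test SET a = a + 1")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The failing statement is rolled back as a unit, so the table is left
holding 1 and 2 as before.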

> > not so fast.  Some databases do work that way, and others report that
> > the unique index constraint is being violated.
> 
> I would expect that behavior for an ACID-compliant database (such as
> PostgreSQL) given your example.  It updates the index after each row
> is updated as a result of your UPDATE statement within the
> transaction, but because of your insert order (you put in 1 before
> 2), the 1 row will be incremented before the 2 row (e.g., PostgreSQL
> does sequential table scans based on insert, or OID, order) and thus
> you are violating the index constraint on the table.
>
> The fix?  Drop the index before you do this operation.  Or don't use
> an index at all.  The index is not useful for this transaction
> anyway since you are altering every row in the table but uniqueness
> is provably maintained.  The update operation is also a lot faster
> without the index in the way.
>
> I.e., the correct handling of this is:
> 
> BEGIN;
> DROP INDEX test_a_idx;
> UPDATE test SET a = a + 1;
> CREATE UNIQUE INDEX test_a_idx ON test (a);
> COMMIT;
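For what it's worth, the quoted recipe does run as advertised.  Here's
a sketch of it in Python's sqlite3 (two assumptions on my part: the
unique constraint lives in an explicit index rather than a column
constraint, since SQLite won't let you drop the implicit one, and the
connection uses isolation_level=None so the BEGIN/COMMIT pass straight
through to the engine):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so our
# explicit BEGIN/COMMIT control the transaction ourselves.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE test (a INTEGER)")
con.execute("CREATE UNIQUE INDEX test_a_idx ON test (a)")
con.executemany("INSERT INTO test (a) VALUES (?)", [(1,), (2,)])

con.execute("BEGIN")
con.execute("DROP INDEX test_a_idx")       # uniqueness gone for now
con.execute("UPDATE test SET a = a + 1")   # free to pass through duplicates
con.execute("CREATE UNIQUE INDEX test_a_idx ON test (a)")  # re-checks uniqueness
con.execute("COMMIT")

print(con.execute("SELECT a FROM test ORDER BY a").fetchall())  # [(2,), (3,)]
```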

I've got to say the drop-and-recreate approach just leaves a bad taste
in my mouth.  Any concurrent operations that could make use of the
index are reduced to full table scans.  Re-creating the index incurs a
second full table scan.  And, worse, the uniqueness constraint on the
data in column a is removed for the duration of the transaction.

Since you bring up PostgreSQL specifically, the development team seems
to regard the current behaviour as a bug.  There's a discussion on
exactly this and what an impact it would have on the code starting
here:
http://archives.postgresql.org/pgsql-hackers/2006-09/msg01202.php

The nicest workaround I've found so far goes like this.  We'll assume
that my table actually has more columns than just "A"; I'm ignoring
those for now.  And I want to open up some space between the rows
where A is 1 and A is 2.

    CREATE TABLE test (a INTEGER UNIQUE, irrelevantdata VARCHAR(100));
    -- This index is more-or-less implicit in the UNIQUE column
    -- constraint above.  Making it explicit.
    CREATE UNIQUE INDEX test_a_idx ON test (a);
    -- stuff some sample rows into the table
    BEGIN;
    INSERT INTO test (a, irrelevantdata) VALUES
        (1, 'foo');
    INSERT INTO test (a, irrelevantdata) VALUES
        (2, 'bar');
    INSERT INTO test (a, irrelevantdata) VALUES
        (3, 'baz');
    COMMIT;
    -- stick a new value in here, after the row (1,'foo'),
    -- increasing A in all rows that follow
    BEGIN;
    UPDATE test SET a = 0 - a - 1 WHERE a >= 2;
    INSERT INTO test (a, irrelevantdata) VALUES
        (2, 'xyzzy');
    UPDATE test SET a = -a WHERE a < 0;
    COMMIT;
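The trick translates directly to Python's sqlite3, which is handy for
checking the intermediate states (my sketch, not from the original
post; SQLite enforces the unique constraint row by row too, so it
exercises the same failure mode the thread is about):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a INTEGER UNIQUE, irrelevantdata VARCHAR(100))")
con.executemany("INSERT INTO test (a, irrelevantdata) VALUES (?, ?)",
                [(1, "foo"), (2, "bar"), (3, "baz")])

# Step 1: park the rows to be shifted in negative space, offset by one,
# so no intermediate value collides with an existing positive value.
con.execute("UPDATE test SET a = 0 - a - 1 WHERE a >= 2")
# Step 2: the slot a = 2 is now free.
con.execute("INSERT INTO test (a, irrelevantdata) VALUES (2, 'xyzzy')")
# Step 3: flip the parked rows back; they come out shifted up by one.
con.execute("UPDATE test SET a = -a WHERE a < 0")

rows = con.execute("SELECT a, irrelevantdata FROM test ORDER BY a").fetchall()
print(rows)  # [(1, 'foo'), (2, 'xyzzy'), (3, 'bar'), (4, 'baz')]
```

The unique constraint stays in force through every statement; the
negative range is just scratch space that no real row ever occupies.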

I found this to be quite clever.  It maintains the column constraint,
doesn't depend on any vaguely specified implementation-specific
behaviour, doesn't do a full table scan for every single query, and it
is reasonably clear about what is happening.  (Credit where it is due:
http://web.archive.org/web/20070818213714/http://people.planetpostgresql.org/greg/index.php?/archives/46-Updating-unique-columns.html)

- Aaron


