[NCLUG] Annoying post-meeting SQL curiosities

Aaron D. Johnson adj at fnord.greeley.co.us
Wed Dec 10 23:35:25 MST 2008


Bob Proulx writes:
> I am not a DBA.  But that hasn't stopped me from working on projects
> which required one.  :-) At the root of your problem, I think, is
> that you are doing something that is frowned upon.
> 
> Why are you changing the primary key of the data?  Let the database
> choose the primary key and never change it.  You will avoid this
> turmoil and consternation.

Degenerate test cases were used for illustration purposes.  No primary
keys were harmed in the actual application.
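
For anyone who missed the earlier messages, the degenerate case was
along these lines (a from-memory sketch; the table and column names
are made up, and the exact behavior varies by engine):

    CREATE TABLE t (x INTEGER PRIMARY KEY);
    INSERT INTO t (x) VALUES (1), (2);
    UPDATE t SET x = x + 1;
    -- Expected: two rows, with values 2 and 3.
    -- Observed on some engines: a duplicate-key error, because the
    -- row holding 1 becomes 2 and collides with the not-yet-updated
    -- row still holding 2.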

> If you have a case that seems to require setting the primary key
> that would be an interesting discussion.
> 
> > > This should leave us with two rows, with values 2 and 3, right?
> 
> I think this is quite a hard problem.  It would need to make a
> complete copy of the table while passing through the operation.  If
> your table is very large then this would consume a very large amount
> of either memory or disk for temporary storage.  Imagine this
> operation on a table holding a very large amount of data.

Agreed that the implementation becomes ... interesting.  Especially in
cases where there are billions of rows to keep track of.
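
In the meantime, the usual dodge is to stage the update so that no
intermediate value can collide.  A minimal sketch against the table
above, assuming the values of x are known to be positive so that the
negated values are free:

    UPDATE t SET x = -x;      -- step 1: move the keys out of the way
    UPDATE t SET x = -x + 1;  -- step 2: move them into place

(If memory serves, MySQL also accepts an ORDER BY on a single-table
UPDATE, as in UPDATE t SET x = x + 1 ORDER BY x DESC, which avoids
the collision by renumbering from the largest value down.)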

> Being able to do this might be cool but people concerned with high
> performance databases will complain that it is slower to always make
> a temporary copy than otherwise.  I would imagine that the selection
> of different database engines would strongly affect the result.

A temporary copy of the data being changed (or locks on all tables
involved in a transaction) is going to be required anyway, to provide
concurrent access to other users of the database.  And the choice of
implementation absolutely does affect the results here.
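
A quick way to see that from the outside, assuming a transactional
engine such as InnoDB or PostgreSQL:

    -- session A
    BEGIN;
    UPDATE t SET x = x + 1;
    -- Session B running SELECT x FROM t now either sees the old
    -- values (MVCC snapshot) or waits on a lock; it never sees a
    -- half-updated table.
    ROLLBACK;
    -- Either way, the old data is intact afterwards.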

> Most would choose the faster engine and avoid asking it to do things
> which hurt.  (Doctor, it hurts when I do this.  Response: then
> don't do that.)

Most would rather get their data quickly than get correct data?  If
that's the case, I will proudly stand apart from these "most".  :)

> > > Am I wrong to think that an UPDATE is supposed to be atomic (not
> > > divisible into smaller pieces) and that all rows matching the
> > > (non-existent) predicates specified are to be changed simultaneously?
> > > And would it not follow that there should be no duplicates in the
> > > index when the UPDATE is done?
> 
> Wasn't it atomic?  When it failed, didn't the data remain the old
> data?  If not, and if it left you with corrupted tables, then that
> would be the more serious bug.  I assume that when it failed the
> data simply wasn't updated, which is exactly what should happen when
> an atomic operation fails.

I believe we are talking about two different atomicnesses here.  I
should choose a different word.  Here's my complaint: when processing
the UPDATE statement, each just-updated row's new value is checked for
uniqueness against all of the old rows in the table, one modified row
at a time.  If an UPDATE is truly an all-rows-are-updated or
no-rows-are-updated operation, should not the new values be checked
for uniqueness against the new values of all of the modified rows?
(I'm not trying to imply that this would necessarily be easy, just
that it would be more correct.)
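
As I read the SQL standard, it agrees: a constraint in immediate mode
is supposed to be checked at the end of the statement, not after each
row, and a DEFERRABLE constraint may wait until COMMIT.  A sketch in
standard syntax (a fresh table name to avoid clashing with the
earlier example, and no promises about which engines accept this
today):

    CREATE TABLE t2 (
        x INTEGER,
        CONSTRAINT t2_x_unique UNIQUE (x)
            DEFERRABLE INITIALLY DEFERRED  -- check at COMMIT time
    );
    BEGIN;
    INSERT INTO t2 (x) VALUES (1), (2);
    UPDATE t2 SET x = x + 1;  -- transient duplicate never surfaces
    COMMIT;                   -- uniqueness verified here, all at once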

- Aaron


