Wednesday, April 28, 2010

Duplicate Stuff

When you attempt to create a table that already exists, you get an error. When you attempt to create an index that already exists, you get an error, unless you happened to pick a different name. Then you have two indexes that do the same thing. In PostgreSQL 9.0, index names can get generated automatically, so this might get worse. The same applies to constraints: Nothing stops you from creating the logically same constraint twice. Unless you explicitly name constraints, then you get an error. I sometimes give explicit names to check constraints just so later on I have a clue why I needed to check that x is greater than 3. But naming primary key constraints or foreign key constraints is usually a bit silly, because it's clear what they do. So it's easy to end up with 5 identical foreign key constraints.

Is it a common problem that duplicate constraints or indexes get created? I have seen this happen in different ways. You run scripts that you think are idempotent, but they are not really in this sense. (It's in the details: CREATE TABLE foo (a REFERENCES b ...) will fail cleanly when run twice. CREATE TABLE foo (a); ALTER TABLE foo ADD FOREIGN KEY ... will not. See pg_dump.)  Or logically duplicate things are created in independent ways, because say the psql \d output isn't clear enough or isn't checked. Obviously, all of these are human errors in some way.

I'm thinking that a setting to notify the user about this could be useful. It could be a warning or even an error. I can't see any possible reason in the normal course of use that someone would want two foreign keys that point exactly the same way, or two check constraints that check exactly the same thing, or two indexes that are set up in exactly the same way.

Comments? Experiences?

Monday, April 19, 2010

News from the SQL Standard

Last week, I attended the meeting of DIN NA 043-01-32 AA, which is the German "mirror group" of ISO/IEC JTC1 SC32, whereof WG3 produces ISO/IEC 9075, which is titled "Database languages - SQL".  Once you dig through all these numbers and letters and find out who is responsible for what, it's actually quite simple to get involved there and review or contribute things.

For the benefit of everyone who is interested but hasn't had the chance to get involved in that process, here is what is currently going on:
  • A new standard is currently in the "Final Committee Draft" (FCD) phase, which basically means "beta".  The final release is expected in 2011, so you will begin to see mentions of "SQL:2011".
  • The new standard will only contain parts 1/Framework, 2/Foundation, 4/PSM, 11/Schemata, 14/XML. The other parts are currently not being developed, which doesn't mean they are dead or withdrawn, but that no one bothers to add things to them at the moment.
  • All new features in SQL:2011 will be in the form of optional features.  So the level of core conformance is not impacted by the release of a new standard.
There isn't actually that much new in SQL:2011, besides countless fixes and clarifications.  I counted only a dozen or so new features.  Here are some things that might be of interest to the PostgreSQL community:
  • The syntax ALTER TABLE ... ALTER COLUMN ... SET/DROP NOT NULL has been taken into the standard.  PostgreSQL has supported that since version 7.3.  Coincidence?  Not sure.
  • Constraints can optionally be set to NO ENFORCE.  That means the database system won't enforce them but still assumes they are valid, for example for optimization.
  • System-versioned tables: Perhaps the largest new feature, this is a way to make data (rows) visible only during certain times. Some pundits might recall a moral predecessor of this feature labeled SQL/Temporal.  I haven't fully analyzed this feature yet, so I'll post later about the details.
  • Combined data change and retrieval. PostgreSQL does something like this with RETURNING, but this feature is more elaborate and allows the writing of "delta tables".
  • Named arguments in function calls. PostgreSQL 9.0 supports that, but using the syntax foo(3 AS a) instead of what ended up in the standard, foo(a => 3).
  • Default values for function arguments. PostgreSQL 9.0 supports that as well.
This time I attended the meeting as a guest. We are discussing some procedural and financial issues to make this an official membership.  If anyone else is interested in getting involved in the SQL standard development, let me know and I can point you in the right direction.  If we have enough interest, we can set up a discussion group within the PostgreSQL project.