Wednesday, April 28, 2010

Duplicate Stuff

When you attempt to create a table that already exists, you get an error. When you attempt to create an index that already exists, you get an error, unless you happened to pick a different name. Then you have two indexes that do the same thing. In PostgreSQL 9.0, index names can get generated automatically, so this might get worse. The same applies to constraints: Nothing stops you from creating the logically same constraint twice. Unless you explicitly name constraints, then you get an error. I sometimes give explicit names to check constraints just so later on I have a clue why I needed to check that x is greater than 3. But naming primary key constraints or foreign key constraints is usually a bit silly, because it's clear what they do. So it's easy to end up with 5 identical foreign key constraints.

Is it a common problem that duplicate constraints or indexes get created? I have seen this happen in different ways. You run scripts that you think are idempotent, but they are not really in this sense. (It's in the details: CREATE TABLE foo (a REFERENCES b ...) will fail cleanly when run twice. CREATE TABLE foo (a); ALTER TABLE foo ADD FOREIGN KEY ... will not. See pg_dump.)  Or logically duplicate things are created in independent ways, because say the psql \d output isn't clear enough or isn't checked. Obviously, all of these are human errors in some way.

I'm thinking that a setting to notify the user about this could be useful. It could be a warning or even an error. I can't see any possible reason in the normal course of use that someone would want two foreign keys that point exactly the same way, or two check constraints that check exactly the same thing, or two indexes that are set up in exactly the same way.

Comments? Experiences?


  1. Letting the database name your indexes/constraints/etc is a practice I prohibit in my company's Database Coding Standard, for this reason.

  2. Consider replacing an index on the fly. Standard practice is to create the new one using CONCURRENTLY, then drop the old. Admittedly we might not need that if we had REINDEX CONCURRENTLY, but it's still true that this shouldn't be an error condition.

  3. depending on how it generates names (can it generate more than one name per idx? I assume it does something like col1_col2_col3_idx but if you put a 0 or a 1 on the end of that it could generate 2 idxs for the same thing...) it'll probably lessen the problem since people will hopefully not be naming their own as much and thus they won't generate the same idx twice. At least I hope that's how it works.

    constraint wise my biggest annoyance is I can't build a generic constraint. I have at least 1 constraint that I use on several tables (wrote a function and call that from check) but it'd be nice if I could just have a single named constraint instead, and just apply it to columns.

  4. I've seen lots of duplicate indexes out in the wild, and usually nobody knows how things got that way. In the PostgreSQL 7 days I think it happened sometimes during upgrades as old indexes were created and duplicate ones created for primary keys. But I don't think that accounts for all the duplicates I've seen.

    I think it would be great to disallow creation of a duplicate index. I'd say make the attempt a harmless no-op that generates a warning, though that could be confusing since there would not be an index with the expected name afterwards.

    Generating an error would probably be the best, though it'd be a behavioral change that would probably affect someone or other. (But wouldn't they thank you for keeping them from wasting I/O and disk space on duplicate indices?!)

    Same things with a duplicate foreign key declaration.

    And with a constraint, if it's exactly identical.

  5. Voting for an error on duplicate object creation unless CONCURRENTLY is used.

  6. Maybe some Message that the index created, actualy already exists another index over the same columns.
    Postgres should indicate the similiar index, helping to decide which index to erase or else
    not allow you to create a new one, if already existe another that makes the same function.

  7. Perhaps a plpgsql function to find (and possibly drop) duplicated indexes would do the trick? This could provide the needed functionality (getting rid of duplicates) without complicating the backend.

    Also, I believe that another well known RDBMS automatically makes use of unique indexes if they already exist when creating unique constraints. This seems quite reasonable, really.

  8. I thought there were some technical limitations that were preventing us from handling this situation more cleanly. But I agree that it would be nice to do so.