Saturday, October 24, 2009

Attention PL/Proxy Users: Hash Functions Have Changed in PostgreSQL 8.4

Consider the following elementary PL/Proxy example:
CREATE FUNCTION get_user_email(username text)
RETURNS text AS $$

    CLUSTER 'userdb';
    RUN ON hashtext(username);

$$ LANGUAGE plproxy;
The integrity of this setup depends on (among other things) the hash function always giving the same result for the same username. Otherwise your calls go to the wrong partition and you won't find your data again.

Unfortunately, the hashtext() function and other hash functions have changed their implementation between PostgreSQL 8.3 and 8.4. Observe:
8.3=> SELECT hashtext('foobar');
 hashtext
-----------
 504683490
(1 row)

8.4=> SELECT hashtext('foobar');
 hashtext
-----------
 289967942
(1 row)
So when you update your proxy database from 8.3 to 8.4, you will likely make all your data invisible and/or create a big mess.

Solution? Well, in the short run: don't update quite yet. If you're just starting or you have a small database, reload all your data through the proxy instance after upgrading. The best solution for now appears to be forward-porting 8.3's hash function to 8.4 as an add-on module. Eventually, it would probably be best if PL/Proxy itself provided a stable set of hash functions.

7 comments:

  1. "Eventually, it would probably be best if PL/Proxy itself provided a stable set of hash functions."

    Right. The postgresql hash functions are designed for speed; and in particular, the 8.4 changes were made for speed.

    ReplyDelete
  2. I believe also that the 8.4 functions give different results on big-endian and little-endian machines. Be careful if you're expecting portable answers.

    ReplyDelete
  3. This comment has been removed by a blog administrator.

    ReplyDelete
  4. Right, ideally we'd probably want a hash function that is architecture-independent, so you can have multiple proxy nodes on different architectures and upgrade from 32 to 64 bits without problem.

    ReplyDelete
  5. By the way, I verified (via a very simple test) that the hash function in 8.3 is compatible with 8.2, 8.1, and 8.0. So it appears that the transition to 8.4 is indeed the only one to worry about right now.

    ReplyDelete
  6. I have published a version-independent hash function package to work around this problem at http://github.com/petere/pgvihash .

    ReplyDelete