Tuesday, November 22, 2011

plpydbapi: DB-API for PL/Python

One thing that's weird about PL/Python is that its database access API is completely different from the standard Python DB-API. It is similar to PL/Perl and PL/Tcl, and the C "SPI" API, from which they are all derived, but that's little help for a Python programmer. (The reasons for this are lost in history. Probably laziness.) Moreover, the two APIs use the same function names for different purposes.

So I set out to develop a DB-API compatible layer on top of PL/Python: plpydbapi

Example:

CREATE FUNCTION test() RETURNS void
LANGUAGE plpythonu
AS $$
import plpydbapi

dbconn = plpydbapi.connect()
cursor = dbconn.cursor()
cursor.execute("SELECT ... FROM ...")
for row in cursor.fetchall():
    plpy.notice("got row %s" % row)
dbconn.close()
$$;

Granted, it's more verbose than the native PL/Python syntax, so you might not want to use it after all. But it can be helpful if database calls are nested in some other modules, or you just don't want to learn another database access API.

This started out more as an experiment, but it turns out that with the many improvements in PL/Python in PostgreSQL 9.1, it's possible to do this. (Subtransaction control and exception handling were the big issues.) The one gaping hole is that there is apparently no way to get metadata out of a query result. Something to address in PostgreSQL 9.2, perhaps.

Thanks go to the DB-API compliance test suite, which was extremely helpful in making this happen. (Nonetheless, the test suite is quite incomplete in some regards, so treat the result with care anyway.)

Another thing that I found neat about this project is that I managed to get the unit tests based on Python's unittest module to run in the PL/Python environment inside the PostgreSQL server. That's the power of unittest2.

4 comments:

  1. Great news.
    Two questions:
    So this works only with 9.1+ correct?
    When you say metadata are you talking about cursor.description?

    ReplyDelete
  2. What we'd do here is to embed an ORM (for me that would be SQLAlchemy, since I wrote it and all...) into Postgresql, enabled via this DBAPI. Then you'd have to construct some clever kind of marshaling system that can serialize a Python object on the outside into a stored procedure call, which calls into a SQLAlchemy-enabled/PG-embedded system. I've no idea how feasible this would be, but it would allow a fully stored-procedure approach, without the need to actually hand-code two individual persistence layers (i.e. INSERT/UPDATE/DELETE statements on the DB side, type-specific object->procedure marshaling on the app side), which is typically why the SP approach is entirely not worth it.

    ReplyDelete
  3. @Adrian Klaver: Yes, only 9.1+. And yes, cursor.description is mostly unimplemented/(-able).

    ReplyDelete