Wednesday, July 28, 2010

PostgreSQL and Flymake

Flymake is an on-the-fly syntax checker for Emacs. Here is how you can use it for editing the PostgreSQL source code. Add this to your src/Makefile.custom:
check-syntax:
        $(COMPILE.c) -fsyntax-only $(CHK_SOURCES)
Then you can activate the flymake minor mode either manually using M-x flymake-mode, or by adding it to your Emacs customization for PostgreSQL. I use:
(defun pgsql-c-mode ()
  "C mode adjusted for PostgreSQL project"
  (interactive)
  (c-mode)
  (flymake-mode)

  ; .. and the rest ...
)
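To have Emacs pick this mode automatically for PostgreSQL source files, you can key an auto-mode-alist entry on the checkout path. This is just a sketch; the path pattern below is an assumption, so adjust it to wherever your PostgreSQL source trees actually live:

(add-to-list 'auto-mode-alist
             '("/postgresql/.*\\.[ch]\\'" . pgsql-c-mode))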
Here is a screenshot:
Notice the marked line with the syntax error and the tooltip/popup with the error message, which appears when the mouse hovers over the marked line.

Note, however, that activating flymake essentially keeps a compiler running continuously in the background, so this is not what you want to use when hacking PostgreSQL on the road. ;-)

Saturday, July 3, 2010

Increasing the priority of Debian experimental

Many people run Debian testing or unstable or some mix thereof.  This works pretty well for a development system or a general desktop system if you know a bit about what you're doing (note: nonetheless officially not recommended).  Sometimes you throw packages from experimental into the mix, if you want to get the latest stuff that isn't yet fully integrated into the rest of Debian.

The default APT priority of the Debian experimental release is 1, which ensures that it is never automatically installed or upgraded. This is not always ideal, in my experience. Of course, you don't want a package from experimental to take precedence over a package from stable, testing, or unstable by default. But I think when you have in the past installed a package from experimental, you probably want to pull in upgrades to that package from experimental as well. Otherwise, you will end up with completely unmaintained packages on your system.  That is because in practice many packages in experimental are not actually experimental or broken or unmaintained, but just an advance branch of some software that for some reason is not ready to go down the unstable-testing-stable road.

To make this work better, I have set the priority of experimental to 101 in my /etc/apt/preferences:
Package: *
Pin: release experimental
Pin-Priority: 101
Now the following will happen: If you just apt-get install a package, it will come from whatever "normal" release you have in your sources.list, say stable or testing. You can override that using -t experimental as usual. If you install a package from experimental and later an upgrade is available in experimental, apt-get upgrade will install that automatically. Also, if an upgrade in a "normal" release appears that has a higher version number, that version will be installed.
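For illustration, assuming a hypothetical package foo that exists in both your normal release and experimental, the interaction looks roughly like this:

apt-get install foo                    # installs the version from your normal release
apt-get install -t experimental foo    # explicitly pulls the experimental version
apt-get upgrade                        # now also picks up newer experimental versions of foo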

Of course, caveats apply. Some software in experimental is really experimental and should only be installed under close supervision. If a package is available only in experimental, this setup will install it when you ask for the package, even if you might not have actually wanted it if you had known that it was in experimental. Figure it out yourself. :)

Similar considerations apply to backports. I use
Package: *
Pin: release a=lenny-backports
Pin-Priority: 102
On the system I have in mind here, the standard distribution is stable, testing is pinned at 101, and backports at 102, so backports take precedence over testing. The reason is that some architecture-independent packages don't need to be backported at all, so this way you can pull them directly from testing.
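Spelled out, the preferences for such a stable-based system might look roughly like this (a sketch; the backports archive name corresponds to the then-current stable release):

Package: *
Pin: release a=testing
Pin-Priority: 101

Package: *
Pin: release a=lenny-backports
Pin-Priority: 102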

In general, the APT priority business is relatively powerful and often a good alternative to, say, manually downloading packages from various distributions, installing them manually, forgetting where they came from, and never upgrading them.

Thursday, July 1, 2010

An Update On Views

A long time ago, in a text book far away, someone described the following (relational) database design approach: Put all your data in tables, normalized and so on, as you'd expect.  On top of each table, create a view.  The database clients only access the views.  As the application evolves, tables might change and more views will be added, but the interface that the existing views present to the outside stays the same.  Sounds nice; everything stays backward compatible.

In order for this to work, views need to be updatable.  Updatable views have been standardized in SQL as far back as 1992, and have been implemented (partially, possibly) in some of the popular/successful SQL implementations.  Curiously, however, this feature has evaded PostgreSQL altogether so far.  Conventional wisdom has held that views can be made updatable by hand using the rewrite rules system, and that a proper implementation of automatically updatable views would only need to create those rules automatically.  Debate and coding around this idea have been taking place for many years, mostly inspired by the heroic efforts of Bernd Helmle.  But I think the conclusion of those in the know by now is that the rewrite rules system is inappropriate for this task (and in fact others have argued that the rewrite rules system is inappropriate for any task and should be ripped out; perhaps harsh but not far from the truth in my opinion).  So the updatable views effort is at square one (or zero) at the moment.
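To make concrete what "by hand" means here, a minimal sketch (the table, view, and rule names are made up): a view over a table, plus a rewrite rule that turns updates on the view into updates on the underlying table. INSERT and DELETE would need similar rules.

CREATE TABLE accounts (id int PRIMARY KEY, name text, balance numeric);
CREATE VIEW accounts_v AS SELECT id, name, balance FROM accounts;

CREATE RULE accounts_v_upd AS ON UPDATE TO accounts_v DO INSTEAD
    UPDATE accounts SET name = NEW.name, balance = NEW.balance
    WHERE id = OLD.id;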

In the meantime, updatable views have been conspicuously absent from the recent lists of top-needed features for PostgreSQL.  Which basically means no one really cares (anymore).  Why is that?

My observation is that the typical use of views has changed quite a bit over the years (decades), and has decreased altogether.  The idea of using views as transparent interfaces on top of tables never really took off to begin with.  Using views to encapsulate complex queries is still reasonable, but not terribly common.  Part of the reason is that many queries are generated, either by ORMs or by custom code.  The other reason is that server-side functions provide a much more powerful abstraction mechanism.  They can't do everything, in particular inlining of queries doesn't always work, but they can do a lot more with parameters and general procedural programming.  Views are also used for access control, either for columns or for rows.  Column security can now be had with explicit column privileges in PostgreSQL, and one could argue that using views for this was a workaround all along.  And using views for row access control doesn't actually really work, as it turns out.  Proper row security might be an upcoming feature, but it will need to work differently.  And anyway, such finely granular access control is rather rare in SQL databases overall, I guess.  And in all of these cases, the views don't need to be updatable.
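For comparison, the column-privileges alternative mentioned above looks like this (role and column names are made up); no view is needed just to hide a column:

GRANT SELECT (id, name) ON accounts TO reporting;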

Views are occasionally used as a thin wrapper around a table-returning function, so the function can be accessed like a table.  But this only works when reading.  Another use, somewhat related actually, is defining views for monitoring the database system or the database application.  Because those views are often used interactively, it is important that they are convenient to access, and so hugely complex queries combined with custom functions or possibly query generation are inappropriate.
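A sketch of the function-wrapper pattern (the function name is made up, standing in for any set-returning function):

CREATE VIEW active_sessions AS SELECT * FROM get_active_sessions();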

On the other hand, materialized views are all the rage now, appearing at the top of the requested PostgreSQL feature list, and having been fixtures in the feature lists of other SQL implementations for some time.  While traditional views are a tool for organizing code and interfaces, materialized views are purely a performance optimization tool (which, by the way, is the reason materialized views are not specified in the SQL standard).  Depending on how you implement materialized views, they might even be usable implicitly, if an incoming query matches the materialized query.  Then materialized views wouldn't really be views at all anymore; they would just be query caches.

So, what do you think?  Are traditional views an obsolescent, marginal feature?  Is the matter of updatable views obsolete?