Tuesday, November 22, 2011

git whoami

My favorite feature in bzr (Bazaar) is the bzr whoami command, which prints what your current identity (name and email) is, as far as the repository is concerned. You can tell I haven't used bzr much if that's as far as I have gotten. But seriously, with so many Git repositories around, several project identities, directory-specific shell configuration, and so on, it's easy to get confused, and it's annoying to have to check and repair commits for correct user name and email all the time. So here is git whoami. This has already saved me so many headaches.

plpydbapi: DB-API for PL/Python

One thing that's weird about PL/Python is that its database access API is completely different from the standard Python DB-API. It is similar to PL/Perl and PL/Tcl, and the C "SPI" API, from which they are all derived, but that's little help for a Python programmer. (The reasons for this are lost in history. Probably laziness.) Moreover, the two APIs use the same function names for different purposes.

So I set out to develop a DB-API compatible layer on top of PL/Python: plpydbapi

Example:

CREATE FUNCTION test() RETURNS void
LANGUAGE plpythonu
AS $$
import plpydbapi

dbconn = plpydbapi.connect()
cursor = dbconn.cursor()
cursor.execute("SELECT ... FROM ...")
for row in cursor.fetchall():
    plpy.notice("got row %s" % row)
dbconn.close()
$$;

Granted, it's more verbose than the native PL/Python syntax, so you might not want to use it after all. But it can be helpful if database calls are nested in some other modules, or you just don't want to learn another database access API.

This started out more as an experiment, but it turns out that with the many improvements in PL/Python in PostgreSQL 9.1, it's possible to do this. (Subtransaction control and exception handling were the big issues.) The one gaping hole is that there is apparently no way to get metadata out of a query result. Something to address in PostgreSQL 9.2, perhaps.

Thanks go to the DB-API compliance test suite, which was extremely helpful in making this happen. (Nonetheless, the test suite is quite incomplete in some regards, so treat the result with care anyway.)

Another thing that I found neat about this project is that I managed to get the unit tests based on Python's unittest module to run in the PL/Python environment inside the PostgreSQL server. That's the power of unittest2.

Saturday, November 12, 2011

Switching desktop environments

So, like many people, I have received GNOME 3 in the latest upgrade on Debian testing, and switched away to Xfce very soon after. What's amazing (to me) about this whole experience, however, is how easy it is to switch desktop environments in Debian. Two years ago, I fled KDE in very similar circumstances. I have since used Openbox, LXDE, GNOME, Xfce, as well as a few more obscure window managers in short-lived experiments. All of this was easy to do and worked flawlessly. Most of my settings, my preferred programs, and startup applications were preserved, and if I didn't like it, I could switch back without a problem. (Ironically, the upgrades from KDE 3 to KDE 4 and from GNOME 2 to GNOME 3 lost far more settings than changes to other environments.) And of course you can mix and match applications from different environments, and these days this doesn't result in a visual crime. I don't know whom to thank for this, probably a combination of upstream developers, freedesktop.org guys, and Debian maintainers. Keep it up.

Thursday, September 15, 2011

ccache and clang, part 2

There's more funny business when using ccache in combination with clang. Last time I suggested that you use the invocation
./configure CC='ccache clang -Qunused-arguments -fcolor-diagnostics'
to get rid of the "argument unused during compilation" warnings.

But you still might get loads of warnings that you wouldn't normally get without ccache, such as this example (from the PostgreSQL source code):

extension.c:382:35: warning: equality comparison with extraneous parentheses [-Wparentheses]
 if (( (((control->directory)[0]) == '/') ))
        ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
extension.c:382:35: note: use '=' to turn this equality comparison into an assignment
(This is the opposite of the warning that tells you to put two pairs of parentheses around an assignment used as a truth value.) Or:
path.c:727:7: warning: explicitly assigning a variable of type 'char *' to itself [-Wself-assign]
 path = (path);
 ~~~~ ^  ~~~~
The problem is, these come from macro expansions, so wouldn't normally see them, because (I guess) the compiler driver is smart enough not to warn about such things when they come from macro expansions.

The way ccache works is approximately

  1. preprocess the input file
  2. look for it in the cache
  3. if not found, compile the preprocessed file
What would be better in this situation is
  1. preprocess the input file
  2. look for it in the cache
  3. if not found, compile the original file
And indeed you can turn on that second behavior by setting the obscure environment variable CCACHE_CPP2 (as in, run cpp twice):
export CCACHE_CPP2=yes
Then all these extra warnings disappear.

(The ccache man page is worth a read. There are a few interesting settings to play with.)

I'm currently playing around with a shell script ccache-clang that looks like this:

CCACHE_CPP2=yes exec ccache clang -Qunused-arguments `test -t 2 && echo -fcolor-diagnostics` "$@"

Saturday, September 10, 2011

Beta work

PostgreSQL 9.1.0 was wrapped the yesterday (or the day before yesterday, depending on from where you see it). It is now in a special place, where it can rest and let the flavor soak in, before it's released into the wild next week.

Time to reflect.

One thing that is sometimes criticized about the PostgreSQL release process is that there is too much time between the start of beta (or the end of the last commit fest) and the final release, and that it's not clear what ordinary contributors can do during that time to speed things up. This is now worked around to some degree by starting the next development cycle during the beta period of the previous one. (Evaluating the success of that approach is another matter.) But because this matter had also bothered me, I have taken notes this time, first of all so I don't forget anything next time, but perhaps also to provide some inspiration for others.

So here is what I'd been up to during the 9.1 beta period:

Documentation work
It is well understood that the documentation needs a general review during beta time. As features are patched in, chunks of documentation are added here and there, and at the end of it all you just need to stare at it for a bit and see whether the order and flow still make sense. I didn't do much of that this time, but I know others did. Other things to do are:
  • Various bits of SGML markup usually need to be cleaned up, e.g., alignment of verbatim environments, linking to formal objects.
  • I always run the whole documentation through a spell checker, which always results in a bunch of corrections.
  • The SQL features list in the documentation needs to be updated.
  • The SQL key words list in the documentation needs to be updated.
  • All generated man pages should be looked through manually for formatting artifacts.
  • Non-ASCII characters in the SGML source need to be checked. They should be converted to entities, and ideally they should fit into the Latin 1 set.
Other people also spent significant time on preparing the release notes, which is a huge job. I didn't even get into that, except that the spell checking and automated cleanups apply to the release notes as well, since they are part of the documentation build.
Source code cleanup
We run pgindent during beta, but there are also a number of other things to do to clean up the source code. A lot of these I didn't plan. Mostly, one thing led to another.
  • Check for non-ASCII characters in source. These should either be in UTF-8 or replaced by an ASCII equivalent.
  • Spell checking: You can't run the whole source through a spell checker, but there are also a few typical misspellings or variant spellings that are easy to clean up over the entire source tree, such as GIST -> GiST or cancelled -> canceled.
  • Check the --help and --version output of all programs to see whether it follows conventions.
  • In particular, check that the --help output is aligned, not too wide, and that the options are ordered sensibly. (Very often a new option is just added at the end or at some random place in the middle.)
  • Diff the --help outputs between the previous and the current version to find out about all new command-line options and check whether they are sensibly named and properly documented.
  • And the order in which options are presented in the --help output and on the man pages should not be gratuitously different.
  • Similarly, the backslash command help output in psql (\?) should be checked that it looks ok.
  • Go through the configuration file samples and check them for sanity, up-to-dateness, and proper formatting.
  • Remove literal tabs from string constants. How did they ever get in there?
  • Compile the code with a bunch of extra warning options. GCC has a lot of those, try them all out. A lot of them generate a lot of noise, but going through them about once a release does find the occasional trouble spot, or at least provides inspiration for future code cleanup projects. As of 9.1, you can also throw Clang into the mix.
  • Also, check building against cutting edge versions of library dependencies. Good candidates are Perl and Python, which introduce small breakage with each new release. If there is a new release candidate of those guys out while we prepare the PostgreSQL release, our users will be building against the final versions soon, so we need to be prepared.
Translation work
Besides the translation work itself and orchestrating the translator contributors, this is also the best chance to read through every single user-visible message that was introduced in the new version. So there are occasional spelling fixes, some messages are poorly worded or incomprehensible to a normal user, or sometimes the code can be restructured so that some error condition doesn't happen at all. A lot of this feeds back into source code cleanup items.
Help the packagers
We have a great bunch of packagers who prepare PostgreSQL for installation on various operating systems. Help them out. Make sure all new features are properly integrated, package descriptions are up-to-date, upgrades work.
And then of course you field submissions from testers and other contributors who will submit countless bug reports, failures on unusual platforms, documentation fixes.

So that can keep you busy for a month or three. The goal ought to be to automated most of this. But much of it can't be. So we really do need to spend the time to do these things, especially polishing the user-facing things. We don't want the first program output or documentation page the user sees of the new release to be ugly or wrong.

Tuesday, July 19, 2011

Undefined symbol

Here is a small DBA tale from a few months ago. I wanted to upgrade a few servers to the latest minor release of PostgreSQL. It was the upgrade from 8.4.4 to 8.4.5. To minimize downtime, I had the following obvious idea: I'd install all the packages first, and once everything was in place and an opportune time for a service disruption had arrived, I'd just restart the services, and everything would be backup in less than a minute. (If you just let the package manager do its thing, it's always a bit unpredictable when the services are stopped and restarted, which might result several minutes of downtime.)

This operation had dramatic consequences. As soon as I had the new packages installed, I got massive amounts of these errors:

ERROR: could not load library "/usr/lib/postgresql/8.4/lib/plpgsql.so": /usr/lib/postgresql/8.4/lib/plpgsql.so: undefined symbol: PinPortal

Since a lot of things use PL/pgSQL, this meant that the database system was pretty much dead at this point.

What had happened?

Well, version 8.4.5 introduced some new functions in the server that PL/pgSQL made use of. As soon as you install the new plpgsql module, the next session that starts will load the new libary, but unless you restarted the server, the running server process won't have that symbol.

It was easy to fix this by restarting the server, but that's not how things were planned.

Consequence: Unless shown otherwise and a lot of care is taken, you must upgrade the server and all loadable modules to exactly matching versions and at the same time, and you must restart the server at the same time.

It might be useful as a preventive measure to compare symbol tables before upgrading. It could also be tracked more carefully at the source level, by explicitly mentioning the addition of meant-to-be-called-externally functions to the server in the release notes. (I suppose we don't usually remove functions, but who knows.)

Thursday, June 23, 2011

Bootstrapping Debian packages

I read something about dh_make again a while ago, which made me wonder, besides dh_make, how do people generally bootstrap new Debian packages. I don't think I've ever actually used dh_make for anything but experimenting. For the longest time, copying over the packaging files from some similar package worked pretty well.

Recently, however, I have applied a completely new strategy: I start with nothing, call debuild, fix the error, retry, until it works. This might sound pretty bizarre, but it works pretty well and enhances your understanding of how the packaging tools work.

Here is about how it works:

Start with nothing, call debuild, it complains:
cannot find readable debian/changelog anywhere!
Call dch --create, which says:
Cannot find debian directory!
OK, mkdir debian, call dch --create again, fill in the template. Call debuild again, now it says:
but there's no debian/rules there!
Create a debian/rules, that's easy to do by heart nowadays, at least as a start:
#!/usr/bin/make -f

%:
        dh $@
Call debuild again, now dpkg-buildpackage actually starts, but stops with:
dpkg-source: error: cannot read mypackage/debian/control: No such file or directory
At this point I'm too lazy to figure out what is necessary to put into debian/control, so let's just keep it simple and touch debian/control. At this point dpkg-source gets into a bit of trouble:
Use of uninitialized value $sourcepackage in sprintf at /usr/bin/dpkg-source line 290.
So let's put that in and also follow the subsequent advice to add debian/source/format and the Maintainer and Standards-Version fields. So far we have:
Source: mypackage
Maintainer: Peter Eisentraut <petere@debian.org>
Standards-Version: 3.9.2
The next hint of a message is:
dh: No packages to build.
This means we need a binary package stanza, so I'll just add
Package: mypackage
Architecture: any
Now debhelper complains:
dh: Compatibility levels before 5 are deprecated.
Not sure why it didn't complain about that earlier. Let's stick 8 in there.

At this point I had to do actual work and mess around with debian/rules a bit to get the package to actually build, but a few minutes later I had a functioning provisional package.

The next step is to clean up the warnings from the various dpkg-* workers:
dpkg-gencontrol: warning: missing information for output field Description
dpkg-gencontrol: warning: package mypackage: unused substitution variable ${shlibs:Depends}
dpkg-deb: warning: parsing file 'debian/mypackage/DEBIAN/control' near line 6 package 'mypackage': missing description
dpkg-genchanges: warning: missing Section for binary package mypackage; using '-'
dpkg-genchanges: warning: missing Priority for binary package mypackage; using '-'
dpkg-genchanges: warning: missing Section for source files
dpkg-genchanges: warning: missing Priority for source files
So we add Description, Depends, Section, and Priority.

And finally we have a list of complaints from Lintian to address:
W: mypackage source: debhelper-but-no-misc-depends mypackage
E: mypackage source: package-uses-debhelper-but-lacks-build-depends
W: mypackage source: package-needs-versioned-debhelper-build-depends 8
W: mypackage source: debhelper-overrides-need-versioned-build-depends (>= 7.0.50~)
E: mypackage source: temporary-debhelper-file debhelper.log
E: mypackage: no-copyright-file
The only things I added manually after that were Vcs-*, Homepage, and Enhances.

Now the only things left to do are running the thing through cowbuilder a few times and putting in all the necessary build dependencies, and writing a nice changelog entry.

Note, this method does not replace putting in some thought. But it's an interesting way to get a relatively clean package.

Wednesday, June 1, 2011

Enabling core files for PostgreSQL on Debian

The other day, I was a bit puzzled over a seemingly simple task: Enable core files to be generated from a PostgreSQL instance running on Debian. That instance has unfortunately been segfaulting on occasion, but never left a core file.

Now in principle it is clear that
ulimit -c unlimited
is the incantation to get this done. But where do you put this? You could hack it into the init script, but that seemed a bit ugly, and I wanted a sustainable solution.

A useful thing in the meantime is to check the current settings. That information is available in /proc/$PID/limits with the PID of the postmaster process (or any child process, really), and it looked like this to begin with:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            ms        
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
...
Use sudo grep core /proc/$(sudo cat /var/run/postgresql/8.4-main.pid)/limits if you want it automated.

So it's good to know that we only need to set the soft limit.

One way to configure this properly would appear to be in /etc/security/limits.conf. There you can add a line like
*               soft    core            unlimited
to enable core dumps globally. I'm not actually sure whether that would work if the service is started during the boot without PAM. In any case, I didn't want to enable core files globally; who knows what that would lead to.

One could replace the * by a user name, such as postgres, and then enable pam_limits.so in /etc/pam.d/su. But the postgresql init script in Debian is nested about four levels deep, so it wasn't clear whether it called su at all.

Now as it turns out, the init script ends up changing the user using this Perl code:
$) = $groups;
$( = $gid;
$> = $< = $uid;
(see change_ugid in /usr/share/postgresql-common/PgCommon.pm), so the whole PAM line of thought wasn't going to work anyway. (Other packages such as pgbouncer and slony1 do got through su, so that would be a solution for those.)

The way to solve this is the pg_ctl -c option, which sets the soft limit for core files to unlimited. And the way to pass this option through the init script maze is the file /etc/postgresql/8.4/main/pg_ctl.conf, which should contain a line like this:
pg_ctl_options = '-c'
Then restart postgresql, and check /proc/$PID/limits again:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            ms        
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
OK.

Another thing that's recommendable in this context is to change the core file names to have a unique element, so that if multiple backends crash before you can take a look, they don't overwrite each other's core files. The core(7) man page explains the configuration options; I went with this sysctl setting:
kernel.core_pattern = core.%e.%p
which includes process name and PID. The PID file still ends up in the data directory of the PostgreSQL instance, which could also be changed, but I didn't find it necessary.

Stick the above line in /etc/sysctl.d/local.conf and reload with
service procps force-reload
I actually use a setting like that on all machines now; it's just nicer.

OK, and now I'll wait for the next core file. Or not.

Thursday, May 5, 2011

ccache and clang

Just a note for the Internet: When you use ccache and clang together, you will probably get a lot of warnings like these:
clang: warning: argument unused during compilation: '-c'
clang: warning: argument unused during compilation: '-I .'
These are harmless, but if you want to get rid of them, use the clang option -Qunused-arguments, which will hide them. (The first one is already fixed in ccache.)

The reason for this is that ccache splits the compilation into separate calls to the preprocessor and the compiler proper, and it tries to sort out which of the options that you called it with go with which call. But since gcc doesn't complain about passing -c to the preprocessor or -I to the compiler, ccache doesn't bother about sorting this out (bug). That's why you don't lose any information relative to using gcc if you use the -Qunused-arguments option.

Also, if you like clang's colored diagnostics messages, you'll have to turn them on explicitly with -fcolor-diagnostics, because when running through ccache, clang doesn't think it's printing to a terminal and turns off the color by default.

So a complete invocation might look like this:
./configure CC='ccache clang -Qunused-arguments -fcolor-diagnostics'

Friday, April 22, 2011

Emacs fill column for PostgreSQL

Since it's pgindent season, here's a small related tip for Emacs users.
(setq fill-column 79)
This will set the maximum line length for filling commands such as M-q to the same length that pgindent uses. That is especially useful when you want to reformat block comments nicely. The default is 70, which will cause perpetual conflicts between Emacs and pgindent.

The commit shows various other ways to include this in your Emacs configuration. (I use the c-add-style approach, personally.)

Thursday, March 10, 2011

My new Git pre-commit hook

This appears to be kind of useful:
#!/bin/sh

output=$(find . -name '.#*' -print)
if [ -n "$output" ]; then
        echo "unsaved Emacs files:" 1>&2
        echo "$output" 1>&2
        exit 1
fi
Had that kind of problem a few times. :-)

Now what would be really handy are system-wide Git hooks that apply to all repositories, like ~/.gitignore complements .git/info/exclude.

Sunday, February 6, 2011

Squeeze + PostgreSQL = Broken

The PostgreSQL package in Debian squeeze, just released, is linked with libedit instead of libreadline. This has two interesting properties:
If either of these is a concern, think carefully before you upgrade.

Is there a way to at least configure libedit to accept non-ASCII characters?

Wednesday, January 12, 2011

Perception vs. Reality

So this is interesting: In the Stack Overflow Annual User Survey, 62.2% of respondents claim they are "proficient" in SQL. This tops all the languages listed. This is perhaps not that surprising, but I had at the same time — subjectively — noticed an abundance of let's say clueless questions and suboptimal answers on SQL and RDBMS topics in the StackExchange network. Quite clearly SQL is somewhat different from algorithmic programming languages in that there is a gap between being familiar with the language and really understanding its effects.

Friday, January 7, 2011

Git commit mode

Hardly anything ruins a glorious day of coding like fat-fingering the commit message late at night as you doze off, and then pushing it out for the world to see. To prevent that, I have equipped my Emacs configuration with a few little tools now.

First, I found the git-commit-mode, a special mode for Git commit messages. This helps you format the commit messages according to convention, and will use ugly colors if, for example, you write lines that are too long or you do not keep the second line blank. It also allows the use of things like M-q without shredding the whole file template.

Second, I integrated on-the-fly spell checking into the git-commit-mode. It won't stop you from writing nonsense, but it will catch the silly mistakes.

Here's a simple configuration snippet:
(require 'git-commit)
(add-hook 'git-commit-mode-hook 'turn-on-flyspell)
(add-hook 'git-commit-mode-hook (lambda () (toggle-save-place 0)))
The last line is handy if you have save-place on by default. When you make a new commit, it would then normally place the cursor where a previously edited commit message was finished, because to the save-place functionality, it looks as though it's the same file.