Monday, October 1, 2012

psqlrc files

In PostgreSQL 9.2, you can use major-version-specific .psqlrc files, such as .psqlrc-9.2. PostgreSQL 9.2 also added the "include relative" command \ir to psql. Combining these two, you can set up psql initialization to take advantage of any new features you want without breaking the use of old psql releases.

For example, I'd like to set up psql to automatically use \x auto. But if I just put that into .psqlrc, older psql releases will complain about an unknown command. (I usually have multiple PostgreSQL versions installed, and I share dotfiles across hosts.) On the other hand, I don't want to have to duplicate the entire .psqlrc file to add one command, which is where \ir comes in.

Here is what I use, for example:

.psqlrc-9.2:

\ir .psqlrc
\set QUIET yes
\set COMP_KEYWORD_CASE preserve-lower
\x auto
\unset QUIET

.psqlrc-9.3:

\ir .psqlrc-9.2

Tuesday, September 11, 2012

pgxnclient supports tarballs and HTTP

Need to install a PostgreSQL server add-on module? The devel branch of pgxnclient now supports this type of thing:
pgxnclient install http://pgfoundry.org/frs/download.php/3274/plproxy-2.4.tar.gz
This downloads, unpacks, builds, and installs. And the module doesn't need to be on PGXN. And of course you don't have to use HTTP; a file system location will work as well. I think this can be very useful, especially during development, when not everything is available in packaged form, or even for deployment, if you don't want to bother packaging everything and have been installing from source anyway.
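
For example, installing from a local file works the same way (the path here is just a placeholder):

pgxnclient install /tmp/plproxy-2.4.tar.gz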

Monday, August 13, 2012

Funny version numbers

Often, I install a new Debian package using apt-get install, and as the progress output flies by, I wonder: whoa, should I really be using a piece of software with that version number?

It says a lot, after all. If I see

tool 2.4.1-2
then I (subconsciously) think, yeah, the upstream maintainers are obviously sane, the tool has been around for a while, they have made several major and minor releases, and what I'm using has seen about one round of bug fixing, and a minimal amount of tweaking by the Debian maintainer.

On the other hand, when I see

tool 7.0.50~6.5~rc2+0.20120405-1
I don't know what went on there. The original release version 7.0.50 was somehow wrong and had to be renamed 6.5? And then the 2nd release candidate of that? And then even that wasn't good enough, and some dated snapshot had to be taken?

Now, of course, there are often reasons for things like this, but it doesn't look good, and I felt it was getting out of hand a little bit.

I tried to look into this some more and find a regular expression for a reasonably sane version number. It's difficult. This is how far I've gotten: https://gist.github.com/3345974. But this still lists more than 1500 packages with funny version numbers. Which could be cause for slight concern.

Take a look at what this prints. You can't make some of that stuff up.
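
Just to illustrate the general idea (this is not the expression from the gist, only a heavily simplified sketch), one could list the installed packages whose version doesn't look like a plain dotted upstream version plus Debian revision:

dpkg-query -W -f '${Package} ${Version}\n' |
    grep -E -v ' [0-9]+(\.[0-9]+)*(-[0-9.]+)?$'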

Wednesday, July 18, 2012

Tracing shell scripts with time stamps

A random tip for shell script hackers. You know that with set -x you can turn on tracing, so that every command is printed before being executed. In bash, you can also customize the output prefix by setting the PS4 variable. The default is PS4='+ '. Here is an example. I wanted to "profile" a deployment script, to see why it took so long. Ordinarily, I might have sprinkled it with date calls. Instead, I merely added
set -x
PS4='+\t '
near the top. \t stands for time stamp. (The script was already using bash explicitly, as opposed to /bin/sh.) That way, every line is prefixed by a time stamp, and the logs could easily be analyzed to find a possible performance bottleneck.
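
Here is a minimal self-contained sketch of the technique (the sleep commands just stand in for slow deployment steps):

#!/bin/bash
set -x
PS4='+\t '   # \t expands to the current time, e.g. +14:03:59
sleep 1      # pretend this is a slow step
sleep 2      # and another one

Every traced command then comes out prefixed with a time stamp, roughly like +14:03:59 sleep 1.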

Sunday, May 20, 2012

Base backup compression options

I've been looking at my PostgreSQL base backups. They are run using the traditional

tar -c -z -f basebackup.tar.gz $PGDATA/...

way (many details omitted). I haven't gotten heavily into using pg_basebackup yet, but the following could apply there just as well.

I had found some of the base backups to be pretty slow, so I dug a little deeper. I was surprised to find that the job was completely CPU bound. The blocking factor was the gzip process. So it was worth thinking about other compression options. (The alternative is of course no compression, but that would waste a lot of space.)

There are two ways to approach this:

  • Use a faster compression method.

  • Parallelize the compression.

For a faster compression method, there is lzop, for example. GNU tar has support for that, by using --lzop instead of -z. It gives a pretty good speed improvement, but the compression results are of course worse.

For parallelizing compression, there are parallel (multithreaded) implementations of the well-known gzip and bzip2 compression methods, called pigz and pbzip2, respectively. You can hook these into GNU tar by using the -I option, something like -I pigz. Alternatively, put them into a pipe after tar, so that you can pass them some options. Because otherwise they will bring your system to a screeching halt! If you've never seen a system at a constant 1600% CPU for 10 minutes, try these.
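
For illustration, the pipe variant could look like this (file list abbreviated as above, process count arbitrary); the -p option of pigz limits how many processors it uses:

tar -c -f - $PGDATA/... | pigz -p 4 > basebackup.tar.gz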

If you have a regular service window or natural slow time at night or on weekends, these tools can be quite useful, because you might be able to cut down the time for your base backup from, say, 2 hours to 10 minutes. But if you need to be always on, you will probably want to qualify this a little by reducing the number of CPUs used for this job. It can still be pretty effective if you have many CPUs and want to dedicate a couple to the compression task for a while.

Personally, I have settled on pigz as my standard weapon of choice now. It's much faster than pbzip2 and can easily beat single-threaded lzop. Also, it produces standard gzip output, of course, so you don't need to install special tools everywhere, and you can access the file with standard tools in a bind.

Also, consider all of this in the context of restoring. No matter how you take the backup, wouldn't it be nice to be able to restore a backup almost 8 or 16 or 32 times faster?

I have intentionally not included any benchmark numbers here, because it will obviously be pretty site-specific. But it should be easy to test for everyone, and the results should speak for themselves.

Tuesday, May 15, 2012

My (anti-)take on database schema version management

There were a number of posts recently about managing schema versions and schema deployment in PostgreSQL. I have analyzed these with great interest, but I have concluded that they are all more or less significantly flawed. (Of course, most of these solutions do in fact work for someone, but they are not general enough to become canonical go-to solutions for this problem class, in my opinion.) I have developed a list of elimination criteria by which I can evaluate future developments in this area. So here are some of the things that I don't want in my schema version management system:

  • Using schemas for distinguishing multiple versions (like this, but that's actually more about API versioning). That simply won't work for deploying objects that are not in schemas, such as casts, languages, extensions, and, well, schemas.

  • Using extensions (like this). Well, this could work. But extensions by themselves do nothing about the core problem. They are just an SQL wrapper interface around upgrade scripts. You still need to write the upgrade scripts, order them, test them, package them. The extension mechanism might replace the, say, shell script that would otherwise run the upgrade files in a suitable order. Another issue is that extensions require access to the server file system. Changing this is being discussed as "inline extensions", but there is no consensus. This is a smaller problem, but it needs to be thought about. Also, I do need to support PostgreSQL 9.0 and earlier for a little while more.

  • Requiring naming each change (patch names, like this). Naming things is hard. Numbering things is easy. And how many good names are you going to still be able to come up with after 100 or so changes?

    Take a lesson from file version control systems: versions are numbers or, if it must be, hashes or the like (UUIDs have been suggested).

  • Using a version control tool for tracking upgrade paths (like this). Sqitch, unlike the initial draft of this concept, doesn't actually require a version control tool for deployment, which wouldn't have worked for me, because what we ship is a tarball or a deb/rpm-type package. But it still requires you to maintain some kind of sanity in your version control history so that the tool can make sense out of it. That sounds fragile and inconvenient. The other choice appears to be writing the plan files manually without any VCS involvement, but then this loses much of the apparent appeal of this tool, and it's really no better than the "naming each change" approach mentioned above.

  • Taking snapshots or the like of a production or staging or central development system. Production systems and staging systems should be off limits for this sort of thing. Central development systems don't exist, because with distributed version control, every developer has their own setups, branches, deployments, and world views.

    You could set it up so that every developer gets their own test database, sets up the schema there, takes a dump, and checks that in. There are going to be problems with that, including that dumps produced by pg_dump are ugly and optimized for restoring, not for developing with, and they don't have a deterministic output order.

  • Storing the database source code in a different place or in a different manner than the rest of the source code. This includes using a version control system as mentioned above (meaning storing part of the information in the version control meta information rather than in the files that are checked into the version control system in the normal way), using a separate repository like Post Facto, or using something like the staging server mentioned earlier.

    The source is the source, and it must be possible to check out, edit, build, test, and deploy everything in a uniform and customary manner.

  • Allowing lattice-like dependencies between change sets (like most examples cited above). This sounds great on paper, especially if you want to support distributed development in branches. But then you can have conflicts, for example where two upgrades add a column to the same table. Depending on the upgrade path, you end up with different results. As your change graph grows, you will have an exploding number of possible upgrade paths that will need to be tested.

    There needs to be an unambiguous, canonical state of the database schema for a given source checkout.

  • Requiring running through all the upgrade scripts for a fresh deployment (like this). There are two problems with this. First, it's probably going to be very slow. Fresh deployments need to be reasonably fast, because they will be needed for automated tests, including unit tests, where you don't want to wait for minutes to set up the basic schema. Second, it's inefficient. Over time, you might drop columns, add new columns, delete rows, migrate them to different tables, etc. If you run through all those upgrade scripts, then a supposedly fresh database will already contain a bunch of rubble, dropped attributes, dead rows, and the like.

    Therefore, the current version needs to be deployable from a script that will not end up replaying history.

  • Using metalanguages or abstraction layers (like Pyrseas or Liquibase or any of the metaformats included in various web frameworks). It'd probably be a good idea to check some of these out for simple applications. But my concern is whether using an abstraction layer would prevent me from using certain features. For example, look at the Pyrseas feature matrix. It's pretty impressive. But it doesn't support extensions, roles, or grants. So (going by that list), I can't use it. (It's being worked on.) And in a previous version, when I looked at it for a previous project, it didn't support foreign-data functionality, so I couldn't use it then either. And those are just the top-level things the author thought of. Actually, the Pyrseas author has gone through some effort to have almost complete coverage of PostgreSQL DDL features, so give this tool a try. But it won't be for everyone.

So, I'm going to keep looking.

Monday, May 14, 2012

Time to retrain the fingers

For years, no, decades, I've typed tar tzf something, tar xzf something. Except when someone annoying sent an uncompressed tar file and I then had to go and take out the z in the middle.

Then came bzip2, and we learned tar tjf, tar xjf. OK, I could live with that. One emerging problem was that the tab completion now worked the wrong way around conceptually, because you had to pick and type the right letter first in order to see the appropriate set of files to unpack offered for completion.

Then came lzma, which was (quick, guess?), tar tJf, tar xJf. And then there was lzop, which was too boring to get its own letter, so you had to type out tar -x --lzop -f.

But lzma was short-lived, because then came xz, which was also J, because lzma was now too boring as well to get its own letter.

Oh, and there is also the old compress, which is Z, and lzip, which I'd never heard of.

But stop that. Now there is

 -a, --auto-compress
            use archive suffix to determine the compression program

This handles all the above compression programs, and no compression. So from now on, I always use tar taf and tar xaf. Awesome.

The finger movements will be almost the same on QWERTY and AZERTY, and easier than before on QWERTZ.

Actually, this option is already four years old in GNU tar. Funny I'd never heard of it until recently.

Sunday, April 29, 2012

Setting the time zone on remote SSH hosts

The task: I have one or more desktop/laptop machines with varying local time zones (because the people using them are actually in different time zones, or because the one person using them travels). I also have a number of servers configured in some random time zones. (It could be the time zone where they are physically located, or the time zone of the person who installed it, or UTC for neutrality.)

Now what I would like to have happen is that if I log in using SSH from a desktop to a server, I see time on that server in my local time zone. For things like ls -l, for example. Obviously, this illusion will never be perfect. Nothing (except something very complicated) will adjust the time stamps in the syslog output, for example. But the ls -l case in particular seems to come up a lot, to check how long ago was this file modified.

This should be completely doable in principle, because you can set the TZ environment variable to any time zone you like, and it will be used for things like ls -l. But how do you get the TZ setting from here to there?

First, you have to make the remote SSH server accept the TZ environment variable. At least on Debian, this is not done by default. So make a setting like this in /etc/ssh/sshd_config:

# Allow client to pass locale environment variables
AcceptEnv LANG LC_* TZ

You also need to make the equivalent setting on the client side, either in /etc/ssh/ssh_config or in ~/.ssh/config:

SendEnv LANG LC_* TZ

Which leaves the question, how do you get your local time zone into the TZ variable to pass to the remote server? The actual time zone configuration is the file /etc/localtime, which belongs to glibc. In current Debian, this is (normally) a copy of some file under /usr/share/zoneinfo/. In the distant past, it was a symlink, which would have made things easier, but now it's a copy, so you don't know where it came from. But the name of the time zone is also written to /etc/timezone, so you can use that.

The format of the TZ environment variable can be found in the glibc documentation. If you skip past most of the text, you will see the relevant parts:

The third format looks like this:

:CHARACTERS

Each operating system interprets this format differently; in the GNU C library, CHARACTERS is the name of a file which describes the time zone.

So what you could do is set

TZ=":$(cat /etc/timezone)"

Better yet, for hopping through multiple SSH hosts in particular, make sure to preserve an already set TZ:

TZ=${TZ:-":$(cat /etc/timezone)"}

And finally, how does one hook this into ssh? The best I could think of is a shell alias:

alias ssh='TZ=${TZ:-":$(cat /etc/timezone)"} ssh'
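
A quick way to check that the whole chain works (the host name is a placeholder):

ssh somehost 'echo $TZ; date'

If everything is in place, the remote date output appears in your local time zone.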

Now this setup has a number of flaws, including:

  • Possibly only works between Linux (Debian?) hosts.

  • Only works if available time zone names match.

  • Only works when calling ssh from the shell.

But in practice it has turned out to be quite useful.

Comments? Improvements? Better ideas?

Related thoughts:

  • With this system in hand, I have resorted to setting the time zone on most servers to UTC, since I will see my local time zone automatically.

  • Important for the complete server localization illusion: some ideas on dealing with locales on remote hosts.

Wednesday, March 21, 2012

PostgreSQL and compiler warnings

Recently, I did some work on backpatching a few commits from PostgreSQL master, and I noticed that with the current tools, the old branches create tons of compiler warnings. In PostgreSQL 8.3, the oldest currently supported branch, a make all with GCC 4.6.3 produces 231 warnings! (Also note that there are only 751 .c files, so that's a warning every three files.) We do a lot of work cleaning up any and all compiler warnings, at least those issued by the latest GCC. These kinds of noisy builds are quite troublesome to work with, because it is more difficult to check whether your changes introduced any new, more serious warnings.
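
Something along these lines gives comparable counts (this is only a sketch; it is not necessarily the exact method used for the numbers below):

make -s all 2>&1 | grep -c ' warning: '   # count compiler warning lines
find . -name '*.c' | wc -l                # count .c files in the tree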

Let's take a look at the current number of compiler warnings in different PostgreSQL branches with different compilers:

          gcc 4.4   gcc 4.5   gcc 4.6   gcc 4.7   clang
8.3           173        51       231       207     665
8.4            12        17       201       201     673
9.0            13        13        89        89     780
9.1            24        24        40        40      25
master          1         1         1         1       1

Obviously, GCC 4.6 introduced many new warnings. If you use the compiler that was current around the time the branch was originally released, you'll be better off. But even then, you should expect a few surprises. (8.3 would probably require gcc 4.3, but I don't have that available anymore.)

Fortunately, it looks as though GCC 4.7, which is currently in release candidate state, will spare us from new warnings. Also note that clang (version 3.0) is now as good as GCC, as far as noise is concerned.

Tuesday, March 6, 2012

PostgreSQL make install times

I have decided that make install is too slow for me. Compare: A run of make install takes about 10 seconds (details below), but a run of make all with the tree mostly up to date and using ccache for the rest usually takes about 1 or 2 seconds. You can end up wasting a lot of time if you need to do many of these build and install cycles during development. In particular, make check includes a run of make install, so all this time is added to the time it takes for tests to complete.

So let's optimize this. The times below are all medians from 5 consecutive runs, writing over an existing installation, so they all had to do the same amount of work.
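
A small loop is enough to collect such timings (again just a sketch of the method, not necessarily the exact commands used):

for i in 1 2 3 4 5; do
    /usr/bin/time -f '%e s' make install >/dev/null
done

With GNU time, %e prints the elapsed wall-clock seconds for each run; the median of the five values is then reported.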

This is the baseline:

  • make install — 10.493 s

The first change is to use a faster shell. This system is using bash as /bin/sh. Many Linux distributions now use dash instead, but for some reason I haven't changed this system during the upgrade.

  • make install SHELL=/bin/dash — 6.344 s
I guess I'll be switching this system soon as well then!

The next thing is to avoid installing the translation files. These explode the number of files that need to be installed: instead of, say, one program file, you end up installing one program file and a dozen or so translation files.

  • make install SHELL=/bin/bash enable_nls=no — 6.890 s
  • make install SHELL=/bin/dash enable_nls=no — 4.482 s
(In practice you would use configure --disable-nls, which is the default. The above is just a way to do this without reconfiguring.) Now I have in the past preferred to build with NLS support to be able to catch errors in that area, but considering this improvement and the availability of the make maintainer-check target, I might end up building without it more often.

Another tip I remembered from the past was to use the make -s option to avoid screen output. Depending on the operating system and whether you are logged in locally or remotely, this can be a big win. On my system, this got lost in the noise a bit, but it appeared to make a small difference over many runs.

  • make install SHELL=/bin/bash -s — 10.511 s
  • make install SHELL=/bin/dash -s — 6.146 s
Do add this to your arsenal anyway if you want to get maximum performance.

Next, let's replace the install-sh script that does the actual file copying. For obscure reasons, PostgreSQL always uses that shell script, instead of the /usr/bin/install program that an Autoconf-based build system would normally use. But you can override the make variables and substitute the program you want:

  • make install SHELL=/bin/bash INSTALL=install — 5.418 s
  • make install SHELL=/bin/dash INSTALL=install — 3.995 s
Interestingly, the choice of shell still makes a noticeable difference, even though it's no longer used to execute install-sh.

Finally, you can also use parallel make for the installation step:

  • make install SHELL=/bin/bash -j2 — 6.538 s
  • make install SHELL=/bin/dash -j2 — 4.158 s
You can gather from these numbers that the installation process appears to be mostly CPU-bound. This system has 4 cores, so let's add some more parallelization:
  • make install SHELL=/bin/dash -j3 — 3.330 s
  • make install SHELL=/bin/dash -j4 — 2.944 s
  • make install SHELL=/bin/dash -j5 — 2.930 s
  • make install SHELL=/bin/dash -j6 — 2.952 s
That's probably enough.

Now let's put everything together:

  • make install SHELL=/bin/dash enable_nls=no INSTALL=install -s -j4 — 1.708 s
Or even:
  • make install SHELL=/bin/dash enable_nls=no INSTALL=install -s -j3 — 1.654 s
That's a very nice improvement from 10.493 s!

The problem is, it is not all that easy to pass these options to the make install calls made in make check runs. If you can and want to change your system shell, and you configure without NLS support, then you will probably already be more than halfway there. Then again, I suspect most readers already have that setup anyway. For the other options, to bring the installation time down to nearly instantaneous, you have to do ad hoc surgery in various places. I'm looking into improving that.