Wednesday, September 25, 2013

Design by committee

Design by committee is usually a term of abuse, but sometimes it's not the worst alternative. At the opposite end of the spectrum is design by disconnected individuals. That is how you get
ALTER TABLE tbl OWNER TO something
but
ALTER TABLE tbl SET SCHEMA something
in PostgreSQL.

Maybe a committee faced with this inconsistency would arrive at the compromise

ALTER TABLE tbl [SET] {OWNER|SCHEMA} [TO] something
?
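
Spelled out, that grammar would accept all of the following (the last two being hypothetical, not valid PostgreSQL syntax):

ALTER TABLE tbl OWNER TO new_owner;     -- accepted today
ALTER TABLE tbl SET SCHEMA new_schema;  -- accepted today
ALTER TABLE tbl SET OWNER new_owner;    -- hypothetical
ALTER TABLE tbl SCHEMA TO new_schema;   -- hypothetical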

Wednesday, August 28, 2013

Testing PostgreSQL extensions on Travis CI revisited

My previous attempt to set up multiple-PostgreSQL-version testing on Travis CI worked OK, but didn't actually make good use of the features of Travis CI. So I stole, er, adapted an idea from clkao/plv8js, which uses an environment variable matrix to control which version to use. This makes things much easier to manage and actually fires off parallel builds, so it's also faster. I've added this to all my repositories for PostgreSQL extensions now. (See some examples: pglibuuid, plxslt, pgvihash, pgpcre, plsh)
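
The gist of the approach is an environment matrix in .travis.yml, roughly like this (a simplified sketch, not my exact configuration; it assumes the postgresql-server-dev-* packages are installable on the build machine):

# .travis.yml (sketch)
language: c
env:
  - PGVERSION=9.1
  - PGVERSION=9.2
  - PGVERSION=9.3
before_install:
  - sudo apt-get update
  - sudo apt-get install -y postgresql-server-dev-$PGVERSION
script:
  - make PG_CONFIG=/usr/lib/postgresql/$PGVERSION/bin/pg_config
  - sudo make install PG_CONFIG=/usr/lib/postgresql/$PGVERSION/bin/pg_config

Each entry in the env list becomes a separate parallel build.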

Automating patch review

I think there are two kinds of software development organizations (commercial or open source):

  1. Those who don’t do code review.

  2. Those who are struggling to keep up with code review.

PostgreSQL is firmly in the second category. We never finish commit fests on time, and lack of reviewer resources is frequently mentioned as one of the main reasons.

One way to address this problem is to recruit more reviewers. That has been tried; it’s difficult. The other way is to reduce the reviewer resources required. We can do this by automating things a little bit.

So I came up with a bag of tools that does the following:

  1. Extract the patches from the commit fest into Git.

  2. Run those patches through an automated test suite.

The first part is done by my script commitfest_branches. It extracts the email message ID for the latest patch version of each commit fest submission (either from the database or the RSS feed). From the message ID, it downloads the raw email message and extracts the actual patch file. Then that patch is applied to the Git repository in a separate branch. This might fail, in which case I report that back. At the end, I have a Git repository with one branch per commit fest patch submission. A copy of that Git repository is made available here: https://github.com/petere/postgresql-commitfest.
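
In rough pseudo-shell, the per-submission flow looks something like this (a sketch of the idea, not the actual commitfest_branches code; the message ID, archive URL, and file names are made up):

msgid='patch-v2-message-id@example.com'       # latest patch version, from the CF app
curl -s "$ARCHIVE_URL/raw/$msgid" > msg.mbox  # download the raw email
munpack msg.mbox                              # extract the patch attachment
git checkout -b commitfest/submission-1234 master
git apply --index feature-v2.patch || echo 'does not apply; report back'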

The second part is done by my Jenkins instance, which I have written about before. It runs the same job that it runs for the normal Git master branch, but over all the branches created for the commit fest. At the end, you get a build report for each commit fest submission. See the results here: http://pgci.eisentraut.org/jenkins/view/PostgreSQL/job/postgresql_commitfest_world/. You’ll see that a number of patches had issues: most were compiler warnings, a few had documentation build issues, and a couple had genuine build failures. Several (older) patches failed to apply; those don’t show up in Jenkins at all.

This is not tied to Jenkins, however. You can run any other build automation against that Git repository, too, of course.

There are still some manual steps required. In particular, commitfest_branches needs to be run by hand, and the build results need to be reported back manually. Fiddling with all those branches is error-prone. But overall, this is much less work than manually downloading and building all the patch submissions.

My goal is that by the time a reviewer gets to a patch, it has already been verified that the patch applies, builds, and passes the tests. Then the reviewer can concentrate on validating the purpose of the patch and checking the architectural decisions.

What needs to happen next:

  • I’d like an easier way to post feedback. Given a message ID for the original patch submission, I need to fire off a reply email that properly attaches to the original thread. I don’t have an easy way to do that. (The header mechanics themselves are simple; see the sketch after this list.)

  • Those reply emails would then need to be registered in the commit fest application. Too much work.

  • There is another component to this work flow that I have not finalized: checking regularly whether the patches still apply against the master branch.

  • More automated tests need to be added. This is well understood and a much bigger problem.
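
As an aside on the first point, the threading itself is just a matter of setting the standard In-Reply-To and References headers; here is a minimal sendmail sketch (addresses and message ID are placeholders):

msgid='<original-patch-message@example.com>'
sendmail -t <<EOF
From: ci@example.com
To: pgsql-hackers@postgresql.org
Subject: Re: [PATCH] original subject here
In-Reply-To: $msgid
References: $msgid

Automated build report ...
EOF

The missing piece is automating this end to end, not the header mechanics.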

In the meantime, I hope this is going to be useful. Let me know if you have suggestions, or send me pull requests on GitHub.

Wednesday, July 17, 2013

Testing PostgreSQL extensions on Travis CI

I have cobbled together some scripts to be able to test PostgreSQL extensions against multiple PostgreSQL major versions on Travis CI. (This requires that the extension is hosted on GitHub.) See the configuration for PL/sh and the build output as examples. Perhaps others will find this useful for their extensions as well.

Tuesday, July 16, 2013

Tricky shell local variables

I have a word of warning against improper use of local in shell functions.

If you are using shell functions, you might want to declare some variables local to the shell function. That is good. The basic syntax for that is

local a b c

In some shells, you can also combine the local declaration and assignment, like this:

local foo=$1
local bar=$2

(The Debian policy even explicitly allows it.)

This is somewhat dangerous.

Bare shell assignment like

foo=$bar

does not perform word splitting, so the above is safe even if there are spaces in $bar. But the local command does perform word splitting (because it can take multiple arguments, as in the first example), so the seemingly similar

local foo=$bar

is not safe.

This can be really confusing when you add local to existing code and it starts breaking.
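
Here is a concrete demonstration (behavior varies by shell; dash, for example, splits):

f() {
    local foo=$1    # word splitting happens here in, e.g., dash
    echo "$foo"
}

f "hello world"     # prints just "hello" in such shells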

You can avoid this, of course, by always quoting everything, like

local foo="$bar"

but overquoting isn't always desirable, because it can make code less readable when commands are nested, like

local foo="$(otherfunc "other arg")"

(Nesting is legal and works fine in this case, however.)

I suggest using local only for declaring variables, and using separate assignment statements. That way, all assignments are parsed in the same way.
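
In other words (reusing the hypothetical otherfunc from above):

myfunc() {
    local foo bar                   # declaration only
    foo=$1                          # plain assignments: never word-split
    bar=$(otherfunc "other arg")
    echo "$foo $bar"
}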

Tuesday, June 18, 2013

Autopex: PostgreSQL extension installation magic

Autopex is the brainchild of a long night at the Royal Oak. It ties together Pex and event triggers to magically download and build any extension that you install. So after you have set everything up, you can do, say, CREATE EXTENSION plproxy, and it will transparently download and build plproxy for you. (Actually, this only works if the extension name is the same as the package name. I'm planning to fix that.)
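
I won't reproduce the actual Autopex code here, but the general shape of the mechanism is an event trigger along these lines (a sketch; autopex_handler is a hypothetical name for the function that shells out to pex):

CREATE EVENT TRIGGER autopex_install
    ON ddl_command_start
    WHEN TAG IN ('CREATE EXTENSION')
    EXECUTE PROCEDURE autopex_handler();

-- autopex_handler() would run something like "pex install <name>"
-- before the CREATE EXTENSION command proper executes.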

Note 1: You can't install Autopex via Pex, yet.

Note 2: I guess the next logical step would be Autoautopex, which installs Autopex and Pex automatically somehow. Patches welcome.

I suppose with logical replication, this might actually end up installing the extension code on the replication slaves as well. That would be pretty neat.

Wednesday, May 1, 2013

Moving to C++

GCC 4.8 was recently released. This is the first GCC release that is written in C++ instead of C. Which got me thinking ...

Would this make sense for PostgreSQL?

I think it's worth a closer look.

Much of GCC's job isn't actually that different from PostgreSQL's. It parses language input, optimizes it, and produces some output. It doesn't have a storage layer; it just produces code that someone else runs. Also note that Clang and LLVM are written in C++. I think it would be fair to say that these folks are pretty well informed about selecting a programming language for their job.

It has become apparent to me that C is approaching a dead end. Microsoft isn't updating their compiler to C99, advising people to move to C++ instead. So as long as PostgreSQL (or any other project, for that matter) wants to support that compiler, they will be stuck on C89 forever. That's a long time. We have been carefully introducing the odd post-C89 feature, guarded by configure checks and #ifdefs, but that will either come to an end, or the range of compilers that actually get the full benefit of the code will become narrower and narrower.

C++ on the other hand is still a vibrant language. New standards come out and get adopted by compiler writers. You know how some people require Java 7 or Python 2.7 or Ruby 1.9 for their code? You wish you could have that sort of problem for your C code! With C++ you reasonably might.

I'm also sensing that at this point there are more C++ programmers than C programmers in the world. So using C++ might help grow the project better. (Under the same theory that supporting Windows natively would attract hordes of Windows programmers to the project, which probably did not happen.)

Moving to C++ wouldn't mean that you'd have to rewrite all your code as classes or that you'd have to enter template hell. You could initially consider a C++ compiler a pickier C compiler, and introduce new language features one by one, as you had done before.

Most things that C++ is picky about are things that a C programmer might appreciate anyway. For example, it refuses implicit conversions between void pointers and other pointers, or intermixing different enums. Actually, if you review various design discussions about the behavior of SQL-level types, functions, and type casts in PostgreSQL, PostgreSQL users and developers generally lean on the side of a strict type system. C++ appears to be much more in line with that thinking.
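
For example, here are two things a C compiler accepts silently but a C++ compiler rejects:

#include <stdlib.h>

enum color { RED, GREEN };
enum fruit { APPLE, PEAR };

int main(void)
{
    int *p = malloc(sizeof(int));  /* fine in C; error in C++: implicit conversion from void * */
    enum color c = APPLE;          /* fine in C; error in C++: wrong enum type */
    free(p);
    return c;
}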

There are also a number of obvious areas where having the richer language and the richer standard library of C++ would simplify coding, reduce repetition, and avoid bugs: memory and string handling; container types such as lists and hash tables; fewer macros necessary; the node management in the backend screams class hierarchy; things like xlog numbers could be types with operators; careful use of function overloading could simplify some complicated internal APIs. There are more. Everyone probably has their own pet peeve here.

I was looking for evidence of this C++ conversion in the GCC source code, and it's not straightforward to find. As a random example, consider gimple.c. It looks like a normal C source file at first glance. It is named .c after all. But it actually uses C++ features (exercise for the reader to find them), and the build process compiles it using a C++ compiler.

LWN has an article about how GCC moved to C++.

Thoughts?

Tuesday, April 2, 2013

Installing multiple PostgreSQL versions on Homebrew

I was going to post this yesterday, but some might have thought that it was a joke. April 1st is always an annoying day to communicate real information.

If you have been fond of the way Debian and Ubuntu manage multiple PostgreSQL versions in parallel, you can now have the same on OS X with Homebrew:

brew tap petere/postgresql
brew install postgresql-9.2
# etc.
brew install --HEAD postgresql-common

postgresql-common is the same code as in Debian, only mangled a little.

Now you have all the client programs symlinked through pg_wrapper, and you can use the server management tools such as:

pg_createcluster 9.2 main
pg_ctlcluster 9.2 main start
pg_lsclusters

Let me know if you find this useful.


Sunday, February 17, 2013

Lists and Tuples

I had initially found the divide between lists and tuples in Python confusing. I came from a database background, so I had a certain expectation of what a tuple might be. If you read up on what the difference is in Python, you will find that a) tuples are immutable, and b) singleton tuples use a funny syntax. So just use lists, because they're easier to read and you can't go wrong that way. Oh, and they are both sequences, another overloaded term.

(Yes, there are some details omitted here, such as that since a tuple is immutable, it is hashable and can be used as a dictionary key. But I think that is used fairly seldom.)
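
For example:

d = {(1, 2): "a"}    # works: tuples are hashable
d = {[1, 2]: "a"}    # TypeError: unhashable type: 'list'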

Then I came across Haskell and it dawned on me: Was this just a poorly mangled feature from Haskell? I don't know the history, but it looks a bit like it. You see, Haskell also has lists and tuples. Lists are delimited by square brackets, and tuples are delimited by parentheses:

let alist = [1, 2, 3]
let atuple = (1, 2, 3)

(Technically, in Python, tuples are not delimited by parentheses, but they often appear that way.) But the difference is that Haskell does not use parentheses for other purposes, such as delimiting function arguments; it uses spaces for that. (So it looks more like a shell script at times.)

Python:  len([1, 2, 3])
Haskell: length [1, 2, 3]

But in Haskell, tuples are not mutable lists and lists are not mutable tuples. Tuples and lists are quite different but complementary things. A list can only contain elements of the same type. So you can have the lists

[1, 2, 3, 4, 5]
["a", "b", "c", "d"]

but not

[1, 2, "a"]

A tuple, on the other hand, can contain values of different types:

(1, 2, "a")
(3, 4, "b")

A particular type combination in a tuple creates a new type on the fly, which becomes interesting when you embed tuples in a list. So you can have the list

[(1, 2, "a"), (3, 4, "b")]

but not

[(1, 2, "a"), (3, 4, 5)]

Because Haskell is statically typed, it can verify this at compile time.

If you think in terms of relational databases, this use of the term tuple makes a lot of sense. A result set from a database query would be a list of tuples.

The arrival of the namedtuple also supports the notion that tuples should be thought of as combining several pieces of data of different natures, but of course this is not enforced in either tuples or named tuples.
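
For example, a database row modeled as a named tuple (a made-up illustration):

from collections import namedtuple

Row = namedtuple('Row', ['id', 'name'])
r = Row(1, 'a')
r.name           # 'a', accessed by field name
r == (1, 'a')    # True: still an ordinary tuple underneath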

Now, none of this is relevant to Python. Because of duck typing, a database query result set might as well be a list of lists or a tuple of tuples or something different altogether that emulates sequences. But I found it useful to understand where this syntax and terminology might have come from.

Looking at the newer classes set and frozenset, it might also help to sometimes think of a tuple as a frozenlist instead, because this is closer to the role it plays in Python.

Thursday, February 14, 2013

pgindent Jenkins job

I have set up a Jenkins job that runs pgindent. Besides checking that the procedure of running pgindent works, it also provides a preview of what pgindent would do with the current source (pgindent.diff), which can be educational or terribly confusing.

Friday, February 1, 2013

Introducing the Pex package manager for PostgreSQL

I have written a new lightweight package manager for PostgreSQL, called pex. It's targeted at developers, allows easy customization, and supports multiple PostgreSQL installations.

Here is how it works:

Installation:

git clone git://github.com/petere/pex.git
cd pex
sudo make install

Install some packages:

pex init
pex install plproxy
pex search hash
pex install pghashlib

Multiple PostgreSQL installations:

pex -g /usr/local/pgsql2 install plproxy
pex -p 5433 install pghashlib

Upgrade:

pex update
pex upgrade

It works a bit like Homebrew, except that it doesn't use Ruby or a lot of metaphors. ;-)

Check it out at https://github.com/petere/pex.

Tuesday, January 1, 2013

PostgreSQL and Jenkins

A lot of places use Jenkins nowadays, including where I now work and where I have previously worked. I enjoy working with Jenkins, and so I always wanted to try out how it would work with PostgreSQL. Obviously, there would be some overlap with the build farm, but that's OK. The point of the build farm, after all, is to build things in many different ways to find potential problems, and this would just support that overall effort.

So I have set this up now: http://pgci.eisentraut.org/jenkins/

It's already been very helpful during the couple of weeks that I've been running it. The main point behind the effort is to automate things. These are things I do just about every day and won't have to do anymore:

  • build PostgreSQL
  • check for compiler warnings
  • run various test suites
  • do this for all supported branches

These are things I do every couple of weeks and have now automated:

  • check distribution building (make distcheck)
  • test build of additional documentation formats
  • cpluspluscheck
  • check external web links in the documentation (The job for that currently appears to be reporting false positives. Use with caution.)
  • test coverage reporting

Moreover, I have set up builds for some extensions and external modules, which weren't being regularly tested. (The build farm is making some efforts in this area, though.)

Actually, many of the checks I set up found problems immediately: newly introduced compiler warnings, broken secondary documentation format builds, cpluspluscheck failing, broken links in the HTML documentation, and various extensions that no longer build with PostgreSQL 9.3devel.

But there is more cool stuff:

  • There are various RSS feeds for all builds or failed builds.
  • You can interact with the system on mobile devices. I use JenkinsMobi for iOS.
  • You can get up-to-date documentation builds on a more predictable schedule.

About the only thing it doesn't do is test operating system and CPU architecture portability. Jenkins comes from a Java background, where this isn't much of an issue, so there isn't good built-in support for that sort of thing. But anyway, we have the build farm for that.

You can get the code at http://bitbucket.org/petere/pgci. The entire setup is automated with Puppet. You can fork it and set up your own (or send me your changes), or you can run it locally using Vagrant (which is what I do to test changes).

If you have any ideas, let me know (file an issue on Bitbucket). I have plans for a number of enhancements already, foremost pg_upgrade testing. Also, let me know if there are additional extensions you want tested. I have just put in a few that I use myself at the moment, but others can easily be added.

Happy New Year!