Thursday, September 15, 2011

ccache and clang, part 2

There's more funny business when using ccache in combination with clang. Last time I suggested that you use the invocation
./configure CC='ccache clang -Qunused-arguments -fcolor-diagnostics'
to get rid of the "argument unused during compilation" warnings.

But you still might get loads of warnings that you wouldn't normally get without ccache, such as this example (from the PostgreSQL source code):

extension.c:382:35: warning: equality comparison with extraneous parentheses [-Wparentheses]
 if (( (((control->directory)[0]) == '/') ))
        ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
extension.c:382:35: note: use '=' to turn this equality comparison into an assignment
(This is the opposite of the warning that tells you to put two pairs of parentheses around an assignment used as a truth value.) Or:
path.c:727:7: warning: explicitly assigning a variable of type 'char *' to itself [-Wself-assign]
 path = (path);
 ~~~~ ^  ~~~~
The problem is, these come from macro expansions, so you wouldn't normally see them, because (I guess) the compiler driver is smart enough not to warn about such things when they come from macro expansions.

The way ccache works is approximately

  1. preprocess the input file
  2. look for it in the cache
  3. if not found, compile the preprocessed file
What would be better in this situation is
  1. preprocess the input file
  2. look for it in the cache
  3. if not found, compile the original file
And indeed you can turn on that second behavior by setting the obscure environment variable CCACHE_CPP2 (as in, run cpp twice):
export CCACHE_CPP2=yes
Then all these extra warnings disappear.
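
The difference between the two approaches can be reproduced without ccache by preprocessing and compiling in two separate steps. The following is a minimal sketch, assuming clang is installed; the file name selfassign.c and the function f are made up for illustration, and the macro is modeled on the self-assignment warning quoted above. Going by the behavior described here, the compile of the preprocessed file should produce the warning and the compile of the original should not, though this may vary between clang versions.

```shell
# Work in a scratch directory so nothing is left behind in the source tree.
workdir=$(mktemp -d)

# Hypothetical test file: the macro expands to a self-assignment.
cat > "$workdir/selfassign.c" <<'EOF'
#define skip_drive(path) (path)

char *f(char *path)
{
    path = skip_drive(path);
    return path;
}
EOF

if command -v clang >/dev/null 2>&1; then
    # Compile the original file; clang knows the code came from a macro.
    clang -Wself-assign -c "$workdir/selfassign.c" -o "$workdir/direct.o"

    # Preprocess first, then compile the result, as ccache does by default.
    clang -E "$workdir/selfassign.c" -o "$workdir/selfassign.i"
    clang -Wself-assign -c "$workdir/selfassign.i" -o "$workdir/preprocessed.o"
else
    echo "clang not found; skipping the compile steps" >&2
fi
```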

(The ccache man page is worth a read. There are a few interesting settings to play with.)

I'm currently playing around with a shell script ccache-clang that looks like this:

CCACHE_CPP2=yes exec ccache clang -Qunused-arguments `test -t 2 && echo -fcolor-diagnostics` "$@"
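
The backquoted part of that script adds -fcolor-diagnostics only when standard error is a terminal, so that color escape codes don't end up in log files. Here is that piece in isolation, as a small sketch; the function name color_flag is made up for illustration and is not part of ccache or clang.

```shell
# color_flag is a hypothetical helper name used only for this example.
color_flag() {
    # test -t 2 succeeds only when file descriptor 2 (stderr) is a terminal.
    if test -t 2; then
        echo -fcolor-diagnostics
    fi
}

# With stderr redirected away from the terminal, no flag is printed.
color_flag 2>/dev/null
```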

Saturday, September 10, 2011

Beta work

PostgreSQL 9.1.0 was wrapped yesterday (or the day before yesterday, depending on where you are). It is now in a special place, where it can rest and let the flavor soak in, before it's released into the wild next week.

Time to reflect.

One thing that is sometimes criticized about the PostgreSQL release process is that there is too much time between the start of beta (or the end of the last commit fest) and the final release, and that it's not clear what ordinary contributors can do during that time to speed things up. This is now worked around to some degree by starting the next development cycle during the beta period of the previous one. (Evaluating the success of that approach is another matter.) But because this matter had also bothered me, I have taken notes this time, first of all so I don't forget anything next time, but perhaps also to provide some inspiration for others.

So here is what I'd been up to during the 9.1 beta period:

Documentation work
It is well understood that the documentation needs a general review during beta time. As features are patched in, chunks of documentation are added here and there, and at the end of it all you just need to stare at it for a bit and see whether the order and flow still make sense. I didn't do much of that this time, but I know others did. Other things to do are:
  • Various bits of SGML markup usually need to be cleaned up, e.g., alignment of verbatim environments, linking to formal objects.
  • I always run the whole documentation through a spell checker, which always results in a bunch of corrections.
  • The SQL features list in the documentation needs to be updated.
  • The SQL key words list in the documentation needs to be updated.
  • All generated man pages should be looked through manually for formatting artifacts.
  • Non-ASCII characters in the SGML source need to be checked. They should be converted to entities, and ideally they should fit into the Latin 1 set.
Other people also spent significant time on preparing the release notes, which is a huge job. I didn't even get into that, except that the spell checking and automated cleanups apply to the release notes as well, since they are part of the documentation build.
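
For the non-ASCII check in the list above, something along these lines can do the searching. This is only a sketch: the function name find_non_ascii is made up, and it relies on grep treating every byte as a character in the C locale, which recent versions of GNU grep do (older versions might report a binary file instead).

```shell
# find_non_ascii is a hypothetical helper name.
# It prints, with line numbers, every line containing a byte that is
# neither a tab nor a printable ASCII character (space through tilde).
find_non_ascii() {
    tab=$(printf '\t')
    LC_ALL=C grep -n "[^${tab} -~]" "$@"
}

# Small demonstration on a temporary file whose second line contains
# the UTF-8 encoding of an accented letter.
sample=$(mktemp)
printf 'plain line\ncaf\303\251\n' > "$sample"
find_non_ascii "$sample"    # reports line 2
rm -f "$sample"
```

From the top of the source tree, this could be run as find_non_ascii doc/src/sgml/*.sgml. With more than one file, grep prefixes each hit with the file name.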
Source code cleanup
We run pgindent during beta, but there are also a number of other things to do to clean up the source code. A lot of these I didn't plan. Mostly, one thing led to another.
  • Check for non-ASCII characters in source. These should either be in UTF-8 or replaced by an ASCII equivalent.
  • Spell checking: You can't run the whole source through a spell checker, but there are a few typical misspellings or variant spellings that are easy to clean up over the entire source tree, such as GIST -> GiST or cancelled -> canceled.
  • Check the --help and --version output of all programs to see whether it follows conventions.
  • In particular, check that the --help output is aligned, not too wide, and that the options are ordered sensibly. (Very often a new option is just added at the end or at some random place in the middle.)
  • Diff the --help outputs between the previous and the current version to find out about all new command-line options and check whether they are sensibly named and properly documented.
  • And the order in which options are presented in the --help output and on the man pages should not be gratuitously different.
  • Similarly, the backslash command help output in psql (\?) should be checked to make sure it looks OK.
  • Go through the configuration file samples and check them for sanity, up-to-dateness, and proper formatting.
  • Remove literal tabs from string constants. How did they ever get in there?
  • Compile the code with a bunch of extra warning options. GCC has a lot of those; try them all out. Many of them generate a lot of noise, but going through them about once a release does find the occasional trouble spot, or at least provides inspiration for future code cleanup projects. As of 9.1, you can also throw Clang into the mix.
  • Also, check building against cutting edge versions of library dependencies. Good candidates are Perl and Python, which introduce small breakage with each new release. If there is a new release candidate of those guys out while we prepare the PostgreSQL release, our users will be building against the final versions soon, so we need to be prepared.
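
The --help checks in the list above lend themselves to a bit of scripting. The following is a rough sketch, not what I actually ran: the function names wide_help_lines and diff_help and the 79-column default are made up for illustration.

```shell
# wide_help_lines PROGRAM [LIMIT]
# Print every line of "PROGRAM --help" that is longer than LIMIT
# characters (79 if LIMIT is not given).
wide_help_lines() {
    "$1" --help 2>&1 | awk -v limit="${2:-79}" 'length($0) > limit'
}

# diff_help OLD_PROGRAM NEW_PROGRAM
# Show a unified diff of the --help output of two builds of the same
# program. Returns the exit status of diff (1 if the outputs differ).
diff_help() {
    old_out=$(mktemp)
    new_out=$(mktemp)
    "$1" --help > "$old_out" 2>&1
    "$2" --help > "$new_out" 2>&1
    status=0
    diff -u "$old_out" "$new_out" || status=$?
    rm -f "$old_out" "$new_out"
    return "$status"
}
```

Usage would be something like diff_help /path/to/old/psql /path/to/new/psql, repeated for each program. Lines starting with + are the new or changed options to review.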
Translation work
Besides the translation work itself and orchestrating the translator contributors, this is also the best chance to read through every single user-visible message that was introduced in the new version. So there are occasional spelling fixes, some messages are poorly worded or incomprehensible to a normal user, or sometimes the code can be restructured so that some error condition doesn't happen at all. A lot of this feeds back into source code cleanup items.
Help the packagers
We have a great bunch of packagers who prepare PostgreSQL for installation on various operating systems. Help them out. Make sure all new features are properly integrated, package descriptions are up-to-date, and upgrades work.
And then of course you field submissions from testers and other contributors, who will submit countless bug reports, failures on unusual platforms, and documentation fixes.

So that can keep you busy for a month or three. The goal ought to be to automate most of this. But much of it can't be. So we really do need to spend the time to do these things, especially polishing the user-facing things. We don't want the first program output or documentation page the user sees of the new release to be ugly or wrong.