tag:blogger.com,1999:blog-55412960003999743692024-03-21T03:25:30.768-04:00Peter Eisentraut's Blogon software development, open source, databases, and geek stuffAnonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.comBlogger102125tag:blogger.com,1999:blog-5541296000399974369.post-23893429655000688562014-04-01T08:36:00.000-04:002014-04-01T08:36:53.060-04:00PostgreSQL trash canThe <a href="https://github.com/petere/pgtrashcan">PostgreSQL trash can</a> is a PostgreSQL plugin that implements a trash can/wastebasket/rubbish bin/recycling container. You drop a table and it's not really gone but only moved to the trash. This allows desktop-minded users to drop tables willy-nilly while giving them the warm and fuzzy feeling of knowing that their data is still there (while giving administrators the cold and, uh, unfuzzy feeling of knowing that disk space will never really be freed again). Now they only need to think of "vacuum" as "disk defragmentation", and they'll feel right at home.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com9tag:blogger.com,1999:blog-5541296000399974369.post-53013400787442875112013-09-25T23:03:00.000-04:002013-09-25T23:03:18.708-04:00Design by committeeDesign by committee is usually a term of abuse, but sometimes it's perhaps not the worst alternative. At the opposite end of the spectrum, there is design by disconnected individuals. That is how you get
<pre>ALTER TABLE tbl OWNER TO something</pre>
but
<pre>ALTER TABLE tbl SET SCHEMA something</pre>
in PostgreSQL.
<p>
Maybe a committee faced with this inconsistency would arrive at the compromise
<pre>ALTER TABLE tbl [SET] {OWNER|SCHEMA} [TO] something</pre>
?Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com3tag:blogger.com,1999:blog-5541296000399974369.post-38859291332587474222013-08-28T22:27:00.000-04:002013-08-28T22:27:54.340-04:00Testing PostgreSQL extensions on Travis CI revisitedMy <a href="http://petereisentraut.blogspot.com/2013/07/testing-postgresql-extensions-on-travis.html">previous attempt</a> to set up multiple-PostgreSQL-version testing on <a href="https://travis-ci.org/">Travis CI</a> worked OK, but didn't actually make good use of the features of Travis CI. So I stole, er, adapted an idea from <a href="https://github.com/clkao/plv8js"><code>clkao/plv8js</code></a>, which uses an environment variable matrix to control which version to use. This makes things much easier to manage and actually fires off parallel builds, so it's also faster. I've added this to all my repositories for PostgreSQL extensions now. (See some examples: <a href="https://github.com/petere/pglibuuid/blob/master/.travis.yml">pglibuuid</a>, <a href="https://github.com/petere/plxslt/blob/master/.travis.yml">plxslt</a>, <a href="https://github.com/petere/pgvihash/blob/master/.travis.yml">pgvihash</a>, <a href="https://github.com/petere/pgpcre/blob/master/.travis.yml">pgpcre</a>, <a href="https://github.com/petere/plsh/blob/master/.travis.yml">plsh</a>)Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com0tag:blogger.com,1999:blog-5541296000399974369.post-43870869323754015432013-08-28T12:50:00.000-04:002013-08-28T12:50:29.123-04:00Automating patch review<p>I think there are two kinds of software development organizations
(commercial or open source):</p>
<ol>
<li><p>Those who don’t do code review.</p></li>
<li><p>Those who are struggling to keep up with code review.</p></li>
</ol>
<p>PostgreSQL is firmly in the second category. We never finish commit
fests on time, and lack of reviewer resources is frequently mentioned
as one of the main reasons.</p>
<p>One way to address this problem is to recruit more reviewer resources.
That has been tried; it’s difficult. The other way is to reduce the
required reviewer resources. We can do this by automating things a
little bit.</p>
<p>So I came up with a bag of tools that does the following:</p>
<ol>
<li><p>Extract the patches from the commit fest into Git.</p></li>
<li><p>Run those patches through an automated test suite.</p></li>
</ol>
<p>The first part is done by my script <a href="https://github.com/petere/commitfest-tools/blob/master/commitfest_branches"><code>commitfest_branches</code></a>. It extracts the email message ID for the latest
patch version of each commit fest submission (either from the database or the RSS feed). From the message ID, it downloads the raw email message and
extracts the actual patch file. Then that patch is applied to the Git
repository in a separate branch. This might fail, in which case I
report that back. At the end, I have a Git repository with one branch
per commit fest patch submission. A copy of that Git repository is
made available here: <a href="https://github.com/petere/postgresql-commitfest">https://github.com/petere/postgresql-commitfest</a>.</p>
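The branch-per-submission step can be sketched in a few lines of shell. This is a hypothetical miniature, not the actual <code>commitfest_branches</code> script: it creates a throwaway repository, fakes one patch as it might be extracted from a submission email, and applies it on its own branch, reporting failure if it does not apply.

```shell
set -e
# Toy model of the branch-per-submission step; repository, patch, and
# branch names are made up for illustration.
cd "$(mktemp -d)"
git init -q repo && cd repo
git -c user.name=cf -c user.email=cf@example.org commit -q --allow-empty -m 'base'
printf 'hello\n' > f.c
git add f.c
git -c user.name=cf -c user.email=cf@example.org commit -q -m 'add f.c'

# Stand-in for a patch extracted from a submission email
cat > ../submission-42.patch <<'EOF'
--- a/f.c
+++ b/f.c
@@ -1 +1,2 @@
 hello
+world
EOF

git checkout -q -b commitfest/submission-42
if git apply --check ../submission-42.patch 2>/dev/null; then
    git apply ../submission-42.patch
    echo "submission-42: applied"
else
    echo "submission-42: failed to apply"   # this is what gets reported back
fi
```

The real script does the same thing per submission, after first resolving the message ID and pulling the patch out of the raw email.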
<p>The second part is done by my <a href="http://pgci.eisentraut.org/jenkins/">Jenkins instance</a>, which I have <a href="http://petereisentraut.blogspot.com/2013/01/postgresql-and-jenkins.html">written
about before</a>. It runs the same job as it runs with the normal Git
master branch, but over all the branches created for the commit fest.
At the end, you get a build report for each commit fest submission.
See the results here:
<a href="http://pgci.eisentraut.org/jenkins/view/PostgreSQL/job/postgresql_commitfest_world/">http://pgci.eisentraut.org/jenkins/view/PostgreSQL/job/postgresql_commitfest_world/</a>.
You’ll see that a number of patches had issues. Most were compiler
warnings, a few had documentation build issues, a couple had genuine
build failures. Several (older) patches failed to apply.
Those don’t show up in Jenkins at all.</p>
<p>This is not tied to Jenkins, however. You can run any other build
automation against that Git repository, too, of course.</p>
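For example, a non-Jenkins setup could walk the per-submission branches with plain git. This sketch fabricates a small repository with two illustrative branches; the build and test commands are only hinted at in a comment.

```shell
set -e
# Create a throwaway repository with two illustrative submission branches,
# then iterate over them the way a build driver would.
cd "$(mktemp -d)"
git init -q .
git -c user.name=cf -c user.email=cf@example.org commit -q --allow-empty -m 'base'
git branch commitfest/submission-1
git branch commitfest/submission-2

for b in $(git for-each-ref --format='%(refname:short)' refs/heads/commitfest/); do
    git checkout -q "$b"
    # here you would run: ./configure && make && make check
    echo "built and tested $b"
done
```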
<p>There are still some manual steps required. In particular,
<code>commitfest_branches</code> needs to be run and the build reports need to be
reported back manually. Fiddling with all those branches is
error-prone. But overall, this is much less work than manually
downloading and building all the patch submissions.</p>
<p>My goal is that by the time a reviewer gets to a patch, it is ensured
that the patch applies, builds, and passes the tests. Then the
reviewer can concentrate on validating the purpose of the patch and
checking the architectural decisions.</p>
<p>What needs to happen next:</p>
<ul>
<li><p>I’d like an easier way to post feedback. Given a message ID for the
original patch submission, I need to fire off a reply email that
properly attaches to the original thread. I don’t have an easy way to do
that.</p></li>
<li><p>Those reply emails would then need to be registered in the commit
fest application. Too much work.</p></li>
<li><p>There is another component to this work flow that I have not
finalized: checking regularly whether the patches still apply
against the master branch.</p></li>
<li><p>More automated tests need to be added. This is well understood and
a much bigger problem.</p></li>
</ul>
<p>In the meantime, I hope this is going to be useful. Let me know if
you have suggestions, or send me pull requests on GitHub.</p>
Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com4tag:blogger.com,1999:blog-5541296000399974369.post-50287439446705124702013-07-17T17:05:00.000-04:002013-07-17T17:05:18.052-04:00Testing PostgreSQL extensions on Travis CII have cobbled together some scripts to be able to test PostgreSQL extensions against multiple PostgreSQL major versions on <a href="https://travis-ci.org/">Travis CI</a>. (This requires that the extension is hosted on <a href="https://github.com/">GitHub</a>.) See the <a href="https://github.com/petere/plsh/blob/master/.travis.yml">configuration for PL/sh</a> and the <a href="https://travis-ci.org/petere/plsh/builds/9203634">build output</a> as examples. Perhaps others will find this useful for their extensions as well.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com3tag:blogger.com,1999:blog-5541296000399974369.post-41168903654668980172013-06-18T21:45:00.000-04:002013-06-18T21:45:21.208-04:00Autopex: PostgreSQL extension installation magic<a href="https://github.com/petere/autopex">Autopex</a> is the brainchild of a long night at the Royal Oak. It ties together <a href="https://github.com/petere/pex">Pex</a> and event triggers to magically download and build any extension that you install. So after you have set everything up you can do, say, <code>CREATE EXTENSION plproxy</code>, and it will transparently download and build plproxy for you. (Actually, this only works if the extension name is the same as the package name. I'm planning to fix that.)
<p>
Note 1: You can't install Autopex via Pex, yet.
<p>
Note 2: I guess the next logical step would be Autoautopex, which installs Autopex and Pex automatically somehow. Patches welcome.
<p>
I suppose with logical replication, this might actually end up installing the extension code on the replication slaves as well. That would be pretty neat.
Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com4tag:blogger.com,1999:blog-5541296000399974369.post-18412555711626693392013-05-01T23:13:00.000-04:002013-05-01T23:13:03.457-04:00Moving to C++<p><a href="http://gcc.gnu.org/gcc-4.8/">GCC 4.8</a> was recently released. This is
the first GCC release that is written in C++ instead of C. Which got
me thinking ...</p>
<p>Would this make sense for PostgreSQL?</p>
<p>I think it's worth a closer look.</p>
<p>Much of GCC's job isn't actually that much different from PostgreSQL.
It parses language input, optimizes it, and produces some output. It
doesn't have a storage layer, it just produces code that someone else
runs. Also note that Clang and LLVM are written in C++. I think it
would be fair to say that these folks are pretty well informed about
selecting a programming language for their job.</p>
<p>It has become apparent to me that C is approaching a dead end.
Microsoft isn't updating their compiler to C99, advising people to
move to C++ instead. So as long as PostgreSQL (or any other project,
for that matter) wants to support that compiler, they will be stuck on
C89 forever. That's a long time. We have been carefully introducing
the odd post-C89 feature, guarded by configure checks and #ifdefs,
but that will either come to an end, or the range of compilers that
actually get the full benefit of the code will become narrower and
narrower.</p>
<p>C++ on the other hand is still a vibrant language. New standards come
out and get adopted by compiler writers. You know how some
people require Java 7 or Python 2.7 or Ruby 1.9 for their code? You wish you
could have that sort of problem for your C code! With C++ you
reasonably might.</p>
<p>I'm also sensing that at this point there are more C++ programmers
than C programmers in the world. So using C++ might help grow the
project better. (Under the same theory that supporting Windows
natively would attract hordes of Windows programmers to the project,
which probably did not happen.)</p>
<p>Moving to C++ wouldn't mean that you'd have to rewrite all your code
as classes or that you'd have to enter template hell. You could
initially consider a C++ compiler a pickier C compiler, and introduce
new language features one by one, as you had done before.</p>
<p>Most things that C++ is picky about are things that a C programmer
might appreciate anyway. For example, it refuses implicit conversions
between void pointers and other pointers, or intermixing different
enums. Actually, if you review various design discussions about the
behavior of SQL-level types, functions, and type casts in PostgreSQL,
PostgreSQL users and developers generally come down on the side of a strict
type system. C++ appears to be much more in line with that thinking.</p>
<p>There are also a number of obvious areas where having the richer
language and the richer standard library of C++ would simplify coding,
reduce repetition, and avoid bugs: memory and string handling;
container types such as lists and hash tables; fewer macros necessary;
the node management in the backend screams class hierarchy; things
like xlog numbers could be types with operators; careful use of
function overloading could simplify some complicated internal APIs.
There are more. Everyone probably has their own pet peeve here.</p>
<p>I was looking for evidence of this C++ conversion in the GCC source
code, and it's not straightforward to find. As a random example,
consider
<a href="http://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/gimple.c;h=64f7b1a19f2ada391b12510c9724c5c292f52090;hb=master"><code>gimple.c</code></a>.
It looks like a normal C source file at first glance. It is named
<code>.c</code> after all. But it actually uses C++ features (exercise for the
reader to find them), and the build process compiles it using a C++
compiler.</p>
<p>LWN has an <a href="https://lwn.net/Articles/542457/">article</a> about how GCC moved to C++.</p>
<p>Thoughts?</p>Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com42tag:blogger.com,1999:blog-5541296000399974369.post-88939919545373923802013-04-02T21:59:00.000-04:002013-04-02T21:59:27.152-04:00Installing multiple PostgreSQL versions on Homebrew<p>I was going to post this yesterday, but some might have thought that it was a
joke. April 1st is always an annoying day to communicate real information.</p>
<p>If you have been fond of the way Debian and Ubuntu manage multiple
PostgreSQL versions in parallel, you can now have the same on OS X
with Homebrew:</p>
<pre><code>brew tap petere/postgresql
brew install postgresql-9.2
# etc.
brew install --HEAD postgresql-common
</code></pre>
<p><code>postgresql-common</code> is the same code as in Debian, only mangled a little.</p>
<p>Now you have all the client programs symlinked through <code>pg_wrapper</code>, and
you can use the server management tools such as:</p>
<pre><code>pg_createcluster 9.2 main
pg_ctlcluster 9.2 main start
pg_lsclusters
</code></pre>
<p>Let me know if you find this useful.</p>
Links:
<ul>
<li><a href="https://github.com/petere/homebrew-postgresql"><code>homebrew-postgresql</code></a>
<li><a href="https://github.com/petere/postgresql-common/tree/homebrew"><code>postgresql-common</code> <code>homebrew</code> branch</a>
</ul>
Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com5tag:blogger.com,1999:blog-5541296000399974369.post-46061147629554828112013-02-14T21:56:00.000-05:002013-02-14T21:56:35.625-05:00pgindent Jenkins jobI have set up a Jenkins <a href="http://pgci.eisentraut.org/jenkins/job/postgresql_master_pgindent/">job</a> that runs pgindent. Besides checking that the procedure of running pgindent works, it also provides a <q>preview</q> of what pgindent would do with the current source (<code><a href="http://pgci.eisentraut.org/jenkins/job/postgresql_master_pgindent/lastSuccessfulBuild/artifact/pgindent.diff">pgindent.diff</a></code>), which can be educational or terribly confusing.
Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com0tag:blogger.com,1999:blog-5541296000399974369.post-9230641406363213572013-02-01T23:36:00.000-05:002013-02-01T23:36:14.843-05:00Introducing the Pex package manager for PostgreSQLI have written a new lightweight package manager for PostgreSQL, called <q>pex</q>. It's targeted at developers, allows easy customization, and supports multiple PostgreSQL installations.
<p>
Here is how it works:
<p>
Installation:
<pre>git clone git://github.com/petere/pex.git
cd pex
sudo make install</pre>
<p>
Install some packages:
<pre>pex init
pex install plproxy
pex search hash
pex install pghashlib</pre>
<p>
Multiple PostgreSQL installations:
<pre>pex -g /usr/local/pgsql2 install plproxy
pex -p 5433 install pghashlib</pre>
<p>
Upgrade:
<pre>pex update
pex upgrade</pre>
<p>
It works a bit like Homebrew, except that it doesn't use Ruby or a lot of metaphors. ;-)
<p>
Check it out at <a href="https://github.com/petere/pex">https://github.com/petere/pex</a>.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com2tag:blogger.com,1999:blog-5541296000399974369.post-70202625253275629522013-01-01T15:24:00.000-05:002013-01-01T15:24:23.753-05:00PostgreSQL and JenkinsA lot of places use <a href="http://jenkins-ci.org/">Jenkins</a> nowadays, including where I now work and have previously worked. I enjoy working with Jenkins, and so I always wanted try out how this would work with <a href="http://www.postgresql.org/">PostgreSQL</a>. Obviously, there would be some overlap with the <a href="http://buildfarm.postgresql.org/">build farm</a>, but that's OK. The point of the build farm, after all, is to build things in many different ways to find potential problems, and this would just support that overall effort.
<p>
So I have set this up now: <a href="http://pgci.eisentraut.org/jenkins/">http://pgci.eisentraut.org/jenkins/</a>
<p>
It's already been very helpful during the last couple of weeks that I've run this. The main point behind the effort is to automate things. These are things I do just about every day and won't have to anymore:
<ul>
<li>build PostgreSQL
<li>check for compiler warnings
<li>run various test suites
<li>do this for all supported branches
</ul>
These are things I do every couple of weeks and have now automated:
<ul>
<li>check distribution building (<code>make distcheck</code>)
<li>test build of additional documentation formats
<li><code>cpluspluscheck</code>
<li>check external web links in the documentation (The <a href="http://pgci.eisentraut.org/jenkins/job/postgresql_master_linklint/">job</a> for that currently appears to be reporting false positives. Use with caution.)
<li>test <a href="http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/">coverage</a> reporting
</ul>
Moreover, I have set it up to build some extensions and external modules, which weren't regularly tested before. (The build farm is making some efforts in this area, though.)
<p>
Actually, many of the checks I had set up immediately found problems: newly introduced compiler warnings, secondary documentation format builds broken, cpluspluscheck failing, broken links in the HTML documentation, various extensions no longer building with PostgreSQL 9.3devel.
<p>
But there is more cool stuff:
<ul>
<li>There are various RSS feeds for all builds or failed builds.
<li>You can interact with the system on mobile devices. I use JenkinsMobi for iOS.
<li>You can get up to date <a href="http://pgci.eisentraut.org/jenkins/job/postgresql_master_world/Documentation/">documentation</a> builds on a more predictable schedule.
</ul>
<p>
The one thing (just about) it doesn't do is test operating system and CPU architecture portability. Jenkins comes from a Java background, where this isn't much of an issue, and so there isn't good built-in support for that sort of thing. But anyway, we have the build farm for that.
<p>
You can get the code at <a href="http://bitbucket.org/petere/pgci">http://bitbucket.org/petere/pgci</a>. The entire setup is automated with Puppet. You can fork it and set up your own (or send me your changes), or you can run it locally using <a href="http://www.vagrantup.com/">Vagrant</a> (which is what I do to test changes).
<p>
If you have any ideas, let me know (file an issue on Bitbucket). I have plans for a number of enhancements already, foremost pg_upgrade testing. Also, let me know if there are additional extensions you want tested. I have just put in a few I use myself at the moment, but others can easily be added.
<p>
Happy New Year!Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com4tag:blogger.com,1999:blog-5541296000399974369.post-81448207839038070452012-10-01T20:20:00.000-04:002012-10-01T20:20:47.040-04:00psqlrc filesIn PostgreSQL 9.2, you can use major-version-specific <code>.psqlrc</code> files, such as <code>.psqlrc-9.2</code>. PostgreSQL 9.2 also added the "include relative" command <code>\ir</code> to psql. Combining these two, you can set up psql initialization to take advantage of any new features you want without breaking the use of old psql releases.
<p>
For example, I'd like to set up psql to automatically use <code>\x auto</code>. But if I just put that into <code>.psqlrc</code>, older psql releases will complain about an unknown command. (I usually have multiple PostgreSQL versions installed, and I share dotfiles across hosts.) On the other hand, I don't want to have to duplicate the entire <code>.psqlrc</code> file to add one command, which is where <code>\ir</code> comes in.
<p>
Here is what I use, for example:
<dl>
<dt><code>.psqlrc-9.2</code>
<dd><pre>\ir .psqlrc
\set QUIET yes
\set COMP_KEYWORD_CASE preserve-lower
\x auto
\unset QUIET</pre>
<dt><code>.psqlrc-9.3</code>
<dd><pre>\ir .psqlrc-9.2</pre>
</dl>
Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com0tag:blogger.com,1999:blog-5541296000399974369.post-89319868752013702562012-09-11T08:14:00.000-04:002012-09-11T08:14:38.545-04:00pgxnclient supports tarballs and HTTPNeed to install a PostgreSQL server add-on module? The <tt>devel</tt> branch of <a href="https://github.com/dvarrazzo/pgxnclient">pgxnclient</a> now supports this type of thing:
<pre>pgxnclient install http://pgfoundry.org/frs/download.php/3274/plproxy-2.4.tar.gz</pre>
This downloads, unpacks, builds, and installs. And the module doesn't need to be on PGXN. And of course you don't have to use HTTP; a file system location will work as well.
I think this can be very useful, especially during development, when not everything is available in packaged form, or even for deployment, if you don't want to bother packaging everything and have been installing from source anyway.
Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com0tag:blogger.com,1999:blog-5541296000399974369.post-72261113046490613252012-05-20T22:00:00.000-04:002012-05-20T22:00:03.006-04:00Base backup compression options<p>I've been looking at my PostgreSQL base backups. They are run using
the traditional</p>
<pre><code>tar -c -z -f basebackup.tar.gz $PGDATA/...
</code></pre>
<p>way (many details omitted). I haven't gotten heavily into using
<code>pg_basebackup</code> yet, but the following could apply there just as well.</p>
<p>I had found some of the base backups to be pretty slow, so I dug a
little deeper. I was surprised to find that the job was completely
CPU bound. The blocking factor was the <code>gzip</code> process. So it was
worth thinking about other compression options. (The alternative is
of course no compression, but that would waste a lot of space.)</p>
<p>There are two ways to approach this:</p>
<ul>
<li><p>Use a faster compression method.</p></li>
<li><p>Parallelize the compression.</p></li>
</ul>
<p>For a faster compression method, there is <code>lzop</code>, for example. GNU
<code>tar</code> has support for that, by using <code>--lzop</code> instead of <code>-z</code>. It
gives a pretty good speed improvement, but the compression results are
of course worse.</p>
<p>For parallelizing compression, there are parallel (multithreaded)
implementations of the well-known <code>gzip</code> and <code>bzip2</code> compression
methods, called <code>pigz</code> and <code>pbzip2</code>, respectively. You can hook these
into GNU <code>tar</code> by using the <code>-I</code> option, something like <code>-I pigz</code>.
Alternatively, put them into a pipe after <code>tar</code> so that you can pass
them some options, because otherwise they will bring your system to a
screeching halt! If you've never seen a system at a constant 1600%
CPU for 10 minutes, try these.</p>
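The pipe variant might look like the following. These command lines are illustrative, not a drop-in backup script: they assume <code>pigz</code> is installed and gloss over all the details omitted from the <code>tar</code> invocation above.

```shell
# Cap pigz at 4 CPUs instead of letting it take over the whole machine
tar -c -f - "$PGDATA" | pigz -p 4 > basebackup.tar.gz

# The output is ordinary gzip, so plain tar -xzf works for restoring,
# or pipe through pigz -d:
pigz -d -c basebackup.tar.gz | tar -x -f -
```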
<p>If you have a regular service window or natural slow time at night or
on weekends, these tools can be quite useful, because you might be
able to cut down the time for your base backup from, say 2 hours to 10
minutes. But if you need to be always on, you will probably want to
rein this in a little by reducing the number of CPUs used for this
job. But it can still be pretty effective if you have many CPUs and
want to dedicate a couple to the compression task for a while.</p>
<p>Personally, I have settled on <code>pigz</code> as my standard weapon of choice
now. It's much faster than <code>pbzip2</code> and can easily beat
single-threaded <code>lzop</code>. Also, it produces standard <code>gzip</code> output, of
course, so you don't need to install special tools everywhere, and you
can access the file with standard tools in a bind.</p>
<p>Also, consider all of this in the context of restoring. No matter how
you take the backup, wouldn't it be nice to be able to restore a
backup almost 8 or 16 or 32 times faster?</p>
<p>I have intentionally not included any benchmark numbers here, because
it will obviously be pretty site-specific. But it should be easy to
test for everyone, and the results should speak for themselves.</p>Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com3tag:blogger.com,1999:blog-5541296000399974369.post-46726231489884618102012-05-15T16:46:00.000-04:002012-05-15T16:46:51.262-04:00My (anti-)take on database schema version management<p>There were a number of posts recently about managing schema versions
and schema deployment in PostgreSQL. I have analyzed these with great
interest, but I have concluded that they are all more or less
significantly flawed. (Of course, most of these solutions do in fact
work for someone, but they are not general enough to become canonical
go-to solutions for this problem class, in my opinion.) I have
developed a list of elimination criteria by which I can evaluate
future developments in this area. So here are some of the things that
I don't want in my schema version management system:</p>
<ul>
<li><p>Using schemas for distinguishing multiple versions (like
<a href="http://feedproxy.google.com/~r/blogspot/EzOjx/~3/HrUj6PXPD-c/schema-based-versioning-and-deployment.html">this</a>,
but that's actually more about API versioning). That simply won't
work for deploying objects that are not in schemas, such as casts,
languages, extensions, and, well, schemas.</p></li>
<li><p>Using extensions (like
<a href="http://philsorber.blogspot.com/2012/01/deploy-schemata-like-boss.html">this</a>).
Well, this could work. But extensions by themselves do nothing
about the core problem. They are just an SQL wrapper interface
around upgrade scripts. You still need to write the upgrade
scripts, order them, test them, package them. The extension
mechanism might replace the, say, shell script that would
otherwise run the upgrade files in a suitable order. Another
issue is that extensions require access to the server file system.
Changing this is being
<a href="https://commitfest.postgresql.org/action/patch_view?id=746">discussed</a>
as "inline extensions", but there is no consensus. This is a
smaller problem, but it needs to be thought about. Also, I do
need to support PostgreSQL 9.0 and earlier for a little while more.</p></li>
<li><p>Requiring naming each change (patch names, like
<a href="http://www.depesz.com/2010/08/22/versioning/">this</a>). Naming
things is hard. Numbering things is easy. And how many good
names are you going to still be able to come up with after 100 or
so changes?</p>
<p>Take a lesson from file version control systems: versions are
numbers or, if it must be, hashes or the like (UUIDs have been
suggested).</p></li>
<li><p>Using a version control tool for tracking upgrade paths (like
<a href="http://justatheory.com/computers/databases/sqitch-draft.html">this</a>).
Sqitch, unlike the initial draft of this concept, doesn't actually
require a version control tool for deployment, which wouldn't have
worked for me, because what we ship is a tarball or a deb/rpm-type
package. But it still requires you to maintain some kind of
sanity in your version control history so that the tool can make
sense out of it. That sounds fragile and inconvenient. The other
choice appears to be writing the plan files manually without any
VCS involvement, but then this loses much of the apparent appeal
of this tool, and it's really no better than the "naming each
change" approach mentioned above.</p></li>
<li><p>Taking snapshots or the like of a production or staging or central
development system. Production systems and staging systems should
be off limits for this sort of thing. Central development systems
don't exist, because with distributed version control, every
developer has their own setups, branches, deployments, and world
views.</p>
<p>You could set it up so that every developer gets their own
test database, sets up the schema there, takes a dump, and checks
that in. There are going to be problems with that, including that
dumps produced by <code>pg_dump</code> are ugly and optimized for restoring,
not for developing with, and they don't have a deterministic
output order.</p></li>
<li><p>Storing the database source code in a different place or in a
different manner than the rest of the source code. This includes
using a version control system as mentioned above (meaning
storing part of the information in the version control meta
information rather than in the files that are checked into the
version control system in the normal way), using a separate
repository like Post Facto, or using something like the mentioned
staging server.</p>
<p>The source is the source, and it must be possible to check out,
edit, build, test, and deploy everything in a uniform and
customary manner.</p></li>
<li><p>Allowing lattice-like dependencies between change sets (like most
examples cited above). This sounds great on paper, especially if
you want to support distributed development in branches. But then
you can have conflicts, for example where two upgrades add a
column to the same table. Depending on the upgrade path, you end
up with different results. As your change graph grows, you will
have an exploding number of possible upgrade paths that will need
to be tested.</p>
<p>There needs to be an unambiguous, canonical state of the database
schema for a given source checkout.</p></li>
<li><p>Requiring running through all the upgrade scripts for a fresh
deployment (like
<a href="http://www.depesz.com/2010/08/22/versioning/">this</a>). There are
two problems with this. First, it's probably going to be very
slow. Fresh deployments need to be reasonably fast, because they
will be needed for automated tests, including unit tests, where
you don't want to wait for minutes to set up the basic schema.
Second, it's inefficient. Over time, you might drop columns, add
new columns, delete rows, migrate them to different tables, etc.
If you run through all those upgrade scripts, then a supposedly
fresh database will already contain a bunch of rubble, dropped
attributes, dead rows, and the like.</p>
<p>Therefore, the current version needs to be deployable from a
script that will not end up replaying history.</p></li>
<li><p>Using metalanguages or abstraction layers (like Pyrseas or
Liquibase or any of the metaformats included in various web
frameworks). It'd probably be a good idea to check some of these out
for simple applications. But my concern is whether using an
abstraction layer would prevent me from using certain features.
For example, look at the
<a href="http://pyrseas.wordpress.com/feature-matrix/">Pyrseas feature matrix</a>.
It's pretty impressive. But it doesn't support extensions, roles,
or grants. So (going by that list), I can't use it. (It's being
<a href="http://pyrseas.wordpress.com/2012/04/10/pyrseas-postgresql-features-feedback-requested/">worked on</a>.)
And in a previous version, when I looked at it for a previous
project, it didn't support foreign-data functionality, so I
couldn't use it then either. And those are just the top-level
things the author thought of. Actually, the Pyrseas author has
gone through some
<a href="http://pyrseas.wordpress.com/2012/03/05/more-database-tools/">effort</a>
to have almost complete coverage of PostgreSQL DDL features, so
give this tool a try. But it won't be for everyone.</p></li>
</ul>
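<p>
For what it's worth, the shape I keep coming back to can be sketched in a few lines of shell. This is purely illustrative: the file names (<tt>schema.sql</tt> as the canonical fresh-deployment script, <tt>upgrades/*.sql</tt> for migrations) are hypothetical, and <tt>psql</tt> is stubbed out with <tt>echo</tt> so the flow is visible without a server:

```shell
# Hypothetical layout: schema.sql is the canonical state of the schema for
# this checkout; upgrades/NNN.sql scripts migrate already-deployed databases.
# psql is stubbed out with echo here so the flow is visible without a server.
PSQL="echo psql -f"            # in real use: PSQL="psql -f"

fresh_deploy() {               # fast path: tests and new installations
  $PSQL schema.sql
}

upgrade() {                    # existing databases replay only new scripts
  $PSQL "upgrades/$1.sql"
}

fresh_deploy
upgrade 003
```

<p>
A fresh deployment then runs one script without replaying history, and a test job could diff a freshly deployed schema against an upgraded one to catch drift between the two paths.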
<p>So, I'm going to keep looking.</p>Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com9tag:blogger.com,1999:blog-5541296000399974369.post-70389709131148526122012-03-21T13:59:00.000-04:002012-03-21T13:59:28.453-04:00PostgreSQL and compiler warnings<p>Recently, I did some work on backpatching a few commits from PostgreSQL master, and I noticed that with the current tools, the old branches create tons of compiler warnings. In PostgreSQL 8.3, the oldest currently supported branch, a <code>make all</code> with GCC 4.6.3 produces 231 warnings! (Also note that there are only 751 <tt>.c</tt> files, so that's a warning every three files.) We do a lot of work cleaning up any and all compiler warnings, at least those issued by the latest GCC. These kinds of noisy builds are quite troublesome to work with, because it is more difficult to check whether your changes introduced any new, more serious warnings.
<p>
Let's take a look at the current number of compiler warnings in different PostgreSQL branches with different compilers:
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup><col class="right" /><col class="right" /><col class="right" /><col class="right" /><col class="right" /><col class="right" />
</colgroup>
<thead>
<tr><th scope="col" class="right"></th><th scope="col" class="right">gcc 4.4</th><th scope="col" class="right">gcc 4.5</th><th scope="col" class="right">gcc 4.6</th><th scope="col" class="right">gcc 4.7</th><th scope="col" class="right">clang</th></tr>
</thead>
<tbody>
<tr><th class="right">8.3</th><td class="right">173</td><td class="right">51</td><td class="right">231</td><td class="right">207</td><td class="right">665</td></tr>
<tr><th class="right">8.4</th><td class="right">12</td><td class="right">17</td><td class="right">201</td><td class="right">201</td><td class="right">673</td></tr>
<tr><th class="right">9.0</th><td class="right">13</td><td class="right">13</td><td class="right">89</td><td class="right">89</td><td class="right">780</td></tr>
<tr><th class="right">9.1</th><td class="right">24</td><td class="right">24</td><td class="right">40</td><td class="right">40</td><td class="right">25</td></tr>
<tr><th class="right">master</th><td class="right">1</td><td class="right">1</td><td class="right">1</td><td class="right">1</td><td class="right">1</td></tr>
</tbody>
</table>
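<p>
(The counts themselves are simple to reproduce for any branch: capture the build output and count the warning lines. A sketch, with an inlined log standing in for a real <tt>make all</tt> run:)

```shell
# Count compiler warnings in a build log.  In real use the log would come
# from something like "make all 2>&1 | tee build.log"; it is inlined here.
printf '%s\n' \
  'foo.c:12: warning: unused variable' \
  'bar.c:7: note: expanded from macro' \
  'foo.c:30: warning: comparison is always true' > build.log
grep -c 'warning:' build.log
```

<p>
(This counts 2 for the sample log: notes and errors are not warnings.)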
<p>
Obviously, GCC 4.6 introduced many new warnings. If you use the compiler that was current around the time the branch was originally released, you'll be better off. But even then, you should expect a few surprises. (8.3 would probably require gcc 4.3, but I don't have that available anymore.)
<p>
Fortunately, it looks as though GCC 4.7, which is currently in release candidate state, will spare us new warnings. Also note that clang (version 3.0) is now as good as GCC, as far as noise is concerned.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com0tag:blogger.com,1999:blog-5541296000399974369.post-9059490646090057242012-03-06T00:35:00.000-05:002012-03-06T00:35:20.935-05:00PostgreSQL make install timesI have decided that <tt>make install</tt> is too slow for me. Compare: A run of <tt>make install</tt> takes about 10 seconds (details below), but a run of <tt>make all</tt> with the tree mostly up to date and using ccache for the rest usually takes about 1 or 2 seconds. You can end up wasting a lot of time if you need to do many of these build and install cycles during development. In particular, <tt>make check</tt> includes a run of <tt>make install</tt>, so all this time is added to the time it takes for tests to complete.
<p>
So let's optimize this. The times below are all medians from 5 consecutive runs, writing over an existing installation, so they all had to do the same amount of work.
<p>
This is the baseline:
<ul>
<li><tt>make install</tt> — 10.493 s
</ul>
<p>
The first change is to use a faster shell. This system is using bash as <tt>/bin/sh</tt>. Many Linux distributions now use dash instead, but for some reason I haven't changed this system during the upgrade.
<ul>
<li><tt>make install SHELL=/bin/dash</tt> — 6.344 s
</ul>
I guess I'll be switching this system soon as well then!
<p>
The next thing is to avoid installing the translation files. These explode the number of files that need to be installed. Instead of, say, one program file, you end up installing one program file and a dozen or so translation files.
<ul>
<li><tt>make install SHELL=/bin/bash enable_nls=no</tt> — 6.890 s
<li><tt>make install SHELL=/bin/dash enable_nls=no</tt> — 4.482 s
</ul>
(In practice you would use <tt>configure --disable-nls</tt>, which is the default. The above is just a way to do this without reconfiguring.) Now I have in the past preferred to build with NLS support to be able to catch errors in that area, but considering this improvement and the availability of the <tt>make maintainer-check</tt> target, I might end up building without it more often.
<p>
Another tip I remembered from the past was to use the <tt>make -s</tt> option to avoid screen output. Depending on the operating system and whether you are logged in locally or remotely, this can be a big win. On my system, this got lost in the noise a bit, but it appeared to make a small difference over many runs.
<ul>
<li><tt>make install SHELL=/bin/bash -s</tt> — 10.511 s
<li><tt>make install SHELL=/bin/dash -s</tt> — 6.146 s
</ul>
Do add this to your arsenal anyway if you want to get maximum performance.
<p>
Next, let's replace the <tt>install-sh</tt> script that does the actual file copying. For obscure reasons, PostgreSQL always uses that shell script, instead of the <tt>/usr/bin/install</tt> program that an Autoconf-based build system would normally use. But you can override the make variables and substitute the program you want:
<ul>
<li><tt>make install SHELL=/bin/bash INSTALL=install</tt> — 5.418 s
<li><tt>make install SHELL=/bin/dash INSTALL=install</tt> — 3.995 s
</ul>
Interestingly, the choice of shell still makes a noticeable difference, even though it's no longer used to execute <tt>install-sh</tt>.
<p>
Finally, you can also use parallel make for the installation step:
<ul>
<li><tt>make install SHELL=/bin/bash -j2</tt> — 6.538 s
<li><tt>make install SHELL=/bin/dash -j2</tt> — 4.158 s
</ul>
You can gather from these numbers that the installation process appears to be mostly CPU-bound. This system has 4 cores, so let's add some more parallelization:
<ul>
<li><tt>make install SHELL=/bin/dash -j3</tt> — 3.330 s
<li><tt>make install SHELL=/bin/dash -j4</tt> — 2.944 s
<li><tt>make install SHELL=/bin/dash -j5</tt> — 2.930 s
<li><tt>make install SHELL=/bin/dash -j6</tt> — 2.952 s
</ul>
That's probably enough.
<p>
Now let's put everything together:
<ul>
<li><tt>make install SHELL=/bin/dash enable_nls=no INSTALL=install -s -j4</tt> — 1.708 s
</ul>
Or even:
<ul>
<li><tt>make install SHELL=/bin/dash enable_nls=no INSTALL=install -s -j3</tt> — 1.654 s
</ul>
That's a very nice improvement from 10.493 s!
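<p>
If you use this a lot, the combined invocation can go into a small wrapper. A sketch; the function name and the <tt>DRYRUN</tt> switch are made up for illustration:

```shell
# fastinstall: "make install" with all of the speedups discussed above.
# Set DRYRUN=1 to print the command instead of running it.
fastinstall() {
  cmd="make install SHELL=/bin/dash enable_nls=no INSTALL=install -s -j4"
  if [ -n "$DRYRUN" ]; then
    echo "$cmd"
  else
    $cmd "$@"   # word splitting is intended here: run the assembled command
  fi
}

DRYRUN=1 fastinstall   # show what would be run
```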
<p>
The problem is, it is not all that easy to pass these options to the <tt>make install</tt> calls made in <tt>make check</tt> runs. If you can and want to change your system shell, and you configure without NLS support, then you will probably already be more than half way there. Then again, I suspect most readers already have that setup anyway. For the other options, to take down the installation time to almost instantaneous, you have to do ad hoc surgery in various places. I'm looking into improving that.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com5tag:blogger.com,1999:blog-5541296000399974369.post-65241842607562949342011-11-22T14:48:00.001-05:002011-11-23T15:39:09.100-05:00git whoamiMy favorite feature in <tt>bzr</tt> (Bazaar) is the <tt>bzr whoami</tt> command, which prints what your current identity (name and email) is, as far as the repository is concerned. You can tell I haven't used <tt>bzr</tt> much if that's as far as I have gotten. But seriously, with so many Git repositories around, several project identities, <a href="http://michael-prokop.at/blog/2009/05/30/directory-specific-shell-configuration-with-zsh/">directory-specific shell configuration</a>, and so on, it's easy to get confused, and it's annoying to have to check and repair commits for correct user name and email all the time. So here is <a href="https://github.com/petere/git-whoami"><tt>git whoami</tt></a>. 
This has already saved me so many headaches.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com5tag:blogger.com,1999:blog-5541296000399974369.post-69170755648695604282011-11-22T08:55:00.001-05:002011-12-05T13:26:33.382-05:00plpydbapi: DB-API for PL/PythonOne thing that's weird about <a href="http://www.postgresql.org/docs/current/static/plpython.html">PL/Python</a> is that its <a href="http://www.postgresql.org/docs/current/static/plpython-database.html">database access API</a> is completely different from the standard Python <a href="http://www.python.org/dev/peps/pep-0249/">DB-API</a>. It is similar to <a href="http://www.postgresql.org/docs/current/static/plperl-builtins.html#PLPERL-DATABASE">PL/Perl</a> and <a href="http://www.postgresql.org/docs/current/static/pltcl-dbaccess.html">PL/Tcl</a>, and the C "<a href="http://www.postgresql.org/docs/current/static/spi.html">SPI</a>" API, from which they are all derived, but that's little help for a Python programmer. (The reasons for this are lost in history. Probably laziness.) Moreover, the two APIs use the same function names for different purposes.
<p>
So I set out to develop a DB-API compatible layer on top of PL/Python: <a href="https://github.com/petere/plpydbapi">plpydbapi</a>
<p>
Example:
<pre class="prettyprint">
CREATE FUNCTION test() RETURNS void
LANGUAGE plpythonu
AS $$
import plpydbapi
dbconn = plpydbapi.connect()
cursor = dbconn.cursor()
cursor.execute("SELECT ... FROM ...")
for row in cursor.fetchall():
    plpy.notice("got row %s" % row)
dbconn.close()
$$;
</pre>
<p>
Granted, it's more verbose than the native PL/Python syntax, so you might not want to use it after all. But it can be helpful if database calls are nested in some other modules, or you just don't want to learn another database access API.
<p>
This started out more as an experiment, but it turns out that with the many improvements in <a href="http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN107340">PL/Python in PostgreSQL 9.1</a>, it's possible to do this. (Subtransaction control and exception handling were the big issues.) The one gaping hole is that there is apparently no way to get metadata out of a query result. Something to address in PostgreSQL 9.2, perhaps.
<p>
Thanks go to the <a href="https://launchpad.net/dbapi-compliance">DB-API compliance test suite</a>, which was extremely helpful in making this happen. (Nonetheless, the test suite is quite incomplete in some regards, so treat the result with care anyway.)
<p>
Another thing that I found neat about this project is that I managed to get the unit tests based on Python's unittest module to run in the PL/Python environment inside the PostgreSQL server. That's the power of unittest2.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com4tag:blogger.com,1999:blog-5541296000399974369.post-70661642657481394132011-09-15T23:38:00.000-04:002011-09-15T23:38:40.747-04:00ccache and clang, part 2There's more funny business when using <tt>ccache</tt> in combination with <tt>clang</tt>. <a href="http://petereisentraut.blogspot.com/2011/05/ccache-and-clang.html">Last time</a> I suggested that you use the invocation
<pre>./configure CC='ccache clang -Qunused-arguments -fcolor-diagnostics'</pre>
to get rid of the "argument unused during compilation" warnings.
<p>
But you still might get loads of warnings that you wouldn't normally get without <tt>ccache</tt>, such as this example (from the PostgreSQL source code):
<pre>
extension.c:382:35: warning: equality comparison with extraneous parentheses [-Wparentheses]
if (( (((control->directory)[0]) == '/') ))
~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
extension.c:382:35: note: use '=' to turn this equality comparison into an assignment
</pre>
(This is the opposite of the warning that tells you to put two pairs of parentheses around an assignment used as a truth value.)
Or:
<pre>
path.c:727:7: warning: explicitly assigning a variable of type 'char *' to itself [-Wself-assign]
path = (path);
~~~~ ^ ~~~~
</pre>
The problem is, these come from macro expansions, so you wouldn't normally see them, because (I guess) the compiler driver is smart enough not to warn about such things when they come from macro expansions.
<p>
The way <tt>ccache</tt> works is approximately
<ol>
<li>preprocess the input file
<li>look for it in the cache
<li>if not found, compile the <i>preprocessed</i> file
</ol>
What would be better in this situation is
<ol>
<li>preprocess the input file
<li>look for it in the cache
<li>if not found, compile the <i>original</i> file
</ol>
And indeed you can turn on that second behavior by setting the obscure environment variable <tt>CCACHE_CPP2</tt> (as in, run cpp twice):
<pre>export CCACHE_CPP2=yes</pre>
Then all these extra warnings disappear.
<p>
(The <tt>ccache</tt> man page is worth a read. There are a few interesting settings to play with.)
<p>
I'm currently playing around with a shell script <tt>ccache-clang</tt> that looks like this:
<pre>CCACHE_CPP2=yes exec ccache clang -Qunused-arguments `test -t 2 && echo -fcolor-diagnostics` "$@"</pre>
Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com8tag:blogger.com,1999:blog-5541296000399974369.post-16480547347157154112011-09-10T16:13:00.000-04:002011-09-10T16:13:09.639-04:00Beta workPostgreSQL 9.1.0 was wrapped yesterday (or the day before yesterday, depending on from where you see it). It is now in a special place, where it can rest and let the flavor soak in, before it's released into the wild next week.
<p>
Time to reflect.
<p>
One thing that is sometimes criticized about the PostgreSQL release process is that there is too much time between the start of beta (or the end of the last commit fest) and the final release, and that it's not clear what ordinary contributors can do during that time to speed things up. This is now worked around to some degree by starting the next development cycle during the beta period of the previous one. (Evaluating the success of that approach is another matter.) But because this matter had also bothered me, I have taken notes this time, first of all so I don't forget anything next time, but perhaps also to provide some inspiration for others.
<p>
So here is what I'd been up to during the 9.1 beta period:
<dl>
<dt><b>Documentation work</b>
<dd>
It is well understood that the documentation needs a general review during beta time. As features are patched in, chunks of documentation are added here and there, and at the end of it all you just need to stare at it for a bit and see whether the order and flow still make sense. I didn't do much of that this time, but I know others did. Other things to do are:
<ul>
<li>Various bits of SGML markup usually need to be cleaned up, e.g., alignment of verbatim environments, linking to formal objects.
<li>I always run the whole documentation through a spell checker, which always results in a bunch of corrections.
<li>The SQL features list in the documentation needs to be updated.
<li>The SQL key words list in the documentation needs to be updated.
<li>All generated man pages should be looked through manually for formatting artifacts.
<li>Non-ASCII characters in the SGML source need to be checked. They should be converted to entities, and ideally they should fit into the Latin 1 set.
</ul>
Other people also spent significant time on preparing the release notes, which is a huge job. I didn't even get into that, except that the spell checking and automated cleanups apply to the release notes as well, since they are part of the documentation build.
<dt><b>Source code cleanup</b>
<dd>
We run pgindent during beta, but there are also a number of other things to do to clean up the source code. A lot of these I didn't plan. Mostly, one thing led to another.
<ul>
<li>Check for non-ASCII characters in source. These should either be in UTF-8 or replaced by an ASCII equivalent.
<li>Spell checking: You can't run the whole source through a spell checker, but there are also a few typical misspellings or variant spellings that are easy to clean up over the entire source tree, such as GIST -> GiST or cancelled -> canceled.
<li>Check the <tt>--help</tt> and <tt>--version</tt> output of all programs to see whether it follows conventions.
<li>In particular, check that the <tt>--help</tt> output is aligned, not too wide, and that the options are ordered sensibly. (Very often a new option is just added at the end or at some random place in the middle.)
<li>Diff the <tt>--help</tt> outputs between the previous and the current version to find out about all new command-line options and check whether they are sensibly named and properly documented.
<li>And the order in which options are presented in the <tt>--help</tt> output and on the man pages should not be gratuitously different.
<li>Similarly, the backslash command help output in psql (<tt>\?</tt>) should be checked to make sure it looks OK.
<li>Go through the configuration file samples and check them for sanity, up-to-dateness, and proper formatting.
<li>Remove literal tabs from string constants. How did they ever get in there?
<li>Compile the code with a bunch of extra warning options. GCC has a lot of those, try them all out. A lot of them generate a lot of noise, but going through them about once a release does find the occasional trouble spot, or at least provides inspiration for future code cleanup projects. As of 9.1, you can also throw Clang into the mix.
<li>Also, check building against cutting edge versions of library dependencies. Good candidates are Perl and Python, which introduce small breakage with each new release. If there is a new release candidate of those guys out while we prepare the PostgreSQL release, our users will be building against the final versions soon, so we need to be prepared.
</ul>
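<p>
The <tt>--help</tt> diffing in particular is easy to script. A sketch; in real use the two files would come from running the old and new binaries with <tt>--help</tt>, here they are inlined:

```shell
# Diff the --help output of two versions to spot new or changed options.
printf '  -a             do thing A\n' > help-old.txt
printf '  -a             do thing A\n  -b             do thing B\n' > help-new.txt
diff help-old.txt help-new.txt || true   # diff exits nonzero on differences
```

<p>
Any new option then shows up as an added line, ready to be checked for naming, placement, and documentation.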
<dt><b>Translation work</b>
<dd>
Besides the translation work itself and orchestrating the translator contributors, this is also the best chance to read through every single user-visible message that was introduced in the new version. So there are occasional spelling fixes, some messages are poorly worded or incomprehensible to a normal user, or sometimes the code can be restructured so that some error condition doesn't happen at all. A lot of this feeds back into source code cleanup items.
<dt><b>Help the packagers</b>
<dd>
We have a great bunch of packagers who prepare PostgreSQL for installation on various operating systems. Help them out. Make sure all new features are properly integrated, package descriptions are up-to-date, upgrades work.
</dl>
And then of course you field submissions from testers and other contributors who will submit countless bug reports, failures on unusual platforms, documentation fixes.
<p>
So that can keep you busy for a month or three. The goal ought to be to automate most of this. But much of it can't be. So we really do need to spend the time to do these things, especially polishing the user-facing things. We don't want the first program output or documentation page the user sees of the new release to be ugly or wrong.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com0tag:blogger.com,1999:blog-5541296000399974369.post-89593359358321804082011-07-19T16:10:00.000-04:002011-07-19T16:10:22.864-04:00Undefined symbolHere is a small DBA tale from a few months ago. I wanted to upgrade a few servers to the latest minor release of PostgreSQL. It was the upgrade from 8.4.4 to 8.4.5. To minimize downtime, I had the following obvious idea: I'd install all the packages first, and once everything was in place and an opportune time for a service disruption had arrived, I'd just restart the services, and everything would be back up in less than a minute. (If you just let the package manager do its thing, it's always a bit unpredictable when the services are stopped and restarted, which might result in several minutes of downtime.)<br />
<br />
This operation had dramatic consequences. As soon as I had the new packages installed, I got massive amounts of these errors:<br />
<br />
<tt>ERROR: could not load library "/usr/lib/postgresql/8.4/lib/plpgsql.so": /usr/lib/postgresql/8.4/lib/plpgsql.so: undefined symbol: PinPortal</tt><br />
<br />
Since a lot of things use PL/pgSQL, this meant that the database system was pretty much dead at this point.<br />
<br />
What had happened?<br />
<br />
Well, version 8.4.5 introduced some new functions in the server that PL/pgSQL made use of. As soon as you install the new <tt>plpgsql</tt> module, the next session that starts will load the new library, but unless you restarted the server, the running server process won't have that symbol.<br />
<br />
It was easy to fix this by restarting the server, but that's not how things were planned.<br />
<br />
Consequence: Unless shown otherwise, and unless a lot of care is taken, you must upgrade the server and all loadable modules to exactly matching versions at the same time, and you must restart the server at that point as well.<br />
<br />
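Such a mismatch can be detected mechanically before the upgrade: compare the symbols the module needs against the symbols the server binary exports. A sketch using <tt>comm</tt>; in real use the two lists would come from <tt>nm -D --undefined-only plpgsql.so</tt> and <tt>nm -D --defined-only postgres</tt>, here they are inlined to match this incident:

```shell
# Symbols the module needs vs. symbols the server provides (inlined sample).
printf 'PinPortal\nSPI_execute\n' | sort > needed.txt
printf 'SPI_connect\nSPI_execute\n' | sort > provided.txt
comm -23 needed.txt provided.txt    # lines only in needed.txt: missing symbols
```

An empty output would mean the old server can satisfy the new module; here it prints <tt>PinPortal</tt>.<br />
<br />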
It might be useful as a preventive measure to compare symbol tables before upgrading. It could also be tracked more carefully at the source level, by explicitly mentioning the addition of meant-to-be-called-externally functions to the server in the release notes. (I suppose we don't usually remove functions, but who knows.)Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com3tag:blogger.com,1999:blog-5541296000399974369.post-88375414314117350392011-06-01T17:00:00.000-04:002012-04-13T17:00:55.386-04:00Enabling core files for PostgreSQL on DebianThe other day, I was a bit puzzled over a seemingly simple task: Enable core files to be generated from a PostgreSQL instance running on Debian. That instance has unfortunately been segfaulting on occasion, but never left a core file.<br />
<br />
Now in principle it is clear that<br />
<pre>ulimit -c unlimited</pre>is the incantation to get this done. But where do you put this? You could hack it into the init script, but that seemed a bit ugly, and I wanted a sustainable solution.<br />
<br />
A useful thing in the meantime is to check the current settings. That information is available in <tt>/proc/$PID/limits</tt> with the PID of the postmaster process (or any child process, really), and it looked like this to begin with:<br />
<pre>Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            ms
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
...</pre>Use <tt>sudo grep core /proc/$(sudo cat /var/run/postgresql/8.4-main.pid)/limits</tt> if you want it automated.<br />
<br />
So it's good to know that we only need to set the soft limit.<br />
<br />
One way to configure this <q>properly</q> would appear to be in <tt>/etc/security/limits.conf</tt>. There you can add a line like<br />
<pre>* soft core unlimited</pre>to enable core dumps globally. I'm not actually sure whether that would work if the service is started during the boot without PAM. In any case, I didn't want to enable core files globally; who knows what that would lead to.<br />
<br />
One could replace the <tt>*</tt> by a user name, such as <tt>postgres</tt>, and then enable <tt>pam_limits.so</tt> in <tt>/etc/pam.d/su</tt>. But the <tt>postgresql</tt> init script in Debian is nested about four levels deep, so it wasn't clear whether it called <tt>su</tt> at all.<br />
<br />
Now as it turns out, the init script ends up changing the user using this Perl code:<br />
<pre>$) = $groups;
$( = $gid;
$> = $< = $uid;</pre>(see <tt>change_ugid</tt> in <tt>/usr/share/postgresql-common/PgCommon.pm</tt>), so the whole PAM line of thought wasn't going to work anyway. (Other packages such as <tt>pgbouncer</tt> and <tt>slony1</tt> do got through <tt>su</tt>, so that would be a solution for those.)<br />
<br />
The way to solve this is the <tt>pg_ctl</tt> <tt>-c</tt> option, which sets the soft limit for core files to unlimited. And the way to pass this option through the init script maze is the file <tt>/etc/postgresql/8.4/main/pg_ctl.conf</tt>, which should contain a line like this:<br />
<pre>pg_ctl_options = '-c'</pre>Then restart <tt>postgresql</tt>, and check <tt>/proc/$PID/limits</tt> again:<br />
<pre>Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            ms
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes</pre>OK.<br />
<br />
Another thing that's advisable in this context is to change the core file names to have a unique element, so that if multiple backends crash before you can take a look, they don't overwrite each other's core files. The core(7) man page explains the configuration options; I went with this sysctl setting:<br />
<pre>kernel.core_pattern = core.%e.%p</pre>which includes process name and PID. The PID file still ends up in the data directory of the PostgreSQL instance, which could also be changed, but I didn't find it necessary.<br />
<br />
Stick the above line in <tt>/etc/sysctl.d/local.conf</tt> and reload with<br />
<pre>service procps force-reload</pre>I actually use a setting like that on all machines now; it's just nicer.<br />
<br />
OK, and now I'll wait for the next core file. Or not.Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com6tag:blogger.com,1999:blog-5541296000399974369.post-66089311593554578702011-05-05T15:38:00.000-04:002011-05-05T15:38:37.878-04:00ccache and clangJust a note for the Internet: When you use <tt>ccache</tt> and <tt>clang</tt> together, you will probably get a lot of warnings like these:<br />
<pre>clang: warning: argument unused during compilation: '-c'
clang: warning: argument unused during compilation: '-I .'</pre>These are harmless, but if you want to get rid of them, use the <tt>clang</tt> option <tt>-Qunused-arguments</tt>, which will hide them. (The first one is already <a href="http://git.samba.org/?p=ccache.git;a=commit;h=f0db645c75a2c063526685b791a4afd7b2884d1d">fixed</a> in <tt>ccache</tt>.)<br />
<br />
The reason for this is that <tt>ccache</tt> splits the compilation into separate calls to the preprocessor and the compiler proper, and it tries to sort out which of the options that you called it with go with which call. But since <tt>gcc</tt> doesn't complain about passing <tt>-c</tt> to the preprocessor or <tt>-I</tt> to the compiler, <tt>ccache</tt> doesn't bother about sorting this out (<a href="https://bugzilla.samba.org/show_bug.cgi?id=8118">bug</a>). That's why you don't lose any information relative to using <tt>gcc</tt> if you use the <tt>-Qunused-arguments</tt> option.<br />
<br />
Also, if you like <tt>clang</tt>'s colored diagnostics messages, you'll have to turn them on explicitly with <tt>-fcolor-diagnostics</tt>, because when running through <tt>ccache</tt>, <tt>clang</tt> doesn't think it's printing to a terminal and turns off the color by default.<br />
<br />
So a complete invocation might look like this:<br />
<pre>./configure CC='ccache clang -Qunused-arguments -fcolor-diagnostics'</pre>Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com6tag:blogger.com,1999:blog-5541296000399974369.post-42343802048075868382011-04-22T17:54:00.000-04:002011-04-22T17:54:58.277-04:00Emacs fill column for PostgreSQLSince it's pgindent season, here's a small related tip for Emacs users.<br />
<pre>(setq fill-column 79)</pre>This will set the maximum line length for filling commands such as <kbd>M-q</kbd> to the same length that pgindent uses. That is especially useful when you want to reformat block comments nicely. The default is 70, which will cause perpetual conflicts between Emacs and pgindent.<br />
<br />
The <a href="http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=415f5e12592d135919549a5eb21893fda0417ebf;hp=9e9b9ac7d1860fbb98eb4db17a94ff25524b6447">commit</a> shows various other ways to include this in your Emacs configuration. (I use the <code>c-add-style</code> approach, personally.)Anonymoushttp://www.blogger.com/profile/02849480732923051923noreply@blogger.com0