Wednesday, June 1, 2011

Enabling core files for PostgreSQL on Debian

The other day, I was a bit puzzled over a seemingly simple task: Enable core files to be generated from a PostgreSQL instance running on Debian. That instance has unfortunately been segfaulting on occasion, but never left a core file.

Now in principle it is clear that
ulimit -c unlimited
is the incantation to get this done. But where do you put this? You could hack it into the init script, but that seemed a bit ugly, and I wanted a sustainable solution.
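Wherever the call ends up, it only helps if it runs in a process that is an ancestor of the postmaster, because resource limits are inherited by child processes. A minimal sketch to illustrate the inheritance; the function name is made up for this example, and it lowers the limit rather than raising it so that no privileges are needed:

```shell
# Hypothetical helper: set the soft core limit inside a subshell, then ask
# a child process which limit it sees. The subshell keeps the change from
# leaking into the calling shell.
child_core_limit() {
    (
        ulimit -S -c "$1" || exit 1
        sh -c 'ulimit -S -c'
    )
}

child_core_limit 0
```

The child reports the value that was set in its parent, which is the behaviour a PostgreSQL backend relies on when the limit is raised before the postmaster starts.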

A useful thing in the meantime is to check the current settings. That information is available in /proc/$PID/limits with the PID of the postmaster process (or any child process, really), and it looked like this to begin with:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            ms        
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
...
Use sudo grep core /proc/$(sudo cat /var/run/postgresql/8.4-main.pid)/limits if you want it automated.

The hard limit is already unlimited, so we only need to raise the soft limit.
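If you check this often, the lookup can be wrapped in a small function. This is only a sketch: the name core_limits is invented here, and it assumes the row layout shown above, where the soft and hard values are the fifth and sixth whitespace-separated fields.

```shell
# Hypothetical helper: print the soft and hard core-file limits from a
# file in the /proc/PID/limits format.
core_limits() {
    # Row layout: "Max core file size <soft> <hard> bytes", so the soft
    # limit is field 5 and the hard limit is field 6.
    awk '/^Max core file size/ { print "soft=" $5, "hard=" $6 }' "$1"
}

# Example call (needs read access to the file, hence root for postgres):
#   core_limits /proc/$(cat /var/run/postgresql/8.4-main.pid)/limits
```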

One way to configure this properly would appear to be in /etc/security/limits.conf. There you can add a line like
*               soft    core            unlimited
to enable core dumps globally. I'm not actually sure whether that would work if the service is started during boot without going through PAM. In any case, I didn't want to enable core files globally; who knows what that would lead to.

One could replace the * by a user name, such as postgres, and then enable pam_limits.so in /etc/pam.d/su. But the postgresql init script in Debian is nested about four levels deep, so it wasn't clear whether it called su at all.

Now as it turns out, the init script ends up changing the user using this Perl code:
$) = $groups;
$( = $gid;
$> = $< = $uid;
(see change_ugid in /usr/share/postgresql-common/PgCommon.pm), so the whole PAM line of thought wasn't going to work anyway. (Other packages such as pgbouncer and slony1 do go through su, so that would be a solution for those.)

The way to solve this is the pg_ctl -c option, which raises the soft limit for core files to the hard limit (unlimited in this case). And the way to pass this option through the init script maze is the file /etc/postgresql/8.4/main/pg_ctl.conf, which should contain a line like this:
pg_ctl_options = '-c'
Then restart postgresql, and check /proc/$PID/limits again:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            ms        
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
OK.
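If you manage several clusters, the pg_ctl.conf edit can be scripted. The following is a rough sketch, not something from the Debian packaging: the function name is hypothetical, and it only handles a file that has no pg_ctl_options line, an empty one, or one that is already exactly '-c'. Anything else is left for a human to merge.

```shell
# Hypothetical helper: make sure the given pg_ctl.conf passes -c to pg_ctl.
enable_core_option() {
    _conf=$1
    if grep -q "^[[:space:]]*pg_ctl_options[[:space:]]*=[[:space:]]*'-c'" "$_conf" 2>/dev/null; then
        # Already configured: nothing to do.
        return 0
    elif ! grep -q '^[[:space:]]*pg_ctl_options' "$_conf" 2>/dev/null; then
        # No setting yet: append one.
        echo "pg_ctl_options = '-c'" >> "$_conf"
    elif grep -q "^[[:space:]]*pg_ctl_options[[:space:]]*=[[:space:]]*''" "$_conf"; then
        # Empty setting: rewrite that line via a temporary copy.
        _tmp=$(mktemp) || return 1
        sed "s/^[[:space:]]*pg_ctl_options[[:space:]]*=[[:space:]]*''.*/pg_ctl_options = '-c'/" "$_conf" > "$_tmp" &&
            cat "$_tmp" > "$_conf"
        rm -f "$_tmp"
    else
        # Some other options are present: do not guess how to merge them.
        echo "$_conf already sets pg_ctl_options; edit it by hand" >&2
        return 1
    fi
}

# Example call, as root, followed by a restart of the cluster:
#   enable_core_option /etc/postgresql/8.4/main/pg_ctl.conf
```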

Another thing worth doing in this context is to give the core file names a unique element, so that if multiple backends crash before you can take a look, they don't overwrite each other's core files. The core(5) man page explains the configuration options; I went with this sysctl setting:
kernel.core_pattern = core.%e.%p
which includes process name and PID. The core file still ends up in the data directory of the PostgreSQL instance, which could also be changed, but I didn't find it necessary.

Stick the above line in /etc/sysctl.d/local.conf and reload with
service procps force-reload
I actually use a setting like that on all machines now; it's just nicer.
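To get a feel for the file names that the core.%e.%p pattern produces, here is a small sketch that mimics the kernel's substitution for just those two specifiers. The function name is invented for illustration, and the process name and PID in the example call are made-up values.

```shell
# Hypothetical helper: expand the %e (executable name) and %p (PID)
# specifiers of a core pattern. Other specifiers from the core man page
# are not handled, and the name must not contain "/" or "&" because it is
# pasted into a sed replacement.
expand_core_pattern() {
    printf '%s\n' "$1" | sed -e "s/%e/$2/g" -e "s/%p/$3/g"
}

expand_core_pattern 'core.%e.%p' postgres 1234
# prints: core.postgres.1234
```

The pattern the running kernel actually uses can be read from /proc/sys/kernel/core_pattern.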

OK, and now I'll wait for the next core file. Or not.

6 comments:

  1. Thanks for posting this -- I will surely make use of this information.

  2. nice post. This should be included in the docs. I got only one segmentation fault in ten years of using postgresql. So it would be nice to know about these settings beforehand.

    btw: on my system only "service procps start" worked.

    Replies
    1. Sure, but which docs? It's a combination of general PostgreSQL information, general Debian information, and specifics of the postgresql package in Debian. No single documentation seems suitable for that.

      Btw., at least as far back as lenny, "procps start" and "procps force-reload" run the same code, so it's unlikely that only one would work.

    2. Very nice write-up, thank you! Maybe you could put this in Debian Wiki's PostgreSQL page?

  3. Great post. One question: do I need to take any preliminary steps when running pg_ctl with the -c option?

    --Raghav
