Peter Eisentraut's Blog: tar

Sunday, May 20, 2012

Base backup compression options

I've been looking at my PostgreSQL base backups. They are run using the traditional

tar -c -z -f basebackup.tar.gz $PGDATA/...

way (many details omitted). I haven't gotten heavily into using pg_basebackup yet, but the following could apply there just as well.

I had found some of the base backups to be pretty slow, so I dug a little deeper. I was surprised to find that the job was completely CPU bound. The blocking factor was the gzip process. So it was worth thinking about other compression options. (The alternative is of course no compression, but that would waste a lot of space.)

There are two ways to approach this:

Use a faster compression method.
Parallelize the compression.

For a faster compression method, there is lzop, for example. GNU tar has support for that, by using --lzop instead of -z. It gives a pretty good speed improvement, but the compression results are of course worse.

For parallelizing compression, there are parallel (multithreaded) implementations of the well-known gzip and bzip2 compression methods, called pigz and pbzip2, respectively. You can hook these into GNU tar by using the -I option, something like -I pigz. Alternatively, put them into a pipe after tar, so that you can pass them some options. Because otherwise they will bring your system to a screeching halt! If you've never seen a system at a constant 1600% CPU for 10 minutes, try these.

If you have a regular service window or natural slow time at night or on weekends, these tools can be quite useful, because you might be able to cut down the time for your base backup from, say 2 hours to 10 minutes. But if you need to be always on, you will probably want to qualify this a little, by reducing the number of CPUs used for this job. But it can still be pretty effective if you have many CPUs and want to dedicate a couple to the compression task for a while.

Personally, I have settled on pigz as my standard weapon of choice now. It's much faster than pbzip2 and can easily beat single-threaded lzop. Also, it produces standard gzip output, of course, so you don't need to install special tools everywhere, and you can access the file with standard tools in a bind.

Also, consider all of this in the context of restoring. No matter how you take the backup, wouldn't it be nice to be able to restore a backup almost 8 or 16 or 32 times faster?

I have intentionally not included any benchmark numbers here, because it will obviously be pretty site-specific. But it should be easy to test for everyone, and the results should speak for themselves.

Monday, May 14, 2012

Time to retrain the fingers

For years, no decades, I've typed tar tzf something, tar xzf something. Except when someone annoying sent an uncompressed tar file and I had to then go and take out the z in the middle.

Then came bzip2, and we learned tar tjf, tar xjf. OK, I could live with that. One emerging problem was that the tab completion now worked the wrong way around conceptually, because you had to pick and type the right letter first in order to see the appropriate set of files to unpack offered for completion.

Then came lzma, which was (quick, guess?), tar tJf, tar xJf. And then there was lzop, which was too boring to get its own letter, so you had to type out tar -x --lzop -f.

But lzma was short-lived, because then came xz, which was also J, because lzma was now too boring as well to get its own letter.

Oh, and there is also the old compress, which is Z, and lzip, which I'd never heard of.

But stop that. Now there is

 -a, --auto-compress
            use archive suffix to determine the compression program

This handles all the above compression programs, and no compression. So from now on, I always use tar taf and tar xaf. Awesome.

The finger movements will be almost the same on QWERTY and AZERTY, and easier than before on QWERTZ.

Actually, this option is already four years old in GNU tar. Funny I'd never heard of it until recently.

Peter Eisentraut's Blog

Sunday, May 20, 2012

Base backup compression options

Monday, May 14, 2012

Time to retrain the fingers

About Me

Recent Posts

Blog Archive

Writing

Hacking

Search This Blog

Some Rights Reserved

Labels

Peter Eisentraut's Blog

Sunday, May 20, 2012

Base backup compression options

Monday, May 14, 2012

Time to retrain the fingers

About Me

Recent Posts

Subscribe To

Blog Archive

Writing

Hacking

Search This Blog

Some Rights Reserved

Labels