I've been looking at my PostgreSQL base backups. They are run using the traditional
tar -c -z -f basebackup.tar.gz $PGDATA/...
way (many details omitted). I haven't gotten heavily into using
pg_basebackup yet, but the following could apply there just as well.
I had found some of the base backups to be pretty slow, so I dug a
little deeper. I was surprised to find that the job was completely
CPU bound. The blocking factor was the
gzip process. So it was
worth thinking about other compression options. (The alternative is
of course no compression, but that would waste a lot of space.)
There are two ways to approach this:
1. Use a faster compression method.
2. Parallelize the compression.
For a faster compression method, there is
lzop, for example. GNU
tar has support for that: use
--lzop instead of
-z. This gives a pretty good speed improvement, but the compression results are
of course worse.
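A minimal sketch of such a call, run here against a throwaway scratch directory (with a real cluster you would point tar at $PGDATA instead; the lzop binary is assumed to be installed, with a plain gzip branch so the demo runs anywhere):

```shell
# Demo on a scratch directory; substitute $PGDATA for a real base backup.
demo=$(mktemp -d)
echo 'data' > "$demo/f"
if command -v lzop >/dev/null 2>&1; then
    # --lzop replaces -z: much faster, somewhat worse compression
    tar -C "$demo" --lzop -cf "$demo/backup.tar.lzo" f
    tar -C "$demo" --lzop -tf "$demo/backup.tar.lzo"
else
    # lzop not installed here; the equivalent gzip call, for illustration
    tar -C "$demo" -czf "$demo/backup.tar.gz" f
    tar -C "$demo" -tzf "$demo/backup.tar.gz"
fi
```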
For parallelizing compression, there are parallel (multithreaded)
implementations of the well-known
gzip and bzip2 tools:
pigz and
pbzip2, respectively. You can hook these
into tar by using the
-I option.
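For instance (a sketch against a scratch directory standing in for $PGDATA; pigz is assumed to be installed, with a fallback to plain single-threaded gzip so the example still runs without it):

```shell
# Pick pigz if present, otherwise fall back to ordinary gzip.
gz=$(command -v pigz || command -v gzip)
demo=$(mktemp -d)
echo 'data' > "$demo/f"
# -I hands tar an external compression program (--use-compress-program)
tar -C "$demo" -c -I "$gz" -f "$demo/basebackup.tar.gz" f
# The archive is ordinary gzip data, so -z reads it back fine:
tar -tzf "$demo/basebackup.tar.gz"
```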
Alternatively, put them into a pipe after
tar, so that you can pass
them some options. You will want to do that, because otherwise they will
bring your system to a screeching halt! If you've never seen a system
at a constant 1600% CPU for 10 minutes, try these.
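A sketch of that pipe form, capping the compressor's CPU usage via pigz's -p flag (again on a scratch directory; when pigz is absent this falls back to plain gzip, which is single-threaded anyway):

```shell
demo=$(mktemp -d)
echo 'data' > "$demo/f"
if command -v pigz >/dev/null 2>&1; then
    # -p 4 restricts pigz to four processors, leaving the rest for PostgreSQL
    tar -C "$demo" -cf - f | pigz -p 4 > "$demo/basebackup.tar.gz"
else
    tar -C "$demo" -cf - f | gzip > "$demo/basebackup.tar.gz"
fi
```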
If you have a regular service window or a natural slow time at night or on weekends, these tools can be quite useful, because you might be able to cut down the time for your base backup from, say, 2 hours to 10 minutes. But if you need to be always on, you will probably want to qualify this a little by reducing the number of CPUs used for this job. It can still be pretty effective if you have many CPUs and want to dedicate a couple to the compression task for a while.
Personally, I have settled on
pigz as my standard weapon of choice
now. It's much faster than
pbzip2 and can easily beat
lzop. Also, it produces standard
gzip output, of
course, so you don't need to install special tools everywhere, and you
can access the file with standard tools in a bind.
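Because the output is plain gzip, any standard toolchain can unpack it, and on restore you can put pigz back into the pipe when it is available (a sketch on a scratch directory; gunzip is the universal fallback):

```shell
demo=$(mktemp -d)
echo 'data' > "$demo/f"
tar -C "$demo" -czf "$demo/basebackup.tar.gz" f
# Restore with whatever is at hand: pigz -dc if installed, else plain gunzip
unzipper=$(command -v pigz >/dev/null 2>&1 && echo 'pigz -dc' || echo 'gunzip -c')
mkdir "$demo/restore"
$unzipper "$demo/basebackup.tar.gz" | tar -C "$demo/restore" -x
cat "$demo/restore/f"
```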
Also, consider all of this in the context of restoring. No matter how you take the backup, wouldn't it be nice to be able to restore a backup almost 8 or 16 or 32 times faster?
I have intentionally not included any benchmark numbers here, because it will obviously be pretty site-specific. But it should be easy to test for everyone, and the results should speak for themselves.