The other day I wrote about code syntax highlighting with less and source-highlight, when someone pointed out the Pygments package, which can do the same thing. This called for the great source-highlight vs. Pygments face-off.
Installation
Source-highlight is written in C++ and needs the Boost regex library; together they take about 3.5 MB. Pygments is written in Python and has no dependencies beyond that; it takes about 2 MB installed. No problem on either side.
In Debian, the packages are source-highlight and python-pygments. Note that the Pygments command-line tool is called pygmentize.
Source-highlight is licensed under the GPL, Pygments under a BSD license.
Getting Started
pygmentize file.c
writes a colored version of file.c to the standard output. Nice.
source-highlight file.c
writes a colored version of file.c to file.c.html. As I wrote before, the correct invocation for this purpose is:
source-highlight -fesc -oSTDOUT file.c
That makes pygmentize slightly easier to use, I suppose.
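Since Pygments is also a Python library, roughly the same result as pygmentize file.c can be had from a few lines of Python. A minimal sketch, assuming a reasonably current Pygments release; the function name colorize_for_terminal is hypothetical, while highlight, get_lexer_for_filename and TerminalFormatter are Pygments' own API:

```python
from pygments import highlight
from pygments.formatters import TerminalFormatter
from pygments.lexers import get_lexer_for_filename


def colorize_for_terminal(filename: str, source: str) -> str:
    """Return source wrapped in ANSI color escapes (hypothetical helper).

    The lexer is chosen from the file name; passing the source as well
    lets Pygments disambiguate if several lexers claim the same pattern.
    """
    lexer = get_lexer_for_filename(filename, source)
    return highlight(source, lexer, TerminalFormatter())
```

For example, print(colorize_for_terminal("file.c", open("file.c").read())) writes colored C to the terminal, picking the language from the file name much as pygmentize does when no lexer is given.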
Supported Languages
Source-highlight supports 30 languages; Pygments supports 136.
Source-highlight can produce output for DocBook, ANSI console, (X)HTML, Javadoc, LaTeX, Texinfo. Pygments can produce output for HTML, LaTeX, RTF, SVG, and several image formats.
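These counts depend on the release. To see what a given Pygments installation actually offers, a short sketch could ask Pygments itself; the helper name pygments_inventory is hypothetical, while get_all_lexers, get_all_formatters and get_all_styles are part of Pygments:

```python
from pygments.formatters import get_all_formatters
from pygments.lexers import get_all_lexers
from pygments.styles import get_all_styles


def pygments_inventory() -> dict[str, list[str]]:
    """Return sorted names of installed lexers, formatters and styles."""
    return {
        # get_all_lexers() yields (name, aliases, filenames, mimetypes) tuples.
        "lexers": sorted(name for name, _aliases, _files, _mimes in get_all_lexers()),
        # get_all_formatters() yields formatter classes; .name is e.g. "HTML".
        "formatters": sorted(cls.name for cls in get_all_formatters()),
        # get_all_styles() yields style names such as "default".
        "styles": sorted(get_all_styles()),
    }
```

Printing {kind: len(names) for kind, names in pygments_inventory().items()} gives the counts for whatever version is installed.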
Source-highlight supports styles, but only ships a few. Pygments supports styles and filters, and ships over a dozen styles.
So Pygments offers more options here.
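As a sketch of how styles and filters combine from Python, assuming Pygments' bundled monokai style and its built-in keywordcase filter (both ship with current releases); highlight_with_style is a hypothetical name:

```python
from pygments import highlight
from pygments.formatters import Terminal256Formatter
from pygments.lexers import CLexer


def highlight_with_style(source: str, style: str = "monokai") -> str:
    """Highlight C source for a 256-color terminal in the given style."""
    lexer = CLexer()
    # Built-in filter: rewrite every keyword token, here to upper case.
    lexer.add_filter("keywordcase", case="upper")
    # The style only decides colors; the filter changes the token text.
    formatter = Terminal256Formatter(style=style)
    return highlight(source, lexer, formatter)
```

Running this on a C snippet yields 256-color ANSI output in which keywords such as return appear as RETURN. Filters operate on the token stream before formatting, so the same filter works with any output format.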
Also note that Pygments is a Python library that can be used, say, in web applications for syntax highlighting; Review Board uses it this way, for example. Source-highlight is just a command-line tool, but it can of course also be invoked by other packages; Horde does this, for instance.
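For the web-application case, a common pattern is to render an HTML fragment and serve the matching CSS separately. A minimal sketch using Pygments' HtmlFormatter; c_to_html is a hypothetical helper, and this is not taken from Review Board's code:

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import CLexer


def c_to_html(source: str) -> tuple[str, str]:
    """Return (HTML fragment, CSS rules) for a piece of C source."""
    formatter = HtmlFormatter(cssclass="highlight")
    # The fragment is wrapped in <div class="highlight"> and HTML-escaped.
    fragment = highlight(source, CLexer(), formatter)
    # CSS rules scoped to the same class, for the page's stylesheet.
    css = formatter.get_style_defs(".highlight")
    return fragment, css
```

Keeping the CSS separate means the stylesheet can be generated once and cached, while each page only embeds the fragment.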
Speed
To process all the C files in the PostgreSQL source tree (709,271 lines), writing the ANSI console colored version of each file.c to file.c.txt:
- source-highlight: 25 seconds
- pygmentize: 5 minutes
So for interactive use with less, source-highlight, at roughly twelve times the speed, is probably still the better option.
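The timings above do not say exactly how they were taken. A stdlib-only harness along these lines could reproduce the Pygments side of such a measurement; highlight_tree is a hypothetical helper that takes any str-to-str highlighting function, so the loop itself does not depend on Pygments:

```python
import time
from pathlib import Path
from typing import Callable


def highlight_tree(root: str, colorize: Callable[[str], str],
                   suffix: str = ".c") -> tuple[int, int, float]:
    """Write a highlighted copy of every *suffix file under root to <name>.txt.

    Returns (files processed, newline count, elapsed wall-clock seconds).
    """
    # Collect the file list first so the .txt files written below
    # are never picked up by the directory walk.
    sources = sorted(Path(root).rglob(f"*{suffix}"))
    files = lines = 0
    start = time.perf_counter()
    for path in sources:
        # errors="replace" keeps non-UTF-8 bytes from aborting the run.
        text = path.read_text(encoding="utf-8", errors="replace")
        out_path = path.with_name(path.name + ".txt")  # file.c -> file.c.txt
        out_path.write_text(colorize(text), encoding="utf-8")
        files += 1
        lines += text.count("\n")  # counts lines the way wc -l does
    return files, lines, time.perf_counter() - start
```

To time Pygments, pass something like lambda src: highlight(src, CLexer(), TerminalFormatter()). This measures one in-process loop; running pygmentize once per file would additionally pay Python's interpreter startup cost for every file.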
Robustness
pygmentize gave me a few errors of this sort during the processing of the PostgreSQL source tree:
*** Error while highlighting:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 83: ordinal not in range(128)
(file "/usr/lib/python2.5/site-packages/Pygments-1.0-py2.5.egg/pygments/formatters/terminal.py", line 93, in format)
*** Error while highlighting:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 196-198: invalid data
(file "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode)
That probably shouldn't happen.
source-highlight gave no spurious errors in my limited testing.
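Judging from the tracebacks, the first error came from encoding output for an ASCII-only terminal and the second from decoding a file that is not valid UTF-8. The paths point at Python 2.5 and Pygments 1.0, so current releases may behave differently. One possible workaround, sketched here with hypothetical helper names and only the standard library, is to decode input with a Latin-1 fallback and encode output with replacement characters:

```python
def decode_source(raw: bytes) -> str:
    """Decode a source file's bytes: UTF-8 if valid, otherwise Latin-1.

    Latin-1 maps every byte 0x00-0xFF to a code point, so this never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def encode_output(text: str, encoding: str = "utf-8") -> bytes:
    """Encode highlighted text, using '?' for characters the target
    encoding cannot represent instead of raising UnicodeEncodeError."""
    return text.encode(encoding, errors="replace")
```

The decoded text can be passed to highlight() and the result written with sys.stdout.buffer.write(encode_output(...)). The Latin-1 fallback is a guess: it never fails, but a file in some other 8-bit encoding would come out with wrong characters.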
Miscellaneous
Source-highlight can highlight its own configuration files, which are in a custom language, and Pygments' configuration files, which are in Python. Conversely, Pygments can of course highlight its own configuration files, but doesn't know what to do with those of Source-highlight.
Summary
I will stick with Source-highlight for interactive use, but Pygments is a great alternative when you need more formatting options or want to integrate the package as a library.