Thursday, July 16, 2009

Adding Color to the Console: Pygments vs. Source-highlight

The other day I wrote about code syntax highlighting with less and source-highlight, and someone pointed out the Pygments package, which can do the same thing. This called for

the great source-highlight vs. pygments face-off

Installation

Source-highlight is written in C++ and needs the Boost regex library; the two together take about 3.5 MB. Pygments is written in Python and has no other dependencies; it takes about 2 MB installed. No problem on either side.

In Debian, the packages are source-highlight and python-pygments. Note that the Pygments command-line tool is called pygmentize.

Source-highlight is licensed under the GPL, Pygments under a BSD license.

Getting Started

pygmentize file.c
writes a colored version of file.c to the standard output. Nice.
source-highlight file.c
writes a colored version of file.c to file.c.html. As I wrote before, the correct invocation for colored console output is
source-highlight -fesc -oSTDOUT file.c
That makes pygmentize slightly easier to use, I suppose.
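pygmentize is the command-line front end to the Pygments library, so the same result is available from Python. A minimal sketch, assuming the input is UTF-8 encoded C source; the function name colorize_for_terminal is made up for this example:

```python
from pygments import highlight
from pygments.formatters import TerminalFormatter
from pygments.lexers import get_lexer_for_filename


def colorize_for_terminal(path):
    """Return the contents of `path` with ANSI color escapes (hypothetical helper)."""
    # Assumption: the file is UTF-8; other encodings need explicit handling.
    with open(path, encoding="utf-8") as f:
        code = f.read()
    # Choose the lexer from the file name (file.c -> C lexer), as pygmentize does.
    lexer = get_lexer_for_filename(path, code)
    return highlight(code, lexer, TerminalFormatter())
```

For example, print(colorize_for_terminal("file.c"), end="") writes the colored source to standard output; as with either command-line tool, piping that through less -R keeps the color escapes intact.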

Supported Languages

Source-highlight supports 30 languages; Pygments supports 136.

Source-highlight can produce output for DocBook, ANSI console, (X)HTML, Javadoc, LaTeX, and Texinfo. Pygments can produce output for the ANSI console, HTML, LaTeX, RTF, SVG, and several image formats.
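These counts depend on the installed version. A sketch for checking what a given Pygments installation supports, using the library's own registry functions; list_lexers and list_formatters are illustrative names, not part of Pygments:

```python
from pygments.formatters import get_all_formatters
from pygments.lexers import get_all_lexers


def list_lexers():
    """Return (name, aliases) pairs for every lexer in the installed Pygments."""
    # get_all_lexers() yields (name, aliases, filename_patterns, mimetypes) tuples.
    return [(name, aliases) for name, aliases, _patterns, _mimetypes in get_all_lexers()]


def list_formatters():
    """Return the sorted short names of all output formatters (html, latex, rtf, ...)."""
    # get_all_formatters() yields formatter classes; .aliases holds their short names.
    return sorted(alias for cls in get_all_formatters() for alias in cls.aliases)


print(len(list_lexers()), "lexers")
print(", ".join(list_formatters()))
```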

Source-highlight supports styles, but only ships a few. Pygments supports styles and filters, and ships over a dozen styles.
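To illustrate the difference between the two Pygments features: a style only decides colors, while a filter rewrites the token stream before any formatter sees it. A sketch using the bundled "friendly" style and the built-in "keywordcase" filter; the sample C snippet is arbitrary:

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter, NullFormatter
from pygments.lexers import CLexer
from pygments.styles import get_all_styles

code = "int main(void) { return 0; }\n"

# Styles: list the bundled ones, then generate CSS rules for one of them.
styles = sorted(get_all_styles())
css = HtmlFormatter(style="friendly").get_style_defs(".highlight")

# Filters: "keywordcase" upper-cases every keyword token before formatting.
lexer = CLexer()
lexer.add_filter("keywordcase", case="upper")
plain = highlight(code, lexer, NullFormatter())  # NullFormatter emits uncolored text

print(len(styles), "styles:", ", ".join(styles))
print(plain, end="")
```

The keyword return comes out as RETURN, while identifiers such as main are left alone.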

So Pygments offers more options here.

Also note that Pygments is a Python library that can be used, say, in web applications for syntax highlighting; Review Board uses it that way, for example. Source-highlight is just a command-line tool, but other packages can of course invoke it; Horde does, for instance.
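A sketch of library use in a web application, which avoids spawning a process per request. The names render_snippet and stylesheet are hypothetical, and how the HTML fragment gets into a page depends on the web framework; this is not how Review Board does it:

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

# One formatter can be shared by all requests; linenos="table" adds line numbers.
FORMATTER = HtmlFormatter(linenos="table", cssclass="source")


def render_snippet(code, language="c"):
    """Return an HTML fragment for `code`; `language` is a Pygments lexer alias."""
    lexer = get_lexer_by_name(language)
    return highlight(code, lexer, FORMATTER)


def stylesheet():
    """Return CSS rules matching the fragments above, to be served once per page."""
    return FORMATTER.get_style_defs(".source")
```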

Speed

Processing all the C files in the PostgreSQL source tree (709,271 lines), writing the ANSI console colored version of each file.c to file.c.txt, took:
source-highlight: 25 seconds
pygmentize: 5 minutes
So for interactive use with less, source-highlight, about twelve times faster here, is probably still the better option.
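This is not the script behind the numbers above, but a sketch of how the Pygments half of such a measurement could be repeated. It runs the library inside one Python process, so it leaves out the interpreter start-up that running pygmentize once per file would add, and its timings will differ. Decoding as Latin-1 is an assumption made so that odd bytes cannot abort the run; colorize_tree is a hypothetical name:

```python
import os
import time

from pygments import highlight
from pygments.formatters import TerminalFormatter
from pygments.lexers import CLexer


def colorize_tree(root):
    """Write an ANSI-colored copy of every *.c file under `root` to *.c.txt.

    Returns (number_of_files, number_of_lines, elapsed_seconds).
    """
    lexer, formatter = CLexer(), TerminalFormatter()
    files = lines = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".c"):
                continue
            path = os.path.join(dirpath, name)
            # Latin-1 maps every byte to a character, so decoding never fails.
            with open(path, encoding="latin-1") as f:
                code = f.read()
            with open(path + ".txt", "w", encoding="utf-8") as out:
                out.write(highlight(code, lexer, formatter))
            files += 1
            lines += code.count("\n")
    return files, lines, time.perf_counter() - start
```

Usage would be something like files, lines, secs = colorize_tree("postgresql/src"), where the path is whatever your source checkout is called.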

Robustness

pygmentize gave me a few errors of this sort while processing the PostgreSQL source tree:
*** Error while highlighting:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 83: ordinal not in range(128)
(file "/usr/lib/python2.5/site-packages/Pygments-1.0-py2.5.egg/pygments/formatters/terminal.py", line 93, in format)

*** Error while highlighting:
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 196-198: invalid data
(file "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode)
That probably shouldn't happen.

source-highlight gave no spurious errors in my limited testing.
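Both pygmentize tracebacks are encoding problems: the first comes from writing a non-ASCII character (u'\xb7', a middle dot) with an ASCII output encoding, the second from a source file that is not valid UTF-8. When using the library directly, one workaround is to decode the bytes yourself and write the result with an explicit encoding. A sketch, assuming UTF-8 is the expected input encoding and Latin-1 an acceptable fallback (a choice made for this example, not Pygments' own behavior); the function names are hypothetical:

```python
from pygments import highlight
from pygments.formatters import TerminalFormatter
from pygments.lexers import CLexer


def decode_source(data):
    """Decode raw bytes as UTF-8, falling back to Latin-1 (which accepts any byte)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


def colorize_bytes(data):
    """Highlight C source given as bytes; return ANSI-colored text as a str."""
    return highlight(decode_source(data), CLexer(), TerminalFormatter())


def colorize_file(src_path, dst_path):
    """Read `src_path` as bytes and write its colored version to `dst_path`."""
    with open(src_path, "rb") as f:
        text = colorize_bytes(f.read())
    # An explicit output encoding means characters such as U+00B7 (middle dot)
    # cannot trigger an 'ascii' codec error on write.
    with open(dst_path, "w", encoding="utf-8") as out:
        out.write(text)
```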

Miscellaneous

Source-highlight can highlight its own configuration files, which are in a custom language, and Pygments' configuration files, which are in Python. Pygments, of course, can also highlight its own configuration files, but it doesn't know what to do with those of Source-highlight.

Summary

I will stick with Source-highlight for interactive use, but Pygments is a great alternative when you need more formatting options or want to integrate the package as a library.

3 comments:

  1. Hey!

    I did a similar comparison a while back when writing http://patch-tracker.debian.net. Since I have a strong aversion to writing anything using system() in a webapp, and I was writing in Python anyway, Pygments was a clear winner for that use case.

  2. In my test producing source to include in a LaTeX document, source-highlight produced output that just worked... Pygments' output didn't, and I got several errors when I ran pdflatex.

    I know it is not relevant to your post, but if someone is looking for highlighting in LaTeX, Pygments is not recommended.

    Thanks =)

  3. Another option to check out is highlight, also in Debian. Upstream ships SWIG-based APIs for Perl and Python, although so far I only install the Perl ones in the Debian package. The killer feature for me is a passthrough option to allow lines of markup in your source. I use it to add overlays for Beamer.
