OK, I've had it. Who ever thought that a search path was a good idea anyway? I'm talking about the schema search path in PostgreSQL in particular, but any other ones are just as bad. If you review the last 48 hours in the PostgreSQL
commit log, you can see what kind of nonsense this creates, and what amount of resources it drains (with apologies to my fellow developers who had to clean up my mistakes).
The Schema Search Path
What is the schema search path? One way to describe it is that it allows you to specify the objects in which schemas can be addressed without schema qualification, and in which order those objects are resolved. So if you write
CREATE TABLE schema1.foo ( ... );
CREATE TABLE schema2.bar ( ... );
SET search_path TO schema1, schema2;
then you can address these tables as just
SELECT ... FROM foo, bar ...
instead of
SELECT ... FROM schema1.foo, schema2.bar ...
Another way to describe it is that it allows you to set hidden traps that mysteriously make your perfectly good SQL behave in completely unpredictable ways. So a perfectly harmless query like
SELECT * FROM pg_class WHERE relname ~ 'stuff$';
can be changed to do pretty much anything, if the user creates his own
pg_class and puts it first into the search path, before the system schema
pg_catalog.
The way to deal with this, to the extent that it is dealt with at all, is to either set the schema search path everywhere before accessing any object (for some value of everywhere, such as every script, every session, every function) or to explicitly schema-qualify every object reference. The problem with the former is that it negates the point of allowing the user to set the search path in the first place. What you'd need to do instead is to prepend your known search path to what was set already, which is a pain to program in SQL, which is why probably hardly anyone does it that way. The problem with the latter is that it makes schemas pointless, because if you have to qualify everything, you might as well make the schema part of the name ("pg_catalog_pg_class") and forget about the extra indirection.
Qualify Everything!
The way we usually deal with this in PostgreSQL core code, for example in the psql describe code, is to add
pg_catalog qualifications everywhere; see this
last commit for instance. In my small example above, this would become
SELECT * FROM pg_catalog.pg_class WHERE relname ~ 'stuff$';
But this, and the above commit and all of psql is still wrong, because the actually correct way to write this is
SELECT * FROM pg_catalog.pg_class WHERE relname OPERATOR(pg_catalog.~) 'stuff$';
because PostgreSQL of course allows user-defined operators, and those are subject to search path rules like anything else. Proof:
CREATE OPERATOR = ( PROCEDURE = charne, LEFTARG = "char", RIGHTARG = "char" );
SET search_path = public, pg_catalog;
\diS
This command is supposed to show all indexes, but now it will show everything that is not an index. Try this yourself; it's a great way to mess with your colleagues' heads. :-) Anyway, surely no one wants to schema-qualify all operators, which is why attempting to qualify everything is probably a hopeless proposition.
Unix Shells
Let's look for some analogies to this dreaded technology. Unix shells have had search paths for decades. I don't have a whole lot of insight into their history, or what they where originally intended for, but the way I see it now, from the point of view of an
operating system development contributor myself, they are just as much a pain in the butt in their full generality. Because users can set hidden traps that can make any old harmless command do anything, various programming and security pains persist. The usual answers are again: set the path explicitly in every shell script, or add an explicit path to every command. The latter is obviously rarely done, and would by the way possibly fall into the same operator trap that I mentioned above. Who knew that
[ (as in
if [ "$foo" = bar ]) might actually be a separate executable:
$ ls -l '/usr/bin/['
-rwxr-xr-x 1 root root 38608 2009-06-05 03:17 /usr/bin/[
Many authors of shell scripts probably know that setting an explicit path at the top of the script is probably a good idea. Or is it? On this Debian system, I have 242 shell scripts in my
/usr/bin/, 181 of which don't appear to set a path. 14 out of 28 scripts in
/etc/init.d/ don't set a path. And out of the 14 that do, there are 12 different variants of the path that they actually choose to set. Which makes any actually sensible use of the path impossible. One such use might be to put a locally modified version of a common program into
/usr/local/bin/. Now some scripts will use it, some will not.
Linux distributions have been trying to move away from relying on this path business anyway. The Filesystem Hierarchy Standard only allows for a few directories for executable programs. Everything else is supposed to be organized by the package manager, and in practice perhaps by the occasional symlink. Solaris has a much greater mess of bin directories, such as
/usr/gnu/bin,
/usr/xpg4/bin, but even there I sense that they don't want to keep going into that direction.
Whither PostgreSQL?
In PostgreSQL, you might get the worst of all worlds:
- Explicitly qualifying everthing is completely unwieldy, and perhaps impossible if you can't modify the SQL directly.
- Programmatically modifying the search path in a correct way is pretty complicated.
- There is no standard for what goes into what schema.
- There is talk of adding more "standard schemas", potentially breaking all those applications who managed to get one of the above points right.
- Using the search path to override a system object by your own implementation doesn't work, at least in half of the applications that choose to do the supposedly right thing and alter the search path to their liking.
- Security definer functions (analogous to setuid scripts) don't have a safe default for the search path. You need to make sure you set a safe path in every one of them yourself (see also CVE-2007-2138). One wonders how many of those out there are broken in this regard. At least setuid shell scripts are usually not possible.
What might be a better solution? More modern programming languages such as Java or Python with their "import" mechanisms, where you explicitly choose what you want in what namespace, appear to have less problems of the sort I described. Of course there is still a CLASSPATH or a PYTHONPATH in the background, but that's actually just a workaround because the underlying operating system relies on search paths and doesn't have an import mechanism itself.
I think the straightforward solution at the moment is to ignore the whole schema issue as much as you can and don't let random people issue random SQL statements. Probably not completely safe, but at least it doesn't tangle you up in a giant mess.
If someone wants to defend the current system or has proposals for fixing it, please leave comments.
(picture by david.nikonvscanon CC-BY)