Saturday, March 13, 2010

Running SQL scripts with psql

If you are using PostgreSQL, have you ever loaded an SQL script file or dump file through psql? Well, duh.

-f vs. <

If you are just starting out, you will probably try this:
psql mydb < dump.sql
and that's quite OK.

Once you hit your first error message such as
ERROR:  syntax error at or near "VEIW"
you might figure out that for some reason
psql mydb -f dump.sql
is better, because it produces
psql:dump.sql:56: ERROR:  syntax error at or near "VEIW"
instead, allowing you to actually find the error in your file.

Now I admit that it is almost entirely me who is to blame for this bizarre difference, because at some point in the distant past, the GNU Coding Standards recommended that programs should behave the same independent of whether the standard input or output is a terminal or a file. The current version of said standard actually explicitly creates an exception saying that error messages should be changed to the noninteractive style when the standard input is not from a terminal. So this should probably be fixed.

Note that the -f form above is not portable. It depends on the GNU getopt extension that permits options after nonoption arguments. To be portable, you need to write either
psql -d mydb -f dump.sql
or
psql -f dump.sql mydb
Frankly, I hardly ever do this because I rarely use a non-GNU system, but keep it in mind when writing scripts or documentation intended to be portable.

psqlrc

The next thing you should always do when running psql scripts is to use the option -X, which prevents the reading of the .psqlrc file. That file could contain anything, and you have no idea how it will interact with your script.

In my mind, this is a design mistake in psql. Unix shells have different startup files for interactive and noninteractive usage, so they don't have this problem.
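
For illustration, a hypothetical .psqlrc along these lines (the contents are made up) would silently change how any script behaves when run without -X:
-- hypothetical ~/.psqlrc
\set AUTOCOMMIT off
\timing
With AUTOCOMMIT off, for example, a script that never issues COMMIT would have all its work rolled back when psql exits.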

Quiet or Loud

Anyway, if you are restoring a dump, these commands will produce output that looks something like this:
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE FUNCTION
CREATE TYPE
CREATE FUNCTION
ALTER TABLE
ALTER TABLE
CREATE INDEX
CREATE INDEX
These are the responses from the DDL commands that are in your SQL script file. I personally prefer not to see these, because they don't tell me anything useful, and they cause the important information, such as error messages, to fly rapidly off the screen.

Some people might like these responses because they serve as a kind of progress indicator. Which is true, but there is a flaw. Creating tables and functions is pretty fast. Chances are that that part of the dump file will fly way off your screen in two seconds. The slow part of a database restore is the COPY commands that restore the data. And those do not produce any response at all! So as a progress report, this output is not all that useful. OK, the ALTER TABLE parts at the end that create the foreign keys can be slow, and of course the CREATE INDEX commands, but in the above example, the longest wait would be after the last CREATE FUNCTION.

There are two ways to make this behavior more sane: If you don't want the "progress" output, only the errors and other important messages, use the option -q (or --quiet). This is probably more useful for restoring a dump on the console. If you want a full progress report, use the option -a (or --echo-all), which will, in addition to the default behavior, print each statement from the file before executing it. This is probably fairly useful in batch jobs where you are logging the output to a file somewhere. Most likely you will want to use one of these two options for almost any noninteractive invocation of psql. You can also somewhat usefully use both options at the same time; think about it.
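
To make that concrete, here is roughly how the variants behave (the command lines are illustrative):
psql -q -d mydb -f dump.sql        # errors and server messages only
psql -a -d mydb -f dump.sql        # echo each statement, plus the responses
psql -a -q -d mydb -f dump.sql     # echo each statement, suppress the responses
The last combination shows each command as it starts while omitting the response chatter, which addresses the progress-report problem described above.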

Notices

On the matter of controlling the output, you might want to hide the NOTICE messages such as
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test1_pkey" for table "test1"
These get boring pretty quickly, especially when you create a lot of tables with a lot of primary keys.

psql doesn't provide a direct command-line option for turning this off, but there is an obscure method: set the environment variable PGOPTIONS to '--client-min-messages=warning', which will pass that option to the server process. So all together, your call might look like
PGOPTIONS='--client-min-messages=warning' psql -d mydb -q -f dump.sql

For some tastes, it might actually make sense to change the client_min_messages option permanently in postgresql.conf on the server.
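
If you do go that route, the line in postgresql.conf would look like this (it takes effect after a server reload):
client_min_messages = warning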

Also note that, analogous to the case of .psqlrc described above, PGOPTIONS could contain all kinds of settings that you don't want. So even if you don't want to disable notices as shown here, it would be most prudent to unset the environment variable before running scripts.
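
In a Bourne-style shell script, that could look like this minimal sketch:
unset PGOPTIONS
psql -X -d mydb -f dump.sql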

Transactions

The next thing you might want to consider is using transactions, or rather a single transaction. Otherwise, if the execution of the script or restoration of the backup fails somewhere along the way, you have a half-loaded database and no easy restart point. Sometimes this doesn't matter that much; if you are restoring into a new database, you can just drop it and start over. If you are deploying updates into an existing system, much more care is required. The option to execute a file in a single transaction is -1 or --single-transaction.
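
For example:
psql -1 -d mydb -f dump.sql
This behaves as if the file were wrapped in BEGIN and COMMIT.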

Now depending on what exactly is in the file you want to load, using a single transaction may or may not work very well. For dump files created with pg_dump, it usually works, unless you used the -c/--clean option, in which case the DROP commands that appear at the beginning of the file might not have anything to drop and will fail, failing the entire transaction. That problem could be addressed if pg_dump used DROP IF EXISTS, but it doesn't.

When deploying schema changes into an existing database, this can get very tricky. For functions, you can usually use CREATE OR REPLACE FUNCTION to have an idempotent function creation command, except that it will fail if the return type has changed. And there is no CREATE OR REPLACE AGGREGATE or CREATE OR REPLACE OPERATOR (yet?). With other object classes such as types or casts it can be easier to just unconditionally drop and recreate the objects. If you drop a type, however, everything that uses that type is also dropped, so be sure to recreate everything (such as functions) afterwards in the same transaction. This won't help when the type is used in tables, though; don't blow away your tables. Exactly how to manage this kind of situation is a science of its own and would go beyond the scope of this post.
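
Here is a minimal sketch of the drop-and-recreate pattern, with made-up object names:
BEGIN;
DROP TYPE IF EXISTS mytype CASCADE;  -- also drops functions using the type
CREATE TYPE mytype AS (a integer, b text);
CREATE FUNCTION myfunc(mytype) RETURNS text
    LANGUAGE sql AS $$ SELECT ($1).b $$;
COMMIT;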

Side note: An alternative for some situations is setting the psql variable ON_ERROR_ROLLBACK, which allows you to ignore errors but still make use of the atomicity property of transactions; this is useful when doing schema upgrades.
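
For example (upgrade.sql is a stand-in name):
psql -1 -v ON_ERROR_ROLLBACK=on -d mydb -f upgrade.sql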

If you are convinced that running SQL scripts in transactions is the right thing, and you have adjusted your scripts to behave properly in that context, then you may also wish to consider the option
-v ON_ERROR_STOP=1
This causes psql to stop execution when a transaction fails. Otherwise it would continue to execute the rest of the script, issuing the error message
ERROR:  current transaction is aborted, commands ignored until end of transaction block
for every command, which makes no sense. There was a discussion a while ago about making ON_ERROR_STOP default to on when the single-transaction option is used, but it was not clear how this should interact with savepoints. If you are making use of savepoints, you may want to stay away from this option or evaluate for yourself whether it makes sense for you.
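
A useful side effect of ON_ERROR_STOP is that psql's exit status then reflects the failure, which a calling shell script can check, for example:
psql -1 -v ON_ERROR_STOP=1 -d mydb -f dump.sql || echo "load failed" >&2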

Pager

If you run a psql script with the output on a terminal (which is normal when installing databases, restoring backups, etc.) and the script executes a SELECT command whose output doesn't fit on the screen, the output is run through the pager, which will normally wait for a key press before continuing. This is obviously an extremely useful feature in interactive mode, but it also happens when you run a script, which is dubious.

Often, this won't be a problem, because backup files for instance don't contain SELECT commands with large outputs. But you can simulate it by taking any dump file that restores a sequence, which will contain SELECT setval(...) calls. If you make your terminal window four lines tall or less, you can see the effect. Again, this is contrived, but every so often someone puts a SELECT command in a file to create some kind of visual verification that the tables or the functions or the data that the script was supposed to load are now really there. There is nothing wrong with that, except when you run the script from the terminal and forget to check back frequently. The old joke about someone leaving a transaction open before going to lunch has never been more real.

Add the option
--pset pager=off
to the psql invocation to disable this behavior.

Summary

OK, got all that? Write this down; here is how you really should execute an SQL script file in psql:

PGOPTIONS='--client-min-messages=warning' psql -X -q -a -1 -v ON_ERROR_STOP=1 --pset pager=off -d mydb -f dump.sql

Should this be easier? I think so. Discuss.
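
Until something easier exists, a small shell wrapper saves the typing; this is just a sketch (the name safe_psql is made up), so adjust the options to taste:
# e.g. in ~/.bashrc
safe_psql () {
    PGOPTIONS='--client-min-messages=warning' \
    psql -X -q -a -1 -v ON_ERROR_STOP=1 --pset pager=off "$@"
}
# usage: safe_psql -d mydb -f dump.sql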

23 comments:

  1. Wow, great post. I knew some of these things but never saw it all put together so clearly.

  2. Well, psql is fundamentally meant as an interactive tool. I think what this demonstrates is that there might be room for a different tool meant for executing scripts.

  3. I found this all really informative. I particularly like that you explained everything and didn't just say "use these args it's better." In fact, I actually just used this information to debug a big schema-modification script. Even with bizarre new errors I'd never seen before to do with composite types, I still managed to get it working in a fraction of the time it would normally take me (I did the psql < file.sql way before).

    As far as making it easier, for my own personal use I'll probably alias it in .bashrc, but Tom Lane's idea of a separate tool meant purely for executing (and likely debugging) scripts seems logical.

  4. I have used postgresql for years, and I learned things from this. Clearly psql should be changed to have as a default what you describe here (in batch mode that is).

  5. Hi,
    How can you do the opposite of ON_ERROR_STOP=1 for batch jobs?
    You suggest using -v ON_ERROR_STOP=1, but how can I unset this so it continues?

    I have an export of another database that needs copying to postgres and the current behaviour of psql in batch mode means it stops at the very first error. How can I make it proceed and process all rows and allow me to see all of the rows in the import that need fixing?
    PS: The source db is a progress 8.x which does not care or enforce that fields are meant to be of a certain size. (grrrr)

  6. @Big Bear

    Use something like -v ON_ERROR_STOP without equal sign to unset a variable. Or the \unset command in your script.

  7. Great Posting Peter,

    Will PGOPTIONS override .psqlrc or the command-line options, e.g. psql --pset <> -c <>?

    Regards
    Raghavendra

  8. The order of evaluation is

    psql command line
    PGOPTIONS
    .psqlrc

    The later settings override the earlier ones.

  9. Nice post, Peter. I've been looking for this stuff; I've been using PostgreSQL for quite some time and everything becomes so clear after reading your post.
    ____________________
    --Life can only be understood backwards, but it must be lived forward. ~Soren Kierkegaard

  10. Great Post Peter:
    Two questions, since you seem to know a lot about psql:

    1. Can you give me an example of using the hyphen in the -f option which is the direct input. How do I terminate that? In Unix I used to do < File and end with a File in a new line.
    2. How can I set a variable inside my
    psql -f -
    set variable here from the shell script
    -

    Thanks, Venki

  11. It's the same syntax:

    psql -f- < file

  12. Hi, I am new to psql, and I am writing a shell script.
    I want to pass values from the shell script to my psql file so that I can run a loop that number of times... What will be the statement to write in the shell script?

  13. Very nice post. Thank you for all the clear explanations of what things do and why you would want them to behave that way. I found this to be very helpful.

  14. Thank you so much for this post! I am using PostgreSQL schema dropping-and-creating in my unit tests, so my screen has been filled with thousands of NOTICEs for months until now. Thanks for taking the time to write this.

  15. Thank you, this was exactly what I was looking for! :)

  16. Also, pg_dump supports --clean --if-exists since 9.4

  17. Within a Bash script file, I have the following code snippet for looping through and executing a collection of SQL files in Postgres. Status messages for each SQL file are written to the output text file. It works fine except for one major headache, noted below.

    # $f is a sql file (of DDL mostly). A loop reads a collection from a folder.

    …enter loop here

    for f in $target
    do

    TEMPFILE=$(mktemp temp.XXXXXX)
    trap "rm -rf $TEMPFILE" EXIT

    PGOPTIONS='--client_min_messages=LOG' \
    psql -X -q \
    -v ON_ERROR_STOP=1 \
    -h $hostname -d frogtalk -U bigfrog -w -f $f \
    >> $TEMPFILE

    psql_exit_status=$?

    cat $TEMPFILE >> $output_file
    rm -rf $TEMPFILE

    if [ $psql_exit_status != 0 ]; then
    echo -e "*** psql exited with value of $psql_exit_status while trying to run sql script" >> $output_file
    return_code=1
    break
    else
    continue
    fi


    done

    ….do more work


    The headache is that when a SQL error is encountered in any one file, the process simply exits on the spot. Instead, I'd like a way to more gracefully exit psql, so that I can 1) capture the psql exit status value in my output file, 2) continue with some more Bash commands, and 3) end the Bash script normally. Suggestions for a transition out of the psql error would be really helpful. I've done a thousand web queries with no success on finding a solution.

    Script file #10: create table edw observer.sql ...
    psql:/Users/MacDesktop/Documents/GitHub Repository/frogtalk/create table edw observer.sql:10: LOG: statement:

    DROP TABLE IF EXISTS edw.observer_dim RESTRICT;
    psql:/Users/MacDesktop/Documents/GitHub Repository/frogtalk/create table edw observer.sql:10: NOTICE: table "observer_dim" does not exist, skipping
    psql:/Users/MacDesktop/Documents/GitHub Repository/frogtalk/create table edw observer.sql:23: ERROR: syntax error at or near "("
    LINE 8: PRIMARY KEY (observer_pk) WITH (FILLFACTOR=90) USING IN...
    ^

    [Process completed]

  18. Two Oops in what I sent a moment ago! Sorry for being too quick to hit submit.

    The last line of the psql command should have been:
    &> $TEMPFILE

    And I get no info about the error written to my output file. I can only infer which sql file failed.

  19. Hello Peter,

    I am new to PostgreSQL. Please suggest a solution to my problem.
    My PostgreSQL database is UTF8-encoded, but I need to run an insert script whose character set is ISO-8859-1.
    When executing this file using psql, I get <ERROR: invalid byte sequence for encoding "UTF8": 0xaa>.

    I am using the below command

    PGOPTIONS='--client_encoding=iso-8859-1' psql -X -q -a -1 -v ON_ERROR_STOP=1 --pset pager=off -d myDB -f dump.sql

    Could you please suggest a solution?

  20. Thanks a lot! I spent 4-5 hrs figuring out the details for each command in my shell script; this really helps.
    Appreciate your time and effort.

  21. Great post. I am writing some commands that I am going to run through psql to modify data in a live database and the ability to run the whole thing in a transaction is a great safety net to have.
