MERGEstatement has gotten my attention again. For many years, PostgreSQL users have been longing for a way to do an "upsert" operation, meaning do an
UPDATE, and if no record was found do an
INSERT(or the other way around). Especially MySQL users are familiar with the
REPLACEstatement and the
INSERT ... ON DUPLICATE KEY UPDATEstatement, which are two variant ways to attempt to solve that problem (that have interesting issues of their own). Of course, you can achieve this in PostgreSQL with some programming, but the solutions tend to be specific to the situation, and they tend to be lengthier than one would want.
Discussions on this then usually proceed to speculate that the SQL-standard
MERGEstatement ought to be the proper way to solve this, and then it turns out that no one completely understands the
MERGEsyntax or semantics, especially as they apply to this upsert problem. (I was in that group.) And that's been the end of that so far. OK, as I write this I am pointed, via Robert Haas's blog post, to an older mailing list post by Simon Riggs, who is surely one of the men most qualified to drive an eventual implementation, that contains a hint toward the solution, but it's hard to find in that post, if you want to try.
This subject had gotten my attention again at the SQL standard working group meeting I attended a few weeks ago, where I learned that in SQL:2011, a
DELETEbranch has been added to
MERGE. We also took some time after the official part of the meeting to work through some examples that illustrate the uses of the
Let's take a look at what the
MERGEstatement is originally supposed to do, and where the term "merge" arose from. Let's say you have a table with outstanding balances, such as
CREATE TABLE balances ( name text, balance numeric );and at intervals you get a list of payments that your organization has received, such as
CREATE TABLE payments ( name text, payment numeric );What you want to do then is to "merge" the payments table into the balances table in the following way:
- If a balance exists, subtract from it.
- If the balance goes to zero, delete the entire record.
- If no balance exists, create a record (maybe someone pre-paid).
MERGE INTO balances AS b USING payments AS p ON p.name = b.name WHEN MATCHED AND b.balance - p.payment = 0 THEN DELETE WHEN MATCHED AND b.balance - p.payment <> 0 THEN UPDATE SET balance = balance - p.payment WHEN NOT MATCHED THEN INSERT (name, balance) VALUES (p.name, -b.payment);Of course there are simpler cases, but this shows all of the interesting features of this command.
How does this get us upsert? There, you don't have two tables, but only one and some values. I have seen some claims and examples about this in the wild that turn out to be wrong because they evidently violate the syntax rules of the SQL standard. So I did the only sensible thing and implemented the
MERGE syntax into the PostgreSQL
parser on the flight back, because that seemed to be the best way to verify the syntax.
So the correct way, I believe, to do, say, an upsert of the balances table would be:
MERGE INTO balances AS b USING (VALUES ('foo', 10.00), ('bar', 20.00)) AS p (name, payment) ON p.name = b.name WHEN MATCHED AND b.balance - p.payment = 0 THEN DELETE WHEN MATCHED AND b.balance - p.payment <> 0 THEN UPDATE SET balance = balance - p.payment WHEN NOT MATCHED THEN INSERT (name, balance) VALUES (p.name, -b.payment);Not all that nice and compact, but that's how it works.
Note that the
AS clause after
VALUES is required. If you leave it off, the PostgreSQL parser complains that a subquery in
FROM needs an
AS clause. Which is obviously not what this is, but it uses the same grammar rules, and it makes sense in this case because you need a correlation name to join against. And it was also one of those rare moments when you implemented something that gives you correct feedback that you didn't even provide for.
Anyway, the examples above all parse correctly, but they don't do anything yet. But if someone wants to implement this further or just try out the syntax, I'll send my code.