Pure SQL put-if-absent (or insert-if-not-exist)

I learned a little trick from a colleague this week: a pure SQL put-if-absent operation. I needed a database patch to insert a couple of rows into a database table. Sounds dull, but gets a little more interesting because I was trying to pay off a small technical debt: these rows had already been added to the production database. I needed an idempotent patch, one that would add the rows to any development database which doesn’t have them, but that wouldn’t give an error on the production database, which does.

It would have been reasonably simple to write in a procedural language (we use PostgreSQL, so it would have been PL/pgSQL), but for that I’d need to write a function, then call that function from a SELECT statement, then delete the function. But that all seems a bit messy.

To illustrate the SQL alternative we found, we’ll need a table to update. Here’s a playpen we can experiment in:

CREATE SEQUENCE playpen_id;
CREATE TABLE playpen (
  id INTEGER PRIMARY KEY DEFAULT NEXTVAL('playpen_id'),
  natural_1 text NOT NULL,
  natural_2 text NOT NULL,
  value_1 text,
  value_2 text,
  UNIQUE (natural_1, natural_2)
);

The table has a surrogate key (id), a composite natural key (natural_1, natural_2), and a couple of other columns (value_1, value_2).

The initial insert looked like this:

INSERT INTO
  playpen (natural_1, natural_2, value_1, value_2)
  VALUES ('A', 'B', 'value_A', 'value_B');

which is no good for our patch. We’ll get a duplicate key violation if we re-run this on the production database.

Here’s the idempotent version:

INSERT INTO
    playpen (natural_1, natural_2, value_1, value_2)
  SELECT rows_to_insert.* FROM (
      VALUES ('A', 'B', 'value_A', 'value_B')
    ) AS rows_to_insert (natural_1, natural_2, value_1, value_2)
    LEFT OUTER JOIN playpen USING (natural_1, natural_2)
    WHERE playpen.natural_id_1 IS NULL;

The secret is in full view in the PostgreSQL documentation for the INSERT command:

One can insert one or more rows specified by value expressions, or zero or more rows resulting from a query.

Instead of supplying VALUES to the insert, we supply a query which returns the values to insert. If we return an empty set, than nothing will be inserted.

The query promotes the previous set of VALUES to a relation by naming it (and its columns) with an alias, from which we select all values. We then LEFT OUTER JOIN onto the playpen table through the natural key: if the row of interest is present, the playpen.* columns will be populated with a copy of that row; if it is missing, they will be populated with NULLs. We only want to return a row in this latter case, so the WHERE clause finishes the job.

This extends nicely to the case where you have multiple rows to add. Just add more rows to the VALUES relation:

... FROM (
  VALUES
    ('A', 'B', 'value_A', 'value_B'),
    ('C', 'D', 'value_C', 'value_D')
) AS rows_to_insert (natural_1, natural_2, value_1, value_2)
...

It even works when run against a database that has one row but not the other.

I learned a few things while thinking about these few lines:

  • Relational theory scores a point here: we couldn’t guarantee this would work if the schema didn’t have an enforced natural key. Not only would it be harder to identify the column we wanted to insert, but if the columns used to join onto playpen were nullable, the WHERE clause couldn’t be used to reliably determine whether the row already exists.
  • When I read about idempotent patches previously (here, here and here) I was dismissive, believing that, by silently becoming no-ops, they’d mask errors. Now I’m working in a larger team, it’s becoming clear that this risk can be a useful (necessary?) trade-off.
  • Though I like to think I’m over the attraction of clever code in other languages, I’m possibly still too impressed with the clever tricks one can play in SQL. A patch based on the procedural solution might have explained my intent more clearly, yet I’m still writing this down because I don’t want to forget it. It almost suggests I’d be tempted to use it again, doesn’t it?

Leave a Reply

Your email address will not be published. Required fields are marked *