The easy way to load a JSON object into postgres is to use one of the many existing external tools, but I wanted to see what I could do with postgres alone. I’m funny that way.
This is what I’ve learned so far.
simple JSON: import as tab-delimited text
I’ll start with this minimal Tabular Data Package example, which I copied and saved as datapackage.json.
{
  "profile": "tabular-data-package",
  "name": "my-dataset",
  // here we list the data files in this dataset
  "resources": [
    {
      "path": "data.csv",
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          {
            "name": "var1",
            "type": "string"
          },
          {
            "name": "var2",
            "type": "integer"
          },
          {
            "name": "var3",
            "type": "number"
          }
        ]
      }
    }
  ]
}
(The fourth line, using // as a comment, is not valid JSON. Delete it.)
From psql, create a schema and table for some experiments:
stan=> create schema json_demo;
CREATE SCHEMA
stan=> create unlogged table t(t text);
CREATE TABLE
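One detail worth noting: as written, t is created on the default search path (normally public), not in json_demo. To put the table in the new schema, set the search path first:
stan=> set search_path = json_demo;
SET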
Load as tab-delimited text (the default format for the COPY command):
stan=> \copy t from datapackage.json
COPY 26
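As an alternative to editing the file by hand, you could also load it with the comment intact and delete that line afterwards; a sketch of mine, assuming comments sit on their own lines:
delete from t where t ~ '^\s*//';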
Concatenate the lines and convert to JSON:
stan=> select string_agg(t, E'\n')::json from t;
                 string_agg
─────────────────────────────────────────────
 {                                           ↵
   "profile": "tabular-data-package",        ↵
   "name": "my-dataset",                     ↵
   "resources": [                            ↵
     {                                       ↵
       "path": "data.csv",                   ↵
       "profile": "tabular-data-resource",   ↵
       "schema": {                           ↵
         "fields": [                         ↵
           {                                 ↵
             "name": "var1",                 ↵
             "type": "string"                ↵
           },                                ↵
           {                                 ↵
             "name": "var2",                 ↵
             "type": "integer"               ↵
           },                                ↵
           {                                 ↵
             "name": "var3",                 ↵
             "type": "number"                ↵
           }                                 ↵
         ]                                   ↵
       }                                     ↵
     }                                       ↵
   ]                                         ↵
 }
Success.
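To confirm the cast produced something queryable, you can drill into the document with the JSON path operators; a quick sketch of mine (#>> extracts the value at a path as text):
select (string_agg(t, E'\n')::json) #>> '{resources,0,path}' as path from t;
which should return data.csv.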
use csv format with special parameters
However, if the JSON contains backslashes, like the readme in this data package, it will fail:
stan=> truncate t;
TRUNCATE TABLE
stan=> \copy t from datapackage.json
COPY 220
stan=> select string_agg(t, E'\n')::json from t;
ERROR: invalid input syntax for type json
DETAIL: Character with value 0x0a must be escaped.
CONTEXT: JSON data, line 24: ...country names and code elements. This list states
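The root cause: in COPY’s text format, backslash sequences in the input are interpreted during the load, so the two characters \n inside a JSON string arrive in the table as a real newline, which JSON forbids in a string. The same difference shows up with plain versus escape string literals, a quick illustration of mine:
select '"a\nb"'::json;   -- \n stays two characters, a valid JSON escape: the cast succeeds
select E'"a\nb"'::json;  -- E'' expands \n to a real newline: the cast fails with the 0x0a error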
Andrew Dunstan’s solution is to use the csv format with these options: csv quote e'\x01' delimiter e'\x02'. The trick is that the quote and delimiter are set to control characters that should never appear in the file, so each input line is loaded verbatim as a single column, with no escape processing.
stan=> truncate t;
TRUNCATE TABLE
stan=> \copy t from datapackage.json csv quote e'\x01' delimiter e'\x02'
COPY 220
stan=> select (string_agg(t, E'\n')::json)->'name' as name from t;
name
────────────────
"country-list"
This method often works, but not always.
use \set to load into a variable
Perhaps due to the (repeated) escaped-quote character sequences, Dunstan’s method loads this file but the result fails to parse as JSON: https://datahub.io/core/usa-education-budget-analysis/datapackage.json.
stan=> truncate t;
TRUNCATE TABLE
stan=> \copy t from datapackage.json csv quote e'\x01' delimiter e'\x02'
COPY 450
stan=> select string_agg(t, E'\n')::json as j from t;
ERROR: invalid input syntax for type json
DETAIL: Expected string, but found "}".
CONTEXT: JSON data, line 119: }...
Stack Overflow’s Doctor Eval recommends loading the file into a psql variable, which works on this file.
stan=> \set s `cat datapackage.json`
stan=> select (:'s'::json)->'name' as name;
name
─────────────────────────────────
"usa-education-budget-analysis"
There must be a limit to how large a file can be processed this way, but I haven’t encountered it yet and am not in a rush to do so. (I’ve learned enough for now.)
Discovering the limits of this method is left as an exercise for the reader.
use \lo_import to load into a variable
Finally, a slightly more complicated method uses the psql \lo_import command to load the file as a large object, which also works on the above file. Since the simpler method has worked in my tests (so far), I’ll stick with it for the time being.
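For reference, here is that method condensed into a sketch, adapted from the Rhodium Toad’s example in the Sources below; mine differs in parsing the document in place rather than inserting it into a table:
BEGIN;
\lo_import datapackage.json
\set obj :LASTOID
select convert_from(lo_get(:'obj'), 'UTF8')::json -> 'name' as name;
\lo_unlink :obj
COMMIT;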
Sources
- Andrew Dunstan’s PostgreSQL and Technical blog
  - Importing JSON data (import with csv format)

    \copy t from datapackage.json csv quote e'\x01' delimiter e'\x02'
    select string_agg(t, E'\n')::json from t;

- Stack Overflow: Doctor Eval
  - How can I import a JSON file into PostgreSQL? (load into a psql variable)

    \set s `cat datapackage.json`
    select jsonb_pretty(:'s'::jsonb);

- The Rhodium Toad
  - Loading data from JSON files (load as a large object with lo_import)

    BEGIN;
    \set filename datapackage.json
    \lo_import :filename
    \set obj :LASTOID
    INSERT INTO import_json
      SELECT * FROM convert_from(lo_get(:'obj'), 'UTF8');
    \lo_unlink :obj
    COMMIT;