Dangers of trying to load JSON as lines of text into Postgres.

reproduce error (on my machine)

create dir & download json:

mkdir jsonproblem
cd !$
curl -OL https://datahub.io/core/usa-education-budget-analysis/datapackage.json

load into postgres (from psql):

create schema jsonproblem;
set search_path to jsonproblem;
create table t(t text);
\copy t from datapackage.json csv quote e'\x01' delimiter e'\x02'
select string_agg(t,E'\n')::jsonb from t;

Results in this error:

ERROR:  invalid input syntax for type json
DETAIL:  Expected string, but found "}".
CONTEXT:  JSON data, line 119:     }...

try some stuff

view by row #:

CREATE VIEW ln AS
  select row_number() over() as n, t from t;
select t as lines from ln where n > 117 and n < 121;
               lines
───────────────────────────────────
         "lineTerminator": "\r\n",
     },
         "quoteChar": "\"",

Back to shell, lines 118-120 from the file:

cat datapackage.json | sed -n '118,120p'
        "lineTerminator": "\r\n",
        "quoteChar": "\"",
        "skipInitialSpace": false

😦

I suspect this is related to quoting/escaping issues. On the other hand, a similar line appears earlier in the file:

cat datapackage.json | sed -n '53,55p'
        "lineTerminator": "\r\n",
        "quoteChar": "\"",
        "skipInitialSpace": false

Those lines loaded without error:

psql
set search_path to jsonproblem;
select t as lines from ln where n > 52 and n < 56;
               lines
───────────────────────────────────
         "lineTerminator": "\r\n",
         "quoteChar": "\"",
         "skipInitialSpace": false

🤔

Let’s see what happens if line 119 is deleted before loading:

\copy t from program 'cat datapackage.json|sed ''119d''' csv quote e'\x01' delimiter e'\x02'
select string_agg(t,E'\n')::jsonb from t;

No error. 😯

It also works if I delete the similar line 53 (leaving 119 untouched):

truncate t;
\copy t from program 'cat datapackage.json|sed ''53d''' csv quote e'\x01' delimiter e'\x02'
select string_agg(t,E'\n')::jsonb from t;

No error.

Solution

As suspected, loading as a binary JSON avoids this error.

Method from: http://blog.rhodiumtoad.org.uk/2018/02/11/loading-data-from-json-files/

From psql:

set search_path to jsonproblem;
\set filename datapackage.json
\set ON_ERROR_ROLLBACK interactive
begin;
\lo_import :filename
\set obj :LASTOID
create table testtab as
  select *
    from convert_from(lo_get(:'obj'),'UTF8') as t;
-- clean up the object
\lo_unlink :obj
commit;

Loaded this way, select t::jsonb from testtab; produces no error.

🥹

Tags:

Published:    |    Updated: