Dangers of trying to load JSON as lines of text into Postgres.
reproduce error (on my machine)
create dir & download json:
mkdir jsonproblem
cd !$
curl -OL https://datahub.io/core/usa-education-budget-analysis/datapackage.json
load into postgres (from psql):
create schema jsonproblem;
set search_path to jsonproblem;
create table t(t text);
\copy t from datapackage.json csv quote e'\x01' delimiter e'\x02'
select string_agg(t,E'\n')::jsonb from t;
Results in this error:
ERROR: invalid input syntax for type json
DETAIL: Expected string, but found "}".
CONTEXT: JSON data, line 119: }...
try some stuff
view by row #:
CREATE VIEW ln AS
select row_number() over() as n, t from t;
select t as lines from ln where n > 117 and n < 121;
lines
───────────────────────────────────
"lineTerminator": "\r\n",
},
"quoteChar": "\"",
Back to shell, lines 118-120 from the file:
cat datapackage.json | sed -n '118,120p'
"lineTerminator": "\r\n",
"quoteChar": "\"",
"skipInitialSpace": false
😦
I suspect this is related to quoting/escaping issues. On the other hand, a similar line appears earlier in the file:
cat datapackage.json | sed -n '53,55p'
"lineTerminator": "\r\n",
"quoteChar": "\"",
"skipInitialSpace": false
Those lines loaded without error:
psql
set search_path to jsonproblem;
select t as lines from ln where n > 52 and n < 56;
lines
───────────────────────────────────
"lineTerminator": "\r\n",
"quoteChar": "\"",
"skipInitialSpace": false
🤔
Let’s see what happens if line 119 is deleted before loading:
\copy t from program 'cat datapackage.json|sed ''119d''' csv quote e'\x01' delimiter e'\x02'
select string_agg(t,E'\n')::jsonb from t;
No error. 😯
It also works if I delete the similar line 53 (leaving 119 untouched):
truncate t;
\copy t from program 'cat datapackage.json|sed ''53d''' csv quote e'\x01' delimiter e'\x02'
select string_agg(t,E'\n')::jsonb from t;
No error.
Solution
As suspected, loading as a binary JSON avoids this error.
Method from: http://blog.rhodiumtoad.org.uk/2018/02/11/loading-data-from-json-files/
From psql:
set search_path to jsonproblem;
\set filename datapackage.json
\set ON_ERROR_ROLLBACK interactive
begin;
\lo_import :filename
\set obj :LASTOID
create table testtab as
select *
from convert_from(lo_get(:'obj'),'UTF8') as t;
-- clean up the object
\lo_unlink :obj
commit;
Loaded this way, select t::jsonb from testtab;
produces no error.
🥹