Provide stats on indexed metadata per origin.
Running the pgsql query in production right now (3M indexed origins) takes 3 seconds.
'{}'::jsonb @> (metadata - '@context')
looks computationaly expensive,
but it allows getting the number of non-empty metadata dicts in the same
scan as for the other ones.
As far as I know, postgresql does not offer an operator to test the strict inclusion of the set of keys, so the only other way I can think of is to test for string inequality with a second query.
Removing this sum()
from the query, and sending:
select count(*) from origin_intrinsic_metadata
where metadata != '{\"@context\": \"https://doi.org/10.5063/schema/codemeta-2.0\"}';
as a second query results in two queries, which take 2 seconds and 1 second respectively.
And this is less correct (because the context value is hardcoded), hence this implementation.
Migrated from D1040 (view on Phabricator)