Staging instance, all changes can be removed at any time

Skip to content

Provide stats on indexed metadata per origin.

Running the pgsql query in production right now (3M indexed origins) takes 3 seconds.


'{}'::jsonb @> (metadata - '@context') looks computationaly expensive, but it allows getting the number of non-empty metadata dicts in the same scan as for the other ones.

As far as I know, postgresql does not offer an operator to test the strict inclusion of the set of keys, so the only other way I can think of is to test for string inequality with a second query.

Removing this sum() from the query, and sending:

select count(*) from origin_intrinsic_metadata
where metadata != '{\"@context\": \"https://doi.org/10.5063/schema/codemeta-2.0\"}';

as a second query results in two queries, which take 2 seconds and 1 second respectively.

And this is less correct (because the context value is hardcoded), hence this implementation.


Migrated from D1040 (view on Phabricator)

Merge request reports

Loading