idx.storage.db: Improve content range queries to actually finish
In production, the previous query version takes too long. This ended up failing with timeouts.
Counter-intuitively, specifying twice the range query to use (both in content_mimetype and content_fossology_license), the execution now returns immediately the expected results.
Closes #1788 (closed)
Test Plan
tox
before (plus with the new index content_mimetype_id_idx
created here), this does not finish in time:
softwareheritage-indexer=> explain select t.id from content_fossology_license t inner join indexer_configuration ic on ic.id=t.indexer_configuration_id inner join content_mimetype cm on (t.id=cm.id and cm.mimetype like 'text/%') where ic.id=1 and '\xbbb6ed677707a2ac700000000000000000000000'::bytea <= t.id and t.id <= '\xbbb6f52d234ebdf3f80000000000000000000000'::bytea order by t.indexer_configuration_id, t.id limit 1000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.55..16.66 rows=1000 width=40)
-> Merge Join (cost=1.55..169248984.34 rows=11202576684 width=40)
Merge Cond: ((t.id)::bytea = (cm.id)::bytea)
-> Nested Loop (cost=0.84..547141.79 rows=93334 width=40)
-> Index Only Scan using content_fossology_license_pkey on content_fossology_license t (cost=0.70..545972.96 rows=93334 width=40)
Index Cond: ((id >= '\xbbb6ed677707a2ac700000000000000000000000'::bytea) AND (id <= '\xbbb6f52d234ebdf3f80000000000000000000000'::bytea) AND (indexer_configuration_id = 1))
-> Materialize (cost=0.14..2.16 rows=1 width=4)
-> Index Only Scan using indexer_configuration_pkey on indexer_configuration ic (cost=0.14..2.15 rows=1 width=4)
Index Cond: (id = 1)
-> Materialize (cost=0.70..722972.33 rows=24005350 width=32)
-> Index Only Scan using content_mimetype_id_idx on content_mimetype cm (cost=0.70..662958.95 rows=24005350 width=32)
(11 rows)
Now (with the content of this diff), this finishes:
softwareheritage-indexer=> explain select t.id from content_fossology_license t inner join content_mimetype cm on (t.id=cm.id and cm.mimetype like 'text/%' and '\xbbb6ed677707a2ac700000000000000000000000'::bytea <= cm.id and cm.id <= '\xbbb6f52d234ebdf3f80000000000000000000000'::bytea) where t.indexer_configuration_id=1 and '\xbbb6ed677707a2ac700000000000000000000000'::bytea <= t.id and t.id <= '\xbbb6f52d234ebdf3f80000000000000000000000'::bytea order by t.indexer_configuration_id, t.id limit 1000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.41..26.23 rows=1000 width=40)
-> Merge Join (cost=1.41..1390340.54 rows=56013000 width=40)
Merge Cond: ((t.id)::bytea = (cm.id)::bytea)
-> Index Only Scan using content_fossology_license_pkey on content_fossology_license t (cost=0.70..545972.96 rows=93334 width=40)
Index Cond: ((id >= '\xbbb6ed677707a2ac700000000000000000000000'::bytea) AND (id <= '\xbbb6f52d234ebdf3f80000000000000000000000'::bytea) AND (indexer_configuration_id = 1))
-> Materialize (cost=0.70..4239.31 rows=120027 width=32)
-> Index Only Scan using content_mimetype_id_idx on content_mimetype cm (cost=0.70..3939.24 rows=120027 width=32)
Index Cond: ((id >= '\xbbb6ed677707a2ac700000000000000000000000'::bytea) AND (id <= '\xbbb6f52d234ebdf3f80000000000000000000000'::bytea))
(8 rows)
Migrated from D2709 (view on Phabricator)