Replace ComputeDirectoryFrontier's CSV output with a list of nodes in Parquet

vlorentz requested to merge ComputeDirectoryFrontier-parquet into master

and remove useless columns, keeping only node ids.

This reduces its runtime from 40 to 5 min, removes the need for DeduplicateDirectoryFrontier (which took 1h 10min), and reduces the output size from 477GB+18GB to 750MB.
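As a rough check of the figures above, the combined savings can be worked out as follows. This is only a back-of-the-envelope sketch: it assumes the 477GB and 18GB figures add up to the previous total output, and that sizes use decimal units (1GB = 1000MB). The variable names are illustrative and do not come from the codebase.

```python
# Figures quoted in the merge request description.
# Old pipeline: ComputeDirectoryFrontier (40 min) + DeduplicateDirectoryFrontier (1h 10min).
old_runtime_min = 40 + (60 + 10)
new_runtime_min = 5

# Assumption: the two quoted sizes sum to the previous on-disk footprint.
old_size_gb = 477 + 18
# 750MB, assuming decimal units (1GB = 1000MB).
new_size_gb = 0.750

runtime_speedup = old_runtime_min / new_runtime_min
size_reduction = old_size_gb / new_size_gb

print(f"runtime: {old_runtime_min} min -> {new_runtime_min} min ({runtime_speedup:.0f}x faster)")
print(f"output:  {old_size_gb} GB -> {new_size_gb} GB ({size_reduction:.0f}x smaller)")
```

Under these assumptions, the two steps together go from 110 min to 5 min, and the output shrinks from 495GB to 0.75GB.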

This has a negligible performance improvement on the readers (they are saturated by path-aware traversals anyway).

Edited by vlorentz
