Replace ComputeDirectoryFrontier's CSV output with a list of nodes in Parquet,
and drop the unused columns so that only node ids are kept.
This reduces its runtime from 40 to 5 min, removes the need for DeduplicateDirectoryFrontier (which took 1h 10min), and reduces the output size from 477GB+18GB to 750MB.
The performance improvement on the readers is negligible (they are saturated by path-aware traversals anyway).
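For a rough sense of scale, the figures above can be turned into ratios. This is only a back-of-the-envelope sketch: it assumes the two old steps ran one after the other (so their runtimes add up) and that sizes are in decimal units (1 GB = 1000 MB). The constant and function names are made up for illustration.

```python
# Figures quoted in the description; names are illustrative only.
OLD_RUNTIME_MIN = 40 + 70  # ComputeDirectoryFrontier + DeduplicateDirectoryFrontier (1h 10min)
NEW_RUNTIME_MIN = 5        # ComputeDirectoryFrontier with Parquet output

# Assumption: decimal units, 1 GB = 1000 MB.
OLD_SIZE_MB = (477 + 18) * 1000
NEW_SIZE_MB = 750


def ratio(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


runtime_speedup = ratio(OLD_RUNTIME_MIN, NEW_RUNTIME_MIN)
size_reduction = ratio(OLD_SIZE_MB, NEW_SIZE_MB)

print(f"runtime: {OLD_RUNTIME_MIN} min -> {NEW_RUNTIME_MIN} min ({runtime_speedup:.0f}x faster)")
print(f"output:  {OLD_SIZE_MB} MB -> {NEW_SIZE_MB} MB ({size_reduction:.0f}x smaller)")
```

If the old steps overlapped in practice, the runtime ratio would be lower than this estimate.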
Edited by vlorentz