Add PopularContentPaths
This is similar to PopularContentNames, but it can recurse more than one step in the directory hierarchy.
I originally tried to extend PopularContentNames, but merely
generalizing the code caused the existing case (maxDepth=1) to take over
3 days instead of ~20 hours, so it makes sense to keep the existing
implementation of PopularContentNames.
The significant change compared to PopularContentNames (besides the
extra maxDepth parameter) is that SWHIDs are taken from stdin instead
of computing popular paths for every content in the graph, because the
latter requires unreasonable resources (ETA: over 20 days, while maxing
out 96 CPUs).
Additionally, adding the maxDepth parameter in combination with the other
ones (max_results_per_cnt, popularity_threshold) started making the code
a little unwieldy, and they are probably not useful here, so they are
excluded.
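
For illustration only, here is a minimal Python sketch of what a
depth-bounded popular-path lookup could look like. It is not the code
added by this commit: the toy PARENTS table, the abbreviated node
identifiers (not real SWHIDs), and the function names path_counts and
most_popular_path are all hypothetical, and it counts one occurrence per
walk through the reverse graph, which may differ from the real counting
rule.

```python
from collections import Counter

# Hypothetical toy reverse graph: node -> list of (parent directory, entry name).
# The identifiers are abbreviated placeholders, not real SWHIDs.
PARENTS = {
    "cnt:1": [("dir:a", "README.md"), ("dir:b", "README.md"), ("dir:c", "readme.txt")],
    "dir:a": [("dir:root1", "docs")],
    "dir:b": [("dir:root2", "docs")],
    "dir:c": [("dir:root2", "old")],
}


def path_counts(node, max_depth, parents=PARENTS):
    """Count paths of at most max_depth components that end at node."""
    counts = Counter()
    # Each stack item: (current ancestor, path components from it down to node).
    stack = [(node, ())]
    while stack:
        current, suffix = stack.pop()
        entries = parents.get(current, [])
        # Stop when the depth limit is reached or there is no parent left.
        if len(suffix) == max_depth or not entries:
            if suffix:
                counts["/".join(suffix)] += 1
            continue
        for parent, name in entries:
            stack.append((parent, (name,) + suffix))
    return counts


def most_popular_path(node, max_depth, parents=PARENTS):
    """Return (path, count) for the most frequent path, or None if there is none."""
    counts = path_counts(node, max_depth, parents)
    return counts.most_common(1)[0] if counts else None
```

With max_depth=1 this degenerates to picking the most frequent file
name; larger values prepend parent directory names. A stdin-driven
caller would loop over the input lines and call most_popular_path on
each stripped identifier, rather than iterating over every content in
the graph.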