  • @vlorentz According to tqdm, it's downloading 10k files in ~2.73 seconds. I have 30 files of nearly the same size (I'm calling them chunks), each containing ~40M entries. That's 1.2 billion files to process. So it will take about 91 hours (((40_000_000 * 30) / 10_000) * 2.73 / 60 / 60) just to download all the files in batches of 100.

    Do you have any suggestions on how to improve this?

    Edited by Kumar Shivendu
  • I don't know how it compares, but the way I do it is just start a dozen threads which download in parallel.

    91 hours isn't that bad considering you are working at the scale of the whole archive.
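A minimal sketch of the "dozen threads" approach described above, assuming Python and the standard library's `concurrent.futures`. The names `download_all` and `fetch` are hypothetical and not taken from any script in this thread: `fetch` stands for whatever function downloads a single file given its identifier.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(ids, fetch, max_workers=12):
    """Call fetch(id) for every id on a pool of worker threads.

    `fetch` is a caller-supplied function (hypothetical here) that
    downloads one file and returns its content.
    Returns (results, errors): two dicts keyed by id.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the id it is downloading.
        futures = {pool.submit(fetch, i): i for i in ids}
        for future in as_completed(futures):
            i = futures[future]
            try:
                results[i] = future.result()
            except Exception as exc:
                # Record the failure so the id can be retried later
                # instead of aborting the whole run.
                errors[i] = exc
    return results, errors
```

This submits every id up front, so at this scale it would probably need to be called once per batch of ids rather than on a whole chunk at once. The default of 12 workers only mirrors "a dozen" and is not a tuned value.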

  • How many threads exactly? I'm using 20 at the moment.

  • Ohh, wait. I made a calculation mistake. It's downloading only 100 files in 2.73s. So that's 9100 hours (379 days). Pretty bad.

    Do you have an example script that I can refer to in order to optimize/rewrite my code?
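The two estimates in this thread can be checked with a few lines of Python. This is only a sketch of the arithmetic quoted above. The helper name `estimate_hours` is made up for illustration, and it assumes throughput stays constant at 2.73 seconds per batch.

```python
def estimate_hours(total_files, files_per_batch, seconds_per_batch):
    """Total download time in hours at a constant per-batch speed."""
    batches = total_files / files_per_batch
    return batches * seconds_per_batch / 3600


total_files = 40_000_000 * 30  # 30 chunks of ~40M entries each

# First estimate: 10k files per 2.73 s (the mistaken figure).
first = estimate_hours(total_files, 10_000, 2.73)

# Corrected estimate: only 100 files per 2.73 s.
corrected = estimate_hours(total_files, 100, 2.73)

print(f"first estimate:     {first:.0f} hours")
print(f"corrected estimate: {corrected:.0f} hours "
      f"(~{corrected / 24:.0f} days)")
```

The corrected figure is exactly 100 times the first one, since the only thing that changed is the batch size (100 instead of 10,000).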

  • Awesome. Thanks a lot!
