Two deposits of files with the same name and a "compound" extension fail the checker
If I upload the same file name to the deposit twice, Django adds a random string before the extension in the file name, so we have things like:
swh@4f4897bb636a:/$ ls /tmp/swh-deposit/uploads/client_1/
swh-deposit.tar.gz swh-deposit.tar_9bEbkyF.gz swh-deposit.tar_LHM1Y7e.gz swh-deposit.tar_ZdRLQAZ.gz swh-deposit.tar_axXOAqS.gz
While this is fine for "single" extensions (such as .tgz), it is not for .tar.gz and the like.
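To illustrate where the suffix lands, here is a small sketch that mimics the renaming with plain `os.path.splitext`. The helper `django_style_alternative` and the fixed token are hypothetical, and I am assuming Django splits the name the same way; this is not Django's actual code.

```python
import os.path


def django_style_alternative(name, token):
    """Hypothetical helper: insert `token` before the last extension only."""
    # splitext only splits off the final extension, so ".tar.gz" is cut
    # into root "....tar" and ext ".gz".
    root, ext = os.path.splitext(name)
    return f"{root}_{token}{ext}"


for name in ("swh-deposit.tgz", "swh-deposit.tar.gz"):
    print(name, "->", django_style_alternative(name, "9bEbkyF"))
```

With a single extension the result still ends in .tgz, but for the compound one the token ends up between .tar and .gz, as in the listing above.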
I believe the right fix would be to change Django's storage class to deduplicate file names in a different way (add the random string as a prefix, or create a directory): https://docs.djangoproject.com/en/3.0/topics/files/#the-built-in-filesystem-storage-class
In Django 3, we could simply subclass FileSystemStorage and override get_alternative_name; but the deposit currently uses Django 2, so we would need to override this entire function: https://github.com/django/django/blob/98ef3829e96ebc73d4d446f92465e671ff520d2b/django/core/files/storage.py#L63-L92
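A minimal sketch of what the Django 3 variant could look like, with the random string moved to the front of the name. `PrefixDedupMixin` and `DepositStorage` are hypothetical names, and I am assuming `get_alternative_name(file_root, file_ext)` receives the base name without its directory, so treat this as untested against the deposit itself. The mixin is kept free of Django imports so it can be tried on its own.

```python
import secrets
import string

_ALPHABET = string.ascii_letters + string.digits


class PrefixDedupMixin:
    """Hypothetical mixin: deduplicate by prefixing, not by infixing."""

    def get_alternative_name(self, file_root, file_ext):
        # 7 random characters, the same length as the suffixes seen above.
        token = "".join(secrets.choice(_ALPHABET) for _ in range(7))
        # Prefix the token so the full extension (.tar.gz) stays intact.
        return f"{token}_{file_root}{file_ext}"


# Assumed usage with Django >= 3.0 installed (not executed here):
#   from django.core.files.storage import FileSystemStorage
#
#   class DepositStorage(PrefixDedupMixin, FileSystemStorage):
#       pass

print(PrefixDedupMixin().get_alternative_name("swh-deposit.tar", ".gz"))
```

The same naming rule could be reused inside a full override of the name-picking function on Django 2; only the place where it is hooked in would differ.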
Migrated from T2393 (view on Phabricator)