SVN loader: Create local dump of remote repository to speed up loading task
As the "save code now" feature will soon be deployed to production, it would be nice to offer the possibility to save subversion origins, as there are numerous public repositories available out there.
Currently, there are two ways of loading a subversion repository into the archive:
- through a dump file of the repository (generated with svnadmin)
- directly through the repository url, using a svn client to request the needed data from the associated server
While the latter is what we want to use to save arbitrary public svn repositories, there are a couple of issues with it:
- it floods the subversion server with a lot of requests
- it is quite slow (due to the client/server ping pong)
I think loading svn origins should behave like loading git and mercurial origins: first fetch the whole repository locally, then load it into the archive from that local copy. Fortunately, subversion has a utility command called svnrdump [1] that can generate a dump stream of a remote repository's revisions. By saving that dump stream to a file, we can then use it to load the repository into the archive.
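To make the idea concrete, here is a minimal sketch of the dump step, and of one possible way to consume the dump afterwards. It is not the loader's actual code: the function names, the optional `max_revision` parameter and the choice to mount the dump with `svnadmin create` / `svnadmin load` are assumptions made for illustration. Only the `svnrdump dump` and `svnadmin` commands themselves come from subversion.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_svnrdump_command(
    repo_url: str, max_revision: Optional[int] = None
) -> List[str]:
    """Build the argv for `svnrdump dump` (hypothetical helper)."""
    cmd = ["svnrdump", "dump"]
    if max_revision is not None:
        # Restrict the dump to revisions 0..max_revision.
        cmd += ["-r", f"0:{max_revision}"]
    cmd.append(repo_url)
    return cmd


def dump_remote_repository(repo_url: str, dump_path: Path) -> Path:
    """Save the dump stream of a remote repository to a local file."""
    with open(dump_path, "wb") as dump_file:
        # svnrdump writes the dump stream to stdout; check=True raises
        # CalledProcessError if the command exits with a non-zero status.
        subprocess.run(
            build_svnrdump_command(repo_url), stdout=dump_file, check=True
        )
    return dump_path


def mount_dump_locally(dump_path: Path, local_repo: Path) -> None:
    """Assumed follow-up step: recreate the repository from the dump file."""
    subprocess.run(["svnadmin", "create", str(local_repo)], check=True)
    with open(dump_path, "rb") as dump_file:
        # svnadmin load reads the dump stream from stdin.
        subprocess.run(
            ["svnadmin", "load", "-q", str(local_repo)],
            stdin=dump_file,
            check=True,
        )
```

With this, the server is contacted once to produce the dump, and every later revision lookup is done against local data instead of going back to the remote server.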
My first experiments show a significant speedup in svn loading tasks when using a local repository dump. For instance, loading this svn repository with 434 revisions hosted on sourceforge: http://svn.code.sf.net/p/e-foto/code/ took 2986s using the client/server ping pong approach and 594s using the dump first then load approach, so roughly a 5x speedup.
Related T336
Migrated from T1161 (view on Phabricator)