Draft: Add a learning/replaying proxy for loaders and listers
In order to make our test suite more reproducible, we configure a proxy for the loaders and listers. It can work in two modes: learning and replaying. In learning mode, HTTP(S) requests will be forwarded to their corresponding servers while recording the exchange. In replaying mode, the record will be used to replay the same exchanges (while denying any other requests), without reaching out to the wider Internet.
The idea is that the test suite should be run on Jenkins only in replay mode, so it won’t be affected by network or service issues.
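As a rough illustration of the two modes described above, here is a minimal, self-contained Python sketch. It is not how mitmproxy works internally: the `RecordReplayProxy` class, the `(method, URL)` key and the `fake_upstream` function are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (HTTP method, URL); a simplification of a real request
Upstream = Callable[[str, str], bytes]


class RecordReplayProxy:
    """Toy model of a proxy that either learns or replays exchanges."""

    def __init__(self, mode: str, upstream: Optional[Upstream] = None) -> None:
        if mode not in ("learn", "replay"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.upstream = upstream
        self.record: Dict[Key, bytes] = {}

    def handle(self, method: str, url: str) -> bytes:
        key = (method, url)
        if self.mode == "learn":
            # Forward to the real server and remember the exchange.
            if self.upstream is None:
                raise RuntimeError("learn mode needs an upstream")
            body = self.upstream(method, url)
            self.record[key] = body
            return body
        # Replay mode: never contact the network, deny unknown requests.
        if key not in self.record:
            raise PermissionError(f"request not in record: {method} {url}")
        return self.record[key]


def fake_upstream(method: str, url: str) -> bytes:
    # Stand-in for a real server, so that the example needs no network.
    return f"{method} {url}".encode()


learner = RecordReplayProxy("learn", upstream=fake_upstream)
learner.handle("GET", "https://example.org/repo.git/info/refs")

replayer = RecordReplayProxy("replay")
replayer.record = dict(learner.record)
```

In this sketch, `replayer.handle("GET", "https://example.org/repo.git/info/refs")` returns the recorded body, while any request absent from the record raises `PermissionError`.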
The implementation uses mitmproxy. It is run in a new proxy Docker
service. mitmproxy is able to intercept HTTPS requests and generate
new X.509 certificates on the fly. The certificate of the authority
signing these generated certificates is installed as trusted in our
swh/stack image.
Network records are written to assets/mitmproxy/swh-tests. A
placeholder file is added so that the otherwise empty directory exists.
Usage of the proxy is fully contained in docker-compose.proxy.yml, which
is meant to be used as an override. Changing the mode of the proxy is
done by changing the command stanza from learn to replay and
vice-versa.
Variables in env/proxy.env are set in the swh-loader and
swh-lister containers. Internal hosts should be added to the
no_proxy variable.
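To illustrate the effect of no_proxy, the sketch below uses Python's standard library to check which hosts would bypass the proxy. The proxy address and the internal host names are assumptions made up for the example, not the actual content of env/proxy.env, and other HTTP clients (libcurl, for instance) may interpret no_proxy slightly differently.

```python
from urllib.request import proxy_bypass_environment

# Hypothetical settings, in the shape urllib derives from the
# http_proxy, https_proxy and no_proxy environment variables.
proxy_env = {
    "http": "http://proxy:8080",
    "https": "http://proxy:8080",
    "no": "swh-storage,swh-scheduler,localhost",
}


def goes_through_proxy(host: str) -> bool:
    """Return True if a request to `host` would be sent via the proxy."""
    return not proxy_bypass_environment(host, proxies=proxy_env)


for host in ("github.com", "swh-storage", "swh-storage:5002"):
    print(host, "->", "proxy" if goes_through_proxy(host) else "direct")
```

With these assumed values, only the external host is routed through the proxy; requests to the listed internal hosts are sent directly, so they never need to appear in the record.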
We also have to set PYTHONHASHSEED to a fixed value in order to (at
least) have swh-loader-git send the same network requests every time.
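The following sketch shows why a fixed PYTHONHASHSEED matters in general: the iteration order of a set of strings depends on the hash seed. Whether this is the exact source of variation in swh-loader-git is an assumption, and the ref names are made up for the example.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a set of strings.
CHILD = (
    "print(list({'refs/heads/main', 'refs/heads/dev', "
    "'refs/tags/v1.0', 'refs/tags/v2.0'}))"
)


def set_order(seed: str) -> str:
    """Run the child program in a fresh interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# With a fixed seed, two separate processes see the same order.
fixed_a = set_order("0")
fixed_b = set_order("0")
print("fixed seed, same order:", fixed_a == fixed_b)

# With a random seed, the order may differ from one process to the next.
random_orders = {set_order("random") for _ in range(5)}
print("distinct orders with random seeds:", len(random_orders))
```

If such an ordering ends up determining the content or the sequence of requests, a replaying proxy would see requests that differ from the recorded ones, which is why the seed is pinned.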