storage(s): Identify and provide the collision hash(es) in HashCollision exception
Storage clients may need to have details about what's colliding during adding contents. To determine how to overcome the issue for example [1]
Furthermore, this unifies the inconsistent behavior in the 3 storage backends [2]
Now all storage backends provides the HashCollision exception with the following parameter signatures: (algo: str, hash_id: bytes, content_hashes: Dict[str, bytes])
I'm not too happy about the implementation though.
I did not find a better way than to parse a specific exception message from
psycopg2 (e.diag.message_detail
)... ouch.
-
[1] https://forge.softwareheritage.org/swh/devel/swh-journal!114
-
[2] Sample output for the collided content prior to this diff (pdb breakpoint during integration test)
in-memory:
(('blake2s256', b"\xd5\xfe\x199We'\xe4,\xfdv\xa9EZ$2\xfe\x7fVf\x95dW}\xd9<B\x80\xe7mf\x1d"), ('sha1', b'4\x972t\xcc\xefj\xb4\xdf\xaa\xf8e\x99y/\xa9\xc3\xfeF\x89'), ('sha1_git', b'\xd8\x1c\xc0q\x0e\xb6\xcf\x9e\xfd[\x92\n\x84S\xe1\xe0qW\xb6\xcd'), ('sha256', b'h6P\xf96\xcb;\n/\x93\xce\t\xd8\x1b\xe1\x07H\xb1\xb2\x03\xc1\x9e\x81v\xb4\xee\xfc\x19d\xa0\xcf:'))
cassandra:
[Row(sha1_git=b'\xd8\x1c\xc0q\x0e\xb6\xcf\x9e\xfd[\x92\n\x84S\xe1\xe0qW\xb6\xcd', sha1=b'4\x972t\xcc\xefj\xb4\xdf\xaa\xf8e\x99y/\xa9\xc3\xfeF\x89', sha256=b'g6P\xf96\xcb;\n/\x93\xce\t\xd8\x1b\xe1\x07H\xb1\xb2\x03\xc1\x9e\x81v\xb4\xee\xfc\x19d\xa0\xcf:', blake2s256=b"\xd5\xfe\x199We'\xe4,\xfdv\xa9EZ$2\xfe\x7fVf\x95dW}\xd9<B\x80\xe7mf\x1d"), Row(sha1_git=b'\xd8\x1c\xc0q\x0e\xb6\xcf\x9e\xfd[\x92\n\x84S\xe1\xe0qW\xb6\xcd', sha1=b'4\x972t\xcc\xefj\xb4\xdf\xaa\xf8e\x99y/\xa9\xc3\xfeF\x89', sha256=b'h6P\xf96\xcb;\n/\x93\xce\t\xd8\x1b\xe1\x07H\xb1\xb2\x03\xc1\x9e\x81v\xb4\xee\xfc\x19d\xa0\xcf:', blake2s256=b"\xd5\xfe\x199We'\xe4,\xfdv\xa9EZ$2\xfe\x7fVf\x95dW}\xd9<B\x80\xe7mf\x1d")]
pgstorage:
Nothing
Test Plan
tox
Migrated from D2783 (view on Phabricator)