npm.client: Handle parsing of non-UTF-8 encoded package.json files
Some package.json files may be encoded in something other than ASCII/UTF-8,
so detect the file encoding using chardet before parsing it.
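A minimal sketch of the idea, assuming the file is read as raw bytes instead of being handed to json.load as a text-mode file object. The helper name parse_package_json is hypothetical, and the ordering (try UTF-8 via the 'utf-8-sig' codec first, consult chardet.detect only when that fails, fall back to latin-1 as a last resort) is an assumption for illustration, not necessarily what this patch does.

```python
import json

try:
    import chardet  # third-party encoding detector named in this change
except ImportError:  # keeps the sketch runnable when chardet is missing
    chardet = None


def parse_package_json(raw):
    """Decode and parse package.json bytes of unknown encoding.

    Hypothetical helper for illustration only.
    """
    # Try UTF-8 first. The 'utf-8-sig' codec also strips a leading UTF-8 BOM,
    # which json.loads would otherwise reject.
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: ask chardet for its best guess, if available.
        encoding = chardet.detect(raw)["encoding"] if chardet is not None else None
        try:
            text = raw.decode(encoding or "latin-1")
        except (UnicodeDecodeError, LookupError):
            # latin-1 maps every byte to a code point, so it never fails.
            text = raw.decode("latin-1")
    return json.loads(text)
```

A caller would open the file in binary mode ("rb") and pass f.read() to the helper. Trying UTF-8 first keeps well-formed files from depending on chardet's statistical guess, which can be unreliable on short inputs.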
Previously, the following errors were raised:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 893, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 145, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 197, in _prepare_package_version
    package_json = json.load(package_json_file)
  File "/usr/lib/python3.5/json/__init__.py", line 265, in load
    return loads(fp.read(),
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 183: invalid continuation byte

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 893, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 145, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 197, in _prepare_package_version
    package_json = json.load(package_json_file)
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 315, in loads
    s, 0)
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
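To make the two failures above concrete, here is a small standalone reproduction using only the standard library. The sample bytes (a latin-1 encoded "Ítalo" and a BOM-prefixed object) are made up for illustration; they are not the packages that actually triggered these errors.

```python
import json

# 1) Byte 0xcd ("Í" in latin-1) followed by an ASCII byte is not valid UTF-8.
latin1_bytes = '{"author": "\u00cdtalo"}'.encode("latin-1")
utf8_error = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc.reason  # same reason as in the first traceback

# 2) Decoding a BOM-prefixed file as plain 'utf-8' keeps U+FEFF at the start,
#    and json.loads rejects it, as in the second traceback.
bom_bytes = b"\xef\xbb\xbf" + b'{"name": "pkg"}'
bom_error = None
try:
    json.loads(bom_bytes.decode("utf-8"))
except json.JSONDecodeError as exc:
    bom_error = exc.msg

# Decoding with 'utf-8-sig' strips the BOM, and parsing then succeeds.
parsed = json.loads(bom_bytes.decode("utf-8-sig"))
```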
Related T1644
Test Plan
A test for a non-UTF-8 encoded package.json file has been added.
Some refactoring was also performed to make adding new test data easier.
Migrated from D1410