npm.client: Ensure package.json parsing
Ensure the package.json file can still be parsed when its content cannot
be decoded because its encoding was not properly detected.
Decoding is now attempted as utf-8 first, with chardet used as a fallback
together with the 'replace' error handler, which substitutes characters
that cannot be decoded.
Even if the package.json content cannot be loaded correctly, this is not
critical, as the data is only added to the metadata of a swh revision.
The original package.json file can still be obtained from the archive
content.
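The decoding strategy described above can be sketched roughly as follows. This is a minimal illustration and not the actual loader code: the helper name `load_package_json` is hypothetical, the use of `utf-8-sig` (which also drops a leading BOM) is an assumption, and chardet is treated as optional so the sketch runs without it.

```python
import json

try:
    import chardet  # third-party; optional in this sketch
except ImportError:
    chardet = None


def load_package_json(raw: bytes) -> dict:
    """Hypothetical helper: decode and parse package.json bytes leniently."""
    try:
        # Assumption: "utf-8-sig" decodes UTF-8 and drops a leading BOM.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Fallback: guess the encoding; undecodable bytes become U+FFFD.
        guess = chardet.detect(raw)["encoding"] if chardet else None
        try:
            text = raw.decode(guess or "utf-8", errors="replace")
        except LookupError:
            # The guessed codec is unknown to Python.
            text = raw.decode("utf-8", errors="replace")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not critical: the metadata is simply left empty.
        return {}
    return data if isinstance(data, dict) else {}
```

Returning an empty dict on failure reflects the point that the parsed data only feeds revision metadata, so a parsing failure need not abort the load.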
This should fix the following kinds of reported errors:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 149, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 207, in _prepare_package_version
    package_json = json.loads(package_json_bytes.decode(file_encoding))
  File "/usr/lib/python3.5/encodings/cp1254.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 42: character maps to <undefined>
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 145, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 197, in _prepare_package_version
    package_json = json.load(package_json_file)
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 315, in loads
    s, 0)
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 895, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/npm/loader.py", line 203, in fetch_data
    data = next(self.new_versions)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 149, in prepare_package_versions
    version_data)
  File "/usr/lib/python3/dist-packages/swh/loader/npm/client.py", line 204, in _prepare_package_version
    with open(package_json_path, 'rb') as package_json_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/swh.loader.npm/swh.loader.npm.jrx67u3_-2344/@lpmraven/link-components/0.1.1/package/package.json'
Related to swh-loader-core#1726
Migrated from D1498 (view on Phabricator)