scanner: json output should return both known and unknown files/dirs
The JSON output of the scanner currently looks something like this:
$ swh scanner scan -f json .
{
".HEADER": "swh:1:cnt:fd8430bc864cfcd5f10e5590f8a447e01b942bfe",
".editorconfig": "swh:1:cnt:34c5e9234ec18c69a16828dbc9633a95f0253fe9",
".gitattributes": "swh:1:cnt:176a458f94e0ea5272ce67c36bf30b6be9caf623",
".github": "swh:1:dir:e8bfe5af39579a7e4898bb23f3a76a72c368cee6",
".gitignore": "swh:1:cnt:dec3dca06c8fdc1dd7d426bb148b7f99355eaaed",
...
"src": "swh:1:dir:f3c5e67df5a3b3e812e6331008b7e179865a30fc",
"tests": "swh:1:dir:506e33bae73858bdf4b90a8f89dee8a32dae9c93"
}
The semantics appear to be to return only the known files/dirs and to omit the unknown ones. That is hard to exploit programmatically: from the JSON output alone, one cannot tell what is missing.
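To illustrate the problem, here is a minimal sketch of the workaround a consumer needs today. It assumes that the keys of the JSON object are paths relative to the scanned root and that an absent key means "unknown", which is only inferred from the output above. The helper name `unreported_entries` and the entry `notes.txt` are made up for the example.

```python
import json


def unreported_entries(scan_json, entries):
    """Return the entries that do not appear as keys in the scanner's JSON output."""
    reported = json.loads(scan_json)
    return sorted(name for name in entries if name not in reported)


# One real entry taken from the output above, in the current flat format.
scan_json = '{"src": "swh:1:dir:f3c5e67df5a3b3e812e6331008b7e179865a30fc"}'

# "notes.txt" is a hypothetical local file that the scanner did not report.
missing = unreported_entries(scan_json, ["src", "notes.txt"])
print(missing)
```

In practice the list of entries would have to come from a separate walk of the directory (e.g. `os.listdir`), which is exactly the extra step the JSON output should make unnecessary.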
The output format should be changed to always output all encountered files/dirs, each with an associated known: boolean flag.
Also remember that other fields will need to be associated with each encountered file/dir in the future, so the format needs room (e.g., other keys at the same level as known) to attach more information later.
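One possible shape for such an output is sketched below. This is only a proposal sketch, not the scanner's actual format: the helper `build_report`, the `swhid` key name, the entry `untracked.txt`, and its all-zero placeholder identifier are assumptions made for illustration. Only `known` comes from the request above.

```python
import json


def build_report(swhids, known_paths):
    """Map every encountered path to an object, leaving room for future keys."""
    return {
        path: {"swhid": swhid, "known": path in known_paths}
        for path, swhid in sorted(swhids.items())
    }


# Every encountered path with its identifier. "src" is taken from the output
# above; "untracked.txt" and its all-zero identifier are placeholders.
swhids = {
    "src": "swh:1:dir:f3c5e67df5a3b3e812e6331008b7e179865a30fc",
    "untracked.txt": "swh:1:cnt:" + "0" * 40,
}

# Hypothetical set of paths that the archive reported as known.
report = build_report(swhids, known_paths={"src"})
print(json.dumps(report, indent=2))
```

Because each path maps to an object rather than to a bare string, further fields can later be added next to known without breaking consumers that only read the keys they need.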
Migrated from T2363 (view on Phabricator)