[GitHubSession] Rate limit management for authenticated users seems broken
I was experimenting with the GitHub API to extract metadata for some repositories, and I used the GitHubSession class from swh.core.github.utils to benefit from the rate limit management implemented in it.
I used a personal access token to perform authenticated calls and get a higher rate limit.
As I had to query metadata for more than 10,000 repositories, I quickly reached the limit of 5,000 API calls allowed per hour.
However, I was surprised to find that the GitHubSession class did not detect that the rate limit had been reached: no info message was printed about it, and subsequent API requests were executed instead of sleeping until the current rate limit window was reset.
While the current rate limit window had not yet been reset, I issued this curl command to get more details about the GitHub API response:
$ curl -i -H "Authorization: token $GH_TOKEN" https://api.github.com/repos/ankitnigam1985/data-science-books
HTTP/2 403
server: GitHub.com
date: Tue, 18 Apr 2023 13:06:02 GMT
content-type: application/json; charset=utf-8
content-length: 168
x-ratelimit-limit: 5000
x-ratelimit-remaining: 0
x-ratelimit-reset: 1681823340
x-ratelimit-used: 5000
x-ratelimit-resource: core
x-oauth-scopes: repo
x-accepted-oauth-scopes: repo
github-authentication-token-expiration: 2023-05-18 12:06:22 UTC
x-github-media-type: github.v3; format=json
access-control-expose-headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset
access-control-allow-origin: *
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 0
referrer-policy: origin-when-cross-origin, strict-origin-when-cross-origin
content-security-policy: default-src 'none'
vary: Accept-Encoding, Accept, X-Requested-With
x-github-request-id: B8A2:EEDC:E8F103:EA8CEC:643E95BA
{
"message": "API rate limit exceeded for user ID 5493543.",
"documentation_url": "https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting"
}
So for authenticated users, the GitHub API currently returns a 403 status code with an explicit message about the rate limit being exceeded (and an x-ratelimit-remaining header set to 0).
However, if we look at the GitHubSession class code, we can see that it expects a 429 status code when an authenticated user is rate limited.
So it seems the GitHub API changed the status code it returns when an authenticated user is rate limited; this also seems consistent with the official API documentation.
Maybe that behavior only applies when using a recently generated API token, but we should handle it anyway: some important swh components (the GitHub lister or the metadata loader, for instance) use the GitHubSession class to perform authenticated calls to the GitHub API, and we could be hit by this issue without knowing it.
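A minimal sketch of how the detection could treat both status codes, assuming it only needs the status code and headers of a response (for example response.status_code and response.headers from a requests response, whose headers object supports .items()). The helper names is_rate_limited and seconds_until_reset are hypothetical and not part of swh.core; the example header values are copied from the curl output above.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping


def _lower_keys(headers: Mapping[str, str]) -> dict:
    # HTTP header names are case-insensitive; normalize them for lookups.
    return {key.lower(): value for key, value in headers.items()}


def is_rate_limited(status_code: int, headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: True if a GitHub response signals an exhausted rate limit."""
    if status_code == 429:
        return True
    # A 403 can also be a genuine permission error, so only treat it as rate
    # limiting when GitHub reports that no calls remain in the current window.
    return status_code == 403 and _lower_keys(headers).get("x-ratelimit-remaining") == "0"


def seconds_until_reset(headers: Mapping[str, str]) -> float:
    """Hypothetical helper: seconds to sleep until the rate limit window resets."""
    lowered = _lower_keys(headers)
    reset_epoch = int(lowered["x-ratelimit-reset"])
    # Use the server's Date header as "now" to avoid local clock skew.
    now = parsedate_to_datetime(lowered["date"]).timestamp()
    return max(0.0, reset_epoch - now)


if __name__ == "__main__":
    headers = {
        "date": "Tue, 18 Apr 2023 13:06:02 GMT",
        "x-ratelimit-remaining": "0",
        "x-ratelimit-reset": "1681823340",
    }
    print(is_rate_limited(403, headers))  # True
    print(seconds_until_reset(headers))  # 178.0
```

With the response above, the reset timestamp 1681823340 corresponds to 13:09:00 UTC, so a client would sleep 178 seconds before retrying. Requiring x-ratelimit-remaining to be 0 on a 403 keeps the session from sleeping on unrelated authorization failures.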