The indexer journal client is unstable
I often get this error when running in the Docker environment:
swh-indexer-journal-client_1 | 2019-02-04 20:12:29,441 26 WARNING Heartbeat session expired, marking coordinator dead
swh-indexer-journal-client_1 | 2019-02-04 20:12:29,441 26 WARNING Marking the coordinator dead (node 1001) for group swh.journal.client: Heartbeat session expired.
swh-indexer-journal-client_1 | 2019-02-04 20:12:29,449 26 INFO Group coordinator for swh.journal.client is BrokerMetadata(nodeId=1001, host='kafka', port=9092, rack=None)
swh-indexer-journal-client_1 | 2019-02-04 20:12:29,449 26 INFO Discovered coordinator 1001 for group swh.journal.client
swh-indexer-journal-client_1 | 2019-02-04 20:12:29,453 26 INFO Group coordinator for swh.journal.client is BrokerMetadata(nodeId=1001, host='kafka', port=9092, rack=None)
swh-indexer-journal-client_1 | 2019-02-04 20:12:29,453 26 INFO Discovered coordinator 1001 for group swh.journal.client
swh-indexer-journal-client_1 | 2019-02-04 20:12:29,461 26 INFO Scheduling indexer_origin_metadata for visit of origin 766
swh-indexer-journal-client_1 | Traceback (most recent call last):
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
swh-indexer-journal-client_1 | "__main__", mod_spec)
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
swh-indexer-journal-client_1 | exec(code, run_globals)
swh-indexer-journal-client_1 | File "/src/swh-indexer/swh/indexer/journal_client.py", line 88, in <module>
swh-indexer-journal-client_1 | main()
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
swh-indexer-journal-client_1 | return self.main(*args, **kwargs)
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/site-packages/click/core.py", line 717, in main
swh-indexer-journal-client_1 | rv = self.invoke(ctx)
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
swh-indexer-journal-client_1 | return ctx.invoke(self.callback, **ctx.params)
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
swh-indexer-journal-client_1 | return callback(*args, **kwargs)
swh-indexer-journal-client_1 | File "/src/swh-indexer/swh/indexer/journal_client.py", line 86, in main
swh-indexer-journal-client_1 | IndexerJournalClient().process()
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/site-packages/swh/journal/client.py", line 121, in process
swh-indexer-journal-client_1 | self.consumer.commit()
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/site-packages/kafka/consumer/group.py", line 515, in commit
swh-indexer-journal-client_1 | self._coordinator.commit_offsets_sync(offsets)
swh-indexer-journal-client_1 | File "/usr/local/lib/python3.6/site-packages/kafka/coordinator/consumer.py", line 513, in commit_offsets_sync
swh-indexer-journal-client_1 | raise future.exception # pylint: disable-msg=raising-bad-type
swh-indexer-journal-client_1 | kafka.errors.CommitFailedError: CommitFailedError: Commit cannot be completed since the group has already
swh-indexer-journal-client_1 | rebalanced and assigned the partitions to another member.
swh-indexer-journal-client_1 | This means that the time between subsequent calls to poll()
swh-indexer-journal-client_1 | was longer than the configured max_poll_interval_ms, which
swh-indexer-journal-client_1 | typically implies that the poll loop is spending too much
swh-indexer-journal-client_1 | time message processing. You can address this either by
swh-indexer-journal-client_1 | increasing the rebalance timeout with max_poll_interval_ms,
swh-indexer-journal-client_1 | or by reducing the maximum size of batches returned in poll()
swh-indexer-journal-client_1 | with max_poll_records.
swh-indexer-journal-client_1 |
swh-docker-dev_swh-indexer-journal-client_1 exited with code 1
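The error message itself suggests two knobs: raise `max_poll_interval_ms` or lower `max_poll_records`. Below is a minimal sketch of how those two kafka-python consumer settings might be sized together, assuming a rough estimate of the per-message processing time is available. The helper name `build_consumer_config`, the `safety_factor` parameter, and the example numbers are hypothetical; the broker address and group id are taken from the log above. How `swh.journal.client` actually builds its consumer is not shown here, so this is an illustration rather than a patch.

```python
# Hypothetical helper: sizes the poll interval so that one full batch
# can be processed before the group coordinator considers the consumer dead.

DEFAULT_MAX_POLL_INTERVAL_MS = 300000  # kafka-python default (5 minutes)


def build_consumer_config(batch_size, seconds_per_message, safety_factor=2.0):
    """Return keyword arguments for kafka.KafkaConsumer.

    batch_size: maximum number of records returned by one poll()
    seconds_per_message: estimated worst-case processing time per record
    safety_factor: assumed headroom multiplier (illustrative value)
    """
    needed_ms = int(batch_size * seconds_per_message * safety_factor * 1000)
    return {
        "bootstrap_servers": "kafka:9092",   # from the log above
        "group_id": "swh.journal.client",    # from the log above
        "max_poll_records": batch_size,
        # Never go below the library default; only raise it when needed.
        "max_poll_interval_ms": max(needed_ms, DEFAULT_MAX_POLL_INTERVAL_MS),
    }


if __name__ == "__main__":
    # Example numbers are assumptions, not measurements.
    print(build_consumer_config(batch_size=50, seconds_per_message=10.0))
```

The resulting dict could be passed as `KafkaConsumer(**config)` with kafka-python. Committing offsets more often, or handing the slow work off before the next `poll()`, would be alternative ways to stay within the interval.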
Migrated from T1513 (view on Phabricator)