Add support for object deletion to KafkaJournalWriter
“Deleting” an event in Kafka is a two-step process. First, a new event is
produced for the key to be deleted, with null as its value. Such events are
known as tombstones. Second, when topics are configured to use compaction,
older events for that key are actually deleted once specific thresholds have
been reached.
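The two steps above can be illustrated with a toy in-memory model of log
compaction (illustrative only; this is neither Kafka nor swh-journal code):

```python
# Toy model of Kafka log compaction semantics. The log is a list of
# (key, value) events; a tombstone is an event whose value is None.

def compact(log):
    """Return the compacted log: only the latest event per key survives,
    and keys whose latest event is a tombstone are dropped entirely."""
    latest = {}
    for key, value in log:
        latest[key] = value
    # Once tombstone retention expires, deleted keys vanish completely.
    return [(k, v) for k, v in latest.items() if v is not None]

log = [("a", "v1"), ("b", "v1"), ("a", "v2"), ("b", None)]
print(compact(log))  # key "b" was tombstoned, so only ("a", "v2") remains
```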
Tombstones themselves usually also linger for a while in a topic, giving
consumers a chance to learn that a given key has been deleted. This period is
configured by delete.retention.ms. For Software Heritage, we should still not
rely on consumers of the journal actually seeing these tombstones to handle
object deletions: if they lag too much, the tombstone will eventually be
removed (together with the actual data) from the journal. This shall be
handled by #4658 instead.
Normally, compaction is triggered when the ratio of dirty data to total data
reaches the threshold set by the min.cleanable.dirty.ratio configuration.
min.compaction.lag.ms can be set to prevent overly aggressive cleaning: it
guarantees a minimum period of time for applications to see an event prior to
its deletion. max.compaction.lag.ms sets the time limit before a compaction is
triggered, regardless of the amount of dirty data.
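As a concrete illustration, these settings can be applied per topic with the
kafka-configs.sh tool shipped with Kafka (topic name and values here are
examples, not swh-journal's actual configuration):

```shell
# Enable compaction on an example topic and tune the thresholds discussed
# above; all values are illustrative.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name swh.test.objects --alter \
  --add-config 'cleanup.policy=compact,min.cleanable.dirty.ratio=0.5,min.compaction.lag.ms=86400000,max.compaction.lag.ms=604800000,delete.retention.ms=86400000'
```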
For more information see: https://developer.confluent.io/courses/architecture/compaction/
The delete method is only implemented for KafkaJournalWriter because its
semantics are so closely aligned with Kafka’s.
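A hypothetical sketch of what such a delete method could look like (class,
method, and topic names are assumptions for illustration, not the actual
swh-journal implementation): deleting an object amounts to producing a
tombstone for its key.

```python
# Hypothetical sketch: a journal writer whose delete produces Kafka
# tombstones. The producer is any object with Producer-like
# produce(topic=..., key=..., value=...) and flush() methods.

class KafkaJournalWriterSketch:
    def __init__(self, producer, topic_prefix):
        self.producer = producer
        self.topic_prefix = topic_prefix

    def delete(self, object_type, object_keys):
        topic = f"{self.topic_prefix}.{object_type}"
        for key in object_keys:
            # A null value is Kafka's tombstone marker for the key.
            self.producer.produce(topic=topic, key=key, value=None)
        self.producer.flush()
```

With an in-memory fake producer, deleting two keys results in two messages
with value None on the corresponding topic.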
Based on the initial merge request !233 (closed) written by olasd.
Closes: #4657 (closed)