Error handling & retries
As events flow through Chord, they are processed (enrichment, transformation, filtering) and then delivered to one or more destinations. Either stage can fail. This document explains how Chord classifies those failures, which ones are retried, and how retries are scheduled.

Delivery model

Chord provides at-least-once delivery. When a recoverable failure occurs, the affected event is re-queued and delivered again later. Because of this, an event may occasionally be delivered more than once; see Deduplication for how destinations should handle duplicates using `messageid`.

Each event is delivered independently per connection (a source → destination pairing). A failure delivering an event to one destination does not affect delivery of the same event to any other destination.

Recoverable vs. unrecoverable errors

Chord divides failures into two categories, and only one of them is retried.

- Recoverable (transient) errors are failures that are expected to resolve on their own without any change to configuration or code: for example, a destination API returning a `5xx`, a request timing out, a rate-limit response, or a temporary network blip. These errors are retried.
- Unrecoverable errors are failures that retrying cannot fix: for example, malformed event data a destination will never accept, an authentication failure, or a logic error in a custom function. These are not retried; the event is either delivered as-is or recorded as failed, and a notification is sent to any configured connection monitors.

The table below summarizes how the most common outcomes are handled.

| Outcome | Source | Retried | Result |
| --- | --- | --- | --- |
| Transient delivery failure (`5xx`, timeout, rate limit, network error) | Destination | Yes | Re-queued with backoff; delivered once the destination recovers |
| Recoverable failure raised by a function | UDF (`retryerror`) | Yes | Re-queued with backoff |
| Unrecoverable data/validation error | Destination or UDF (`noretryerror`) | No | Event dropped, monitors notified |
| Unexpected logic error | UDF (standard `error`) | No | Event still delivered, monitors notified |
| Intentional filtering | UDF (`return null` / `false` / `[]`) | No | Event silently dropped, no notification |

Functions (UDFs) can deliberately raise either a retryable or a non-retryable error to control this behavior. That mechanism is documented in detail in Functions; this document covers only how the pipeline acts on those signals.

How retries are scheduled

When an event fails with a recoverable error, Chord does not retry it immediately. Instead, it computes a future retry time and re-queues the message on a dedicated retry topic. The message waits there until its scheduled retry time passes, at which point it is reprocessed.

Retries use exponential backoff so that a struggling destination is given progressively more time to recover between attempts.

- A fixed number of retry attempts is made per event (default 3).
- The delay before each attempt grows exponentially from a base interval (default base 10 minutes), so successive attempts are delayed roughly 10 minutes, then 1.7 hours, then 16.7 hours.
- Every delay is capped at a maximum (default 24 hours), so no single retry is ever scheduled further out than the cap.

These values are operational defaults and may be tuned per environment.

On each retry, only the stage that failed and everything downstream of it is re-run: a destination delivery failure re-attempts delivery without re-running earlier transformations, and a function failure resumes from that function onward. This avoids duplicating side effects from stages that already succeeded.

The dead-letter queue

Retries are not infinite. Once an event has exhausted its retry attempts and still cannot be delivered, it is moved to a dead-letter queue rather than being retried forever or silently discarded. This keeps the live pipeline healthy while preserving the failed event for inspection and potential manual replay.

A move to the dead-letter queue is recorded in the pipeline metrics and surfaced to any configured connection monitors, so persistent delivery failures are visible rather than hidden.

Monitoring failures

Recoverable and unrecoverable errors alike are recorded in the pipeline's metrics and, where connection monitors are configured, generate notifications. You can inspect individual failures and the errors that caused them in Live Events within the Chord console.

Intentional filtering (an event dropped by a function returning a falsy value) is recorded as dropped and does not generate a notification. It is treated as normal pipeline behavior, not a failure.

Related topics

- Functions: how custom functions raise retryable and non-retryable errors
- Deduplication: handling the duplicate deliveries that at-least-once retries can produce

Support

For questions about error handling, retries, or the dead-letter queue, please contact help@chord.co.
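
Worked example: retry delays

The backoff figures quoted under "How retries are scheduled" (roughly 10 minutes, 1.7 hours, 16.7 hours) are consistent with a delay of base^n minutes for the n-th retry, capped at 24 hours. The Python sketch below reproduces that schedule. It is an illustration only: the formula is inferred from the listed delays, and the names `retry_delay` and `retry_schedule` are hypothetical, not part of Chord's API.

```python
from datetime import timedelta

# Defaults quoted in the text. The formula below is an assumption inferred
# from the documented delays (10 min, ~1.7 h, ~16.7 h), not Chord's code.
BASE_MINUTES = 10
MAX_ATTEMPTS = 3
MAX_DELAY = timedelta(hours=24)


def retry_delay(attempt: int,
                base_minutes: int = BASE_MINUTES,
                max_delay: timedelta = MAX_DELAY) -> timedelta:
    """Return the wait before retry number `attempt` (1-based), capped at `max_delay`."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    # Compare in whole minutes first so very large exponents never
    # overflow timedelta before the cap is applied.
    cap_minutes = max_delay // timedelta(minutes=1)
    minutes = min(base_minutes ** attempt, cap_minutes)
    return timedelta(minutes=minutes)


def retry_schedule(max_attempts: int = MAX_ATTEMPTS) -> list[timedelta]:
    """Return the delays for every retry an event would receive."""
    return [retry_delay(n) for n in range(1, max_attempts + 1)]


if __name__ == "__main__":
    for n, delay in enumerate(retry_schedule(), start=1):
        print(f"retry {n}: wait {delay}")
```

With the defaults, the three delays are 10, 100, and 1,000 minutes. A hypothetical fourth retry would compute 10,000 minutes and be cut back to the 24-hour cap, which is why the cap matters once the attempt count is tuned upward.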