Sweeper Circuit Breaking

Overview

Sweeper Circuit Breaking is a resilience feature of the Outbox Sweeper. It stops publish failures on one topic from blocking attempts to publish Outbox messages to other topics. When a topic repeatedly fails to publish, the circuit breaker "trips" that topic, which temporarily prevents further publish attempts to it until a cooldown period expires.

This feature is particularly valuable in scenarios such as:

  • Transport failures: A message broker or queue becomes unavailable

  • Topic-specific issues: A specific topic/queue has configuration problems or capacity issues

  • Cascade prevention: Failing topics would otherwise block the Outbox Sweeper from processing healthy topics

  • Resource protection: Repeated failures to unhealthy topics consume resources without benefit

How It Works

The Sweeper Circuit Breaker operates at the topic level during Outbox clearing operations.

Normal Operation

  1. Outbox Sweeper runs: Periodically attempts to clear outstanding messages from the Outbox

  2. Messages grouped by topic: Messages are organized by their routing key (topic)

  3. Publish attempts: The sweeper attempts to publish messages to their respective topics

  4. Success: Messages are published and marked as dispatched

Circuit Breaking Behavior

When a topic fails to publish:

  1. Failure detected: An exception occurs during message publication to a specific topic

  2. Circuit trips: The circuit breaker marks that topic as "tripped"

  3. Cooldown begins: A cooldown counter is set for the tripped topic (default: 10 sweeps)

  4. Subsequent sweeps: On each sweep, the cooldown counter decrements for all tripped topics

  5. Recovery: When the cooldown reaches zero, the topic is removed from the tripped list

  6. Retry: The topic becomes available for publishing attempts again
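The steps above can be sketched as a small counter-based breaker. This is a minimal illustration, not Brighter's implementation: the class name SketchCircuitBreaker, its members, and the use of plain strings for topics are assumptions made for the example. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Demonstration: trip a topic, then count sweeps until it recovers.
var breaker = new SketchCircuitBreaker(cooldownCount: 3);
breaker.TripTopic("orders");

var sweeps = 0;
while (breaker.IsTripped("orders"))
{
    breaker.CoolDown(); // called once per sweep
    sweeps++;
}
Console.WriteLine($"'orders' recovered after {sweeps} sweeps"); // 3 sweeps

// Hypothetical stand-in for illustration; not Brighter's own type.
public sealed class SketchCircuitBreaker
{
    private readonly int _cooldownCount;
    private readonly Dictionary<string, int> _remaining = new();

    public SketchCircuitBreaker(int cooldownCount = 10)
    {
        if (cooldownCount < 1) throw new ArgumentOutOfRangeException(nameof(cooldownCount));
        _cooldownCount = cooldownCount;
    }

    // Steps 2 and 3: mark the topic as tripped and start its cooldown counter.
    public void TripTopic(string topic) => _remaining[topic] = _cooldownCount;

    // Steps 4 and 5: decrement every counter once per sweep and
    // remove topics whose counter has reached zero.
    public void CoolDown()
    {
        foreach (var topic in _remaining.Keys.ToList())
        {
            if (--_remaining[topic] <= 0) _remaining.Remove(topic);
        }
    }

    // Step 6: a topic that is no longer tripped may be published to again.
    public bool IsTripped(string topic) => _remaining.ContainsKey(topic);

    // Snapshot of the currently tripped topics.
    public IReadOnlyCollection<string> TrippedTopics => _remaining.Keys.ToList();
}
```

The sketch is not thread-safe; a breaker shared between concurrent sweeps would need locking or concurrent collections.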

Benefits

  • Prevents blocking: Healthy topics continue to be processed even when some topics fail

  • Automatic recovery: Topics automatically recover after the cooldown period

  • Resource efficiency: Avoids wasting resources on repeated failures to unhealthy topics

  • Observability: Tripped topics can be monitored and alerted on

Configuration

Enabling Circuit Breaking

To enable Sweeper Circuit Breaking, register an IAmAnOutboxCircuitBreaker implementation with your IoC container.

Configuration Options

The OutboxCircuitBreakerOptions class provides the following configuration:

Option          Type    Default    Description
CooldownCount   int     10         Number of sweeper iterations before a tripped topic is eligible for retry

Calculating Cooldown Time

The actual cooldown time depends on your Outbox Sweeper configuration:

Formula: Cooldown Time = CooldownCount × SweepInterval

Example:

  • CooldownCount = 10

  • Sweeper runs every 60 seconds

  • Cooldown Time = 10 × 60 s = 600 s = 10 minutes
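The same arithmetic can be written as a short helper. This is only a sketch: CooldownMath and its method are hypothetical names for the example, not part of Brighter.

```csharp
using System;

// Values from the example above.
int cooldownCount = 10;
TimeSpan sweepInterval = TimeSpan.FromSeconds(60);

TimeSpan cooldownTime = CooldownMath.CooldownTime(cooldownCount, sweepInterval);
Console.WriteLine(cooldownTime.TotalMinutes); // 10

// Hypothetical helper for illustration.
public static class CooldownMath
{
    // Cooldown Time = CooldownCount × SweepInterval
    public static TimeSpan CooldownTime(int cooldownCount, TimeSpan sweepInterval)
        => TimeSpan.FromTicks(sweepInterval.Ticks * cooldownCount);
}
```

Treat the result as an approximation: the elapsed time depends on how regularly the sweeper actually runs.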

Usage Patterns

Basic Setup with Outbox Sweeper

Custom Cooldown Configuration

Adjust the cooldown to your needs by setting CooldownCount on OutboxCircuitBreakerOptions.

Without Circuit Breaking

If you don't register an IAmAnOutboxCircuitBreaker, the sweeper continues to attempt publishing to every topic on each sweep, even after failures.

Monitoring and Observability

Checking Tripped Topics

You can query the circuit breaker to see which topics are currently tripped.

Logging and Alerts

Set up monitoring to track circuit breaker events.
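One way to surface trip and recovery events is to wrap the breaker in a logging decorator. The sketch below is illustrative only: ISketchCircuitBreaker is an assumed interface shape (its members are guesses for the example, not Brighter's published contract), CountingBreaker is a hypothetical wrapped instance, and an Action<string> stands in for a real logger.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var breaker = new LoggingCircuitBreaker(new CountingBreaker(cooldownCount: 2), Console.WriteLine);
breaker.TripTopic("orders"); // logs the trip
breaker.CoolDown();          // still cooling down, nothing logged
breaker.CoolDown();          // logs the recovery

// Assumed interface shape, for illustration only.
public interface ISketchCircuitBreaker
{
    void TripTopic(string topic);
    void CoolDown();
    IReadOnlyCollection<string> TrippedTopics { get; }
}

// Hypothetical counter-based breaker used as the wrapped instance.
public sealed class CountingBreaker : ISketchCircuitBreaker
{
    private readonly int _cooldownCount;
    private readonly Dictionary<string, int> _remaining = new();

    public CountingBreaker(int cooldownCount) => _cooldownCount = cooldownCount;

    public IReadOnlyCollection<string> TrippedTopics => _remaining.Keys.ToList();

    public void TripTopic(string topic) => _remaining[topic] = _cooldownCount;

    public void CoolDown()
    {
        foreach (var topic in _remaining.Keys.ToList())
        {
            if (--_remaining[topic] <= 0) _remaining.Remove(topic);
        }
    }
}

// Decorator that reports trips and recoveries through the supplied log callback.
public sealed class LoggingCircuitBreaker : ISketchCircuitBreaker
{
    private readonly ISketchCircuitBreaker _inner;
    private readonly Action<string> _log;

    public LoggingCircuitBreaker(ISketchCircuitBreaker inner, Action<string> log)
    {
        _inner = inner;
        _log = log;
    }

    public IReadOnlyCollection<string> TrippedTopics => _inner.TrippedTopics;

    public void TripTopic(string topic)
    {
        _inner.TripTopic(topic);
        _log($"Circuit tripped for topic '{topic}'");
    }

    public void CoolDown()
    {
        // Compare the tripped set before and after the sweep to find recoveries.
        var recovered = _inner.TrippedTopics.ToHashSet();
        _inner.CoolDown();
        recovered.ExceptWith(_inner.TrippedTopics);
        foreach (var topic in recovered) _log($"Topic '{topic}' recovered");
    }
}
```

In a real application the callback would typically forward to your logging or metrics library, and an alert could fire when a topic keeps reappearing in the tripped set.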

Transport-Specific Integration

Brighter V10 includes circuit breaking integration with specific transports.

MongoDB Transport

Circuit breaking is fully integrated with the MongoDB Outbox.

Other Transports

Circuit breaking works with all Brighter Outbox implementations:

  • MS SQL Server (UseMsSqlOutbox)

  • PostgreSQL (UsePostgreSqlOutbox)

  • MySQL (UseMySqlOutbox)

  • SQLite (UseSqliteOutbox)

  • DynamoDB (UseDynamoDbOutbox)

  • MongoDB (UseMongoDbOutbox)

Bulk Dispatch Support

V10 includes proper circuit breaking support for bulk dispatch operations. When dispatching multiple messages in a batch:

  1. Batch grouping: Messages are grouped by topic

  2. Per-topic circuit breaking: Each topic's circuit breaker status is checked before dispatching

  3. Healthy topics proceed: Only topics that aren't tripped are dispatched

  4. Individual retry: Failed batches can be retried individually per topic
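The per-topic flow above can be sketched as follows. This is a simplified model, not Brighter's dispatch code: OutboxMessage, BulkDispatch, and the publishBatch callback are hypothetical names, and the tripped topics are held in a plain set with no cooldown counter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var outstanding = new List<OutboxMessage>
{
    new OutboxMessage("orders", "o-1"),
    new OutboxMessage("payments", "p-1"),
    new OutboxMessage("orders", "o-2"),
};
var tripped = new HashSet<string> { "payments" };

// Only the "orders" batch (2 messages) is published; "payments" is skipped.
BulkDispatch.Dispatch(outstanding, tripped, (topic, batch) =>
    Console.WriteLine($"{topic}: {batch.Count} message(s)"));

// Hypothetical message type for illustration.
public record OutboxMessage(string Topic, string Body);

public static class BulkDispatch
{
    // Returns the topics whose batches were published successfully.
    public static IReadOnlyList<string> Dispatch(
        IEnumerable<OutboxMessage> messages,
        ISet<string> trippedTopics,
        Action<string, IReadOnlyList<OutboxMessage>> publishBatch)
    {
        var dispatched = new List<string>();

        // 1. Batch grouping: one batch per topic.
        foreach (var group in messages.GroupBy(m => m.Topic))
        {
            // 2 and 3. Skip topics that are currently tripped.
            if (trippedTopics.Contains(group.Key)) continue;

            try
            {
                publishBatch(group.Key, group.ToList());
                dispatched.Add(group.Key);
            }
            catch (Exception)
            {
                // 4. A failure affects only this topic's batch; its messages
                // stay outstanding and can be retried on a later sweep.
                trippedTopics.Add(group.Key);
            }
        }

        return dispatched;
    }
}
```

Catching the exception per group is what keeps one failing topic from aborting the remaining batches.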

Best Practices

1. Choose Appropriate Cooldown Periods

Balance between quick recovery and avoiding repeated failures:

  • Short cooldown (3-5 sweeps): For transient issues, quick recovery desired

  • Medium cooldown (10-15 sweeps): General purpose, good balance

  • Long cooldown (20-30 sweeps): For persistent issues, reduce retry overhead

2. Align Cooldown with Sweep Interval

Consider the total cooldown time, which is CooldownCount multiplied by the sweep interval, rather than the count alone.
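If you start from a target cooldown duration, the count can be derived by dividing by the sweep interval and rounding up. The helper below is a sketch with hypothetical names (CooldownPlanner, CountFor); the 5 minute target and 30 second interval are example values only.

```csharp
using System;

// Example: aim for roughly 5 minutes of cooldown with a 30 second sweep interval.
int count = CooldownPlanner.CountFor(TimeSpan.FromMinutes(5), TimeSpan.FromSeconds(30));
Console.WriteLine(count); // 10

// Hypothetical helper for illustration.
public static class CooldownPlanner
{
    // Smallest count (at least 1) whose total cooldown is no shorter than the target.
    public static int CountFor(TimeSpan targetCooldown, TimeSpan sweepInterval)
    {
        if (sweepInterval <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(sweepInterval));

        // Integer ceiling division on ticks avoids floating-point rounding.
        long sweeps = (targetCooldown.Ticks + sweepInterval.Ticks - 1) / sweepInterval.Ticks;
        return (int)Math.Max(1, sweeps);
    }
}
```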

3. Monitor Tripped Topics

Set up monitoring and alerting:

  • Health checks: Use ASP.NET Core health checks to expose tripped topics

  • Metrics: Export circuit breaker metrics to Prometheus, DataDog, etc.

  • Logging: Log when topics trip and recover

  • Alerts: Alert when topics remain tripped for extended periods

4. Investigate Root Causes

When topics trip repeatedly:

  1. Check broker health: Ensure message broker is operational

  2. Verify permissions: Ensure the application has permissions to publish

  3. Check queue/topic existence: Verify the destination exists

  4. Review capacity: Check if the queue/topic has reached capacity limits

  5. Inspect network: Look for network connectivity issues

5. Use with Outbox Sweeper

Circuit breaking is designed to work with the Outbox Sweeper, so enable the sweeper (UseOutboxSweeper()) alongside the circuit breaker registration.

6. Consider Immediate vs. Sweeper Clearing

Circuit breaking applies only to sweeper-based clearing, not to immediate clearing.

7. Test Failure Scenarios

Regularly test circuit breaker behavior:

  • Simulate broker outages

  • Test individual topic failures

  • Verify healthy topics continue processing

  • Confirm automatic recovery after cooldown

Troubleshooting

Topics Not Recovering

Problem: Topics remain tripped indefinitely

Solutions:

  1. Verify Outbox Sweeper is running

  2. Check cooldown count is not excessively high

  3. Ensure sweeper interval is appropriate

  4. Confirm circuit breaker is properly registered

All Topics Tripping

Problem: All topics become tripped at once

Possible Causes:

  • Broker is completely down

  • Network connectivity issues

  • Authentication/authorization failures

  • Shared resource exhaustion

Solutions:

  1. Check broker health and connectivity

  2. Verify credentials and permissions

  3. Review broker logs for errors

  4. Consider infrastructure capacity

Messages Stuck in Outbox

Problem: Messages accumulate in Outbox without being dispatched

Check:

  1. Is the Outbox Sweeper enabled?

  2. Are topics currently tripped? Check TrippedTopics.

  3. Is the circuit breaker cooldown too long?

  4. Are there persistent transport issues?

Solutions:

  • Enable Outbox Sweeper if not already enabled

  • Investigate why topics are tripping

  • Reduce cooldown count if appropriate

  • Fix underlying transport issues

Circuit Breaker Not Working

Problem: Failed topics continue to be retried

Verify:

  1. Circuit breaker is registered: services.AddSingleton<IAmAnOutboxCircuitBreaker>

  2. Using the sweeper: UseOutboxSweeper()

  3. Exceptions are being thrown during publish (not silently failing)

  4. Circuit breaker implementation is correct

Advanced Scenarios

Custom Circuit Breaker Implementation

Implement IAmAnOutboxCircuitBreaker for custom behavior.
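As an example of custom behavior, the sketch below trips a topic only after several reported failures instead of on the first one. It is written against ISketchOutboxCircuitBreaker, a stand-in whose members are assumptions for illustration; check the actual IAmAnOutboxCircuitBreaker definition in your Brighter version before adapting it. ThresholdCircuitBreaker and its parameters are likewise hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var breaker = new ThresholdCircuitBreaker(failureThreshold: 3, cooldownCount: 5);
breaker.TripTopic("orders");
breaker.TripTopic("orders");
Console.WriteLine(breaker.TrippedTopics.Count); // 0: still below the threshold
breaker.TripTopic("orders");
Console.WriteLine(breaker.TrippedTopics.Count); // 1: the third failure trips the topic

// Assumed interface shape, for illustration only.
public interface ISketchOutboxCircuitBreaker
{
    void TripTopic(string topic);
    void CoolDown();
    IReadOnlyCollection<string> TrippedTopics { get; }
}

// Hypothetical breaker that tolerates a few failures before tripping.
public sealed class ThresholdCircuitBreaker : ISketchOutboxCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly int _cooldownCount;
    private readonly Dictionary<string, int> _failures = new();
    private readonly Dictionary<string, int> _cooldowns = new();

    public ThresholdCircuitBreaker(int failureThreshold, int cooldownCount)
    {
        if (failureThreshold < 1) throw new ArgumentOutOfRangeException(nameof(failureThreshold));
        if (cooldownCount < 1) throw new ArgumentOutOfRangeException(nameof(cooldownCount));
        _failureThreshold = failureThreshold;
        _cooldownCount = cooldownCount;
    }

    public IReadOnlyCollection<string> TrippedTopics => _cooldowns.Keys.ToList();

    // Called once per reported failure. Counts failures since the last trip;
    // the assumed interface has no success callback, so the count is not
    // reset by successful publishes in between.
    public void TripTopic(string topic)
    {
        _failures.TryGetValue(topic, out var seen);
        var failures = seen + 1;
        if (failures < _failureThreshold)
        {
            _failures[topic] = failures;
            return;
        }

        _failures.Remove(topic);
        _cooldowns[topic] = _cooldownCount;
    }

    // Called once per sweep: decrement counters and release expired topics.
    public void CoolDown()
    {
        foreach (var topic in _cooldowns.Keys.ToList())
        {
            if (--_cooldowns[topic] <= 0) _cooldowns.Remove(topic);
        }
    }
}
```

The sketch is not thread-safe; add locking if the breaker can be called from concurrent sweeps.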

Distributed Circuit Breaker

For multi-instance deployments, consider a distributed circuit breaker using Redis, SQL, or other shared storage.
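One possible design is sketched below, under several assumptions. Tripped topics are stored with an expiry timestamp rather than a sweep counter, so that multiple instances sweeping independently do not each decrement a shared count. ITrippedTopicStore, InMemoryTrippedTopicStore, and SharedStoreCircuitBreaker are hypothetical names; the in-memory store stands in for Redis or SQL, and the breaker's members mirror an assumed interface shape rather than Brighter's confirmed contract.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Two instances share one store; the clock is injectable so the demo can advance time.
var now = DateTimeOffset.UtcNow;
var store = new InMemoryTrippedTopicStore();
var instanceA = new SharedStoreCircuitBreaker(store, TimeSpan.FromMinutes(10), () => now);
var instanceB = new SharedStoreCircuitBreaker(store, TimeSpan.FromMinutes(10), () => now);

instanceA.TripTopic("orders");
Console.WriteLine(instanceB.TrippedTopics.Contains("orders")); // True: B sees A's trip

now = now.AddMinutes(11);
Console.WriteLine(instanceB.TrippedTopics.Contains("orders")); // False: cooldown expired

// Hypothetical abstraction over shared storage (Redis, SQL, ...).
public interface ITrippedTopicStore
{
    void SetTripped(string topic, DateTimeOffset until);
    IReadOnlyCollection<string> GetTripped(DateTimeOffset now);
}

// In-memory stand-in so the sketch runs without external infrastructure.
public sealed class InMemoryTrippedTopicStore : ITrippedTopicStore
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _until = new();

    public void SetTripped(string topic, DateTimeOffset until) => _until[topic] = until;

    public IReadOnlyCollection<string> GetTripped(DateTimeOffset now)
        => _until.Where(kv => kv.Value > now).Select(kv => kv.Key).ToList();
}

public sealed class SharedStoreCircuitBreaker
{
    private readonly ITrippedTopicStore _store;
    private readonly TimeSpan _cooldown;
    private readonly Func<DateTimeOffset> _clock;

    public SharedStoreCircuitBreaker(ITrippedTopicStore store, TimeSpan cooldown, Func<DateTimeOffset> clock)
    {
        _store = store;
        _cooldown = cooldown;
        _clock = clock;
    }

    // Record the trip with an absolute expiry that every instance can read.
    public void TripTopic(string topic) => _store.SetTripped(topic, _clock() + _cooldown);

    // Nothing to decrement: entries expire by timestamp.
    public void CoolDown() { }

    public IReadOnlyCollection<string> TrippedTopics => _store.GetTripped(_clock());
}
```

A production version would also need to handle store outages and clock differences between instances, which this sketch ignores.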

Summary

Sweeper Circuit Breaking provides automatic resilience for Outbox clearing operations by:

  • Preventing cascade failures when specific topics fail

  • Automatically recovering after a configurable cooldown period

  • Allowing healthy topics to continue processing

  • Protecting resources from repeated failures

Enable circuit breaking by registering IAmAnOutboxCircuitBreaker with your IoC container and configuring appropriate cooldown periods for your application's needs.
