Sweeper Circuit Breaking
Overview
Sweeper Circuit Breaking is a resilience feature for publishing messages from the Outbox. It prevents a failure to publish to one topic from blocking attempts to publish to other topics. When publishing to a topic fails, the circuit breaker "trips" that topic and blocks further publish attempts to it until a cooldown period expires.
This feature is particularly valuable in scenarios where:
Transport failures: A message broker or queue becomes unavailable
Topic-specific issues: A specific topic/queue has configuration problems or capacity issues
Cascade prevention: Failing topics would otherwise block the Outbox Sweeper from processing healthy topics
Resource protection: Repeated failures to unhealthy topics consume resources without benefit
How It Works
The Sweeper Circuit Breaker operates at the topic level during Outbox clearing operations:
Normal Operation
Outbox Sweeper runs: Periodically attempts to clear outstanding messages from the Outbox
Messages grouped by topic: Messages are organized by their routing key (topic)
Publish attempts: The sweeper attempts to publish messages to their respective topics
Success: Messages are published and marked as dispatched
Circuit Breaking Behavior
When a topic fails to publish:
Failure detected: An exception occurs during message publication to a specific topic
Circuit trips: The circuit breaker marks that topic as "tripped"
Cooldown begins: A cooldown counter is set for the tripped topic (default: 10 sweeps)
Subsequent sweeps: On each sweep, the cooldown counter decrements for all tripped topics
Recovery: When the cooldown reaches zero, the topic is removed from the tripped list
Retry: The topic becomes available for publishing attempts again
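The trip-and-cooldown cycle above amounts to a per-topic countdown. The sketch below models it in Python purely for illustration. It is not Brighter's implementation, and the names TopicCircuitBreaker, trip_topic, cool_down, is_tripped, and tripped_topics are hypothetical. It assumes cool_down is called exactly once per sweep.

```python
class TopicCircuitBreaker:
    """Hypothetical model of per-topic sweeper circuit breaking (not Brighter's API)."""

    def __init__(self, cooldown_count: int = 10) -> None:
        if cooldown_count < 1:
            raise ValueError("cooldown_count must be at least 1")
        self._cooldown_count = cooldown_count
        # Maps each tripped topic to the number of sweeps left before it may be retried.
        self._remaining: dict[str, int] = {}

    def trip_topic(self, topic: str) -> None:
        # A publish failure (re)starts the cooldown for this topic.
        self._remaining[topic] = self._cooldown_count

    def cool_down(self) -> None:
        # Called once per sweep: decrement every tripped topic's counter and
        # release any topic whose counter has reached zero.
        for topic in list(self._remaining):
            self._remaining[topic] -= 1
            if self._remaining[topic] <= 0:
                del self._remaining[topic]

    def is_tripped(self, topic: str) -> bool:
        return topic in self._remaining

    @property
    def tripped_topics(self) -> frozenset[str]:
        return frozenset(self._remaining)


breaker = TopicCircuitBreaker(cooldown_count=3)
breaker.trip_topic("orders")
breaker.cool_down()
breaker.cool_down()
print(breaker.is_tripped("orders"))  # True: one sweep of cooldown left
breaker.cool_down()
print(breaker.is_tripped("orders"))  # False: the topic can be retried
```

Re-tripping a topic that is already tripped simply restarts its countdown. This matches the idea that each new failure earns a full cooldown.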
Benefits
Prevents blocking: Healthy topics continue to be processed even when some topics fail
Automatic recovery: Topics automatically recover after the cooldown period
Resource efficiency: Avoids wasting resources on repeated failures to unhealthy topics
Observability: Tripped topics can be monitored and alerted on
Configuration
Enabling Circuit Breaking
To enable Sweeper Circuit Breaking, register an IAmAnOutboxCircuitBreaker implementation with your IoC container.
Configuration Options
The OutboxCircuitBreakerOptions class provides the following configuration:
CooldownCount (int, default: 10): Number of sweeper iterations before a tripped topic is eligible for retry
Calculating Cooldown Time
The actual cooldown time depends on your Outbox Sweeper configuration:
Formula: Cooldown Time = CooldownCount × SweepInterval
Example:
CooldownCount = 10
Sweeper runs every 60 seconds
Cooldown Time = 10 × 60 s = 600 s (10 minutes)
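The formula is simple arithmetic. Expressing it with an explicit duration type avoids mixing up seconds and minutes. Below is a minimal Python sketch that assumes the sweeper runs at a fixed interval and that each run counts as one sweep. The function name cooldown_time is hypothetical.

```python
from datetime import timedelta


def cooldown_time(cooldown_count: int, sweep_interval: timedelta) -> timedelta:
    """Approximate wall-clock time a tripped topic stays blocked.

    Assumes a fixed sweep interval, with each sweeper run counting as one sweep.
    """
    return cooldown_count * sweep_interval


print(cooldown_time(10, timedelta(seconds=60)))  # 0:10:00
```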
Usage Patterns
Basic Setup with Outbox Sweeper
Custom Cooldown Configuration
Adjust CooldownCount in OutboxCircuitBreakerOptions to suit your needs.
Without Circuit Breaking
If you don't register an IAmAnOutboxCircuitBreaker, the sweeper continues to attempt publishing to all topics, even after failures.
Monitoring and Observability
Checking Tripped Topics
You can query the circuit breaker's TrippedTopics to see which topics are currently tripped.
Logging and Alerts
Set up monitoring to track when topics trip and when they recover.
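If the circuit breaker exposes only the current set of tripped topics, you can derive trip and recovery events by comparing snapshots taken after each sweep. The sketch below uses Python's standard logging module. The report_changes helper is hypothetical. How you obtain each snapshot from your circuit breaker (for example, by reading TrippedTopics) is left to your integration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("outbox.circuit")


def report_changes(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Log and return (newly_tripped, recovered) topics between two snapshots."""
    newly_tripped = current - previous
    recovered = previous - current
    for topic in sorted(newly_tripped):
        log.warning("Topic %s tripped", topic)
    for topic in sorted(recovered):
        log.info("Topic %s recovered", topic)
    return newly_tripped, recovered


# "payments" tripped and "orders" recovered between these two sweeps.
report_changes({"orders"}, {"payments"})
```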
Transport-Specific Integration
Brighter V10 includes circuit breaking integration with specific transports:
MongoDB Transport
Circuit breaking is fully integrated with the MongoDB Outbox.
Other Transports
Circuit breaking works with all Brighter Outbox implementations:
MS SQL Server (UseMsSqlOutbox)
PostgreSQL (UsePostgreSqlOutbox)
MySQL (UseMySqlOutbox)
SQLite (UseSqliteOutbox)
DynamoDB (UseDynamoDbOutbox)
MongoDB (UseMongoDbOutbox)
Bulk Dispatch Support
Brighter V10 supports circuit breaking for bulk dispatch operations. When dispatching multiple messages in a batch:
Batch grouping: Messages are grouped by topic
Per-topic circuit breaking: Each topic's circuit breaker status is checked before dispatching
Healthy topics proceed: Only topics that aren't tripped are dispatched
Individual retry: Failed batches can be retried individually per topic
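The batching steps above can be sketched as follows: group messages by topic, skip tripped topics, and publish one batch per healthy topic, so that one failing batch does not stop the others. This is an illustrative Python model. dispatch_in_batches is hypothetical, and publish_batch stands in for whatever bulk-send operation your transport provides.

```python
from collections import defaultdict
from typing import Callable, Iterable


def dispatch_in_batches(
    messages: Iterable[tuple[str, str]],
    tripped: set[str],
    publish_batch: Callable[[str, list[str]], None],
) -> tuple[list[str], list[str]]:
    """Group (topic, body) pairs by topic and publish one batch per healthy topic.

    Returns (dispatched_topics, failed_topics). Tripped topics are skipped
    entirely; a failure in one topic's batch does not stop the others.
    """
    batches: dict[str, list[str]] = defaultdict(list)
    for topic, body in messages:
        batches[topic].append(body)

    dispatched, failed = [], []
    for topic, bodies in batches.items():
        if topic in tripped:
            continue  # circuit open for this topic: its messages stay in the Outbox
        try:
            publish_batch(topic, bodies)
            dispatched.append(topic)
        except Exception:
            failed.append(topic)  # the caller would trip this topic
    return dispatched, failed


sent = {}


def publish(topic, bodies):
    if topic == "payments":
        raise ConnectionError("broker unavailable")
    sent[topic] = bodies


msgs = [("orders", "o1"), ("payments", "p1"), ("orders", "o2"), ("audit", "a1")]
print(dispatch_in_batches(msgs, tripped={"audit"}, publish_batch=publish))
# (['orders'], ['payments'])
```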
Best Practices
1. Choose Appropriate Cooldown Periods
Balance between quick recovery and avoiding repeated failures:
Short cooldown (3-5 sweeps): For transient issues where quick recovery is desired
Medium cooldown (10-15 sweeps): General purpose, good balance
Long cooldown (20-30 sweeps): For persistent issues, reduce retry overhead
2. Align Cooldown with Sweep Interval
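Consider the total cooldown time (CooldownCount × SweepInterval), not just the number of sweeps. The same CooldownCount of 10 gives a 10-minute cooldown with a 60-second sweep interval, but only 50 seconds with a 5-second interval.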
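If you start from a target cooldown duration rather than a sweep count, invert the formula and round up so the topic stays blocked for at least that long. Below is a Python sketch. The helper name cooldown_count_for is hypothetical, and the sketch assumes a fixed sweep interval.

```python
import math
from datetime import timedelta


def cooldown_count_for(target: timedelta, sweep_interval: timedelta) -> int:
    """Smallest CooldownCount whose cooldown time is at least `target`."""
    if sweep_interval <= timedelta(0):
        raise ValueError("sweep_interval must be positive")
    # timedelta / timedelta yields a float ratio; round up to whole sweeps.
    return max(1, math.ceil(target / sweep_interval))


print(cooldown_count_for(timedelta(minutes=5), timedelta(seconds=30)))   # 10
print(cooldown_count_for(timedelta(minutes=10), timedelta(seconds=45)))  # 14
```

Rounding up means the actual cooldown can overshoot the target by up to one sweep interval. For example, 14 × 45 s is 630 s rather than 600 s.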
3. Monitor Tripped Topics
Set up monitoring and alerting:
Health checks: Use ASP.NET Core health checks to expose tripped topics
Metrics: Export circuit breaker metrics to Prometheus, DataDog, etc.
Logging: Log when topics trip and recover
Alerts: Alert when topics remain tripped for extended periods
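To alert when topics remain tripped for extended periods, you can count how many consecutive sweeps each topic has been observed tripped, and raise an alert once when that count crosses a threshold. The Python sketch below assumes you take one snapshot of the tripped topics after each sweep. TrippedDurationMonitor is a hypothetical helper.

```python
from collections import Counter


class TrippedDurationMonitor:
    """Hypothetical helper: counts consecutive sweeps each topic has been seen tripped."""

    def __init__(self, alert_after_sweeps: int) -> None:
        self.alert_after_sweeps = alert_after_sweeps
        self._streaks = Counter()

    def observe(self, tripped_now: set[str]) -> list[str]:
        """Record one sweep's snapshot; return topics that just crossed the threshold."""
        # Forget topics that have recovered so a later trip starts a fresh streak.
        for topic in list(self._streaks):
            if topic not in tripped_now:
                del self._streaks[topic]
        crossed = []
        for topic in sorted(tripped_now):
            self._streaks[topic] += 1
            if self._streaks[topic] == self.alert_after_sweeps:
                crossed.append(topic)
        return crossed


monitor = TrippedDurationMonitor(alert_after_sweeps=3)
for snapshot in [{"orders"}, {"orders"}, {"orders", "audit"}, {"audit"}]:
    for topic in monitor.observe(snapshot):
        print(f"ALERT: {topic} tripped for 3 consecutive sweeps")
# Prints a single alert, for "orders", on the third snapshot.
```

Setting the threshold above CooldownCount flags topics that keep failing after their cooldown. Whether such a topic appears continuously tripped depends on when snapshots are taken relative to the cooldown step.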
4. Investigate Root Causes
When topics trip repeatedly:
Check broker health: Ensure message broker is operational
Verify permissions: Ensure the application has permissions to publish
Check queue/topic existence: Verify the destination exists
Review capacity: Check if the queue/topic has reached capacity limits
Inspect network: Look for network connectivity issues
5. Use with Outbox Sweeper
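Circuit breaking is designed to work with the Outbox Sweeper. Cooldown counters only decrement when the sweeper runs, so tripped topics never recover if the sweeper is not running.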
6. Consider Immediate vs. Sweeper Clearing
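Circuit breaking only applies to sweeper-based clearing. Immediate clearing, where messages are dispatched directly after being written to the Outbox, is not protected by the circuit breaker.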
7. Test Failure Scenarios
Regularly test circuit breaker behavior:
Simulate broker outages
Test individual topic failures
Verify healthy topics continue processing
Confirm automatic recovery after cooldown
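A fault-injecting publisher lets you simulate broker outages and individual topic failures in unit tests without a real broker. The sketch below is a pytest-style Python example. FaultInjectingPublisher is hypothetical. In a real test you would wire an equivalent double into your transport and drive the sweeper, rather than using the inline loop shown here.

```python
class FaultInjectingPublisher:
    """Hypothetical test double: raises for topics in `failing`, records the rest."""

    def __init__(self, failing: set[str]) -> None:
        self.failing = set(failing)
        self.published: dict[str, list[str]] = {}

    def publish(self, topic: str, body: str) -> None:
        if topic in self.failing:
            raise ConnectionError(f"simulated outage for {topic}")
        self.published.setdefault(topic, []).append(body)


def test_healthy_topic_continues_when_another_fails():
    publisher = FaultInjectingPublisher(failing={"payments"})
    failed = []
    # Stand-in for one sweep: publish each message, collecting per-topic failures.
    for topic, body in [("payments", "p1"), ("orders", "o1")]:
        try:
            publisher.publish(topic, body)
        except ConnectionError:
            failed.append(topic)
    assert failed == ["payments"]
    assert publisher.published == {"orders": ["o1"]}

    # "Heal" the broker: the next attempt for payments succeeds.
    publisher.failing.clear()
    publisher.publish("payments", "p1")
    assert publisher.published["payments"] == ["p1"]
```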
Troubleshooting
Topics Not Recovering
Problem: Topics remain tripped indefinitely
Solutions:
Verify Outbox Sweeper is running
Check cooldown count is not excessively high
Ensure sweeper interval is appropriate
Confirm circuit breaker is properly registered
All Topics Tripping
Problem: All topics become tripped at once
Possible Causes:
Broker is completely down
Network connectivity issues
Authentication/authorization failures
Shared resource exhaustion
Solutions:
Check broker health and connectivity
Verify credentials and permissions
Review broker logs for errors
Consider infrastructure capacity
Messages Stuck in Outbox
Problem: Messages accumulate in Outbox without being dispatched
Check:
Is the Outbox Sweeper enabled?
Are topics currently tripped? Check TrippedTopics
Is the circuit breaker cooldown too long?
Are there persistent transport issues?
Solutions:
Enable Outbox Sweeper if not already enabled
Investigate why topics are tripping
Reduce cooldown count if appropriate
Fix underlying transport issues
Circuit Breaker Not Working
Problem: Failed topics continue to be retried
Verify:
Circuit breaker is registered: services.AddSingleton<IAmAnOutboxCircuitBreaker>
Using the sweeper: UseOutboxSweeper()
Exceptions are being thrown during publish (not silently failing)
If you use a custom implementation, it trips a topic when a publish fails and decrements cooldowns on each sweep
Advanced Scenarios
Custom Circuit Breaker Implementation
Implement IAmAnOutboxCircuitBreaker for custom behavior.
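As an illustration of custom behavior, the sketch below gives selected topics their own cooldown count, for example a shorter cooldown for a low-priority audit topic. It is a Python model with hypothetical names, not the IAmAnOutboxCircuitBreaker contract. In a real implementation you would map its operations onto the interface's actual members.

```python
from typing import Dict, Optional


class PerTopicCooldownBreaker:
    """Hypothetical custom breaker: each topic may have its own cooldown count."""

    def __init__(self, default_cooldown: int = 10,
                 overrides: Optional[Dict[str, int]] = None) -> None:
        self.default_cooldown = default_cooldown
        self.overrides = dict(overrides or {})
        self._remaining: Dict[str, int] = {}

    def trip_topic(self, topic: str) -> None:
        # Use the topic-specific cooldown, falling back to the default.
        self._remaining[topic] = self.overrides.get(topic, self.default_cooldown)

    def cool_down(self) -> None:
        # One call per sweep: decrement counters, releasing topics that reach zero.
        self._remaining = {t: n - 1 for t, n in self._remaining.items() if n > 1}

    @property
    def tripped_topics(self) -> frozenset:
        return frozenset(self._remaining)


breaker = PerTopicCooldownBreaker(default_cooldown=10, overrides={"audit": 2})
breaker.trip_topic("audit")
breaker.trip_topic("orders")
breaker.cool_down()
breaker.cool_down()
print(sorted(breaker.tripped_topics))  # ['orders']
```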
Distributed Circuit Breaker
For multi-instance deployments, consider a distributed circuit breaker backed by Redis, SQL, or other shared storage, so that all instances share the same view of tripped topics.
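One design consideration for a shared circuit breaker concerns the cooldown counter. If every instance's sweeper decrements a shared per-topic counter, the cooldown ends roughly N times sooner with N instances. One way to avoid this is to store the time at which each topic may be retried (CooldownCount × SweepInterval after the trip) instead of a counter. The Python sketch below models this idea. SharedExpiryBreaker is hypothetical, and a dict stands in for the shared store. A real store would also need atomic updates and tolerance for clock skew between instances.

```python
import time
from typing import Callable, MutableMapping


class SharedExpiryBreaker:
    """Hypothetical distributed breaker model using retry-at timestamps.

    `store` stands in for shared storage (e.g. a Redis hash or SQL table) keyed by
    topic. Because entries hold an absolute retry time, every instance sees the same
    cooldown regardless of how many sweepers are running.
    """

    def __init__(self, store: MutableMapping[str, float], cooldown_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.store = store
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock

    def trip_topic(self, topic: str) -> None:
        self.store[topic] = self.clock() + self.cooldown_seconds

    def is_tripped(self, topic: str) -> bool:
        retry_at = self.store.get(topic)
        if retry_at is None:
            return False
        if self.clock() >= retry_at:
            self.store.pop(topic, None)  # expired: clear it for every instance
            return False
        return True


shared = {}
now = [0.0]  # fake clock so the example is deterministic
a = SharedExpiryBreaker(shared, cooldown_seconds=600, clock=lambda: now[0])
b = SharedExpiryBreaker(shared, cooldown_seconds=600, clock=lambda: now[0])
a.trip_topic("orders")
print(b.is_tripped("orders"))  # True: instance b sees a's trip
now[0] = 600.0
print(b.is_tripped("orders"))  # False: cooldown expired
```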
Summary
Sweeper Circuit Breaking provides automatic resilience for Outbox clearing operations by:
Preventing cascade failures when specific topics fail
Automatically recovering after a configurable cooldown period
Allowing healthy topics to continue processing
Protecting resources from repeated failures
Enable circuit breaking by registering IAmAnOutboxCircuitBreaker with your IoC container and configuring appropriate cooldown periods for your application's needs.
