Cron Design Best Practices — Reliable Scheduled Automation¶
Scheduled automation is one of Hermes Agent's most powerful features, but poorly designed crons are the fastest path to operational pain. These cron design best practices cover idempotency, error handling, rate limiting, monitoring, and the anti-patterns that keep your scheduled tasks reliable and safe in production.
Overview¶
Crons are the heartbeat of autonomous Hermes Agent operation. They handle email monitoring, data synchronization, report generation, and operational checks — running on schedules from every 5 minutes to once per month. Following Hermes Agent best practices for cron design prevents silent failures, resource exhaustion, and alert fatigue.
How It Works¶
A well-designed Hermes Agent cron follows this pattern:
SCHEDULE → DATA COLLECTION → PROCESSING → VALIDATION → OUTPUT → LOGGING
↓
ERROR → RETRY → DEAD-LETTER → ALERT
The Cardinal Rule: Idempotency¶
Every cron should be safe to run multiple times with the same inputs. Cron schedulers can drift, containers can restart, and manual re-runs happen during debugging. If your cron writes to a database, use upsert semantics. If it sends notifications, track sent status. If it generates reports, use deterministic filenames.
A good litmus test: could you run this cron three times back-to-back without breaking anything?
Error Handling: Fail Loudly, Recover Gracefully¶
Retry with backoff. Transient failures (network timeouts, rate limits) should retry with exponential backoff. Cap at 3-5 attempts with jitter.
Dead-letter queue. After all retries exhausted, route failed work to a dead-letter queue or structured log. Never silently discard work.
Alert on persistent failure. If your cron fails for more than N consecutive runs, trigger an alert through Slack, email, or PagerDuty.
Partial success handling. Handle individual failures within a batch without aborting the entire batch. Log failed records and continue processing.
Rate Limiting and Delivery Targets¶
Define explicit throughput targets. Implement token-bucket or sliding-window rate limiting. Maintain separate rate-limit budgets per API integration.
Delivery windows matter. If your cron takes 4 minutes to complete, don't schedule it every 5 minutes — that leaves only 1 minute of slack. Schedule at 2-3x the expected runtime to prevent overlap.
Monitoring and Observability¶
Every cron should emit:
- Start/end timestamps with duration — track runtime drift
- Record counts — items read, written, failed
- Error counts by type
- Last successful run timestamp — your canary
Heartbeat monitoring: A separate lightweight check every 15-30 minutes verifies the last successful run is within the expected window.
Anti-Patterns to Avoid¶
| Anti-Pattern | Why It Hurts | Fix |
|---|---|---|
| God Cron | One failure cascades everything | Split into single-responsibility crons |
| Hardcoded timestamps | DST/timezone bugs | Always UTC, always timezone-aware |
| Unbounded queries | Million-row timeouts | Always paginate/LIMIT |
| Missing dry-run mode | Can't test safely | Add --dry-run flag |
| Console output as logging | Ephemeral, unsearchable | Structured persistent logging |
Delivery Target Patterns¶
- Real-time needs: Use event-driven architecture, not polling crons
- Near-real-time: Every 5 minutes — acceptable for most ecommerce
- Hourly: Good for dashboards, cache warming
- Daily: Batch windows for exports, reconciliation
- Weekly/monthly: Run during low-traffic with explicit retry windows
Benefits¶
- Zero silent failures: Alerting on persistent failure catches issues before users notice
- Predictable costs: Rate limiting prevents API overage charges
- Easy debugging: Structured logs with timestamps and record counts
- Safe testing: Dry-run mode lets you verify changes before deployment
FAQ¶
What makes a cron job idempotent in Hermes Agent?¶
A cron is idempotent if running it multiple times with the same inputs produces the same result. Use upsert database operations, deduplication keys on notifications, and deterministic file naming for reports.
How many times should a failed cron retry?¶
Retry 3-5 times with exponential backoff and jitter. After all retries are exhausted, route to a dead-letter queue — never silently discard work. Alert after 3 consecutive failures.
How do I monitor cron job health?¶
Track start/end timestamps, record counts processed, error counts by type, and last successful run timestamp. Run a separate heartbeat check every 15-30 minutes to catch stale crons.
Related Pages¶
- Best Practices Overview — All best practices guides
- Model Selection — Use the right model for each cron
- Security — Credential management for scheduled tasks
- Setup Guides — Run crons on cloud VPS or Raspberry Pi
- Blueprints — End-to-end cron-anchored workflows
The best cron is one you can explain to a teammate in 30 seconds.