Database + Reporting Integration — Automated Queries & Dashboards¶
Architecture Overview¶
┌──────────┐ ┌──────────────┐ ┌───────────┐
│ Database │────▶│ Query Engine │────▶│ Reporting │
│ (SQL, │ │ + Scheduler │ │ (Notion, │
│ NoSQL) │ │ │ │ Sheets, │
└──────────┘ └──────┬───────┘ │ Slack) │
│ └───────────┘
┌─────▼──────┐
│ Results │
│ Cache + │
│ Diffing │
└────────────┘
The query engine executes parameterized SQL against production databases (or read replicas). Results are cached, diffed against previous runs to detect anomalies, and distributed to the configured reporting destinations.
MCP Server Setup¶
Database Servers¶
PostgreSQL MCP Server — provides query_database, list_database_tables, and describe_table. Configure with connection string containing host, port, database name, user, and password. Critical: use a read-only database user. Production write access from automation is a significant risk.
MSSQL MCP Server — provides configure_mssql_connection, query_mssql_database, and list_mssql_tables. Supports SQL Server authentication and Azure AD. Configure with encrypted connections enabled.
MongoDB MCP Server — provides aggregation pipeline execution for document-based data stores. Use for operational metrics stored in document collections.
Cosmos DB MCP Server — provides configure_cosmos_connection, query_cosmos_database, and list_cosmos_containers. Supports both key-based and Azure RBAC authentication.
Security Configuration¶
database_config:
host: "read-replica.internal.example.com"
port: 5432
database: "analytics"
user: "hermes_agent_readonly"
ssl_mode: "require"
statement_timeout: 30000 # 30-second query timeout
max_rows: 10000 # Prevent accidental full-table scans
Always use read replicas for reporting queries. Production database load from reporting automation can cause outages. Most cloud providers (RDS, Cloud SQL, Azure DB) support read replicas with minimal configuration.
Core Automation Scenarios¶
1. Scheduled Query Execution¶
Goal: Run business-critical queries on a schedule and distribute results.
Query Registry Pattern: Define queries in a version-controlled directory with metadata:
# queries/daily_revenue.yaml
name: "Daily Revenue Summary"
sql: |
SELECT
date_trunc('day', created_at) as day,
sum(amount) as revenue,
count(*) as transactions,
sum(amount) / count(*) as avg_order_value
FROM orders
WHERE created_at >= current_date - interval '7 days'
GROUP BY 1
ORDER BY 1 DESC
schedule: "0 7 * * *"
destinations:
- type: slack
channel: "#finance-daily"
- type: google_sheets
spreadsheet_id: "1ABC..."
range: "Revenue!A1"
anomaly_detection:
enabled: true
metric: revenue
method: z_score
threshold: 2.5
Workflow: 1. Agent reads query registry at scheduled times 2. Executes each query against the database using MCP tools 3. Caches results locally for comparison 4. Diffs against previous run: are any metrics anomalous? 5. Formats results for each configured destination 6. Sends to Slack, writes to Google Sheets, updates Notion database
2. Automated Dashboard Updates¶
Goal: Keep reporting dashboards current without manual data refreshes.
Workflow:
1. Define dashboard queries with destination mappings
2. On schedule, execute all queries for a given dashboard
3. Format results into the destination's expected schema
4. Write to Google Sheets (via read_sheet/write operations), Notion databases (via Notion API), or Airtable
5. Log update timestamps — if an update fails, flag it for investigation
Google Sheets Integration:
1. list_drive_files to find the target spreadsheet
2. read_sheet to verify current structure
3. execute database query via MCP
4. format results as 2D array matching sheet layout
5. write to sheet using appropriate API
Notion Integration:
1. notion_search to find the database page
2. notion_get_database to read the schema
3. execute database query
4. map database columns to Notion properties
5. update or create pages in the Notion database
Cron expression: 0 8 * * * for daily dashboards; 0 8 * * 1 for weekly rollups.
3. Anomaly Detection¶
Goal: Automatically detect unusual patterns in business metrics and alert the right team.
Detection Methods:
| Method | Best For | Threshold |
|---|---|---|
| Z-Score | Normally distributed metrics (revenue, user signups) | > 2.5 standard deviations |
| IQR | Metrics with outliers (support tickets, error rates) | > 1.5 × IQR beyond Q1/Q3 |
| Percent Change | Stable trend metrics (subscriber count, inventory) | > 20% day-over-day |
| Moving Average | Seasonal metrics (ecommerce sales, ad spend) | > 3 × rolling std dev |
Workflow: 1. After each scheduled query execution, compare current values to historical baseline 2. If any metric exceeds its threshold, generate an anomaly alert 3. Alert includes: metric name, current value, expected range, magnitude of deviation, possible explanations 4. Route by severity: minor anomalies → Slack channel; major anomalies → PagerDuty/on-call escalation 5. Log all anomalies (including dismissed ones) for false-positive tracking
False Positive Management: Track which alerts were actionable vs dismissed. After 4 weeks, adjust thresholds: - If > 30% of alerts for a metric are dismissed, increase the threshold - If a real incident wasn't detected, decrease the threshold
4. Data Quality Monitoring¶
Goal: Proactively detect data pipeline issues before they corrupt reports.
Quality Checks:
-- Row count anomaly: did yesterday's ETL fail?
SELECT count(*) FROM orders WHERE created_at::date = current_date - 1;
-- Null ratio: is a critical field going missing?
SELECT
count(*) as total,
count(email) as with_email,
round(100.0 * count(email) / count(*), 2) as email_fill_rate
FROM users WHERE created_at > current_date - interval '7 days';
-- Freshness check: when was the last record written?
SELECT max(updated_at) as last_write FROM orders;
-- Referential integrity: orphaned records?
SELECT count(*) FROM order_items oi
LEFT JOIN orders o ON oi.order_id = o.id
WHERE o.id IS NULL;
Workflow: 1. Run quality queries before business-critical reporting queries 2. If any check fails, halt dependent report generation and alert the data engineering team 3. Record quality metrics over time to identify degrading data sources 4. Build a quality score per data source (0-100) based on completeness, freshness, and accuracy
Cron expression: 0 5 * * * — runs before the main reporting window.
Cron Schedule Summary¶
| Task | Cron | Rationale |
|---|---|---|
| Daily metrics queries | 0 7 * * * |
Before team standups |
| Dashboard refresh | 0 8 * * * |
Start of business day |
| Data quality checks | 0 5 * * * |
Before reporting queries run |
| Weekly rollups | 0 8 * * 1 |
Monday morning |
| Anomaly detection | Triggered by query execution | Immediately after each query |
| Query performance audit | 0 2 * * 0 |
Sunday, low database load |
Implementation Notes¶
Query Performance¶
- Every scheduled query should have an
EXPLAINplan reviewed before production use - Set
statement_timeoutat the connection level (30 seconds default) - Monitor query execution time over time — queries that slow down indicate missing indexes or growing data
- Use materialized views for complex reporting queries that don't need real-time data
Idempotency¶
Design queries and destination writes to be idempotent — running the same report twice should produce the same result, not duplicate data. Use upsert patterns when writing to destinations.
Failure Handling¶
- If a query fails (timeout, connection error): retry once after 5 minutes, then alert
- If a destination write fails: buffer results and retry; escalate if unresolved after 1 hour
- Never silently drop report failures — missing data is worse than late data
Extending¶
- Add Slack slash commands so team members can trigger reports on-demand
- Connect email to deliver PDF report exports to stakeholders who prefer that format
- Integrate with BI tools (Metabase, Looker, Tableau) by refreshing their underlying datasets