Skip to content

Database + Reporting Integration — Automated Queries & Dashboards

Architecture Overview

┌──────────┐     ┌──────────────┐     ┌───────────┐
│ Database │────▶│ Query Engine │────▶│ Reporting │
│ (SQL,    │     │ + Scheduler  │     │ (Notion,  │
│  NoSQL)  │     │              │     │ Sheets,   │
└──────────┘     └──────┬───────┘     │ Slack)    │
                        │             └───────────┘
                  ┌─────▼──────┐
                  │  Results   │
                  │  Cache +   │
                  │  Diffing   │
                  └────────────┘

The query engine executes parameterized SQL against production databases (or read replicas). Results are cached, diffed against previous runs to detect anomalies, and distributed to the configured reporting destinations.

MCP Server Setup

Database Servers

PostgreSQL MCP Server — provides query_database, list_database_tables, and describe_table. Configure with connection string containing host, port, database name, user, and password. Critical: use a read-only database user. Production write access from automation is a significant risk.

MSSQL MCP Server — provides configure_mssql_connection, query_mssql_database, and list_mssql_tables. Supports SQL Server authentication and Azure AD. Configure with encrypted connections enabled.

MongoDB MCP Server — provides aggregation pipeline execution for document-based data stores. Use for operational metrics stored in document collections.

Cosmos DB MCP Server — provides configure_cosmos_connection, query_cosmos_database, and list_cosmos_containers. Supports both key-based and Azure RBAC authentication.

Security Configuration

database_config:
  host: "read-replica.internal.example.com"
  port: 5432
  database: "analytics"
  user: "hermes_agent_readonly"
  ssl_mode: "require"
  statement_timeout: 30000  # 30-second query timeout
  max_rows: 10000           # Prevent accidental full-table scans

Always use read replicas for reporting queries. Production database load from reporting automation can cause outages. Most cloud providers (RDS, Cloud SQL, Azure DB) support read replicas with minimal configuration.

Core Automation Scenarios

1. Scheduled Query Execution

Goal: Run business-critical queries on a schedule and distribute results.

Query Registry Pattern: Define queries in a version-controlled directory with metadata:

# queries/daily_revenue.yaml
name: "Daily Revenue Summary"
sql: |
  SELECT 
    date_trunc('day', created_at) as day,
    sum(amount) as revenue,
    count(*) as transactions,
    sum(amount) / count(*) as avg_order_value
  FROM orders
  WHERE created_at >= current_date - interval '7 days'
  GROUP BY 1
  ORDER BY 1 DESC
schedule: "0 7 * * *"
destinations:
  - type: slack
    channel: "#finance-daily"
  - type: google_sheets
    spreadsheet_id: "1ABC..."
    range: "Revenue!A1"
anomaly_detection:
  enabled: true
  metric: revenue
  method: z_score
  threshold: 2.5

Workflow: 1. Agent reads query registry at scheduled times 2. Executes each query against the database using MCP tools 3. Caches results locally for comparison 4. Diffs against previous run: are any metrics anomalous? 5. Formats results for each configured destination 6. Sends to Slack, writes to Google Sheets, updates Notion database

2. Automated Dashboard Updates

Goal: Keep reporting dashboards current without manual data refreshes.

Workflow: 1. Define dashboard queries with destination mappings 2. On schedule, execute all queries for a given dashboard 3. Format results into the destination's expected schema 4. Write to Google Sheets (via read_sheet/write operations), Notion databases (via Notion API), or Airtable 5. Log update timestamps — if an update fails, flag it for investigation

Google Sheets Integration:

1. list_drive_files to find the target spreadsheet
2. read_sheet to verify current structure
3. execute database query via MCP
4. format results as 2D array matching sheet layout
5. write to sheet using appropriate API

Notion Integration:

1. notion_search to find the database page
2. notion_get_database to read the schema
3. execute database query
4. map database columns to Notion properties
5. update or create pages in the Notion database

Cron expression: 0 8 * * * for daily dashboards; 0 8 * * 1 for weekly rollups.

3. Anomaly Detection

Goal: Automatically detect unusual patterns in business metrics and alert the right team.

Detection Methods:

Method Best For Threshold
Z-Score Normally distributed metrics (revenue, user signups) > 2.5 standard deviations
IQR Metrics with outliers (support tickets, error rates) > 1.5 × IQR beyond Q1/Q3
Percent Change Stable trend metrics (subscriber count, inventory) > 20% day-over-day
Moving Average Seasonal metrics (ecommerce sales, ad spend) > 3 × rolling std dev

Workflow: 1. After each scheduled query execution, compare current values to historical baseline 2. If any metric exceeds its threshold, generate an anomaly alert 3. Alert includes: metric name, current value, expected range, magnitude of deviation, possible explanations 4. Route by severity: minor anomalies → Slack channel; major anomalies → PagerDuty/on-call escalation 5. Log all anomalies (including dismissed ones) for false-positive tracking

False Positive Management: Track which alerts were actionable vs dismissed. After 4 weeks, adjust thresholds: - If > 30% of alerts for a metric are dismissed, increase the threshold - If a real incident wasn't detected, decrease the threshold

4. Data Quality Monitoring

Goal: Proactively detect data pipeline issues before they corrupt reports.

Quality Checks:

-- Row count anomaly: did yesterday's ETL fail?
SELECT count(*) FROM orders WHERE created_at::date = current_date - 1;

-- Null ratio: is a critical field going missing?
SELECT 
  count(*) as total,
  count(email) as with_email,
  round(100.0 * count(email) / count(*), 2) as email_fill_rate
FROM users WHERE created_at > current_date - interval '7 days';

-- Freshness check: when was the last record written?
SELECT max(updated_at) as last_write FROM orders;

-- Referential integrity: orphaned records?
SELECT count(*) FROM order_items oi
LEFT JOIN orders o ON oi.order_id = o.id
WHERE o.id IS NULL;

Workflow: 1. Run quality queries before business-critical reporting queries 2. If any check fails, halt dependent report generation and alert the data engineering team 3. Record quality metrics over time to identify degrading data sources 4. Build a quality score per data source (0-100) based on completeness, freshness, and accuracy

Cron expression: 0 5 * * * — runs before the main reporting window.

Cron Schedule Summary

Task Cron Rationale
Daily metrics queries 0 7 * * * Before team standups
Dashboard refresh 0 8 * * * Start of business day
Data quality checks 0 5 * * * Before reporting queries run
Weekly rollups 0 8 * * 1 Monday morning
Anomaly detection Triggered by query execution Immediately after each query
Query performance audit 0 2 * * 0 Sunday, low database load

Implementation Notes

Query Performance

  • Every scheduled query should have an EXPLAIN plan reviewed before production use
  • Set statement_timeout at the connection level (30 seconds default)
  • Monitor query execution time over time — queries that slow down indicate missing indexes or growing data
  • Use materialized views for complex reporting queries that don't need real-time data

Idempotency

Design queries and destination writes to be idempotent — running the same report twice should produce the same result, not duplicate data. Use upsert patterns when writing to destinations.

Failure Handling

  • If a query fails (timeout, connection error): retry once after 5 minutes, then alert
  • If a destination write fails: buffer results and retry; escalate if unresolved after 1 hour
  • Never silently drop report failures — missing data is worse than late data

Extending

  • Add Slack slash commands so team members can trigger reports on-demand
  • Connect email to deliver PDF report exports to stakeholders who prefer that format
  • Integrate with BI tools (Metabase, Looker, Tableau) by refreshing their underlying datasets