Skip to content

Honcho Memory Usage

Honcho is a server-side semantic memory store wired into Hermes as an MCP server, NOT as the active memory provider. The built-in MEMORY.md + USER.md remain the hot path (injected into every system prompt for free). Honcho is the cold path: server-side semantic store you query on-demand via tool calls.

Mental model

Layer What it holds Cost Latency
MEMORY.md / USER.md ~12KB hand-curated declarative facts injected every turn (in token bill) zero — already in prompt
Skills Procedural runbooks loaded on demand one tool call when loaded ~1s
session_search (SQLite FTS5) Past conversation transcripts one tool call ~100ms local
Honcho (this skill) Semantic representation built from messages we explicitly write one MCP roundtrip + Honcho processing ~500ms-2s per call
Honcho chat (dialectic) Theory-of-mind queries against the representation model call inside Honcho ~2-5s

Honcho is empty by default. It only knows what we tell it. Querying an empty workspace returns nothing useful — by design.

When to WRITE to Honcho

The bar for writing is high. Most "save this" instincts should go to MEMORY.md instead. Honcho is for facts that:

  • Are too verbose for the prompt budget (multi-paragraph context, longer narrative facts)
  • Require semantic search to retrieve (you'll query by intent, not by key)
  • Survive across sessions but don't need to be in every turn's context

Examples that belong in Honcho: - "The 2026-06-10 architecture review decision: we picked Cloud Run over Lambda because of cold-start latency in the auth flow. Here's the full reasoning..." - "Detailed customer profile facts: their team structure, their security posture, their tooling preferences, the specific objections they've raised over four calls" - "Long-form vendor-specific lessons: every gotcha we hit during the Stripe integration over six months"

Examples that belong in MEMORY.md instead: - "User prefers concise responses" - "Project uses pytest with xdist" - "Server-local time is GMT-7"

If you can express the fact in one sentence and it'll be useful every session, MEMORY.md. If it's multi-paragraph and you'll search for it semantically, Honcho.

When to READ from Honcho

Read when: - A topic comes up that you suspect has substantial prior context - The user references a past decision you don't have in MEMORY.md - You're starting work on a domain where Honcho might have stored context (e.g. "let's pick up the Stripe migration" → query for Stripe-related facts)

Don't read on every turn — Honcho roundtrips cost real latency. Use it like a card catalog, not like a system prompt.

API shape (MCP-exposed tools)

The Honcho MCP server exposes (depending on configuration):

Tool Purpose
mcp_honcho_add_messages_to_session Append messages to a session (the substrate Honcho builds the representation from)
mcp_honcho_query_conclusions Semantic search over derived conclusions (facts/observations about a peer)
mcp_honcho_chat Natural-language Q&A against the representation ("what does Honcho know about ?")
mcp_honcho_get_peer_card Compact biographical fact list
mcp_honcho_get_peer_context Combines representation + peer card
mcp_honcho_create_conclusions Manually inject conclusions (use for facts not derived from messages)

Workspace + peer naming

Each Hermes profile maps to a Honcho workspace. Multiple agents/sessions inside the same profile share a workspace. The peer identifies a single real human — typically the user's email or a stable identifier.

Confusing the boundary leads to "I queried Honcho and got nothing useful" — usually because you queried the wrong workspace or wrong peer.

Cost discipline

Honcho is not free. Each chat call invokes a model inside Honcho's substrate, so:

  • Default reasoning level: low or medium for most queries
  • Use high or max only when the query is genuinely complex and the answer matters
  • query_conclusions is cheaper than chat — prefer it when you want raw matched facts, not a synthesized answer

When NOT to use Honcho

  • You're trying to remember WHAT happened in a past session — use session_search against the local SQLite FTS5 store. Honcho's job is "what does Hermes know about the user/topic," not "what did we discuss on Tuesday."
  • The fact is a one-liner — use MEMORY.md.
  • You're building procedural knowledge ("how to do X") — use a skill, not Honcho.
  • You need the fact in every turn's prompt — use MEMORY.md.
  • scheduled-jobs — cron jobs that need cross-session memory often want Honcho rather than ad-hoc state files