Import knowledge from Google Docs
The knowledge base stores articles in kb_documents with both PostgreSQL FTS and HNSW-indexed pgvector embeddings. The way you chunk and label content at import time decides how well agents find it later.
TL;DR
- Pick or create a space:
kb_list_spaces→kb_create_spaceif needed. - For each source doc, decide on chunking — whole doc as one article (preferred for short docs) or split by H2 (preferred for >2000 words).
- Call
kb_create_documentwithtitle,body(markdown),tags. - Embeddings are generated server-side asynchronously — search becomes accurate within seconds.
Step 1 — pick a space
kb_list_spaces returns existing spaces. A space is the org-scoped grouping for related docs (e.g. Product, Engineering, Public help center). One space per audience boundary works well; agents searching with kb_search can filter to a space.
If none fits: kb_create_space with { name, slug, description? }.
Step 2 — chunking strategy
The vector index works on whole-document embeddings. Chunking too aggressively (one paragraph per article) makes search noisy; not chunking at all (50KB blob) makes retrieval imprecise.
Heuristics:
- Short doc (<1500 words, single topic): one article. Title = source title.
- Medium doc (1500–4000 words, sections): one article. Trust FTS to find the right span.
- Long doc (>4000 words, distinct sections): split by H2 (
## …). Each section becomes its own article. Title = source title + " — " + section heading. - Reference table / FAQ: one article per Q&A pair. Title = the question.
When in doubt, fewer larger articles beat many tiny ones. A retrieval that returns "the right doc" lets the consuming agent read the full context.
Step 3 — create
{
"name": "kb_create_document",
"arguments": {
"spaceId": "<spaceId>",
"title": "Refund policy — international orders",
"body": "# Refund policy — international orders\n\nWe issue full refunds within 30 days of delivery for...",
"tags": ["policy", "refunds", "international"],
"public": false
}
}
bodyis markdown. Headings, lists, code blocks all preserved for the rendered view.tagsare a flat string array. Don't overuse — 3–5 high-signal tags > 20 weak ones.audiencesis an array of'admin'and / or'self_service'. Default['admin']keeps the doc admin-only; include'self_service'to surface it viakb_searchfor end-user agents.
Step 4 — embeddings
The server enqueues embedding generation immediately. There's no callback; just expect search to return the new doc within a few seconds. If you need to confirm: kb_search with a phrase you'd expect to match.
Bulk import pattern
For a Google Docs folder with N files:
- Export each as markdown (Google Docs → File → Download → Markdown).
- Run a script that reads each
.md, applies the chunking heuristic above, and callskb_create_documentper article. Munin's per-org rate limit applies; pace at ~10/sec or usekb_searchbetween batches to verify.
Updating existing docs
Use kb_update_document with { id, body } (or title, tags). Versions are kept automatically — kb_list_versions shows history; kb_restore_version reverts.
Common pitfalls
- Indexing only the first paragraph (because the model summarizes long docs at ingest time). Munin embeds the whole
bodyfield. Send the full text. - Putting structured data in
metadatainstead ofbody. FTS and embeddings only seebody(andtitle). Anything you want findable goes there. - Forgetting
tags. Tags don't drive search relevance directly but they're the cheapest way for an agent to filter (kb_searchwithtags: ['policy']).