
Search catalog

AMX maintains an internal search catalog that combines three things — the database catalog (tables, columns, descriptions), the document RAG index (PDFs, Word, Markdown), and the code reference index (extracted snippets) — into a single embedding store. That's what /ask queries to answer your questions, what /run consults for context when drafting descriptions, and what /search exposes directly. This page walks through how the catalog is built, when to /sync vs /rebuild, and how to recover from corruption or model-mismatch errors.

Prerequisites

  • AMX installed.
  • An active DB profile (introspected at least once — /sync will do it).
  • Optionally, an active doc and/or code profile.
  • An active LLM profile (used for the embedding step).

How the catalog is built

┌──────────────────────────────────┐
│   /sync                          │
├──────────────────────────────────┤
│   1. Refresh introspection cache │   ← from DB profile
│   2. Embed new / changed entries │   ← LLM profile (embedding model)
│   3. Update Chroma index         │
│   4. Reconcile with audit trail  │   ← from /history
└──────────────────────────────────┘
   ~/.amx/chroma/  ← single Chroma directory; one collection per source inside
        ↑ queried by /ask, /search, /run (RAG context)

Three collections live inside ~/.amx/chroma/:

Collection   Source                                  Refreshed by
db_catalog   DB tables, columns, existing comments   /sync
documents    RAG doc profile                         /ingest
code_refs    Code agent profile                      /code-scan

/ask searches across all three and merges results.
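
If you want to picture the layout: one persistent Chroma directory with three named collections inside. The sketch below is illustrative only: the collection names come from the table above, but the entry shapes and metadata keys are assumptions, not AMX internals.

from pathlib import Path
import chromadb

# One persistent Chroma directory; each source gets its own collection.
client = chromadb.PersistentClient(path=str(Path.home() / ".amx" / "chroma"))

db_catalog = client.get_or_create_collection("db_catalog")   # refreshed by /sync
documents  = client.get_or_create_collection("documents")    # refreshed by /ingest
code_refs  = client.get_or_create_collection("code_refs")    # refreshed by /code-scan

# A catalog entry is just an id, the text that was embedded, and its vector.
db_catalog.add(
    ids=["sales.customer_address"],
    documents=["Table sales.customer_address: one row per customer mailing address."],
    embeddings=[[0.0] * 1536],   # placeholder; real vectors come from the embedding model
    metadatas=[{"kind": "table"}],
)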

Step-by-step

1. Initial sync

> /search status
Search catalog: empty (never synced)

> /sync
[1/4] Refreshing introspection cache .........  ok (47 tables, 1,283 columns)
[2/4] Embedding 1,330 entries ................  ok (8.4 s, $0.012)
[3/4] Updating Chroma index ..................  ok
[4/4] Reconciling with description audit .....  ok (0 prior /apply runs)
✓ /sync finished. Catalog ready for /ask.

The first sync against a fresh DB takes a moment — one embedding per table and per column. With text-embedding-3-small at OpenAI's prices, a 1,300-entry warehouse embeds for roughly $0.01.
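
Under the hood this is a batched call to the embedding API. A minimal sketch, assuming the OpenAI Python client and the model named on this page; the batch size and the shape of entries are illustrative, not AMX internals.

from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

def embed_entries(entries: list[str], batch_size: int = 256) -> list[list[float]]:
    """Return one embedding vector per table/column entry, requested in batches."""
    vectors: list[list[float]] = []
    for i in range(0, len(entries), batch_size):
        resp = client.embeddings.create(
            model="text-embedding-3-small",
            input=entries[i:i + batch_size],
        )
        vectors.extend(item.embedding for item in resp.data)
    return vectors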

2. Incremental sync

After the first sync, re-running is cheap — only entries with changed names or descriptions are re-embedded:

> /sync
[1/4] Refreshing introspection cache .........  ok (47 tables, 1,283 columns)
[2/4] Embedding 12 changed entries ...........  ok (0.8 s, $0.0001)
[3/4] Updating Chroma index ..................  ok
[4/4] Reconciling with description audit .....  ok (3 new /apply runs since last sync)
✓ /sync finished in 1.4 s.

Run /sync after every /apply so descriptions you just wrote are immediately searchable.
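
One plausible way the "only changed entries" check works is fingerprinting: hash each entry's name plus description and re-embed only when the hash differs from the previous sync. This is an assumption about the mechanism, not documented AMX behaviour.

import hashlib

def fingerprint(name: str, description: str) -> str:
    """Stable hash of an entry's name + description."""
    return hashlib.sha256(f"{name}\n{description}".encode("utf-8")).hexdigest()

def entries_to_reembed(current: dict[str, str], last_sync: dict[str, str]) -> list[str]:
    """Names whose fingerprint changed since the previous sync."""
    return [name for name, desc in current.items()
            if last_sync.get(name) != fingerprint(name, desc)]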

3. Search the catalog

> /search "customer addresses"
Top-8 results across all collections:
  0.142  [db_catalog]  sales.customer_address (table)
  0.198  [documents]   data-glossary/address.md p.1
  0.214  [code_refs]   models/marts/customer.sql:42
  ...

The [db_catalog] / [documents] / [code_refs] tag tells you which collection a hit came from. Useful when retrieval looks off — you can immediately tell whether the LLM was relying on the DB or the docs.
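
A rough picture of what that merged, tagged result list corresponds to: query each collection with the same embedded query and sort hits by distance. The query vector below is a placeholder; in practice it comes from the configured embedding model.

from pathlib import Path
import chromadb

client = chromadb.PersistentClient(path=str(Path.home() / ".amx" / "chroma"))
query_vec = [0.0] * 1536   # placeholder; embed the query text with the configured model

hits = []
for name in ("db_catalog", "documents", "code_refs"):
    res = client.get_collection(name).query(query_embeddings=[query_vec], n_results=8)
    for doc_id, dist in zip(res["ids"][0], res["distances"][0]):
        hits.append((dist, name, doc_id))

for dist, collection, doc_id in sorted(hits)[:8]:   # lower distance = closer match
    print(f"{dist:.3f}  [{collection}]  {doc_id}")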

4. Status check

> /search status
Search catalog
  db_catalog: 1,330 entries (last /sync 4 min ago)
  documents:    412 chunks  (last /ingest 1 hour ago)
  code_refs:    412 chunks  (last /code-scan 23 min ago)
Embedding model: openai/text-embedding-3-small
Index store: ~/.amx/chroma  (size: 18.4 MB)

If any of the three lines say (never indexed), run the corresponding command: /sync for db_catalog, /ingest for documents, /code-scan for code_refs.
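
The numbers behind that status output are essentially per-collection counts plus the directory size on disk; a minimal sketch (the timestamp and model bookkeeping are AMX's own and not shown):

from pathlib import Path
import chromadb

store = Path.home() / ".amx" / "chroma"
client = chromadb.PersistentClient(path=str(store))

for name in ("db_catalog", "documents", "code_refs"):
    print(f"{name}: {client.get_or_create_collection(name).count()} entries")

size_mb = sum(f.stat().st_size for f in store.rglob("*") if f.is_file()) / 1e6
print(f"Index store: {store} (size: {size_mb:.1f} MB)")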

5. Full rebuild

> /search rebuild
About to wipe ~/.amx/chroma and re-embed:
  db_catalog: 1,330 entries
  documents:  412 chunks
  code_refs:  412 chunks
  Estimated cost: $0.025

Proceed? [y/N]: y

[1/4] Wiping index ...................  ok
[2/4] Re-syncing db_catalog ..........  ok (8.1 s)
[3/4] Re-ingesting documents .........  ok (5.4 s)
[4/4] Re-scanning code_refs ..........  ok (6.0 s)
✓ /search rebuild finished in 19.5 s.

/search rebuild is the right move when:

  • You changed the embedding model in ~/.amx/config.yml (cosine distances become meaningless across models).
  • The Chroma directory got corrupted (rare but happens with abrupt power loss).
  • You moved AMX to a different config dir and want a fresh index.
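
Conceptually, a rebuild is just "wipe the directory, then redo every refresh step from scratch". A sketch of that shape, with the refresh steps passed in as placeholders for whatever /sync, /ingest and /code-scan do internally:

import shutil
from pathlib import Path
from typing import Callable, Iterable
import chromadb

STORE = Path.home() / ".amx" / "chroma"

def rebuild_index(refresh_steps: Iterable[Callable[..., None]]) -> None:
    """Wipe ~/.amx/chroma, then rerun each collection's refresh step from scratch."""
    shutil.rmtree(STORE, ignore_errors=True)             # old vectors are useless after a model swap
    client = chromadb.PersistentClient(path=str(STORE))  # recreates the directory
    for step in refresh_steps:                           # the /sync, /ingest, /code-scan equivalents
        step(client)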

When to /sync vs /rebuild

You did this                                  Then run
/apply (wrote descriptions back to the DB)    /sync
Added a new column / table on the DB          /sync
Edited ~/.amx/config.yml to add a doc path    /ingest
Edited the codebase                           /code-scan
Changed embedding_model: in YAML              /search rebuild
~/.amx/chroma/ got deleted / corrupted        /search rebuild
Switched to a different DB profile            /sync (catalog is per-profile)

Sample config

search:
  embedding_model: openai/text-embedding-3-small
  top_k: 8
  index_store: ~/.amx/chroma

For larger / more nuanced catalogs, switch to openai/text-embedding-3-large (about 3× the cost, noticeably better retrieval on technical jargon) and run /search rebuild.
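
If you make that switch, the only line that changes is the model; a sketch assuming the same keys as the sample above:

search:
  embedding_model: openai/text-embedding-3-large   # ~3x the cost, better on technical jargon
  top_k: 8
  index_store: ~/.amx/chroma

Then run /search rebuild so every collection is re-embedded with the new model.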

Verify

  1. > /search status — entry counts per collection, last-update timestamp, embedding model identity, on-disk size.
  2. > /search "<a phrase you indexed>" — confirms retrieval works at all.
  3. > /ask "<a question with a known answer>" — end-to-end test of retrieval + LLM answer.

Troubleshooting

  • Symptom: /search status shows the catalog, but /ask says "I don't know" for everything.
    Cause: embedding-model mismatch (you changed the model in YAML but never rebuilt).
    Fix: > /search rebuild

  • Symptom: Chroma error on startup: chromadb.errors.InvalidCollectionException.
    Cause: index directory corrupted.
    Fix: rm -rf ~/.amx/chroma, then > /search rebuild

  • Symptom: /sync fails with OpenAI quota exceeded.
    Cause: the day's free-tier embedding quota is exhausted.
    Fix: wait, raise the quota tier, or switch to text-embedding-3-small if you're on -3-large (it's cheaper).

  • Symptom: index directory size keeps growing forever.
    Cause: /search rebuild not run across several model swaps; orphan collections accumulate.
    Fix: rm -rf ~/.amx/chroma, then > /search rebuild once.

  • Symptom: searches return only db_catalog results, never docs.
    Cause: doc profile not active.
    Fix: > /use-doc <name>, then re-run /ask.

  • Symptom: /sync takes 10+ minutes on a small warehouse.
    Cause: one network round-trip to the embedding API per call.
    Fix: use a batched embedding model (the OpenAI client batches automatically); confirm OPENAI_API_BASE isn't pointing at a slow proxy.
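
The model-mismatch case (first symptom above) is worth a note: one plausible way to catch it yourself is to stamp the embedding model name into each collection's metadata and compare it with the configured model on startup. This is an illustrative check, not AMX's documented mechanism.

from pathlib import Path
import chromadb

configured_model = "openai/text-embedding-3-small"   # from ~/.amx/config.yml
client = chromadb.PersistentClient(path=str(Path.home() / ".amx" / "chroma"))

# Stamp the model name at creation time; get_or_create keeps existing metadata.
col = client.get_or_create_collection("db_catalog",
                                       metadata={"embedding_model": configured_model})

indexed_model = (col.metadata or {}).get("embedding_model")
if indexed_model != configured_model:
    print(f"Index built with {indexed_model}, config wants {configured_model}: run /search rebuild")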

What's next

  • Documents — populate the documents collection.
  • Codebase — populate the code_refs collection.
  • Ask & Search — query the catalog conversationally.