# Search catalog
AMX maintains an internal search catalog that combines three sources into a single embedding store: the database catalog (tables, columns, descriptions), the document RAG index (PDFs, Word, Markdown), and the code reference index (extracted snippets). /ask queries it to answer your questions, /run consults it for context when drafting descriptions, and /search exposes it directly. This page walks through how the catalog is built, when to run /sync vs /search rebuild, and how to recover from corruption or model-mismatch errors.
## Prerequisites
- AMX installed.
- An active DB profile, introspected at least once (/sync will do it).
- Optionally, an active doc and / or code profile.
- An active LLM profile (used for the embedding step).
## How the catalog is built
```
┌──────────────────────────────────┐
│ /sync                            │
├──────────────────────────────────┤
│ 1. Refresh introspection cache   │ ← from DB profile
│ 2. Embed new / changed entries   │ ← LLM profile (embedding model)
│ 3. Update Chroma index           │
│ 4. Reconcile with audit trail    │ ← from /history
└──────────────────────────────────┘
                 │
                 ▼
  ~/.amx/chroma/   ← single Chroma directory; per-collection inside
                 │
                 ▼
  /ask, /search, /run (RAG context)
```
Three collections live inside ~/.amx/chroma/:

| Collection | Source | Refreshed by |
|---|---|---|
| db_catalog | DB tables, columns, existing comments | /sync |
| documents | RAG doc profile | /ingest |
| code_refs | Code agent profile | /code-scan |

/ask searches across all three and merges results.
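The page doesn't spell out how the three result lists are combined. A minimal sketch, assuming each collection returns hits scored by distance (lower is closer) and that merging is a plain global sort on that score, which matches the ordering /search prints. Hit and merge_top_k are hypothetical names, not AMX internals.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    distance: float  # assumed cosine distance: lower means more similar
    collection: str  # "db_catalog", "documents" or "code_refs"
    label: str       # the text /search prints after the tag


def merge_top_k(per_collection: Dict[str, List[Hit]], k: int = 8) -> List[Hit]:
    """Merge per-collection hit lists into one global top-k, closest first."""
    all_hits = (hit for hits in per_collection.values() for hit in hits)
    # heapq.nsmallest returns at most k items, sorted ascending by the key.
    return heapq.nsmallest(k, all_hits, key=lambda h: h.distance)


if __name__ == "__main__":
    merged = merge_top_k({
        "code_refs": [Hit(0.214, "code_refs", "models/marts/customer.sql:42")],
        "db_catalog": [Hit(0.142, "db_catalog", "sales.customer_address (table)")],
        "documents": [Hit(0.198, "documents", "data-glossary/address.md p.1")],
    })
    for hit in merged:
        print(f"{hit.distance:.3f} [{hit.collection}] {hit.label}")
```

A single global sort only makes sense because all three collections are embedded with the same model; distances from different models are not comparable.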
## Step-by-step
### 1. Initial sync
```
> /search status
Search catalog: empty (never synced)
> /sync
[1/4] Refreshing introspection cache ......... ok (47 tables, 1,283 columns)
[2/4] Embedding 1,330 entries ................ ok (8.4 s, $0.012)
[3/4] Updating Chroma index .................. ok
[4/4] Reconciling with description audit ..... ok (0 prior /apply runs)
✓ /sync finished. Catalog ready for /ask.
```
The first sync against a fresh DB takes a moment, because every table and every column gets its own embedding (47 + 1,283 = 1,330 entries here). With text-embedding-3-small at OpenAI's list price, a 1,300-entry warehouse embeds for about $0.01.
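Embedding cost scales linearly: entries × average tokens per entry × price per token. A back-of-the-envelope sketch; the prices below are OpenAI's published per-million-token list prices at the time of writing (check before relying on them), and the 450-token average is a hypothetical figure chosen to show the arithmetic, not something AMX reports.

```python
# Assumed list prices in USD per 1M tokens; verify against OpenAI's pricing page.
PRICE_PER_MTOK = {
    "openai/text-embedding-3-small": 0.02,
    "openai/text-embedding-3-large": 0.13,
}


def estimate_embedding_cost(n_entries: int, avg_tokens: float, model: str) -> float:
    """Rough USD cost of embedding n_entries texts of avg_tokens tokens each."""
    total_tokens = n_entries * avg_tokens
    return total_tokens * PRICE_PER_MTOK[model] / 1_000_000


if __name__ == "__main__":
    # 450 tokens/entry is a hypothetical average, not an AMX figure.
    cost = estimate_embedding_cost(1_330, 450, "openai/text-embedding-3-small")
    print(f"${cost:.4f}")  # $0.0120
```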
### 2. Incremental sync
After the first sync, re-running is cheap: only entries with changed names or descriptions are re-embedded.
```
> /sync
[1/4] Refreshing introspection cache ......... ok (47 tables, 1,283 columns)
[2/4] Embedding 12 changed entries ........... ok (0.8 s, $0.0001)
[3/4] Updating Chroma index .................. ok
[4/4] Reconciling with description audit ..... ok (3 new /apply runs since last sync)
✓ /sync finished in 1.4 s.
```
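The page doesn't say how /sync detects which entries changed. One common approach, sketched here as an assumption rather than AMX's actual mechanism, is to store a fingerprint of each entry's name and description and re-embed only entries whose fingerprint differs from the one saved at the last sync. The entry keys and the fingerprint / plan_incremental_sync helpers are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def fingerprint(name: str, description: Optional[str]) -> str:
    """Stable SHA-256 of the fields that feed the embedding."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    payload = f"{name}\x00{description or ''}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def plan_incremental_sync(
    current: Dict[str, Tuple[str, Optional[str]]],
    stored: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (keys to embed, keys to drop from the index).

    current: entry key -> (name, description) from the fresh introspection.
    stored:  entry key -> fingerprint saved at the previous sync.
    """
    to_embed = [key for key, (name, desc) in current.items()
                if stored.get(key) != fingerprint(name, desc)]
    to_drop = [key for key in stored if key not in current]
    return to_embed, to_drop


if __name__ == "__main__":
    stored = {
        "sales.orders.id": fingerprint("id", None),
        "sales.orders.total": fingerprint("total", "old text"),
        "sales.orders.legacy_flag": fingerprint("legacy_flag", None),
    }
    current = {
        "sales.orders.id": ("id", None),                        # unchanged
        "sales.orders.total": ("total", "Order total in USD"),  # edited
        "sales.orders.coupon": ("coupon", None),                # new column
    }
    print(plan_incremental_sync(current, stored))
    # (['sales.orders.total', 'sales.orders.coupon'], ['sales.orders.legacy_flag'])
```

Tracking removed keys as well keeps dropped columns from lingering in search results.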
Run /sync after every /apply so descriptions you just wrote are immediately
searchable.
### 3. Search the catalog
```
> /search "customer addresses"
Top-8 results across all collections:
0.142 [db_catalog] sales.customer_address (table)
0.198 [documents] data-glossary/address.md p.1
0.214 [code_refs] models/marts/customer.sql:42
...
```
The [db_catalog] / [documents] / [code_refs] tag tells you which collection a hit came from, and results are listed closest first by the leading distance score. The tag is useful when retrieval looks off: you can tell at a glance whether the LLM was relying on the DB or the docs.
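Assuming the leading score is a cosine distance (this page refers to cosine distances when discussing rebuilds), here is what it measures, as a minimal pure-Python sketch rather than the vector store's own implementation:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 = same direction, 1.0 = orthogonal, 2.0 = opposite."""
    if len(a) != len(b):
        # Vectors from different embedding models often differ in length too.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0
    print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0
```

Because only direction matters, a score like 0.142 means the query and the hit point in nearly the same direction in embedding space.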
### 4. Status check
```
> /search status
Search catalog
db_catalog: 1,330 entries (last /sync 4 min ago)
documents: 412 chunks (last /ingest 1 hour ago)
code_refs: 412 chunks (last /code-scan 23 min ago)
Embedding model: openai/text-embedding-3-small
Index store: ~/.amx/chroma (size: 18.4 MB)
```
If any of the three collection lines says (never indexed), run the corresponding command: /sync for db_catalog, /ingest for documents, /code-scan for code_refs.
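If you script checks around AMX, this rule is a fixed lookup. A hypothetical helper, assuming you have already turned the /search status output into a dict of entry counts, with None (or a missing key) for a collection that was never indexed; AMX is not known to expose such a dict, and the names here are illustrative.

```python
from typing import List, Mapping, Optional

# Collection -> command that populates it (mirrors this page's collection table).
REFRESH_COMMAND = {
    "db_catalog": "/sync",
    "documents": "/ingest",
    "code_refs": "/code-scan",
}


def commands_for_unindexed(entry_counts: Mapping[str, Optional[int]]) -> List[str]:
    """Commands to run for collections whose count is None or missing."""
    return [command for name, command in REFRESH_COMMAND.items()
            if entry_counts.get(name) is None]


if __name__ == "__main__":
    # Hypothetical status values, as if transcribed from /search status.
    print(commands_for_unindexed({"db_catalog": 1330, "documents": None}))
    # ['/ingest', '/code-scan']
```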
### 5. Full rebuild
```
> /search rebuild
About to wipe ~/.amx/chroma and re-embed:
db_catalog: 1,330 entries
documents: 412 chunks
code_refs: 412 chunks
Estimated cost: $0.025
Proceed? [y/N]: y
[1/4] Wiping index ................... ok
[2/4] Re-syncing db_catalog .......... ok (8.1 s)
[3/4] Re-ingesting documents ......... ok (5.4 s)
[4/4] Re-scanning code_refs .......... ok (6.0 s)
✓ /search rebuild finished in 19.5 s.
```
/search rebuild is the right move when:
- You changed the embedding model in ~/.amx/config.yml (cosine distances are meaningless across models).
- The Chroma directory got corrupted (rare, but it can happen after an abrupt power loss).
- You moved AMX to a different config dir and want a fresh index.
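The model-change case is the one that fails silently: vectors from different models live in unrelated spaces and often differ in length too (OpenAI's text-embedding-3-small returns 1,536 dimensions by default, text-embedding-3-large 3,072). A minimal guard, assuming the model name and vector dimension are recorded when the index is built; IndexMeta and check_index are hypothetical names, not AMX APIs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical record written when the index is (re)built."""
    embedding_model: str  # e.g. "openai/text-embedding-3-small"
    dimension: int        # length of every stored vector


def check_index(meta: IndexMeta, configured_model: str,
                query_vector: Sequence[float]) -> None:
    """Raise if queries embedded with configured_model can't use this index."""
    if meta.embedding_model != configured_model:
        raise RuntimeError(
            f"index built with {meta.embedding_model} but config says "
            f"{configured_model}; run /search rebuild")
    if len(query_vector) != meta.dimension:
        raise RuntimeError(
            f"query vector has {len(query_vector)} dims, index has "
            f"{meta.dimension}; run /search rebuild")


if __name__ == "__main__":
    meta = IndexMeta("openai/text-embedding-3-small", 1536)
    check_index(meta, "openai/text-embedding-3-small", [0.0] * 1536)  # passes
    try:
        check_index(meta, "openai/text-embedding-3-large", [0.0] * 3072)
    except RuntimeError as err:
        print(err)
```

Checking the model name, not just the dimension, also catches swaps between two models that happen to share a vector length.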
## When to /sync vs /search rebuild

| You did this | Then run |
|---|---|
| /apply (wrote descriptions back to the DB) | /sync |
| Added a new column / table on the DB | /sync |
| Edited ~/.amx/config.yml to add a doc path | /ingest |
| Edited the codebase | /code-scan |
| Changed embedding_model: in YAML | /search rebuild |
| ~/.amx/chroma/ got deleted / corrupted | /search rebuild |
| Switched to a different DB profile | /sync (catalog is per-profile) |

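## Sample config
For larger or more nuanced catalogs, set embedding_model: in ~/.amx/config.yml to openai/text-embedding-3-large and run /search rebuild. It costs several times more per token (about 6.5× at OpenAI's list prices of $0.13 vs $0.02 per million tokens) but gives noticeably better retrieval on technical jargon.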
## Verify
- `/search status` shows entry counts per collection, last-update timestamps, the embedding model identity, and on-disk size.
- `/search "<a phrase you indexed>"` confirms retrieval works at all.
- `/ask "<a question with a known answer>"` is an end-to-end test of retrieval plus the LLM answer.
## Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| /search status shows the catalog but /ask says "I don't know" for everything | Embedding-model mismatch (you changed the model in YAML but didn't rebuild) | > /search rebuild |
| Chroma error on startup: chromadb.errors.InvalidCollectionException | Index dir corrupted | rm -rf ~/.amx/chroma, then > /search rebuild |
| /sync fails with OpenAI quota exceeded | Free-tier embedding quota for the day exhausted | Wait, raise the quota tier, or switch to text-embedding-3-small (cheaper) |
| Index dir size keeps growing | /search rebuild not run across several model swaps, so orphan collections accumulate | rm -rf ~/.amx/chroma, then > /search rebuild once |
| Searches return only db_catalog results, never docs | Doc profile not active | `> /use-doc <name>`, then re-run the query |
| /sync takes 10+ minutes on a small warehouse | One network round trip to the embedding API per entry | Send embedding requests in batches rather than one per entry (the OpenAI client batches automatically); confirm OPENAI_API_BASE isn't pointing at a slow proxy |

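Per-entry requests are what make a small warehouse slow: 1,330 entries sent one at a time means 1,330 round trips, while batches of 100 need 14. A minimal batching sketch; the batch size of 100 is an arbitrary illustrative choice, and embed_batch is a hypothetical stand-in for whatever client call you use (the OpenAI embeddings endpoint accepts a list of input strings per request).

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts: Sequence[str],
              embed_batch: Callable[[Sequence[str]], List[List[float]]],
              size: int = 100) -> List[List[float]]:
    """Embed texts with one call per batch instead of one call per text."""
    vectors: List[List[float]] = []
    for chunk in batched(texts, size):
        vectors.extend(embed_batch(chunk))
    return vectors


if __name__ == "__main__":
    calls = []

    def fake_embed(chunk):  # stand-in for a real embedding API call
        calls.append(len(chunk))
        return [[float(len(t))] for t in chunk]

    vectors = embed_all([f"col_{i}" for i in range(1_330)], fake_embed, size=100)
    print(len(vectors), len(calls))  # 1330 14
```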
## What's next
- Documents — populate the documents collection.
- Codebase — populate the code_refs collection.
- Ask & Search — query the catalog conversationally.