LLM providers
AMX talks to LLMs through a single unified interface, so you can swap providers per profile without touching application code or prompts. This page summarises which providers are supported, the trade-offs between them, and where to start when picking the right one for your workload.
Pick a provider
Use this short decision tree before reaching for any specific page:
- First-time AMX user, prototyping → OpenAI with `gpt-4o`. The most battle-tested provider; every prompt template and confidence threshold is calibrated against it first.
- Cost-sensitive whole-warehouse drafting → Batch mode with `gpt-4o-mini` or `claude-haiku-3-5`. ~50% off the live-API rate, async SLA.
- Cryptic legacy schemas (transliterated names, abbreviations) → Anthropic with `claude-sonnet-4`, or extended thinking on a hard subset.
- Big context windows for very wide tables → Gemini with `gemini-2.0-flash` and `column_batch_size: 15`.
- Already on Databricks → Databricks Serving with a Foundation Model endpoint (e.g. `databricks-meta-llama-3-1-70b-instruct`). Same workspace as your data, billed against your existing Databricks contract, no extra vendor.
- On-prem / air-gapped → Ollama and local. Llama-3 / Qwen / DeepSeek work; logprob calibration is per-model.
Provider matrix
AMX ships with 9 provider keys: 6 hosted (OpenAI, Anthropic, Gemini, Databricks Serving, OpenRouter, DeepSeek) and 3 keyless / self-hosted (Ollama, local-via-LiteLLM, Kimi).
| Provider | Default model | Cost lens | Logprobs | Batch API | Key file |
|---|---|---|---|---|---|
| OpenAI | `gpt-4o` | Mid (cheap with mini) | ✓ native | ✓ (batch) | sk-… |
| Anthropic | `claude-sonnet-4-20250514` | Mid–High | ✓ derived | ✓ (batch) | sk-ant-… |
| Gemini | `gemini-2.0-flash` | Low | ✓ native | ✗ in AMX yet | AIza… |
| Databricks Serving | `databricks-meta-llama-3-1-70b-instruct` | Bills against your Databricks workspace | varies (per-endpoint) | ✗ | Databricks PAT |
| OpenRouter | provider/model id | Varies (small markup over upstream) | varies (per-route) | ✗ | sk-or-… |
| DeepSeek | `deepseek-chat` | Very low | ✓ native | ✗ | sk-… |
| Ollama / local | `llama3` | Free (compute is yours) | varies | ✗ | optional |
| Kimi | `moonshot-v1-8k` / `kimi-k2-thinking` | Low–Mid (reasoning-heavy) | varies | ✗ | API key |
Kimi and local are routed through OpenAI-compatible HTTPS endpoints
and reuse the OpenAI client under the hood. OpenRouter is a routing
layer — every key it supports translates to one of the providers above.
Generation defaults that apply across all providers
The wizard sets these once per profile (you can edit later in ~/.amx/config.yml):
- `n_alternatives: 3` sets how many candidate descriptions AMX generates per column. In the review wizard, press 1, 2, or 3 to pick one.
- `column_batch_size: 10` sets how many columns AMX packs into one prompt. Larger batches cost less per column; smaller batches give higher quality on wide tables.
- `temperature: 0.2` keeps output close to deterministic by default. Use 0.4–0.7 for more variety in the alternatives.
- `logprob_high: 0.85` / `logprob_medium: 0.50` are the confidence thresholds for the `high` / `medium` / `low` buckets. See `/logprob-thresholds`.
Per-provider tuning notes live on each provider's page.
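As a rough illustration of how the two thresholds split columns into buckets, the sketch below maps a confidence score in [0, 1] to `high`, `medium`, or `low`. The function name `confidence_bucket` is hypothetical, and treating each threshold as an inclusive lower bound is an assumption; only the default values 0.85 and 0.50 come from the profile settings above.

```python
LOGPROB_HIGH = 0.85    # default `logprob_high` from the profile
LOGPROB_MEDIUM = 0.50  # default `logprob_medium` from the profile


def confidence_bucket(confidence: float,
                      high: float = LOGPROB_HIGH,
                      medium: float = LOGPROB_MEDIUM) -> str:
    """Map a confidence score in [0, 1] to 'high', 'medium', or 'low'.

    Assumption: both thresholds are inclusive lower bounds.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= high:
        return "high"
    if confidence >= medium:
        return "medium"
    return "low"
```

Raising `high` moves borderline columns from the high bucket into medium, which may be worth doing for models whose logprobs are derived rather than native.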
Reasoning models: output budgeting
Reasoning routes (OpenAI o-series and gpt-5 reasoning variants, Anthropic
extended thinking on Claude Sonnet / Opus 4, DeepSeek-reasoner, and OpenRouter
thinking variants like Kimi K2.x, Qwen3-thinking, and GLM-4.6-thinking) can
spend the entire max_tokens budget on internal reasoning, leaving the
visible answer empty with finish_reason=length. To keep the answer slot
viable, AMX:
- Floors the output budget for reasoning routes at 32 768 tokens (`_DEFAULT_REASONING_FLOOR`). Override with the `AMX_LLM_MIN_MAX_TOKENS` env var if you've tuned a specific model.
- Auto-retries once at `max_tokens × 4` (capped at 131 072 tokens, `_REASONING_AUTO_RETRY_CAP`) when a reasoning call returns 0 visible characters and `finish_reason=length`. The retry only fires for reasoning routes; standard chat models keep the user's `max_tokens` as-is.
- Passes through `reasoning_effort` to OpenAI / OpenRouter so a higher effort can be selected per LLM profile (`/llm` wizard).
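The floor and retry rules above can be sketched roughly as follows. This is a minimal illustration, not AMX's implementation: the function names are hypothetical, and two details are assumptions, namely that `AMX_LLM_MIN_MAX_TOKENS` replaces the default floor when set, and that the retry multiplies the already-floored budget. The two constants and the retry condition come from the list above.

```python
import os

_DEFAULT_REASONING_FLOOR = 32_768    # floor for reasoning routes
_REASONING_AUTO_RETRY_CAP = 131_072  # upper bound for the single auto-retry


def effective_max_tokens(requested: int, is_reasoning: bool) -> int:
    """Return the output budget for the first attempt."""
    if not is_reasoning:
        return requested  # standard chat models keep the user's value
    # Assumption: the env var, when set, replaces the default floor.
    floor = int(os.environ.get("AMX_LLM_MIN_MAX_TOKENS",
                               _DEFAULT_REASONING_FLOOR))
    return max(requested, floor)


def should_retry(is_reasoning: bool, visible_text: str,
                 finish_reason: str) -> bool:
    """Retry only when a reasoning call hit the limit with no visible answer."""
    return is_reasoning and len(visible_text) == 0 and finish_reason == "length"


def retry_max_tokens(first_budget: int) -> int:
    """Budget for the one retry: 4x the first attempt, capped."""
    return min(first_budget * 4, _REASONING_AUTO_RETRY_CAP)
```

Under these assumptions, a reasoning call that starts at the default floor retries at exactly the cap, since 32 768 × 4 = 131 072.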
The visible reasoning text — Anthropic thinking blocks, DeepSeek's
reasoning_content — is exposed as the Reasoning trace card on the
Studio Run detail Summary tab when the provider returns one, so the floor
buys you a useful answer and an inspectable thought trail.
Costing rule of thumb
For a typical 47-table / 1,283-column schema, drafting descriptions once:
| Setup | Approximate cost |
|---|---|
| Live `gpt-4o-mini`, batch_size 10 | $1.00–$1.50 |
| Live `gpt-4o`, batch_size 10 | $4.00–$6.00 |
| Batch `gpt-4o-mini` | $0.50–$0.75 |
| Live `claude-sonnet-4` | $5.00–$8.00 |
| Live `gemini-2.0-flash`, batch_size 15 | $0.40–$0.80 |
| Local `llama3` on a workstation | $0 (bring your own GPU) |
Numbers are illustrative — actual cost depends on column-name length, sample-value length, and the provider's per-token rate at the time.
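For a back-of-the-envelope estimate on your own schema, the arithmetic might look like the sketch below. Both function names are hypothetical, the rule that a prompt never spans two tables is an assumption about batching, and every token count and per-token rate is a parameter you supply from your provider's current price list; nothing here reproduces AMX's pricing table.

```python
import math


def prompt_count(columns_per_table: list[int], column_batch_size: int) -> int:
    """Number of prompts, assuming a batch never spans two tables."""
    return sum(math.ceil(n / column_batch_size) for n in columns_per_table)


def estimate_cost_usd(prompts: int,
                      input_tokens_per_prompt: int,
                      output_tokens_per_prompt: int,
                      usd_per_1m_input: float,
                      usd_per_1m_output: float,
                      batch_discount: float = 0.0) -> float:
    """Rough drafting cost in USD.

    batch_discount=0.5 models the roughly 50% batch-mode rate.
    """
    input_cost = prompts * input_tokens_per_prompt * usd_per_1m_input / 1_000_000
    output_cost = prompts * output_tokens_per_prompt * usd_per_1m_output / 1_000_000
    return (input_cost + output_cost) * (1.0 - batch_discount)
```

A larger `column_batch_size` lowers `prompt_count`, which is where the per-column saving comes from: the fixed prompt overhead is paid fewer times.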
AMX no longer makes you guess. Every LLM call reports both tokens
and USD at every surface that triggered it — /run, /run-apply,
/ask, the Studio run progress header, and the lifetime
cost card on the Studio Overview. Cost comes from a versioned
per-(provider, model) pricing table that AMX caches on disk with a
freshness timestamp; the Studio top bar shows a pricing-cache
freshness badge and a one-click refresh button. You can pin a
price override per model from Settings → LLM (an auto-detected
hint pre-fills the field). Every run row records both the price it
ran at (frozen) and the price it would cost today (live), so a stale
price never silently rewrites history. See
Studio → Pricing for the pricing browser and
Studio → System → Token usage for
the windowed breakdown.
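The frozen-versus-live distinction can be pictured with the sketch below. It is an illustration only: the `Price` and `RunRow` types, their fields, and the helper functions are hypothetical and do not reflect AMX's actual schema. The idea it shows is from the paragraph above: the run-time cost is stored once, and today's cost is recomputed from the current pricing table on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    usd_per_1m_input: float
    usd_per_1m_output: float


@dataclass(frozen=True)
class RunRow:
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    frozen_usd: float  # computed once at run time, never recomputed


def cost_usd(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call at the given per-million-token rates."""
    return (input_tokens * price.usd_per_1m_input
            + output_tokens * price.usd_per_1m_output) / 1_000_000


def record_run(pricing: dict, provider: str, model: str,
               input_tokens: int, output_tokens: int) -> RunRow:
    """Store the run together with the cost at the price in effect now."""
    price = pricing[(provider, model)]
    return RunRow(provider, model, input_tokens, output_tokens,
                  cost_usd(price, input_tokens, output_tokens))


def live_usd(pricing: dict, run: RunRow) -> float:
    """What the same run would cost at the current table's price."""
    return cost_usd(pricing[(run.provider, run.model)],
                    run.input_tokens, run.output_tokens)
```

Refreshing or overriding an entry in `pricing` changes what `live_usd` returns but leaves `frozen_usd` on existing rows untouched.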
Setup walkthroughs
Each provider page follows the same template: prerequisites → /add-llm-profile
walkthrough with verbatim wizard prompts → sample ~/.amx/config.yml block → verify
steps → troubleshooting table → what to read next.
- OpenAI — the default; logprob-threshold tuning.
- Anthropic — Claude model selection, extended thinking.
- Gemini — model picks, safety-filter handling.
- Databricks Serving — Foundation Models or custom serving endpoints; same workspace as your data.
- DeepSeek — cheap, native logprobs, optional reasoning route.
- OpenRouter — multi-model router, one key for many providers.
- Kimi — Moonshot's K2.x reasoning models.
- Ollama and local — on-prem / air-gapped setup.
- Batch mode — async / cheap drafts via OpenAI / Anthropic batch APIs.
Override RAG with a separate profile
The RAG agent (which fuses documentation + codebase evidence into a column description) can run on a different LLM profile than the one drafting columns. Useful when a cheaper fast model is enough for prose synthesis but you still want a stronger model on the column-drafting batch path.