Kimi¶

Kimi (Moonshot) is a Chinese provider whose K2.x line targets extended thinking workloads. AMX recognises the kimi provider key and routes calls through Moonshot's OpenAI-compatible HTTPS endpoint.

The same models are also reachable via OpenRouter; pick kimi when you want a direct relationship with Moonshot, pick OpenRouter when you'd rather use one routing key across providers.

Prerequisites¶

A Moonshot platform account and API key. Sign up at platform.moonshot.ai and create a key under API Keys.
AMX installed (pip install amx-cli).

`/add-llm-profile` walkthrough¶

> /add-llm-profile
Profile name: kimi-thinking
Provider:     kimi
Model:        kimi-k2-thinking
API base:     https://api.moonshot.cn/v1
API key:      sk-…                              # paste from the dashboard
Temperature:  0.2
Output token budget (max_tokens) [4096]:        # press Enter
Number of alternatives per column [3]:          # press Enter
✓ Saved LLM profile 'kimi-thinking' to ~/.amx/config.yml

Sample config block¶

llm_profiles:
  kimi-thinking:
    provider: kimi
    model: kimi-k2-thinking
    api_base: https://api.moonshot.cn/v1
    api_key: keyring://amx/kimi-thinking/api_key
    temperature: 0.2
    n_alternatives: 3
active_llm_profile: kimi-thinking

kimi is routed through an OpenAI-compatible endpoint, so the wizard prompts for an api_base. Moonshot publishes its endpoint as https://api.moonshot.cn/v1. The provider key uses AMX_LLM_API_KEY as a fallback when the YAML api_key field is blank — there is no provider-specific env var equivalent to OPENAI_API_KEY.

Model selection¶

Model	When to pick it
`moonshot-v1-8k`	Lightweight chat, 8 K context, low cost
`moonshot-v1-32k`	Same, larger context — useful on very wide tables
`moonshot-v1-128k`	128 K context — when both the schema and the doc/code RAG payload are large
`kimi-k2-thinking`	Extended-thinking reasoning route. Strong on cryptic schemas
`kimi-k2-instruct`	Instruction-tuned chat variant of K2

K2.x reasoning variants ("-thinking" suffix) are reasoning routes — AMX applies the 32 768-token output floor, the 4× retry budget on finish_reason=length, and passes reasoning.effort (default low, bumpable via AMX_REASONING_EFFORT).

Cost notes¶

K2.x reasoning routes are noticeably cheaper than gpt-5 or claude-opus-4 for comparable reasoning workloads, but the per-request latency is higher (the model is doing more thinking). Use it for cryptic legacy schemas on a hard subset, not as the bulk drafting workhorse.

Live rates surface in Studio → Pricing.

Logprobs¶

K2.x reasoning routes do not return logprobs at this time. AMX falls back to model-declared confidence buckets — high / medium / low are still surfaced, they just come from a different signal than the calibrated logprob threshold.

Troubleshooting¶

Symptom	Fix
`401 Unauthorized`	Re-check the API key. Moonshot keys start with `sk-`
`429 rate limited`	New accounts have low QPS caps. Lower `column_batch_size` or contact Moonshot for a higher quota tier
Reasoning route returns 0 visible characters	The 32 768 floor wasn't enough. `export AMX_LLM_MIN_MAX_TOKENS=65536`
Very slow first response	K2.x thinking models are intrinsically slower. Raise `AMX_LLM_TIMEOUT_SEC` if AMX bails out before the answer arrives