Skip to content
Home Reference LLM Providers Kimi

Kimi

Kimi (Moonshot) is a Chinese provider whose K2.x line targets extended thinking workloads. AMX recognises the kimi provider key and routes calls through Moonshot's OpenAI-compatible HTTPS endpoint.

The same models are also reachable via OpenRouter; pick kimi when you want a direct relationship with Moonshot, pick OpenRouter when you'd rather use one routing key across providers.

Prerequisites

  • A Moonshot platform account and API key. Sign up at platform.moonshot.ai and create a key under API Keys.
  • AMX installed (pip install amx-cli).

/add-llm-profile walkthrough

> /add-llm-profile
Profile name: kimi-thinking
Provider:     kimi
Model:        kimi-k2-thinking
API base:     https://api.moonshot.cn/v1
API key:      sk-…                              # paste from the dashboard
Temperature:  0.2
Output token budget (max_tokens) [4096]:        # press Enter
Number of alternatives per column [3]:          # press Enter
✓ Saved LLM profile 'kimi-thinking' to ~/.amx/config.yml

Sample config block

llm_profiles:
  kimi-thinking:
    provider: kimi
    model: kimi-k2-thinking
    api_base: https://api.moonshot.cn/v1
    api_key: keyring://amx/kimi-thinking/api_key
    temperature: 0.2
    n_alternatives: 3
active_llm_profile: kimi-thinking

kimi is routed through an OpenAI-compatible endpoint, so the wizard prompts for an api_base. Moonshot publishes its endpoint as https://api.moonshot.cn/v1. The provider key uses AMX_LLM_API_KEY as a fallback when the YAML api_key field is blank — there is no provider-specific env var equivalent to OPENAI_API_KEY.

Model selection

Model When to pick it
moonshot-v1-8k Lightweight chat, 8 K context, low cost
moonshot-v1-32k Same, larger context — useful on very wide tables
moonshot-v1-128k 128 K context — when both the schema and the doc/code RAG payload are large
kimi-k2-thinking Extended-thinking reasoning route. Strong on cryptic schemas
kimi-k2-instruct Instruction-tuned chat variant of K2

K2.x reasoning variants ("-thinking" suffix) are reasoning routes — AMX applies the 32 768-token output floor, the 4× retry budget on finish_reason=length, and passes reasoning.effort (default low, bumpable via AMX_REASONING_EFFORT).

Cost notes

K2.x reasoning routes are noticeably cheaper than gpt-5 or claude-opus-4 for comparable reasoning workloads, but the per-request latency is higher (the model is doing more thinking). Use it for cryptic legacy schemas on a hard subset, not as the bulk drafting workhorse.

Live rates surface in Studio → Pricing.

Logprobs

K2.x reasoning routes do not return logprobs at this time. AMX falls back to model-declared confidence buckets — high / medium / low are still surfaced, they just come from a different signal than the calibrated logprob threshold.

Troubleshooting

Symptom Fix
401 Unauthorized Re-check the API key. Moonshot keys start with sk-
429 rate limited New accounts have low QPS caps. Lower column_batch_size or contact Moonshot for a higher quota tier
Reasoning route returns 0 visible characters The 32 768 floor wasn't enough. export AMX_LLM_MIN_MAX_TOKENS=65536
Very slow first response K2.x thinking models are intrinsically slower. Raise AMX_LLM_TIMEOUT_SEC if AMX bails out before the answer arrives