Gabriel Koo for AWS Community Builders


AWS Silently Releases Kimi K2.5 and GLM 4.7 Models to Bedrock

[UPDATE 10 Feb 2026] - It turns out this was part of AWS's plan to bring open-weight models to Kiro and Kiro CLI! The announcement: Open weight models are here: more choice, more speed, less cost

But interestingly, of the models mentioned in this article, only DeepSeek v3.2, MiniMax 2.1 and Qwen Coder Next made it into that announcement; Moonshot Kimi K2.5 and GLM 4.7 were missing.
————
I was refreshing my Bedrock model catalog script out of random curiosity when a few unfamiliar model IDs showed up in us-east-1. No AWS blog post. No tweet thread. Just a few new entries in the API response.

If you've been waiting for a Claude-adjacent model you could swap in seamlessly via AWS credits — this is it.

But there's a drawback for early adopters - do read to the end to learn about the flaw!

Kimi K2.5 (by Moonshot AI), GLM 4.7 (by Zhipu AI), and several other new models like DeepSeek 3.2 and Qwen3 Coder Next are now live on Bedrock, all with full support for the Converse API, tool calling, and — in Kimi K2.5's case — native image understanding.

Quick note: These models aren't listed in the AWS Bedrock models-supported documentation, yet they're fully functional via the Converse API. That's the whole "silent release" thing — available in production, just not reflected in the canonical docs yet. Worth bookmarking your region's actual model list from the Bedrock console instead of relying solely on the written guides.
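A quick way to see what's actually enabled in your account is to ask the control-plane API directly instead of the docs. A minimal sketch of that kind of check (assuming default AWS credentials in us-east-1; the provider prefixes I filter on here are a guess at how the new IDs are namespaced):

import boto3

# The "bedrock" control-plane client lists models; "bedrock-runtime" invokes them
bedrock = boto3.client("bedrock", region_name="us-east-1")

resp = bedrock.list_foundation_models()
for summary in resp["modelSummaries"]:
    model_id = summary["modelId"]
    # Surface entries outside the usual provider namespaces
    if model_id.startswith(("moonshotai.", "zai.", "deepseek.", "qwen.")):
        print(model_id, summary.get("inputModalities"), summary.get("inferenceTypesSupported"))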

The models: what just landed

Kimi K2.5 (Moonshot AI blog) is the eye-catcher here:

  • Tool calling (function calling): ✓ Fully supported via Bedrock Converse API
  • Image understanding: ✓ Native image inputs (base64 or URL)
  • Code generation: In my testing, it held its own against Claude 4.5 Sonnet on typical coding prompts — it handled a multi-file refactor of a FastAPI router cleanly on the first try
  • Bedrock Model ID: moonshotai.kimi-k2.5
  • Availability: us-east-1, us-west-2 (and expanding)
  • Use case fit: Drop-in replacement for Claude if you're already on AWS credits

GLM 4.7 (Zhipu AI blog) fills a quieter but useful role:

  • Tool calling: ✓ Supported, though less aggressively tested in my flows
  • Code generation: Strong; competitive with DeepSeek for certain workloads
  • Bedrock Model ID: zai.glm-4.7(-flash)
  • Availability: us-east-1, us-west-2
  • Use case fit: Solid all-arounder; good for prompts that don't strictly require image handling

The real unlock: Both are live on the Converse API, which means they work seamlessly with Bedrock's function-calling infrastructure.
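To make that concrete, here's a minimal Converse sketch that hits both models through the exact same code path, with only the model ID changing (IDs as listed above; depending on your account they may need an inference-profile prefix such as us.):

import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

for model_id in ("moonshotai.kimi-k2.5", "zai.glm-4.7"):
    resp = runtime.converse(
        modelId=model_id,
        messages=[{"role": "user",
                   "content": [{"text": "Explain idempotent API design in one paragraph."}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.3},
    )
    # Same response shape for both models: output.message.content[0].text
    print(model_id, "->", resp["output"]["message"]["content"][0]["text"][:120])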

When to pick which

Need                               | Pick      | Why
Image understanding + tool calling | Kimi K2.5 | Only Bedrock open-weight flagship model with both
Text-only tasks, cost-conscious    | GLM 4.7   | Solid all-arounder, no vision overhead
Maximum reliability & ecosystem    | Claude    | Battle-tested, widest documentation

Why this matters for vibe coding

"Vibe coding" — the practice of rapidly iterating on code with LLM assistance, swapping models mid-session, and optimizing for flow over perfection — lives or dies on how frictionless your model-switching is.

If you're sitting on expiring AWS credits (I've got ~$700 that expire by July 2026), the bottleneck isn't usually "which model is smartest?" — it's "how fast can I swap without rewriting everything?"

Kimi K2.5 solves a real pain point: until now, if you wanted image understanding + tool calling + AWS-native billing, you were stuck with Claude - and since Anthropic's Claude models aren't covered by the typical AWS credits, the only way to get them was a Kiro CLI subscription ($20/$40/$200 per month).

I initially considered the $200 Kiro Power plan, but I'm not confident I could use it fully every month; at the same time, sticking to the $20/$40 plans risks paying for Kiro credit overages (roughly double the average per-credit price of the plans). A pay-as-you-go (PAYG) option fits my usage pattern much better.

So now you have an option that:

  1. Bills directly to your AWS account — no vendor intermediary, no separate API key, just your existing credits burning down
  2. Runs on the same Bedrock Converse API
  3. Calls tools reliably
  4. Natively understands images

For experimentation loops (refactors, code generation, visual analysis), that's a genuinely useful escape hatch.
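On point 4, the image input uses the standard Converse content-block shape, so a sketch looks like this (I haven't validated Kimi's image path end-to-end yet - see the testing section below):

import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read a local PNG and pass it as a raw-bytes image block alongside the prompt
with open("diagram.png", "rb") as f:
    image_bytes = f.read()

resp = runtime.converse(
    modelId="moonshotai.kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"text": "What does this architecture diagram show?"},
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        ],
    }],
)
print(resp["output"]["message"]["content"][0]["text"])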

The lightweight setup: local LiteLLM gateway

You don't need a complex setup. My entire gateway is:

  • A Python venv with litellm installed
  • A single YAML config file (shown in the next section)
  • A systemd unit to keep it running on port 4000

No containers, no Kubernetes. One command to install, one service file to manage. Once running, any client on that machine calls http://localhost:4000/chat/completions with the standard OpenAI format, and LiteLLM translates it to Bedrock Converse API automatically.
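For example, a client call through the gateway is just the standard OpenAI SDK pointed at localhost (the key is whatever master key you configured for LiteLLM):

from openai import OpenAI

# Talks to the local LiteLLM gateway, which forwards to Bedrock Converse
client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-litellm-key")

resp = client.chat.completions.create(
    model="kimi-k2.5",  # friendly name from the LiteLLM config shown below
    messages=[{"role": "user", "content": "Suggest a refactor plan for a FastAPI router."}],
)
print(resp.choices[0].message.content)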

Performance note: In my testing, the LiteLLM translation layer adds negligible latency (~20–50ms overhead). Streaming responses from Kimi K2.5 feel comparable to calling Claude directly — first tokens arrive within 1–2 seconds for typical prompts.

Bonus: Claude Code & OpenCode integration

Here's the slightly cheeky part: you can point Claude Code or OpenCode at your local LiteLLM gateway and route requests through to Kimi K2.5 or GLM 4.7 on Bedrock — all while staying on your AWS credits.

LiteLLM supports the Anthropic /v1/messages API endpoint, so it's a two-liner to set up:

export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_AUTH_TOKEN=sk-your-litellm-key
export ANTHROPIC_MODEL=kimi-k2.5
export DISABLE_PROMPT_CACHING=true

The DISABLE_PROMPT_CACHING=true is essential here (special thanks to my colleague @Marty for troubleshooting and fixing that): by default, Claude Code applies prompt caching for speed, but not every model on Amazon Bedrock supports it.

Then launch Claude Code or OpenCode as usual. LiteLLM intercepts the Anthropic-format requests and translates them to Bedrock Converse calls. It's not officially blessed by Anthropic, but it works cleanly for local experimentation — and your AWS credits take the hit instead of your Anthropic billing.

Config: explicit about capabilities

Here's how I route Kimi K2.5 and GLM 4.7:

model_list:
  - model_name: kimi-k2.5
    litellm_params:
      model: bedrock/converse/moonshotai.kimi-k2.5
      aws_region_name: us-east-1
      allowed_openai_params: ['reasoning_effort', 'tools', 'tool_choice']
    model_info:
      mode: completion
      # capability flags the "Key patterns" notes below refer to
      supports_function_calling: true
      supports_vision: true

  - model_name: glm-4.7
    litellm_params:
      model: bedrock/converse/zai.glm-4.7
      aws_region_name: us-east-1
      allowed_openai_params: ['reasoning_effort', 'tools', 'tool_choice']    
    model_info:
      mode: completion
      supports_function_calling: true
      supports_vision: false

litellm_settings:
  modify_params: true
  log_responses: true

Key patterns here:

  • Friendly names (kimi-k2.5, glm-4.7) instead of long model IDs
  • Explicit capability flags (supports_function_calling, supports_vision)
  • modify_params: true for Bedrock edge-case smoothing
  • Single region (us-east-1) since both models are there

The capability flags matter. They let your orchestration layer (or agent framework) gracefully degrade if a model can't do tools or images. No more "half-attempt to call a function and fail mysteriously."
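As an illustration, an orchestration-side check can be as small as this (a hypothetical helper, not a LiteLLM API; the flags just mirror the model_info entries from the config above):

# Mirrors the model_info flags from the LiteLLM config; purely illustrative
MODEL_CAPS = {
    "kimi-k2.5": {"supports_function_calling": True, "supports_vision": True},
    "glm-4.7": {"supports_function_calling": True, "supports_vision": False},
}

def build_request(model, messages, tools=None, has_images=False):
    caps = MODEL_CAPS.get(model, {})
    if has_images and not caps.get("supports_vision"):
        raise ValueError(f"{model} cannot accept image inputs - route to a vision model instead")
    request = {"model": model, "messages": messages}
    if tools and caps.get("supports_function_calling"):
        request["tools"] = tools  # only attach tools the model can actually call
    return request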

Testing: quick verification

I tested both models with the Bedrock Converse API in us-east-1. Here's what actually happened:

Kimi K2.5: I threw a "get current weather in Tokyo" tool spec at it — standard JSON Schema function definition, nothing fancy. It correctly structured the function call on the first attempt, including proper argument types in the response. For code generation, I asked it to refactor a Python CLI script into async; the output was clean and ran without edits. Image support is declared in the model schema but I haven't validated it hands-on yet — that's next on my list.
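Here's roughly what that probe looked like (a sketch; the weather tool is a dummy JSON Schema definition, and the toolConfig shape is the standard Converse one):

import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            }},
        }
    }]
}

resp = runtime.converse(
    modelId="moonshotai.kimi-k2.5",
    messages=[{"role": "user", "content": [{"text": "What's the weather in Tokyo right now?"}]}],
    toolConfig=tool_config,
)

# If the model decides to call the tool, the content includes a toolUse block
for block in resp["output"]["message"]["content"]:
    if "toolUse" in block:
        print(block["toolUse"]["name"], block["toolUse"]["input"])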

GLM 4.7: Solid on text queries and code generation. Tool calling works, though it was slightly less eager to invoke tools unprompted compared to Kimi — it sometimes answered directly when I expected a function call. No image support, as expected; Zhipu hasn't added vision capabilities to GLM 4.7.

Why the quiet release?

These aren't show-stopping announcements. They're bread-and-butter additions to Bedrock's model portfolio. AWS likely brought them in as part of an ongoing expansion to reduce vendor lock-in on "you have to use Claude for everything." That's healthy — more options, better pricing pressure, cleaner credit utilization.

This is a pattern, not an anomaly. Anthropic Claude, AI21 Jamba, and several Mistral variants all appeared on Bedrock before official blog posts or documentation updates. If you're only checking AWS launch announcements, you're always behind.

📌 That's exactly why I built amazonbedrockmodels.github.io — a living catalog of what's actually available on Bedrock, in which regions, and what each model can do. Bookmark it. It updates faster than the docs.

The model-swapping checklist

If you want to swap models without rewriting your code:

  1. Use a gateway (LiteLLM, LLMProxy, or similar) to normalize requests
  2. Pin the Bedrock route explicitly (bedrock/converse/<model-id>) in your config
  3. Mark capability per model (tool calling, vision, etc.) — don't assume
  4. Test the tool spec — even "supported" models sometimes have quirky implementations
  5. Keep a catalog so you don't rediscover the same model twice

Kimi K2.5 fits this playbook cleanly. It's a genuine Claude replacement for Bedrock users, not a "wait and see if it works" experiment.

Next steps

  • If you're on AWS credits: Spin up a local LiteLLM instance and try both models
  • If you find more quietly-available models: Open a PR against the catalog or message me
  • If you're in an AWS org: Check your Bedrock region — availability is still expanding

The AWS credits will expire whether you use them or not. Might as well pick models that fit your workflow instead of forcing your workflow around Kiro CLI only.


P.S. During my testing I did observe occasionally longer inference times - my guess is that AWS is still scaling up compute capacity behind these newer models.


References

[1] Moonshot AI — Kimi K2.5 Announcement
[2] Zhipu AI — GLM 4.7 Announcement
[3] AWS Bedrock — Converse API Reference
[4] LiteLLM — AWS Bedrock Provider Documentation
[5] Unofficial - Amazon Bedrock Model Catalog I Created
