LemonAI Becomes the First General AI Agent Compatible with Kimi K2 Model

At 10:55 PM on July 11, Moonshot AI broke the silence with a groundbreaking announcement on its official website: the open-source 1-trillion-parameter MoE model Kimi K2! According to its technical specifications, Kimi K2 is a Mixture-of-Experts (MoE) foundation model engineered for enhanced coding capabilities and superior performance in general agent tasks, featuring 1T total parameters with 32B activated per inference.

In benchmark performance tests including SWE Bench Verified, Tau2, and AceBench, Kimi K2 achieved state-of-the-art (SOTA) results among open-source models, demonstrating leading capabilities in coding, agent tasks, and mathematical reasoning.
During its pre-training phase, Kimi K2 leveraged the MuonClip optimizer to enable stable and efficient training of this trillion-parameter model. Addressing the bottleneck of limited high-quality human data, this approach significantly improved token utilization efficiency and unlocked new scaling pathways. Other key innovations include:
• Large-scale synthetic data generation for Agentic Tool Use
• General reinforcement learning enhanced with self-assessment mechanisms

Kimi K2 delivered exceptional performance across three core capability dimensions:
• Agentic Coding (autonomous programming)
• Tool Use (tool calling & integration)
• Math & Reasoning (mathematical problem solving)

We noticed that Kimi K2 is claimed to be the best open model for general agent tasks, so we started testing overnight. Thanks to our open-source framework, integrating Kimi's new model was straightforward—though we did hit some minor hiccups during testing. We'd like to share a quick tutorial with everyone.
1. First, create an account at platform.moonshot.ai and request your API key.

2. In Lemon AI's Settings, add a new Model Service provider.

3. In the Kimi Model Service settings, configure:
• API key: Paste your Moonshot API key
• API address: https://api.moonshot.cn/v1
• Models: kimi-k2-0711-preview

4. Click Check, then select kimi-k2-0711-preview from the model dropdown.

5. Verify that the connection is successful.

6. Remember to toggle the 'Enable' switch to save your settings. After any configuration change, disable and re-enable the service to ensure the changes take effect.

7. Return to the Task List, create a new task, and run a simple test – for example:
Can you help me write an HTML page that displays a scrolling "Hello World" ticker, with a background color and dynamic effects?

Success! 🏆
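If you want to sanity-check your key and the endpoint outside Lemon AI, the same OpenAI-compatible /chat/completions route can be called directly. A minimal stdlib sketch (the API-key placeholder and helper name are ours, not from Moonshot's docs):

```python
import json
import urllib.request

BASE_URL = "https://api.moonshot.cn/v1"
MODEL = "kimi-k2-0711-preview"

def build_chat_request(api_key: str, prompt: str, max_tokens: int = 512):
    """Assemble a urllib Request for the OpenAI-compatible chat endpoint."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To send the request:
#   with urllib.request.urlopen(build_chat_request(KEY, "Hello")) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

If this call succeeds from your machine but Lemon AI still fails, the problem is in the service configuration rather than the key.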
Identified Issues:
Issue 1:
During our testing, we encountered frequent errors on the official platform. Initially, we suspected insufficient GPU allocation for the model deployment might be causing these API failures.

After extensive troubleshooting, we discovered the root cause: Free-tier accounts on the platform have extremely low TPM (Tokens Per Minute) limits.
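Until you add credits, those TPM limits will surface as rate-limit errors, which are worth handling in client code rather than treating as failures. A minimal retry sketch with exponential backoff; the RateLimitError class here is a stand-in for whatever your SDK raises on HTTP 429:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the HTTP 429 error your client raises (name varies by SDK)."""

def with_backoff(call, retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(retries):
        try:
            return call()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            # Exponential backoff plus a little jitter to avoid thundering herds.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

The injectable `sleep` parameter also makes the helper easy to test without real delays.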

For reliable platform testing, we strongly recommend purchasing credits. The $20 USD tier is optimal for most use cases.
Issue 2:
During coding implementation, we encountered systematic errors. The official documentation doesn't specify input/output token limits, only mentioning the 128k context length.
After examining the logs, we suspect the output limit is 8K tokens, so truncation terminates the code prematurely before completion.
```python
from openai import OpenAI  # Moonshot's API is OpenAI-compatible

client = OpenAI(api_key="YOUR_MOONSHOT_API_KEY",  # placeholder key
                base_url="https://api.moonshot.cn/v1")
response = client.chat.completions.create(
    model="moonshot-v1-8k",
    messages=[{"role": "user", "content": "Hello"}],  # example message
    max_tokens=2000,  # adjust this value as needed
)
```
This was the answer we got, and we will add a max_tokens parameter for the model in the next update.
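Until that update lands, one workaround for truncated output is to check the response's finish_reason and, when it is "length", feed the partial answer back and ask the model to continue. A minimal sketch; `send` is a hypothetical wrapper around your chat-completion call that returns (text, finish_reason):

```python
def generate_full(send, messages, max_rounds=4):
    """Keep requesting until the model stops for a reason other than 'length'."""
    parts = []
    msgs = list(messages)
    for _ in range(max_rounds):
        text, reason = send(msgs)
        parts.append(text)
        if reason != "length":
            break  # natural stop: we have the complete answer
        # Feed the partial answer back and ask the model to resume.
        msgs += [{"role": "assistant", "content": text},
                 {"role": "user", "content": "Continue exactly where you left off."}]
    return "".join(parts)
```

The max_rounds cap keeps a misbehaving model from looping forever; stitched continuations can still have seams, so this is a stopgap rather than a fix.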
You could also try PPIO/Novita's new deployment of Kimi K2, moonshotai/kimi-k2-instruct. We recommend testing its capabilities.
Our Initial Testing Observations:
1. Kimi-K2's 1T parameter scale delivers breakthrough performance - its planning capabilities significantly outperform Claude, GPT, and DeepSeek-V3 in structured task execution. This likely stems from specialized training on planning-annotated datasets.

The model consistently generates exceptionally detailed todo lists, demonstrating best-in-class planning capabilities for general-purpose AI agents - likely the strongest planning core competency we've observed in any open model.
2. The model demonstrates precise tool-calling capabilities, executing browser operations, web searches, and MCP integrations flawlessly across multiple test cases.
3. Critical Coding Limitation: During code generation tasks in LemonAI, we observe frequent truncation errors. Log analysis confirms output length restrictions cause incomplete code execution. This limitation exists in the official Moonshot API - we're testing Novita/PPIO's 128K output capacity today as a potential solution.
4. Pricing Analysis: Kimi-K2 costs $0.57/M tokens for input and $2.3/M tokens for output - on par with DeepSeek R1, roughly 5x cheaper than Claude, and 2.5x cheaper than GPT-4o.
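At those rates, per-request cost is easy to estimate. A quick sketch using the listed prices (the helper name is ours):

```python
def request_cost_usd(input_tokens, output_tokens,
                     in_per_m=0.57, out_per_m=2.3):
    """Cost of one call at Kimi-K2's listed rates ($ per million tokens)."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# e.g. a call with 100k input tokens and 8k output tokens:
# request_cost_usd(100_000, 8_000) ≈ $0.0754
```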
Strategic Assessment: Should Kimi-K2 demonstrate performance parity with Claude 3.5 Sonnet/Opus (or potentially Claude 4) in generalized agent environments, it will become Lemon AI's primary recommended model for all agent implementations.
We invite you to explore this groundbreaking combination: Rising AI star – the 1T parameter Kimi-K2 – now integrated with Lemon AI.

🚀 Get the Latest LemonAI Release: 🔗 Official Website: LemonAI.cc 🔗 Community Hub: @LemonAI_cc on X 📦 GitHub Repository: github.com/hexdocom/lemonai ⭐ Star us on GitHub to download or contribute!