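SuperCLUE Chinese instruction-following leaderboard: measures how well models carry out tasks given instructions in Chinese.

| Rank | Model | Organization | Score |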
|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview (High) | | |
| 2 | Claude Opus 4.7 (high) | Anthropic | |
| 3 | GPT 5.5 High | OpenAI | |
| 4 | Gemini 3.5 Flash (high) | | |
| 5 | Doubao Seed 2.0 Pro 260215 (High) | ByteDance | |
| 6 | DeepSeek V4 Pro (Max) | DeepSeek | |
| 7 | Qwen3.7 Max | Qwen | |
| 8 | Kimi K2.6 | Moonshot | |
| 9 | Qwen 3.6 Max Preview | Alibaba | |
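| 10 | DeepSeek V4 Flash (Max) | DeepSeek | |

Chinese large language model evaluation benchmark: a comprehensive assessment of AI models' Chinese understanding and generation capabilities.

| Rank | Model | Organization | Total | Code | Math | Instruction Following | Science | Hallucination | Agent | Change |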
|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | | | 76 | 81 | 82 | 56 | 72 | 87 | 75 | — |
| 🥈 | | Anthropic | 74 | 79 | 81 | 56 | 68 | 81 | 76 | — |
| 🥉 | | OpenAI | 74 | 73 | 82 | 53 | 63 | 87 | 87 | — |
| #4 | | DeepSeek | 70 | 75 | 72 | 49 | 70 | 79 | 78 | — |
| #5 | | Baidu | 63 | 58 | 68 | 48 | 58 | 77 | 70 | — |
| #6 | | | 72 | 71 | 82 | 45 | 75 | 86 | 70 | — |
| #7 | | ByteDance | 70 | 68 | 77 | 44 | 75 | 80 | 76 | ↓ 2 |
| #8 | | ByteDance | 66 | 58 | 75 | 40 | 72 | 79 | 73 | — |
| #9 | | DeepSeek | 67 | 67 | 83 | 37 | 72 | 71 | 76 | ↓ 3 |
| #10 | | Alibaba | 67 | 66 | 67 | 32 | 68 | 85 | 83 | — |
| #11 | | Qwen | 70 | 80 | 82 | 31 | 74 | 83 | 71 | — |
| #12 | | Moonshot | 69 | 76 | 76 | 30 | 70 | 79 | 81 | — |
| #13 | | Zhipu | 63 | 71 | 70 | 29 | 68 | 75 | 67 | — |
| #14 | | xAI | 56 | 67 | 58 | 23 | 61 | 71 | 54 | — |
| #15 | | MiniMax | 52 | 62 | 65 | 23 | 46 | 57 | 60 | ↓ 2 |
| #16 | | Qwen | 62 | 63 | 68 | 21 | 68 | 77 | 73 | — |
| #17 | | Xiaomi | 57 | 68 | 70 | 13 | 67 | 65 | 62 | — |
| #18 | | StepFun | 54 | 63 | 65 | 12 | 60 | 61 | 65 | — |
| #19 | Hy3 preview(high) | Unknown | 50 | 56 | 51 | 9 | 58 | 68 | 56 | — |
| #20 | | iFlytek | 55 | 51 | 68 | 3 | 70 | 63 | 72 | ↑ 3 |
| #21 | | | 58 | 66 | 75 | 1 | 67 | 83 | 57 | — |