SuperCLUE Chinese Hallucination Control Leaderboard

The SuperCLUE Chinese hallucination control leaderboard tests a model's ability to reduce hallucinations and erroneous generations.

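| Rank | Model | Organization | Score |
|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview (High) | Google | |
| 2 | Claude Opus 4.7 (high) | Anthropic | |
| 3 | GPT 5.5 High | OpenAI | |
| 4 | Gemini 3.5 Flash (high) | Google | |
| 5 | Doubao Seed 2.0 Pro 260215 (High) | ByteDance | |
| 6 | DeepSeek V4 Pro (Max) | DeepSeek | |
| 7 | Qwen3.7 Max | Qwen | |
| 8 | Kimi K2.6 | moonshot | |
| 9 | DeepSeek V4 Flash (Max) | DeepSeek | |
| 10 | Qwen 3.6 Max Preview | alibaba | |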

SuperCLUE Chinese LLM Leaderboard

A Chinese LLM evaluation benchmark that comprehensively assesses AI models' Chinese understanding and generation abilities.

Hallucination Control Ranking

| Rank | Model | Organization | Overall | Code | Math | Instruction | Science | Hallucination | Agent | Change |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview (High) | Google | 76 | 81 | 82 | 56 | 72 | 87 | 75 | |
| 2 | GPT 5.5 High | OpenAI | 74 | 73 | 82 | 53 | 63 | 87 | 87 | |
| 3 | Gemini 3.5 Flash (high) | Google | 72 | 71 | 82 | 45 | 75 | 86 | 70 | |
| 4 | Qwen 3.6 Max Preview | alibaba | 67 | 66 | 67 | 32 | 68 | 85 | 83 | |
| 5 | Qwen3.7 Max | Qwen | 70 | 80 | 82 | 31 | 74 | 83 | 71 | |
| 6 | Gemma 4 31B | Google | 58 | 66 | 75 | 1 | 67 | 83 | 57 | |
| 7 | Claude Opus 4.7 (high) | Anthropic | 74 | 79 | 81 | 56 | 68 | 81 | 76 | |
| 8 | Doubao Seed 2.0 Pro 260215 (High) | ByteDance | 70 | 68 | 77 | 44 | 75 | 80 | 76 | 2 |
| 9 | DeepSeek V4 Pro (Max) | DeepSeek | 70 | 75 | 72 | 49 | 70 | 79 | 78 | |
| 10 | Kimi K2.6 | moonshot | 69 | 76 | 76 | 30 | 70 | 79 | 81 | |
| 11 | Doubao-Seed-2.0-lite-260428(high) | ByteDance | 66 | 58 | 75 | 40 | 72 | 79 | 73 | |
| 12 | Ernie 5.1 | Baidu | 63 | 58 | 68 | 48 | 58 | 77 | 70 | |
| 13 | Qwen3.6-27B(Thinking) | Qwen | 62 | 63 | 68 | 21 | 68 | 77 | 73 | |
| 14 | GLM 5.1 | Zhipu | 63 | 71 | 70 | 29 | 68 | 75 | 67 | |
| 15 | DeepSeek V4 Flash (Max) | DeepSeek | 67 | 67 | 83 | 37 | 72 | 71 | 76 | 3 |
| 16 | Grok 4.3 | xAI | 56 | 67 | 58 | 23 | 61 | 71 | 54 | |
| 17 | Hy3 preview(high) | Unknown | 50 | 56 | 51 | 9 | 58 | 68 | 56 | |
| 18 | MiMo V2.5 Pro | Xiaomi | 57 | 68 | 70 | 13 | 67 | 65 | 62 | |
| 19 | Spark X2 | iFlytek | 55 | 51 | 68 | 3 | 70 | 63 | 72 | 3 |
| 20 | Step 3.5 Flash | StepFun | 54 | 63 | 65 | 12 | 60 | 61 | 65 | |
| 21 | Minimax M2.7 | MiniMax | 52 | 62 | 65 | 23 | 46 | 57 | 60 | 2 |