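Cheapest and Free AI APIs for High-Volume Use (2026)
Updated April 2026 · Priced per million tokens · Every figure linked to its source
🎯 Who is this page for?
Teams that need AI at scale but not frontier-model quality: text classification, sentiment analysis, keyword and entity extraction, summarization, translation, moderation, data enrichment, batch processing, and similar workloads. It also suits prototypes, tests, and side projects that want to run at $0.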
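🌐 Free model aggregator platforms
The platforms below aggregate free models from multiple providers. A single API key gives access to many free models, and most platforms do not require a credit card.
| Platform | Free models | Rate limits | Highlights | Source |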
|---|---|---|---|---|
| OpenRouter | 29 free models including Llama 4, GPT-oss, DeepSeek R1, Qwen3, GLM-4.5 Air, Mistral, Hermes 405B [1] | 20 RPM, 200 RPD per model | Largest free model selection; unified API; tools support | openrouter.ai |
| SiliconFlow (硅基流动) | DeepSeek V3/R1, Qwen3-8B/32B, GLM-4-9B, Llama 4 Scout [2] | 1000 RPM per model | Highest free RPM; direct access from mainland China; OpenAI-compatible API | siliconflow.cn |
| Google AI Studio | Gemini 2.5 Flash, Flash-Lite, Gemma 3 [3] | 15-30 RPM, 1-2M TPM, 250 RPD | Best free quality; 1M context window; vision support | ai.google.dev |
| Groq | Llama 4 Scout/Maverick, Mixtral, Gemma 2 [4] | Free tier with limited RPM | Fastest inference (~100ms); real-time apps | groq.com |
| Cloudflare Workers AI | Llama 3.3, Mistral 7B, Phi-4 mini [5] | Free: 10K neurons/day | Edge inference; no cold start; global CDN | cloudflare.com |
| Mistral AI (La Plateforme) | Mistral Small, Codestral [6] | Free tier with rate limits | European data residency; good coding model | mistral.ai |
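Free tiers are only usable in batch jobs if the client stays under the published caps. The following is a minimal sketch of a client-side limiter for a per-minute and a per-day cap, using OpenRouter's 20 RPM / 200 RPD from the table as defaults. The class name `FreeTierLimiter` and its methods are hypothetical, not part of any provider's SDK, and real providers may count requests differently than this sliding window does.

```python
import time
from collections import deque


class FreeTierLimiter:
    """Sliding-window limiter for a requests-per-minute and requests-per-day cap.

    Hypothetical helper: defaults mirror the 20 RPM / 200 RPD row in the table.
    """

    def __init__(self, rpm=20, rpd=200, clock=time.monotonic):
        self.rpm = rpm
        self.rpd = rpd
        self.clock = clock
        self.sent = deque()  # timestamps of requests sent in the last 24 hours

    def wait_time(self):
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget requests that are more than one day old.
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()
        wait = 0.0
        if len(self.sent) >= self.rpd:
            # A daily slot opens when the rpd-th most recent request ages out.
            wait = max(wait, self.sent[-self.rpd] + 86400 - now)
        recent = [t for t in self.sent if now - t < 60]
        if len(recent) >= self.rpm:
            # A per-minute slot opens when the rpm-th most recent one ages out.
            wait = max(wait, recent[-self.rpm] + 60 - now)
        return wait

    def record(self):
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

A batch loop would call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after sending it. Passing a custom `clock` makes the limiter easy to test without real waiting.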
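⭐ Representative free models on aggregator platforms
High-quality free models reachable through aggregators such as OpenRouter or SiliconFlow.
| Model | Free access via | Context | Capabilities | Best for |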
|---|---|---|---|---|
| GPT-oss-120B | OpenRouter [1] | 131K | Tools | General Q&A, coding, agents |
| Hermes 3 Llama 3.1 405B | OpenRouter [1] | 131K | — | Largest free model, strong reasoning |
| DeepSeek R1 | OpenRouter, SiliconFlow [1] | 128K | Reasoning | Chain-of-thought, math, coding |
| DeepSeek V3.2 | SiliconFlow [2] | 128K | — | Best free quality for general tasks |
| Qwen3 32B | SiliconFlow, OpenRouter [2] | 128K | Tools | Multilingual, coding, Chinese tasks |
| GLM-4.5 Air | OpenRouter [1] | 128K | — | Chinese tasks, general purpose |
| Llama 4 Scout (17B) | Groq, SiliconFlow [4] | 128K | — | ⚡ Fastest; real-time chat, agents |
| Gemini 2.5 Flash | Google AI Studio [3] | 1M | Vision | Longest free context; image understanding |
| Gemini 2.5 Flash-Lite | Google AI Studio [3] | 1M | — | Highest free volume + 1M context |
| NVIDIA Nemotron Nano 12B | OpenRouter [1] | 8K | — | Lightweight, fast, simple tasks |
| Liquid LFM 2.5 1.2B | OpenRouter [1] | 66K | — | Ultra-lightweight; fastest free model |
💰 Ultra-low-cost paid APIs (input < $0.30/MTok)
| Model | Provider | Input $/MTok | Output $/MTok | Context | Best for | Source |
|---|---|---|---|---|---|---|
| GPT-4.1 Nano | OpenAI | $0.10 | $0.40 | 1M | Classification, extraction, simple Q&A | openai.com |
| Gemini 3.1 Flash-Lite | Google | $0.10 | $0.40 | 1M | Cheapest per-token with massive context | ai.google.dev |
| Groq (Llama 4 Scout) | Groq | $0.11 | $0.34 | 128K | ⚡ Fastest inference (~100ms), real-time apps | groq.com |
| DeepSeek V3.2 | DeepSeek | $0.14 | $0.28 | 128K | Best quality/price ratio, coding & reasoning | deepseek.com |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 1M | Great quality at budget price, long context | ai.google.dev |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 | 128K | Cheapest reasoning model (chain-of-thought) | deepseek.com |
| o4 Mini | OpenAI | $0.55 | $2.20 | 200K | Reasoning tasks, coding, math | openai.com |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | 200K | Budget Anthropic option, fast responses | anthropic.com |
| Model | Cost for the same workload |
|---|---|
| GPT-4.1 Nano | $3.00 |
| Gemini 3.1 Flash-Lite | $3.00 |
| Groq Llama 4 | $2.80 |
| DeepSeek V3.2 | $2.80 |
| Gemini 2.5 Flash | $4.50 |
| Claude Haiku 3.5 | $28.00 |
| GPT-4o (for comparison) | $37.50 |
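💡 For the same workload, the budget models other than Claude Haiku 3.5 cost roughly 7–13x less than GPT-4o, and on simple tasks the quality gap is usually small.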
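The page does not state which workload the per-model costs above are based on. As a rough check, the budget-model figures match a workload of 10 million input tokens plus 5 million output tokens at the listed rates. That workload is an inference from the arithmetic, not something the source confirms, and GPT-4o is left out because its rates are not listed here. A minimal sketch, with `PRICES` and `workload_cost` as illustrative names:

```python
# (input $/MTok, output $/MTok), copied from the pricing table above.
PRICES = {
    "GPT-4.1 Nano": (0.10, 0.40),
    "Gemini 3.1 Flash-Lite": (0.10, 0.40),
    "Groq (Llama 4 Scout)": (0.11, 0.34),
    "DeepSeek V3.2": (0.14, 0.28),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "Claude Haiku 3.5": (0.80, 4.00),
}


def workload_cost(model, input_mtok, output_mtok):
    """USD cost of a workload given in millions of input and output tokens."""
    price_in, price_out = PRICES[model]
    return round(input_mtok * price_in + output_mtok * price_out, 2)


# ASSUMPTION: 10M input + 5M output tokens (inferred, not stated on the page).
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 10, 5):.2f}")
```

Swapping in your own token volumes gives a like-for-like comparison; output-heavy workloads shift the ranking toward models with cheap output tokens, such as DeepSeek V3.2.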
🎯 Which cheap model fits which task?
| Task | Recommended model | Why | Approx. cost per million requests |
|---|---|---|---|
| Sentiment Analysis | GPT-4.1 Nano | Short input/output, simple classification | ~$1-3 |
| Keyword/Entity Extraction | Gemini Flash-Lite | Cheapest per-token, handles structured output | ~$2-5 |
| Translation (high volume) | DeepSeek V3.2 | Excellent multilingual at lowest cost | ~$5-15 |
| Summarization | Gemini 2.5 Flash | 1M context window, good quality | ~$5-20 |
| Content Moderation | Groq (Llama 4) | Sub-100ms latency for real-time | ~$3-8 |
| Data Enrichment | DeepSeek V3.2 | Good at structured data, cheapest | ~$5-15 |
| Email Auto-Reply | Claude Haiku 3.5 | Best tone/quality at budget price | ~$10-30 |
| Code Review (bulk) | DeepSeek V3.2 | Strong coding ability, very cheap | ~$10-25 |
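The per-million-request estimates depend almost entirely on how many tokens each request carries, which the table does not state. Because prices are quoted per million tokens, the cost of one million requests is simply tokens per request times the $/MTok rate. The sketch below shows this; the token counts in the usage note are hypothetical, and the function name is illustrative.

```python
def cost_per_million_requests(price_in, price_out, tokens_in, tokens_out):
    """USD for 1,000,000 requests.

    Prices are in $/MTok, so one million requests of `tokens_in` tokens each
    cost tokens_in * price_in dollars: the two "millions" cancel out.
    """
    return tokens_in * price_in + tokens_out * price_out


# Hypothetical token counts, GPT-4.1 Nano rates ($0.10 in / $0.40 out):
short_label_job = cost_per_million_requests(0.10, 0.40, tokens_in=20, tokens_out=2)
longer_document_job = cost_per_million_requests(0.10, 0.40, tokens_in=500, tokens_out=50)
```

Under these assumed counts the short job lands near $2.80 per million requests, inside the table's range for sentiment analysis, while the 500-token job is about $70. The table's ranges therefore seem to presume very short inputs, so it is worth measuring your own average prompt length before budgeting.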
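🏠 Self-hosting: the lowest-cost option
At very high volume, where the lowest unit price matters most, you can self-host open-weight models: you pay only GPU rental, with no per-token billing.
| Model | Minimum VRAM | GPU cost (est.) | Equivalent per-token cost |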
|---|---|---|---|
| Llama 4 Scout (17B) | 24GB | ~$0.50/hr (A10G) | ~$0.001/MTok (effectively free) |
| Qwen3 14B | 16GB | ~$0.40/hr (T4) | ~$0.001/MTok |
| DeepSeek V3.2 (full) | 8×H100 | ~$16/hr | Only worthwhile at extreme scale |
| Gemma 3 4B | 8GB | ~$0.25/hr (T4) | Cheapest self-hosted option |
GPU costs are based on AWS, GCP, and spot pricing. Community GPU marketplaces such as vast.ai or runpod.io can be cheaper.
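The equivalent per-token cost of a self-hosted model follows from the hourly GPU price and the sustained throughput you actually achieve. The sketch below converts between the two; the function names are illustrative and the 1,000 tokens/s figure in the usage note is a hypothetical throughput, not a benchmark.

```python
def self_hosted_cost_per_mtok(gpu_usd_per_hour, tokens_per_second):
    """Effective $/MTok for a GPU that is busy the whole hour."""
    mtok_per_hour = tokens_per_second * 3600 / 1_000_000
    return gpu_usd_per_hour / mtok_per_hour


def implied_tokens_per_second(gpu_usd_per_hour, usd_per_mtok):
    """Sustained throughput needed to reach a target $/MTok."""
    mtok_per_hour = gpu_usd_per_hour / usd_per_mtok
    return mtok_per_hour * 1_000_000 / 3600


# ASSUMPTION: 1,000 tokens/s sustained on a $0.50/hr GPU.
example_cost = self_hosted_cost_per_mtok(0.50, 1000)      # about $0.14/MTok
needed_speed = implied_tokens_per_second(0.50, 0.001)     # about 139,000 tokens/s
```

By this arithmetic, reaching ~$0.001/MTok on a $0.50/hr GPU would take roughly 139,000 tokens per second with no idle time, so the per-token figures in the table are probably best read as optimistic lower bounds. Idle hours raise the effective cost further, since the GPU bills whether or not it is serving requests.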
📋 Official pricing sources
- [1] OpenRouter Free Models Collection: 29 free models, no credit card
- [2] SiliconFlow (硅基流动) Pricing: 1000 RPM free, Chinese open-source models
- [3] Google AI Studio Pricing (Gemini): free Gemini Flash
- [4] Groq Pricing: free tier with ultra-fast inference
- [5] Cloudflare Workers AI Models: edge inference free tier
- [6] Mistral AI Pricing: free tier for Mistral Small
- OpenAI API Pricing: GPT-4.1 Nano at $0.10/MTok
- DeepSeek API Pricing: V3.2 at $0.14/MTok
- Anthropic API Pricing: Haiku 3.5 at $0.80/MTok
Prices are public rates as of April 2026. Free-tier limits may change, so check the official sites before buying.