高用量场景下的最便宜与免费 AI API (2026)

2026 年 4 月更新 · 每百万 token 计价 · 均附来源链接

🎯 本页适合谁？

需要大规模用 AI、但不必追求前沿模型质量：文本分类、情感分析、关键词/实体抽取、摘要、翻译、审核、数据 enrichment、批量处理等。也适合原型、测试与 side project 希望 $0 成本的场景。

相关对比： LLM API 定价 · coding plan 定价 · 图像 API 定价 · 所有对比 English

🌐 免费模型聚合平台

下列平台聚合多家提供商的免费模型，一个 API Key 即可访问多种免费模型，多数无需信用卡。

平台	免费模型	速率限制	亮点	来源
OpenRouter	29 free models including Llama 4, GPT-oss, DeepSeek R1, Qwen3, GLM-4.5 Air, Mistral, Hermes 405B ^[1]	20 RPM, 200 RPD per model	Largest free model selection; unified API; tools support	openrouter.ai
硅基流动 SiliconFlow	DeepSeek V3/R1, Qwen3-8B/32B, GLM-4-9B, Llama 4 Scout ^[2]	1000 RPM per model	Highest free RPM; 中国直连; OpenAI-compatible API	siliconflow.cn
Google AI Studio	Gemini 2.5 Flash, Flash-Lite, Gemma 3 ^[3]	15-30 RPM, 1-2M TPM, 250 RPD	Best free quality; 1M context window; vision support	ai.google.dev
Groq	Llama 4 Scout/Maverick, Mixtral, Gemma 2 ^[4]	Free tier with limited RPM	Fastest inference (~100ms); real-time apps	groq.com
Cloudflare Workers AI	Llama 3.3, Mistral 7B, Phi-4 mini ^[5]	Free: 10K neurons/day	Edge inference; no cold start; global CDN	cloudflare.com
Mistral AI (La Plateforme)	Mistral Small, Codestral ^[6]	Free tier with rate limits	European data residency; good coding model	mistral.ai

⭐ 聚合平台上的代表免费模型

经 OpenRouter 或 SiliconFlow 等聚合平台可访问的优质免费模型（数据格与英文页一致）。

模型	免费接入途径	上下文	能力	最适合
GPT-oss-120B	OpenRouter ^[1]	131K	Tools	General Q&A, coding, agents
Hermes 3 Llama 3.1 405B	OpenRouter ^[1]	131K	—	Largest free model, strong reasoning
DeepSeek R1	OpenRouter, SiliconFlow ^[1]	128K	Reasoning	Chain-of-thought, math, coding
DeepSeek V3.2	SiliconFlow ^[2]	128K	—	Best free quality for general tasks
Qwen3 32B	SiliconFlow, OpenRouter ^[2]	128K	Tools	Multilingual, coding, Chinese tasks
GLM-4.5 Air	OpenRouter ^[1]	128K	—	Chinese tasks, general purpose
Llama 4 Scout (17B)	Groq, SiliconFlow ^[4]	128K	—	⚡ Fastest; real-time chat, agents
Gemini 2.5 Flash	Google AI Studio ^[3]	1M	Vision	Longest free context; image understanding
Gemini 2.5 Flash-Lite	Google AI Studio ^[3]	1M	—	Highest free volume + 1M context
NVIDIA Nemotron Nano 12B	OpenRouter ^[1]	8K	—	Lightweight, fast, simple tasks
Liquid LFM 2.5 1.2B	OpenRouter ^[1]	66K	—	Ultra-lightweight; fastest free model

💰 超低价付费 API（输入 < $0.30/MTok）

模型	提供商	输入 $/MTok	输出 $/MTok	上下文	最适合	来源
GPT-4.1 Nano	OpenAI	$0.10	$0.40	1M	Classification, extraction, simple Q&A	openai.com
Gemini 3.1 Flash-Lite	Google	$0.10	$0.40	1M	Cheapest per-token with massive context	ai.google.dev
Groq (Llama 4 Scout)	Groq	$0.11	$0.34	128K	⚡ Fastest inference (~100ms), real-time apps	groq.com
DeepSeek V3.2	DeepSeek	$0.14	$0.28	128K	Best quality/price ratio, coding & reasoning	deepseek.com
Gemini 2.5 Flash	Google	$0.15	$0.60	1M	Great quality at budget price, long context	ai.google.dev
DeepSeek R1	DeepSeek	$0.55	$2.19	128K	Cheapest reasoning model (chain-of-thought)	deepseek.com
o4 Mini	OpenAI	$0.55	$2.20	200K	Reasoning tasks, coding, math	openai.com
Claude Haiku 3.5	Anthropic	$0.80	$4.00	200K	Budget Anthropic option, fast responses	anthropic.com

📊 Real Cost Comparison: 10M Input Tokens + 5M Output Tokens

GPT-4.1 Nano

$3.00

Gemini 3.1 Flash-Lite

$3.00

Groq Llama 4

$2.80

DeepSeek V3.2

$2.80

Gemini 2.5 Flash

$4.50

Claude Haiku 3.5

$28.00

GPT-4o (for comparison)

$37.50

💡 同等工作量下，预算型模型比 GPT-4o 便宜 7–13 倍；简单任务质量差距通常不大。

🎯 不同任务该选哪款便宜模型？

任务	推荐模型	原因	约每百万次请求成本
Sentiment Analysis	GPT-4.1 Nano	Short input/output, simple classification	~$1-3
Keyword/Entity Extraction	Gemini Flash-Lite	Cheapest per-token, handles structured output	~$2-5
Translation (high volume)	DeepSeek V3.2	Excellent multilingual at lowest cost	~$5-15
Summarization	Gemini 2.5 Flash	1M context window, good quality	~$5-20
Content Moderation	Groq (Llama 4)	Sub-100ms latency for real-time	~$3-8
Data Enrichment	DeepSeek V3.2	Good at structured data, cheapest	~$5-15
Email Auto-Reply	Claude Haiku 3.5	Best tone/quality at budget price	~$10-30
Code Review (bulk)	DeepSeek V3.2	Strong coding ability, very cheap	~$10-25

🏠 自托管：极致低价方案

若用量极大、追求最低单价，可自托管开源权重模型，仅付 GPU 租金、无按 token 计费。

模型	最低显存	GPU 成本（估）	折合每 token
Llama 4 Scout (17B)	24GB	~$0.50/hr (A10G)	~$0.001/MTok (effectively free)
Qwen3 14B	16GB	~$0.40/hr (T4)	~$0.001/MTok
DeepSeek V3.2 (full)	8×H100	~$16/hr	Only worthwhile at extreme scale
Gemma 3 4B	8GB	~$0.25/hr (T4)	Cheapest self-hosted option

GPU 成本参考 AWS/GCP/Spot。可用 vast.ai or runpod.io for cheaper community GPUs.

📋 官方定价来源

OpenRouter Free Models Collection — 29 free models, no credit card
硅基流动 SiliconFlow 定价 — 1000 RPM free, Chinese open-source models
Google AI Studio Pricing (Gemini) — Free Gemini Flash
Groq Pricing — Free tier with ultra-fast inference
Cloudflare Workers AI Models — Edge inference free tier
Mistral AI Pricing — Free tier for Mistral Small
OpenAI API Pricing — GPT-4.1 Nano at $0.10/MTok
DeepSeek API Pricing — V3.2 at $0.14/MTok
Anthropic API Pricing — Haiku 3.5 at $0.80/MTok

价格为截至 2026 年 4 月的公开费率；免费档限额可能变动，购买前请以官网为准。

更多对比

高用量场景下的最便宜与免费 AI API (2026)

🌐 免费模型聚合平台

⭐ 聚合平台上的代表免费模型

💰 超低价付费 API（输入 < $0.30/MTok）

🎯 不同任务该选哪款便宜模型？

🏠 自托管：极致低价方案

📋 官方定价来源

继续浏览

LLM API 定价

coding plan 定价

图像生成 API 定价