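RAG Technology Trends and Outlook

RAG has developed explosively over the past two years. Architectures have evolved from simple "retrieve + generate" pipelines to Graph RAG, Self-RAG, and Agentic RAG. This chapter surveys current trends and future directions.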

RAG Evolution Roadmap

graph LR
    A[Naive RAG<br/>2023] --> B[Advanced RAG<br/>2024]
    B --> C[Modular RAG<br/>2024]
    C --> D[Agentic RAG<br/>2025]
    D --> E[Autonomous RAG<br/>2025+]
    A --> A1[Simple retrieval + generation]
    B --> B1[Hybrid retrieval + reranking]
    C --> C1[Modular, pluggable components]
    D --> D1[Agent-driven decisions]
    E --> E1[Self-learning optimization]
    style A fill:#ffebee,stroke:#c62828,stroke-width:2px
    style C fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style E fill:#c8e6c9,stroke:#388e3c,stroke-width:3px

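Stage	Period	Core characteristics	Representative tools
Naive RAG	2023	Vector retrieval + LLM generation	LangChain + Chroma
Advanced RAG	2024	Hybrid retrieval, reranking, query rewriting	LlamaIndex
Modular RAG	2024	Modular, configurable pipelines	Haystack, DSPy
Agentic RAG	2025	Agent controls retrieval decisions	LangGraph, CrewAI
Autonomous RAG	2025+	Self-learning, continuous optimization	Research frontier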
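
The "hybrid retrieval" entry in the Advanced RAG row means merging results from more than one retriever, typically a keyword retriever and a vector retriever. The table does not say how the results are merged; one widely used option is reciprocal rank fusion (RRF), sketched below as an illustration. The document IDs and the two hit lists are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is a commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical hits from a keyword retriever and a vector retriever
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why `doc_a` and `doc_c` end up ahead of documents found by only one retriever. A reranking model would then typically rescore the top of this fused list.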

Agentic RAG

Agentic RAG is the most important current trend: an AI agent decides when to retrieve, what to retrieve, and how to verify the results.

graph TB
    A[User query] --> B[RAG Agent]
    B --> C{Need more information?}
    C -->|Yes| D[Select tool]
    C -->|No| E[Answer directly]
    D --> F[Vector search]
    D --> G[Web search]
    D --> H[SQL query]
    D --> I[API call]
    F --> J[Evaluate results]
    G --> J
    H --> J
    I --> J
    J --> K{Enough to answer?}
    K -->|No| C
    K -->|Yes| L[Synthesize final answer]
    style B fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
    style L fill:#c8e6c9,stroke:#388e3c,stroke-width:3px

"""
Agentic RAG framework
"""
from __future__ import annotations  # allows `ToolType | None` on Python < 3.10

from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class ToolType(Enum):
    VECTOR_SEARCH = "vector_search"
    WEB_SEARCH = "web_search"
    SQL_QUERY = "sql_query"
    API_CALL = "api_call"
    CALCULATOR = "calculator"


@dataclass
class AgentStep:
    """One reasoning step taken by the agent."""
    thought: str
    tool: ToolType | None = None
    tool_input: str = ""
    tool_output: str = ""
    is_final: bool = False


class RAGTool(ABC):
    """Base class for RAG tools.

    `name` must equal a ToolType value (e.g. "vector_search"),
    because the engine looks tools up by that value.
    """
    name: str
    description: str

    @abstractmethod
    def execute(self, query: str) -> str:
        ...


class AgenticRAG:
    """Agentic RAG engine."""

    AGENT_PROMPT = """You are a RAG agent. Based on the user's question, decide which tool to use to gather information.
Available tools:
{tools}
Conversation so far:
{history}
User question: {query}
Think in the following format:
Thought: I need to ...
Tool: tool name
Input: tool input
If you already have enough information, reply:
Thought: I have enough information
Answer: final answer"""

    def __init__(self, tools: list[RAGTool], llm_client, max_steps: int = 5):
        self.tools = {t.name: t for t in tools}
        self.llm = llm_client  # any object with a generate(prompt) -> str method
        self.max_steps = max_steps

    def answer(self, query: str) -> dict:
        """Answer a query with an agent-driven RAG loop."""
        steps: list[AgentStep] = []
        tools_used: list[str] = []
        history = ""
        # The tool list does not change between steps, so build it once
        tools_desc = "\n".join(
            f"- {name}: {tool.description}" for name, tool in self.tools.items()
        )
        for i in range(self.max_steps):
            response = self.llm.generate(
                self.AGENT_PROMPT.format(
                    tools=tools_desc, history=history, query=query
                )
            )
            step = self._parse_response(response)
            steps.append(step)
            if step.is_final:
                return {
                    "answer": step.thought,
                    "steps": len(steps),
                    "tools_used": tools_used,
                }
            if step.tool and step.tool.value in self.tools:
                tool = self.tools[step.tool.value]
                step.tool_output = tool.execute(step.tool_input)
                tools_used.append(step.tool.value)
                # Truncate long outputs so the prompt stays bounded
                history += f"\nStep {i+1}: Used {step.tool.value}, got: {step.tool_output[:200]}"
            else:
                # Record the failure so the next prompt differs from this one
                history += f"\nStep {i+1}: No valid tool was selected; pick one of the available tools"
        # Step budget exhausted: still report the tools that were actually called
        return {
            "answer": "Reached the maximum number of steps",
            "steps": len(steps),
            "tools_used": tools_used,
        }

    def _parse_response(self, response: str) -> AgentStep:
        """Parse the agent's response into an AgentStep."""
        if "Answer:" in response:
            answer = response.split("Answer:")[-1].strip()
            return AgentStep(thought=answer, is_final=True)
        thought = ""
        tool = None
        tool_input = ""
        for line in response.split("\n"):
            line = line.strip()  # tolerate indented model output
            if line.startswith("Thought:"):
                thought = line[len("Thought:"):].strip()
            elif line.startswith("Tool:"):
                tool_name = line[len("Tool:"):].strip()
                try:
                    tool = ToolType(tool_name)
                except ValueError:
                    pass  # unknown tool name: leave tool as None
            elif line.startswith("Input:"):
                tool_input = line[len("Input:"):].strip()
        return AgentStep(thought=thought, tool=tool, tool_input=tool_input)
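
To watch the decide, call tool, observe loop run without a real model, a scripted stand-in can play the LLM's role. The sketch below is self-contained: it re-implements a compact version of the loop rather than importing the framework above, and `ScriptedLLM`, `lookup_docs`, `run_agent`, and the canned replies are all hypothetical names and data invented for the illustration.

```python
import re


class ScriptedLLM:
    """Hypothetical stand-in for an LLM client: returns canned replies in order."""

    def __init__(self, replies: list[str]):
        self._replies = iter(replies)

    def generate(self, prompt: str) -> str:
        return next(self._replies)


def lookup_docs(query: str) -> str:
    """Hypothetical vector-search tool backed by a tiny in-memory corpus."""
    corpus = {"rerank": "Reranking reorders retrieved chunks by relevance."}
    hits = [text for key, text in corpus.items() if key in query.lower()]
    return hits[0] if hits else "no results"


def run_agent(query: str, llm, tools: dict, max_steps: int = 5) -> dict:
    """Compact decide -> call tool -> observe loop."""
    history: list[str] = []
    tools_used: list[str] = []
    for _ in range(max_steps):
        reply = llm.generate(f"History: {history}\nQuestion: {query}")
        final = re.search(r"Answer:\s*(.+)", reply, re.DOTALL)
        if final:
            return {"answer": final.group(1).strip(), "tools_used": tools_used}
        tool = re.search(r"^Tool:\s*(\S+)", reply, re.MULTILINE)
        arg = re.search(r"^Input:\s*(.+)$", reply, re.MULTILINE)
        if tool and tool.group(1) in tools:
            name = tool.group(1)
            output = tools[name](arg.group(1).strip() if arg else query)
            tools_used.append(name)
            history.append(f"{name} -> {output}")  # fed into the next prompt
        else:
            history.append("no valid tool selected")
    return {"answer": None, "tools_used": tools_used}  # step budget exhausted


llm = ScriptedLLM([
    "Thought: I need to look this up\nTool: vector_search\nInput: what is rerank",
    "Thought: I have enough information\nAnswer: Reranking reorders retrieved chunks.",
])
result = run_agent("What does reranking do?", llm, {"vector_search": lookup_docs})
print(result)
```

The first scripted reply triggers one tool call; the second contains `Answer:` and ends the loop. Swapping `ScriptedLLM` for a real client only requires an object with a compatible `generate(prompt)` method, which is itself an assumption about the client interface.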

Long Context vs. RAG

As LLM context windows grow (Gemini 2M tokens, Claude 200K tokens), a debate has emerged: is RAG still needed?

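Dimension	Long-context LLM	RAG
Cost	High (every token in the context is billed)	Low (only relevant documents are sent)
Latency	High (long-context inference is slow)	Low (retrieval is fast)
Accuracy	Medium (details can be missed in long text)	High (precise retrieval)
Freshness	Poor (the context is fixed)	Good (dynamic retrieval)
Scale	Limited by the context window	Effectively unbounded (the vector store scales out)
Traceability	Hard	Easy (sources can be cited)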

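Conclusion: long context and RAG are complementary, not substitutes. The best practice is to let RAG handle precise retrieval and long context handle deep understanding.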
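
The cost row of the comparison comes down to simple arithmetic on input tokens per query. The sketch below works through one hypothetical workload; the corpus size, chunk size, `top_k`, and question length are invented numbers, and the estimate deliberately ignores output tokens, prompt caching, and the cost of running the retrieval stack.

```python
def input_tokens_per_query(corpus_tokens: int, top_k: int, chunk_tokens: int,
                           question_tokens: int = 50) -> dict[str, int]:
    """Compare input tokens sent to the model per query under each approach."""
    # Long context: the whole corpus travels with every question
    long_context = corpus_tokens + question_tokens
    # RAG: only the top_k retrieved chunks travel with the question
    rag = top_k * chunk_tokens + question_tokens
    return {"long_context": long_context, "rag": rag}


# Hypothetical workload: 500K-token corpus, 5 retrieved chunks of 400 tokens
usage = input_tokens_per_query(corpus_tokens=500_000, top_k=5, chunk_tokens=400)
ratio = usage["long_context"] / usage["rag"]
print(usage, f"ratio ~ {ratio:.0f}x")
```

Under these assumptions the long-context approach sends roughly 244 times more input tokens per query. The gap narrows when the corpus is small or when many large chunks are retrieved, which is where passing everything into the context becomes a reasonable choice.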

Key Trend Predictions

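graph TB
    A[Future RAG trends] --> B[Agentic RAG]
    A --> C[Multimodal RAG]
    A --> D[RAG + fine-tuning]
    A --> E[Real-time RAG]
    A --> F[On-device RAG]
    B --> B1[Agent chooses the retrieval strategy autonomously]
    C --> C1[Unified retrieval over images, video, and audio]
    D --> D1[RAFT: retrieval-aware fine-tuning]
    E --> E1[Real-time indexing of streaming data]
    F --> F1[Local RAG on phones and edge devices]
    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px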

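Trend	Maturity	Impact	Key technologies
Agentic RAG	Growth stage	High	LangGraph, Tool Use
Multimodal RAG	Early stage	High	ColPali, CLIP
RAG + fine-tuning	Research stage	Medium	RAFT, RA-DIT
Real-time RAG	Early stage	Medium	Streaming indexing, CDC
On-device RAG	Experimental stage	Medium	Small models, local vector stores
RAG evaluation standardization	Growth stage	High	RAGAS, ARES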
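
For the real-time RAG row, the core idea behind streaming indexing with change data capture (CDC) is that the index is updated by replaying a stream of change events instead of being rebuilt in batches. The toy sketch below shows only that idea: `ChangeEvent` and `LiveIndex` are hypothetical names, the event shape is an assumption, and plain substring matching stands in for the embedding and vector-store steps a real system would use.

```python
from dataclasses import dataclass


@dataclass
class ChangeEvent:
    """Hypothetical CDC record: op is 'upsert' or 'delete'."""
    op: str
    doc_id: str
    text: str = ""


class LiveIndex:
    """Toy in-memory index kept current by applying change events."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def apply(self, event: ChangeEvent) -> None:
        if event.op == "upsert":
            self.docs[event.doc_id] = event.text  # insert or overwrite
        elif event.op == "delete":
            self.docs.pop(event.doc_id, None)  # ignore unknown IDs
        else:
            raise ValueError(f"unknown op: {event.op}")

    def search(self, term: str) -> list[str]:
        """Return IDs of documents containing the term (case-insensitive)."""
        term = term.lower()
        return sorted(i for i, t in self.docs.items() if term in t.lower())


index = LiveIndex()
stream = [
    ChangeEvent("upsert", "p1", "Price of plan A is $10"),
    ChangeEvent("upsert", "p2", "Price of plan B is $20"),
    ChangeEvent("upsert", "p1", "Plan A is discontinued"),
    ChangeEvent("delete", "p2"),
]
for event in stream:
    index.apply(event)

print(index.search("price"))   # stale price entries are gone
print(index.search("plan a"))  # only the latest version of p1 remains
```

Because every query sees the index as of the last applied event, answers stop citing superseded documents as soon as the change arrives. In a production pipeline the `upsert` branch would also re-embed the text before writing it to the vector store.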

Decision Tree for Choosing a RAG System

graph TB
    A[Do you need RAG?] --> B{Data volume?}
    B -->|<100 pages| C[Use long context directly]
    B -->|100-10K pages| D{Update frequency?}
    B -->|>10K pages| E[RAG is required]
    D -->|Low| F[RAG + cache]
    D -->|High| G[RAG + real-time indexing]
    E --> H{Reasoning needed?}
    H -->|Simple QA| I[Naive RAG]
    H -->|Multi-hop reasoning| J[Graph RAG]
    H -->|Complex tasks| K[Agentic RAG]
    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
    style I fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style J fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style K fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
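
The decision tree can also be written as a small function, which makes the thresholds explicit and easy to test. This is a direct transcription of the branches in the diagram; the function name and the string labels for `update_frequency` and `reasoning` are choices made for this sketch.

```python
def choose_architecture(pages: int, update_frequency: str = "low",
                        reasoning: str = "simple_qa") -> str:
    """Walk the selection tree and return the suggested architecture."""
    if pages < 100:
        # Small corpora fit in the context window
        return "long context"
    if pages <= 10_000:
        # Mid-sized corpora: the deciding factor is how often the data changes
        if update_frequency == "low":
            return "RAG + cache"
        if update_frequency == "high":
            return "RAG + real-time indexing"
        raise ValueError(f"unknown update_frequency: {update_frequency}")
    # Large corpora: RAG is required, and the variant depends on reasoning needs
    by_reasoning = {
        "simple_qa": "Naive RAG",
        "multi_hop": "Graph RAG",
        "complex_task": "Agentic RAG",
    }
    if reasoning not in by_reasoning:
        raise ValueError(f"unknown reasoning: {reasoning}")
    return by_reasoning[reasoning]


print(choose_architecture(50))
print(choose_architecture(2_000, update_frequency="high"))
print(choose_architecture(50_000, reasoning="multi_hop"))
```

Note that, as in the diagram, the reasoning question is only asked for corpora above 10K pages, and the update-frequency question only for the middle band.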

Chapter Summary

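Topic	Key points
Evolution roadmap	Naive → Advanced → Modular → Agentic
Agentic RAG	The agent controls retrieval decisions and coordinates multiple tools
Long context vs. RAG	Complementary, not substitutes; RAG has the edge on cost and scale
Key trends	Agentic, multimodal, RAG + fine-tuning, on-device
Selection advice	Choose the architecture based on data volume and task complexity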

Congratulations on completing the entire hands-on guide to RAG (retrieval-augmented generation)!